Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts

Levenshtein Distance

Words similar to "use" in the Corpus

Levenshtein

The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It's useful for detecting typos and alternative spellings.
Time to execute Levenshtein function - 0.413791 milliseconds
Matches, with distance in brackets: use (0) - 457 freq, user (1) - 3 freq, usj (1) - 1 freq, usa (1) - 52 freq, 'use (1) - 2 freq, usy (1) - 1 freq, muse (1) - 35 freq, uses (1) - 47 freq, ese (1) - 2 freq, sse (1) - 27 freq, bse (1) - 4 freq, us (1) - 3606 freq, ure (1) - 5 freq, pse (1) - 1 freq, ise (1) - 2 freq, se (1) - 90 freq, ruse (1) - 1 freq, uise (1) - 281 freq, uce (1) - 1 freq, ue (1) - 2 freq, ume (1) - 2 freq, ouse (1) - 1 freq, mse (1) - 1 freq, used (1) - 663 freq, guse (1) - 1 freq

Double Levenshtein

In a stroke of genius, this runs the Levenshtein function twice, once on the words as written and once with the vowels removed, and adds the two distances together, giving double weight to consonants.
Time to execute Double Levenshtein function - 0.679397 milliseconds
Matches, with distance in brackets: use (0) - 457 freq, ese (1) - 2 freq, yuse (1) - 1 freq, ouse (1) - 1 freq, us (1) - 3606 freq, ise (1) - 2 freq, se (1) - 90 freq, uise (1) - 281 freq, usa (1) - 52 freq, ase (1) - 1 freq, usy (1) - 1 freq, su (2) - 8 freq, as (2) - 17482 freq, eese (2) - 106 freq, es (2) - 798 freq, youse (2) - 131 freq, ys (2) - 4 freq, seo (2) - 1 freq, oese (2) - 4 freq, sea (2) - 833 freq, sey (2) - 62 freq, sue (2) - 12 freq, ease (2) - 91 freq, isa (2) - 24 freq, uis (2) - 17 freq

SoundEx

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
Time to execute SoundEx function - 0.028393 milliseconds
SoundEx code - U200
Matches: us - 3606 freq, use - 457 freq, ugh - 12 freq, 'use - 2 freq, us' - 2 freq, uis - 17 freq, uise - 281 freq, uiss - 130 freq, 'us - 3 freq, uk - 358 freq, uisge - 5 freq, uik - 1 freq, usa - 52 freq, uz - 50 freq, ug - 6 freq, uzziah - 2 freq, 'us' - 4 freq, usc - 1 freq, uch - 3 freq, ukie - 1 freq, uk's - 3 freq, uize - 14 freq, ugie - 1 freq, �ujsko - 1 freq, usy - 1 freq, ��uk - 1 freq, uk-eu - 1 freq, 'uise - 1 freq, ��use - 2 freq, ��us - 1 freq, ��us - 1 freq, usj - 1 freq, uiq - 2 freq, uaq - 1 freq, uki - 2 freq, uku - 1 freq, ucia - 1 freq, uggs - 2 freq, uj - 3 freq, uzo - 1 freq, uqj - 2 freq, uoxqw - 1 freq, uq - 2 freq, uce - 1 freq, ujqzzs - 1 freq, uesw - 1 freq, uks - 1 freq, uhj - 2 freq, ux - 3 freq, uog - 1 freq, uoyxy - 1 freq, uc - 2 freq, uzh - 2 freq, uyji - 1 freq, uiox - 1 freq, uok - 1 freq, uqa - 1 freq, 'uk - 1 freq, uqay - 1 freq, uisge - 1 freq, uaxhu - 1 freq

MetaPhone

Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.
Time to execute MetaPhone function - 0.099534 milliseconds
MetaPhone code - US
Matches: us - 3606 freq, use - 457 freq, 'use - 2 freq, us' - 2 freq, uis - 17 freq, uise - 281 freq, uiss - 130 freq, 'us - 3 freq, usa - 52 freq, uz - 50 freq, uzziah - 2 freq, 'us' - 4 freq, uize - 14 freq, usy - 1 freq, 'uise - 1 freq, ��use - 2 freq, ��us - 1 freq, ��us - 1 freq, uzo - 1 freq, uce - 1 freq, uesw - 1 freq, uzh - 2 freq

Manually curated

Manual curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
Time to execute Manually curated function - 0.000940 milliseconds
Lemma - USE
Matches: use - 457 freq, used - 663 freq, using - 35 freq, usin - 94 freq, user - 3 freq, uised - 277 freq, uise - 281 freq, uises - 36 freq, uiser - 5 freq, aised - 6 freq, yised - 4 freq, yuised - 4 freq, uissed - 4 freq, uized - 21 freq
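The Levenshtein and Double Levenshtein descriptions above can be sketched in a few lines of Python. This is only an illustration, not the site's own code: the function names are made up for the example, and the vowel set is an assumption. The page does not say which letters count as vowels; treating "y" as one is inferred from the listed results, where "usy" and "yuse" both sit at a Double Levenshtein distance of 1 from "use".

```python
# Assumption: the vowel set is not stated on the page; "y" is included
# because the listed results (usy, yuse at distance 1) suggest it.
VOWELS = set("aeiouy")


def levenshtein(a: str, b: str) -> int:
    """Count the single-character replacements, insertions and deletions
    needed to turn a into b (standard dynamic-programming version)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


def strip_vowels(word: str) -> str:
    """Remove every character found in VOWELS (case-insensitive)."""
    return "".join(ch for ch in word if ch.lower() not in VOWELS)


def double_levenshtein(a: str, b: str) -> int:
    """Distance on the full words plus distance on the vowel-less words."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


if __name__ == "__main__":
    for candidate in ["use", "user", "usy", "yuse", "su", "youse"]:
        print(candidate,
              levenshtein("use", candidate),
              double_levenshtein("use", candidate))
```

Under these assumptions "user" scores 1 on plain Levenshtein but 2 on Double Levenshtein, because the extra consonant is counted twice, while "youse" scores 2 on both because the added letters are all treated as vowels.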
