Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts

Levenshtein Distance

Words similar to "thousan" in the corpus

Levenshtein (distance in brackets)

thousan (0) - 44 freq; thousand (1) - 16 freq; thousans (1) - 24 freq; thousant (1) - 4 freq; thoosan (1) - 114 freq; choosan (2) - 2 freq; thoosan' (2) - 1 freq; thoosin (2) - 1 freq; shoutan (2) - 5 freq; thousants (2) - 2 freq; thoosand (2) - 115 freq; thusa (2) - 1 freq; housin (2) - 5 freq; houpan (2) - 1 freq; thoosen (2) - 2 freq; thousands (2) - 18 freq; thomson (2) - 26 freq; thowsand (2) - 3 freq; thoan (2) - 1 freq; tossan (2) - 4 freq; thoosant (2) - 6 freq; thoosans (2) - 60 freq; hsuan (3) - 1 freq; thoumed (3) - 1 freq; througang (3) - 1 freq

Double Levenshtein (distance in brackets)

thousan (0) - 44 freq; thoosan (1) - 114 freq; thousand (2) - 16 freq; thoosin (2) - 1 freq; thousans (2) - 24 freq; thoosen (2) - 2 freq; thousant (2) - 4 freq; thomson (3) - 26 freq; thoan (3) - 1 freq; thoosans (3) - 60 freq; thoosant (3) - 6 freq; tossan (3) - 4 freq; thusa (3) - 1 freq; choosan (3) - 2 freq; thoosan' (3) - 1 freq; thoosand (3) - 115 freq; housin (3) - 5 freq; ahsan (4) - 1 freq; chusin (4) - 4 freq; those (4) - 296 freq; tisan (4) - 2 freq; thamson (4) - 1 freq; thoarn (4) - 11 freq; thaun (4) - 2 freq; chasan (4) - 2 freq

SoundEx code - T250

teachin - 163 freq; thoosan - 114 freq; takkin - 508 freq; touchin - 45 freq; takin - 453 freq; token - 12 freq; taken - 122 freq; tossin - 18 freq; takken - 15 freq; tiggin - 2 freq; teasin - 9 freq; tichin - 1 freq; taksna - 1 freq; tookna - 1 freq; tokin - 4 freq; tuckin - 9 freq; tookin - 2 freq; takin' - 10 freq; teasin' - 1 freq; thousan - 44 freq; thoosin - 1 freq; techno - 3 freq; tisen - 1 freq; theikin - 2 freq; takan - 10 freq; taison - 1 freq; tuggin - 7 freq; taikin - 1 freq; thoosen - 2 freq; tikken - 6 freq; taykeen - 1 freq; tak'n - 1 freq; thoosan' - 1 freq; tickin - 8 freq; twesna - 1 freq; taakin - 107 freq; taxin - 8 freq; thickin - 1 freq; ticino - 1 freq; taggin - 3 freq; teason - 2 freq; taichin - 11 freq; tak'in - 1 freq; tichen - 3 freq; tackin - 1 freq; tizin-wye - 1 freq; tizin - 3 freq; takkan - 14 freq; 'techno' - 1 freq; touchan - 7 freq; tuscan - 3 freq; tuscanie - 1 freq; twasum - 18 freq; tisan - 2 freq; tuck-in - 1 freq; twiggin - 1 freq; thakin - 1 freq; tacn - 2 freq; tickan - 2 freq; tossan - 4 freq; taakan - 2 freq; taisin - 1 freq; teachin' - 10 freq; thickham - 1 freq; t'ken - 2 freq; takna - 2 freq; tooken - 1 freq; tackan - 1 freq; taiken - 7 freq; 'tswana - 1 freq; taukin - 3 freq; teachan - 1 freq; twasome - 1 freq; teacheen - 1 freq; tyeukna - 1 freq; taken' - 1 freq; thiggin - 2 freq; takkin - 1 freq; tocum - 1 freq; 'twisna - 1 freq; tacchini - 1 freq; teachin - 1 freq; texin - 1 freq; tigh'en - 1 freq; tweakin - 1 freq; tacken - 2 freq; tjnwe - 1 freq; tsmh - 1 freq; tsgm - 1 freq; tkkun - 1 freq

MetaPhone code - 0SN

thoosan - 114 freq; thousan - 44 freq; thoosin - 1 freq; thoosen - 2 freq; thoosan' - 1 freq

Manually curated - THOUSAN

thoosand - 115 freq; thoosan - 114 freq; thoosans - 60 freq; thoosands - 53 freq; thoosant - 6 freq; thousant - 4 freq; thousan - 44 freq; thousans - 24 freq; thousands - 18 freq; thousand - 16 freq
Time to execute Levenshtein function - 0.195489 milliseconds
The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It's useful for detecting typos and alternative spellings.

Time to execute Double Levenshtein function - 0.397179 milliseconds
In a stroke of genius, this runs the Levenshtein function twice, once on the words as written and once with their vowels removed, and adds the two distances together, giving double weight to consonants.

Time to execute SoundEx function - 0.028266 milliseconds
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

Time to execute MetaPhone function - 0.038629 milliseconds
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.[1] It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Time to execute Manually curated function - 0.001022 milliseconds
Manual curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
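
The Levenshtein, Double Levenshtein and SoundEx measures described above can be sketched in a few lines of Python. This is an illustration only, not the code behind this page. The function names are hypothetical, the vowel set (a, e, i, o, u) is an assumption, and the Soundex variant is the common American one, where h and w do not separate letters with the same code. Under those assumptions the sketch reproduces the distances and the T250 code listed for "thousan".

```python
def levenshtein(a, b):
    """Number of single-character edits (replace, insert, delete) from a to b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace (or keep) ca
        prev = curr
    return prev[-1]


VOWELS = set("aeiou")  # assumption: the page does not say which letters count


def strip_vowels(word):
    return "".join(c for c in word if c.lower() not in VOWELS)


def double_levenshtein(a, b):
    """Full-word distance plus the distance between the vowel-less forms."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


# Letter groups of American Soundex; vowels, h, w and y carry no digit.
SOUNDEX_CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Four-character Soundex code; characters outside a-z are ignored."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue                        # h and w do not break a run of equal codes
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit                        # a vowel resets prev to ""
    return (code + "000")[:4]               # pad or truncate to four characters


print(levenshtein("thousan", "thoosan"))         # 1
print(double_levenshtein("thousan", "thomson"))  # 3
print(soundex("thousan"), soundex("teachin"))    # T250 T250
```

With "thousan" and "thomson" the plain distance is 2 (u to m, a to o) and the vowel-less forms "thsn" and "thmsn" differ by 1, which gives the combined score of 3 shown in the Double Levenshtein list.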
