Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts


Levenshtein Distance


Words similar to "thoomit" in the corpus

Levenshtein (distance in brackets)
thoomit (0) - 1 freq, thoomin (1) - 2 freq, thooms (2) - 8 freq, thooht (2) - 1 freq, thoomb (2) - 6 freq, whoopit (2) - 1 freq, houmit (2) - 1 freq, hoodit (2) - 1 freq, sholmit (2) - 1 freq, tholit (2) - 4 freq, shootit (2) - 1 freq, sloomit (2) - 1 freq, hootit (2) - 2 freq, thoom (2) - 12 freq, thoosin (2) - 1 freq, thoumie (2) - 1 freq, hookit (2) - 1 freq, tootit (2) - 1 freq, troopit (2) - 1 freq, thoomed (2) - 3 freq, shooit (2) - 1 freq, shoomin (2) - 1 freq, doomit (2) - 2 freq, loomit (2) - 4 freq, thoosint (2) - 2 freq

Double Levenshtein (distance in brackets)
thoomit (0) - 1 freq, thoomin (2) - 2 freq, thoumie (3) - 1 freq, tholit (3) - 4 freq, houmit (3) - 1 freq, thoomb (3) - 6 freq, thooms (3) - 8 freq, thooht (3) - 1 freq, thoomed (3) - 3 freq, thoom (3) - 12 freq, athoot (4) - 319 freq, thit (4) - 568 freq, thaot (4) - 1 freq, thoum (4) - 13 freq, threit (4) - 18 freq, thait (4) - 3 freq, throat (4) - 115 freq, thrait (4) - 3 freq, shaimit (4) - 2 freq, thoct (4) - 7 freq, shamit (4) - 2 freq, thowt (4) - 97 freq, tholt (4) - 1 freq, thomas (4) - 81 freq, theikit (4) - 6 freq

SoundEx (code T530)
twenty - 165 freq, thinned - 3 freq, tuned - 8 freq, tynt - 12 freq, tent - 460 freq, tend - 56 freq, twin't - 1 freq, timid - 16 freq, twinty - 162 freq, tenth - 14 freq, tentie - 38 freq, tint - 218 freq, tamed - 15 freq, thoomed - 3 freq, timed - 4 freq, then-i'd - 1 freq, tuimt - 8 freq, tuimed - 10 freq, tomata - 5 freq, teemt - 3 freq, tomatae - 10 freq, team-mate - 1 freq, tuimit - 3 freq, teem't - 4 freq, teen't - 1 freq, tounheid - 7 freq, tntae - 1 freq, tand - 1 freq, thoomit - 1 freq, tyned - 9 freq, tweenty - 1 freq, tymed - 1 freq, taunt - 2 freq, tanned - 11 freq, tomato - 4 freq, tomatoey - 1 freq, tnt - 1 freq, taimed - 1 freq, twunty - 10 freq, tnt' - 1 freq, toun-wide - 1 freq, tinned - 8 freq, tant - 1 freq, toomed - 2 freq, twintie - 7 freq, twined - 11 freq, twuntie - 8 freq, tamet - 1 freq, teemed - 21 freq, toonty - 2 freq, temid - 1 freq, tned - 1 freq, tinto - 3 freq, time-oot - 2 freq, tint' - 3 freq, tamata - 8 freq, toonheid - 1 freq, tumed - 1 freq, twinit - 6 freq, tuined - 1 freq, twyned - 3 freq, thanthay - 1 freq, tunity - 1 freq, temote - 1 freq, tinnd - 1 freq, toned - 1 freq, tonto - 5 freq, tim'tae - 1 freq, themed - 2 freq, tenty - 6 freq, tumit - 1 freq, taint - 2 freq, tømed - 1 freq, tumt - 1 freq, tyne't - 1 freq, the-intae - 2 freq, twynit - 2 freq, twantie - 2 freq, timit - 1 freq, tanta - 1 freq, taen't - 1 freq, tuim-eed - 1 freq, taand - 1 freq, twinned - 3 freq, tined - 2 freq, twanty - 3 freq, teamed - 1 freq, twinty - 2 freq, tannoid - 1 freq, twinet - 2 freq, twonty - 1 freq, twentie - 1 freq, taind - 1 freq, timmd - 1 freq, thoumed - 1 freq, teammate - 1 freq, tammata - 1 freq, tonite - 6 freq, tnawdaw - 1 freq, taand' - 1 freq, timothy - 1 freq

MetaPhone (code 0MT)
thoomed - 3 freq, thoomit - 1 freq, thumbit - 1 freq, themed - 2 freq, thoumed - 1 freq

Manually curated
THOOMIT
Levenshtein: time to execute function - 0.207708 milliseconds. The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It's useful for detecting typos and alternative spellings.

Double Levenshtein: time to execute function - 0.361426 milliseconds. In a stroke of genius, this runs the Levenshtein function twice, once on the full words and once with the vowels removed, and adds the two distances together, giving double weight to consonants.

SoundEx: time to execute function - 0.031206 milliseconds. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

MetaPhone: time to execute function - 0.042526 milliseconds. Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Manually curated: time to execute function - 0.000956 milliseconds. Manual curation uses a hand-built lookup table / lexicon that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
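As a rough illustration of the first three measures described above, here is a minimal Python sketch. It is not the code this site runs: the function names (levenshtein, double_levenshtein, soundex) are hypothetical, the vowel set is assumed to be a, e, i, o, u, and the Soundex routine is a simplified version of the common American rules.

```python
VOWELS = set("aeiou")  # assumption: the site's exact vowel set is not stated

def levenshtein(a, b):
    """Number of single-character inserts, deletes or replacements turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # replace (or match)
        prev = cur
    return prev[-1]

def strip_vowels(word):
    return "".join(c for c in word if c.lower() not in VOWELS)

def double_levenshtein(a, b):
    """Distance on the full words plus distance on the vowel-less words."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))

# Soundex digit for each coded consonant; vowels, h, w and y have no digit.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit

def soundex(word):
    """Simplified Soundex: first letter plus up to three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (code + "000")[:4]

print(levenshtein("thoomit", "thoomin"))         # 1
print(double_levenshtein("thoomit", "thoomin"))  # 2
print(soundex("thoomit"), soundex("twenty"))     # T530 T530
```

Under these assumptions the sketch reproduces the figures listed for thoomit: thoomin scores 1 on Levenshtein and 2 on Double Levenshtein, and both thoomit and twenty encode to T530.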
