Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts

Levenshtein Distance

Similar words to toun-wide in Corpus

Levenshtein
toun-wide (0) - 1 freq, uk-wide (3) - 1 freq, douneside (3) - 4 freq, tounheid (3) - 7 freq, donside (4) - 4 freq, moss-side (4) - 1 freq, tousie (4) - 11 freq, taen-like (4) - 3 freq, onywie (4) - 3 freq, touchie (4) - 30 freq, doun-come (4) - 1 freq, dounlaid (4) - 1 freq, tourie (4) - 4 freq, warl-wide (4) - 1 freq, sousnside (4) - 1 freq, tounship (4) - 1 freq, doonside (4) - 4 freq, coincide (4) - 4 freq, loch-side (4) - 1 freq, toonheid (4) - 1 freq, tounward (4) - 1 freq, tounies (4) - 1 freq, dug-wise (4) - 1 freq, onside (4) - 2 freq, youngwd (4) - 18 freq

Double Levenshtein
toun-wide (0) - 1 freq, tounheid (5) - 7 freq, uk-wide (5) - 1 freq, time-wise (6) - 1 freq, tounward (6) - 1 freq, een-tide (6) - 1 freq, toannwe (6) - 1 freq, toonheid (6) - 1 freq, tounbraid (6) - 1 freq, tyneside (6) - 2 freq, toun-en (6) - 1 freq, ten-week (6) - 1 freq, youngwd (6) - 18 freq, haun-made (6) - 1 freq, taen-like (6) - 3 freq, tirn-wye (6) - 1 freq, douneside (6) - 4 freq, toun-bred (6) - 3 freq, founde (7) - 1 freq, toune (7) - 2 freq, gun-ile (7) - 1 freq, confide (7) - 4 freq, sun-bed (7) - 4 freq, launside (7) - 1 freq, true-life (7) - 1 freq

SoundEx
SoundEx code - T530
twenty - 165 freq, thinned - 3 freq, tuned - 8 freq, tynt - 12 freq, tent - 460 freq, tend - 56 freq, twin't - 1 freq, timid - 16 freq, twinty - 162 freq, tenth - 14 freq, tentie - 38 freq, tint - 218 freq, tamed - 15 freq, thoomed - 3 freq, timed - 4 freq, then-i'd - 1 freq, tuimt - 8 freq, tuimed - 10 freq, tomata - 5 freq, teemt - 3 freq, tomatae - 10 freq, team-mate - 1 freq, tuimit - 3 freq, teem't - 4 freq, teen't - 1 freq, tounheid - 7 freq, tntae - 1 freq, tand - 1 freq, thoomit - 1 freq, tyned - 9 freq, tweenty - 1 freq, tymed - 1 freq, taunt - 2 freq, tanned - 11 freq, tomato - 4 freq, tomatoey - 1 freq, tnt - 1 freq, taimed - 1 freq, twunty - 10 freq, tnt' - 1 freq, toun-wide - 1 freq, tinned - 8 freq, tant - 1 freq, toomed - 2 freq, twintie - 7 freq, twined - 11 freq, twuntie - 8 freq, tamet - 1 freq, teemed - 21 freq, toonty - 2 freq, temid - 1 freq, tned - 1 freq, tinto - 3 freq, time-oot - 2 freq, tint' - 3 freq, tamata - 8 freq, toonheid - 1 freq, tumed - 1 freq, twinit - 6 freq, tuined - 1 freq, twyned - 3 freq, thanthay - 1 freq, tunity - 1 freq, temote - 1 freq, tinnd - 1 freq, toned - 1 freq, tonto - 5 freq, tim'tae - 1 freq, themed - 2 freq, tenty - 6 freq, tumit - 1 freq, taint - 2 freq, tømed - 1 freq, tumt - 1 freq, tyne't - 1 freq, the-intae - 2 freq, twynit - 2 freq, twantie - 2 freq, timit - 1 freq, tanta - 1 freq, taen't - 1 freq, tuim-eed - 1 freq, taand - 1 freq, twinned - 3 freq, tined - 2 freq, twanty - 3 freq, teamed - 1 freq, twinty - 2 freq, tannoid - 1 freq, twinet - 2 freq, twonty - 1 freq, twentie - 1 freq, taind - 1 freq, timmd - 1 freq, thoumed - 1 freq, teammate - 1 freq, tammata - 1 freq, tonite - 6 freq, tnawdaw - 1 freq, taand' - 1 freq, timothy - 1 freq

MetaPhone
MetaPhone code - TNWT
toun-wide - 1 freq, denwette - 1 freq

Manually curated
TOUN-WIDE
Time to execute Levenshtein function - 0.217289 milliseconds. The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It is useful for detecting typos and alternative spellings.

Time to execute Double Levenshtein function - 0.378864 milliseconds. In a stroke of genius, this runs the Levenshtein function twice, once on the words as written and once with the vowels removed, and adds the two distances together, giving double weight to consonants.

Time to execute SoundEx function - 0.028283 milliseconds. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

Time to execute MetaPhone function - 0.041186 milliseconds. Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.[1] It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Time to execute Manually curated function - 0.000931 milliseconds. Manual curation uses a lookup table / lexicon, created by hand, which links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
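
The Levenshtein, Double Levenshtein and SoundEx descriptions above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code behind this page: the function names are hypothetical, the vowels are assumed to be a, e, i, o and u, and the Soundex routine follows the commonly described American Soundex rules (non-letters dropped, h and w ignored), which may differ in detail from whatever the site actually calls.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or replacements to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # replace (or keep)
        previous = current
    return previous[-1]


def strip_vowels(word: str) -> str:
    """Assumption: only a, e, i, o, u (either case) count as vowels."""
    return "".join(c for c in word if c.lower() not in "aeiou")


def double_levenshtein(a: str, b: str) -> int:
    """Distance on the full words plus distance on the vowel-stripped words."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


# Letter groups of the usual Soundex table; vowels, h, w and y carry no digit.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word: str) -> str:
    """First letter plus up to three digits, padded with zeros."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]  # drop hyphens etc.
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated digit may follow it
    return (code + "000")[:4]


if __name__ == "__main__":
    print(levenshtein("toun-wide", "tounheid"))         # 3
    print(double_levenshtein("toun-wide", "tounheid"))  # 5
    print(soundex("toun-wide"))                         # T530
```

Under these assumptions the sketch gives the same distances the lists above show for tounheid (3 and 5) and uk-wide (3 and 5), and the same T530 code for toun-wide, twenty and tomato.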
