Corpus of 21st Century Scots Texts - Levenshtein



Levenshtein Distance


Similar words to "tokens" in the corpus

Levenshtein (distance in brackets, then corpus frequency):
tokens (0) - 5 freq, tokes (1) - 1 freq, tockens (1) - 1 freq, token (1) - 12 freq, dokens (1) - 2 freq, jokes (2) - 50 freq, stokes (2) - 2 freq, unkens (2) - 1 freq, theens (2) - 1 freq, t'ken (2) - 2 freq, takins (2) - 2 freq, dokkens (2) - 1 freq, yokes (2) - 3 freq, toked (2) - 1 freq, dozens (2) - 3 freq, omens (2) - 3 freq, opens (2) - 69 freq, trokes (2) - 1 freq, touns (2) - 46 freq, dockens (2) - 5 freq, taken (2) - 122 freq, toke (2) - 2 freq, toves (2) - 1 freq, coke's (2) - 1 freq, totes (2) - 1 freq

Double Levenshtein (combined distance in brackets, then corpus frequency):
tokens (0) - 5 freq, tokes (2) - 1 freq, takins (2) - 2 freq, taikens (2) - 1 freq, dokens (2) - 2 freq, token (2) - 12 freq, tockens (2) - 1 freq, teens (3) - 13 freq, tykes (3) - 7 freq, tokenism (3) - 1 freq, tones (3) - 21 freq, tons (3) - 10 freq, takers (3) - 2 freq, trens (3) - 1 freq, tens (3) - 13 freq, takes (3) - 137 freq, tyke's (3) - 2 freq, taken' (3) - 1 freq, tikes (3) - 1 freq, thens (3) - 2 freq, kens (3) - 532 freq, toons (3) - 80 freq, toks (3) - 1 freq, tokin (3) - 4 freq, aitkens (3) - 1 freq

SoundEx (code T252; corpus frequency after each word):
thoosans - 60 freq, taking - 42 freq, taxing - 1 freq, teaching - 30 freq, touching - 7 freq, teachins - 4 freq, technical - 26 freq, thousans - 24 freq, tokens - 5 freq, tcenayger - 1 freq, tecnaiger - 1 freq, thoosin's - 2 freq, thcing - 1 freq, thickness - 7 freq, technically - 8 freq, toughness - 1 freq, technique - 12 freq, techniques - 4 freq, tossing - 2 freq, 'tossing - 2 freq, taichins - 1 freq, tiggy-winkle - 1 freq, 'thoosans - 1 freq, takins - 2 freq, touchan's - 1 freq, tekno-economic - 1 freq, tcm's - 1 freq, thoosan-star - 1 freq, tokenistic - 1 freq, technicians - 4 freq, takkins - 1 freq, tecumseh - 1 freq, techincally - 1 freq, taikens - 1 freq, thoos'ns - 1 freq, technicalities - 1 freq, 'tecumseh' - 1 freq, technicolor - 7 freq, teachings - 2 freq, ticking - 1 freq, teknicly - 1 freq, tokenism - 1 freq, techneecian - 2 freq, teuchness - 1 freq, technician - 1 freq, taikenistic - 1 freq, touchingly - 1 freq, taegang - 1 freq, teasing - 2 freq, t-sionnaich - 1 freq, tzwmcauzbk - 1 freq, tkmesg - 1 freq, txnx - 1 freq, tacking - 3 freq, tackings - 1 freq, tockens - 1 freq, thejamhouseedin - 1 freq, tcmck - 1 freq, tejmuk - 2 freq, tighnacoille - 2 freq, tsgnq - 1 freq, thickens - 1 freq, tieganstevenson - 1 freq

MetaPhone (code TKNS; corpus frequency after each word):
dickens - 13 freq, dockens - 5 freq, tokens - 5 freq, deacon's - 1 freq, diagnose - 1 freq, takins - 2 freq, dicken's - 2 freq, takkins - 1 freq, taikens - 1 freq, dokens - 2 freq, diggin's - 1 freq, dokkens - 1 freq, dockins - 1 freq, tockens - 1 freq, deaconess - 1 freq

Manually curated:
TOKENS
Time to execute Levenshtein function: 0.187585 milliseconds. The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It's useful for detecting typos and alternative spellings: for example, token is at distance 1 from tokens (one deletion).

Time to execute Double Levenshtein function: 0.379420 milliseconds. In a stroke of genius, this runs the Levenshtein function twice, once on the words as written and once with their vowels removed, and adds the two distances together, giving double weight to consonants.

Time to execute SoundEx function: 0.030815 milliseconds. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

Time to execute MetaPhone function: 0.041847 milliseconds. Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.[1] It improves fundamentally on Soundex by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Time to execute Manually curated function: 0.000952 milliseconds. Manual curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variants. Not all words are covered.
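The Levenshtein distance can be computed with the classic dynamic-programming table, keeping only two rows at a time. The sketch below is a minimal Python version, not the site's own code; the helper `similar_words` and its cut-offs (`max_distance=2`, `limit=25`) are hypothetical, guessed from the Levenshtein results list, which stops at distance 2 after 25 entries.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions or
    deletions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter word in the rows
    previous = list(range(len(b) + 1))  # cost of turning "" into each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_words(query: str, vocabulary: dict, max_distance: int = 2, limit: int = 25):
    """Hypothetical helper: (word, distance, frequency) for vocabulary words
    within max_distance of query, nearest first. Cut-offs are assumptions."""
    matches = []
    for word, freq in vocabulary.items():
        d = levenshtein(query, word)
        if d <= max_distance:
            matches.append((word, d, freq))
    matches.sort(key=lambda m: m[1])  # stable sort: ties keep vocabulary order
    return matches[:limit]


if __name__ == "__main__":
    # A few word/frequency pairs copied from the results above.
    vocab = {"tokens": 5, "token": 12, "jokes": 50, "taken": 122, "kens": 532}
    for word, dist, freq in similar_words("tokens", vocab):
        print(f"{word} ({dist}) - {freq} freq")
```

Scanning the whole vocabulary is linear in corpus size; for a vocabulary of this scale that is usually acceptable, though a real system might prefilter by word length, since the distance can never be smaller than the difference in lengths.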
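The Double Levenshtein score described above can be sketched as the ordinary distance plus the distance between the vowel-stripped words. This is an illustration under assumptions, not the site's implementation: the page does not say which letters count as vowels, and treating "y" as a vowel is a guess, chosen because it reproduces scores such as tykes (3) in the Double Levenshtein list; lowercasing the input is also an assumption. The Levenshtein helper is repeated so the block runs on its own.

```python
# Assumption: "y" counts as a vowel; this reproduces scores like tykes (3).
VOWELS = frozenset("aeiouy")


def levenshtein(a: str, b: str) -> int:
    """Single-character replace/insert/delete distance (two-row DP)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # replacement
        previous = current
    return previous[-1]


def strip_vowels(word: str) -> str:
    """Keep only non-vowel characters (consonants, apostrophes, hyphens)."""
    return "".join(ch for ch in word if ch not in VOWELS)


def double_levenshtein(a: str, b: str) -> int:
    """Full-word distance plus vowel-less distance, so consonant changes
    tend to be counted twice and vowel-only changes once."""
    a, b = a.lower(), b.lower()  # assumption: comparison is case-insensitive
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


if __name__ == "__main__":
    for word in ["tokes", "takins", "teens", "tykes", "kens"]:
        print(word, double_levenshtein("tokens", word))
```

For these five words the sketch gives 2, 2, 3, 3 and 3, matching the Double Levenshtein list. Note how takins scores the same as tokes even though it needs two letter changes: both changes are vowels, so the consonant skeletons "tkns" are identical.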
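Soundex reduces a word to its first letter plus three digits. The sketch below follows the standard American Soundex rules: non-letters are ignored, consonants map to digit groups, vowels (and y) are dropped but separate repeated codes, h and w are dropped without separating them, and the result is padded with zeros to four characters. The page does not say which Soundex variant it uses; this one is an assumption, though it does give T252 for tokens, technical and tsgnq from the SoundEx list.

```python
# Standard American Soundex digit groups; letters not listed get no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word: str) -> str:
    """Four-character Soundex code, e.g. 'tokens' -> 'T252'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]  # skip ' and -
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # first letter's code blocks a repeat
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":
            prev = code  # vowels reset prev to ""; h and w leave it unchanged
    return (result + "000")[:4]


if __name__ == "__main__":
    for word in ["tokens", "technical", "'tossing", "Ashcraft", "Tymczak"]:
        print(word, soundex(word))
```

Because only the first letter and the first three distinct consonant groups survive, words as different as taking, technical and tiggy-winkle collide on T252, which is why this column is much noisier than the Levenshtein ones.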
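Manual curation is a lookup rather than an algorithm, which is why it runs so quickly. A minimal sketch, assuming the lexicon is a plain word-form to lemma mapping; the entries and the helper names `lemma_of` and `curated_variants` are hypothetical illustrations, not the corpus's actual hand-built table. Words missing from the lexicon return nothing, mirroring the note that not all words are covered.

```python
from collections import defaultdict

# Hypothetical example entries (word form -> lemma); the corpus's real
# hand-curated lexicon is not reproduced here.
LEXICON = {
    "token": "token",
    "tokens": "token",
    "tockens": "token",
    "taikens": "token",
}

# Reverse index (lemma -> forms) so every variant of a lemma can be listed.
FORMS_BY_LEMMA = defaultdict(list)
for form, lemma in LEXICON.items():
    FORMS_BY_LEMMA[lemma].append(form)


def lemma_of(word: str):
    """Return the curated lemma for word, or None if the lexicon lacks it."""
    return LEXICON.get(word.lower())


def curated_variants(word: str) -> list:
    """All forms sharing word's lemma, sorted; an empty list if uncovered."""
    lemma = lemma_of(word)
    return sorted(FORMS_BY_LEMMA[lemma]) if lemma is not None else []


if __name__ == "__main__":
    print(lemma_of("Tockens"), curated_variants("taikens"), curated_variants("jokes"))
```

Each lookup is a constant-time dictionary access, so the cost lies in building and maintaining the lexicon by hand rather than in running it.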
