Corpus of 21st Century Scots Texts - Levenshtein



Levenshtein Distance


Words in the corpus similar to tokenistic

Entries show the word, its distance from tokenistic in brackets (Levenshtein methods only) and its frequency in the corpus.

Levenshtein
tokenistic (0) - 1 freq, taikenistic (2) - 1 freq, tokenism (3) - 1 freq, holistic (4) - 4 freq, skenstid (4) - 1 freq, moralistic (4) - 1 freq, tokens (4) - 5 freq, atheistic (4) - 1 freq, forensic (4) - 3 freq, monastic (4) - 2 freq, co-existit (4) - 1 freq, tweistin (4) - 1 freq, voyeuristic (4) - 1 freq, domestic (4) - 20 freq, logistic (4) - 1 freq, towerists (4) - 1 freq, keistie (4) - 1 freq, tensin (5) - 1 freq, terestin (5) - 1 freq, modrenist (5) - 3 freq, elestic (5) - 2 freq, molestit (5) - 1 freq, tentin (5) - 10 freq, optimistic (5) - 11 freq, pokeit (5) - 1 freq

Double Levenshtein
tokenistic (0) - 1 freq, taikenistic (2) - 1 freq, tokenism (5) - 1 freq, monastic (6) - 2 freq, tokens (6) - 5 freq, atheistic (6) - 1 freq, skenstid (6) - 1 freq, agnostic (7) - 1 freq, thainstin (7) - 1 freq, kinetic (7) - 1 freq, gymnastic (7) - 1 freq, autistic (7) - 2 freq, keistie (7) - 1 freq, atavistic (7) - 1 freq, teeniest (7) - 1 freq, kensit (7) - 1 freq, dynastic (7) - 1 freq, logistic (7) - 1 freq, towerists (7) - 1 freq, domestic (7) - 20 freq, tweistin (7) - 1 freq, voyeuristic (7) - 1 freq, moralistic (7) - 1 freq, forensic (7) - 3 freq, holistic (7) - 4 freq

SoundEx (code T252)
thoosans - 60 freq, taking - 42 freq, taxing - 1 freq, teaching - 30 freq, touching - 7 freq, teachins - 4 freq, technical - 26 freq, thousans - 24 freq, tokens - 5 freq, tcenayger - 1 freq, tecnaiger - 1 freq, thoosin's - 2 freq, thcing - 1 freq, thickness - 7 freq, technically - 8 freq, toughness - 1 freq, technique - 12 freq, techniques - 4 freq, tossing - 2 freq, 'tossing - 2 freq, taichins - 1 freq, tiggy-winkle - 1 freq, 'thoosans - 1 freq, takins - 2 freq, touchan's - 1 freq, tekno-economic - 1 freq, tcm's - 1 freq, thoosan-star - 1 freq, tokenistic - 1 freq, technicians - 4 freq, takkins - 1 freq, tecumseh - 1 freq, techincally - 1 freq, taikens - 1 freq, thoos'ns - 1 freq, technicalities - 1 freq, 'tecumseh' - 1 freq, technicolor - 7 freq, teachings - 2 freq, ticking - 1 freq, teknicly - 1 freq, tokenism - 1 freq, techneecian - 2 freq, teuchness - 1 freq, technician - 1 freq, taikenistic - 1 freq, touchingly - 1 freq, taegang - 1 freq, teasing - 2 freq, t-sionnaich - 1 freq, tzwmcauzbk - 1 freq, tkmesg - 1 freq, txnx - 1 freq, tacking - 3 freq, tackings - 1 freq, tockens - 1 freq, thejamhouseedin - 1 freq, tcmck - 1 freq, tejmuk - 2 freq, tighnacoille - 2 freq, tsgnq - 1 freq, thickens - 1 freq, tieganstevenson - 1 freq

MetaPhone (code TKNSTK)
tokenistic - 1 freq, taikenistic - 1 freq

Manually curated
TOKENISTIC
Levenshtein
Time to execute Levenshtein function - 0.319989 milliseconds
The Levenshtein distance is the minimum number of single-character substitutions, insertions or deletions needed to transform one word into another. It is useful for detecting typos and alternative spellings.

Double Levenshtein
Time to execute Double Levenshtein function - 0.530053 milliseconds
In a stroke of genius, this runs the Levenshtein function twice, once on the full words and once with the vowels removed, and adds the two distances together, giving double weight to consonants.

SoundEx
Time to execute SoundEx function - 0.033428 milliseconds
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. Each code is the first letter of the word followed by three digits, so tokenistic is encoded as T252.

MetaPhone
Time to execute MetaPhone function - 0.047083 milliseconds
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.[1] It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Manually curated
Time to execute Manually curated function - 0.001149 milliseconds
Manual curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
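The Levenshtein distance described above can be computed with the standard dynamic-programming recurrence. The following is a minimal Python sketch, not the corpus site's own code (which this page does not show). The `similar_words` helper, its `max_distance` cut-off of 5 and the toy word list are hypothetical, chosen to mimic how the Levenshtein column ranks corpus words nearest first.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions and
    deletions needed to turn a into b (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


def similar_words(target: str, words: list[str], max_distance: int = 5) -> list[tuple[int, str]]:
    """Hypothetical helper: (distance, word) pairs within max_distance of
    target, nearest first; sorted() is stable, so ties keep input order."""
    scored = [(levenshtein(target, w), w) for w in words]
    return sorted((p for p in scored if p[0] <= max_distance), key=lambda p: p[0])


# Toy word list for illustration, not the corpus itself.
words = ["holistic", "taikenistic", "tokenism", "domestic", "tokenistic", "teaching"]
print(similar_words("tokenistic", words))
# [(0, 'tokenistic'), (2, 'taikenistic'), (3, 'tokenism'), (4, 'holistic'), (4, 'domestic')]
```

The distances for taikenistic (2), tokenism (3), holistic (4) and domestic (4) agree with the Levenshtein column. "teaching" drops out because it is at least 7 edits away, even though it shares the Soundex code T252 with tokenistic.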
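The Double Levenshtein idea can be sketched as the full-word distance plus the distance between the two words' consonant skeletons. This is a minimal sketch under one stated assumption: the vowel set is `aeiouy`. Treating 'y' as a vowel is a guess, but it reproduces the 7 shown for gymnastic, whereas counting 'y' as a consonant would give 8. The names `strip_vowels` and `double_levenshtein` are hypothetical.

```python
# Assumption: 'y' counts as a vowel; this choice reproduces the page's
# figure for "gymnastic" (7), whereas treating 'y' as a consonant gives 8.
VOWELS = set("aeiouy")


def levenshtein(a: str, b: str) -> int:
    """Standard two-row dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                  # deletion
                               current[j - 1] + 1,               # insertion
                               previous[j - 1] + (ca != cb)))    # substitution
        previous = current
    return previous[-1]


def strip_vowels(word: str) -> str:
    """Consonant skeleton of a word, e.g. 'tokenistic' -> 'tknstc'."""
    return "".join(ch for ch in word if ch.lower() not in VOWELS)


def double_levenshtein(a: str, b: str) -> int:
    """Full-word distance plus consonant-skeleton distance, so consonant
    changes tend to count twice and vowel-only changes once."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


for other in ["taikenistic", "tokenism", "monastic", "gymnastic"]:
    print(other, double_levenshtein("tokenistic", other))
```

For these four words the sketch gives 2, 5, 6 and 7, the same as the Double Levenshtein column. Take tokenism as an example. The full-word distance is 3 (one vowel deletion and two consonant edits). The skeletons tknstc and tknsm differ by 2. The total is 5, so the two consonant edits are counted twice and the vowel edit once.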
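The page does not say which Soundex variant it uses. The sketch below follows the standard American Soundex rules: keep the first letter and drop vowels, 'y', 'h' and 'w'. Consonants are mapped to digits, repeats of the same digit are collapsed (also across 'h' and 'w'), and the result is padded or truncated to four characters. For tokenistic it produces T252, matching the page. The matching loop and the word list are illustrative only.

```python
# Standard American Soundex digit classes; vowels, 'y', 'h' and 'w' have no digit.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}


def soundex(word: str) -> str:
    """First letter plus three digits, padded with zeros."""
    letters = [ch for ch in word.lower() if ch.isalpha()]  # drop ' and - etc.
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:  # skip repeats of the same digit
            code += digit
            if len(code) == 4:
                break
        if ch not in "hw":  # 'h'/'w' do not separate letters with the same digit
            previous = digit
    return code.ljust(4, "0")


target = soundex("tokenistic")
words = ["holistic", "teaching", "tokens", "taxing", "domestic", "thoosans"]
print(target, [w for w in words if soundex(w) == target])
# T252 ['teaching', 'tokens', 'taxing', 'thoosans']
```

Because only the first letter and the first three coded consonants survive, words as different as teaching and tokenistic collide. This is why the SoundEx list is much longer and noisier than the Levenshtein lists.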
