Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts


Levenshtein Distance


Words similar to "calendars" in the Corpus

Levenshtein
word (distance) - frequency in corpus
calendars (0) - 3 freq
calendar (1) - 23 freq
calanders (2) - 1 freq
clears (3) - 19 freq
calders (3) - 2 freq
ledars (3) - 1 freq
cedars (3) - 1 freq
cheddars (3) - 1 freq
lenders (3) - 1 freq
caledons (3) - 2 freq
calendula (3) - 1 freq
glenda's (3) - 1 freq
agendas (3) - 1 freq
colanders (3) - 1 freq
callender (3) - 1 freq
alandas (3) - 8 freq
lender (4) - 2 freq
erlend's (4) - 1 freq
allstars (4) - 1 freq
scalders (4) - 1 freq
alienar (4) - 3 freq
'amanda's (4) - 1 freq
casenotes (4) - 1 freq
calmness (4) - 3 freq
fenders (4) - 1 freq

Double Levenshtein
word (distance) - frequency in corpus
calendars (0) - 3 freq
calanders (2) - 1 freq
calendar (2) - 23 freq
colanders (3) - 1 freq
ceelinders (4) - 1 freq
lenders (4) - 1 freq
calders (4) - 2 freq
flanders (5) - 15 freq
flinders (5) - 2 freq
cylinder (5) - 6 freq
clangers (5) - 1 freq
slanders (5) - 1 freq
clinkers (5) - 2 freq
cleaners (5) - 8 freq
hielenders (5) - 1 freq
islanders (5) - 2 freq
blinders (5) - 2 freq
colander (5) - 1 freq
landers (5) - 10 freq
callender (5) - 1 freq
cedars (5) - 1 freq
cheddars (5) - 1 freq
glenda's (5) - 1 freq
calendula (5) - 1 freq
caledons (5) - 2 freq

SoundEx
SoundEx code - C453
calmed - 14 freq
clammed - 1 freq
calendar - 23 freq
calamity - 5 freq
claimed - 35 freq
caulmed - 1 freq
callant - 21 freq
clients - 14 freq
cleaned - 35 freq
climate - 34 freq
clints - 3 freq
calendars - 3 freq
callants - 7 freq
claimit - 9 freq
clontarf - 8 freq
clammit - 1 freq
caulmit - 1 freq
cylinder - 6 freq
climates - 6 freq
claimt - 3 freq
clintie - 1 freq
climmed - 13 freq
clients' - 1 freq
calmit - 2 freq
callanetics - 2 freq
climate's - 1 freq
clientele - 2 freq
colander - 1 freq
clined - 5 freq
'clinton' - 1 freq
client - 2 freq
clinton - 3 freq
celandine - 2 freq
climt - 5 freq
colanders - 1 freq
clintie's - 1 freq
calamitous - 1 freq
climatic - 1 freq
cloned - 1 freq
clamed - 2 freq
clammiehewit - 1 freq
ceilinder - 1 freq
ceelinders - 1 freq
cleant - 4 freq
clandeboye - 1 freq
cléante - 1 freq
clandestine - 1 freq
callender - 1 freq
callander - 2 freq
'claimed - 1 freq
clownt - 1 freq
clint - 4 freq
calendula - 1 freq
cwulmde - 1 freq
calmdoon - 1 freq
clandeboyes - 1 freq
clintonsk - 1 freq
cleendeen - 1 freq
clint's - 1 freq
calumdan - 1 freq
calanders - 1 freq
cluniedonna - 4 freq
colintkirk - 1 freq
colinton - 1 freq
cland - 1 freq

MetaPhone
MetaPhone code - KLNTRS
calendars - 3 freq
colanders - 1 freq
calanders - 1 freq

Manually curated
CALENDARS
Levenshtein
Time to execute Levenshtein function - 0.280477 milliseconds
The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It is useful for detecting typos and alternative spellings.

Double Levenshtein
Time to execute Double Levenshtein function - 0.453613 milliseconds
In a stroke of genius, this runs the Levenshtein function twice, once on the full words and once on the words with their vowels removed, and adds the two distances together, giving double weight to consonants.

SoundEx
Time to execute SoundEx function - 0.029038 milliseconds
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

MetaPhone
Time to execute MetaPhone function - 0.040267 milliseconds
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Manually curated
Time to execute Manually curated function - 0.000903 milliseconds
Manual curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
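The three simplest measures described above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind this page: the function names are hypothetical, the vowel set "aeiouy" is an assumption (treating y as a vowel happens to give "cylinder" its listed double score of 5), and the Soundex shown is the common American variant, which may differ in edge cases from whatever the site runs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes or replacements."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # replace ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


VOWELS = set("aeiouy")  # assumption: the page does not say which letters count


def strip_vowels(word: str) -> str:
    """Keep only the consonant skeleton of a word."""
    return "".join(ch for ch in word if ch.lower() not in VOWELS)


def double_levenshtein(a: str, b: str) -> int:
    """Full-word distance plus the distance between consonant skeletons."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


# American Soundex digit groups; vowels, h, w and y carry no digit.
SOUNDEX_DIGITS = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """First letter plus up to three digits, padded with zeros."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    last = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != last:
            code += digit
        if ch not in "hw":  # h and w do not separate repeated digits
            last = digit
    return (code + "000")[:4]


print(levenshtein("calendars", "calendar"))          # 1
print(double_levenshtein("calendars", "calendar"))   # 2
print(double_levenshtein("calendars", "calanders"))  # 2
print(soundex("calendars"))                          # C453
```

Under these assumptions "calanders" scores 2 on both measures, because swapping two vowels leaves the consonant skeleton "clndrs" unchanged, while "calendar" rises from 1 to 2 because the missing "s" is counted once in the full word and again in the skeleton.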
