Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts


Levenshtein Distance


Similar words to moustache in Corpus

Levenshtein
moustache (0) - 20 freq, moustached (1) - 2 freq, moutach (2) - 1 freq, obstacle (3) - 7 freq, mustae (3) - 6 freq, maussacre (3) - 1 freq, outch (4) - 1 freq, douche (4) - 1 freq, bourached (4) - 1 freq, gustave (4) - 1 freq, moussaka (4) - 1 freq, stacher (4) - 3 freq, myspace (4) - 1 freq, coutch (4) - 1 freq, mullachs (4) - 1 freq, loofache (4) - 1 freq, toothache (4) - 4 freq, lugache (4) - 1 freq, mustnae (4) - 1 freq, bourrach (4) - 1 freq, mollach (4) - 2 freq, mistacken (4) - 3 freq, houstrie (4) - 1 freq, bourachie (4) - 1 freq, tache (4) - 1 freq

Double Levenshtein
moustache (0) - 20 freq, moustached (2) - 2 freq, moutach (3) - 1 freq, mutch (5) - 1 freq, pastiche (5) - 2 freq, meshach (5) - 1 freq, maussacre (5) - 1 freq, obstacle (5) - 7 freq, mustae (5) - 6 freq, mortlach (6) - 1 freq, molach (6) - 176 freq, mounth (6) - 2 freq, metch (6) - 1 freq, mortice (6) - 4 freq, mastic (6) - 2 freq, mustart (6) - 8 freq, mouth (6) - 45 freq, must've (6) - 26 freq, poutch (6) - 1 freq, mustafa (6) - 1 freq, stechie (6) - 4 freq, 'stech (6) - 1 freq, styachie (6) - 1 freq, mostlee (6) - 1 freq, mystic (6) - 9 freq

SoundEx
SoundEx code - M232
mists - 8 freq, macdougall's - 3 freq, michtiest - 2 freq, mystic - 9 freq, mistake - 73 freq, meestic - 2 freq, mosquitoes - 2 freq, mistakes - 19 freq, misjudged - 2 freq, masts - 8 freq, moustache - 20 freq, mistak - 41 freq, mistaks - 13 freq, mygets - 1 freq, misjuidgments - 1 freq, muskets - 10 freq, mistuik - 1 freq, mist's - 1 freq, mactaggart's - 1 freq, mact's - 1 freq, mactaggart - 1 freq, mishtake - 1 freq, mistaken - 7 freq, micht's - 1 freq, masticatin - 1 freq, mistys - 2 freq, macdougal - 1 freq, macdougall - 8 freq, mistook - 3 freq, mystical - 15 freq, machutus - 1 freq, mastic - 2 freq, mistakenly - 3 freq, mistaek - 2 freq, moustached - 2 freq, mistacken - 3 freq, mauchtiest - 2 freq, mostest - 1 freq, meistical - 1 freq, mauchts - 1 freq, mist-covert - 1 freq, maggots - 3 freq, mistakkin - 1 freq, maist-cited - 1 freq, messidge - 1 freq, machetes - 1 freq, mcdougall's - 1 freq, mysticism - 2 freq, mystique - 1 freq, machts - 2 freq, misjudging - 1 freq, mistiess - 1 freq, mcdougal - 9 freq, misjudgin - 1 freq, msthhqojc - 1 freq, mojito's - 2 freq, mastectomy - 1 freq, mcd's - 2 freq, makitauk - 1 freq, maxitekuk - 1 freq, mmyckdzpr - 1 freq, misstahcook - 1 freq, macdos - 1 freq, mwggtqb - 1 freq, mikewadejourno - 5 freq, mctuj - 1 freq, msdc - 1 freq

MetaPhone
MetaPhone code - MSTX
moustache - 20 freq

Manually curated
MOUSTACHE
Time to execute Levenshtein function - 0.215186 milliseconds. The Levenshtein distance is the number of single characters you have to replace, insert or delete to transform one word into another. It's useful for detecting typos and alternative spellings.

Time to execute Double Levenshtein function - 0.382839 milliseconds. In a stroke of genius, this runs the Levenshtein function twice, once on the full words and once with the vowels removed, and adds the two distances together, giving double weight to consonants.

Time to execute SoundEx function - 0.027059 milliseconds. Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.

Time to execute MetaPhone function - 0.041161 milliseconds. Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.

Time to execute Manually curated function - 0.000836 milliseconds. Manual curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
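The first three measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code that runs on this site: the function names are hypothetical, the vowel set (including y) is a guess that happens to agree with the scores listed for moustache, and the Soundex routine follows the standard American Soundex rules.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


# Assumption: y counts as a vowel. The site's actual vowel set is not stated.
VOWELS = "aeiouy"


def strip_vowels(word: str) -> str:
    return "".join(c for c in word if c.lower() not in VOWELS)


def double_levenshtein(a: str, b: str) -> int:
    """Full-word distance plus the distance between the vowel-stripped words."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


# Standard American Soundex digit groups.
SOUNDEX_CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        SOUNDEX_CODES[letter] = digit


def soundex(word: str) -> str:
    """Return the four-character Soundex code, or "" if word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")
```

With these definitions, levenshtein("moustache", "moutach") gives 2, double_levenshtein("moustache", "moutach") gives 3, and soundex("moustache") gives M232, in line with the figures shown for those words. Metaphone is left out because its rule set is too long for a short sketch.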
