Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts


Levenshtein Distance


Words similar to "tweeted" in the corpus

Levenshtein
Time to execute Levenshtein function - 0.380821 milliseconds
The Levenshtein distance is the minimum number of single-character replacements, insertions or deletions needed to transform one word into another. It's useful for detecting typos and alternative spellings.
tweeted (0) - 8 freq
tweated (1) - 1 freq
weeted (1) - 1 freq
teeted (1) - 1 freq
sweeted (1) - 1 freq
tweeter (1) - 2 freq
tweetled (1) - 1 freq
tweet (2) - 100 freq
tweetit (2) - 2 freq
teeter (2) - 1 freq
tweetin (2) - 20 freq
treeded (2) - 1 freq
treated (2) - 42 freq
weened (2) - 1 freq
sweeter (2) - 7 freq
weeded (2) - 1 freq
teemed (2) - 21 freq
tented (2) - 1 freq
sweetned (2) - 3 freq
sweeped (2) - 2 freq
weeter (2) - 2 freq
sweeled (2) - 6 freq
tweaked (2) - 1 freq
twisted (2) - 24 freq
retweeted (2) - 5 freq

Double Levenshtein
Time to execute Double Levenshtein function - 0.699879 milliseconds
In a stroke of genius, this runs the Levenshtein function twice, once on the full words and once with the vowels removed, and adds the two distances together, giving double weight to consonants.
tweeted (0) - 8 freq
tweated (1) - 1 freq
tweetled (2) - 1 freq
sweeted (2) - 1 freq
tweeter (2) - 2 freq
weeted (2) - 1 freq
teeted (2) - 1 freq
tested (3) - 21 freq
retweeted (3) - 5 freq
tweaked (3) - 1 freq
traeted (3) - 2 freq
twisted (3) - 24 freq
swaeted (3) - 1 freq
tented (3) - 1 freq
texted (3) - 4 freq
tweets (3) - 56 freq
sweated (3) - 4 freq
tweed (3) - 25 freq
tweet (3) - 100 freq
treated (3) - 42 freq
tweetit (3) - 2 freq
tweetin (3) - 20 freq
tutted (4) - 3 freq
towered (4) - 1 freq
twa-taed (4) - 2 freq

SoundEx
Time to execute SoundEx function - 0.027277 milliseconds
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
SoundEx code - T330
totety - 1 freq
toted - 1 freq
tow-heidit - 1 freq
tattooed - 10 freq
tidied - 7 freq
teetit - 5 freq
tittit - 11 freq
tattooit - 1 freq
tuttet - 2 freq
tutted - 3 freq
toddied - 1 freq
tweetit - 2 freq
teeth'd - 1 freq
tweated - 1 freq
twa-taed - 2 freq
tie-dyed - 1 freq
tootit - 1 freq
teeted - 1 freq
tweeted - 8 freq

MetaPhone
Time to execute MetaPhone function - 0.082532 milliseconds
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.
MetaPhone code - TWTT
tweetit - 2 freq
tweated - 1 freq
twa-taed - 2 freq
tweeted - 8 freq

Manually curated
Time to execute Manually curated function - 0.000796 milliseconds
Manual Curation uses a hand-built lookup table (lexicon) that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
TWEETED
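The Levenshtein and Double Levenshtein measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code this site runs: the function names are hypothetical, and the vowel set (a, e, i, o, u) is a guess, since the page does not say which letters are stripped.

```python
# Assumption: the site's exact vowel set is not stated; a, e, i, o, u is a guess.
VOWELS = set("aeiou")


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions
    or deletions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace (or keep) the character
            ))
        previous = current
    return previous[-1]


def strip_vowels(word: str) -> str:
    """Remove the assumed vowels, leaving consonants and punctuation."""
    return "".join(ch for ch in word if ch.lower() not in VOWELS)


def double_levenshtein(a: str, b: str) -> int:
    """Plain distance plus the distance between the vowel-less forms,
    so consonant differences count twice."""
    return levenshtein(a, b) + levenshtein(strip_vowels(a), strip_vowels(b))


if __name__ == "__main__":
    for word in ["tweated", "tweet", "retweeted", "twa-taed"]:
        print(word, levenshtein("tweeted", word), double_levenshtein("tweeted", word))
```

With these assumptions, "tweated" scores 1 on both measures because only a vowel differs, while "tweet" scores 2 and 3, which agrees with the figures listed above.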
