Letter / character frequencies

Earlier versions of this website were blind to accented characters. I think I've sorted it out now, but I'm not 100% certain.

To investigate the phenomenon of accented characters in Scots writing, I have written a script to count the number of occurrences of every character. It goes through each piece of text character by character, decoding the text from UTF-8 and passing each character to the Perl ord() function, which gives a decimal value (the Unicode code point) for that character.
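As a rough illustration of the counting step, here is a minimal sketch in Python rather than the Perl of the actual script. The function name `count_code_points` and the sample string are made up for this example; Python's built-in `ord()` plays the same role here as Perl's `ord()` does on decoded text.

```python
from collections import Counter


def count_code_points(text: str) -> Counter:
    """Count how often each character occurs, keyed by its decimal code point."""
    counts = Counter()
    for ch in text:
        # ord() gives the decimal Unicode code point, e.g. 69 for "E".
        counts[ord(ch)] += 1
    return counts


# Hypothetical sample: bytes as they might arrive from a UTF-8 file.
raw = "É, é, e".encode("utf-8")  # 9 bytes for 7 characters
# Decode before counting; counting raw bytes would split each accented
# letter into two separate values and hide it from the tally.
sample_counts = count_code_points(raw.decode("utf-8"))
```

In this sketch `sample_counts[201]` is the tally for "É" and `sample_counts[233]` the tally for "é"; the two cases are still counted separately at this stage.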

The letter characters are identified and converted to uppercase, and a table of letter frequencies is then created that combines the two cases. The occurrences of each letter in each dialect are also listed.
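The case-folding and per-dialect tallying could be sketched along the following lines. This is Python, not the original Perl, and the `(dialect, text)` input layout and both function names are assumptions for the example. The percentage helper divides by the total of all counted letters, which is what the table's percentages appear to be based on.

```python
from collections import Counter, defaultdict


def letter_frequencies(texts):
    """Tally uppercased letters overall and per dialect.

    `texts` is an iterable of (dialect, text) pairs; this layout is an
    assumption for the sketch, not the structure of the real corpus.
    """
    overall = Counter()
    by_dialect = defaultdict(Counter)
    for dialect, text in texts:
        for ch in text:
            if not ch.isalpha():
                continue  # skip digits, spaces and punctuation
            upper = ch.upper()
            if len(upper) != 1:
                continue  # e.g. "ß" uppercases to "SS"; left out here
            overall[upper] += 1
            by_dialect[dialect][upper] += 1
    return overall, by_dialect


def percentage_of_corpus(overall):
    """Express each letter's count as a percentage of all counted letters."""
    total = sum(overall.values())
    return {letter: 100 * n / total for letter, n in overall.items()}
```

Called on a list such as `[("Shetland", "Aa ö!"), ("Ulster", "a É é")]`, the first function folds "A" and "a" into one entry and keeps a separate tally for each dialect label.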

The next step is to identify which specific writers use accents where others do not, or whether the use of accents is common in particular dialects.
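One possible way to approach that next step is sketched below in Python. It is only an outline under assumptions: the `(writer, text)` input layout, the function name `accent_use_by_writer`, and the rule of treating any letter with a code point above 127 as "accented" are all hypothetical choices, not part of the existing script.

```python
from collections import Counter, defaultdict


def accent_use_by_writer(texts):
    """Return {writer: Counter of non-ASCII letters}, folded to uppercase.

    `texts` is an iterable of (writer, text) pairs, a hypothetical layout.
    """
    usage = defaultdict(Counter)
    for writer, text in texts:
        tally = usage[writer]  # creates an empty tally even if no accents turn up
        for ch in text:
            # Code points above 127 fall outside plain A-Z / a-z.
            if ch.isalpha() and ord(ch) > 127:
                tally[ch.upper()] += 1
    return usage
```

Writers whose tally stays empty are the ones who use no accents at all, and grouping the same tallies by dialect instead of by writer would address the second half of the question.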

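Number of occurrences | UTF-8 decimal | Appears as | Percentage of corpus | Central / Lallans | Doric / Northern | Shetland | Orkney | Southern / Borders | Ulster
857396 | 69 | "E" | 12.69913% | 492880 | 199654 | 31288 | 15811 | 47548 | 70033
683759 | 65 | "A" | 10.12735% | 387808 | 147697 | 32336 | 11523 | 35855 | 68399
614902 | 84 | "T" | 9.10749% | 362212 | 134574 | 20673 | 10565 | 32946 | 53798
548465 | 73 | "I" | 8.12347% | 314295 | 131090 | 24140 | 8806 | 29259 | 40760
480051 | 78 | "N" | 7.11017% | 274603 | 109718 | 19698 | 8364 | 25426 | 42121
434949 | 83 | "S" | 6.44215% | 257115 | 93796 | 18528 | 7755 | 22152 | 35490
402360 | 79 | "O" | 5.95947% | 232242 | 89813 | 17931 | 8092 | 20536 | 33628
391061 | 72 | "H" | 5.79211% | 227255 | 83148 | 11100 | 7027 | 23761 | 38694
370627 | 82 | "R" | 5.48946% | 213173 | 82827 | 14913 | 6717 | 20743 | 32174
253942 | 76 | "L" | 3.76121% | 148173 | 54664 | 10942 | 4602 | 13426 | 22058
236163 | 68 | "D" | 3.49788% | 130931 | 53425 | 17545 | 4886 | 12454 | 16866
179261 | 67 | "C" | 2.65508% | 109038 | 38849 | 6649 | 2669 | 9128 | 12883
176041 | 85 | "U" | 2.60739% | 107020 | 31859 | 5738 | 3072 | 10656 | 17647
163934 | 87 | "W" | 2.42807% | 95664 | 33679 | 7479 | 2824 | 9606 | 14651
160380 | 77 | "M" | 2.37543% | 94043 | 35095 | 6931 | 2804 | 8735 | 12730
140644 | 89 | "Y" | 2.08312% | 80442 | 33642 | 5528 | 2322 | 6791 | 11886
119175 | 70 | "F" | 1.76513% | 63703 | 32664 | 5054 | 2156 | 6017 | 9536
118984 | 66 | "B" | 1.76231% | 68450 | 27528 | 4739 | 2069 | 6338 | 9824
117915 | 71 | "G" | 1.74647% | 69695 | 25208 | 4753 | 2282 | 6306 | 9633
113147 | 80 | "P" | 1.67585% | 66851 | 26239 | 4992 | 1964 | 5300 | 7756
93045 | 75 | "K" | 1.37812% | 52222 | 22322 | 4287 | 1863 | 4768 | 7566
43860 | 86 | "V" | 0.64962% | 26145 | 10154 | 1754 | 793 | 1987 | 3008
20620 | 74 | "J" | 0.30541% | 10932 | 5804 | 811 | 214 | 888 | 1965
10775 | 88 | "X" | 0.15959% | 6929 | 2052 | 465 | 138 | 438 | 748
8356 | 90 | "Z" | 0.12376% | 4292 | 2613 | 323 | 36 | 404 | 688
6336 | 81 | "Q" | 0.09384% | 3086 | 1803 | 253 | 61 | 335 | 798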
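
For the accented and other non-ASCII letters, empty dialect cells have been dropped and the remaining per-dialect counts (in the order Central / Lallans, Doric / Northern, Shetland, Orkney, Southern / Borders, Ulster) run together, so they are given unsplit in the last column.

Number of occurrences | UTF-8 decimal | Appears as | Percentage of corpus | Dialect counts (run together)
3455 | 207 | "Ï" | 0.05117% | 71193428
804 | 214 | "Ö" | 0.01191% | 437934
431 | 200 | "È" | 0.00638% | 111419
345 | 201 | "É" | 0.00511% | 6891405123
101 | 205 | "Í" | 0.00150% | 9173
78 | 220 | "Ü" | 0.00116% | 191141241
47 | 192 | "À" | 0.00070% | 291116
41 | 211 | "Ó" | 0.00061% | 1526
20 | 210 | "Ò" | 0.00030% | 173
17 | 196 | "Ä" | 0.00025% | 21131
14 | 208 | "Ð" | 0.00021% | 14
13 | 216 | "Ø" | 0.00019% | 1111
11 | 193 | "Á" | 0.00016% | 812
10 | 195 | "Ã" | 0.00015% | 82
10 | 194 | "Â" | 0.00015% | 46
9 | 199 | "Ç" | 0.00013% | 81
8 | 198 | "Æ" | 0.00012% | 71
6 | 212 | "Ô" | 0.00009% | 6
6 | 217 | "Ù" | 0.00009% | 51
6 | 221 | "Ý" | 0.00009% | 6
5 | 218 | "Ú" | 0.00007% | 41
4 | 204 | "Ì" | 0.00006% | 22
4 | 203 | "Ë" | 0.00006% | 4
3 | 222 | "Þ" | 0.00004% | 3
2 | 274 | "Ē" | 0.00003% | 2
2 | 256 | "Ā" | 0.00003% | 2
2 | 209 | "Ñ" | 0.00003% | 11
2 | 332 | "Ō" | 0.00003% | 2
1 | 268 | "Č" | 0.00001% | 1
1 | 540 | "Ȝ" | 0.00001% | 1
1 | 262 | "Ć" | 0.00001% | 1
1 | 490 | "Ǫ" | 0.00001% | 1
1 | 362 | "Ū" | 0.00001% | 1
1 | 197 | "Å" | 0.00001% | 1
1 | 202 | "Ê" | 0.00001% | 1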