Letter / character frequencies

Earlier versions of this website were blind to accented characters. I think I've sorted it out now, but I'm not 100% certain.

To investigate the phenomenon of accented characters in Scots writing, I have written a script to count the number of occurrences of every character. It steps through each piece of text character by character. Once the text has been decoded from UTF-8, the Perl ord() function gives the Unicode code point of each character as a decimal value.
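To make the difference between the stored UTF-8 bytes and the decimal value concrete, here is a minimal sketch. It is written in Python rather than Perl and is not the script used for this page; the helper name code_points is made up for the illustration. Python's built-in ord() behaves like Perl's ord() on decoded text.

```python
def code_points(text):
    """Return (character, decimal code point) pairs for already-decoded text."""
    return [(ch, ord(ch)) for ch in text]


# "\u00cf" is the character I with diaeresis, written as an escape so the
# example does not depend on how this file itself is encoded.
raw = "\u00cf".encode("utf-8")   # the bytes as they would sit in a UTF-8 file
print(list(raw))                 # two bytes: [195, 143]

decoded = raw.decode("utf-8")    # one character once the bytes are decoded
print(code_points(decoded))      # one pair, with code point 207
```

If the text is not decoded first, a character-by-character loop sees the two bytes 195 and 143 instead of the single value 207, which is one way a script can end up blind to accented characters.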

The letter characters are identified and converted to uppercase, and the script then builds a table of letter frequencies that combines the two cases. The occurrences of each letter in each dialect are also listed.
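The counting step could be sketched roughly as follows. This is again Python rather than the Perl script behind this page, and several things are assumptions: the function names, the idea that each text arrives as a (dialect, text) pair, and the reading of "percentage of corpus" as a share of all counted letters.

```python
from collections import Counter, defaultdict


def letter_frequencies(texts):
    """Count letters with the two cases combined.

    texts: iterable of (dialect, text) pairs, where text is already decoded.
    Returns (overall, per_dialect), both keyed by the uppercase letter.
    """
    overall = Counter()
    per_dialect = defaultdict(Counter)
    for dialect, text in texts:
        for ch in text:
            if not ch.isalpha():       # skip digits, spaces and punctuation
                continue
            upper = ch.upper()
            if len(upper) != 1:        # a few letters uppercase to two characters
                upper = ch             # keep those as they are
            overall[upper] += 1
            per_dialect[dialect][upper] += 1
    return overall, per_dialect


def percentages(counter):
    """Express each count as a percentage of all counted letters."""
    total = sum(counter.values())
    return {ch: 100.0 * n / total for ch, n in counter.items()}


# Hypothetical two-text corpus, only to show the shape of the result.
sample = [("Shetland", "D\u00f6!"), ("Central", "do 1")]
overall, per_dialect = letter_frequencies(sample)
for ch, n in overall.most_common():
    print(n, ord(ch), ch, per_dialect["Shetland"][ch], per_dialect["Central"][ch])
```

Each printed line follows the order of the table: the number of occurrences, the decimal code point, the character, then one count per dialect.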

The next step is to identify which specific writers use accents where others do not, and whether the use of accents is common to particular dialects.

Number of occurrences | Unicode code point (decimal) | Appears as | Percentage of corpus | Counts by dialect (Central, Doric / Northern, Shetland, Orkney, Southern / Borders, Ulster)
1404786 | 69 | "E" | 12.85959% | 163132289539755405160651848100835
1112775 | 65 | "A" | 10.18648% | 12455121389578468381094039895348
985271 | 84 | "T" | 9.01929% | 11432618782447415335003648478956
901781 | 73 | "I" | 8.25502% | 10636618896258362279933295362690
780724 | 78 | "N" | 7.14685% | 8911015725247391273612874062620
704260 | 83 | "S" | 6.44688% | 8223513194045187258882579552363
639136 | 79 | "O" | 5.85073% | 6780912541940154248182445849348
635681 | 72 | "H" | 5.81910% | 7788511660925010239212532951391
602118 | 82 | "R" | 5.51186% | 7048011618536319219922369049623
411758 | 76 | "L" | 3.76928% | 472458047525313151011601832303
394585 | 68 | "D" | 3.61208% | 412947583042925165931384723711
283112 | 85 | "U" | 2.59164% | 34186455551365196431196825974
282638 | 67 | "C" | 2.58730% | 31296545261452085971158821904
269404 | 87 | "W" | 2.46616% | 32803483251782395681019719210
256501 | 77 | "M" | 2.34804% | 28251497481647990571000418433
221297 | 89 | "Y" | 2.02578% | 2249947977131427378841615403
190004 | 66 | "B" | 1.73932% | 2126638457111506834743613603
189105 | 70 | "F" | 1.73109% | 2005445244119126547670014259
186623 | 71 | "G" | 1.70837% | 2151935344110417144756613235
177346 | 80 | "P" | 1.62345% | 1827635765110526400612411996
150909 | 75 | "K" | 1.38144% | 171363218699336153586510519
69808 | 86 | "V" | 0.63903% | 7296145574418245824044473
29173 | 74 | "J" | 0.26705% | 25577237193857611882248
15128 | 88 | "X" | 0.13848% | 1221256010084305311095
11481 | 90 | "Z" | 0.10510% | 10433289470214480782
9669 | 81 | "Q" | 0.08851% | 9272706417227388996
3530 | 207 | "Ï" | 0.03231% | 2193502
1915 | 214 | "Ö" | 0.01753% | 5418835
1249 | 200 | "È" | 0.01143% | 10611220
597 | 220 | "Ü" | 0.00547% | 815251641
329 | 201 | "É" | 0.00301% | 27261163166
280 | 216 | "Ø" | 0.00256% | 124631
225 | 217 | "Ù" | 0.00206% | 11217
192 | 198 | "Æ" | 0.00176% | 118191
153 | 205 | "Í" | 0.00140% | 13275
72 | 192 | "À" | 0.00066% | 18112116
65 | 196 | "Ä" | 0.00060% | 1181331
51 | 211 | "Ó" | 0.00047% | 12231
29 | 600 | "ɘ" | 0.00027% | 29
28 | 210 | "Ò" | 0.00026% | 713
27 | 193 | "Á" | 0.00025% | 4463
25 | 208 | "Ð" | 0.00023% | 19
20 | 194 | "Â" | 0.00018% | 115
16 | 222 | "Þ" | 0.00015% | 12
15 | 602 | "ɚ" | 0.00014% | 15
15 | 199 | "Ç" | 0.00014% | 511
12 | 256 | "Ā" | 0.00011% | 10
12 | 218 | "Ú" | 0.00011% | 1511
11 | 197 | "Å" | 0.00010% | 141
10 | 618 | "ɪ" | 0.00009% | 10
10 | 195 | "Ã" | 0.00009% | 2
8 | 221 | "Ý" | 0.00007% |
7 | 212 | "Ô" | 0.00006% | 1
7 | 204 | "Ì" | 0.00006% | 2
6 | 330 | "Ŋ" | 0.00005% | 4
5 | 540 | "Ȝ" | 0.00005% |
5 | 332 | "Ō" | 0.00005% | 3
4 | 260 | "Ą" | 0.00004% | 4
4 | 203 | "Ë" | 0.00004% | 1
3 | 592 | "ɐ" | 0.00003% | 3
3 | 288 | "Ġ" | 0.00003% | 3
3 | 209 | "Ñ" | 0.00003% | 111
2 | 286 | "Ğ" | 0.00002% | 2
2 | 268 | "Č" | 0.00002% |
2 | 482 | "Ǣ" | 0.00002% | 2
2 | 298 | "Ī" | 0.00002% | 2
2 | 274 | "Ē" | 0.00002% |
2 | 202 | "Ê" | 0.00002% |
2 | 608 | "ɠ" | 0.00002% | 2
1 | 206 | "Î" | 0.00001% | 1
1 | 362 | "Ū" | 0.00001% |
1 | 300 | "Ĭ" | 0.00001% | 1
1 | 262 | "Ć" | 0.00001% |
1 | 276 | "Ĕ" | 0.00001% | 1
1 | 490 | "Ǫ" | 0.00001% |