To investigate the phenomenon of accented characters in Scots writing, I have written a script that counts the number of occurrences of every character. It steps through each piece of UTF-8 encoded text character by character and passes each character to the Perl ord() function, which gives a decimal value (the Unicode code point) for that character.
The letter characters are identified and converted to uppercase, and a table of letter frequencies is built that combines the two cases. The occurrences of each letter in each dialect are also listed.
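
The counting step described above could be sketched roughly as follows. This is a minimal illustration, not the script used to produce the table: the `count_letters` helper and the sample string are hypothetical, and it assumes the text has already been decoded from UTF-8 into Perl characters (for a file, by opening it with an `:encoding(UTF-8)` layer).

```perl
use strict;
use warnings;
use utf8;                          # string literals in this file are UTF-8
use feature 'unicode_strings';     # make uc() use Unicode rules for all strings
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: tally the letters in an already-decoded string,
# folding lowercase into uppercase. Keys are decimal code points.
sub count_letters {
    my ($text, $counts) = @_;
    $counts //= {};
    for my $char (split //, $text) {
        next unless $char =~ /\p{L}/;     # skip digits, punctuation, spaces
        # ord() returns the code point of the first character of its argument
        $counts->{ ord(uc $char) }++;
    }
    return $counts;
}

# Arbitrary sample string standing in for one piece of text from the corpus.
my $counts = count_letters("Öö a A é");

my $total = 0;
$total += $_ for values %$counts;

# One row per letter: occurrences | code point | character | share of letters
for my $cp (sort { $counts->{$b} <=> $counts->{$a} || $a <=> $b } keys %$counts) {
    printf "%d | %d | \"%s\" | %.5f%%\n",
        $counts->{$cp}, $cp, chr($cp), 100 * $counts->{$cp} / $total;
}
```

Characters that have no uppercase form, such as the IPA letter "ɘ", pass through `uc` unchanged and are counted under their own code point, which would be consistent with rows such as 600 in the table below. Passing the same hash reference back in as the second argument lets the tally accumulate across many texts; keeping one hash per dialect would give the per-dialect columns.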
The next step is to identify which specific writers use accents where others do not, and whether the use of accents is common to particular dialects.
Number of occurrences | Unicode code point (decimal) | Appears as | Percentage of corpus | Central | Doric / Northern | Shetland | Orkney | Southern / Borders | Ulster |
---|---|---|---|---|---|---|---|---|---|
1404786 | 69 | "E" | 12.85959% | 163132 | 289539 | 75540 | 51606 | 51848 | 100835 |
1112775 | 65 | "A" | 10.18648% | 124551 | 213895 | 78468 | 38109 | 40398 | 95348 |
985271 | 84 | "T" | 9.01929% | 114326 | 187824 | 47415 | 33500 | 36484 | 78956 |
901781 | 73 | "I" | 8.25502% | 106366 | 188962 | 58362 | 27993 | 32953 | 62690 |
780724 | 78 | "N" | 7.14685% | 89110 | 157252 | 47391 | 27361 | 28740 | 62620 |
704260 | 83 | "S" | 6.44688% | 82235 | 131940 | 45187 | 25888 | 25795 | 52363 |
639136 | 79 | "O" | 5.85073% | 67809 | 125419 | 40154 | 24818 | 24458 | 49348 |
635681 | 72 | "H" | 5.81910% | 77885 | 116609 | 25010 | 23921 | 25329 | 51391 |
602118 | 82 | "R" | 5.51186% | 70480 | 116185 | 36319 | 21992 | 23690 | 49623 |
411758 | 76 | "L" | 3.76928% | 47245 | 80475 | 25313 | 15101 | 16018 | 32303 |
394585 | 68 | "D" | 3.61208% | 41294 | 75830 | 42925 | 16593 | 13847 | 23711 |
283112 | 85 | "U" | 2.59164% | 34186 | 45555 | 13651 | 9643 | 11968 | 25974 |
282638 | 67 | "C" | 2.58730% | 31296 | 54526 | 14520 | 8597 | 11588 | 21904 |
269404 | 87 | "W" | 2.46616% | 32803 | 48325 | 17823 | 9568 | 10197 | 19210 |
256501 | 77 | "M" | 2.34804% | 28251 | 49748 | 16479 | 9057 | 10004 | 18433 |
221297 | 89 | "Y" | 2.02578% | 22499 | 47977 | 13142 | 7378 | 8416 | 15403 |
190004 | 66 | "B" | 1.73932% | 21266 | 38457 | 11150 | 6834 | 7436 | 13603 |
189105 | 70 | "F" | 1.73109% | 20054 | 45244 | 11912 | 6547 | 6700 | 14259 |
186623 | 71 | "G" | 1.70837% | 21519 | 35344 | 11041 | 7144 | 7566 | 13235 |
177346 | 80 | "P" | 1.62345% | 18276 | 35765 | 11052 | 6400 | 6124 | 11996 |
150909 | 75 | "K" | 1.38144% | 17136 | 32186 | 9933 | 6153 | 5865 | 10519 |
69808 | 86 | "V" | 0.63903% | 7296 | 14557 | 4418 | 2458 | 2404 | 4473 |
29173 | 74 | "J" | 0.26705% | 2557 | 7237 | 1938 | 576 | 1188 | 2248 |
15128 | 88 | "X" | 0.13848% | 1221 | 2560 | 1008 | 430 | 531 | 1095 |
11481 | 90 | "Z" | 0.10510% | 1043 | 3289 | 470 | 214 | 480 | 782 |
9669 | 81 | "Q" | 0.08851% | 927 | 2706 | 417 | 227 | 388 | 996 |
3530 | 207 | "Ï" | 0.03231% | 2 | 19 | 3502 | |||
1915 | 214 | "Ö" | 0.01753% | 5 | 4 | 1883 | 5 | ||
1249 | 200 | "È" | 0.01143% | 10 | 6 | 1 | 1220 | ||
597 | 220 | "Ü" | 0.00547% | 8 | 1 | 525 | 1 | 6 | 41 |
329 | 201 | "É" | 0.00301% | 27 | 26 | 11 | 6 | 3 | 166 |
280 | 216 | "Ø" | 0.00256% | 1 | 246 | 31 | |||
225 | 217 | "Ù" | 0.00206% | 1 | 1 | 217 | |||
192 | 198 | "Æ" | 0.00176% | 1 | 181 | 9 | 1 | ||
153 | 205 | "Í" | 0.00140% | 13 | 27 | 5 | |||
72 | 192 | "À" | 0.00066% | 18 | 1 | 1 | 2 | 1 | 16 |
65 | 196 | "Ä" | 0.00060% | 11 | 8 | 13 | 31 | ||
51 | 211 | "Ó" | 0.00047% | 1 | 2 | 2 | 31 | ||
29 | 600 | "ɘ" | 0.00027% | 29 | |||||
28 | 210 | "Ò" | 0.00026% | 7 | 1 | 3 | |||
27 | 193 | "Á" | 0.00025% | 4 | 4 | 6 | 3 | ||
25 | 208 | "Ð" | 0.00023% | 1 | 9 | ||||
20 | 194 | "Â" | 0.00018% | 1 | 15 | ||||
16 | 222 | "Þ" | 0.00015% | 12 | |||||
15 | 602 | "ɚ" | 0.00014% | 15 | |||||
15 | 199 | "Ç" | 0.00014% | 5 | 1 | 1 | |||
12 | 256 | "Ā" | 0.00011% | 10 | |||||
12 | 218 | "Ú" | 0.00011% | 1 | 5 | 1 | 1 | ||
11 | 197 | "Å" | 0.00010% | 1 | 4 | 1 | |||
10 | 618 | "ɪ" | 0.00009% | 10 | |||||
10 | 195 | "Ã" | 0.00009% | 2 | |||||
8 | 221 | "Ý" | 0.00007% | ||||||
7 | 212 | "Ô" | 0.00006% | 1 | |||||
7 | 204 | "Ì" | 0.00006% | 2 | |||||
6 | 330 | "Ŋ" | 0.00005% | 4 | |||||
5 | 540 | "Ȝ" | 0.00005% | ||||||
5 | 332 | "Ō" | 0.00005% | 3 | |||||
4 | 260 | "Ą" | 0.00004% | 4 | |||||
4 | 203 | "Ë" | 0.00004% | 1 | |||||
3 | 592 | "ɐ" | 0.00003% | 3 | |||||
3 | 288 | "Ġ" | 0.00003% | 3 | |||||
3 | 209 | "Ñ" | 0.00003% | 1 | 1 | 1 | |||
2 | 286 | "Ğ" | 0.00002% | 2 | |||||
2 | 268 | "Č" | 0.00002% | ||||||
2 | 482 | "Ǣ" | 0.00002% | 2 | |||||
2 | 298 | "Ī" | 0.00002% | 2 | |||||
2 | 274 | "Ē" | 0.00002% | ||||||
2 | 202 | "Ê" | 0.00002% | ||||||
2 | 608 | "ɠ" | 0.00002% | 2 | |||||
1 | 206 | "Î" | 0.00001% | 1 | |||||
1 | 362 | "Ū" | 0.00001% | ||||||
1 | 300 | "Ĭ" | 0.00001% | 1 | |||||
1 | 262 | "Ć" | 0.00001% | ||||||
1 | 276 | "Ĕ" | 0.00001% | 1 | |||||
1 | 490 | "Ǫ" | 0.00001% |