Corpus Statistics
Statistical profile of the Voynich Manuscript, computed from the Zandbergen (ZL3b) EVA transliteration.
| Metric | Value |
|---|---|
| Total words | 38,053 |
| Unique words | 8,261 |
| Total glyphs | 191,562 |
| Unique glyphs | 25 |
| Avg word length | 5.03 |
| Hapax legomena | 5,794 (70.1% of unique words) |
| Dis legomena | 892 |
| Total folios | 226 |
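The summary figures can be derived from a flat list of word tokens. The following is a minimal sketch, assuming the transliteration has already been split into words and that each EVA character counts as one glyph (which the 25-glyph inventory suggests). The function name `corpus_stats` and the sample tokens are illustrative, not part of the original analysis.

```python
from collections import Counter


def corpus_stats(words):
    """Return basic type/token statistics for a list of word tokens."""
    counts = Counter(words)
    total_words = len(words)
    unique_words = len(counts)
    total_glyphs = sum(len(w) for w in words)
    hapax = sum(1 for c in counts.values() if c == 1)  # words seen exactly once
    dis = sum(1 for c in counts.values() if c == 2)    # words seen exactly twice
    return {
        "total_words": total_words,
        "unique_words": unique_words,
        "total_glyphs": total_glyphs,
        "unique_glyphs": len(set("".join(words))),
        "avg_word_length": total_glyphs / total_words if total_words else 0.0,
        "hapax": hapax,
        "hapax_share_of_unique": hapax / unique_words if unique_words else 0.0,
        "dis": dis,
    }


# Toy input for illustration only, not real corpus data.
sample = ["daiin", "ol", "daiin", "chedy", "ol", "shedy", "ar"]
stats = corpus_stats(sample)
```

Checked against the published figures, 5,794 / 8,261 ≈ 0.701 and 191,562 / 38,053 ≈ 5.03, so the hapax percentage is relative to unique words and the average word length is glyphs per word.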
Entropy Analysis
| Scope | Glyph entropy | Word entropy | Words |
|---|---|---|---|
| Overall | 3.8627 bits | 10.4508 bits | 38,053 |
| astronomical | 3.7471 bits | 8.8881 bits | 872 |
| biological | 3.7977 bits | 8.5625 bits | 6,315 |
| cosmological | 3.8515 bits | 9.2310 bits | 2,199 |
| herbal | 3.8478 bits | 10.0025 bits | 10,872 |
| pharmaceutical | 3.7772 bits | 9.0301 bits | 2,538 |
| recipes | 3.8586 bits | 9.8552 bits | 11,611 |
| text-only | 3.9016 bits | 8.9622 bits | 2,349 |
| zodiac | 3.7149 bits | 8.9477 bits | 1,297 |
| **Currier languages** | | | |
| Language ? | 3.7682 bits | 9.6723 bits | 3,265 |
| Language A | 3.8321 bits | 9.9392 bits | 11,022 |
| Language B | 3.8611 bits | 9.8950 bits | 23,766 |
Natural language comparison: English ~4.11 bits/char, Latin ~4.0 bits/char, Italian ~3.95 bits/char. At 3.86 bits per glyph, the Voynich text sits just below these reference values and close to the natural-language range.
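As a rough cross-check, the overall glyph entropy can be recomputed from the counts in the Glyph Frequency section. This sketch assumes the reported figure is the zeroth-order (unigram) Shannon entropy, H = -Σ p·log2(p), taken over glyphs only with word spaces excluded. The helper name `shannon_entropy` is illustrative.

```python
import math

# Glyph counts as listed in the Glyph Frequency section of this page.
GLYPH_COUNTS = {
    "o": 25373, "e": 20239, "y": 17779, "h": 17494, "a": 14672,
    "d": 13150, "c": 12987, "i": 11779, "k": 10924, "l": 10621,
    "r": 7496, "s": 7220, "t": 6872, "n": 6147, "q": 5416,
    "p": 1637, "m": 1055, "f": 479, "g": 127, "x": 41,
    "v": 14, "b": 13, "j": 10, "z": 9, "u": 8,
}


def shannon_entropy(counts):
    """Zeroth-order Shannon entropy in bits per symbol from a mapping of counts."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


glyph_entropy = shannon_entropy(GLYPH_COUNTS)
print(f"{glyph_entropy:.4f} bits/glyph")
```

With these counts the result comes out at roughly 3.86 bits per glyph, in line with the Overall row. The same function applied to a word-count mapping would give the word entropy column, under the same assumption.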
Zipf's Law
| Metric | Value |
|---|---|
| Exponent | 0.8946 |
| R² | 0.9084 |
| Fit | moderate Zipf fit |
Natural-language text typically yields an exponent near 1.0 with R² > 0.95. The Voynich text shows a moderate fit (exponent 0.89, R² 0.91), which is consistent with meaningful text but is not a close match.
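One common way to obtain a Zipf exponent and R² is an ordinary least-squares line through the points (log rank, log frequency). The sketch below assumes that method; the page does not state how its fit was computed, and the function name `zipf_fit` is illustrative.

```python
import math


def zipf_fit(frequencies):
    """Fit log(freq) = a - s * log(rank) by least squares; return (s, r_squared)."""
    freqs = sorted(frequencies, reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies to fit a line")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("all frequencies are equal; R^2 is undefined")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared  # Zipf exponent is the negated slope


# Synthetic, perfectly Zipfian frequencies (f = 1000 / rank) as a sanity check.
exponent, r2 = zipf_fit([1000 / rank for rank in range(1, 51)])
```

On the synthetic input the fit recovers an exponent of 1 and an R² of 1 up to floating-point error. Fitting only the head of a real frequency list (for example a top-30 table) will generally not reproduce a figure computed over the full vocabulary.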
Top 30 Words
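| Rank | Word | Count |
|---|---|---|
| 1 | daiin | 799 |
| 2 | ol | 553 |
| 3 | aiin | 506 |
| 4 | chedy | 501 |
| 5 | shedy | 431 |
| 6 | ar | 402 |
| 7 | chol | 380 |
| 8 | or | 378 |
| 9 | chey | 349 |
| 10 | y | 329 |
| 11 | qokeey | 307 |
| 12 | s | 305 |
| 13 | qokeedy | 302 |
| 14 | dar | 299 |
| 15 | qokain | 276 |
| 16 | qokedy | 270 |
| 17 | shey | 269 |
| 18 | qokaiin | 265 |
| 19 | al | 261 |
| 20 | dy | 233 |
| 21 | dal | 232 |
| 22 | okaiin | 212 |
| 23 | o | 200 |
| 24 | chor | 199 |
| 25 | l | 192 |
| 26 | qokal | 191 |
| 27 | dain | 190 |
| 28 | cheey | 184 |
| 29 | okeey | 179 |
| 30 | shol | 173 |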
Glyph Frequency
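| Glyph | Count |
|---|---|
| o | 25,373 |
| e | 20,239 |
| y | 17,779 |
| h | 17,494 |
| a | 14,672 |
| d | 13,150 |
| c | 12,987 |
| i | 11,779 |
| k | 10,924 |
| l | 10,621 |
| r | 7,496 |
| s | 7,220 |
| t | 6,872 |
| n | 6,147 |
| q | 5,416 |
| p | 1,637 |
| m | 1,055 |
| f | 479 |
| g | 127 |
| x | 41 |
| v | 14 |
| b | 13 |
| j | 10 |
| z | 9 |
| u | 8 |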
Top Glyph Bigrams
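| Bigram | Count |
|---|---|
| ch | 10,968 |
| he | 8,172 |
| dy | 6,944 |
| ai | 6,704 |
| ok | 6,147 |
| in | 6,015 |
| ol | 5,588 |
| qo | 5,289 |
| ee | 5,256 |
| ed | 5,103 |
| ii | 4,731 |
| sh | 4,514 |
| da | 4,175 |
| ey | 4,095 |
| ke | 3,990 |
| ho | 3,986 |
| ot | 3,926 |
| ar | 3,414 |
| eo | 3,399 |
| al | 3,185 |
| ka | 3,073 |
| or | 2,664 |
| od | 2,334 |
| hy | 2,176 |
| te | 1,862 |
| ta | 1,589 |
| hd | 1,097 |
| lk | 1,059 |
| kc | 1,044 |
| tc | 968 |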
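Bigram counts of this kind can be produced by sliding a two-glyph window over each word. The sketch below assumes pairs are counted only inside words and never across word breaks; the page does not say which convention it uses. The function name `glyph_bigrams` and the sample words are illustrative.

```python
from collections import Counter


def glyph_bigrams(words):
    """Count adjacent glyph pairs inside each word (no pairs across word breaks)."""
    counts = Counter()
    for word in words:
        # zip(word, word[1:]) yields each pair of neighbouring characters.
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts


# Toy input for illustration only, not real corpus data.
sample = ["daiin", "chedy", "chol"]
bigrams = glyph_bigrams(sample)
most_frequent = bigrams.most_common(1)  # [("ch", 2)]
```

Single-glyph words such as `y` or `s` contribute no bigrams under this convention.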
Section Breakdown
| Section | Folios | Words | Unique | Avg length |
|---|---|---|---|---|
| recipes | 25 | 11,611 | 3,258 | 5.18 |
| herbal | 129 | 10,872 | 3,524 | 4.98 |
| biological | 19 | 6,315 | 1,467 | 5.02 |
| pharmaceutical | 16 | 2,538 | 1,111 | 5.01 |
| text-only | 7 | 2,349 | 979 | 4.83 |
| cosmological | 10 | 2,199 | 1,087 | 4.64 |
| zodiac | 12 | 1,297 | 753 | 5.23 |
| astronomical | 8 | 872 | 610 | 5.17 |
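The section figures can be cross-checked against the corpus totals with simple sums. This is a small consistency sketch using the folio and word counts from the table above; the variable names are illustrative.

```python
# (folios, words) per section, copied from the Section Breakdown table.
SECTIONS = {
    "recipes": (25, 11611),
    "herbal": (129, 10872),
    "biological": (19, 6315),
    "pharmaceutical": (16, 2538),
    "text-only": (7, 2349),
    "cosmological": (10, 2199),
    "zodiac": (12, 1297),
    "astronomical": (8, 872),
}

total_folios = sum(folios for folios, _ in SECTIONS.values())
total_words = sum(words for _, words in SECTIONS.values())
print(total_folios, total_words)  # 226 38053
```

Both sums match the headline totals of 226 folios and 38,053 words, so the sections partition the corpus without overlap.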