Hypothesis Board
213 total hypotheses. 162 active, 24 eliminated, 7 parked.
Active (162)
The manuscript's herbal section uses a combination of substitution and transposition ciphers, which would explain the higher entropy levels compared to other sections.
The Voynich text encodes a natural language using a null cipher or homophonic substitution, in which multiple glyphs map to the same plaintext character. This would explain the high hapax ratio (70.1%) and the glyph entropy (3.86 bits) falling below expected natural-language entropy, while preserving Zipf-like structure.
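Many entries on this board cite a glyph entropy and a hapax ratio. A minimal sketch of both statistics follows. It assumes that each character of a transliterated word counts as one glyph and that the hapax ratio is the share of word types occurring exactly once; the board states neither convention, and the sample tokens are toy data, not corpus values.

```python
from collections import Counter
from math import log2

def glyph_entropy(words):
    """Shannon entropy (bits per glyph) of the unigram glyph distribution."""
    counts = Counter(g for w in words for g in w)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def hapax_ratio(words):
    """Share of distinct word types that occur exactly once (assumed definition)."""
    counts = Counter(words)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Toy tokens in an EVA-like transliteration; not real corpus data.
sample = ["daiin", "chedy", "daiin", "qokeedy", "ol", "chedy", "shedy"]
print(round(glyph_entropy(sample), 4), round(hapax_ratio(sample), 4))
```

If the board's figures were computed with multi-character glyphs (for example treating a digraph as one symbol), the tokenisation step would need to change before any comparison.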
The Currier A/B split reflects two different scribal hands encoding the same underlying language with different but related cipher alphabets, such that glyph-level bigram transition matrices in A and B sections are structurally isomorphic under a permutation mapping.
The zodiac and astronomical sections use a systematically different word-order encoding than herbal and recipes sections, reflecting a positional transposition cipher layer applied on top of substitution, detectable as reduced local bigram predictability at section boundaries relative to within-section transitions.
The high hapax ratio (70.1%) is partially artifactual, caused by consistent scribal abbreviation or word-compounding conventions where morphological suffixes are concatenated inconsistently, such that word-final glyph sequences 'y', 'n', 'l', 'r' function as detachable morphological markers — splitting words on these terminals would reduce unique vocabulary by at least 25%.
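The suffix-splitting prediction above can be checked with a short script. This is a sketch under assumptions: it strips at most one trailing glyph from the set named in the hypothesis, leaves single-glyph words alone, and measures the relative drop in distinct word types. The function names and the sample words are illustrative only.

```python
def strip_terminal(word, terminals=("y", "n", "l", "r")):
    """Drop one trailing terminal glyph; single-glyph words are left intact."""
    if len(word) > 1 and word[-1] in terminals:
        return word[:-1]
    return word

def vocabulary_reduction(words, terminals=("y", "n", "l", "r")):
    """Fraction by which the number of distinct word types shrinks after stripping."""
    before = set(words)
    after = {strip_terminal(w, terminals) for w in before}
    return 1 - len(after) / len(before)

# Toy word list; not real corpus data.
sample = ["chol", "chor", "cho", "daiin", "daiir", "shedy"]
reduction = vocabulary_reduction(sample)
print(f"vocabulary reduction: {reduction:.0%}, meets 25% threshold: {reduction >= 0.25}")
```

A fair test would also apply the same stripping to a shuffled-suffix control, since removing any final character merges some types by chance.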
The Currier A/B split reflects two distinct scribal hands encoding the same language using different glyph-frequency profiles, such that the top-10 word overlap between A and B sections is less than 40%, and the word-initial glyph distributions of A and B differ by a Jensen-Shannon divergence greater than 0.15.
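Both thresholds in this entry are directly computable. The sketch below assumes base-2 logarithms, so the Jensen-Shannon divergence lies between 0 and 1, and it defines overlap as the number of shared top-n words divided by n. The board specifies neither choice, and the helper names are hypothetical.

```python
from collections import Counter
from math import log2

def initial_distribution(words):
    """Relative frequency of each word-initial glyph."""
    counts = Counter(w[0] for w in words if w)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture `m`.
        return sum(v * log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def top_n_overlap(words_a, words_b, n=10):
    """Share of the n most frequent words of A that are also in B's top n."""
    top_a = {w for w, _ in Counter(words_a).most_common(n)}
    top_b = {w for w, _ in Counter(words_b).most_common(n)}
    return len(top_a & top_b) / n
```

With real data the two calls would be `js_divergence(initial_distribution(a_words), initial_distribution(b_words))` and `top_n_overlap(a_words, b_words)`, compared against 0.15 and 0.40 respectively.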
The text-only section (entropy 3.9016 bits, 7 folios) represents the closest approximation to natural language and encodes a different register or genre than the herbal/zodiac sections, such that its bigram entropy is statistically distinguishable (p < 0.05) from all other sections and its type-token ratio is the highest among all sections.
The zodiac section (entropy 3.7149 bits, lowest of all sections) uses a label-oriented encoding in which most word tokens are proper nouns or short nominal forms, producing a word-length distribution significantly shorter than the herbal section (mean word length < 4.5 vs herbal > 5.0) and a lower bigram entropy reflecting repetitive naming conventions
The text-only section (entropy 3.9016 bits, highest of all sections) represents unenciphered or lightly enciphered text in the underlying language, and its word frequency distribution should most closely match expected distributions for medieval Latin or northern Italian compared to all other sections.
The Currier A/B split corresponds to two different scribal hands applying the same underlying cipher to the same language but with different glyph allograph preferences, predictable via bigram transition matrix divergence between A and B corpora exceeding random variation.
The text-only section (entropy 3.9016 bits, highest of all sections) contains unenciphered or minimally enciphered natural language prose, distinguishable from other sections by significantly higher conditional entropy H(glyph | preceding glyph) that approaches values for unenciphered medieval Latin or Italian.
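The conditional entropy H(glyph | preceding glyph) named here can be estimated from bigram counts. The sketch assumes one character per glyph and counts bigrams within words only, so no transition spans a word boundary; whether the board's figures include cross-word transitions is not stated.

```python
from collections import Counter
from math import log2

def conditional_glyph_entropy(words):
    """H(glyph | preceding glyph) in bits, using within-word bigrams only."""
    bigrams = Counter()
    for w in words:
        bigrams.update(zip(w, w[1:]))
    # Total count of each glyph when it appears as the preceding context.
    context = Counter()
    for (prev, _), c in bigrams.items():
        context[prev] += c
    total = sum(bigrams.values())
    return -sum(
        (c / total) * log2(c / context[prev])
        for (prev, _), c in bigrams.items()
    )
```

Small sections such as text-only (7 folios) give noisy bigram estimates, so a comparison between sections would need equal-sized samples or a bias correction.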
The Currier A/B scribal split represents two different scribes encoding the same underlying language using two distinct but related glyph substitution tables (i.e., a two-key polyalphabetic or two-alphabet substitution), such that the same plaintext phoneme maps to different glyphs in A vs. B — testable by checking whether word-length distributions and initial/final glyph frequencies in A and B are statistically compatible with a common underlying vocabulary after glyph remapping.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) is caused by a label-encoding convention in which each label token is drawn from a closed vocabulary of fewer than 150 distinct words, and the type-token ratio for zodiac-section words is significantly lower than for any prose section (herbal, recipes, text-only).
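Type-token ratio falls as a sample grows, so comparing a small section with a large one on raw TTR is misleading. The sketch below averages TTR over fixed-size windows as one possible control; the window size of 100 tokens is an arbitrary assumption, not a value from the board.

```python
def type_token_ratio(words):
    """Distinct word types divided by total tokens."""
    return len(set(words)) / len(words)

def windowed_ttr(words, window=100):
    """Mean TTR over consecutive non-overlapping windows of `window` tokens."""
    chunks = [
        words[i:i + window]
        for i in range(0, len(words) - window + 1, window)
    ]
    if not chunks:
        raise ValueError("need at least `window` tokens")
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)
```

The closed-vocabulary claim (fewer than 150 distinct words) is simply `len(set(zodiac_words)) < 150` once the zodiac tokens are isolated.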
The Currier A/B dialect split reflects two scribal hands encoding the same underlying language using different homophonic substitution tables, where Currier A substitutes fewer glyphs per phoneme than Currier B, explaining B's larger word count (23,766 vs 11,022) via greater glyph variety per syllable.
The zodiac and label-heavy sections (zodiac entropy 3.7149 bits, astronomical 3.7471 bits) use a homophonic substitution scheme in which common plaintext letters map to multiple glyphs, suppressing entropy relative to the herbal and text-only sections, while the text-only section (entropy 3.9016 bits) uses a simpler monoalphabetic or unenciphered encoding.
The zodiac section's anomalously low entropy (3.7149 bits) and label-heavy structure indicate it uses a homophonic cipher with a reduced symbol set (fewer active glyphs per label), while the text-only section's high entropy (3.9016 bits) reflects unenciphered or lightly enciphered natural language prose — meaning these two sections require different decipherment strategies.
The manuscript's text structure reflects a mix of prose and verse, with the herbal section being primarily composed of short, rhyming couplets.
The Currier A/B split encodes two different plaintext languages (e.g., Latin in sections assigned to A, Italian or an Italian dialect in sections assigned to B), detectable by different glyph-level entropy and different word-initial/final bigram profiles between A and B corpora.
The text-only section's elevated entropy (3.9016 bits, highest of all sections) reflects unenciphered or lightly enciphered natural-language prose, and its word bigram entropy is significantly higher than that of the zodiac and astronomical sections, consistent with natural syntactic variation rather than label repetition.
The Currier A/B split reflects a genuine linguistic difference between two distinct languages or dialects.
Word-initial glyph constraints (o, c, q, s, d) and word-final glyph constraints (y, n, l, r, o) are not phonological but structural tokens marking word boundaries or grammatical roles (e.g., prefixes and suffixes encoding case or tense), analogous to a morpheme-boundary cipher layered on top of a root encoding.
The recipes section (25 folios, 11,611 words, entropy 3.8586 bits — second highest and largest word count) encodes a list-structured text in which recurring syntactic templates produce predictable word-position bigrams, such that positional word-order entropy (entropy of word_n given word_{n-1} within a folio) is significantly lower in recipes than in the herbal or text-only sections.
The Currier A/B scribal split encodes the same underlying language but with two distinct homophonic substitution tables of different sizes: Currier A uses a smaller homophone set (lower token count, 11,022 words) with higher per-glyph entropy, while Currier B uses a larger homophone set producing lower per-glyph entropy, such that the weighted average glyph entropy of A-folios exceeds that of B-folios by at least 0.05 bits.
The zodiac section (entropy 3.7149 bits, lowest of all sections) uses a label-only encoding scheme where each word is a proper name or fixed label rather than running prose, producing a vocabulary distribution that deviates significantly from Zipf's law relative to other sections.
The dominant word-initial glyph constraint (o, c, q, s, d account for most word starts) reflects a cipher rule in which a fixed 'onset marker' glyph class must precede the root — analogous to a mandatory null prefix — rather than reflecting the initial phoneme distribution of the underlying language, which would produce a flatter initial-glyph distribution matching Latin or Italian onset frequencies.
The Currier A/B split reflects two scribes encoding the same underlying language using two different cipher keys or glyph assignments, not two different languages or dialects, such that word-position glyph statistics are structurally identical across A and B but the specific glyph inventories differ.
The dominance of word-initial glyphs o, c, q, s, d and word-final glyphs y, n, l, r, o reflects a Vigenere-style polyalphabetic cipher in which cipher-alphabet assignment is positionally determined within the word (position 1, position-final), producing artificial initial/final glyph constraints that do not reflect underlying language phonotactics.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) reflects a label-only encoding where Voynich words are one-to-one mappings to a closed set of astrological or calendrical terms (month names, zodiac signs, star names), rather than running prose, making it the most tractable section for frequency-matching decipherment.
The Voynich text uses a homophonic substitution cipher on medieval Latin or Italian, where multiple distinct glyphs encode the same plaintext letter, which would artificially inflate the unique-word count and produce the observed hapax ratio of 70.1%.
The word-initial glyph constraints {o,c,q,s,d} and word-final glyph constraints {y,n,l,r,o} are artifacts of a cipher that maps plaintext syllable-onset and syllable-coda phonemes to distinct glyph classes, not morphological or grammatical constraints, meaning that these positional biases are cipher-structural rather than language-structural.
The text-only section (7 folios, entropy 3.9016 bits — highest of all sections) represents unenciphered or minimally enciphered running text in a natural language, while all other sections apply an additional layer of encoding (transposition, null insertion, or homophone expansion) that suppresses entropy, meaning the text-only section should serve as the primary decipherment anchor.
The text-only section's elevated entropy (3.9016 bits, highest of all sections) and the recipes section's second-highest entropy (3.8586 bits) reflect unenciphered or lightly enciphered running prose in the underlying natural language, while the diagrammatic sections (zodiac 3.7149, astronomical 3.7471, pharmaceutical 3.7772) reflect a more heavily enciphered or formulaic register — meaning entropy difference between sections correlates with cipher strength, not genre.
The Voynich text uses a homophonic substitution cipher on medieval Latin where 25 glyphs encode approximately 18-20 Latin phonemes, with 2-4 glyphs mapping to each high-frequency Latin phoneme (e, a, i, o, t, n), which would explain the observed entropy of 3.8627 bits — lower than plain Latin (~4.0) but higher than a monoalphabetic cipher.
The zodiac and astronomical sections (entropy 3.7149 and 3.7471 bits respectively, lowest of all sections) encode a label-and-numeral system where recurring short words function as positional labels (month names, star names, degree markers) rather than running prose, and these labels follow a fixed template grammar with ≤3 syntactic slots, explaining low entropy through high repetition of a small closed vocabulary.
The zodiac section's anomalously low glyph entropy (3.7149 bits vs. corpus mean 3.8627 bits) is fully explained by a restricted label vocabulary: the section predominantly contains short repeated labels for month names, zodiac figures, and positional markers rather than running text. This predicts that the zodiac section's type-token ratio and word length distribution differ significantly from prose sections.
The extremely high hapax ratio (70.1%) is produced by a systematic suffix-stripping or abbreviation convention in which a small set of word-final glyph sequences (e.g., -aiin, -dy, -ol) are optionally dropped, making many hapax legomena morphological variants of ~30% of the vocabulary rather than distinct words.
The text-only section (7 folios, entropy 3.9016 bits — highest of all sections) represents unenciphered or minimally-enciphered natural language, while all other sections apply an additional cipher layer, making text-only the most direct window into the underlying language and the optimal starting point for statistical language identification.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) is caused by a label-repetition encoding scheme in which a small set of high-frequency label words are systematically repeated around circular diagrams, and the section's word-frequency distribution is therefore best fit by a truncated Zipf distribution rather than a full Zipf law.
The extremely high hapax ratio (70.1%) is produced by a systematic null-suffix appended to a smaller core vocabulary: each base word receives one of a small set of suffixes (e.g., -y, -in, -ain, -dy, -edy) to produce surface tokens, meaning the effective vocabulary after stripping common suffixes would shrink to roughly 2,000–2,500 unique roots with a Zipf exponent closer to 1.0.
The Currier A/B split represents two different scribes encoding the same underlying Latin text using the same cipher but with different personal orthographic conventions — specifically, Scribe A preferentially uses one set of homophone variants while Scribe B uses a complementary set, causing the apparent 'dialect' difference to be a cipher-level artifact rather than a linguistic one.
The Currier A/B split encodes two different plaintext languages (e.g., Latin in Currier A sections, early Italian or Occitan in Currier B sections), with each language using the same substitution cipher key but differing in phonological inventory, producing measurable differences in bigram transition probabilities between the two corpora.
The text-only section (entropy 3.9016 bits, highest of all sections) represents unenciphered or minimally enciphered natural language prose, while lower-entropy sections use a homophonic expansion layer that artificially reduces entropy by replacing each plaintext letter with one of several ciphertext glyphs.
The Currier A/B split encodes two different scribal dialects of the same underlying language (not two different languages), where Scribe A uses a slightly different homophonic key than Scribe B, producing measurable differences in glyph bigram entropy between the two sub-corpora rather than differences in underlying vocabulary.
The Currier A and B sub-corpora encode the same underlying plaintext language using two different but structurally related substitution alphabets (i.e., same cipher family, different keys). This would produce similar Zipf exponents and bigram entropy profiles within each sub-corpus but systematically divergent glyph-level frequency distributions between them.
The text-only section's elevated entropy (3.9016 bits, highest of all sections) combined with its small size (7 folios, 2,349 words) reflects less-compressed or less-enciphered natural prose — possibly a section where the scribe applied fewer homophones or abbreviations. This predicts that the text-only section's word frequency distribution follows a Zipf law more closely (R-squared closer to 1.0) than the cipher-heavy sections.
Coverage >=80% on the Voynich corpus is achievable by 1,300 phonotactically plausible nonsense skeletons (20-trial mean 83.56%, sigma 1.03pp) and therefore cannot, alone, be evidence of decipherment. Any abjad-reducible lexicon of 1,000-4,000 entries matching corpus word-length and bigram distributions clears this floor by construction.
EVA 'q' is a categorical word-initial marker, appearing word-initial in 98.9% of its 5,416 corpus occurrences. This is the strongest positional constraint of any EVA glyph, supporting Brady's H-BRADY-03 (q maps to Syriac waw / wa- conjunction) as a structural claim independent of language identification.
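A positional rate such as the 98.9% quoted above can be reproduced with a simple tally. The sketch assumes one character per glyph and counts a single-glyph word as an initial occurrence; the board does not say how that edge case is handled.

```python
def positional_rates(words, glyph):
    """Return (initial, medial, final) shares of all occurrences of `glyph`."""
    initial = medial = final = 0
    for w in words:
        for i, g in enumerate(w):
            if g != glyph:
                continue
            if i == 0:
                initial += 1          # includes single-glyph words
            elif i == len(w) - 1:
                final += 1
            else:
                medial += 1
    total = initial + medial + final
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (initial / total, medial / total, final / total)

# Toy tokens; not real corpus data.
print(positional_rates(["qokeedy", "qol", "oqo"], "q"))
```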
The Currier A and B sub-corpora encode the same underlying plaintext language using two different but systematically related cipher alphabets (a digraphic or keyed-variant cipher), such that glyph bigram transition probabilities in A and B are permutations of each other rather than independent distributions.
Currier A and Currier B use two structurally distinct cipher tables (not merely two scribal hands) encoding the same underlying language: the rank-order frequency distribution of glyphs in Currier A and Currier B should show high Spearman rank correlation (> 0.85) if encoding the same language, but the specific high-frequency glyphs in each sub-corpus should differ, indicating a key rotation between the two tables.
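One reading of the Spearman prediction compares glyph counts in A and B glyph by glyph over the union of both inventories. The sketch below follows that reading, which is an assumption, and implements rank correlation with tie-averaged ranks using only the standard library.

```python
from collections import Counter

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if either side is constant

def glyph_rank_correlation(words_a, words_b):
    """Spearman correlation of per-glyph counts across two sub-corpora."""
    ca = Counter(g for w in words_a for g in w)
    cb = Counter(g for w in words_b for g in w)
    glyphs = sorted(set(ca) | set(cb))
    return spearman([ca[g] for g in glyphs], [cb[g] for g in glyphs])
```

Under this reading, a key rotation between tables would tend to lower the correlation, so the two halves of the prediction pull in opposite directions and the comparison method needs to be fixed before testing.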
The Currier A and B sub-corpora encode the same underlying language but use two structurally distinct homophonic cipher tables, each mapping plaintext letters to different glyph sets, which explains why the A/B vocabularies share Zipf-law structure but differ in high-frequency word forms.
Preparation route (external topical application vs internal ingestion) does NOT explain the _.oii vowel-pattern fire rate on plant folios. External-classified folios (n=24) and internal-classified folios (n=54) have essentially identical mean _.oii rates (0.561% vs 0.557%, ratio 1.01x). One-tailed Welch's t-test p = 0.494, Cohen's d = 0.003, Mann-Whitney U p = 0.341. Bootstrap 95% CI on difference [-0.0046, +0.0048] straddles zero.
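The bootstrap interval reported in this entry can be sketched as a percentile bootstrap on the difference of group means. The resampling scheme, trial count, and seed below are assumptions; the board does not describe how its interval was computed.

```python
import random

def bootstrap_mean_diff_ci(a, b, trials=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Resample each group independently, with replacement.
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * trials)]
    hi = diffs[int((1 - alpha / 2) * trials) - 1]
    return lo, hi
```

An interval that contains zero, as reported above, is consistent with no difference between external and internal folios at this sample size.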
The extremely high hapax ratio (70.1%) is an artifact of systematic suffix variation: scribes appended variable word-final glyphs from the set {y,n,l,r,o} as inflectional or abbreviation markers, inflating apparent vocabulary size. Stripping these final glyphs should collapse unique word count toward natural-language hapax ratios (~40-50%).
The Currier A and B sub-corpora use two distinct cipher alphabets encoding the same underlying Latin plaintext, such that glyph-frequency distributions are statistically different between A and B but word-length distributions and entropy levels are statistically indistinguishable.
The 70.1% hapax ratio is substantially artifactual: stripping word-final glyphs drawn from the set {y, n, l, r, o} normalizes words to a stem form, reducing the unique vocabulary to roughly 30-40% of current count and the hapax ratio to below 45%, consistent with a systematic suffixation or abbreviation cipher.
The text uses a combination of substitution and transposition ciphers in the herbal section.
Currier A and Currier B use two structurally distinct cipher alphabets encoding the same underlying Latin plaintext: the two sub-corpora should exhibit the same bigram entropy and the same Zipf exponent once glyph-level correspondences are remapped.
The extremely high hapax ratio (70.1%) is produced by a systematic suffix-agglutination cipher mechanism: a small closed vocabulary of roots (~500-800 types) is combined with a set of suffix glyphs drawn from the word-final constraint set {y, n, l, r, o}, generating surface forms that appear unique but share common roots.
The recipes section (25 folios, 11,611 words, entropy 3.8586 bits) encodes running prose rather than labels or lists, as evidenced by a word bigram conditional entropy significantly higher than label-heavy sections (zodiac: 3.7149, astronomical: 3.7471), and a type-token ratio and sentence-length distribution consistent with connected discourse rather than enumerated items.
Voynich scribes A and B divide the manuscript by section — Hand A writes the herbal (95 folios), pharmaceutical (16), and a handful of recipes (2); Hand B writes the biological (19), cosmological (3), recipes (23), and some herbal (32). Only herbal and recipes have both hands present. In recipes — the only content-shared section — the two hands use completely disjoint vowel-pattern dialects: 0 of 8 Hand-A markers overlap with 8 Hand-B markers. Hand A's marker vocabulary across all sections is 'o'-heavy (o/e ratio 2.00); Hand B's is more balanced (o/e 1.36). Both hands use 'eo'-containing patterns for preparation-related content (Hand A pharma, Hand B recipes).
The strong word-initial and word-final glyph positional constraints ({o,c,q,s,d} and {y,n,l,r,o}) reflect a syllabic or consonant-vowel template in the underlying language (e.g., CV or CVC structure), not a cipher artifact, and can be modeled as a Markov chain of order 2 that predicts glyph position within a word with accuracy > 80%.
The strong positional constraints on word-initial glyphs {o,c,q,s,d} and word-final glyphs {y,n,l,r,o} are artifacts of a structured nulls-and-affixes system rather than natural phonotactics: removing these positional glyphs as null markers should produce a residual corpus whose glyph entropy approaches Latin's ~4.0 bits.
The Voynich text uses a homophonic substitution cipher on medieval Latin where 2-3 Voynich glyphs map to each Latin letter, which would reduce measured entropy below natural Latin (~4.0 bits) toward the observed 3.8627 bits while preserving Zipf-like word frequency distribution.
Currier A and Currier B sub-corpora have glyph unigram distributions that are more statistically divergent from each other than any random equal-sized split of the same combined corpus, confirming they encode under structurally distinct cipher tables rather than merely reflecting scribal style variation.
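The comparison against random splits amounts to a permutation test. The sketch below assumes Jensen-Shannon divergence (base 2) as the distance between glyph unigram distributions and reshuffles whole words into two groups of the original sizes; both choices are assumptions, since the board names no metric.

```python
import random
from collections import Counter
from math import log2

def glyph_distribution(words):
    """Relative frequency of each glyph across all words."""
    counts = Counter(g for w in words for g in w)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    kl = lambda a: sum(v * log2(v / m[k]) for k, v in a.items() if v > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def split_divergence_p_value(words_a, words_b, trials=1000, seed=0):
    """Return (observed divergence, permutation p-value) for the A/B split."""
    rng = random.Random(seed)
    observed = js_divergence(glyph_distribution(words_a), glyph_distribution(words_b))
    pooled = list(words_a) + list(words_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        ra, rb = pooled[:len(words_a)], pooled[len(words_a):]
        if js_divergence(glyph_distribution(ra), glyph_distribution(rb)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p of 0.
    return observed, (hits + 1) / (trials + 1)
```

A small p-value shows only that the split is non-random. It does not by itself separate distinct cipher tables from scribal style, topic, or section effects, which are confounded with the A/B assignment.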
The Currier A and Currier B sub-corpora use two structurally distinct cipher alphabets that both encode the same underlying Latin plaintext, with Currier B employing a larger effective alphabet (more glyph diversity per word position) to explain B's larger word count (23,766 vs 11,022) without a proportional increase in semantic content.
The anomalously low entropy in the zodiac (3.7149 bits) and astronomical (3.7471 bits) sections relative to the text-only section (3.9016 bits) is produced by a label-encoding convention in which short positional labels (star names, month labels) are drawn from a restricted glyph sub-alphabet, reducing effective entropy rather than reflecting a different cipher.
Word-initial glyph constraints (o, c, q, s, d dominating) and word-final glyph constraints (y, n, l, r, o dominating) are not merely phonotactic but reflect a codebook structure in which word-initial glyphs encode semantic category (e.g., plant part, action, quantity) and word-final glyphs encode grammatical role (e.g., noun, verb, adjective), such that co-occurrence of specific initial+final glyph pairs is non-uniform and significantly exceeds chance across all sections.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) reflects a label-registry encoding where each word is drawn from a small closed vocabulary of astrological terms (< 50 unique words), producing a highly non-uniform unigram distribution unlike the rest of the manuscript.
EVA 'm' (1,055 tokens) and 'g' (127 tokens) are suffix/final-marker glyphs, extending the suffix class beyond {y,n,l,r} per H023. Final-position rates: m 93.6%, g 83.5%. EVA 'l' (previously assumed final-dominant) is actually balanced (53.6% final / 32.6% mid) and should be demoted from the suffix class.
Word-order syntactic structure is absent from Voynichese. Across all tested lexicons (Schechter Latin, Brady Syriac proxy, Hebrew medieval, Brain-V v1), in-order decoded text scores equal to or LOWER than across-corpus-shuffled decoded text on the connector-to-content bigram metric. Both-matched-adjacency shows a small positive cluster effect (+0.003 to +0.028 pp) that is lexicon-size-monotonic and therefore a topical-clustering artefact, not grammatical signal.
Non-vowel structural features (q-initial flag, suffix class, bench-gallows presence, skeleton length, line-position, plain-gallows line-initial) individually add at most +1.9pp to section-prediction accuracy and collectively add only +3.6pp over the majority baseline. Combining them with vowel-pattern features slightly DEGRADES performance (38.9% vs 40.1% vowel-only). Predictions collapse onto herbal and recipes; the classifier never predicts the other 6 sections.
A simple 3-ring volvelle (6 prefixes x 26 roots x 8 suffixes) with per-section root-cartridge swap reproduces 4 of 7 Voynich statistical properties including vowel-section chi-square coupling (90% vs real 79%). It fails Zipf exponent (0.18 vs 0.65) and hapax ratio (0.24 vs 0.70) by large margins. The vowel-section coupling Brain-V treated as its strongest positive structural finding (H-BV-VOWEL-01) is NOT uniquely diagnostic of meaning: a content-free volvelle mechanism produces it trivially via section-specific cartridges.
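A content-free generator of the kind described can be sketched in a few lines. Everything concrete here is hypothetical: the prefix, root, and suffix strings are placeholders, the cartridge size of 10 roots is arbitrary, and the code makes no attempt to reproduce the statistics quoted above. It only illustrates the mechanism of three rings with a per-section root subset.

```python
import random
from itertools import product

# Placeholder ring contents (hypothetical, chosen so that every combination is distinct).
PREFIXES = ["", "o", "qo", "ch", "sh", "d"]                  # 6 prefix slots
ROOTS = [f"k{chr(97 + i)}" for i in range(26)]               # 26 placeholder roots
SUFFIXES = ["", "y", "dy", "in", "aiin", "ol", "or", "al"]   # 8 suffix slots

def cartridge(section, size=10):
    """Pick a section-specific subset of roots (stand-in for a cartridge swap)."""
    rng = random.Random(section)
    return rng.sample(ROOTS, size)

def generate(section, n, seed=0):
    """Emit n words for a section by spinning the three rings independently."""
    rng = random.Random(f"{section}-{seed}")
    roots = cartridge(section)
    return [
        rng.choice(PREFIXES) + rng.choice(roots) + rng.choice(SUFFIXES)
        for _ in range(n)
    ]

# Upper bound on vocabulary for the full device: 6 x 26 x 8 combinations.
ALL_FORMS = {p + r + s for p, r, s in product(PREFIXES, ROOTS, SUFFIXES)}
print(len(ALL_FORMS), generate("herbal", 5))
```

Because each section draws from its own root subset, any feature carried by the roots will correlate with section without any meaning being encoded, which is the point the entry makes about section coupling.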
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) reflects a label-only register drawn from a vocabulary of fewer than 200 distinct word types — a much smaller effective vocabulary than other sections — producing a Zipf exponent significantly above the corpus-wide value of 0.8946.
The recipes section (25 folios, 11,611 words, entropy 3.8586 bits — the largest section by word count) encodes a natural-language text with less cipher transformation than the herbal section, as evidenced by its higher entropy and larger word count approximating the statistical footprint of an unenciphered medieval recipe corpus.
The text-only section (entropy 3.9016 bits, 7 folios) represents unenciphered or minimally enciphered running text in medieval Latin or Northern Italian, and its word-frequency distribution should have a Zipf exponent significantly closer to 1.0 than the manuscript-wide exponent of 0.8946.
Currier A and Currier B use two structurally distinct homophonic cipher tables mapping a single underlying Latin text, such that the bigram transition matrices of Currier A and B are statistically dissimilar (chi-squared p < 0.01) but both exhibit second-order entropy compatible with Latin (approximately 3.6-4.0 bits per glyph at order-2).
The recipes section (25 folios, 11,611 words, entropy 3.8586 bits) and the herbal section (129 folios, 10,872 words, entropy 3.8478 bits) share a common cipher and possibly the same underlying language, while the biological section (19 folios, 6,315 words, entropy 3.7977 bits) uses a distinct encoding, as evidenced by its lower entropy and different word-length distribution.
The text uses a combination of substitution and transposition ciphers in the biological section.
Currier A and Currier B employ two structurally distinct cipher alphabets in which a core set of glyphs is shared but a subset of approximately 5-8 glyphs is exclusive or predominantly exclusive to each sub-corpus, producing measurable vocabulary intersection below 60% when controlling for word length.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) is produced by a label-only encoding where each token is drawn from a small closed vocabulary of fewer than 50 distinct labels, rather than running prose.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) reflects a label-register vocabulary of fewer than 150 distinct content words, with a word-frequency distribution that fits a truncated power law rather than full Zipf, consistent with a glossary or label list rather than running prose.
The 70.1% hapax ratio is substantially artifactual: a systematic suffix composed of word-final glyphs drawn from {y, n, l, r, o} acts as a grammatical or cipher suffix, and stripping these final glyphs from all word tokens reduces the effective vocabulary size and hapax ratio to levels consistent with natural language (~30–45% hapax).
The text-only section (7 folios, entropy 3.9016 bits — highest of all sections) represents a functionally distinct register encoded with minimal or no transposition, i.e., a near-plaintext or lightly enciphered layer, while lower-entropy sections (zodiac 3.7149, astronomical 3.7471) use additional transposition steps that suppress entropy.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) is explained by a label-oriented encoding in which the same small set of ~15–20 root words is repeatedly inflected and reused as astronomical labels, such that the top 20 most frequent words in the zodiac section account for over 55% of all word tokens in that section.
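The coverage threshold in this entry is a one-line computation. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter

def top_k_coverage(words, k=20):
    """Share of all tokens accounted for by the k most frequent word types."""
    counts = Counter(words)
    return sum(c for _, c in counts.most_common(k)) / len(words)
```

Applied to the zodiac tokens, the prediction holds if `top_k_coverage(zodiac_words) > 0.55`. Running the same call on other sections gives the baseline needed to judge whether 55% is unusual.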
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) reflects a label-encoding regime in which most tokens are drawn from a restricted lexicon of fewer than 80 unique word types, producing a word-frequency distribution with Zipf exponent significantly steeper (greater than 1.2) than the full corpus (0.8946), consistent with a closed enumeration rather than free prose.
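Zipf exponents such as 0.8946 and the predicted value above 1.2 depend on the estimator. The sketch below uses ordinary least squares on log frequency against log rank, which is one common choice and only an assumption about how the board's figure was obtained.

```python
from collections import Counter
from math import log

def zipf_exponent(words):
    """Negative slope of log(frequency) vs log(rank), fitted by least squares."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [log(r) for r in range(1, len(freqs) + 1)]
    ys = [log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope  # needs at least two distinct word types
```

Least-squares fits are sensitive to the long tail of hapax words, so a section with fewer than 80 types will give an unstable estimate; a maximum-likelihood fit or a rank cutoff would be a reasonable alternative.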
The manuscript uses a combination of substitution and transposition ciphers in the biological section.
The text-only section (7 folios, entropy 3.9016 bits, highest of all sections) encodes unenciphered or lightly enciphered text in a language with entropy naturally near 3.9 bits, consistent with medieval Italian vernacular (estimated entropy 3.85-3.95 bits) rather than Latin (estimated ~4.0 bits) or a heavily enciphered text.
Currier A is structurally distinct from Currier B at the lexical-accessibility level, resisting decipherment across three independent language hypotheses: Schechter Latin (B-A gap +8.21pp), Brady Syriac proxy (+3.92pp), Hebrew medieval (+3.07pp). B consistently fits lexicons better than A by 3-8 percentage points regardless of source language.
Plant illustration properties (use class: toxic/food/medicinal; geographic origin: Mediterranean vs non-Mediterranean; plant family) do NOT correlate with measurable text properties (mean word length, glyph entropy, q-initial rate, gallows-initial-line rate, top vowel-pattern frequency, _.oii fire rate) on the same folio. Across 24 tests (4 botanical axes x 6 text features, n=117 plant folios), ZERO reach p<=0.05. The Perun visual-key hypothesis is not supported at this resolution.
The zodiac section's anomalously low entropy (3.7149 bits) and label-heavy structure reflect a fixed-vocabulary encoding in which each label token is drawn from a closed set of fewer than 50 distinct label templates, with high within-label glyph redundancy serving as a positional marker rather than encoding phonemic content.
The zodiac and astronomical sections use a label-register encoding — short fixed-length labels drawn from a closed vocabulary — rather than running prose, which would produce anomalously low entropy (observed: zodiac 3.7149 bits, astronomical 3.7471 bits) and a high type-token ratio relative to sections with running text.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) results from a label-only encoding where each glyph token maps to one of a small closed set of calendar or ordinal terms (month names, numerals, star names), making the zodiac section structurally independent from the prose cipher used in herbal and recipes sections.
The Currier A and Currier B sub-corpora use two structurally distinct cipher tables encoding the same underlying Latin plaintext, evidenced by differing word-initial glyph frequency distributions between the two sub-corpora that are nonetheless both consistent with Latin word-onset phoneme distributions.
The text-only section's elevated entropy (3.9016 bits, highest of all sections) reflects an underlying plaintext that has undergone minimal or no transposition — i.e., it is a pure substitution cipher or near-plaintext — whereas sections with lower entropy (zodiac 3.7149, biological 3.7977) involve additional transposition steps that reduce apparent glyph entropy by increasing local repetition.
The Voynich text uses a homophonic substitution cipher where word-final glyphs {y, n, l, r, o} function as morphological suffixes encoding inflectional endings of a single underlying Latin text, causing artificial vocabulary inflation and explaining the 70.1% hapax ratio.
Currier A and Currier B employ two structurally distinct cipher alphabets encoding the same underlying Latin plaintext: the two sub-corpora should exhibit near-identical word-length distributions and Zipf exponents but divergent bigram transition matrices, consistent with two key tables applied to the same plaintext.
The Currier A and Currier B sub-corpora use two structurally distinct homophonic cipher tables encoding the same underlying Latin text: Currier A uses a wider homophone set per plaintext letter (lower per-glyph entropy ~3.75 bits) while Currier B uses a narrower set (higher per-glyph entropy ~3.90 bits), producing the observed entropy difference between the two dialects
The high hapax ratio (70.1%) is substantially artifactual: word-final glyphs drawn from the set {y, n, l, r, o} function as grammatical suffixes, and stripping them would reduce the effective vocabulary by at least 30%, collapsing hapax rate to below 50%.
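The suffix-stripping prediction above can be checked mechanically. The following is a minimal sketch, assuming tokens are EVA-transliterated strings, that the hapax ratio is the share of word types occurring exactly once, and that single-glyph words are left unstripped. The function names and toy tokens are illustrative and are not taken from the Brain-V pipeline.

```python
from collections import Counter

# Candidate suffix glyphs named in the hypothesis.
SUFFIX_GLYPHS = set("ynlro")

def strip_final(word, suffixes=SUFFIX_GLYPHS):
    """Drop one final glyph if it is in the suffix set (assumption: keep 1-glyph words)."""
    if len(word) > 1 and word[-1] in suffixes:
        return word[:-1]
    return word

def vocab_stats(tokens):
    """Return (number of types, hapax ratio over types)."""
    counts = Counter(tokens)
    types = len(counts)
    hapax = sum(1 for c in counts.values() if c == 1)
    return types, hapax / types

def suffix_strip_report(tokens):
    """Compare vocabulary size and hapax ratio before and after stripping."""
    types_before, hapax_before = vocab_stats(tokens)
    types_after, hapax_after = vocab_stats([strip_final(t) for t in tokens])
    return {
        "type_reduction": 1 - types_after / types_before,
        "hapax_before": hapax_before,
        "hapax_after": hapax_after,
    }

# Toy data only; real input would be the full token stream.
toy_tokens = ["chedy", "ched", "chedy", "okal", "oka", "daiin"]
report = suffix_strip_report(toy_tokens)
print(report)
```

On the real corpus the hypothesis would be supported only if `type_reduction` is at least 0.30 and `hapax_after` falls below 0.50.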
The high word-initial glyph constraint ({o,c,q,s,d} dominating word starts) and word-final glyph constraint ({y,n,l,r,o} dominating word ends) are artifacts of a Vigenère-like polyalphabetic cipher where the key resets at word boundaries, causing the first and last characters of each word to be systematically drawn from whichever cipher-alphabet rows correspond to the key's initial and terminal positions.
The text-only section's elevated entropy (3.9016 bits, highest of all sections) reflects unenciphered or minimally enciphered natural language prose, and its glyph unigram distribution should show statistically significantly less deviation from the expected distribution of a known natural language (medieval Latin or Italian) than any other section
The high hapax ratio (70.1%) is substantially produced by a systematic suffix morphology where word-final glyphs {y, n, l, r, o} function as inflectional suffixes on a smaller stem vocabulary, such that stripping final glyphs from the set {y, n, l, r, o} would reduce the unique word count by at least 35% while preserving a stem vocabulary with Zipf exponent closer to 1.0.
The Voynich glyphs encode a homophonic substitution cipher on medieval Latin, where the 25 glyphs map to a Latin alphabet expanded with homophones for high-frequency letters (e, a, i, t, s), reducing glyph entropy below the expected Latin ~4.0 bits to the observed 3.8627 bits.
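The entropy figures quoted throughout the board are glyph-level Shannon entropies. The following is a minimal sketch of that measure, assuming each EVA character counts as one glyph; the board's own tokenisation may differ, for example by treating bench gallows as single glyphs. The function name and toy words are illustrative.

```python
import math
from collections import Counter

def glyph_entropy(words):
    """Shannon entropy in bits per glyph of the unigram glyph distribution."""
    counts = Counter(g for w in words for g in w)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy data only; the observed 3.8627 bits refers to the full corpus.
toy_words = ["daiin", "chedy", "qokeedy", "ol", "shedy"]
print(round(glyph_entropy(toy_words), 4))
```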
The text-only section (7 folios, entropy 3.9016 bits, highest of all sections) represents unenciphered or lightly enciphered natural language prose, while all other sections apply an additional transposition or homophonic layer on top of a base substitution cipher, producing their characteristically lower entropy.
The high hapax ratio (70.1%) is substantially artifactual: word-final glyphs drawn from {y,n,l,r,o} function as inflectional suffixes, and stripping the final glyph of every word reduces unique word types by at least 30%, collapsing the vocabulary toward a size consistent with a 3,000–5,000-root natural language
Plain gallows characters EVA 't' and 'p' are non-phonetic paragraph/section markers, not consonants: 'p' shows 5.4x line-initial enrichment (70.9% vs 13.1% baseline), 't' shows 3.2x (42.3%), while bench gallows (cth/ckh/cph) appear in normal mid-word positions.
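The enrichment ratios above can be recomputed from a line-segmented transcription. The sketch below assumes one reading of the figures: 70.9% is the share of 'p'-initial words that stand first on their line, and 13.1% is the share of all words that are line-initial. That reading is an assumption, as are the function name and the toy lines.

```python
def line_initial_enrichment(lines, glyph):
    """Return (rate, baseline, enrichment) for words starting with `glyph`.

    `lines` is a list of lines, each a list of word strings.
    """
    total = initial = g_total = g_initial = 0
    for line in lines:
        for i, word in enumerate(line):
            total += 1
            initial += (i == 0)
            if word.startswith(glyph):
                g_total += 1
                g_initial += (i == 0)
    baseline = initial / total   # share of all words that are line-initial
    rate = g_initial / g_total   # share of glyph-initial words that are line-initial
    return rate, baseline, rate / baseline

# Toy data only.
toy_lines = [
    ["pol", "daiin", "ol"],
    ["tchedy", "pchedy", "ar"],
    ["pshol", "or", "ol", "dy"],
]
rate, baseline, enrichment = line_initial_enrichment(toy_lines, "p")
print(round(rate, 3), round(baseline, 3), round(enrichment, 2))
```

Under this reading the quoted figures are self-consistent: 70.9 / 13.1 is about 5.4 and 42.3 / 13.1 is about 3.2.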
Glyph positional constraints (specific glyphs appearing predominantly word-initially vs word-finally) reflect a syllabic or consonant-vowel encoding structure rather than grammatical affixes, where initial glyphs encode consonant onsets and final glyphs encode vowel codas of a CV or CVC syllable scheme.
The text-only section's elevated entropy (3.9016 bits, highest of all sections) reflects a prose register of the underlying natural language encoded with minimal or no transposition, while the herbal and biological sections apply additional transposition or null insertion on top of the base substitution cipher, reducing their effective entropy.
The text-only section (7 folios, entropy 3.9016 bits — highest of all sections) represents either unenciphered text or a significantly weaker cipher than the illustrated sections, such that its inter-word mutual information is statistically higher than in the herbal or biological sections, betraying more plaintext syntactic structure.
The 70.1% hapax ratio is primarily artifactual: word-initial glyphs from {o,c,q,s,d} and word-final glyphs from {y,n,l,r,o} are morphological affixes, not root characters. Stripping one character from each end when it belongs to these sets will reduce the unique root vocabulary below 2,500 types and the hapax rate below 40%, consistent with inflectional morphology in a natural language.
The anomalously high hapax ratio (70.1%) is substantially produced by a systematic word-final suffix drawn from the set {y, n, l, r, o} that is a cipher artifact (e.g., a null, padding, or word-delimiter glyph), such that stripping that final glyph from all words reduces the unique-word count by at least 30%.
Short skeletons (1-2 consonants) cover 42.9% of corpus tokens and are irreducibly ambiguous, meaning ANY decipherment hypothesis using consonant-skeleton methodology has ceiling ~57% on word-level accuracy without a vowel-disambiguation layer.
Three vowel-pattern rules survive honest held-out validation at 70%+ precision: '_.o._.o' -> herbal (82.4% precision on 34 held-out fires), '_._.eee' -> recipes (72.7% on 11 fires), '_.e.ai' -> recipes (70.6% on 17 fires). These three rules together cover 0.8% of held-out tokens at 77.4% aggregate precision. They are Brain-V's first honestly-validated decipherment fragments.
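The aggregate precision can be reproduced from the three per-rule figures. The sketch below recovers each rule's hit count by rounding precision times fires, which assumes the quoted precisions are rounded ratios of whole-number hits.

```python
# (precision, held-out fires) per rule, as quoted in the entry above.
rules = {
    "_.o._.o": (0.824, 34),
    "_._.eee": (0.727, 11),
    "_.e.ai": (0.706, 17),
}

# Recover integer hit counts (assumption: precision = hits / fires, rounded).
hits = {pattern: round(p * n) for pattern, (p, n) in rules.items()}
total_fires = sum(n for _, n in rules.values())
aggregate = sum(hits.values()) / total_fires

print(hits, total_fires, round(aggregate, 3))
```

This gives 28 + 8 + 12 = 48 correct out of 62 fires, or 77.4%, matching the stated aggregate.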
The Voynich text uses a homophonic substitution cipher on medieval Latin or Italian, where 2-4 glyphs map to each high-frequency plaintext letter, reducing glyph entropy below what a simple substitution would produce (~4.0 bits) and inflating the hapax ratio by producing artificial variant spellings of the same underlying word.
The dominant word-initial glyph constraints {o, c, q, s, d} are artifacts of a cipher mechanism that encodes plaintext vowels (a, e, i, o, u) as these five glyphs in word-initial position, with each cipher glyph mapping to a single plaintext vowel, producing a detectable co-occurrence pattern between word-initial cipher glyphs and word-final glyphs that mirrors vowel-consonant harmony in Latin or Italian.
The glyph positional constraints (word-initial set {o,c,q,s,d}, word-final set {y,n,l,r,o}) are produced by a Vigenère-style polyalphabetic cipher where the key position within a word determines the allowable glyph set, such that position 1 maps to a restricted alphabet and the final position maps to a different restricted alphabet, compressing the effective entropy at word boundaries.
Per-section Zipf exponents vary monotonically with section entropy: the text-only section (entropy 3.9016 bits) has a Zipf exponent closest to 1.0, the zodiac section (entropy 3.7149 bits) has the flattest Zipf exponent (furthest below 1.0), and all other sections fall in between—consistent with the text-only section being least transformed from natural language and label-heavy sections being most compressed or formulaic.
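A per-section Zipf exponent can be estimated by fitting a straight line to log frequency against log rank. The sketch below uses ordinary least squares over all ranks; the board does not state which fitting method produced its exponents, so this choice is an assumption, as are the function name and toy data.

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Estimate the Zipf exponent as the negated slope of log(freq) vs log(rank)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return -slope

# Toy data with frequencies 60, 30, 20, 15, 12 (exactly 60 / rank).
toy_tokens = [f"w{rank}" for rank in range(1, 6) for _ in range(60 // rank)]
print(round(zipf_exponent(toy_tokens), 4))
```

Running this once per section and sorting the results by section entropy would show whether the claimed monotonic relationship holds.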
The glyph positional constraints (word-initial {o,c,q,s,d}, word-final {y,n,l,r,o}) are not cipher artifacts but reflect systematic morphological prefixes and suffixes of the underlying plaintext language. Specifically, the initial constraint distribution should match the expected frequency of Latin or Italian morphological prefixes (ob-, con-, sub-, de-, ad-) better than a random cipher assignment would predict.
The Zipf exponent of 0.8946 (below the natural-language baseline of ~1.0) is caused by systematic word-level homophones: multiple surface word forms encode the same plaintext word, inflating low-frequency tail counts. Collapsing homophones defined by shared word-initial bigram and word length would restore the Zipf exponent to >= 0.95.
Words containing gallows glyphs (EVA t, p, k, f and bench variants cth, ckh, cph) occupy structurally distinct syntactic positions: they appear predominantly as the first content word after line-initial position and are followed by non-gallows words at a rate significantly higher than the corpus baseline, consistent with gallows functioning as topic-marking or noun-phrase-head indicators rather than phonemes.
EVA vowel patterns are non-randomly distributed across manuscript sections at the aggregate level (chi-square significant at p<0.01 in 55/70 testable skeleton groups) but this distributional coupling does NOT translate to per-token section prediction on held-out folios. The narrower surviving result: sparse high-precision vowel-pattern rules exist (e.g. '_.eo' is pharma-modal with rule-precision 0.827 on held-out data) but fire on only ~40% of tokens, so F1 is below always-predict-majority baselines.
The Voynich script encodes plaintext using a systematic nulls-and-abbreviations scheme in which roughly 20–25% of all word tokens are null words (carrying no semantic content) inserted at predictable positional intervals, detectable by the fact that the most frequent short words (length 1–2 glyphs, e.g., 'y', 'ol', 'ar', 'or') appear at inter-word positions with a non-random distribution inconsistent with natural language function words
The word 'daiin' (799 occurrences, rank 1) decomposes as 'da' (give) + 'in' (in water), a pharmaceutical instruction appearing after plant-part descriptions. This morphological decomposition is systematic across the corpus.
Within Hand A folios, vowel pattern '_.oii' fires at elevated rate on plant-identified folios. Headline corpus-wide enrichment (5.04x) is Hand-A-inflated: Hand B shows zero _.oii across both plant (n=25) and non-plant (n=7) herbal folios. Within-Hand-A plant-vs-non-plant enrichment is only 1.72x (n=88 plant vs 8 non-plant, within-hand t-test p=0.258, not significant at alpha=0.05). The 5-fold CV stability established in v2 applies to the Hand-A plant subset, not to plants generally. Quire and length confounds are ruled out; Currier hand is not.
EVA vowel choice within a fixed consonant skeleton is section-linked. Across 70 skeleton groups with >=3 vowel variants and >=100 tokens each, 55 (78.6%) show section-distribution chi-square significant at p<0.01. Headline case 'kdy' (Brady's chedy/chody): chi2=262.17, df=28, critical 50.89, i.e. 5.15x over threshold.
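For one skeleton group, the test is a chi-square test of independence on a table of vowel variants by sections. The sketch below computes the statistic and degrees of freedom in plain Python, assuming every row and column total is non-zero; the critical value is left to a lookup table or a statistics library. The function name and toy table are illustrative.

```python
def chi_square_independence(table):
    """Return (chi2, df) for a table of counts (rows: variants, columns: sections)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Toy 2x2 table only; the 'kdy' case above has df=28.
toy_table = [[10, 20], [20, 10]]
chi2, df = chi_square_independence(toy_table)

# Ratio of the reported statistic to the reported critical value.
ratio_reported = 262.17 / 50.89
print(round(chi2, 4), df, round(ratio_reported, 2))
```

The last line confirms the quoted multiple: 262.17 / 50.89 is about 5.15.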
The text is encoded using a simple substitution cipher on an unknown language.
The glyph positional constraints (word-initial {o,c,q,s,d} and word-final {y,n,l,r,o}) are artifacts of a Vigenère-family polyalphabetic cipher in which word boundaries reset the key, causing systematic glyph-position biases that are not present in the underlying plaintext.
The dominant word-initial glyph constraints {o, c, q, s, d} and word-final constraints {y, n, l, r, o} are artifacts of a Vigenère-like polyalphabetic cipher with a key length of 3-7 characters, such that glyph positional bias within words is a function of key-phase position rather than the underlying language's phonotactics, and the index of coincidence computed within each key-phase position would be elevated (above 0.065) relative to the overall corpus index of coincidence.
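The key-phase prediction can be tested by comparing the index of coincidence (IC) within each phase to the corpus-wide IC. The sketch below assumes the key restarts at every word boundary, so the phase of a glyph is its position within the word modulo the key length. That assumption, the function names, and the toy words are illustrative.

```python
from collections import Counter

def index_of_coincidence(glyphs):
    """Probability that two glyphs drawn without replacement are identical."""
    n = len(glyphs)
    if n < 2:
        return 0.0
    counts = Counter(glyphs)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def phase_ics(words, key_len):
    """IC per key phase, with phase = in-word position modulo key_len (assumption)."""
    buckets = [[] for _ in range(key_len)]
    for word in words:
        for pos, glyph in enumerate(word):
            buckets[pos % key_len].append(glyph)
    return [index_of_coincidence(b) for b in buckets]

# Toy data only; the hypothesis would be tested for key_len in 3..7.
toy_words = ["abab", "abab", "ab"]
overall = index_of_coincidence([g for w in toy_words for g in w])
per_phase = phase_ics(toy_words, 2)
print(round(overall, 4), per_phase)
```

The hypothesis predicts per-phase values above 0.065 and above the overall value for the correct key length.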
The dominant word-initial glyph constraint (o, c, q, s, d accounting for most word starts) and word-final constraint (y, n, l, r, o) reflect a Vigenère-type polyalphabetic cipher in which the cipher alphabet at word boundaries is fixed, causing structural glyph repetition that is unrelated to plaintext phoneme distribution at those positions.
The text uses a simple substitution cipher on an unknown language.
78.7% of herbal folios have a unique skeleton in position B (word immediately following a gallows marker), with lower lexicon match rates (52.6% vs 65.8%) and higher hapax (76.6% vs 67.8%) — consistent with plant-name headings behind gallows.
The random-permutation baseline of 71.5% coverage is anomalously high — random permutations of 10 consonants against a 1,334-entry lexicon should not match >70% of arbitrary token streams. This suggests the lexicon is over-permissive and the 15.4pp gap to true mapping is smaller than typical for a genuine signal.
The encoding drops pharyngeal consonants (Syriac het and ayin have no EVA representation) — consistent with a non-Semitic (European) copyist who lacks these phonemes.
EVA vowel characters (a,o,e,i), currently stripped by the consonant-skeleton pipeline, encode real phonetic distinctions in Syriac vowels: EVA 'e'→ī, EVA 'o'→o/u, EVA 'a'→a/ā. Disambiguation layer resolves 7,007 tokens (20% of corpus) at >=70% confidence.
74.4% of Voynich plant illustrations match Mediterranean species at >=80% confidence: 100% Mediterranean flora, 0% New World. Plant families: Lamiaceae 33%, Asteraceae 22%, Solanaceae 21%, consistent with Dioscorides pharmaceutical tradition.
The text is encoded using a simple substitution cipher on a language other than Latin or Italian.
The Syriac temporal adverb kaddīn (skeleton kdy) appears 1,326 times (3.8% of corpus) while the Jewish Babylonian Aramaic equivalent kəḏēn (skeleton kdn) appears 178 times — an 88:12 ratio supporting primarily Syriac text with minor JBA influence.
Herbal pages describe Galenic pharmaceutical properties (kyānā 'nature', ṭārā 'press/apply') rather than botanical catalogs — adding Löw botanical entries (42 terms) increased herbal coverage by only 0.1pp, and top decoded words match pharma page vocabulary.
Toxic plant folios (15 specified Latin names, 14 with >=20 tokens) show elevated _.oii vowel-pattern fire rate: toxic 1.08% vs non-toxic 0.34% (3.15x ratio). One-tailed Welch's t-test p = 0.060, Cohen's d = 0.73, Hedges' g = 0.72. Bootstrap 95% CI on mean difference: [-0.00006, +0.01649]; the lower bound falls short of excluding zero by only 0.00006 (about 0.006 percentage points). The pre-registered alpha=0.05 threshold is NOT met, but the effect size is medium-to-large and the direction matches prediction. Signal is bimodal: 5 of 14 toxic folios fire heavily (Paris quadrifolia 4.2%, Rhododendron 4.8%, Delphinium 1.9%, Euphorbia 1.7%, Cuscuta 1.6%, Nymphaea 0.9%), 9 fire at zero.
EVA vowel patterns encode section/domain content independent of the consonant skeleton. Specifically: pattern '_.eo' predicts pharmaceutical section modality at 100% across 8 unrelated skeletons; pattern '_.o.e' predicts biological at 100% across 3 skeletons; pattern '_.e' predicts biological at 58% across 12 skeletons (vs 30.5% random baseline). A naive-Bayes classifier using vowel-pattern features alone achieves 27.2% 5-fold CV accuracy vs 21.1% majority baseline (+6.1pp).
The Voynich text uses a homophonic substitution cipher on medieval Latin where 2-3 Voynich glyphs map to each Latin letter, accounting for the observed entropy gap (3.8627 bits actual vs ~4.0 bits expected for Latin) and the anomalously high hapax ratio (70.1%) caused by variant glyph combinations encoding the same Latin word.
The Voynich Manuscript encodes an Aramaic pharmaceutical text in the Syriac tradition, using a consonant-skeleton abjad with stripped vowels, matching 86.9% of 35,259 filtered tokens against a 1,334-entry Syriac pharmaceutical lexicon.
The Voynich text uses the Sergian translation tradition (6th century Syriac), evidenced by: (a) tak-sa for dynamis (not haylā), (b) <=8 Greek loanwords (40 tokens total, all short transliterations), (c) native Syriac plant names dominate.
Syriac āsyā ('physician') dominates JBA rappā 37:1, confirming primarily Syriac rather than JBA tradition.
The encoding reflects a 15th-century European scribe transcribing a Syriac pharmaceutical source: kaph/qoph merger (velar/uvular confusion), pharyngeal drop (het/ayin absent), gallows as paragraph markers — all consistent with a non-Semitic copyist using a purpose-built script.
EVA 'q' maps to Syriac waw, producing the conjunction wa- ('and') at 14.9% of tokens — a frequency consistent with attested Syriac prose.
Nine Syriac pharmaceutical phrase-structure templates (TREATMENT, RECIPE_ACTION, etc.) match 109 decoded passages across all 225 folios, clustered on pharma and biological pages (f75v, f84r, f102r2, f102v2, f107r).
Jaccard Index between Voynich vocabulary and proposed decipherment vocabulary is J~0.08, independently confirmed. Low but non-zero overlap indicates partial but real linguistic correspondence.
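For reference, the Jaccard index is the size of the intersection of two vocabularies divided by the size of their union. The sketch below uses toy sets; the function name is illustrative, and how the two vocabularies behind J~0.08 were normalised is not stated on the board.

```python
def jaccard(vocab_a, vocab_b):
    """Jaccard index |A & B| / |A | B| of two vocabularies (0.0 if both are empty)."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy data only.
print(jaccard({"kdy", "ksy", "dyn"}, {"ksy", "dyn", "tr"}))
```

A value near 0.08 means that roughly one in twelve of the distinct forms across both vocabularies is shared.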
The text uses a simple substitution cipher on medieval Latin, which would produce glyph entropy ~4.0 bits and word-final glyph distribution matching Latin word endings.
The Voynich script is a semi-syllabic system derived from Balkan scribal traditions, where complex glyphs are built from base characters plus modifiers (loops, triangles). The giant P glyph = N (known Balkan value), with loop = NO. The 4-shaped glyph = D, with triangle = DN, with loop = DNO. The g-shaped glyph = je/j/ja/ju. Combined glyphs produce Serbo-Croatian words (e.g., g + modified-P = jedan/jedno meaning 'one').
The manuscript uses a null-derived word-construction rule (e.g., a gallows-glyph prefix system) such that a small set of ~5 prefix glyphs accounts for > 60% of all word tokens, producing the observed Zipf exponent of ~0.89 via combinatorial explosion of a limited stem vocabulary
The 70.1% hapax ratio is substantially artifactual: stripping word-final glyphs from the set {y,n,l,r,o} reduces unique word types by at least 40%, revealing a core vocabulary consistent with a homophonic substitution cipher on medieval Latin or Italian
The high hapax ratio (70.1%) is substantially artifactual: word-final glyphs drawn from {y,n,l,r,o} function as inflectional suffixes, and stripping them reduces unique word types by at least 40%, collapsing the effective vocabulary toward the ~2,500-3,500 unique stems expected in a medieval herbal or recipe corpus.
The zodiac section's anomalously low entropy (3.7149 bits, lowest of all sections) reflects a label-heavy encoding in which a large fraction of tokens are proper-name labels (star names, month names) encoded with a homophonic cipher, producing reduced glyph entropy relative to prose sections.
The Voynich Manuscript encodes a late medieval Moravian (Old Czech/Slavic) dialect using a constructed cipher alphabet derived from Latin, Glagolitic, and Cyrillic characters. Folio 14v (Acanthus mollis) yields medicinal instructions consistent with period phytotherapy when decoded as Moravian, and the final row contains the Gujarati/Gojri plant name Adulsa Vasa.
The Voynich Manuscript is a trilingual pharmaceutical compendium encoding plant-based medicine in a mixed Latin/Greek/Arabic scribal system, with a 168-term dictionary covering ~80% of the text.
Parked (7)
Unproven but not debunked. May be revisited with new evidence or methods.
Attempted systematic cryptanalysis with the most sophisticated military techniques of the era. Friedman hypothesized the text was a constructed/synthetic universal language, not a cipher of a natural language.
Systematic analytical effort using Cold War-era cryptanalytic techniques. No specific decipherment claim — produced statistical analysis.
The text could encode a Malay or Southeast Asian language, based on word structure similarities to Austronesian languages with prefixing and suffixing morphology
Using botanical anchoring (identifying plants, then mapping their names to glyph sequences), identified approximately 14 characters and two words. Proposed a natural language encoding.
The manuscript is written in phonetic Old Turkic, containing medical and botanical content
The manuscript was created in early 15th-century northern Italy, possibly by Antonio Averlino (Filarete), using a verbose/compressed cipher technique
Proposed decipherment using a cipher system called the Naibbe cipher, with supplementary materials published alongside a peer-reviewed paper
Eliminated (24)
These approaches have been tested and have failed. Brain-V will not re-test them.
The extremely high hapax legomena ratio (70.1% of 8,261 unique word types) is generated by a systematic scribal abbreviation convention — specifically, word-final glyph truncation — rather than representing a genuinely large underlying vocabulary, such that reconstructing truncated forms by appending the statistically most probable word-final glyph to hapax tokens would reduce the hapax ratio to below 50% and increase Zipf fit R².
The extreme hapax ratio (70.1%) is produced by a systematic word-boundary segmentation error in the transcription, where a small set of suffix or prefix tokens are being attached to base words inconsistently, artificially inflating vocabulary size by 30-45%.
The Voynich text uses a homophonic substitution cipher on medieval Latin where 2-3 Voynich glyphs map to each Latin letter, which would reduce observed glyph entropy below true Latin entropy (~4.0 bits) by approximately log2(avg_homophones) bits while preserving Zipf scaling.
The extremely high hapax ratio (70.1%) is partially produced by a systematic scribal abbreviation convention where a base word form appears fully in first use and is abbreviated (via suffix truncation or initial-letter substitution) in subsequent uses within the same folio or page, making most 'unique' word types artifactual variants of a smaller true vocabulary.
The zodiac and astronomical sections use a label-encoding scheme where words are proper nouns or fixed labels (star names, month names, zodiac terms) rather than running prose, which explains their anomalously low entropy (3.7149 and 3.7471 bits) as a reduced effective vocabulary with high repetition of a small label lexicon.
The high hapax ratio (70.1%) is produced by a systematic scribal abbreviation scheme in which word-final suffixes are consistently dropped or truncated, such that what appear as unique words are abbreviated forms of a smaller set of ~2,400 base words — consistent with the observed vocabulary size being roughly 3.4x larger than expected for a natural language corpus of 38,053 words.
The high hapax ratio (70.1%) arises from a systematic abbreviation or suffix-stripping convention rather than from a large vocabulary, such that each Voynich 'word' represents a root plus a dropped inflectional ending — testable by checking whether hapax forms cluster around specific glyph-final patterns that would correspond to stripped suffixes.
The high hapax ratio (70.1%) is largely an artifact of a systematic abbreviation or truncation cipher, where scribes consistently dropped word-final syllables or morphemes, causing each abbreviated form to appear unique even though the underlying vocabulary is much smaller.
The high hapax ratio (70.1%) is primarily an artifact of systematic scribal abbreviation, where common root words are truncated with consistent suffixes, making truncated forms appear unique when they share stems with high-frequency words.
The high hapax rate (70.1%) is produced by a systematic nulls-insertion or verbose encoding scheme — specifically, that hapax words are morphologically related to non-hapax words by addition or substitution of a single terminal glyph, such that > 50% of hapax words have a Levenshtein distance of 1 from a high-frequency word
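For reference, the Levenshtein criterion in the entry above can be stated as code. The sketch below measures the share of hapax words within one edit of a high-frequency word. The threshold `min_count=5` for "high-frequency" is an assumption, since the entry does not define it; function names and toy tokens are illustrative.

```python
from collections import Counter

def within_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure a is the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]          # one extra glyph in b at position i

def hapax_near_frequent_share(tokens, min_count=5):
    """Share of hapax types within one edit of a type seen >= min_count times."""
    counts = Counter(tokens)
    hapax = [w for w, c in counts.items() if c == 1]
    frequent = [w for w, c in counts.items() if c >= min_count]
    if not hapax:
        return 0.0
    near = sum(1 for h in hapax if any(within_one_edit(h, f) for f in frequent))
    return near / len(hapax)

# Toy data only.
toy_tokens = ["chedy"] * 5 + ["ol"] * 5 + ["chody", "shedy", "qokain"]
share = hapax_near_frequent_share(toy_tokens)
print(round(share, 3))
```

The entry's threshold corresponds to `share > 0.5` on the full corpus.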
The manuscript's text structure reflects a single, coherent narrative.
The text encodes Hebrew using a substitution system
The manuscript is written in 'proto-Romance' — a proposed now-extinct spoken precursor to modern Romance languages. No cipher involved. Content is a reference compendium for a Dominican nun about women's health.
The text was generated by self-citation: the scribe copied and modified words from elsewhere in the manuscript while writing, producing structured but meaningless text
The plants depict New World (Mesoamerican) species and the text is written in Nahuatl (Aztec language). Claimed to identify 37 plants as species from Mexico.
The manuscript was written by young Leonardo da Vinci using a simple substitution cipher with mirror writing, encoding Italian text
Statistical analysis showing the text was generated by a stochastic random process, supporting the hoax hypothesis
The manuscript is a hoax. Text with Voynich-like statistical properties could be generated using a Cardan grille over a table of syllables. Proposed Edward Kelley as the hoaxer.
The manuscript was a Cathar liturgical manual for the Endura ritual of assisted suicide, written in a creole of Flemish, Old French, and Old High German
The manuscript was written in Ukrainian without vowels, containing letters about the fall of a Ukrainian kingdom
The manuscript used multiple simple substitution ciphers (different keys on different pages) encoding Latin and/or early Italian. Claimed to have partially decoded plant names.
The manuscript was authored by Anthony Ascham (16th-century English physician) using a double arithmetic progression polyalphabetic cipher encoding English text
The manuscript uses a simple substitution cipher encoding abbreviated medieval Latin
The manuscript was written by Roger Bacon using a microscopic shorthand cipher embedded in pen strokes, encoding descriptions of cells under a microscope and the Andromeda nebula