Methodology
The score measures where a given text falls on Walter Ong's oral-literate spectrum.
Oral cultures (without writing) developed language patterns to aid memory: repetition, rhythm, concrete imagery, direct address, formulaic phrases. You can't look back at a page, so you reinforce, repeat, and keep it vivid.
Literate cultures developed patterns that exploit writing: complex subordination, abstract terminology, hedged assertions, nested clauses. You can re-read, so you can pack more in.
These reflect different cognitive modes, not just stylistic choices. The score captures how much a text relies on one mode versus the other.
Training labels for documents were derived through a pairwise tournament. We assembled a corpus spanning genres (epic poetry, philosophy, speeches, podcasts, journalism, social media) and compared them head-to-head via the Claude API. Each comparison asked: "Which of these two texts is more oral?" with no numeric anchoring.
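A comparison of this shape can be sketched as prompt construction; the exact wording and the surrounding Anthropic SDK call are assumptions, not the prompt actually used:

```python
def build_comparison_prompt(text_a: str, text_b: str) -> str:
    """Build a two-text comparison prompt with no numeric anchoring.

    Illustrative only: the real prompt wording may differ.
    """
    return (
        "Which of these two texts is more oral?\n\n"
        f"Text A:\n{text_a}\n\n"
        f"Text B:\n{text_b}\n\n"
        "Answer with exactly 'A' or 'B'."
    )
```

The returned string would be sent as a single user message via the Claude API; the one-letter reply becomes a win/loss record feeding the tournament.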
The TrueSkill ranking algorithm converted these binary judgments into per-document ratings, which were then normalized to a 0-100 scale. Pairwise comparison avoids the calibration problem of absolute scoring: Claude doesn't need to know what "50" means, only which of two texts is more oral.
Sentence annotations were generated by prompting Claude to identify specific markers in each document. We defined 68 marker types based on Ong's framework, split between oral categories (e.g., repetition, formulaic phrases, direct address) and literate categories (e.g., nested subordination, abstract terminology, hedged assertions).
Each span was labeled with its marker type and classified as oral or literate. Both the document scores and sentence annotations were used to train BERT models that generalize these patterns to new text.
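One way to represent these labeled spans is sketched below; the field names and the example marker types are illustrative assumptions, though "repetition" and "hedged assertion" are drawn from the marker categories described above:

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class MarkerSpan:
    start: int        # character offset into the document
    end: int
    marker_type: str  # one of the 68 marker types, e.g. "repetition"
    polarity: str     # "oral" or "literate"

def count_by_polarity(spans):
    """Tally how many annotated spans fall on each side of the spectrum."""
    return Counter(s.polarity for s in spans)
```

Records like these, alongside the tournament scores, form the two training sets for the BERT models.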
The final score combines two signals:
Holistic document judgment: A BERT model with a sigmoid-bounded regression head, trained on the tournament scores. It reads the full text and outputs a score that the sigmoid constrains to the 0-1 range.
Sentence-level classification: A BERT classifier that labels each sentence as oral or literate based on its markers. The proportion of oral sentences contributes to the final score.
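The two signals above might be combined as in this minimal sketch; the equal blending weight and the exact placement of the sigmoid are assumptions, not the trained system's actual parameters:

```python
import math

def sigmoid(x: float) -> float:
    """Bound a raw regression output to (0, 1), as the holistic head does."""
    return 1.0 / (1.0 + math.exp(-x))

def combined_score(holistic_logit: float,
                   oral_sentences: int,
                   total_sentences: int,
                   weight: float = 0.5) -> float:
    """Blend the document-level model with the sentence-level classifier.

    `weight` is an illustrative assumption; the real combination may differ.
    Returns a score on the 0-100 scale.
    """
    holistic = sigmoid(holistic_logit)             # 0-1 from the regression head
    proportion = oral_sentences / total_sentences  # 0-1 oral-sentence share
    return 100.0 * (weight * holistic + (1.0 - weight) * proportion)
```

A neutral document logit (0.0, i.e. holistic score 0.5) with 3 of 4 sentences classified oral would land at 62.5 under these assumed weights.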