Computational Methods

NLP approaches to style analysis: stylometry, formality detection, readability metrics, and Biber's multidimensional analysis (MDA). These are the tools that let us measure orality at scale.

Stylometry & Authorship

Multitask Learning for Stylometry
NAACL 2021 · Deep learning authorship attribution
PAN 2023 Authorship Verification
Annual shared task on authorship
Stylometry and the Federalist Papers
Mosteller & Wallace, 1963 · Classic statistical authorship
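
Classic statistical authorship work of the Mosteller & Wallace kind compares how often authors use common function words such as "upon" or "whilst". The sketch below shows only that first step, a per-1,000-word rate profile. The word list and the function name function_word_rates are illustrative assumptions, not the feature set or procedure used in any of the works listed above.

```python
import re
from collections import Counter

# Illustrative function words only; real studies select their own lists.
FUNCTION_WORDS = ("the", "of", "to", "and", "by", "upon", "while", "whilst")


def function_word_rates(text, words=FUNCTION_WORDS):
    """Return occurrences per 1,000 tokens for each word in `words`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no alphabetic tokens")
    counts = Counter(tokens)
    # Counter returns 0 for unseen words, so absent words get a rate of 0.0.
    return {w: 1000 * counts[w] / len(tokens) for w in words}


if __name__ == "__main__":
    sample = "Upon the hill the dog sat"
    for word, rate in function_word_rates(sample).items():
        print(f"{word:>7}: {rate:.1f} per 1,000 words")
```

Profiles like this are typically compared across texts of known and disputed authorship; the statistical model used for that comparison is a separate choice.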

Biber's Multidimensional Analysis

Variation across Speech and Writing
Biber, 1988 · Cambridge · The MDA framework
Dimensions of Register Variation
Biber, 1995 · Cross-linguistic MDA
Longman Grammar of Spoken and Written English
Biber et al., 1999 · Longman · Corpus grammar reference
Register, Genre, and Style
Biber & Conrad, 2009 · Cambridge · Textbook on register analysis

Formality & Style Transfer

GYAFC: Grammarly's Yahoo Answers Formality Corpus
Rao & Tetreault, NAACL 2018 · Benchmark dataset

Readability & Coh-Metrix

Coh-Metrix: Analysis of Text on Cohesion and Language
Graesser et al., 2004 · Behavior Research Methods, Instruments, & Computers
Coh-Metrix Tool
Online tool for text analysis
A New Readability Yardstick (Flesch)
Flesch, 1948 · Journal of Applied Psychology · The original formula
The Dale-Chall Readability Formula
Dale & Chall, 1948 · Educational Research Bulletin
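
As a concrete reference point, the Flesch Reading Ease score is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The sketch below applies that formula in Python. The syllable counter is a rough vowel-group heuristic and the sentence splitter is naive; both are assumptions made for illustration, so scores will differ somewhat from those of dedicated tools.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Naive segmentation: split sentences on runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
```

Very short, monosyllabic text can score above 100, and dense polysyllabic text can score below 0; the scale is not clamped.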

Transformers for Text Analysis

BERT: Pre-training of Deep Bidirectional Transformers
Devlin et al., arXiv:1810.04805 · NAACL 2019
BERTweet: A Pre-trained Language Model for English Tweets
Nguyen et al., arXiv:2005.10200 · EMNLP 2020

Lexical Analysis Tools

TAALES: Tool for Automatic Analysis of Lexical Sophistication
Kyle & Crossley · Free tool with documentation
TAALED: Tool for Automatic Analysis of Lexical Diversity
Kyle et al. · MTLD, vocd-D, and more
Academic Word List (AWL)
Coxhead, 2000 · 570 word families for academic English
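
To make the lexical diversity measures less abstract, here is a minimal sketch of the type-token ratio (TTR) and a simplified MTLD. It assumes whitespace-tokenized, lowercased input and the commonly used TTR threshold of 0.72. The function names are hypothetical, and details such as how the threshold comparison and the final partial factor are handled vary between implementations, so values may not match TAALED's output.

```python
def type_token_ratio(tokens):
    """Unique tokens divided by total tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def _mtld_one_direction(tokens, threshold):
    """Count 'factors': segments whose running TTR drops to the threshold."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        types.add(tok)
        count += 1
        if len(types) / count <= threshold:
            factors += 1.0
            types.clear()
            count = 0
    if count > 0:
        # Partial credit for the unfinished final segment.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    # If every token is unique, no factor ever accrues; report infinity.
    return len(tokens) / factors if factors > 0 else float("inf")


def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward passes (simplified MTLD)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2


if __name__ == "__main__":
    tokens = "the cat and the dog".split()
    print(type_token_ratio(tokens), mtld(tokens))
```

TTR falls as texts get longer, which is the main reason length-robust measures such as MTLD are usually preferred for comparing texts of different sizes.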