slop-cop/references/sources.md

Sources

~50 published sources back the AI-slop patterns (peer-reviewed linguistics, vendor methodology, practitioner literature, viral takedowns); roughly 80 more back the comprehension axis.


slop-cop — Research Sources

This skill is grounded in roughly 130 published sources across two axes:

AI-Slop axis (sections 1–5): roughly 50 sources spanning peer-reviewed linguistic research, the canonical Wikipedia "Signs of AI writing" guide, AI-detector vendor methodology, practitioner literature, and viral takedowns in popular press.
Comprehension axis (sections 6–10): roughly 80 sources spanning readability formula primary literature, cognitive load and working memory psychology, plain-language standards from government and accessibility bodies, web-reading research, and writing-craft canon.

The catalog is current through early 2026. Each source is annotated with a one-line description of what it contributes. URLs are listed verbatim; if a link rots, the title and description should be searchable.

1. Peer-reviewed / academic

The strongest evidence. These papers measure spike frequencies against pre-2022 baselines, run stylometric classifiers, or analyze corpora.

arXiv: Delving into ChatGPT usage in academic writing through excess vocabulary — Quantifies the +6,697% rise in "delves" in 2024 PubMed abstracts vs 2020, plus +904% for "underscores" and +611% for "intricate." Estimates ≥13.5% of 2024 biomedical abstracts were processed with LLMs. Link
arXiv: Why Does ChatGPT "Delve" So Much? — Identifies the top-21 focal AI-frequency words (garnered, boasts, groundbreaking, advancements, etc.) and analyzes the RLHF mechanism behind them. Link
medRxiv: Delving into PubMed Records — terms changed after ChatGPT — Measures the term-frequency shift in PubMed abstracts pre/post-ChatGPT, confirms meticulous, pivotal, commendable as spike words. Link
arXiv: Measuring AI "Slop" in Text — Defines and operationalizes "slop" as a quantitative property of text, with measurement methodology. Link
arXiv: Towards Understanding Sycophancy in Language Models (Anthropic) — The canonical paper on RLHF-induced sycophancy. Documents how reward models train obsequiousness across Claude and other LLMs. Link
PLOS One: Distinguishing ChatGPT from human writing via Japanese stylometry — Achieves ~99.8% classifier accuracy across 7 LLMs using stylometric features. Shows all models cluster tightly while humans spread broadly. Link
Nature: Stylometric comparisons of human vs AI creative writing — Cross-corpus stylometric analysis confirming model fingerprint clustering. Link
SciRP: Hedging Devices in AI vs Human Essays — Measures hedge frequency and hedge stacking in AI vs human writing samples. Foundation for the hedge-stacking pattern. Link
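
The spike figures quoted above are percent rises over a baseline year, which is easy to misread as a multiplier. A minimal sketch of the arithmetic follows. The function names and the example counts are hypothetical and are not taken from any of the cited papers; only the +6,697% figure comes from the entry above.

```python
def percent_rise(baseline_rate: float, current_rate: float) -> float:
    """Percent rise of current_rate over baseline_rate (both per-document rates)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (current_rate - baseline_rate) / baseline_rate * 100.0


def multiplier_from_percent(percent: float) -> float:
    """Convert a percent rise into a 'times the baseline' multiplier."""
    return 1.0 + percent / 100.0


# Hypothetical counts per million abstracts, for illustration only.
baseline = 12 / 1_000_000
current = 300 / 1_000_000
print(round(percent_rise(baseline, current)))        # rise in percent
print(round(multiplier_from_percent(6697), 2))       # multiplier for +6,697%
```

Read this way, a +6,697% rise means the word appears at roughly 68 times its baseline rate, not 6,697 times.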

2. Canonical / community-maintained references

Wikipedia: Signs of AI writing — The canonical community-maintained list. Updated regularly as models change. Explicit on the limitation: "Not all text featuring these indicators is AI-generated." Treats em dashes as contested post-GPT-5.1 (Nov 2025). Link
Wikipedia: AI slop — Definitions and broader context for the AI-slop phenomenon. Link

3. Detector vendors (methodology pages)

These document what features classifiers actually use. Often candid about the limits of their own detectors.

GPTZero: Perplexity and burstiness explained — Foundational explainer of the two metrics. Reports that human text clusters at 0.6–1.2 burstiness and LLM text at 0.2–0.4. Link
GPTZero: Top 10 most common AI vocabulary — Frequency data: "play a significant role in shaping" appears 182x more in AI; "today's fast-paced world" 107x; etc. Link
Pangram Labs: Comprehensive guide to AI writing patterns — 24 core LLM patterns with examples. Documents copula avoidance, present-participle tails, fabricated case studies. Link
Pangram: Why perplexity and burstiness fail — Critique of older detection metrics; explains why prompt-engineered prose evades them. Link
Originality.ai: Most obvious ChatGPT sayings — Practitioner blacklist with frequency-based ranking. Includes knowledge-cutoff disclaimer leakage. Link
DeGPT: 50+ ChatGPT tells — Comprehensive phrase-level catalog including sycophancy openers/closers, pedagogical voice, throat-clearing. Link
Quillbot: Burstiness and perplexity — Vendor explainer with similar themes. Link
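
As a rough illustration of what a burstiness-style measure captures, the sketch below computes the coefficient of variation of sentence lengths. This is an assumed proxy, not GPTZero's formula (vendor definitions are typically based on per-sentence perplexity), so the numeric ranges quoted above should not be applied to its output. The function names and the regex-based sentence splitter are hypothetical simplifications.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def length_variation(text: str) -> float:
    """Population std dev of sentence length divided by the mean length.

    Assumed proxy for burstiness: 0.0 means every sentence has the same length.
    """
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. This sentence runs on for quite a few more words than the first."
print(length_variation(uniform), length_variation(varied))
```

Uniform sentence lengths score zero, while a short sentence followed by a long one scores well above it, which is the intuition behind treating low variation as a weak signal.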

4. Practitioner literature

Writers, editors, and structured catalogs. Softer evidence individually, but agreement across sources strengthens any individual tell.

tropes.fyi: AI writing tropes directory — Comprehensive structured catalog of LLM tropes with examples. Source for many less-famous patterns (magic adverbs, invented concept labels, dead-metaphor repetition). Link
tropes.fyi: full markdown reference — Complete catalog in markdown format. Link
agentkit AI tropes (sentence-structure) — Reference list of sentence-level AI tells; source for anaphora abuse and dramatic countdown. Link
avoid-ai-writing skill (36 patterns) — Conor Bronsdon's catalog of patterns to avoid in AI prose. Link
Beutler Ink: How to spot AI writing — Practitioner guide; coined "compulsive summary" terminology. Link
Olivia Cal: 17 AI writing tells — Editorial perspective on common giveaways. Link
Hyacinth.ai: 42 phrases AI bots can't resist — Phrase-level catalog with examples. Link
GoWinston: Most common ChatGPT words — Frequency analysis of ChatGPT-favored vocabulary. Link
Embryo: AI overuse list — Practitioner blacklist with replacement suggestions. Link
Content Beta: 300+ AI words — Larger blacklist; useful for "While X, Y" opener documentation. Link
Twixify: 124+ overused words — Frequency-ranked catalog. Link
Kraabel: 200+ overused words — Extended blacklist. Link
Hanalarock: 10 ChatGPT-isms — Editor's perspective on rhetorical tics including both-sides-ism. Link
WriteWithAI: 10 dead giveaways — Self-posed rhetorical question and crafted closer documentation. Link
Hastewire: Linguistic patterns of AI writing — Sentence-level pattern analysis. Link
STRYNG: Common AI sentence structures — Top-six AI sentence-pattern analysis (false range, whether-or, present-participle tail). Link
Blake Stockton: Don't write like AI series — Detailed per-pattern series; especially good on negation reversals. Link
Blake Stockton: Red flag words — Vocabulary blacklist. Link
Augmented Educator: 10 telltale signs — Education-context AI detection. Link
aiphrasefinder: 100 common ChatGPT phrases — Phrase-frequency catalog. Link
Atlas: Top 10 AI clichés — High-frequency phrase ranking; documents "objective study aimed" 269x ratio. Link
Grammarly: Hedging language guide — Foundation for hedge-stacking and hedged-superlative patterns. Link
Hemingway App: Fix adverbs and toggle highlights — General writing guidance on adverb minimization that compounds with AI's adverb overuse. Link
GRC Health: Predictable rhetoric of AI — Documents copula avoidance with specific examples. Link
Sai Gaddam Medium: It isn't just X, it's Y — Deep analysis of the negation reversal pattern. Link
Ruben Hassid Substack: It's not X, it's Y — Companion piece on the same pattern. Link
Storylab.ai: Blog title generator analysis — Documents the "X: A Comprehensive Guide" title pattern. Link
Hunting the Muse: How to tell if writing is AI — Practitioner guide on title and structural patterns. Link

5. Viral takedowns / popular press / commentary

Cultural-moment documentation. Useful for understanding which tells went viral and when (which informs the sanding-off problem).

Rolling Stone: ChatGPT em dash giveaway — The viral em-dash takedown. Pre-dates GPT-5.1's opt-out. Link
TechRadar: Em dash era is over — Post-GPT-5.1 reassessment of the em dash as a tell. Important for the contested-tell calibration. Link
LitHub: Handy guide to spotting AI — Editor's catalog including servile-positivity and tone-uplift patterns. Link
Scientific American: ChatGPT and Gemini have unique writing styles — Source for model-fingerprint analysis (GPT vs Claude vs Gemini). Link
Type.ai: Claude vs GPT comparison — Practitioner-level comparison of model voices. Link
The Conversation: AI's "it's not X, it's Y" stylistic negation — Academic-flavored analysis of the negation pattern with cognitive-psych grounding. Link
The Decoder: Reddit users compile ChatGPT phrase list — Coverage of the community-sourced phrase blacklist. Link
The Ignorance Field Guide to AI Slop — Cultural commentary on the slop phenomenon. Link
Influence Intelligence: AI vocabulary of the internet — Documents how AI-favored vocabulary spreads through the web. Link
Dead Language Society: Why ChatGPT writes like that — Colin Gorrie's rhetorical analysis; foundation for tricolon-abuse documentation. Link
LessWrong: Why do LLMs say "It's not X, it's Y" — Technical analysis of the RLHF mechanism. Link
LessWrong: Demands are all you need — Prompt-imperativeness research relevant to hedge stacking. Link
Sean Goedecke: Sycophancy as the first LLM dark pattern — Engineer's perspective on RLHF-induced sycophancy. Link
The Batch: OpenAI pulls GPT-4o update after sycophancy — News coverage of the April 2025 GPT-4o sycophancy scare. Link
Cory Doctorow: Writing vs AI — Author commentary on AI writing, including em-dash defense. Link
SFU Library: Writing conclusions — Pre-AI but relevant to compulsive-summary documentation. Link
aiproductivity.ai: The negation pattern — Practitioner-level analysis of "It's not X, it's Y" as a marker. Link
trySight: SEO content generation — Documents the "Comprehensive Guide" title pattern in SEO contexts. Link

6. Comprehension axis — readability formulas (primary literature)

The eight metrics in readability-metrics.md come from this body of work. Treat formula scores as diagnostic, not decisive — every primary source notes domain-specific limits.

Flesch (1948): A new readability yardstick — Journal of Applied Psychology 32(3). The original Flesch Reading Ease formula (206.835 − 1.015 ASL − 84.6 ASW). Validated on news, business, and government writing. DOI
Kincaid, Fishburne, Rogers, Chissom (1975): Derivation of new readability formulas — Naval Technical Training, Research Branch Report 8-75. Source for the Flesch-Kincaid Grade Level formula used by US federal documents and Microsoft Word. PDF
McLaughlin (1969): SMOG grading — A new readability formula — Journal of Reading 12(8). Polysyllable-count formula stable above ~30 sentences. JSTOR
Coleman & Liau (1975): A computer readability formula — Journal of Applied Psychology 60(2). Letters-per-100-words and sentences-per-100-words formula; deliberately avoids syllable counting. DOI
Dale & Chall (1948, revised 1995): A formula for predicting readability — Educational Research Bulletin 27(1). Uses a curated 3,000-word "easy" wordlist. The 1995 revision is the canonical version. Original
DuBay (2004): The principles of readability — Comprehensive review of 200+ readability formulas, their derivation, and validation. Best single overview of the field's history and limits. PDF
Halliday & Hasan (1976): Cohesion in English — Longman. Foundation for lexical density measurement (content words ÷ total words). The original definition that automated tools approximate. Worldcat
Ure (1971): Lexical density and register differentiation — In Applications of Linguistics. Empirical study showing 40–55% lexical density for spoken/casual prose, 55–65% for written/academic. Source for our audience targets. Citation
Bormuth (1969): Cloze readability procedure — Reading Research Quarterly. Establishes the cloze test as the validation gold-standard for readability formulas. JSTOR
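
For orientation, the Flesch Reading Ease formula quoted above can be sketched directly from raw counts, where ASL is average sentence length in words and ASW is average syllables per word. The Flesch-Kincaid Grade Level coefficients used here (0.39, 11.8, 15.59) are the commonly published ones and are not quoted in this file. The function names are hypothetical, and syllable counting is left to the caller because automated syllable counters are themselves approximations.

```python
def _averages(words: int, sentences: int, syllables: int) -> tuple[float, float]:
    """Return (ASL, ASW): words per sentence and syllables per word."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return words / sentences, syllables / words


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """206.835 - 1.015 * ASL - 84.6 * ASW (higher means easier)."""
    asl, asw = _averages(words, sentences, syllables)
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """0.39 * ASL + 11.8 * ASW - 15.59 (approximate US school grade)."""
    asl, asw = _averages(words, sentences, syllables)
    return 0.39 * asl + 11.8 * asw - 15.59


def lexical_density(content_words: int, total_words: int) -> float:
    """Content words divided by total words, as a fraction between 0 and 1."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return content_words / total_words


# 100 words, 5 sentences, 150 syllables: ASL = 20, ASW = 1.5
print(round(flesch_reading_ease(100, 5, 150), 1))   # 59.6
print(round(flesch_kincaid_grade(100, 5, 150), 1))  # 9.9
print(lexical_density(52, 100))                     # 0.52
```

Consistent with the note at the top of this section, scores like these are best read as diagnostics for a passage rather than as verdicts on it.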

7. Comprehension axis — cognitive psychology (working memory, load, and processing)

These papers ground the "why" — why dense acronyms or run-on sentences cause comprehension failure, not just style preference.

Miller (1956): The magical number seven, plus or minus two — Psychological Review 63(2). The classic working-memory paper. Foundation for chunking thresholds in our F-group patterns. DOI
Cowan (2001): The magical number 4 in short-term memory — Behavioral and Brain Sciences 24(1). Modern revision of Miller's number. The 4-chunk limit drives our acronym and named-entity density caps. DOI
Baddeley (2000): The episodic buffer — A new component of working memory? — Trends in Cognitive Sciences 4(11). Explains why integrating new entities while parsing syntax is so costly. DOI
Sweller (1988): Cognitive load during problem solving — Cognitive Science 12(2). Original statement of cognitive load theory. Later work in the same tradition distinguishes intrinsic, extraneous, and germane load; extraneous load is what bad prose imposes. DOI
Sweller, van Merriënboer, Paas (1998): Cognitive architecture and instructional design — Educational Psychology Review 10(3). Foundational for the "split-attention effect" — relevant to our forward-reference and undefined-acronym patterns. DOI
Just & Carpenter (1992): A capacity theory of comprehension — Psychological Review 99(1). Demonstrates working-memory capacity as the bottleneck for sentence comprehension. DOI
Gibson (1998): Linguistic complexity — Locality of syntactic dependencies — Cognition 68(1). The Dependency Locality Theory: long syntactic dependencies (subject-verb separation, embedded clauses) tax working memory. Drives the run-on and long-sentence patterns. DOI
Pinker (2014): The Sense of Style — Viking. Source for "curse of knowledge" framing — experts forget what was hard to learn. Drives the H-group audience-assumption patterns. Worldcat
Chase & Simon (1973): Perception in chess — Cognitive Psychology 4(1). Original chunking-by-expertise study. Why writers in their own field cannot judge a fresh reader's comprehension. DOI
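
To make the chunk-limit idea concrete, here is a minimal sketch of an acronym density check. The cap of 4 distinct acronyms per sentence is an assumption borrowed from Cowan's number for illustration; the actual thresholds live in the skill's pattern files and may differ. The regex, the function names, and the sentence splitter are all hypothetical simplifications.

```python
import re

# Assumption: an "acronym" is a run of 2 to 6 capital letters standing alone.
ACRONYM = re.compile(r"\b[A-Z]{2,6}\b")


def acronym_load(sentence: str) -> int:
    """Number of distinct acronym-like tokens in one sentence."""
    return len(set(ACRONYM.findall(sentence)))


def overloaded_sentences(text: str, cap: int = 4) -> list[str]:
    """Sentences whose distinct-acronym count exceeds the cap."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [s for s in sentences if acronym_load(s) > cap]


sample = (
    "The API is stable. "
    "The SRE team moved the API behind a CDN, updated the SLA, "
    "and filed an RFC with the CTO."
)
for sentence in overloaded_sentences(sample):
    print(acronym_load(sentence), sentence)
```

A check like this only counts surface tokens. It cannot tell whether a given reader already knows an acronym, which is the expertise problem the Chase & Simon entry points at.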

8. Comprehension axis — plain-language standards and government style guides

Empirical and policy-driven thresholds. These are what real public-facing institutions enforce.

Plain Writing Act of 2010 (US Public Law 111-274) — Federal law requiring agencies to use plain language. Implementation guidance recommends 8th-grade reading level. Text
plainlanguage.gov: Federal plain language guidelines — The US federal style guide for plain writing. Active voice, short sentences, 8th-grade target. Site
GOV.UK: Content design — Style and tone — UK government style guide. Hard targets: 9-year-old reading age for most content. Strict on jargon and acronyms. Site
GOV.UK: Service Manual — Writing for users — Companion guide for service designers. Source of the "explain on first use, abbreviate on second" rule. Site
WCAG 2.1 Success Criterion 3.1.5 (Reading Level) — W3C accessibility standard. AAA-conformant content must not exceed lower-secondary reading ability (~9th grade) or provide a supplemental version. Spec
WCAG 2.1 Success Criterion 3.1.3 (Unusual Words) — AAA: provide a definition mechanism for words used in an unusual or restricted way, including jargon and idioms. Source for our undefined-acronym threshold. Spec
WCAG 2.1 Success Criterion 3.1.4 (Abbreviations) — AAA: provide expansion mechanism for abbreviations. Spec
CDC: Clear Communication Index — US Centers for Disease Control and Prevention framework that scores health-communication content on 20 items, preceded by 4 introductory questions. Strong on numeric framing (which informs our stat-bombing pattern). PDF
NIH: Clear and simple — Developing effective print materials for low-literate readers — National Institutes of Health guide; source for plain-language thresholds in healthcare audience. Archive
EU: Joint Practical Guide for the drafting of EU legislation — Codifies plain-drafting rules across 24 EU languages. Notable for explicit caps on sentence length in legal text. PDF
Center for Plain Language: 2024 Federal Plain Language Report Card — Annual audit of US federal agency compliance with plain-writing rules. Data on which patterns reliably tank scores. Site
Australian Government Style Manual — Modern reference for government writing; particularly clear on the difference between "writing simply" and "writing simplistically." Site

9. Comprehension axis — web reading research and information architecture

How people actually read on screens. Drives the I-group structural patterns (no skim layer, hierarchy collapse, wall-of-text).

Nielsen (1997): How users read on the web — Nielsen Norman Group. The original usability-study finding: 79% of users scan, only 16% read word-for-word. Article
Nielsen (2006): F-shaped pattern for reading web content — NN/g. The classic eye-tracking heatmap study showing horizontal-then-vertical scanning. Foundation for the "no skim layer" pattern. Article
Pernice (2017): F-pattern reading on the web debunked? — NN/g follow-up confirming the F-pattern still applies but is interrupted by good visual hierarchy (headings, lists, emphasis). Article
Nielsen (2008): How little do users read? — Empirical study showing readers absorb only ~20% of text on a typical page. Justifies our short-paragraph and skim-layer thresholds. Article
Loranger & Nielsen (2013): Plain language is for everyone, even experts — NN/g. Demonstrates that even subject-matter experts prefer simpler prose for fast scanning. Counters the "my audience can handle complexity" defense. Article
Krug (2014): Don't Make Me Think (3rd ed.) — New Riders. Canonical book on web usability. Foundation for "scannability over readability" framing. Worldcat
Wilkinson, Payne, Heinz, Stoops (2011): Online reading vs paper reading — Journal of Computing in Higher Education 23(2). Shows comprehension drops 20–30% when the same text is read on screen vs paper. DOI
Mangen, Walgermo, Brønnick (2013): Reading linear text on paper vs computer screen — International Journal of Educational Research 58. Key finding: hypertext and on-screen formats fragment reading. DOI
GOV.UK Design System: Typography — Codifies a 75-character measure (line length) cap based on legibility research. Relevant to wall-of-text and paragraph-length patterns. Site
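
The 75-character measure mentioned above is straightforward to check mechanically. The sketch below flags overlong lines and rewraps paragraphs using Python's standard `textwrap` module. The function names are hypothetical, and treating blank lines as paragraph breaks is an assumption about the input format.

```python
import textwrap

MEASURE = 75  # character cap per line, as cited in the entry above


def overlong_lines(text: str, measure: int = MEASURE) -> list[tuple[int, int]]:
    """Return (line_number, length) for each line longer than the measure."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > measure
    ]


def rewrap(text: str, measure: int = MEASURE) -> str:
    """Rewrap each blank-line-separated paragraph to fit within the measure."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=measure) for p in paragraphs)


sample = ("word " * 40).strip()
print(overlong_lines(sample))
print(rewrap(sample))
```

Rewrapping only fixes line length. A paragraph that is too long for a skim reader stays too long after it, so this complements the paragraph-length patterns rather than replacing them.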

10. Comprehension axis — writing-craft canon and editorial practice

Writers and editors are the practical authority on what creates and removes friction. These overlap with section 4 (practitioner AI-slop sources) but focus specifically on comprehension and clarity.

Strunk & White (1959): The Elements of Style — Macmillan. The classic. Source for "omit needless words," "use the active voice," and the active-vs-passive heuristics. Worldcat
Williams & Bizup (2017): Style — Lessons in Clarity and Grace (12th ed.) — Pearson. The most rigorous modern style guide. Source for nominalization detection (the pattern Helen Sword later labeled "zombie nouns") and the old-information-first principle. Worldcat
Pinker (2014): Why academic writing stinks — Chronicle of Higher Education. Companion essay to The Sense of Style. Key on "curse of knowledge" and metadiscourse. Article
Sword (2012): Stylish Academic Writing — Harvard University Press. Empirical study of 1,000 published academic articles. Documents which patterns predict reader engagement. Worldcat
Sword: The Writer's Diet — Online tool implementing Sword's friction-detection heuristics: be-verbs, abstract nouns, prepositions, "it/this/that," waste words. Algorithmic precursor to several of our J-group patterns. Site
Hemingway Editor: Method documentation — Web tool that flags adverbs, passive voice, complex sentences, hard-to-read sentences. Direct ancestor of our long-sentence and decorative-qualifier patterns. Site
Garner (2016): Garner's Modern English Usage (4th ed.) — Oxford. Exhaustive reference on usage; particularly strong on bureaucratic prose. Source for several telegraphic-colon and parallelism-failure patterns. Worldcat
Zinsser (2006): On Writing Well (30th ann. ed.) — Harper. Practical canon. Foundation for the "every sentence should pull its own weight" heuristic embedded in our density scoring. Worldcat
Lanham (2006): Revising Prose (5th ed.) — Pearson. Introduces the "Paramedic Method" for sentence revision: circle prepositions, find "is," find action, kick the doer back to the front. Algorithmic source for several J-group patterns. Worldcat
Provost (1985): 100 Ways to Improve Your Writing — Mentor. Pithy practitioner reference. Source for paragraph-rhythm and sentence-length-variance heuristics that complement our burstiness measure. Worldcat
Klare (1976): A second look at the validity of readability formulas — Journal of Reading Behavior 8(2). Critical review showing that formulas predict difficulty but not comprehension; backs our "metrics calibrate, patterns score" architecture. DOI
Bailin & Grafstein (2016): Readability — Text and Context — Palgrave Macmillan. Modern critique of readability formulas; argues for context-sensitive measurement. Strongest argument for our audience-calibration approach. Worldcat
Chartered Institute of Editing and Proofreading: Editorial Style Guide — UK professional standards body. Practical authority on what editors actually correct. Site
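
The first steps of Lanham's Paramedic Method (circle the prepositions, find the "is" forms) lend themselves to a simple counter. The sketch below is only an illustration of that idea, not the skill's J-group implementation. The word lists are short, assumed samples rather than complete inventories, and the function name is hypothetical.

```python
import re

# Assumed, deliberately incomplete word lists for illustration.
PREPOSITIONS = {
    "of", "in", "to", "for", "with", "on", "at", "by",
    "from", "about", "into", "over", "under", "through",
}
BE_FORMS = {"is", "are", "was", "were", "be", "been", "being", "am"}


def paramedic_counts(sentence: str) -> dict[str, int]:
    """Count words, prepositions, and be-verb forms in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return {
        "words": len(words),
        "prepositions": sum(word in PREPOSITIONS for word in words),
        "be_verbs": sum(word in BE_FORMS for word in words),
    }


before = "The decision of the committee is in favor of the proposal."
after = "The committee favors the proposal."
print(paramedic_counts(before))
print(paramedic_counts(after))
```

Comparing the two counts shows the method's payoff: the revision drops every preposition and the be-verb while cutting the sentence from eleven words to five. Note that "to" is counted even when it marks an infinitive, one of several false positives a word-list approach accepts.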

How to use this file

The reference files cite specific sources for specific claims. Each axis has its own catalog and citation conventions:

AI-Slop axis: patterns.md, vocabulary.md, formatting-tells.md cite sections 1–5 above.
Comprehension axis: comprehension.md and readability-metrics.md cite sections 6–10 above.
calibration.md: cross-references both axes.

When a reference file names a source, look it up here for the full title and link; this file is the master index.

Three pillars of evidence:

Peer-reviewed linguistics (Section 1) — strongest. These papers measure spike frequencies against pre-2022 baselines, run stylometric classifiers, or analyze corpora.
Wikipedia: Signs of AI writing (Section 2) — the canonical community-maintained list. Updated regularly as models change. Treats em dashes as contested post-GPT-5.1.
Practitioner / vendor / popular press (Sections 3–5) — softer evidence individually, but agreement across sources strengthens any individual tell.

When sources contradict each other (em dashes are the main case — peer-reviewed work flags them; popular press post-Nov 2025 reports the era ending), calibration.md documents the contestation rather than picking a side.

Coverage gaps to flag:

Most sources are English-language and US/UK-oriented. Multilingual AI tells likely have different signatures.
Older sources (2023, early 2024) document the v1 vocabulary list (delve, tapestry). Sophisticated authors prompt-engineer around these. Newer sources (late 2024, 2025) document the post-sanding tells (copula avoidance, present-participle tails, hedge stacking).
Detection itself is an evolving target. The catalog will need updating as models, RLHF objectives, and writer-side prompt engineering co-evolve.