Comprehensionslop-cop/references/comprehension.md

Comprehension

35 patterns across 5 groups for the comprehension axis: density overload, telegraphic compression, audience assumptions, scannability, cognitive friction.

Raw on GitHub ↗·3,400 words·28 KB

Comprehension Patterns

These patterns hurt the reader, not the texture. A piece can pass the AI-slop axis (no delve, no em-dash clusters, no sycophancy) and still be unreadable: jargon-bombed, structure-less, telegraphic, or written for an audience that already knows everything.

This file is the comprehension axis of slop-cop. Where the AI-slop axis asks "did a machine write this?", the comprehension axis asks "can a fresh reader follow this?" Both axes can fail independently. The output is two parallel verdicts, not one merged score.

Most patterns here are catalogued from cognitive-load research (Miller, Sweller, Cowan), plain-language style guides (plainlanguage.gov, GOV.UK, Microsoft, Google developer style), the curse-of-knowledge literature (Pinker), web-readability eyetracking (NN/g), and the Plain Writing Act / WCAG accessibility standards. See sources.md for the full bibliography.

The 35 patterns are organized into 5 groups by failure mode:

Group F: Density overload (5) — too much per unit of text
Group G: Telegraphic compression (5) — author crammed instead of explained
Group H: Audience-assumption failures (5) — writer assumes shared context the reader lacks
Group I: Structure / scannability (12) — reader can't navigate or preview
Group J: Sentence-level cognitive friction (8) — the prose itself is hard to parse

Severity tiers

H (high) — comprehension blocker; reader can't extract meaning. Always cut.
M (medium) — comprehension friction; readers slow down or re-read. Cut unless context-justified.
L (low) — informational; a tell among others, doesn't kill the verdict alone.

Group F — Density overload

F1. Undefined acronym stacking

Pattern: Multiple acronyms used as if known, no inline definition, no glossary link.

Examples:

"12-month SRA implementation: $50M HIRO pipeline, $14M closed-won ARR."
"Use the API to POST to the SDK endpoint."
"AEO/GEO Brand Report"

Mechanism: Each undefined acronym costs a working-memory slot to "park as unknown." Three undefined acronyms in 100 words exceeds Cowan's 4-chunk limit. Reader either guesses or quits.

Fix: Define on first mention: "search request agent (SRA)". If 3+ acronyms in 100 words, restructure to spell out the most consequential ones.

Severity: H. Detection: mechanical — count uppercase 2–5 letter tokens not in a "known acronym" allowlist (USB, FAQ, URL, API for dev contexts, etc.).

Sources: Microsoft style: acronyms, Google developer style: abbreviations, Chicago Manual of Style, Miller's 7±2

F2. Named-entity bombing

Pattern: Five or more proper nouns (companies, people, products) introduced without context per 100 words.

Examples:

"Sprout/Hootsuite for company page + Taplio/AuthoredUp for individual exec profiles; Cision Trajaan AEO/GEO Brand Report."
"Refine Labs / Passetto — 12-month SRA implementation."

Mechanism: Cold reader has no schema for unknown brand names. Each unfamiliar entity competes for working-memory buffer. Reader drowns.

Fix: Pick one anchor example, contextualize it, reference others by category: "two LinkedIn-focused tools, like Taplio."

Severity: H. Detection: mechanical — capitalized-token density (excluding sentence starts), or NER if available.

Sources: NN/g on technical jargon, Miller's 7±2

F3. Stat bombing without comparative anchor

Pattern: Three or more numeric claims in a sentence (or six in a paragraph) without baseline, comparison, or source.

Examples:

"$50M pipeline, $14M ARR, 93% gap, 50% pipeline lift, 40K audience, $680K spend."
"83% of B2B buying time happens in dark social."

Mechanism: Numbers without baseline don't register. A reader can't tell if $50M is impressive without context.

Fix: For each stat: add (a) baseline ("vs. industry average of X"), (b) source ("per Gartner"), or (c) interpretation ("3× the previous quarter").

Severity: H. Detection: mechanical — count numerals + units per sentence/paragraph, flag when ratio of numerals to comparative phrases is high.

Sources: CDC Clear Communication Index — Numbers section, AHRQ on writing about numbers

F4. Wall-of-text / paragraph density

Pattern: Paragraphs over 5–6 sentences for web; lack of white space; visually intimidating.

Mechanism: Eye fatigue plus lack of natural break points. 79% of web readers scan; long paragraphs defeat scanning.

Fix: Break at every clear topic shift. Cap web paragraphs at 3 sentences / 100 words.

Severity: M. Detection: mechanical — sentences per paragraph; flag if any paragraph >5 sentences or >100 words.

Sources: NN/g: Applying writing guidelines to web pages, Vayce: ideal paragraph length for web

F5. Density-without-headings trap

Pattern: 500+ words of dense prose with no subheadings, no bolded keywords, no callouts.

Mechanism: F-pattern readers rely on subheadings to navigate. Without them, scanners bail. Three levels of headings is ideal for medium-long articles.

Fix: Insert H2 or H3 every 200–300 words. Add bold for key phrases (sparingly).

Severity: H (web). Detection: mechanical — count headings vs word count.

Sources: NN/g: Applying writing guidelines, W3C: heading hierarchy

Group G — Telegraphic compression

G1. Telegraphic colon-labeling

Pattern: A paragraph compressed by colon labels rather than sentences. "Anchor case: X. Tools that win: Y. What changed: Z."

Examples:

"Anchor case: Refine Labs / Passetto. Tools that win: dual stack. What changed in v3: added 6 named cases."

Mechanism: Forces 3+ topic shifts in one paragraph; no semantic glue between them. Labels read as headings without the visual hierarchy that headings provide.

Fix: Either convert each colon-label to its own paragraph with a topic sentence, or convert the block to a true list with sub-headings.

Severity: H. Detection: mechanical — count : (colon-space-Capital) mid-paragraph; flag at 3+ per paragraph.

Sources: ERIC on telegraphic prose, NN/g: be succinct

G2. List-pretending-to-be-prose

Pattern: A paragraph that's actually a list of parallel items joined by commas, semicolons, or + signs — would be cleaner as a bulleted list.

Examples:

"Tools that win: dual stack — Sprout/Hootsuite for company page + Taplio/AuthoredUp for individual exec profiles; Cision Trajaan AEO/GEO Brand Report."

Mechanism: Parallel discrete items don't need narrative. Prose creates artificial transitions and obscures structure.

Fix: Convert to bulleted list.

Severity: M. Detection: mechanical — paragraphs with 2+ semicolons or 3+ + separators creating sub-lists.

Sources: Writing Skills: bullets won't make case

G3. Long sentences past comprehension cliff

Pattern: Sentences over 25 words. At 43 words, comprehension drops below 10%.

Mechanism: Working memory holds ~4–7 chunks. A 30+ word sentence overflows the buffer before the reader reaches the verb or main clause.

Fix: Break at conjunctions (and, but, which) into two sentences. Cap at 20–25 words.

Severity: H. Detection: mechanical — word count per sentence; flag any over 30.

Sources: Letter Counter on sentence length, Siteimprove: long sentences over 20 words

G4. Paragraph-length sentence (run-on continuation)

Pattern: Sentence joining 4+ independent clauses with conjunctions and dashes; often comma-spliced.

Mechanism: Working memory exhausted before reader reaches the end. Different from "long sentence" — these are structurally compound, not lexically dense.

Fix: Break at every and / but / — that introduces a new clause.

Severity: H. Detection: mechanical — clause counter (commas + conjunctions per sentence).

Sources: Letter Counter

G5. Glue-word bloat

Pattern: There is, it is, what is happening is — placeholders that delay the actual subject.

Examples:

"There are many factors that influence the outcome." → "Many factors influence the outcome."
"It is important that you remember to..." → "Remember to..."

Mechanism: Empty leading phrases steal attention without delivering meaning.

Fix: Cut there is/are, it is, start with the real subject.

Severity: L. Detection: mechanical — regex on sentence starts.

Sources: Microsoft style: top 10 tips

Group H — Audience-assumption failures

H1. Coined insider terms used as known

Pattern: Author-invented or niche terminology deployed without definition.

Examples:

"two-stack social management thesis"
"1-3-5 atomization method"
"founder LinkedIn rhythm playbook"
"dark social"

Mechanism: Pinker's curse of knowledge — writer assumes reader shares jargon the writer just minted.

Fix: First mention spelled out: "the two-stack approach (one tool for the company page, one for individual executives)." Or strip the coined term and describe in plain language.

Severity: H. Detection: partial — flag multi-word noun phrases used without an article ("the", "a") where the phrase wasn't introduced earlier; reading required to confirm.

Sources: Pinker on curse of knowledge — Harvard, APS on Pinker

H2. Curse of knowledge

Pattern: Writer assumes shared jargon, intermediate steps, or mental images. Doesn't bother to define terms or spell out logic.

Mechanism: Pinker: "the failure to understand that other people don't know what we know." The single biggest cause of opaque writing.

Fix: Show the draft to someone outside the field. Mark every spot they ask "what's that?" Spell those out.

Severity: H. Detection: qualitative — requires knowing the target audience.

Sources: Pinker — Harvard, Poynter applied

H3. Definition-by-synonym (empty definition)

Pattern: Defining a term with another equally-jargony term. "Schema markup is structured data that uses microdata vocabulary."

Mechanism: Reader still can't form a mental model — the unknown just shifts.

Fix: Define with a concrete example, not a synonym chain. "Schema markup is invisible code on a webpage that tells Google 'this is a recipe' or 'this is a product.'"

Severity: M. Detection: qualitative.

Sources: Pinker — Harvard

H4. Mixed audience ambiguity

Pattern: Document sometimes assumes expert knowledge, sometimes assumes naïveté, with no signaling of which paragraph is for whom.

Fix: Pick one audience. If multiple, segment with explicit headings: "For experienced users", "Background for newcomers."

Severity: M. Detection: qualitative.

Sources: Lumen technical writing: audience

H5. Forward-reference / "we'll see later"

Pattern: Author defers explanation to a later section, leaving reader holding an unresolved question.

Examples:

"We'll cover this in section 4."
"More on this later."
"As we'll see..."

Mechanism: Each forward reference loads working memory with an "open ticket" the reader has to remember.

Fix: Define inline at first mention, or restructure so the explanation comes first.

Severity: M. Detection: mechanical — regex for "as we'll see", "more on this later", "covered below", "we'll discuss".

Sources: Pinker — Harvard

Group I — Structure / scannability

I1. Buried lede

Pattern: Most important information arrives after secondary detail. Reader wades through 2–5 paragraphs to find the point.

Mechanism: First paragraph is the most-read; deferring the thesis means most readers leave before reaching it. F-pattern eyetracking confirms upper-left dominance.

Fix: BLUF — bottom line up front. Inverted pyramid: most newsworthy first, supporting details after.

Severity: H. Detection: qualitative.

Sources: Wikipedia: BLUF, Wikipedia: inverted pyramid, NN/g: F-pattern

I2. Missing thesis / no throughline

Pattern: Reader can't summarize the central claim after reading. No single sentence carries the point.

Mechanism: Without a top-level claim, all sub-claims are isolated facts. Working memory has nothing to hang them on. Violates Minto's Pyramid Principle.

Fix: Insert one sentence within the first 100 words: "The point is X." Each section thereafter supports it.

Severity: H. Detection: qualitative.

Sources: Barbara Minto — McKinsey

I3. No topic sentence

Pattern: Paragraph has no opening sentence that telegraphs its point.

Mechanism: F-pattern readers scan first words and first sentences. Without topic sentences, scanning fails. Reader can't preview.

Fix: First sentence of each paragraph = the paragraph's claim.

Severity: H. Detection: qualitative.

Sources: Indiana writing center on topic sentences

I4. Missing transitions / no signposts

Pattern: Paragraphs jump between ideas without however, in contrast, as a result, the second factor.

Mechanism: Without transition words, the reader infers logical relationships, increasing extraneous load.

Fix: Add explicit transitions; signal direction changes.

Severity: M. Detection: heuristic — transition word density per N paragraphs (note: differs from AI-tell connector clichés like "furthermore" which are AI texture, not comprehension).

Sources: UNC writing center on transitions

I5. Hierarchy collapse

Pattern: Heading levels skip (H1 → H4) or only one level used. Long body of content with no subheadings at all.

Mechanism: Without visual hierarchy, scanners can't navigate. Skipping H2 to H4 confuses screen readers and breaks the document outline.

Fix: Use H1 once; H2 for major sections; H3 for sub-sections; never skip levels going down.

Severity: M (web). Detection: mechanical — heading-level scan.

Sources: W3C heading structure, A11Y project

I6. No concrete examples

Pattern: Claim made in the abstract with no specific instance. "Companies improved efficiency" with no example company, action, or before/after.

Mechanism: Concrete words boost understanding by ~43%; pictures plus words by ~76%. Memory recall is higher for concrete sentences.

Fix: For each abstract claim, append "for example" + a specific instance.

Severity: M. Detection: qualitative.

Sources: Wylie: concrete images in writing, Vanderbilt: show don't tell

I7. Nut-graf missing

Pattern: After the lede, no paragraph explains why the topic matters. Reader doesn't know stakes.

Fix: Add a "why this matters" paragraph within the first 200 words.

Severity: M. Detection: qualitative.

Sources: Wikipedia: inverted pyramid

I8. First sentence doesn't hook

Pattern: Opening sentence is throat-clearing, vague, or buries the question.

Mechanism: F-pattern readers consume the first 1–2 inches of headlines and the first sentence most heavily. If it doesn't deliver, they leave (55% spend <15 seconds on a page).

Fix: First sentence = clear question, claim, or stake.

Severity: H (web/marketing); M (academic). Detection: qualitative.

Sources: NN/g: F-pattern reading, Omnizant: how people read online

I9. No skim layer

Pattern: All content presented at one density level. No bolded keywords, no callouts, no summary at top.

Mechanism: 79% of web users scan, only 16% read word-by-word. Without a skim layer (bold keywords, summary, callouts), scanners get nothing.

Fix: Add TL;DR at top; bold key phrases sparingly; use callout boxes for critical points.

Severity: M (web). Detection: mechanical — count bold/strong elements vs total prose; flag if zero bolded keywords in 500+ words.

Sources: NN/g: concise, scannable, objective

I10. Old-to-new inversion

Pattern: Sentences begin with new information, ending with familiar. Williams' principle: begin with old, end with new.

Examples:

Bad: "Quantum entanglement was Einstein's chief objection. We discussed this in the previous chapter."
Better: "In the previous chapter we discussed Einstein's chief objection: quantum entanglement."

Mechanism: Working memory uses old info as scaffolding for new. Inverting the order forces re-parsing.

Fix: Start sentences with what the reader already knows; end with the new information.

Severity: M. Detection: qualitative.

Sources: Yale: coherence and old-to-new flow, Duke scientific writing

I11. Prose-pretending-to-be-list (inverse)

Pattern: Bulleted list where items are causally connected and need narrative. Bullets strip the connections.

Mechanism: Removing connective tissue forces the reader to reconstruct the logical chain.

Fix: Convert to prose with explicit "First X, which causes Y, then Z."

Severity: L. Detection: qualitative.

Sources: Writing Skills

I12. Parallelism failure in lists

Pattern: Bullet points that mix grammatical forms.

Examples:

"Optimize the page; SEO best practices; Should we A/B test?" (mixes verb / noun phrase / question)

Mechanism: Reader expects parallel structure. Mixing breaks the pattern and forces re-parsing.

Fix: Make every bullet start with the same form (verb, noun phrase, complete sentence).

Severity: M. Detection: parser-based; partial regex check (compare first POS of each bullet).

Sources: GOV.UK style guide

Group J — Sentence-level cognitive friction

J1. Passive voice excess

Pattern: Passive constructions used where active is available. "It was decided that..." vs. "The team decided..."

Mechanism: Passive hides the agent, lengthens sentences, reverses subject-verb-object expectation.

Fix: Identify the actor; rewrite with the actor as subject.

Severity: M. Detection: mechanical — regex for be + past participle; flag when ratio exceeds 10% of sentences (Yoast threshold) or 5% (Monash / Readable).

Sources: Yoast on passive voice, Readable: active voice

J2. Nominalization / zombie nouns

Pattern: Verbs converted to abstract nouns. "Make a determination" instead of "decide". "Implementation of optimization" instead of "we optimized".

Examples:

"The implementation of the strategy resulted in the realization of efficiencies." → "We implemented the strategy and became more efficient."

Mechanism: Helen Sword's zombie nouns cannibalize active verbs, hiding the agent and the action. Forces extra words and abstraction.

Fix: Find -tion / -ment / -ance / -ity nouns; convert back to verbs.

Severity: M. Detection: mechanical — suffix-based regex.

Sources: Helen Sword on zombie nouns — Skagit, Sword in NYT — LSU

J3. Abstract noun stacking

Pattern: Sequences of abstract nouns (strategy, framework, approach, methodology) with no concrete referent.

Examples:

"The framework provides a strategy for implementation of methodology approaches."

Mechanism: No mental image forms. Concrete words boost comprehension by ~43%. Memory recall is higher for concrete vs. abstract.

Fix: Replace each abstract with a concrete instance or example.

Severity: M. Detection: mechanical — flag strings of -ity / -tion / -ness / -ment in proximity.

Sources: Wylie on concrete images, Vanderbilt: show don't tell

J4. Hedge stacking (comprehension version)

Pattern: Multiple qualifiers in one claim. "It's somewhat likely that this could potentially be a mostly accurate answer."

Mechanism: Reader can't extract the claim. Different from AI-slop hedge stacking — that pattern is about texture; this is about whether the reader can determine what's actually being said.

Fix: Pick one hedge if needed; cut the rest. State the claim.

Severity: M. Detection: mechanical — count hedge words (somewhat, potentially, may, might, could, perhaps, arguably, relatively) per sentence; flag at 3+.

Sources: Jane Friedman on hedge inflation

J5. Decorative qualifiers / intensifier drift

Pattern: Very, really, quite, extremely, incredibly used non-functionally. Adds bulk without precision.

Mechanism: Each filler word steals attention without delivering meaning, increasing extraneous cognitive load (Sweller).

Fix: Cut or replace with stronger word. "Very tired" → "exhausted."

Severity: L. Detection: mechanical — regex.

Sources: Grammarbook on qualifiers

J6. Ambiguous pronoun reference

Pattern: It, this, they, that with multiple possible antecedents.

Examples:

"The CEO told the manager that he had been promoted." (Who?)

Mechanism: Reader pauses to resolve. Each ambiguity is a comprehension stutter.

Fix: Repeat the noun, or restructure so the antecedent is unambiguous.

Severity: M. Detection: parser-based; regex catches some cases (vague pronoun + 2+ candidate antecedents in prior sentence).

Sources: Swarthmore on pronoun reference

J7. Misplaced or dangling modifier

Pattern: Modifying phrase placed away from what it modifies.

Examples:

"Walking down the street, the building looked tall." (Building was walking?)

Mechanism: Reader's parsing fails; has to re-parse.

Fix: Move the modifier next to the subject; or rewrite to make the subject explicit.

Severity: M. Detection: parser-based; partial regex (sentence-initial -ing phrase + non-matching subject).

Sources: Purdue OWL on dangling modifiers

J8. Negative construction where positive available

Pattern: Don't fail to remember instead of remember. Not infrequent instead of frequent.

Mechanism: Negation requires extra processing — hold the proposition, then negate it. Strunk & White: "Put statements in positive form."

Fix: State positively.

Severity: L. Detection: mechanical — regex (don't fail to / not un- / not in-).

Sources: Strunk & White

How to use this file

During audit: Walk groups F through J in order. For each pattern, scan the draft for instances. Flag with quote + severity. The scanner catches the mechanically-detectable subset (~17 of 35); the rest require reading.

Severity guidelines:

High severity: acronym stacking, named-entity bombing, stat bombing, telegraphic colon-labeling, density-without-headings, long sentences, run-on sentences, coined terms used as known, curse of knowledge, buried lede, missing thesis, no topic sentence, first sentence doesn't hook
Medium severity: wall of text, list-pretending-to-be-prose, definition-by-synonym, mixed audience, forward-reference, missing transitions, hierarchy collapse, no concrete examples, nut-graf missing, no skim layer, old-to-new inversion, parallelism failure, passive voice excess, nominalization, abstract noun stacking, hedge stacking, ambiguous pronoun, dangling modifier
Low severity: glue-word bloat, prose-pretending-to-be-list, decorative qualifiers, negative construction

If a draft has 5+ H-severity violations within 500 words, recommend a substantial rewrite. See calibration.md for the dual-axis density formula.

What this list is not

This is not "rules for good writing." Plenty of skilled writers use long sentences, named-entity-heavy prose, or list-like compression on purpose, in contexts where the reader has the schema. The point is that prose written for a fresh reader who lacks context will lose them when these patterns stack.

For Mahmoud's specific voice or other voice-aware judgments, see mahmouds-writing-voice. This file is voice-agnostic — it asks only "can a fresh reader follow this?", not "does this sound like the author."

Overlap with the AI-slop axis (patterns.md):

Decorative adverbs / qualifiers — flagged on both axes for different reasons (texture vs cognitive load)
Hedge stacking — texture hedge (AI-slop) vs claim-extractability hedge (comprehension)
Em-dash density — AI signal AND telegraphic compression
Nominalization / abstract nouns — AI texture vs no mental model

A piece can fail one axis and pass the other. The dual verdict in audit-report-template.md makes this explicit.