slop-cop/SKILL.mdSKILL.md
Entry point for the agent. Modes (audit vs fix), when to skip, the workflow, and the top-20 lethal items.
name: ai-slop-detector description: Universal prose audit. Scores writing on TWO axes — AI-Slop (does this read like AI wrote it?) and Comprehension (can a fresh reader follow this?). Use whenever the user wants to audit, critique, score, or fix prose. Triggers on "audit this", "review this", "is this AI", "is this readable", "does this sound like AI", "humanize this", "make this less AI", "AI slop check", "score this", "detect AI writing", "slop check", "de-slop this", "is this readable", "would a fresh reader follow this", "comprehension check". Also use as a final pre-delivery pass inside other writing skills (cold-email, copywriting, sales-enablement, ad-creative, email-sequence, mahmouds-seo-writer, mahmouds-reddit-strategist, mahmouds-writing-voice). Catches 45 AI-slop patterns + ~150 vocab tells + ~33 formatting tells + 35 comprehension patterns + 8 readability metrics, with density-based scoring on both axes, audience calibration, model fingerprinting, and dual-verdict output.
slop-cop
A universal prose audit with two parallel axes. Built on ~135 published sources spanning peer-reviewed linguistics, AI-detector vendor methodology, plain-language style guides (plainlanguage.gov, GOV.UK, Microsoft, Google), cognitive-load research (Miller, Sweller, Pinker), web-readability research (NN/g), and the Plain Writing Act / WCAG accessibility standards.
The one-line summary
Two axes, two verdicts. AI-Slop and Comprehension. A piece can pass one and fail the other.
- AI-Slop axis — Does this read like AI wrote it? Patterns, vocabulary, formatting, rhythm.
- Comprehension axis — Can a fresh reader follow this? Acronyms, named-entity bombing, telegraphic compression, readability, structure.
Single instances aren't a signal. Density is. Both axes use density-based scoring (per 500 words, weighted by severity) with the same verdict tiers (PASS / LOW / MEDIUM / HIGH / CRITICAL). The audit reports both and combines them into a single recommendation based on whichever is worse.
Mode selection
Two modes. Pick one based on what the user wants.
| Signal | Mode |
|---|---|
| "audit this", "review", "critique", "is this AI", "is this readable", "score this", "check for slop" | Audit |
| "polish", "edit", "rewrite", "humanize", "make this less AI", "fix this", "de-slop", "clean up" | Audit, then revise |
| User shares a draft and asks for thoughts | Audit (default) |
| Another writing skill is wrapping up and about to deliver prose | Audit pass before delivery |
If ambiguous, ask one short question: "Audit only, or do you want a revised version?"
When to skip
This skill governs prose meant for human readers. Skip it for:
- Code, code comments, commit messages, or PR descriptions
- Technical API documentation or reference material (different audience target — see calibration §10)
- Raw data, structured outputs (JSON, YAML, CSV)
- Direct quotations from other people that should preserve their voice
- Instructions to other agents or skills (system prompts, agent briefs)
- Intentionally formal or legal documents (different audience target)
- Dialogue under another character's voice in fiction (apply voice-aware judgment)
The audit workflow
Five steps. The scanner does the mechanical work on both axes; reading does the qualitative work; calibration converts findings into two verdicts plus a combined recommendation.
Step 1 — Run the scanner
python3 scripts/scan.py /path/to/draft.md
echo "draft text..." | python3 scripts/scan.py
Flags:
# Compact one-screen output (for embedding in other skills):
python3 scripts/scan.py --quick draft.md
# Structured JSON for programmatic use:
python3 scripts/scan.py --json draft.md
# Set audience for comprehension-axis calibration:
python3 scripts/scan.py --audience marketing draft.md
python3 scripts/scan.py --audience academic draft.md
python3 scripts/scan.py --audience technical draft.md
# Override AI-slop genre detection:
python3 scripts/scan.py --genre encyclopedic draft.md
# Mahmoud-mode: treat ALL em dashes as H severity:
python3 scripts/scan.py --strict-em-dash draft.md
The scanner outputs:
- Two verdicts (AI-Slop, Comprehension) with density scores
- Combined recommendation
- Per-axis violation breakdown
- Readability metrics panel (Flesch RE, FK Grade, SMOG, Coleman-Liau, Dale-Chall, lexical density, avg sentence length, passive voice %)
- Density signals (acronym density, named-entity density, numeric density per sentence)
- Burstiness, contraction ratio, model fingerprint
Step 2 — Read against both pattern catalogs
AI-Slop axis: Load references/patterns.md. Walk through the 45 patterns by group. The scanner catches the mechanically-detectable subset; the rest requires reading.
Comprehension axis: Load references/comprehension.md. Walk through the 35 patterns by group. Roughly 17 are mechanically detectable; the rest require reading. Particularly:
- Buried lede / missing thesis — does the first paragraph tell the reader the point?
- No topic sentences — does each paragraph open with its claim?
- Curse of knowledge — does the writer assume context the reader lacks?
- No concrete examples — does every abstract claim have a specific instance?
- Definition by synonym — are domain terms defined with concrete examples or just other jargon?
For each pattern (both axes), flag with quote + severity.
Step 3 — Apply calibration
Load references/calibration.md. Apply:
- Density-based scoring on both axes (§1 for AI-Slop, §9 for Comprehension)
- Genre adjustment (AI-Slop: §3) and audience calibration (Comprehension: §10)
- Compound triggers — escalate when 3+ H tells in one paragraph (slop) or any 100w window has 3+ undefined acronyms / 5+ named entities (comp)
- Cross-axis recommendation (§11) — single sentence based on whichever axis is worse
The scanner does most of this automatically. The reader applies judgment to ambiguous cases.
Step 4 — Output the audit report
Use the dual-verdict format in references/audit-report-template.md. The report has:
- Two verdicts (AI-Slop and Comprehension) in a header table
- Combined recommendation — one sentence drawn from cross-axis matrix
- Stats block — word count, audience, burstiness, model fingerprint
- AI-Slop axis section — counts, top fixes, mechanical violations, qualitative violations, calibration notes
- Comprehension axis section — counts, top fixes, readability metrics panel, density signals, mechanical violations, qualitative violations, calibration notes
- Combined top 3 fixes — pulled from both axes by impact
- Combined recommended action — what to do next
Don't soften findings on either axis. The point of the audit is to catch what the writer missed.
Step 5 — If asked, deliver the revision
If the user wanted polish/edit (not just critique), produce the full revised version. The rewrite must address violations on both axes:
- Replace AI texture: kill
delve-class verbs, em-dash clusters, sycophancy, grandiose framing - Add reader scaffolding: define acronyms inline, break up long paragraphs, add a thesis sentence, contextualize named entities, replace telegraphic colon-labels with sentences
- Hit audience-specific readability targets (FK grade in band, lexical density appropriate, sentence length variance present)
- Don't introduce new tells (re-scan after rewrite if uncertain)
The revision should read as deliverable prose for the target audience.
Quick reference: lethal tells (both axes)
If you only have time for a quick scan, look for these. Each appears in the highest-density failures.
AI-Slop (20 most lethal)
Vocabulary: delve / delves, tapestry, underscore / underscores, leverage (verb), harness
Sentence-level: "It's not X, it's Y" (negation reversal), "serves as a" / "stands as a" / "boasts" (copula avoidance), "X happened, demonstrating Y" (-ing tail), "From small startups to global enterprises" (false range), "Studies show..." with no citation
Voice: "Great question!" (opener sycophancy), "I hope this helps!" (closer sycophancy), "In today's fast-paced world..." (107x more in AI), "As a society, we must..." (royal we), "As of my last update..." (knowledge-cutoff leakage)
Structural: "In conclusion / Furthermore / Moreover" (listicle transitions), "It's worth noting that..." (throat-clearing), em dashes in clusters (3+ per 500w), bold-first bullets, "X: A Comprehensive Guide" titles
Comprehension (10 most lethal)
- Undefined acronyms — 3+ per 100 words = high cognitive load
- Named-entity bombing — 5+ unfamiliar proper nouns per 100 words
- Stat bombing — 3+ numerics in one sentence with no comparative anchor
- Telegraphic colon-labeling — Anchor case: X. Tools that win: Y. Compresses topics into label-list rather than prose
- Coined insider terms — "two-stack social management thesis" used as if known
- Long sentences — over 30 words; comprehension drops sharply
- Buried lede — main point arrives after 2+ paragraphs of setup
- Missing thesis — reader can't summarize the central claim after reading
- Wall-of-text — paragraphs over 100 words with no breathing room
- No concrete examples — every abstract claim left abstract
If a draft has 5+ AI-Slop lethal items in 500 words → AI-Slop HIGH/CRITICAL. If a draft has 3+ Comprehension lethal items in 500 words → Comprehension HIGH/CRITICAL.
Calibration principles
A short version of the rules in references/calibration.md. Read the full file before producing the calibration section of the audit.
Density formula (both axes)
density = (H × 3) + (M × 1) + (L × 0.25) per 500 words
Verdict thresholds (both axes)
- 0–2 = PASS
- 2–5 = LOW
- 5–10 = MEDIUM
- 10–18 = HIGH
- 18+ = CRITICAL
AI-Slop axis tuning
- Genre adjustments: academic prose can use "studies show" with citations; marketing tolerates more intensifiers; encyclopedic prose triggers false positives (LLMs were trained on Wikipedia); fiction respects character voice
- Contested tells: em dashes in clusters = H, alone = L (Mahmoud-mode = always H via
--strict-em-dash); "actually" survives only contrasting reality with theory - Sanded-prose alert: if famous vocabulary is clean but structural tells are heavy, flag — writer prompt-engineered around the v1 list
- Uncanny valley: if 8+ weak tells stack with burstiness below 0.5, escalate one tier
Comprehension axis tuning
- Audience calibration (calibration.md §10): the SAME prose hits different verdicts depending on audience. General web/blog targets FK grade 7–9; marketing 6–8; technical 10–12; academic 12–16. The scanner uses
--audienceto pick the threshold band. - Compound triggers escalate one tier:
- 3+ undefined acronyms in any 100-word window
- 5+ named entities in any 100-word window
- 3+ telegraphic colon-labels in one paragraph
- Any paragraph over 150 words with no subheading
- Any sentence over 40 words
- Readability metrics calibrate, patterns rule: the metrics panel (FK Grade, lexical density, etc.) is diagnostic. The verdict comes from the catalog patterns. When metrics and patterns disagree, call it out (e.g., "patterns clean but FK Grade 16 — academic-density texture without specific failures").
Cross-axis recommendation
| AI-Slop | Comprehension | Recommendation |
|---|---|---|
| PASS / LOW | PASS / LOW | Ship it. |
| MEDIUM+ | PASS / LOW | AI-Slop revision; reader-friendly already. |
| PASS / LOW | MEDIUM+ | Comprehension revision; texture is fine. |
| MEDIUM+ | MEDIUM+ | Both cleanup; often overlapping fixes. |
| HIGH+ | HIGH+ | Full rewrite. |
Reference loading guide
Read what the task needs.
| File | Read when |
|---|---|
references/patterns.md | AI-Slop qualitative pass. The 45 rhetorical/structural patterns. |
references/vocabulary.md | AI-Slop word-level tells. ~150 items in 7 categories. |
references/formatting-tells.md | AI-Slop formatting checks. ~33 items in 5 categories. |
references/comprehension.md | Comprehension qualitative pass. The 35 patterns in 5 groups. |
references/readability-metrics.md | When interpreting the readability panel. The 8 formulas + thresholds + audience targets. |
references/calibration.md | Always at the start of every audit. Density rules, genre, audience, fingerprints, contested tells, cross-axis matrix. |
references/audit-report-template.md | Always at the start. Defines the dual-verdict output format. |
references/sources.md | When challenged on a specific tell. ~135 sources backing the catalogs. |
The scripts/scan.py scanner runs both axes in one pass and produces a report that references the categories and items in these files.
A note on the detector itself
This is a shape-and-comprehension detector, not a classifier. The AI-Slop axis measures whether prose has the shape of AI writing — patterns, vocabulary, structure, rhythm. The Comprehension axis measures whether a fresh reader can follow it. Neither tells you whether AI wrote a given piece.
A skilled writer using AI as a research tool can produce prose that scores PASS on both axes. A careful prompt can produce LOW on both. A human writing fast can produce HIGH on Comprehension while PASSing AI-Slop. A polished AI marketing post can score CRITICAL on AI-Slop while readable.
Treat the AI-Slop verdict as "this prose has the shape of AI writing." Treat the Comprehension verdict as "a cold reader will struggle here." Both verdicts are actionable.
The catalog will need updating as models change. After "delve" went viral in early 2024, arXiv frequency dropped sharply. After GPT-5.1 added an em-dash opt-out (Nov 2025), em-dash density became less reliable. The skill stays useful by weighting newer patterns higher and surfacing density rather than individual hits.
When the scanner says PASS but the prose still reads wrong, trust the reading — the qualitative patterns (parallel structure, force of metaphors, voice consistency, curse-of-knowledge moments) are what humans pick up first and what regex misses last.