
Why Arguments Persuade — and How to Measure It
Introduction
Too often—among both theists and skeptics—an exchange ends with the shrug, “I don’t find that convincing.” Then the conversation stalls. Rarely does the speaker say what isn’t convincing or why. That phrase is universally available: anyone can deploy it against any argument, strong or weak, without touching the substance. This paper is my attempt to replace that vague retreat with a clear, reusable way to say where persuasion succeeds or fails—and what, concretely, might improve it.
What follows is not a truth detector and not a substitute for an argument’s internal strength. The truth and validity of an argument still live in its premises, evidences, and logic. The Comparative Convincingness Scale (CCS) does something different: it provides a measurement tool for persuasiveness. Applied carefully, CCS tells us whether a given argument or proposition, as presented, is convincing—and it tells us why by breaking persuasion into ten anchored criteria. You can disagree with an argument’s conclusion and still acknowledge that it communicates clearly, enjoys historical staying power, or maps well onto lived experience. Conversely, you can admire its rigor and still see that it lands poorly with non-specialists.
I designed CCS so that anyone—believer, skeptic, or undecided—can use the same rubric without adopting the other side’s worldview. The anchors, weights, and expectations are symmetric for theistic and atheistic arguments. That symmetry matters because it turns “I’m not convinced” into an inspectable claim: Is the problem the premises? the breadth of what the argument explains? the presentation? its distance from ordinary experience? CCS turns those questions into scored dimensions you can compare, defend, and improve.
Before we begin, a quick word on positioning. CCS stands in conversation with classic approaches without duplicating them. It shares something of Pascal’s cumulative-case sensibility while making the multi-track appeal explicit and measurable.¹ It contrasts with Swinburne’s Bayesian program, which assigns probabilities to truth; CCS brackets truth and quantifies features that make a claim seem worth believing to rational agents.² And it engages Oppy’s critiques by applying one transparent rubric to both sides so points of strength and weakness become visible rather than impressionistic.³
How CCS Works
- Ten criteria, each scored 1–10 using anchored descriptions (no criterion can be scored 0 or above 10).
- Core criteria (×2 weight): Logical Soundness; Truth of Premises; Explanatory Power; Explanatory Scope.
- Secondary (×1 weight): Simplicity; Historical/Theoretical Usage; Fruitfulness; Experiential Resonance; Transformative Power; Communicability.
- Composite: CCS total = 2 × (Soundness + Premises + Power + Scope) + (Simplicity + Historical + Fruitfulness + Resonance + Transformative + Communicability). Max = 140. I also report an equal-weight version as a sensitivity check.
- Emotion’s role: Included but bounded (under Experiential Resonance and Transformative Power). Manipulative appeals to emotion count against Soundness; hype that obscures content counts against Communicability.
- Profiles, not just totals: The per-criterion breakdown is the real diagnostic.
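As a minimal sketch of the composite described above, the following Python computes the weighted total and the equal-weight check. The short criterion keys, the function names, the dict layout, and the restriction to integer scores are illustrative assumptions, not part of CCS itself.

```python
# Short keys for the ten criteria (names are illustrative, not official).
CORE = ("soundness", "premises", "power", "scope")              # weight x2
SECONDARY = ("simplicity", "historical", "fruitfulness",
             "resonance", "transformative", "communicability")  # weight x1


def validate(scores):
    """Require exactly the ten criteria, each an integer from 1 to 10."""
    expected = set(CORE) | set(SECONDARY)
    if set(scores) != expected:
        raise ValueError(f"expected criteria {sorted(expected)}")
    for name, value in scores.items():
        if not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer 1-10, got {value!r}")


def ccs_total(scores):
    """Weighted composite: 2 x core sum + secondary sum (max 140)."""
    validate(scores)
    core = sum(scores[c] for c in CORE)
    secondary = sum(scores[c] for c in SECONDARY)
    return 2 * core + secondary


def equal_weight_total(scores):
    """Sensitivity check: every criterion weighted x1 (max 100)."""
    validate(scores)
    return sum(scores.values())


best = {name: 10 for name in CORE + SECONDARY}
worst = {name: 1 for name in CORE + SECONDARY}
print(ccs_total(best), equal_weight_total(best))    # 140 100
print(ccs_total(worst), equal_weight_total(worst))  # 14 10
```

Because no criterion can score below 1, the weighted total runs from 14 to 140 rather than from 0, which is worth remembering when reading totals as percentages.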
The Three Running Examples (used in every section)
To keep comparisons clean, I will score the same three exemplars throughout:
- N1 (Neutral / shared logic): Socrates syllogism
P1. All humans are mortal.
P2. Socrates is human.
C. Therefore, Socrates is mortal.
- T1 (Theistic): Kalam Cosmological Argument (deductive form)
P1. Whatever begins to exist has a cause.
P2. The universe began to exist.
C. Therefore, the universe has a cause.⁴
- A1 (Atheistic): Problem of Evil (gratuitous evil form)
P1. If an omnipotent, omniscient, omnibenevolent God exists, then gratuitous evil does not exist.
P2. Gratuitous evil exists.
C. Therefore, such a God does not exist.⁵
Criterion 1 — Logical Soundness (core, ×2)
Question: Is the argument’s form valid (deduction) or appropriately strong (induction/abduction) without a formal fallacy? Logical soundness in CCS is strictly about structure, not content. You may deny a premise or dislike the conclusion; neither changes the form score.
Anchors (1 / 5 / 10).
1 = clear formal error (affirming the consequent, undistributed middle, equivocation on a load-bearing term).
5 = largely valid/repairable but with a suppressed bridge or structural ambiguity that must be made explicit.
10 = cleanly valid (or clearly specified inductive/abductive scheme), no detected formal fallacy.
Worked scoring
- N1 (Socrates) — 10/10. A textbook categorical syllogism; no fallacy.
- T1 (Kalam) — 10/10. As stated, this is modus ponens:
1) Begins-to-exist ⇒ has a cause; 2) The universe begins-to-exist; therefore 3) The universe has a cause.
Disputes about the meaning of cause or whether the universe began belong under Truth of Premises or Communicability, not here.
- A1 (Problem of Evil) — 10/10. In this framing, the argument is modus tollens:
1) If G, then ¬(gratuitous evil); 2) (gratuitous evil); therefore 3) ¬G.
The structure is valid. Debates focus on the conditional in P1 and the status of “gratuitous,” which are premise and scope questions.
Diagnostic notes
- Separate form from content. If you find yourself arguing over “cause,” “begins,” or “gratuitous,” you’re outside Criterion 1.
- Hidden-premise check. If validity depends on an unstated bridge (“From ‘no good known to us’ ⇒ ‘no good at all’”), mark 5–8 and name the missing step.
- Equivocation scan. The same word used in two senses across premises (e.g., cause as sustaining vs. originating) lowers the score.
- Inductive arguments. If an argument is explicitly inductive, assess the inference pattern (proper use of likelihoods), not the premises’ truth.
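The form-only judgments in the worked scoring can be checked mechanically. The sketch below is a brute-force truth-table test for propositional validity; the helper names `implies` and `is_valid` are hypothetical. It treats T1 in its propositional skeleton (P1 already instantiated on the universe), and it does not cover N1, whose categorical form needs predicate logic rather than a truth table.

```python
from itertools import product


def implies(p, q):
    """Material conditional: false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion, n_vars):
    """A form is valid if no truth assignment makes every premise true
    and the conclusion false."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Modus ponens (propositional skeleton of T1): B -> C, B, therefore C.
modus_ponens = is_valid(
    [lambda b, c: implies(b, c), lambda b, c: b],
    lambda b, c: c,
    n_vars=2,
)

# Modus tollens (A1): G -> not E, E, therefore not G.
modus_tollens = is_valid(
    [lambda g, e: implies(g, not e), lambda g, e: e],
    lambda g, e: not g,
    n_vars=2,
)

# Affirming the consequent (an anchor-1 error): B -> C, C, therefore B.
affirming_consequent = is_valid(
    [lambda b, c: implies(b, c), lambda b, c: c],
    lambda b, c: b,
    n_vars=2,
)

print(modus_ponens, modus_tollens, affirming_consequent)  # True True False
```

The check says nothing about whether any premise is true, which is exactly the separation of form from content that Criterion 1 asks for.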
How to raise the score
- Define the key terms once at the top.
- Add any bridge premise needed for entailment or strong support.
- State the inference type (deductive/inductive/abductive) up front.
One-sentence rationales (examples)
- N1: “Categorical syllogism; valid form; no fallacy.”
- T1: “Modus ponens on ‘begins ⇒ cause’; validity intact regardless of premise disputes.”
- A1: “Modus tollens on ‘G ⇒ ¬GE’; validity clear; disputes target conditional and ‘gratuitous.’”
Criterion 2 — Truth of Premises (core, ×2)
Question: Are the premises broadly plausible and evidentially grounded for competent, fair-minded interlocutors? This is not omniscience; it is public reasonableness. A premise can be controversial and still earn a mid-to-high score if it carries non-trivial support from serious sources.
Anchors (1 / 5 / 10).
1 = premises widely implausible or contrary to established evidence.
5 = mixed/contestable but defensible with substantive support.
10 = broadly plausible, with strong evidential backing and minimal serious counter-evidence.
Worked scoring
- N1 (Socrates) — 9–10/10. “All humans are mortal” and “Socrates is human” are as well supported as ordinary general claims get. Absent hyper-skeptical scenarios, this is a 10.
- T1 (Kalam) — typical range 6–8/10.
- P1 (Whatever begins to exist has a cause.) is powerfully supported by experience and by the rational expectation that new entities have sufficient conditions. Skeptics sometimes cite quantum events; defenders reply that quantum descriptions presuppose lawlike structures and boundary conditions rather than “uncaused” in a metaphysically robust sense. Across competent audiences, 6–8 is common.
- P2 (The universe began to exist.) draws on philosophical arguments against an actual infinite regress of temporal events and on cosmological evidence of a temporal origin. Critics offer past-eternal or agnostic models; defenders reply with arguments about the nature of time and the entropy arrow. Here too, 5–8 is typical.
- Overall: 6–8 is defensible, depending on how the premises are presented and sourced.⁴
- A1 (Problem of Evil) — typical range 6–8/10.
- P1 (If a God who is omnipotent, omniscient, omnibenevolent exists, then gratuitous evil does not exist.) is strong on a straightforward reading; theists respond that there may be unknown goods or defeaters (free will, soul-making, divine reasons beyond human ken).⁵
- P2 (Gratuitous evil exists.) is where the heaviest lift occurs, because inferring “gratuitous” from our limited vantage point is hard. Still, many judge some horrors to be plausibly gratuitous. Practitioners who are cautious tend toward 6–7; those persuaded by vivid cases lean 7–8.
- Overall: 6–7 is common among mixed raters.
How to raise the score (general advice)
- Define key terms (e.g., cause, begins to exist, gratuitous).
- Cite representative sources (not just blogs) so readers can check the evidence.
- Acknowledge live objections and explain why the premise remains plausible in light of them.
- Avoid over-claiming. “Broadly plausible” is a respectable target.
Rater sentence templates
- “P1 is broadly plausible given X; P2 has contested but substantive support from Y; net = 7/10.”
- “Premise hinges on a controverted definition (‘gratuitous’); plausible but under-argued here; 6/10.”
Criterion 3 — Explanatory Power (core, ×2)
Question: How tight and coherent is the explanation of the target facts? Explanatory power is about fit: how well the hypothesis makes sense of the data without ad hoc patches.
Anchors (1 / 5 / 10).
1 = weak fit; explanation feels manufactured or post hoc.
5 = decent coherence with noticeable gaps or auxiliaries.
10 = tight, elegant account; few moving parts; minimal ad hocery.
Worked scoring
- N1 (Socrates) — 9–10/10. The conclusion follows directly from the generalization; the “explanation” of Socrates’ mortality is built into what “human” entails. Near-perfect fit.
- T1 (Kalam) — 7–9/10 as stated; higher with careful specification.
The conclusion (“therefore, the universe has a cause”) explains why there is a temporally finite cosmos rather than nothing or an eternal past. Power increases if the cause is characterized minimally (e.g., transcending space-time, causally potent, immaterial if space-time is emergent). Critics object that cause may not be the right category for a boundary of space-time; defenders argue that causal/explanatory dependence is still the right notion. A careful presentation earns 7–9.
- A1 (Problem of Evil) — 6–8/10 in the strict logical form; evidential forms vary.
The hypothesis “no God of type G” straightforwardly “explains” the presence of gratuitous evils: if there is no perfectly good, all-powerful, all-knowing being, there is no guarantee such evils will be prevented. Power drops if P1’s conditional is too strong or if plausible theistic defeaters exist (hidden goods, constraints of creaturely freedom). For the logical version, 6–8 is typical; evidential versions often trade some logical tightness for broader scope.
How to raise the score
- Name the explananda explicitly (e.g., finite cosmic past, horrendous suffering of type H).
- Show how your hypothesis fits those explananda better (or more simply) than competitors.
- Minimize auxiliary “patches,” or motivate them independently.
Common confusions
Explanatory power (depth of fit) vs. scope (breadth of phenomena covered). An argument can fit one datum beautifully (high power) yet ignore other relevant data (low scope).
Criterion 4 — Explanatory Scope (core, ×2)
Question: How broad is the range of relevant phenomena the argument accounts for? If power is depth, scope is breadth.
Anchors (1 / 5 / 10).
1 = addresses a single narrow datum.
5 = covers two or more salient domains.
10 = covers most major domains implicated by the claim set.
Worked scoring
- N1 (Socrates) — 2–3/10. Scope is intentionally narrow: one person, one predicate. CCS makes that explicit; it’s not a flaw, just a limitation.
- T1 (Kalam) — 6–8/10 in minimal form; potentially higher if responsibly integrated.
In its lean version, Kalam addresses cosmic origination, a large domain by itself. If you connect Kalam to other theistic indicia (e.g., fine-tuning, moral realism, rational order, consciousness) you expand scope, but CCS scores the argument as stated. Minimal form: 6–7; with modest, well-motivated bridges: 7–8.
- A1 (Problem of Evil) — 6–9/10 depending on breadth of “evil.”
If gratuitous evil is narrowly construed (a handful of hard cases), scope sits mid-range. If it is taken to include natural evil, moral atrocity, animal suffering, and hiddenness-adjacent phenomena, scope widens substantially—8–9—especially if paired with a clear atheistic hypothesis about a morally indifferent universe.⁵
How to raise the score
- List the domains your argument naturally touches (cosmology, morality, consciousness, suffering, rational order).
- Avoid over-reach. Scope rises when the argument naturally bears on a domain, not when we bolt on unrelated claims.
Trade-offs
Wide scope with weak power isn’t a win. A tight, narrower argument can score higher overall once weighting is applied.
Criterion 5 — Simplicity / Parsimony (secondary, ×1)
Question: Is the argument economical in assumptions, entities, and ad hoc moves? Simplicity is not minimalism at all costs; it is no gratuitous complication.
Anchors (1 / 5 / 10).
1 = baroque; many unsupported assumptions; frequent ad hoc fixes.
5 = reasonably lean with a few discretionary choices.
10 = spare formulation; avoids unnecessary entities and patches.
Worked scoring
- N1 (Socrates) — 10/10. Two premises, one conclusion, no auxiliaries: a model of economy.
- T1 (Kalam) — 7–8/10. The basic syllogism is lean. Simplicity dips if we pile on speculative auxiliaries too early (e.g., long detours about divine attributes before the minimal conclusion) or if we smuggle in other arguments. In minimal deductive form, Kalam is impressively spare—7–8 is justified.
- A1 (Problem of Evil) — 6–8/10. The skeleton is simple, but complexity creeps in via the term gratuitous, which needs careful definition and defense. If presented cleanly with a standard definition (“no morally sufficient reason even for an omniscient being”), 7–8 is fair; if the definition carries the load without independent support, it slips toward 6.
How to raise the score
- Strip to the canonical core: present premises and conclusion cleanly.
- Define the one hard term once and use it consistently.
- Resist the urge to solve every objection inside the premise list; put defenses in the commentary, not in the syllogism.
Criterion 6 — Historical/Theoretical Usage (secondary, ×1)
Question. Has the argument enjoyed sustained, serious use—teaching, debate, refinement—across time and traditions?
Anchors (1 / 5 / 10).
1 = marginal; little serious use beyond niche forums.
5 = moderate scholarly use with recognizable defenders and critics.
10 = sustained, cross-tradition use over eras; taught, debated, refined.
Worked scoring (same exemplars as Part 1)
- N1 (Socrates syllogism) — 9–10/10. Core of Aristotelian logic; centuries of instructional use.
- T1 (Kalam) — 8–9/10. Deep kalām lineage; large modern analytic literature, defenses and objections; widely taught.⁶
- A1 (Problem of Evil) — 9–10/10. Central from antiquity through Hume to contemporary logical/evidential debates (including horrendous evils, animal suffering).⁷ ⁸
Rater prompts. Cite representative texts across eras; name recognized refinements or replies (e.g., skeptical theism; free-will/soul-making theodicies; cosmology/causation debates).⁹ ¹⁰
Criterion 7 — Fruitfulness (secondary, ×1)
Question. Does the argument generate further inquiry—new distinctions, research programs, testable lines?
Anchors (1 / 5 / 10).
1 = tends to shut down inquiry; dead end.
5 = prompts some follow-ups or modest research.
10 = consistently spurs fertile debate, models, or testable questions.
Worked scoring
- N1 — 6–7/10. Highly didactic; baseline rather than research-generative.
- T1 — 8–9/10. Spurs work on cosmology (beginning/entropy/quantum), metaphysics (causation, modality), and philosophy of religion (attributes of a first cause).⁶
- A1 — 9/10. Drives free-will defenses, soul-making, skeptical theism, horrendous evils, animal suffering, divine hiddenness, Bayesian/evidential forms.⁷ ⁸ ⁹ ¹¹
Rater prompts. List two concrete spin-offs per argument; check that both defenders and critics found new work to do.
Criterion 8 — Experiential Resonance (secondary, ×1)
Question. How well does the argument map onto ordinary human experience (recognizable to non-specialists)?
Anchors (1 / 5 / 10).
1 = feels disconnected from ordinary experience.
5 = connects to one widely shared domain.
10 = resonates across multiple everyday domains.
Worked scoring
- N1 — 6–7/10. Pedagogically resonant (mortality, category membership).
- T1 — 7–8/10. Everyday causal expectations and a publicly known cosmic beginning; depends on crisp definitions of cause, begin, nothing.
- A1 — 9–10/10. Suffering and moral horror are universal touchpoints; immediate recognition when framed carefully.⁷ ⁸
Guardrails. Pathos can’t replace reasons (hurts Soundness); if emotion obscures content, Communicability drops. Optional: report Affective Subscore = Resonance + Transformative (max 20).
Criterion 9 — Transformative Power (secondary, ×1)
Question. Is there credible evidence the argument durably changes belief or practice?
Anchors (1 / 5 / 10).
1 = little/no change attributable to the argument.
5 = mixed/moderate evidence.
10 = strong, cross-context evidence of durable change partly attributable to the argument.
Worked scoring
- N1 — 5–6/10. Improves reasoning literacy; less often shifts contested beliefs.
- T1 — 6–8/10. Often a gateway that legitimizes theism as rational; course/event data sometimes show durable attitude changes.⁶
- A1 — 7–9/10. Prominent in deconversion narratives and in reshaping theistic views; score reflects evidence quality (aggregates > anecdotes).⁷ ⁸ ¹¹
Method notes. Prefer pre/post measures; attribute modestly (“partly attributable”); note confounds; score conservatively when data are thin.
Criterion 10 — Communicability / Accessibility (secondary, ×1)
Question. Can the argument be accurately conveyed to non-specialists without distortion in ≈5 minutes?
Anchors (1 / 5 / 10).
1 = needs specialist training; loses meaning when simplified.
5 = explainable in one class period to college-level non-specialists.
10 = teachable to lay audiences in ≤5 minutes with fidelity.
Worked scoring
- N1 — 10/10. Logic-101 exemplar.
- T1 — 8–9/10. Crisp skeleton; define begins to exist, cause, nothing without speculative detours; one diagram helps.
- A1 — 8–9/10. Intuitive core; gloss gratuitous succinctly (“no morally sufficient reason—even for an omniscient being”) and give one careful example.⁷ ⁸
Five-minute discipline. Canonical form (≤4 lines) → define one hard term → one apt example → one best objection & reply → single-sentence upshot.
Illustrative CCS Profile — N1 = Socrates Syllogism (Neutral); T1 = Kalam (Theistic); A1 = Problem of Evil (Atheistic).
(Scores are example values consistent with Part 1 & 2 ranges; swap in your final ratings as needed.)
| # | Criterion | Weight | N1 | T1 | A1 |
|---|---|---|---|---|---|
| 1 | Logical Soundness | ×2 | 10 | 10 | 10 |
| 2 | Truth of Premises | ×2 | 10 | 7 | 7 |
| 3 | Explanatory Power | ×2 | 9 | 8 | 7 |
| 4 | Explanatory Scope | ×2 | 3 | 7 | 8 |
| 5 | Simplicity / Parsimony | ×1 | 10 | 8 | 7 |
| 6 | Historical / Theoretical Usage | ×1 | 10 | 9 | 10 |
| 7 | Fruitfulness | ×1 | 7 | 9 | 9 |
| 8 | Experiential Resonance | ×1 | 6 | 7 | 9 |
| 9 | Transformative Power | ×1 | 6 | 7 | 8 |
| 10 | Communicability / Accessibility | ×1 | 10 | 9 | 9 |
| | Core weighted sum (1–4 ×2) | | 64 | 64 | 64 |
| | Secondary sum (5–10 ×1) | | 49 | 49 | 52 |
| | Total (max 140) | | 113 | 113 | 116 |
| | Equal-weight total (max 100) | | 81 | 81 | 84 |
Notes for readers:
- Core criteria (1–4) are weighted ×2; secondary (5–10) are ×1. Publish the per-criterion profile alongside the total so disagreements are specific and actionable.
- For robustness, also report the equal-weight total (last row) and note any material rank changes under alternative weights.
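The summary rows of the table can be reproduced from the per-criterion scores. This is a small sketch, assuming the scores are stored as lists in table row order; the names `PROFILES`, `summarize`, and `ranking` are illustrative.

```python
N_CORE = 4  # the first four criteria carry x2 weight

# Per-criterion scores copied from the illustrative table, in row order.
PROFILES = {
    "N1": [10, 10, 9, 3, 10, 10, 7, 6, 6, 10],
    "T1": [10, 7, 8, 7, 8, 9, 9, 7, 7, 9],
    "A1": [10, 7, 7, 8, 7, 10, 9, 9, 8, 9],
}


def summarize(scores):
    """Return the four summary rows for one argument."""
    core = 2 * sum(scores[:N_CORE])
    secondary = sum(scores[N_CORE:])
    return {"core": core, "secondary": secondary,
            "total": core + secondary, "equal": sum(scores)}


summary = {name: summarize(scores) for name, scores in PROFILES.items()}


def ranking(key):
    """Arguments ordered from highest to lowest on the given summary row."""
    return sorted(summary, key=lambda name: summary[name][key], reverse=True)


for name, row in summary.items():
    print(name, row)
print("weighted ranking:", ranking("total"))
print("equal-weight ranking:", ranking("equal"))
```

With these example values all three core sums are equal, so the gap between A1 and the other two comes entirely from the secondary criteria, and the ordering is the same under both weightings.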
Conclusion
“I don’t find that convincing” will never disappear from our conversations, but it doesn’t have to be a dead end. The Comparative Convincingness Scale (CCS) turns that vague retreat into a set of inspectable claims. Across these two parts, I’ve shown how persuasion can be measured without confusing it with truth: four core criteria (Logical Soundness, Truth of Premises, Explanatory Power, Explanatory Scope) carry double weight to reflect rational rigor, while six practice-facing criteria (Simplicity, Historical/Theoretical Usage, Fruitfulness, Experiential Resonance, Transformative Power, and Communicability) round out how arguments actually live in human communities. The result is a profile, not a verdict: a ten-dimensional picture of where an argument persuades and why.
This matters because real disagreements rarely hinge on a single flaw. An argument may be beautifully structured yet thin in scope; emotionally resonant yet premise-fragile; historically durable yet hard to communicate in five minutes. CCS forces those trade-offs into the open. It localizes failure modes (e.g., “Premises: 5/10; needs clearer evidence for P2”) and names improvement paths (e.g., “Raise Communicability by defining one term once; add one canonical example”). In classrooms, CCS teaches students how to diagnose arguments rather than merely declare allegiances. In public dialogue, it invites adversaries to publish side-by-side rationales per criterion, making genuine progress possible even when conclusions remain contested.
Equally important is what CCS does not do. It does not prove an argument true or false, nor does it replace careful philosophical or scientific work. It is a measurement tool for persuasiveness—a way to quantify how well an argument, as presented, is likely to convince rational agents. Because persuasiveness can be audience-dependent, CCS builds in safeguards: symmetric application to theistic and atheistic arguments; anchored 1–10 scales to reduce impressionism; weighted and equal-weight composites to check robustness; and, when multiple raters are used, reliability checks and uncertainty intervals so readers can see what is stable and what is not.
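One way to produce an uncertainty interval when several raters score the same argument is a percentile bootstrap over their totals. The text does not fix a method, so treat this as one reasonable choice rather than the CCS procedure. The rater totals below are invented for illustration, and `bootstrap_ci` is a hypothetical helper.

```python
import random
import statistics

# Hypothetical CCS totals from six raters scoring one argument (made-up data).
rater_totals = [113, 108, 119, 111, 116, 105]


def bootstrap_ci(values, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo_index = int((1 - level) / 2 * n_resamples)
    hi_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_index], means[hi_index]


low, high = bootstrap_ci(rater_totals)
print(f"mean = {statistics.fmean(rater_totals):.1f}, "
      f"95% interval = [{low:.1f}, {high:.1f}]")
```

With only a handful of raters the interval is rough; it is best read as a signal of how stable the total is, not as a precise bound.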
There are limits. Anchors cannot eliminate judgment; context and background knowledge will always matter. That is why I recommend publishing per-criterion scores and one-sentence rationales, not just totals; preregistering the argument set and weights when possible; running sensitivity analyses (equal-weight and ±25% shifts); and inviting adversarial collaboration so that different perspectives are not just acknowledged but built into the data. Where appropriate, reporting an optional Affective Subscore (Resonance + Transformative) can also make the role of emotion transparent without letting it dominate the composite.
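The sensitivity analysis can be sketched as follows. I am assuming here that a “±25% shift” means scaling the core weight from 2.0 down to 1.5 and up to 2.5 while the secondary weight stays at 1; other readings are possible. The percent-of-maximum rescaling is an added convenience for comparing totals across weightings, and the function names are illustrative.

```python
CORE = ["Soundness", "Premises", "Power", "Scope"]

# Example profile: the A1 column of the illustrative table.
a1 = {"Soundness": 10, "Premises": 7, "Power": 7, "Scope": 8,
      "Simplicity": 7, "Historical": 10, "Fruitfulness": 9,
      "Resonance": 9, "Transformative": 8, "Communicability": 9}


def weighted_total(scores, core_weight=2.0):
    """Composite with an adjustable core weight; secondary stays at x1."""
    return sum((core_weight if name in CORE else 1.0) * value
               for name, value in scores.items())


def as_percent_of_max(scores, core_weight=2.0):
    """Rescale so totals under different weights are comparable."""
    max_total = 10 * (core_weight * len(CORE) + (len(scores) - len(CORE)))
    return 100 * weighted_total(scores, core_weight) / max_total


def affective_subscore(scores):
    """Optional subscore: Resonance + Transformative (max 20)."""
    return scores["Resonance"] + scores["Transformative"]


for w in (1.0, 1.5, 2.0, 2.5):  # equal weight, -25%, published, +25%
    print(w, weighted_total(a1, w), round(as_percent_of_max(a1, w), 1))
print("affective:", affective_subscore(a1))
```

For this profile the percent-of-maximum figure moves only from 84.0 to 82.5 across the four weightings, which is the kind of stability the sensitivity check is meant to reveal.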
If you want a practical way forward, here it is: pick one argument you care about, score it with CCS, publish the profile, and invite a critic to co-rate it. Compare the deltas criterion by criterion, propose concrete revisions, and—if the numbers change—say why. That simple loop turns disagreement into an engine for improvement. And if you think the weights should differ, re-weight and report; CCS is designed to be tested, not taken on trust.
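The co-rating loop above can be sketched as a per-criterion comparison. Rater A uses the T1 column from the illustrative table; rater B’s scores are invented here purely to show the mechanics, and `deltas` is a hypothetical helper.

```python
CRITERIA = ["Soundness", "Premises", "Power", "Scope", "Simplicity",
            "Historical", "Fruitfulness", "Resonance", "Transformative",
            "Communicability"]

# Rater A: T1 column of the illustrative table.
# Rater B: made-up scores from an imagined critic.
rater_a = dict(zip(CRITERIA, [10, 7, 8, 7, 8, 9, 9, 7, 7, 9]))
rater_b = dict(zip(CRITERIA, [10, 5, 6, 7, 8, 9, 8, 7, 6, 9]))


def deltas(a, b):
    """Per-criterion difference (a minus b), largest disagreement first."""
    diffs = {name: a[name] - b[name] for name in CRITERIA}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


for name, diff in deltas(rater_a, rater_b):
    if diff != 0:
        print(f"{name}: {diff:+d}")
```

Sorting by the size of the gap puts the criteria most worth discussing at the top, so the two raters can start with the one-sentence rationales where they differ most.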
My hope is that readers on any side will find this scale helpful, fair, and usable. By making persuasion measurable and disagreements diagnosable, CCS gives us a common language for improving arguments we already value—and for understanding those we don’t. Use the scale. Publish your profiles. Argue well.
Footnotes
1. Blaise Pascal, Pensées, ed. and trans. Roger Ariew (Indianapolis: Hackett, 2005), §277.
2. Richard Swinburne, The Existence of God, 2nd ed. (Oxford: Clarendon Press, 2004).
3. Graham Oppy, Arguing about Gods (Cambridge: Cambridge University Press, 2006).
4. William Lane Craig, Reasonable Faith: Christian Truth and Apologetics, 3rd ed. (Wheaton, IL: Crossway, 2008), chap. 3.
5. J. L. Mackie, The Miracle of Theism: Arguments for and against the Existence of God (Oxford: Clarendon Press, 1982), chap. 9; Alvin Plantinga, God, Freedom, and Evil (Grand Rapids, MI: Eerdmans, 1974).
6. William Lane Craig, Reasonable Faith: Christian Truth and Apologetics, 3rd ed. (Wheaton, IL: Crossway, 2008), chap. 3.
7. William L. Rowe, “The Problem of Evil and Some Varieties of Atheism,” American Philosophical Quarterly 16, no. 4 (1979): 335–41; Paul Draper, “Pain and Pleasure: An Evidential Problem for Theism,” Noûs 23, no. 3 (1989): 331–50; J. L. Mackie, The Miracle of Theism, chap. 9.
8. Alvin Plantinga, God, Freedom, and Evil (Grand Rapids, MI: Eerdmans, 1974); Stephen J. Wykstra, “The Humean Obstacle to Evidential Arguments from Suffering: On Avoiding the Evils of ‘Appearance’,” International Journal for Philosophy of Religion 16, no. 2 (1984): 73–93.
9. Jordan Howard Sobel, Logic and Theism: Arguments For and Against Beliefs in God (Cambridge: Cambridge University Press, 2004).
10. Graham Oppy, Arguing about Gods (Cambridge: Cambridge University Press, 2006).
11. John Hick, Evil and the God of Love, rev. ed. (London: Macmillan, 1977); Marilyn McCord Adams, Horrendous Evils and the Goodness of God (Ithaca, NY: Cornell University Press, 1999).
12. Patrick E. Shrout and Joseph L. Fleiss, “Intraclass Correlations: Uses in Assessing Rater Reliability,” Psychological Bulletin 86, no. 2 (1979): 420–28.
13. Klaus Krippendorff, Content Analysis: An Introduction to Its Methodology, 3rd ed. (Thousand Oaks, CA: Sage, 2013).
14. Lee J. Cronbach, “Coefficient Alpha and the Internal Structure of Tests,” Psychometrika 16, no. 3 (1951): 297–334.
15. Bradley Efron and Robert J. Tibshirani, An Introduction to the Bootstrap (New York: Chapman & Hall, 1993).
