TERVYX Protocol: The Future of Health Information Trust Grading

From Myths to Metrics: How TERVYX Grades Online Health Claims
Author: MoneyPuzzler (Kunyeob Kim)
Date: 2025-10-20 15:34 (UTC)
DOI: https://doi.org/10.5281/zenodo.17365759
Project: https://github.com/moneypuzzler/tervyx
Summary. This post explains TERVYX, a system that grades the trustworthiness of health information so that the public can use it safely, quickly, and objectively.
Complex health/medical claims are rendered into intuitive trust labels—Gold, Silver, Bronze, Red, Black.
The grading pipeline is transparent and automatable: evidence ingestion from papers, meta-analysis, and five gate checks (Physics, Relevance, Journal Trust, Safety, Exaggeration).
The system can be used by government, healthcare, consumer apps, and AI, reducing harm from misinformation.
Start here (Korean): "The AI Era: A Verification System for Pseudo-medicine and False Health Information (TERVYX Protocol, Part 1)". This post: "TERVYX: A Health Information Trust Grading System" (English version below).
1. What is TERVYX?
1.1 Overview and origin
The internet floods us daily with health claims: “This food prevents cancer,” “That routine cures insomnia,” and so on. Many are contradictory or exaggerated. TERVYX exists to solve this: it translates a claim’s level of evidence and risk into a tiered trust label that anyone can understand.
Name & idea. TERVYX stands for Tiered Evidence & Risk Verification sYstem.
Goal. Present a claim’s evidence and risk as an intuitive label so people can tell—at a glance—whether to trust, be cautious, or avoid.
Result. A claim ends up with one of five levels: Gold, Silver, Bronze, Red, Black. Gold ≈ highly trustworthy; Black ≈ not trustworthy / risky.
Crucially, TERVYX is reproducible and transparent:
Reproducible: Same inputs/policies → same outputs, whoever runs it.
Transparent: It records why a label was issued (e.g., which studies, how they were analyzed), enabling external audit.
1.2 A simple analogy: “Nutrition label” for health claims
Think of the TERVYX label like a nutrition facts panel—but for claims.
Example: “Magnesium supplementation improves sleep quality.” If the TERVYX label reads Silver (PASS), it means moderately strong evidence supports a meaningful improvement.
Conversely, Black (FAIL) warns that a claim is untrustworthy or potentially unsafe.
The five levels resemble movie/age ratings or hotel stars:
Gold: top-tier evidence;
Black: reject—untrustworthy or unsafe.
The labels condense complex research into human-readable signals while preserving machine-readable detail under the hood.
2. Why do we need this?
2.1 The overload and confusion of modern health information
We encounter dozens of health claims daily (YouTube, blogs, news, group chats). These claims often conflict or are overstated, and people lack a shared standard to judge them. Expecting laypeople to parse effect sizes or bias risks from clinical trials is unrealistic, leading to:
Financial harm (buying into exaggerated products),
Health risks (following unsafe advice),
and erosion of trust.
Existing fact-checks (e.g., True/False) are too binary for health: most medical topics live in shades of gray (effect sizes vary, uncertainty exists, safety matters).
Other problems:
No common standard,
Subjectivity (expert-dependent),
Slow updates,
Information asymmetry (experts vs. public),
Pseudoscience mixing with evidence-based content.
TERVYX aims to be the traffic signal of health information—consistent, transparent, updatable, and auditable. It leans on automation to keep pace with new studies while preserving scientific rigor.
3. How does TERVYX work?
At a high level, TERVYX is like a factory that takes evidence as raw material, performs analysis and quality checks, and outputs a label.
3.1 The three phases
Preparation (Evidence ingestion & standardization)
Verification (Effect estimation & multi-gate checks)
Packaging (TEL-5 label output)
3.2 Evidence State Vector (ESV): standardizing inputs
For a claim (e.g., “Magnesium helps sleep”), TERVYX gathers relevant studies (e.g., PubMed). Each study is transformed into an Evidence State Vector (ESV) containing:
Study ID/DOI, year, design (RCT/observational),
Effect type (OR, RR, MD, SMD), value & CI, sample sizes,
Bias risk, journal metadata (for trust scoring), etc.
This uniform structure makes automated analysis reliable and repeatable.
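For illustration, here is a minimal Python sketch of what an ESV record might look like. The field names below are assumptions made for this post, not the repository's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvidenceStateVector:
    """Illustrative ESV: one standardized record per study (field names assumed)."""
    study_id: str            # e.g., a DOI or PubMed ID
    year: int
    design: str              # "RCT" | "observational"
    effect_type: str         # "OR" | "RR" | "MD" | "SMD"
    effect_value: float      # point estimate
    ci_low: float            # 95% CI lower bound
    ci_high: float           # 95% CI upper bound
    n_treatment: int
    n_control: int
    bias_risk: str           # e.g., "low" | "some" | "high"
    journal_issn: Optional[str] = None  # consumed later by the Journal-Trust Oracle

# Hypothetical record for the magnesium-and-sleep claim
esv = EvidenceStateVector(
    study_id="10.1234/example.2021", year=2021, design="RCT",
    effect_type="SMD", effect_value=-0.35, ci_low=-0.60, ci_high=-0.10,
    n_treatment=46, n_control=43, bias_risk="some", journal_issn="1234-5678",
)
```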
3.3 Meta-analysis for effect estimation (Processing I)
TERVYX uses meta-analysis (random-effects; REML for τ²) to pool results across studies. This yields an average effect and uncertainty.
But “statistical significance” ≠ “meaningful change.” So TERVYX defines a domain-specific threshold δ (e.g., δ = 0.20 on the PSQI for sleep) and then estimates P(effect > δ) via Monte Carlo simulation.
Example: If 10,000 draws show 70% exceed δ, then P(effect > δ) = 0.70.
This probability anchors the TEL-5 tiering (see §3.5).
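To make the Monte Carlo step concrete, here is a minimal Python sketch assuming the random-effects model has already produced a pooled effect and its standard error. The numbers are illustrative, not from a real analysis:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed: identical draws on every run (reproducibility)

# Assume the random-effects meta-analysis yielded these (made-up) values:
pooled_effect = 0.25   # pooled standardized effect
pooled_se = 0.09       # standard error of the pooled effect
delta = 0.20           # domain threshold for a "meaningful" effect

# Monte Carlo: sample from the approximate sampling distribution of the pooled effect
draws = rng.normal(loc=pooled_effect, scale=pooled_se, size=10_000)
p_exceeds_delta = float(np.mean(draws > delta))

print(f"P(effect > delta) ≈ {p_exceeds_delta:.2f}")  # ≈ 0.71 here, echoing the 0.70 example above
```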
3.4 Five Gate Governance (Quality & safety checks)
Beyond effect size, TERVYX passes the claim through five gates:
Φ (Physics / Category validity): Filters physically/biologically impossible or category-mismatched claims. Fail here ⇒ Black.
R (Relevance): Checks if the evidence truly matches the claim (population, outcome, context). Poor fits trigger downgrades or exclusion.
J (Journal Trust): The Journal-Trust Oracle computes a 0–1 trust score (e.g., impact indices, DOAJ/COPE membership, retractions, predatory lists). Bad venues ⇒ severe penalties or exclusion.
K (Safety): Flags adverse events and risk severity. Serious safety concerns ⇒ Black, regardless of effect size (monotone safety-first principle).
L (Exaggeration/Languaging): Detects overblown language (e.g., “miracle cure,” “no side effects,” “instant fix”). Excess hype ⇒ tier downgrade.
These gates ensure non-statistical but crucial factors (plausibility, fitness, provenance, safety, rhetoric) are enforced.
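To give a feel for how gates might be mechanized, here are toy Python versions of the J and L gates. The hype lexicon, journal signals, and weights are invented for illustration and are not the protocol's actual rules:

```python
HYPE_TERMS = {"miracle cure", "no side effects", "instant fix"}  # illustrative L-gate lexicon

def gate_L(claim_text: str) -> float:
    """L gate sketch: crude exaggeration score = share of hype phrases present."""
    text = claim_text.lower()
    hits = sum(term in text for term in HYPE_TERMS)
    return hits / len(HYPE_TERMS)

def gate_J(journal_signals: dict) -> float:
    """J gate sketch: combine illustrative journal signals into a 0-1 trust score."""
    score = 0.5
    score += 0.25 if journal_signals.get("doaj_member") else 0.0
    score += 0.25 if journal_signals.get("cope_member") else 0.0
    score = 0.0 if journal_signals.get("predatory_listed") else score  # hard zero for predatory venues
    return min(score, 1.0)

print(gate_L("A miracle cure for insomnia with no side effects!"))  # -> ~0.67 (2 of 3 hype terms)
print(gate_J({"doaj_member": True, "cope_member": True}))           # -> 1.0
```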
3.5 TEL-5: the five evidence levels (Packaging)
Final labels combine P(effect > δ) (written P below) with the gate outcomes:
🥇 Gold (PASS): P ≥ 0.80 — very strong evidence; highly trustworthy.
🥈 Silver (PASS): 0.60 ≤ P < 0.80 — strong evidence; generally trustworthy.
🥉 Bronze (AMBER): 0.40 ≤ P < 0.60 — mixed/uncertain evidence; proceed with caution.
🔴 Red (AMBER): 0.20 ≤ P < 0.40 — weak evidence; likely not effective.
⚫ Black (FAIL): P < 0.20, or a Φ/K violation — reject; untrustworthy.
Gate overrides: If Φ or K fails, the claim becomes Black regardless of meta-analytic strength.
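Putting §3.5 together, here is a self-contained sketch of the tier mapping with the Φ/K override. Downgrade rules for R/J/L are omitted for brevity, so this is an illustration of the shape of the logic, not the protocol's full decision procedure:

```python
def tel5_label(p_exceeds_delta: float, phi_pass: bool, safety_pass: bool) -> str:
    """Map P(effect > delta) plus Φ/K gate outcomes to a TEL-5 tier (thresholds from §3.5)."""
    if not phi_pass or not safety_pass:  # monotone safety-first override
        return "Black (FAIL)"
    if p_exceeds_delta >= 0.80:
        return "Gold (PASS)"
    if p_exceeds_delta >= 0.60:
        return "Silver (PASS)"
    if p_exceeds_delta >= 0.40:
        return "Bronze (AMBER)"
    if p_exceeds_delta >= 0.20:
        return "Red (AMBER)"
    return "Black (FAIL)"

print(tel5_label(0.70, phi_pass=True, safety_pass=True))   # -> Silver (PASS)
print(tel5_label(0.95, phi_pass=True, safety_pass=False))  # -> Black (FAIL), despite strong evidence
```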
Machine+human outputs. TERVYX exports JSON-LD (for AI systems) with the label, gate results, P(effect>δ), DOIs, policy version, and an audit hash. The human-facing card shows the label + short justification.
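The JSON-LD vocabulary below is hypothetical; it only suggests the shape such an export might take, using the fields named above:

```python
import json

# Hypothetical export shape; the project's actual JSON-LD context and terms may differ.
record = {
    "@context": {"tervyx": "https://github.com/moneypuzzler/tervyx"},  # placeholder context
    "claim": "Magnesium supplementation improves sleep quality.",
    "tervyx:label": "Silver (PASS)",
    "tervyx:pExceedsDelta": 0.70,
    "tervyx:gates": {"phi": "pass", "R": 0.90, "J": 0.80, "K": "pass", "L": 0.10},
    "tervyx:evidence": ["https://doi.org/10.1234/example.2021"],
    "tervyx:policyVersion": "2025.10",
    "tervyx:auditHash": "sha256:<policy-and-input-hash>",  # see §5.2 for one way to compute this
}
print(json.dumps(record, indent=2))
```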
References in repository:
Protocol excerpts: https://github.com/moneypuzzler/tervyx
Readme notes: https://github.com/moneypuzzler/tervyx/README.md
4. Where can TERVYX be used?
4.1 Consumer health platforms
Attach TERVYX labels to encyclopedia entries, Q&A answers, and product claims:
“Ginger for colds?” → TERVYX: Bronze (AMBER) — limited/uncertain evidence.
“Seasonal flu vaccine?” → TERVYX: Gold (PASS) — strong evidence.
4.2 Public sector & regulators
Use TERVYX for ad review and preclearance:
If a product claims “miracle osteoporosis cure” but TERVYX returns Black, regulators can request correction/sanction.
Public campaigns can cite Gold/PASS signals to strengthen trust.
4.3 Clinicians & researchers
Point-of-care quick checks for unfamiliar patient questions. Guideline writers can cite TEL-5 tiers in summaries. Machine-readable exports (JSON-LD, BibTeX) and DOIs facilitate citation and audit.
4.4 Platforms & AI
Social video/text platforms: Auto-attach labels to claim-heavy posts.
LLMs/assistants: Answer user questions with TERVYX labels (e.g., “TERVYX: Silver/PASS”).
Search engines: Display a TERVYX badge next to results.
5. What’s technically novel?
5.1 Multi-aspect evaluation (beyond one score)
TERVYX evaluates effect size alongside plausibility, relevance, provenance, safety, and rhetoric—with hard caps (Φ/K) where needed. This is unusual: it merges scientific statistics with journal ethics and language hygiene checks.
5.2 Reproducibility & audit trail
Reproducible builds: Same inputs/policy/seed ⇒ same label.
Policy versioning & fingerprints: Every result records the policy version + hash for forensic audit.
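One plausible way to compute such a fingerprint (the repository's exact method is not shown here) is to hash a canonicalized serialization of the policy:

```python
import hashlib
import json

# Illustrative policy settings; the real policy schema lives in the repository.
policy = {
    "version": "2025.10",
    "delta": {"sleep_psqi": 0.20},
    "tel5_thresholds": [0.80, 0.60, 0.40, 0.20],
}

# Canonicalize (sorted keys, fixed separators) before hashing, so that the
# same policy settings always yield the same fingerprint.
canonical = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode("utf-8")
fingerprint = hashlib.sha256(canonical).hexdigest()
print(f"policy fingerprint: sha256:{fingerprint}")
```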
5.3 Efficient updates via partial re-build (DAG)
A directed acyclic graph links inputs to outputs. When something changes (e.g., a journal’s status, a new RCT, a δ tweak), only the affected downstream nodes are recomputed. This enables near-real-time updates at scale.
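A minimal sketch of the idea, with an invented dependency map: changing one input marks only its downstream nodes for recomputation, leaving everything else cached:

```python
# Toy dependency map (node -> its inputs); names and structure are invented.
deps = {
    "journal-db": [],
    "esv:mg-sleep": [],
    "meta:mg-sleep": ["esv:mg-sleep"],
    "gates:mg-sleep": ["esv:mg-sleep", "journal-db"],
    "label:mg-sleep": ["meta:mg-sleep", "gates:mg-sleep"],
}

def affected(changed: str) -> set[str]:
    """All nodes downstream of a changed input; only these are recomputed."""
    out: set[str] = set()
    frontier = {changed}
    while frontier:
        node = frontier.pop()
        children = {n for n, parents in deps.items() if node in parents}
        frontier |= children - out
        out |= children
    return out

# A journal-status change invalidates the gates and the label, but not the meta-analysis:
print(affected("journal-db"))  # -> {'gates:mg-sleep', 'label:mg-sleep'}
```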
5.4 Dual outputs (for humans and machines)
JSON-LD for AI integration; TEL-5 card (PASS/AMBER/FAIL with medals) for fast human judgment.
5.5 Automated evidence collection
Pipelines fetch studies (e.g., PubMed), extract effect sizes, consult journal-trust databases, run meta-analysis, and export labels with minimal manual steps—built for scale.
6. How is it different from related approaches?
6.1 Systematic reviews & GRADE
Traditional systematic reviews/GRADE require months of expert work and become outdated quickly. TERVYX automates ingestion, keeps updating, and adds journal trust / language checks, plus machine-readable outputs.
6.2 Fact-checking & content moderation
General fact-checks (news/politics) don’t handle domain-specific evidence (RCTs, effect sizes, adverse events). TERVYX is health-specialized and quantitative.
6.3 Consumer health apps
Most apps show tips with unclear provenance. TERVYX adds standardized labels with transparent evidence and safety-first gating—clear differentiation.
6.4 Knowledge platforms (e.g., Wikipedia)
A TERVYX badge at the top of medical articles could provide immediate, standardized evidence context (e.g., “Silver (PASS)”).
6.5 Comparable projects?
Projects that quantify end-to-end health-claim trust with five gates, journal-trust oracle, exaggeration detection, and monotone capping (Φ/K) are rare. TERVYX is a first-of-its-kind synthesis.
7. What’s next?
7.1 Expanding topics
From supplements/sleep/mental health to broader therapies, diagnostics, prevention, and off-label contexts. The principle—multi-gate, probability-based assessment—generalizes, though domain δ thresholds differ.
7.2 Community participation
Curated contributions (study additions, critiques) can improve coverage while maintaining quality control.
7.3 Policy & decision-making
Regulators, payers, and guideline bodies can use TEL-5 tiers in eligibility, coverage, and counseling decisions.
7.4 AI synergy
Richer NLP extraction from papers; reinforcement loops that compare TERVYX predictions to future large trials; adaptive explanations for different audiences.
7.5 Standardization & global adoption
The dream: a universal, intuitive label for health claims online—TERVYX becomes the common language for trust.
Appendix A. FAQ
Q1. If a claim is Gold/PASS, should I just follow it? Gold ≠ 100%. It means very strong current evidence. Health decisions remain individualized; consult clinicians when appropriate. TERVYX is a guide, not a prescription.
Q2. If a claim is Black/FAIL, is it 100% false? Black means untrustworthy or unsafe with current evidence (or Φ/K violations). Science evolves, but until robust evidence emerges, avoid.
Q3. Who assigns the labels—AI or humans? Software computes labels under public policies. Humans author the policies (e.g., δ thresholds, weights). Results are reproducible; audits are welcome.
Q4. Can TERVYX be wrong? Any system can err (missing studies, parameter issues). But TERVYX is transparent and improvable: submit evidence, propose policy changes, and verify via audit trails.
Q5. Can individuals try TERVYX? Yes—open source on GitHub. You can create entries and run builds. For official conclusions, prefer expert-reviewed pipelines.
Q6. Will TERVYX “kill” health creators? No. It rewards creators who are evidence-based. A Silver/PASS label, for example, can boost credibility. It discourages overhyped content.
Q7. What if people don’t understand the labels? Labels are intuitive (colors/medals + PASS/AMBER/FAIL). Tooltips and detail pages can explain why (gate results, key citations).
Appendix B. Glossary
Evidence: Research supporting a claim (clinical trials, observational studies).
Meta-analysis: Pools results across studies to estimate overall effect.
REML: Estimation method for between-study variance (τ²) in random-effects models.
Monte Carlo simulation: Random draws to approximate probabilities like P(effect > δ).
δ (delta): Domain-specific threshold for meaningful effect.
Gates (Φ, R, J, K, L): Physics, Relevance, Journal Trust, Safety, Exaggeration checks.
Predatory journal: Low-quality venues lacking proper peer review; penalized in J gate.
DOAJ/COPE: Journal quality/ethics signals.
TEL-5: Five-tier evidence levels: Gold, Silver, Bronze, Red, Black.
Reproducibility/Auditability: Same inputs → same outputs; fully traceable decisions.
JSON-LD: Machine-readable linked data format.
LLM: Large Language Model (e.g., GPT-class assistants).
RFC-style governance: Open proposals for policy changes; versioned policies.
Policy fingerprint: Cryptographic hash of the current policy settings.
Journal-Trust Oracle (J): Aggregates journal trust signals into a 0–1 score.
Monotone safety-first principle: Φ/K failures cap labels at Black—safety and plausibility cannot be outweighed by other strengths.
References & Links
Canonical DOI: https://doi.org/10.5281/zenodo.17365759
Disclaimer (not medical advice). TERVYX labels summarize evidence and safety signals; they are not medical diagnoses or treatment recommendations. Always consult qualified professionals for personal health decisions.