
EvidenX Metrics Methodology

Complete documentation of how every metric is calculated, what it measures, its limitations, and the rationale behind each design decision.

Important: All EvidenX metrics are structural descriptors of the evidence landscape. They describe distribution, maturity, and coverage patterns — they do not assess clinical validity, treatment effectiveness, or make recommendations. Scores are calculated from title and abstract data only (not full text). They should inform research planning, not clinical decisions.
Contents

1. EMI — Evidence Maturity Index
2. RSC — Research Saturation Coefficient
3. PRF — Pyramid Robustness Factor
4. Article Potential Score
5. Cluster Opportunity Score
6. Study Type Classification
7. Abstract Quality Assessment
8. General Limitations
9. References

1. EMI — Evidence Maturity Index (0–100)

What it measures: How mature and established the evidence base is for a given research topic. A high EMI indicates the field has progressed from exploratory studies to controlled trials and formal synthesis.

Formula

EMI = (Synthesis × 0.35) + (Design Quality × 0.25) + (Volume × 0.20) + (Temporal × 0.20)
| Component | Weight | How Calculated | Rationale |
| --- | --- | --- | --- |
| Synthesis Presence | 35% | Log-scaled count of meta-analyses and systematic reviews in corpus | Existence of synthesis is the strongest signal of field maturity (Ioannidis, 2016) |
| Design Quality | 25% | Proportion of controlled designs (RCT, cohort, case-control) in corpus | Controlled studies indicate the field has moved beyond the descriptive/exploratory phase |
| Volume Adequacy | 20% | Log2-scaled article count (diminishing returns: 50 articles ≈ 75/100) | More evidence provides a broader base for conclusions, with diminishing returns |
| Temporal Breadth | 20% | Year span of publications (wider span = more mature) | Fields studied over longer periods have more opportunity for consolidation |
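As a concrete illustration, the weighting can be sketched in Python. The `volume_score` scaling here is an assumption fitted to the documented anchor (50 articles ≈ 75/100); the other components are taken as pre-computed 0–100 inputs, so this is a sketch of the formula, not the EvidenX implementation.

```python
import math

def volume_score(n_articles: int) -> float:
    """Log2-scaled article count with diminishing returns, capped at 100.
    Calibrated so that 50 articles score ~75/100 (an assumed fit)."""
    return min(100.0, math.log2(n_articles + 1) * (75.0 / math.log2(51)))

def emi(synthesis: float, design_quality: float,
        volume: float, temporal: float) -> float:
    """Weighted sum of 0-100 component scores, per the documented weights."""
    return (synthesis * 0.35 + design_quality * 0.25
            + volume * 0.20 + temporal * 0.20)
```

For example, synthesis = 80, design quality = 60, 50 articles (volume ≈ 75), and temporal = 70 gives EMI ≈ 72, a MODERATE field on the scale below.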

Interpretation

| Score | Level | Meaning |
| --- | --- | --- |
| ≥ 80 | HIGH | Consolidated field with synthesis and controlled studies. Formal evidence synthesis is well-supported. |
| ≥ 60 | MODERATE | Maturing field with quality primary studies. Systematic reviews feasible with methodological caveats. |
| ≥ 40 | EMERGING | Developing field, predominantly observational. Suitable for hypothesis generation, not formal recommendations. |
| < 40 | LOW | Sparse or low-quality evidence. Any conclusions would be premature. |
Limitation: EMI is sensitive to study type classification accuracy. Misclassified articles (e.g., a narrative review labeled as systematic review) will inflate the score. EMI does not assess individual study quality or risk of bias.
Reference: Oxford Centre for Evidence-Based Medicine, "Levels of Evidence" (2011). GRADE Working Group, "GRADE: an emerging consensus" (2008). Ioannidis, "The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews" (2016).

2. RSC — Research Saturation Coefficient (0–1.0)

What it measures: How saturated/complete the research coverage is for a topic. A high RSC indicates most research questions have been addressed and the marginal value of new primary studies is low.

Formula

RSC = (Synthesis Coverage × 0.30) + (Gap Scarcity × 0.30) + (Trend Stability × 0.20) + (Coverage Breadth × 0.20)
| Component | Weight | How Calculated | Rationale |
| --- | --- | --- | --- |
| Synthesis Coverage | 30% | Ratio of clusters containing MA/SR to total clusters | Synthesized clusters indicate completed research cycles |
| Gap Scarcity | 30% | 1 / (1 + high_priority_gaps); zero gaps = 1.0 | Fewer gaps mean more complete coverage |
| Trend Stability | 20% | Proportion of stable or declining clusters | Non-growing fields indicate saturation, not opportunity |
| Coverage Breadth | 20% | Proportion of articles captured in identified clusters | High clustering = well-organized evidence landscape |
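A minimal sketch of the RSC formula, assuming the inputs are raw counts from the clustering step. Edge cases (e.g. an empty corpus) are not handled, since the EvidenX behavior for them is not documented here.

```python
def rsc(synthesized_clusters: int, total_clusters: int,
        high_priority_gaps: int, stable_clusters: int,
        clustered_articles: int, total_articles: int) -> float:
    """Research Saturation Coefficient (0-1), per the documented weights."""
    synthesis_coverage = synthesized_clusters / total_clusters
    gap_scarcity = 1.0 / (1.0 + high_priority_gaps)  # zero gaps -> 1.0
    trend_stability = stable_clusters / total_clusters
    coverage_breadth = clustered_articles / total_articles
    return (synthesis_coverage * 0.30 + gap_scarcity * 0.30
            + trend_stability * 0.20 + coverage_breadth * 0.20)
```

For example, 4 of 10 clusters synthesized, 2 high-priority gaps, 6 stable clusters, and 80 of 100 articles clustered yields RSC ≈ 0.5, the MATURE threshold below.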

Interpretation

| Score | Level | Meaning |
| --- | --- | --- |
| ≥ 0.8 | SATURATED | Extensively studied. Prioritize synthesis and implementation over new primary studies. |
| ≥ 0.5 | MATURE | Well studied with specific gaps. Target research at identified gaps. |
| ≥ 0.25 | DEVELOPING | Significant space for new studies. Original contributions have high potential impact. |
| < 0.25 | UNEXPLORED | Scarcity of studies. Any well-designed study will contribute meaningfully. |
Limitation: RSC depends on cluster quality. If clustering misgroups articles, saturation assessment may be inaccurate. RSC also cannot distinguish between "saturated because well-studied" and "saturated because narrow topic."
Reference: Bornmann & Mutz, "Growth rates of modern science" (2015). Concept adapted from information retrieval completeness and bibliometric saturation analysis.

3. PRF — Pyramid Robustness Factor (0–1.0)

What it measures: How well-formed the evidence pyramid is — whether the distribution of study types follows the expected shape (broad base of observational studies, middle layer of experimental, narrow top of synthesis).

Formula

PRF = (Tier Completeness × 0.30) + (Distribution Shape × 0.40) + (Inversion Check × 0.30)
| Component | Weight | How Calculated | Rationale |
| --- | --- | --- | --- |
| Tier Completeness | 30% | Are all 3 tiers present? (top: MA/SR; mid: RCT/cohort; base: cross-sectional/case) | A complete pyramid requires evidence at all levels |
| Distribution Shape | 40% | Distance from ideal proportions (top 10%, mid 45%, base 45%) | Core question: does this look like a proper pyramid? Highest weight. |
| Inversion Check | 30% | Penalty if synthesis count exceeds primary study count | An inverted pyramid (more MA than primary studies) indicates possible duplicative synthesis or a weak foundation |

Ideal Pyramid

Top tier (MA + SR): 5–15% of corpus
Mid tier (RCT + Cohort + Case-Control): 30–60% of corpus
Base tier (Cross-sectional + Case + Other): 30–50% of corpus
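The three components can be sketched from raw tier counts. The L1 distance metric for shape and the all-or-nothing inversion penalty are illustrative assumptions; EvidenX's exact forms may differ.

```python
def prf(top: int, mid: int, base: int) -> float:
    """Pyramid Robustness Factor sketch from tier counts (MA/SR, RCT+cohort,
    cross-sectional+case). Distance metric and penalty form are assumed."""
    total = top + mid + base
    if total == 0:
        return 0.0
    completeness = sum(1 for t in (top, mid, base) if t > 0) / 3.0
    ideal = (0.10, 0.45, 0.45)
    props = (top / total, mid / total, base / total)
    # Half the L1 distance from the ideal shape, mapped onto [0, 1]
    shape = 1.0 - sum(abs(p - i) for p, i in zip(props, ideal)) / 2.0
    inversion = 0.0 if top > (mid + base) else 1.0  # more MA/SR than primary
    return completeness * 0.30 + shape * 0.40 + inversion * 0.30
```

Under this sketch, a corpus of 5 MA/SR, 20 mid-tier, and 25 base-tier studies scores ≈ 0.98 (ROBUST), while 10 meta-analyses resting on only 5 primary studies is penalized into the FRAGILE range.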

Interpretation

| Score | Level | Meaning |
| --- | --- | --- |
| ≥ 0.7 | ROBUST | Well-structured pyramid. High reliability for evidence synthesis. |
| ≥ 0.4 | MODERATE | Some structural gaps. Synthesis possible with caveats. |
| < 0.4 | FRAGILE | Poorly structured or inverted pyramid. Synthesis would be methodologically problematic. |
Limitation: PRF assumes the traditional evidence pyramid model (Murad et al., 2016). Some fields may have legitimate reasons for non-standard distributions (e.g., ethics precludes RCTs). PRF cannot assess whether individual studies within each tier are methodologically sound.
Reference: Murad et al., "New evidence pyramid", Evidence-Based Medicine 21(4), 2016. Sackett et al., "Evidence-based medicine: what it is and what it isn't", BMJ 312, 1996.

4. Article Potential Score (0–1.0)

What it measures: The scientific potential of an individual article for a researcher's purposes — combining evidence level, novelty, relevance, and recency.

Note: This is a research utility score, not a quality score. A novel case report (low evidence level) may score higher than an older meta-analysis if the topic is emerging.

Formula

Article Score = (Evidence Hierarchy × 0.25) + (Novelty × 0.20) + (Watchtower Match × 0.15) + (Lexicon Hits × 0.15) + (Recency × 0.15) + (Quality Indicators × 0.10)
| Component | Weight | Source |
| --- | --- | --- |
| Evidence Hierarchy | 25% | Study design (MA = 1.0, RCT = 0.85, Cohort = 0.70, Case Report = 0.30) |
| Novelty | 20% | Emerging terms, Watchtower alignment, challenges to consensus |
| Watchtower Match | 15% | Alignment with actively monitored research topics |
| Lexicon Hits | 15% | MeSH and custom terminology matches |
| Recency | 15% | ≤ 1 yr = 1.0, ≤ 2 yr = 0.85, ≤ 5 yr = 0.5, ≤ 10 yr = 0.3 |
| Quality Indicators | 10% | Keywords: multicenter, prospective, large sample, validated |
Limitation: Quality indicators are detected by keyword matching in the abstract, not by full-text assessment or formal risk of bias evaluation. A study claiming to be "multicenter" gets the same boost regardless of actual quality. This component has the lowest weight (10%) for this reason.
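A sketch of the score in Python. The recency step function follows the documented tiers, but the floor for articles older than 10 years is an assumption; the other components are taken as pre-computed 0–1 inputs.

```python
def recency(age_years: int) -> float:
    """Recency step function per the documented tiers."""
    if age_years <= 1: return 1.0
    if age_years <= 2: return 0.85
    if age_years <= 5: return 0.5
    if age_years <= 10: return 0.3
    return 0.1  # assumed floor for older articles (not documented)

def article_score(hierarchy: float, novelty: float, watchtower: float,
                  lexicon: float, recency_score: float, quality: float) -> float:
    """Weighted sum of 0-1 component scores, per the documented weights."""
    return (hierarchy * 0.25 + novelty * 0.20 + watchtower * 0.15
            + lexicon * 0.15 + recency_score * 0.15 + quality * 0.10)
```

This makes the "utility, not quality" note concrete: a fresh RCT on a novel topic (hierarchy 0.85, recency 1.0, novelty 0.6) can outscore a decade-old meta-analysis (hierarchy 1.0, recency 0.3, novelty 0.2) with otherwise equal inputs.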

5. Cluster Opportunity Score (0–1.0)

What it measures: The research opportunity within a thematic cluster — identifying areas where a systematic review or new study would have high impact.

Formula

Cluster Score = (Synthesis Gap × 0.30) + (Volume × 0.20) + (Growth Trend × 0.20) + (Quality Ratio × 0.15) + (Recency × 0.15)

A cluster with many primary studies but no meta-analysis scores highest — this is a clear synthesis opportunity.
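A minimal sketch of the formula. Inputs other than the synthesis gap are assumed to be pre-normalized to [0, 1], and the binary gap term is an illustrative simplification of however EvidenX grades partial synthesis coverage.

```python
def cluster_opportunity(has_synthesis: bool, volume: float, growth: float,
                        quality_ratio: float, recency: float) -> float:
    """Cluster Opportunity Score (0-1), per the documented weights."""
    synthesis_gap = 0.0 if has_synthesis else 1.0  # no MA/SR -> full gap credit
    return (synthesis_gap * 0.30 + volume * 0.20 + growth * 0.20
            + quality_ratio * 0.15 + recency * 0.15)
```

For example, a large, growing, recent cluster with no meta-analysis (`cluster_opportunity(False, 0.9, 0.8, 0.7, 0.8)`) scores ≈ 0.87, flagging a clear synthesis opportunity; the identical cluster with an existing meta-analysis drops by the full 0.30 gap weight.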

Limitation: Clusters are identified by co-occurrence analysis (term overlap), not semantic understanding. Misclustered articles may inflate or deflate opportunity scores.

6. Study Type Classification

What it measures: The study design of each article, classified into the standard evidence hierarchy (EBM pyramid).

Method

Classification uses keyword matching on title and abstract. Patterns searched include: "meta-analysis", "systematic review", "randomized controlled trial", "cohort", "case-control", "cross-sectional", "case report", "guideline", etc. The first match determines the classification.
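The first-match approach can be sketched as an ordered pattern list. The ordering (most to least specific) and the exact regular expressions are assumptions; the documented method only specifies keyword matching on title and abstract with first match winning.

```python
import re

# Ordered (label, pattern) pairs: first match determines the classification.
PATTERNS = [
    ("meta_analysis", r"\bmeta-?analysis\b"),
    ("systematic_review", r"\bsystematic review\b"),
    ("rct", r"\brandomi[sz]ed controlled trial\b"),
    ("guideline", r"\bguideline\b"),
    ("cohort", r"\bcohort\b"),
    ("case_control", r"\bcase-control\b"),
    ("cross_sectional", r"\bcross-sectional\b"),
    ("case_report", r"\bcase report\b"),
]

def classify(title: str, abstract: str = "") -> str:
    """Return the label of the first matching pattern, or 'other'."""
    text = f"{title} {abstract}".lower()
    for label, pattern in PATTERNS:
        if re.search(pattern, text):
            return label
    return "other"
```

Note that ordering matters: "Systematic review and meta-analysis of X" classifies as a meta-analysis only because that pattern is checked first.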

Evidence Hierarchy (Layers A–E)

| Layer | Study Types | Evidence Level |
| --- | --- | --- |
| A | Meta-Analysis, Systematic Review | Highest — synthesized evidence |
| B | RCT, Clinical Practice Guideline | High — controlled experimental |
| C | Cohort, Case-Control, Cross-Sectional | Moderate — observational |
| D | Narrative Review | Low — expert synthesis without systematic method |
| E | Case Report, Editorial, Expert Opinion | Lowest — anecdotal or opinion-based |
Reference: Oxford Centre for Evidence-Based Medicine, "Levels of Evidence" (2011). Sackett et al., "Evidence-based medicine" (1996).
Limitation: Keyword-based classification may misclassify articles. For example, an article titled "A systematic approach to..." may be flagged as systematic review when it is not. Classification confidence is estimated based on keyword specificity and count.

7. Abstract Quality Assessment

What it measures: The reporting quality of an abstract — whether it follows structured reporting conventions (IMRAD), clearly states objectives, defines the population, and reports quantitative outcomes.

Not what it measures: This is NOT a risk of bias assessment or a judgment of study validity. A well-written abstract for a poorly designed study will score high.

Components

| Component | Weight | What It Checks |
| --- | --- | --- |
| IMRAD Structure | 20% | Presence of Background, Objective, Methods, Results, Conclusion sections |
| Objective Clarity | 20% | Explicit aim statement ("to determine", "to evaluate", "to compare") |
| Population Definition | 20% | Sample size, age, sex, inclusion criteria, setting mentioned |
| Outcome Reporting | 20% | Primary outcome defined, quantitative results present |
| Quantitative Data | 10% | Statistical measures (p-values, CI, effect sizes) |
| Conclusion Alignment | 10% | Balanced language, no overclaiming |
Limitation: This assesses reporting completeness, not methodological rigor. A complete abstract with clear IMRAD structure does not guarantee a well-designed study. Overclaiming detection is keyword-based and may miss sophisticated exaggeration.
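As an illustration of the keyword-based approach, the IMRAD component can be sketched as cue detection. The cue list here is an assumption for illustration; EvidenX's actual cue vocabulary is not published in this document.

```python
import re

# Illustrative section cues for IMRAD detection (assumed, not EvidenX's list).
IMRAD_CUES = {
    "background": r"\b(background|introduction)\b",
    "objective": r"\b(objective|aim|purpose)s?\b",
    "methods": r"\bmethods?\b",
    "results": r"\bresults?\b",
    "conclusion": r"\bconclusions?\b",
}

def imrad_score(abstract: str) -> float:
    """Fraction of IMRAD section cues detected in the abstract (0-1)."""
    text = abstract.lower()
    hits = sum(bool(re.search(cue, text)) for cue in IMRAD_CUES.values())
    return hits / len(IMRAD_CUES)
```

A fully structured abstract ("Background: ... Objective: ... Methods: ... Results: ... Conclusions: ...") scores 1.0 on this component; an unstructured narrative abstract can score 0 even if the underlying study is sound, which is the reporting-vs-rigor gap the limitation above describes.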

8. General Limitations

All EvidenX metrics share these inherent limitations:

Title/Abstract only: All analysis is based on title and abstract text. Full-text methodology, data quality, statistical analysis details, and supplementary materials are not assessed.
Keyword-based detection: Study type classification, quality indicators, and term extraction rely on keyword/pattern matching. This approach is fast and reproducible but less accurate than human expert classification.
No individual risk of bias: EvidenX does not perform formal risk of bias assessment (ROB2, NOS, etc.) on individual studies. The Method Engine provides frameworks for this, but the scoring system does not incorporate it.
Weight justification: Scoring weights are documented with rationale derived from EBM literature, but they have not been externally validated through empirical calibration against expert consensus. Users should interpret scores as relative rankings, not absolute quality measures.
Corpus dependency: All scores are relative to the searched corpus. The same article may score differently in different searches depending on what other articles are included.
No clinical recommendations: EvidenX explicitly does not make clinical recommendations, treatment suggestions, or diagnostic advice. All outputs are structural descriptions of the evidence landscape.

9. References

1. Oxford Centre for Evidence-Based Medicine. "Levels of Evidence" (2011).
2. GRADE Working Group. "GRADE: an emerging consensus on rating quality of evidence and strength of recommendations." BMJ 336(7650), 2008.
3. Sackett DL et al. "Evidence-based medicine: what it is and what it isn't." BMJ 312(7023), 1996.
4. Murad MH et al. "New evidence pyramid." Evidence-Based Medicine 21(4), 2016.
5. Ioannidis JPA. "The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-Analyses." Milbank Quarterly 94(3), 2016.
6. Bornmann L, Mutz R. "Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references." JASIST 66(11), 2015.
7. PRISMA Group. "Preferred Reporting Items for Systematic Reviews and Meta-Analyses." PLoS Medicine 6(7), 2009.
8. Higgins JPT et al. "Cochrane Handbook for Systematic Reviews of Interventions." Version 6.3, 2022.