
EvidenX Metrics Methodology

Complete documentation of how every metric is calculated, what it measures, its limitations, and the rationale behind each design decision.

Important: All EvidenX metrics are structural descriptors of the evidence landscape. They describe distribution, maturity, and coverage patterns — they do not assess clinical validity, treatment effectiveness, or make recommendations. Scores are calculated from title and abstract data only (not full text). They should inform research planning, not clinical decisions.
Contents

1. EMI — Evidence Maturity Index
2. RSC — Research Saturation Coefficient
3. PRF — Pyramid Robustness Factor
4. Article Potential Score
5. Cluster Opportunity Score
6. Study Type Classification
7. Abstract Quality Assessment
8. General Limitations
9. References

1. EMI — Evidence Maturity Index (0–100)

What it measures: How mature and established the evidence base is for a given research topic. A high EMI indicates the field has progressed from exploratory studies to controlled trials and formal synthesis.

Formula

EMI = (Synthesis × 0.35) + (Design Quality × 0.25) + (Volume × 0.20) + (Temporal × 0.20)
| Component | Weight | How Calculated | Rationale |
| --- | --- | --- | --- |
| Synthesis Presence | 35% | Log-scaled count of meta-analyses and systematic reviews in corpus | Existence of synthesis is the strongest signal of field maturity (Ioannidis, 2016) |
| Design Quality | 25% | Proportion of controlled designs (RCT, cohort, case-control) in corpus | Controlled studies indicate the field has moved beyond the descriptive/exploratory phase |
| Volume Adequacy | 20% | Log2-scaled article count (diminishing returns: 50 articles ≈ 75/100) | More evidence provides a broader base for conclusions, with diminishing returns |
| Temporal Breadth | 20% | Year span of publications (wider span = more mature) | Fields studied over longer periods have more opportunity for consolidation |
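As a concrete illustration, the weighting can be sketched in Python. The `volume_score` scaling here is an assumption fitted to the documented anchor (50 articles ≈ 75/100); the other components are taken as pre-computed 0–100 inputs, so this is a sketch of the formula, not the EvidenX implementation.

```python
import math

def volume_score(n_articles: int) -> float:
    """Log2-scaled article count with diminishing returns, capped at 100.
    Calibrated so that 50 articles score ~75/100 (an assumed fit)."""
    return min(100.0, math.log2(n_articles + 1) * (75.0 / math.log2(51)))

def emi(synthesis: float, design_quality: float,
        volume: float, temporal: float) -> float:
    """Weighted sum of 0-100 component scores, per the documented weights."""
    return (synthesis * 0.35 + design_quality * 0.25
            + volume * 0.20 + temporal * 0.20)
```

For example, synthesis = 80, design quality = 60, 50 articles (volume ≈ 75), and temporal = 70 gives EMI ≈ 72, a MODERATE field on the scale below.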

Interpretation

| Score | Level | Meaning |
| --- | --- | --- |
| ≥ 80 | HIGH | Consolidated field with synthesis and controlled studies. Formal evidence synthesis is well-supported. |
| ≥ 60 | MODERATE | Maturing field with quality primary studies. Systematic reviews feasible with methodological caveats. |
| ≥ 40 | EMERGING | Developing field, predominantly observational. Suitable for hypothesis generation, not formal recommendations. |
| < 40 | LOW | Sparse or low-quality evidence. Any conclusions would be premature. |
Limitation: EMI is sensitive to study type classification accuracy. Misclassified articles (e.g., a narrative review labeled as systematic review) will inflate the score. EMI does not assess individual study quality or risk of bias.
Reference: Oxford Centre for Evidence-Based Medicine, "Levels of Evidence" (2011). GRADE Working Group, "GRADE: an emerging consensus" (2008). Ioannidis, "The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews" (2016).

2. RSC — Research Saturation Coefficient (0–1.0)

What it measures: How saturated/complete the research coverage is for a topic. A high RSC indicates most research questions have been addressed and the marginal value of new primary studies is low.

Formula

RSC = (Synthesis Coverage × 0.30) + (Gap Scarcity × 0.30) + (Trend Stability × 0.20) + (Coverage Breadth × 0.20)
| Component | Weight | How Calculated | Rationale |
| --- | --- | --- | --- |
| Synthesis Coverage | 30% | Ratio of clusters containing MA/SR to total clusters | Synthesized clusters indicate completed research cycles |
| Gap Scarcity | 30% | 1 / (1 + high_priority_gaps); zero gaps = 1.0 | Fewer gaps mean more complete coverage |
| Trend Stability | 20% | Proportion of stable or declining clusters | Non-growing fields indicate saturation, not opportunity |
| Coverage Breadth | 20% | Proportion of articles captured in identified clusters | High clustering = well-organized evidence landscape |
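A minimal sketch of the RSC formula, assuming the inputs are raw counts from the clustering step. Edge cases (e.g. an empty corpus) are not handled, since the EvidenX behavior for them is not documented here.

```python
def rsc(synthesized_clusters: int, total_clusters: int,
        high_priority_gaps: int, stable_clusters: int,
        clustered_articles: int, total_articles: int) -> float:
    """Research Saturation Coefficient (0-1), per the documented weights."""
    synthesis_coverage = synthesized_clusters / total_clusters
    gap_scarcity = 1.0 / (1.0 + high_priority_gaps)  # zero gaps -> 1.0
    trend_stability = stable_clusters / total_clusters
    coverage_breadth = clustered_articles / total_articles
    return (synthesis_coverage * 0.30 + gap_scarcity * 0.30
            + trend_stability * 0.20 + coverage_breadth * 0.20)
```

For example, 4 of 10 clusters synthesized, 2 high-priority gaps, 6 stable clusters, and 80 of 100 articles clustered yields RSC ≈ 0.5, the MATURE threshold below.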

Interpretation

| Score | Level | Meaning |
| --- | --- | --- |
| ≥ 0.8 | SATURATED | Extensively studied. Prioritize synthesis and implementation over new primary studies. |
| ≥ 0.5 | MATURE | Well studied with specific gaps. Target research at identified gaps. |
| ≥ 0.25 | DEVELOPING | Significant space for new studies. Original contributions have high potential impact. |
| < 0.25 | UNEXPLORED | Scarcity of studies. Any well-designed study will contribute meaningfully. |
Limitation: RSC depends on cluster quality. If clustering misgroups articles, saturation assessment may be inaccurate. RSC also cannot distinguish between "saturated because well-studied" and "saturated because narrow topic."
Reference: Bornmann & Mutz, "Growth rates of modern science" (2015). Concept adapted from information retrieval completeness and bibliometric saturation analysis.

3. PRF — Pyramid Robustness Factor (0–1.0)

What it measures: How well-formed the evidence pyramid is — whether the distribution of study types follows the expected shape (broad base of observational studies, middle layer of experimental, narrow top of synthesis).

Formula

PRF = (Tier Completeness × 0.30) + (Distribution Shape × 0.40) + (Inversion Check × 0.30)
| Component | Weight | How Calculated | Rationale |
| --- | --- | --- | --- |
| Tier Completeness | 30% | Are all 3 tiers present? (top: MA/SR; mid: RCT/cohort; base: cross-sectional/case) | A complete pyramid requires evidence at all levels |
| Distribution Shape | 40% | Distance from ideal proportions (top 10%, mid 45%, base 45%) | Core question: does this look like a proper pyramid? Highest weight. |
| Inversion Check | 30% | Penalty if synthesis count exceeds primary study count | An inverted pyramid (more MA than primary studies) indicates possible duplicative synthesis or a weak foundation |

Ideal Pyramid

Top tier (MA + SR): 5–15% of corpus
Mid tier (RCT + Cohort + Case-Control): 30–60% of corpus
Base tier (Cross-sectional + Case + Other): 30–50% of corpus
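The three components can be sketched from raw tier counts. The L1 distance metric for shape and the all-or-nothing inversion penalty are illustrative assumptions; EvidenX's exact forms may differ.

```python
def prf(top: int, mid: int, base: int) -> float:
    """Pyramid Robustness Factor sketch from tier counts (MA/SR, RCT+cohort,
    cross-sectional+case). Distance metric and penalty form are assumed."""
    total = top + mid + base
    if total == 0:
        return 0.0
    completeness = sum(1 for t in (top, mid, base) if t > 0) / 3.0
    ideal = (0.10, 0.45, 0.45)
    props = (top / total, mid / total, base / total)
    # Half the L1 distance from the ideal shape, mapped onto [0, 1]
    shape = 1.0 - sum(abs(p - i) for p, i in zip(props, ideal)) / 2.0
    inversion = 0.0 if top > (mid + base) else 1.0  # more MA/SR than primary
    return completeness * 0.30 + shape * 0.40 + inversion * 0.30
```

Under this sketch, a corpus of 5 MA/SR, 20 mid-tier, and 25 base-tier studies scores ≈ 0.98 (ROBUST), while 10 meta-analyses resting on only 5 primary studies is penalized into the FRAGILE range.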

Interpretation

| Score | Level | Meaning |
| --- | --- | --- |
| ≥ 0.7 | ROBUST | Well-structured pyramid. High reliability for evidence synthesis. |
| ≥ 0.4 | MODERATE | Some structural gaps. Synthesis possible with caveats. |
| < 0.4 | FRAGILE | Poorly structured or inverted pyramid. Synthesis would be methodologically problematic. |
Limitation: PRF assumes the traditional evidence pyramid model (Murad et al., 2016). Some fields may have legitimate reasons for non-standard distributions (e.g., ethics precludes RCTs). PRF cannot assess whether individual studies within each tier are methodologically sound.
Reference: Murad et al., "New evidence pyramid", Evidence-Based Medicine 21(4), 2016. Sackett et al., "Evidence-based medicine: what it is and what it isn't", BMJ 312, 1996.

4. Article Potential Score (0–1.0)

What it measures: The scientific potential of an individual article for a researcher's purposes — combining evidence level, novelty, relevance, and recency.

Note: This is a research utility score, not a quality score. A novel case report (low evidence level) may score higher than an older meta-analysis if the topic is emerging.

Formula

Article Score = (Evidence Hierarchy × 0.25) + (Novelty × 0.20) + (Watchtower Match × 0.15) + (Lexicon Hits × 0.15) + (Recency × 0.15) + (Quality Indicators × 0.10)
| Component | Weight | Source |
| --- | --- | --- |
| Evidence Hierarchy | 25% | Study design (MA = 1.0, RCT = 0.85, Cohort = 0.70, Case Report = 0.30) |
| Novelty | 20% | Emerging terms, Watchtower alignment, challenges to consensus |
| Watchtower Match | 15% | Alignment with actively monitored research topics |
| Lexicon Hits | 15% | MeSH and custom terminology matches |
| Recency | 15% | ≤ 1 yr = 1.0, ≤ 2 yr = 0.85, ≤ 5 yr = 0.5, ≤ 10 yr = 0.3 |
| Quality Indicators | 10% | Keywords: multicenter, prospective, large sample, validated |
Limitation: Quality indicators are detected by keyword matching in the abstract, not by full-text assessment or formal risk of bias evaluation. A study claiming to be "multicenter" gets the same boost regardless of actual quality. This component has the lowest weight (10%) for this reason.
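A sketch of the score in Python. The recency step function follows the documented tiers, but the floor for articles older than 10 years is an assumption; the other components are taken as pre-computed 0–1 inputs.

```python
def recency(age_years: int) -> float:
    """Recency step function per the documented tiers."""
    if age_years <= 1: return 1.0
    if age_years <= 2: return 0.85
    if age_years <= 5: return 0.5
    if age_years <= 10: return 0.3
    return 0.1  # assumed floor for older articles (not documented)

def article_score(hierarchy: float, novelty: float, watchtower: float,
                  lexicon: float, recency_score: float, quality: float) -> float:
    """Weighted sum of 0-1 component scores, per the documented weights."""
    return (hierarchy * 0.25 + novelty * 0.20 + watchtower * 0.15
            + lexicon * 0.15 + recency_score * 0.15 + quality * 0.10)
```

This makes the "utility, not quality" note concrete: a fresh RCT on a novel topic (hierarchy 0.85, recency 1.0, novelty 0.6) can outscore a decade-old meta-analysis (hierarchy 1.0, recency 0.3, novelty 0.2) with otherwise equal inputs.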

5. Cluster Opportunity Score (0–1.0)

What it measures: The research opportunity within a thematic cluster — identifying areas where a systematic review or new study would have high impact.

Formula

Cluster Score = (Synthesis Gap × 0.30) + (Volume × 0.20) + (Growth Trend × 0.20) + (Quality Ratio × 0.15) + (Recency × 0.15)

A cluster with many primary studies but no meta-analysis scores highest — this is a clear synthesis opportunity.
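A minimal sketch of the formula. Inputs other than the synthesis gap are assumed to be pre-normalized to [0, 1], and the binary gap term is an illustrative simplification of however EvidenX grades partial synthesis coverage.

```python
def cluster_opportunity(has_synthesis: bool, volume: float, growth: float,
                        quality_ratio: float, recency: float) -> float:
    """Cluster Opportunity Score (0-1), per the documented weights."""
    synthesis_gap = 0.0 if has_synthesis else 1.0  # no MA/SR -> full gap credit
    return (synthesis_gap * 0.30 + volume * 0.20 + growth * 0.20
            + quality_ratio * 0.15 + recency * 0.15)
```

For example, a large, growing, recent cluster with no meta-analysis (`cluster_opportunity(False, 0.9, 0.8, 0.7, 0.8)`) scores ≈ 0.87, flagging a clear synthesis opportunity; the identical cluster with an existing meta-analysis drops by the full 0.30 gap weight.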

Limitation: Clusters are identified by co-occurrence analysis (term overlap), not semantic understanding. Misclustered articles may inflate or deflate opportunity scores.

6. Study Type Classification

What it measures: The study design of each article, classified into the standard evidence hierarchy (EBM pyramid).

Method

Classification uses keyword matching on title and abstract. Patterns searched include: "meta-analysis", "systematic review", "randomized controlled trial", "cohort", "case-control", "cross-sectional", "case report", "guideline", etc. The first match determines the classification.
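The first-match approach can be sketched as an ordered pattern list. The ordering (most to least specific) and the exact regular expressions are assumptions; the documented method only specifies keyword matching on title and abstract with first match winning.

```python
import re

# Ordered (label, pattern) pairs: first match determines the classification.
PATTERNS = [
    ("meta_analysis", r"\bmeta-?analysis\b"),
    ("systematic_review", r"\bsystematic review\b"),
    ("rct", r"\brandomi[sz]ed controlled trial\b"),
    ("guideline", r"\bguideline\b"),
    ("cohort", r"\bcohort\b"),
    ("case_control", r"\bcase-control\b"),
    ("cross_sectional", r"\bcross-sectional\b"),
    ("case_report", r"\bcase report\b"),
]

def classify(title: str, abstract: str = "") -> str:
    """Return the label of the first matching pattern, or 'other'."""
    text = f"{title} {abstract}".lower()
    for label, pattern in PATTERNS:
        if re.search(pattern, text):
            return label
    return "other"
```

Note that ordering matters: "Systematic review and meta-analysis of X" classifies as a meta-analysis only because that pattern is checked first.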

Evidence Hierarchy (Layers A–E)

| Layer | Study Types | Evidence Level |
| --- | --- | --- |
| A | Meta-Analysis, Systematic Review | Highest — synthesized evidence |
| B | RCT, Clinical Practice Guideline | High — controlled experimental |
| C | Cohort, Case-Control, Cross-Sectional | Moderate — observational |
| D | Narrative Review | Low — expert synthesis without systematic method |
| E | Case Report, Editorial, Expert Opinion | Lowest — anecdotal or opinion-based |
Reference: Oxford Centre for Evidence-Based Medicine, "Levels of Evidence" (2011). Sackett et al., "Evidence-based medicine" (1996).
Limitation: Keyword-based classification may misclassify articles. For example, an article titled "A systematic approach to..." may be flagged as systematic review when it is not. Classification confidence is estimated based on keyword specificity and count.

7. Abstract Quality Assessment

What it measures: The reporting quality of an abstract — whether it follows structured reporting conventions (IMRAD), clearly states objectives, defines the population, and reports quantitative outcomes.

Not what it measures: This is NOT a risk of bias assessment or a judgment of study validity. A well-written abstract for a poorly designed study will score high.

Components

| Component | Weight | What It Checks |
| --- | --- | --- |
| IMRAD Structure | 20% | Presence of Background, Objective, Methods, Results, Conclusion sections |
| Objective Clarity | 20% | Explicit aim statement ("to determine", "to evaluate", "to compare") |
| Population Definition | 20% | Sample size, age, sex, inclusion criteria, setting mentioned |
| Outcome Reporting | 20% | Primary outcome defined, quantitative results present |
| Quantitative Data | 10% | Statistical measures (p-values, CI, effect sizes) |
| Conclusion Alignment | 10% | Balanced language, no overclaiming |
Limitation: This assesses reporting completeness, not methodological rigor. A complete abstract with clear IMRAD structure does not guarantee a well-designed study. Overclaiming detection is keyword-based and may miss sophisticated exaggeration.
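As an illustration of the keyword-based approach, the IMRAD component can be sketched as cue detection. The cue list here is an assumption for illustration; EvidenX's actual cue vocabulary is not published in this document.

```python
import re

# Illustrative section cues for IMRAD detection (assumed, not EvidenX's list).
IMRAD_CUES = {
    "background": r"\b(background|introduction)\b",
    "objective": r"\b(objective|aim|purpose)s?\b",
    "methods": r"\bmethods?\b",
    "results": r"\bresults?\b",
    "conclusion": r"\bconclusions?\b",
}

def imrad_score(abstract: str) -> float:
    """Fraction of IMRAD section cues detected in the abstract (0-1)."""
    text = abstract.lower()
    hits = sum(bool(re.search(cue, text)) for cue in IMRAD_CUES.values())
    return hits / len(IMRAD_CUES)
```

A fully structured abstract ("Background: ... Objective: ... Methods: ... Results: ... Conclusions: ...") scores 1.0 on this component; an unstructured narrative abstract can score 0 even if the underlying study is sound, which is the reporting-vs-rigor gap the limitation above describes.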

8. General Limitations

All EvidenX metrics share these inherent limitations:

Title/Abstract only: All analysis is based on title and abstract text. Full-text methodology, data quality, statistical analysis details, and supplementary materials are not assessed.
Keyword-based detection: Study type classification, quality indicators, and term extraction rely on keyword/pattern matching. This approach is fast and reproducible but less accurate than human expert classification.
No individual risk of bias: EvidenX does not perform formal risk of bias assessment (ROB2, NOS, etc.) on individual studies. The Method Engine provides frameworks for this, but the scoring system does not incorporate it.
Weight justification: Scoring weights are documented with rationale derived from EBM literature, but they have not been externally validated through empirical calibration against expert consensus. Users should interpret scores as relative rankings, not absolute quality measures.
Corpus dependency: All scores are relative to the searched corpus. The same article may score differently in different searches depending on what other articles are included.
No clinical recommendations: EvidenX explicitly does not make clinical recommendations, treatment suggestions, or diagnostic advice. All outputs are structural descriptions of the evidence landscape.

9. References

1. Oxford Centre for Evidence-Based Medicine. "Levels of Evidence" (2011).
2. GRADE Working Group. "GRADE: an emerging consensus on rating quality of evidence and strength of recommendations." BMJ 336(7650), 2008.
3. Sackett DL et al. "Evidence-based medicine: what it is and what it isn't." BMJ 312(7023), 1996.
4. Murad MH et al. "New evidence pyramid." Evidence-Based Medicine 21(4), 2016.
5. Ioannidis JPA. "The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-Analyses." Milbank Quarterly 94(3), 2016.
6. Bornmann L, Mutz R. "Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references." JASIST 66(11), 2015.
7. PRISMA Group. "Preferred Reporting Items for Systematic Reviews and Meta-Analyses." PLoS Medicine 6(7), 2009.
8. Higgins JPT et al. "Cochrane Handbook for Systematic Reviews of Interventions." Version 6.3, 2022.