Methodology

A transparent synthesis of six published rankings — what we measure, what we don't, and how to read our numbers.

What FWUR estimates (and what it doesn't)

FWUR Rank is a transparent synthesis of six published university rankings. We measure (1) where these rankings disagree, (2) where they converge, and (3) how sensitive the consensus is to which rankings we include. We do not measure educational or research quality directly.

The three things we measure

  1. Primary — Disagreement

    How differently the six agencies rank a given institution. This is what FWUR exists to surface: the consensus number is the hook; the disagreement signal is the substance.

  2. Secondary — Consensus

    A robust trimmed-mean summary of where the agencies converge. Reported as the headline number, but visually no larger than the disagreement display.

  3. Tertiary — Method sensitivity

    How much the answer depends on which agencies we include, surfaced via the custom-subset (Mode C) view and method-sensitivity bands. A minimal sketch of all three measures follows this list.
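
To make the three measures concrete, here is a minimal Python sketch. The agency labels, the rank values, the spread measure (sample standard deviation), and the one-from-each-end trim are illustrative assumptions, not the production FWUR definitions.

    import itertools
    import statistics

    # Hypothetical ranks for one institution across six agencies (illustrative only).
    ranks = {"A": 12, "B": 15, "C": 9, "D": 41, "E": 14, "F": 18}

    def disagreement(values):
        # Primary: how far apart the agencies are; sample stdev as a stand-in.
        return statistics.stdev(values)

    def trimmed_mean(values, trim=1):
        # Secondary: drop the lowest and highest rank, average the rest.
        vals = sorted(values)[trim:len(values) - trim]
        return statistics.mean(vals)

    def consensus_band(rank_map, min_size=3):
        # Tertiary: recompute the consensus over every proper agency subset of
        # size >= 3 (41 subsets for six agencies) and report the min/max range.
        agencies = list(rank_map)
        results = []
        for size in range(min_size, len(agencies)):
            for subset in itertools.combinations(agencies, size):
                results.append(trimmed_mean([rank_map[a] for a in subset]))
        return min(results), max(results)

    values = list(ranks.values())
    print(disagreement(values))      # primary signal
    print(trimmed_mean(values))      # secondary signal (the headline number)
    print(consensus_band(ranks))     # tertiary signal (method-sensitivity band)

Note how the trimmed mean shrugs off the single outlier agency (D above): robustness to one divergent ranking is why a trimmed mean, rather than a plain mean, carries the headline number.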

Honest limitations

FWUR's v0.1 algorithm and v1.0 product were locked as complete on 2026-05-08 by the project lead alone, without sign-off from an external statistical consultant or a domain expert. The decision was informed by seven years of the project lead's accumulated thinking on multi-agency aggregation, fifteen LLM peer reviews across three rounds, and the deterministic v0.1 baseline (62 unit tests with theorem proofs).

Validation is via internal Saltelli–Sobol method-sensitivity analysis (Track C). External validation tracks (a user A/B study; an expert pairwise panel scored with Bradley–Terry) are documented as aspirational, pending future budget. The Bayesian-model R&D branch is indefinitely deferred for the same reason.
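
For context on what the deferred expert panel would compute: a Bradley–Terry model converts pairwise "which should rank higher?" judgments into latent strength scores. The sketch below uses an MM-style fixed-point fit on invented tallies; none of this is Track C code.

    # Invented pairwise tallies: wins[(a, b)] = times experts preferred a over b.
    wins = {
        ("X", "Y"): 7, ("Y", "X"): 3,
        ("X", "Z"): 6, ("Z", "X"): 4,
        ("Y", "Z"): 5, ("Z", "Y"): 5,
    }
    items = ["X", "Y", "Z"]

    def bradley_terry(wins, items, iters=200):
        # MM-style fixed-point iteration for the Bradley-Terry maximum likelihood:
        # p_i <- W_i / sum_j n_ij / (p_i + p_j), renormalized each round.
        p = {i: 1.0 for i in items}
        for _ in range(iters):
            new = {}
            for i in items:
                w_i = sum(wins.get((i, j), 0) for j in items if j != i)
                denom = sum(
                    (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                    for j in items if j != i
                )
                new[i] = w_i / denom
            total = sum(new.values())
            p = {i: v / total for i, v in new.items()}
        return p

    print(bradley_terry(wins, items))  # higher score = stronger panel preference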

This is the honest constraint. We do not claim external academic validation we do not have.

Methodological honesty — what we deliberately do not do

Why we avoid frequentist uncertainty intervals

The six rankings are not a random sample drawn from a population; they are the population of major published university rankings. Standard frequentist uncertainty quantification (the kind that produces an interval with a coverage guarantee) requires a sampling model that does not exist here, so quoting one would be mathematically misleading. Instead we surface a qualitative disagreement bucket (high agreement / mixed signal / divergent signal) and a method-sensitivity band (planned for v0.2 once the Saltelli–Sobol pipeline runs over the 41 agency subsets of size three to five). Our naming-discipline lint actively blocks frequentist-interval language in user-facing copy.
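
To illustrate the bucket-plus-band idea, the sketch below reuses the illustrative ranks from the earlier example. The bucket cutoffs (3 and 8) are hypothetical placeholders, and the leave-in/leave-out effect is a crude proxy for a first-order Sobol index, not the Saltelli estimator the v0.2 pipeline would use.

    import itertools
    import statistics

    # Same illustrative ranks as the earlier sketch; not real FWUR data.
    ranks = {"A": 12, "B": 15, "C": 9, "D": 41, "E": 14, "F": 18}
    agencies = list(ranks)

    # Qualitative disagreement bucket from the raw spread; hypothetical cutoffs.
    spread = statistics.stdev(ranks.values())
    if spread < 3:
        bucket = "high agreement"
    elif spread < 8:
        bucket = "mixed signal"
    else:
        bucket = "divergent signal"

    def trimmed_mean(values, trim=1):
        vals = sorted(values)[trim:len(values) - trim]
        return statistics.mean(vals)

    # Method-sensitivity band: consensus recomputed over every proper
    # agency subset of size >= 3 (the 41 subsets for six agencies).
    consensuses = {
        s: trimmed_mean([ranks[a] for a in s])
        for size in range(3, len(agencies))
        for s in itertools.combinations(agencies, size)
    }
    band = (min(consensuses.values()), max(consensuses.values()))

    # Crude leave-in/leave-out effect per agency: how the mean consensus shifts
    # when the agency is included vs excluded. A stand-in for a Sobol
    # first-order index, NOT the Saltelli estimator.
    def effect(agency):
        inc = [c for s, c in consensuses.items() if agency in s]
        exc = [c for s, c in consensuses.items() if agency not in s]
        return statistics.mean(inc) - statistics.mean(exc)

    print(bucket, band)
    print({a: round(effect(a), 2) for a in agencies})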

Why our trajectory chart is overlay, not small multiples

Edward Tufte's rule for time series with more than three lines is small multiples — one mini-chart per agency, faceted side by side. We use overlay (six lines on one chart) because the user task is direct comparison: did agency X agree with agency Y this year? Faceted small multiples answer that less directly than co-located lines. We acknowledge the trade-off: with six overlapping series the chart can look crowded, especially in the middle of the rank range. A small-multiples view is on the v0.2-x backlog as an option toggle, not a default.
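
A minimal overlay rendering with synthetic trajectories; direct end-of-line labels (one common way to tame the crowding) stand in for a legend, and nothing here is the production chart code.

    import matplotlib.pyplot as plt

    years = [2021, 2022, 2023, 2024, 2025]
    # Synthetic rank trajectories for six hypothetical agencies.
    series = {
        "A": [12, 11, 13, 12, 10],
        "B": [15, 14, 16, 15, 15],
        "C": [9, 10, 9, 11, 12],
        "D": [41, 38, 35, 33, 30],
        "E": [14, 13, 14, 12, 13],
        "F": [18, 19, 17, 16, 18],
    }

    fig, ax = plt.subplots(figsize=(7, 4))
    for name, trajectory in series.items():
        ax.plot(years, trajectory, linewidth=1.5)
        # Direct label at the line's end avoids a detached legend.
        ax.annotate(name, (years[-1], trajectory[-1]), xytext=(4, 0),
                    textcoords="offset points", va="center")

    ax.invert_yaxis()          # rank 1 at the top, as readers expect
    ax.set_xlabel("Year")
    ax.set_ylabel("Published rank")
    ax.set_title("Six-agency overlay (synthetic data)")
    plt.tight_layout()
    plt.show()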

Both limits have explicit reactivation triggers in CONSTRAINTS.md §5: when external statistical consultation becomes accessible, or when the Saltelli–Sobol pipeline yields a defensible empirical band, the corresponding methodology section will be amended via a new ADR.

Standards we follow

Leiden Manifesto (Hicks et al. 2015) · Berlin Principles (IREG Observatory) · OECD/JRC Composite Indicators Handbook (Saisana 2008/2011) · DORA · AAPOR

For full detail

The governing documents referenced above, including CONSTRAINTS.md and the ADRs, are part of the project repository; the methodology evolves through versioned ADR amendments rather than silent changes.