Methodology

A transparent synthesis of six public rankings: what we measure, what we do not, and how to read our numbers.

What FWUR estimates (and what it does not)

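FWUR Rank is a transparent synthesis of six public university rankings. We measure (1) where these rankings disagree, (2) where they converge, and (3) how sensitive the consensus is to the choice of rankings included. We do not directly measure the quality of education or research.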

The three things we measure

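  1. Primary: disagreement

    How much the six agencies differ in where they rank a given institution. This is the reason FWUR exists; the consensus number is the hook, and the disagreement signal is the substance.

  2. Secondary: consensus

    A robust trimmed-mean summary of where the agencies converge. It is presented as the headline number, but is displayed no larger than the disagreement view.

  3. Tertiary: method sensitivity

    How much the answer depends on which agencies are included, shown through the custom-subset (Mode C) view and the method-sensitivity band.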
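The first two quantities can be sketched as follows. This is a minimal illustration under assumptions of our own, not the FWUR v0.1 algorithm: disagreement is taken to be the range of the six ranks, and consensus is a trimmed mean that drops the single best and single worst rank. The agency labels and rank values are hypothetical.

```python
from statistics import mean


def disagreement(ranks):
    """Spread of one institution's ranks across agencies.

    Assumed measure: max rank minus min rank.
    """
    values = list(ranks.values())
    return max(values) - min(values)


def trimmed_mean(ranks, trim=1):
    """Mean rank after dropping the `trim` lowest and `trim` highest values.

    The trim level of 1 is an assumption made for this sketch.
    """
    values = sorted(ranks.values())
    if len(values) <= 2 * trim:
        raise ValueError("not enough rankings to trim")
    return mean(values[trim:len(values) - trim])


# Hypothetical ranks for one institution from six agencies
example = {"A": 12, "B": 15, "C": 14, "D": 40, "E": 13, "F": 16}

print(disagreement(example))  # 28
print(trimmed_mean(example))  # 14.5
```

In this example a single outlying agency (rank 40) produces a large disagreement value while barely moving the trimmed consensus, which is why the two numbers are worth showing side by side.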

Honest limitations

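The FWUR v0.1 algorithm and the completion of the v1.0 product were locked on 2026-05-08 under the sole authority of the project lead, without sign-off from an external statistical consultant or a domain-expert reviewer. The decision rests on the project lead's seven years of accumulated thinking about multi-agency aggregation, three rounds of LLM peer review totaling fifteen reviews, and a deterministic v0.1 baseline (62 unit tests and theorem proofs).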

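Validation is carried out through an internal Saltelli–Sobol method-sensitivity analysis (Track C). External validation paths (user A/B studies; a Bradley–Terry expert pairwise-comparison panel) are recorded as aspirations pending future budget. The Bayesian model research branch is postponed indefinitely for the same reason.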

That is the honest constraint. We do not claim external academic validation that we do not have.

Methodological honesty — what we deliberately do not do

Why we avoid frequentist uncertainty intervals

The six rankings are not a random sample drawn from a population — they are the population of major published university rankings. Standard frequentist uncertainty quantification (the kind that produces an interval with a coverage guarantee) requires a sampling model that does not exist here, so quoting one would be mathematically misleading. Instead we surface a qualitative disagreement bucket (high agreement / mixed signal / divergent signal) and a method-sensitivity band (planned for v0.2 once the Saltelli–Sobol pipeline runs over the 41 size-≥3 agency subsets). Our naming-discipline lint actively blocks the corresponding language in user-facing copy.
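To make the subset count concrete, the sketch below enumerates every agency subset of size three or more. It is a plain enumeration, not the Saltelli–Sobol analysis itself, and the band it reports (the minimum and maximum of a simple mean rank across subsets) is an assumed stand-in for whatever the v0.2 pipeline will compute. Six agencies yield 42 such subsets when the full set is counted; the figure of 41 matches if the full six-agency set is treated as the baseline and excluded, which is our reading, not something the text states. Agency labels and ranks are hypothetical.

```python
from itertools import combinations
from statistics import mean


def subsets_min_size(agencies, min_size=3):
    """All subsets of `agencies` with at least `min_size` members."""
    agencies = list(agencies)
    return [
        combo
        for k in range(min_size, len(agencies) + 1)
        for combo in combinations(agencies, k)
    ]


def sensitivity_band(ranks, min_size=3):
    """(low, high) of the mean rank over all qualifying subsets.

    Assumed band definition, used here only for illustration.
    """
    means = [
        mean(ranks[a] for a in combo)
        for combo in subsets_min_size(ranks, min_size)
    ]
    return min(means), max(means)


# Hypothetical ranks for one institution from six agencies
example = {"A": 12, "B": 15, "C": 14, "D": 40, "E": 13, "F": 16}

all_subsets = subsets_min_size(example)
proper_subsets = [s for s in all_subsets if len(s) < len(example)]
print(len(all_subsets), len(proper_subsets))  # 42 41

low, high = sensitivity_band(example)
print(low, high)
```

A wide band signals that the consensus depends heavily on which agencies are included; a narrow band signals that it does not.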

Why our trajectory chart is overlay, not small multiples

Edward Tufte's rule for time series with more than three lines is small multiples — one mini-chart per agency, faceted side by side. We use overlay (six lines on one chart) because the user task is direct comparison: did agency X agree with agency Y this year? Faceted small multiples answer that less directly than co-located lines. We acknowledge the trade-off: with six overlapping series the chart can look crowded, especially in the middle of the rank range. A small-multiples view is on the v0.2-x backlog as an option toggle, not a default.

Both limits have explicit reactivation triggers in CONSTRAINTS.md §5: when external statistical consultation becomes accessible, or when the Saltelli–Sobol pipeline yields a defensible empirical band, the corresponding methodology section will be amended via a new ADR.

Standards we follow

Leiden Manifesto (Hicks et al. 2015) · Berlin Principles (IREG Observatory) · OECD/JRC Handbook on Constructing Composite Indicators (Saisana 2008/2011) · DORA · AAPOR

Full details

These documents are part of the project repository; the methodology evolves through versioned ADR amendments, not silent changes.