Evaluation

Evidence v0: ranked feed vs naive baselines

This page calls GET /api/v1/evaluation/compare so you can inspect the same candidate pool under three orderings: materialized ranking, citation-sorted, and date-sorted. It makes no claim about human relevance; it reports only distributional checks on short lists.
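A minimal sketch of how a client might request the comparison, using only the Python standard library. The endpoint path comes from this page; the base URL and the query parameter names `ranking_version` and `family` are hypothetical and should be checked against the actual API before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

COMPARE_PATH = "/api/v1/evaluation/compare"  # path as named on this page


def build_compare_url(base_url: str, ranking_version: str, family: str) -> str:
    """Build the request URL.

    The query parameter names are assumptions made for illustration;
    the real endpoint may expect different names or none at all.
    """
    query = urlencode({"ranking_version": ranking_version, "family": family})
    return f"{base_url.rstrip('/')}{COMPARE_PATH}?{query}"


def fetch_comparison(url: str, timeout: float = 10.0) -> dict:
    """GET the URL and decode the JSON body (response shape is not assumed)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical local base URL; the version string is the pin shown on this page.
    print(build_compare_url(
        "http://localhost:8000",
        "ml2-5a-qual-r2-k6-20260405",
        "emerging",
    ))
```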

Pin: NEXT_PUBLIC_RANKING_VERSION=ml2-5a-qual-r2-k6-20260405

Disclaimer

These outputs are comparison aids for engineering and transparency, not human relevance judgments.

  • Side-by-side lists share the same candidate pool for the selected recommendation family and corpus snapshot.
  • Recency, citation, and topic summaries are coarse proxies over the short lists shown — they do not measure usefulness to a researcher.
  • Topic overlap uses Jaccard similarity on topic labels attached to papers in this corpus, not semantic similarity of full text.
  • Use ranked outputs for product behavior; use this endpoint to sanity-check drift against naive orderings.

Run and pool

Run ml2-5a-qual-r2-k6-20260405 | rank-83976f1097 | snapshot source-snapshot-20260329-170012 | embedding v1-title-abstract-1536-cleantext-r2 | pool size 38

The pool contains all included works in the corpus snapshot (the same candidate set as the ranking run's emerging/bridge families).

Topic label overlap between lists

Each value is the Jaccard index (intersection size divided by union size) of two lists' sets of OpenAlex topic labels, where a list's set holds the labels appearing in the top tags of its papers. High overlap means similar topic mix, not similar intellectual content.

Ranked vs citation baseline: 0.5000
Ranked vs date baseline: 0.7273
Citation vs date baseline: 0.3846
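
As a minimal sketch of the overlap metric described in this section (not the service's actual implementation), the Jaccard index of two label sets is the size of their intersection divided by the size of their union. The function names `jaccard` and `implied_intersection` are hypothetical, and returning 0.0 for two empty sets is an assumption. The second helper back-solves the number of shared labels from a reported index, assuming the "unique labels in list" counts under Topic mix (8 ranked, 7 citation, 11 date) are the set sizes the index was computed on.

```python
from typing import Iterable


def jaccard(labels_a: Iterable[str], labels_b: Iterable[str]) -> float:
    """Jaccard index |A & B| / |A | B| over two collections of topic labels."""
    set_a, set_b = set(labels_a), set(labels_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty label sets count as no overlap
    return len(set_a & set_b) / len(union)


def implied_intersection(size_a: int, size_b: int, index: float) -> float:
    """Solve J = i / (a + b - i) for i, the number of shared labels."""
    return index * (size_a + size_b) / (1.0 + index)


if __name__ == "__main__":
    # Illustrative label sets, not the real lists behind this page.
    a = {"Music and Audio Processing", "Speech and Audio Processing", "X"}
    b = {"Music and Audio Processing", "Speech and Audio Processing", "Y"}
    print(f"{jaccard(a, b):.4f}")

    # Consistency check against the reported values and label counts.
    for name, size_a, size_b, index in [
        ("ranked vs citation", 8, 7, 0.5000),
        ("ranked vs date", 8, 11, 0.7273),
        ("citation vs date", 7, 11, 0.3846),
    ]:
        print(name, round(implied_intersection(size_a, size_b, index), 2))
```

Under that assumption, the three reported values correspond to 5, 8, and 5 shared labels (5/10, 8/11, and 5/13 respectively).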

Ranked (family)

Materialized ranking run: ordered by final_score descending (v0 uses heuristic signals only; semantic signals are not used).

Order: final_score DESC

Proxy stats (list-only; not relevance)

Recency: mean year 2022.92; min-max 2021-2025; share in latest two years 33.3%
Citations: mean 17.58; median 13.50; range 1-48
Topic mix: 8 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Neuroscience and Music Perception, Speech and Audio Processing, Diverse Musicological Studies
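
The proxy stats reported for each list could be computed along the following lines. This is a sketch, not the service's code: the function names are hypothetical, and "share in latest two years" is assumed here to be measured against the list's own maximum year, which the page does not state.

```python
from statistics import median
from typing import Dict, Sequence


def recency_stats(years: Sequence[int]) -> Dict[str, float]:
    """Mean, min, max, and share of papers in the latest two years of the list."""
    latest = max(years)
    # Assumption: the "latest two years" are the list's max year and the year before.
    recent = sum(1 for year in years if year >= latest - 1)
    return {
        "mean_year": round(sum(years) / len(years), 2),
        "min_year": min(years),
        "max_year": latest,
        "share_latest_two_pct": round(100.0 * recent / len(years), 1),
    }


def citation_stats(citations: Sequence[int]) -> Dict[str, float]:
    """Mean, median, and range of citation counts in the list."""
    return {
        "mean": round(sum(citations) / len(citations), 2),
        "median": float(median(citations)),
        "min": min(citations),
        "max": max(citations),
    }


if __name__ == "__main__":
    # Illustrative values only, not the papers behind this page.
    print(recency_stats([2021, 2022, 2023, 2024, 2025, 2025]))
    print(citation_stats([1, 13, 14, 48]))
```

These are list-only summaries, so they change with list length and say nothing about the papers left out of the short list.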

Citation baseline

Popularity-style baseline on the same pool: highest citations first (not a relevance judgment).

Order: citation_count DESC, year DESC, openalex_id ASC
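
The multi-key order above can be sketched as a single sort with a composite key. The `Paper` dataclass and function names are hypothetical, and comparing `openalex_id` as a plain string is an assumption about how ties are broken. The date baseline's order (year DESC, openalex_id ASC, listed under Date baseline) follows the same pattern and is included for comparison.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Paper:
    # Hypothetical record shape; field names mirror the order clauses on this page.
    openalex_id: str
    year: int
    citation_count: int


def citation_baseline(pool: Sequence[Paper]) -> List[Paper]:
    """citation_count DESC, year DESC, openalex_id ASC."""
    return sorted(pool, key=lambda p: (-p.citation_count, -p.year, p.openalex_id))


def date_baseline(pool: Sequence[Paper]) -> List[Paper]:
    """year DESC, openalex_id ASC."""
    return sorted(pool, key=lambda p: (-p.year, p.openalex_id))


if __name__ == "__main__":
    # Illustrative pool with deliberate ties on citations and year.
    pool = [
        Paper("W3", 2024, 10),
        Paper("W1", 2023, 10),
        Paper("W2", 2024, 10),
        Paper("W0", 2021, 48),
    ]
    print([p.openalex_id for p in citation_baseline(pool)])
    print([p.openalex_id for p in date_baseline(pool)])
```

Ending each key with openalex_id makes the order fully deterministic, so repeated calls over the same pool return identical lists.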

Proxy stats (list-only; not relevance)

Recency: mean year 2022.33; min-max 2021-2024; share in latest two years 50.0%
Citations: mean 19.58; median 17.00; range 6-48
Topic mix: 7 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Neuroscience and Music Perception, Speech and Audio Processing, Diverse Musicological Studies

Date baseline

Pure recency baseline on the same pool: newest year first (not a relevance judgment).

Order: year DESC, openalex_id ASC

Proxy stats (list-only; not relevance)

Recency: mean year 2025.17; min-max 2025-2026; share in latest two years 100.0%
Citations: mean 0.67; median 0.00; range 0-3
Topic mix: 11 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Diverse Musicological Studies, Neuroscience and Music Perception, Speech and Audio Processing

Generated at 2026-04-08T12:58:27.886440Z | Later work: labeled benchmarks, precision at k (P@k), and freeze-at-T backtests; see the product checklist at /api/v1/evaluation/summary.