Evaluation

Evidence v0: ranked feed vs naive baselines

This page calls GET /api/v1/evaluation/compare so you can inspect the same candidate pool under three orderings: materialized ranking, citation-sorted, and date-sorted. It makes no claim about human relevance; it reports only distributional checks on short lists.

Pin: NEXT_PUBLIC_RANKING_VERSION=ml2-5a-qual-r2-k6-20260405

Disclaimer

These outputs are comparison aids for engineering and transparency, not human relevance judgments.

  • Side-by-side lists share the same candidate pool for the selected recommendation family and corpus snapshot.
  • Recency, citation, and topic summaries are coarse proxies over the short lists shown; they do not measure usefulness to a researcher.
  • Topic overlap uses Jaccard similarity on topic labels attached to papers in this corpus, not semantic similarity of full text.
  • Use ranked outputs for product behavior; use this endpoint to sanity-check drift against naive orderings.

Run and pool

Run ml2-5a-qual-r2-k6-20260405 | rank-83976f1097 | snapshot source-snapshot-20260329-170012 | embedding v1-title-abstract-1536-cleantext-r2 | pool size 35

Low-cite candidate pool (revision v0): included core works in this corpus snapshot, year≥2019, citations≤30, non-empty title and abstract. Matches docs/candidate-pool-low-cite.md and the materialized undercited family scope.

Low-cite gate from run config: year at least 2019, citations at most 30 (revision v0). Frozen definition: docs/candidate-pool-low-cite.md.
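The gate above can be sketched as a simple predicate. This is a minimal illustration, not the code in docs/candidate-pool-low-cite.md: the `Work` record, its field names, and the `is_core` flag standing in for "included core works" are all assumptions made for the example. Only the thresholds (year at least 2019, citations at most 30, non-empty title and abstract) come from this page.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds as stated on this page (revision v0).
MIN_YEAR = 2019
MAX_CITATIONS = 30


@dataclass
class Work:
    """Hypothetical record shape; field names are assumptions."""
    openalex_id: str
    title: str
    abstract: str
    year: int
    citation_count: int
    is_core: bool = True  # assumed stand-in for "included core works"


def passes_low_cite_gate(work: Work) -> bool:
    """Return True when a work satisfies every low-cite condition."""
    return (
        work.is_core
        and work.year >= MIN_YEAR
        and work.citation_count <= MAX_CITATIONS
        and bool(work.title.strip())      # non-empty title
        and bool(work.abstract.strip())   # non-empty abstract
    )


def low_cite_pool(works: Iterable[Work]) -> List[Work]:
    """Filter a snapshot down to the low-cite candidate pool."""
    return [w for w in works if passes_low_cite_gate(w)]
```

Both bounds are inclusive in this sketch, matching "at least" and "at most" in the gate description. Whitespace-only titles and abstracts are treated as empty, which is an assumption the page does not confirm.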

Topic label overlap between lists

Jaccard index on the set of OpenAlex topic labels appearing in the top tags of each paper in the list. High overlap means similar topic mix, not similar intellectual content.

Ranked vs citation baseline: 0.3571
Ranked vs date baseline: 0.9167
Citation vs date baseline: 0.3846
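For reference, the Jaccard index of two label sets is the size of their intersection divided by the size of their union. The sketch below uses synthetic label names (the `topic-N` strings are placeholders, not labels from this corpus) sized to match the unique-label counts reported on this page for the ranked list (12) and the citation baseline (7). The reported value 0.3571 is consistent with 5 shared labels out of 14 distinct ones; that intersection size is inferred from the arithmetic, not reported by the endpoint.

```python
from typing import Iterable, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| over two collections of labels."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty label sets count as zero overlap.
        return 0.0
    return len(set_a & set_b) / len(union)


def list_labels(papers: Iterable[Iterable[str]]) -> Set[str]:
    """Collect the set of labels that appear on any paper in a list."""
    labels: Set[str] = set()
    for paper_labels in papers:
        labels.update(paper_labels)
    return labels


# Synthetic label sets: 12 labels and 7 labels sharing exactly 5.
ranked_labels = {f"topic-{i}" for i in range(12)}        # topic-0 .. topic-11
citation_labels = {f"topic-{i}" for i in range(7, 14)}   # topic-7 .. topic-13

overlap = round(jaccard(ranked_labels, citation_labels), 4)
```

The same arithmetic fits the other two pairs: 11 shared labels out of 12 distinct gives 0.9167, and 5 out of 13 gives 0.3846.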

Ranked (family)

Materialized ranking run: order by final_score descending (v0 uses heuristic signals only; semantic signals are not used).

Order: final_score DESC

Proxy stats (list-only; not relevance)

Recency: mean year 2025.17; min-max 2025-2026; share in latest two years 100.0%
Citations: mean 0.42; median 0.00; range 0-2
Topic mix: 12 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Diverse Musicological Studies, Neuroscience and Music Perception, Information Retrieval and Search Behavior
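The proxy stats are plain summaries over one short list. A minimal sketch of how such a summary could be computed follows. The dictionary keys, the `proxy_stats` name, and the `reference_year` parameter are assumptions for illustration. In particular, this page does not say how "latest two years" is anchored, so the sketch takes the anchor year as an explicit argument and counts that year and the one before it.

```python
from collections import Counter
from statistics import mean, median
from typing import Any, Dict, List


def proxy_stats(papers: List[Dict[str, Any]], reference_year: int,
                top_n: int = 5) -> Dict[str, Any]:
    """Summarize recency, citations, and topic mix for one short list.

    Each paper is assumed to be a dict with "year", "citation_count",
    and "topics" (a list of label strings); these keys are hypothetical.
    """
    years = [p["year"] for p in papers]
    cites = [p["citation_count"] for p in papers]

    # Assumed definition: the reference year and the year before it.
    latest_two = {reference_year, reference_year - 1}
    recent = sum(1 for y in years if y in latest_two)

    # Count how many papers in the list carry each label.
    label_counts = Counter(label for p in papers for label in p["topics"])

    return {
        "mean_year": round(mean(years), 2),
        "year_range": (min(years), max(years)),
        "share_latest_two_years_pct": round(100 * recent / len(years), 1),
        "citations_mean": round(mean(cites), 2),
        "citations_median": median(cites),
        "citations_range": (min(cites), max(cites)),
        "unique_labels": len(label_counts),
        "top_labels": [label for label, _ in label_counts.most_common(top_n)],
    }
```

None of these numbers depends on the order of the list, only on which papers made the cut, which is why two orderings that select the same recent papers can report identical recency rows.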

Citation baseline

Popularity-style baseline on the same pool: highest citations first (not a relevance judgment).

Order: citation_count DESC, year DESC, openalex_id ASC
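The tie-breakers in the order clause make the baseline deterministic when several papers share a citation count. A small sketch of that ordering, alongside the date baseline's `year DESC, openalex_id ASC`, is shown here. The dictionary keys mirror the column names in the order clauses, but the record shape is an assumption, and comparing `openalex_id` as a plain string is also assumed (the page does not say how the service compares ids).

```python
from typing import Any, Dict, List

Paper = Dict[str, Any]


def citation_order(pool: List[Paper]) -> List[Paper]:
    """citation_count DESC, year DESC, openalex_id ASC."""
    # Negating the numeric keys gives descending order for them while
    # the id stays ascending in the same tuple comparison.
    return sorted(
        pool,
        key=lambda p: (-p["citation_count"], -p["year"], p["openalex_id"]),
    )


def date_order(pool: List[Paper]) -> List[Paper]:
    """year DESC, openalex_id ASC."""
    return sorted(pool, key=lambda p: (-p["year"], p["openalex_id"]))


# Synthetic pool; ids and values are placeholders, not corpus data.
pool = [
    {"openalex_id": "W3", "citation_count": 5, "year": 2021},
    {"openalex_id": "W1", "citation_count": 5, "year": 2023},
    {"openalex_id": "W2", "citation_count": 5, "year": 2023},
    {"openalex_id": "W4", "citation_count": 9, "year": 2019},
]

by_citations = [p["openalex_id"] for p in citation_order(pool)]
by_date = [p["openalex_id"] for p in date_order(pool)]
```

Because both functions sort the same pool, any difference between the resulting short lists comes from the ordering alone, which is the comparison this page is built around.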

Proxy stats (list-only; not relevance)

Recency: mean year 2022.83; min-max 2021-2025; share in latest two years 33.3%
Citations: mean 10.50; median 9.50; range 3-21
Topic mix: 7 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Speech and Audio Processing, Diverse Musicological Studies, Neuroscience and Music Perception

Date baseline

Pure recency baseline on the same pool: newest year first (not a relevance judgment).

Order: year DESC, openalex_id ASC

Proxy stats (list-only; not relevance)

Recency: mean year 2025.17; min-max 2025-2026; share in latest two years 100.0%
Citations: mean 0.67; median 0.00; range 0-3
Topic mix: 11 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Diverse Musicological Studies, Neuroscience and Music Perception, Speech and Audio Processing

Generated at 2026-04-08T13:08:03.952989Z | Later work: labeled benchmarks, P@k, and freeze-at-T backtests; see the product checklist in the API at /api/v1/evaluation/summary.