Evaluation

Evaluation v0: ranked feed vs simple baselines

Emerging family | Distributional checks only

This page calls GET /api/v1/evaluation/compare so you can inspect the same candidate pool under three orderings: materialized ranking, citation-sorted, and date-sorted. Nothing here measures whether researchers would find the papers useful; the page shows only distributional checks on short lists. For Emerging, the comparison uses the materialized paper_scores order, not the bounded ML scorer reorder.

Pool size

528

Run label

shadow-generalization-product-candidate-ranking-v1

Embedding

shadow-generalization-text-embedding-v1

Generated at

2026-06-26

Run label filter: shadow-generalization-product-candidate-ranking-v1

Focus paper: https://openalex.org/W4415315857 | compare limit 12 | Visible in: date baseline.

Interpretation guardrails

Interpretation notes

These outputs help compare ranking behavior and expose drift; they are not expert-reviewed evidence that papers are useful to researchers.

  • Side-by-side lists share the same candidate pool for the selected recommendation family and corpus snapshot.
  • Recency, citation, and topic summaries are coarse proxies over the short lists shown; they do not measure whether a researcher would find a paper useful.
  • Topic overlap uses Jaccard similarity on topic labels attached to papers in this corpus, not semantic similarity of full text.
  • Use ranked outputs for product behavior; use this endpoint to sanity-check drift against naive orderings.

Run context

Run and pool

Run shadow-generalization-product-candidate-ranking-v1 | rank-83787b91ef | snapshot source-snapshot-shadow-generalization-v1-20260521 | embedding shadow-generalization-text-embedding-v1 | pool size 528

All included works in the corpus snapshot (same candidate set as the ranking run's emerging/bridge families).

Topic labels are imported metadata and can be noisy; use them as coarse navigation hints, not authoritative classifications.

List overlap

Topic label overlap between lists

Jaccard index on the set of OpenAlex topic labels appearing in the top tags of each paper in the list. High overlap means similar topic mix, not similar intellectual content.

Ranked vs citation baseline: 0.5625
Ranked vs date baseline: 0.1250
Citation vs date baseline: 0.1818
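As an illustration of the overlap measure described above, here is a minimal Python sketch of a Jaccard index over the unique topic labels of two lists. The function names, the "topics" field, and the placeholder labels are assumptions made for this example, not the endpoint's actual code. The toy sets are sized to match the unique-topic counts reported on this page (15 for the ranked list, 10 for the citation baseline); with 9 shared labels the index is 9/16 = 0.5625, which is consistent with the reported value.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def list_topic_labels(papers: list) -> set:
    """Collect the unique topic labels across all papers in one short list.

    Assumes each paper is a dict with a "topics" list of label strings
    (a hypothetical shape chosen for this sketch).
    """
    labels = set()
    for paper in papers:
        labels.update(paper.get("topics", []))
    return labels


# Placeholder labels, not real OpenAlex topics.
ranked_labels = {f"topic-{i}" for i in range(15)}        # topic-0 .. topic-14
citation_labels = {f"topic-{i}" for i in range(6, 16)}   # topic-6 .. topic-15

overlap = jaccard(ranked_labels, citation_labels)  # 9 shared / 16 total
print(f"{overlap:.4f}")
```

Because the index only compares label sets, two lists with no papers in common can still score high if their papers carry the same labels.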

Ranked (family)

List size 12

Focus paper: https://openalex.org/W4415315857 is not visible in this arm.

Materialized ranking run: order by final_score descending, then work_id (stable tie-break). Blend and signals follow this run's persisted family_weights and paper_scores (semantic may be used for Emerging when configured).

Order: final_score DESC, work_id ASC
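The ordering rule above can be sketched in Python as a single sort with a composite key. This is only an illustration: the row shape (dicts with "final_score" and "work_id") and the function name are assumptions, and the scores are made-up toy values. It assumes final_score is numeric and that work_id values are mutually comparable.

```python
def order_ranked(rows: list, limit: int = 12) -> list:
    """Sort by final_score descending, then work_id ascending, and keep the top `limit`.

    Negating the score gives descending order while work_id stays ascending,
    so rows with equal scores always come out in the same order.
    """
    return sorted(rows, key=lambda r: (-r["final_score"], r["work_id"]))[:limit]


# Toy rows with invented scores; two rows tie on final_score.
toy_rows = [
    {"work_id": 3, "final_score": 0.90},
    {"work_id": 1, "final_score": 0.90},
    {"work_id": 2, "final_score": 0.95},
]

ranked_ids = [r["work_id"] for r in order_ranked(toy_rows)]
print(ranked_ids)
```

The tie-break matters for comparisons like this page: without it, equal-scored papers could swap places between requests and shift the list-level statistics.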

Mean year

2024.8

Median cites

5.0

Unique topics

15

Proxy stats (list-only; not relevance)

Recency: mean year 2024.75; min-max 2024-2025; share in latest two years 100.0%
Citations: mean 6.67; median 5.00; range 1-25
Topic mix: 15 unique labels in list; top: Music Technology and Sound Studies, Music and Audio Processing, Diverse Musicological Studies, Color Science and Applications, Extremum Seeking Control Systems
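For readers who want to reproduce this kind of list-only summary, the following Python sketch computes comparable proxy statistics from a short list. The field names, the function name, and the definition of "latest two years" (year >= latest_year - 1, with latest_year supplied by the caller) are assumptions for this example; the page does not state how the service defines that window. The input papers are toy values and the list is assumed to be non-empty.

```python
from collections import Counter
from statistics import mean, median


def proxy_stats(papers: list, latest_year: int, top_n: int = 5) -> dict:
    """Summarize recency, citations, and topic mix for one short list.

    Assumes each paper is a dict with "year", "citation_count", and "topics".
    "Latest two years" is taken to mean latest_year - 1 through latest_year.
    """
    years = [p["year"] for p in papers]
    cites = [p["citation_count"] for p in papers]
    topic_counts = Counter(t for p in papers for t in p["topics"])
    recent = sum(1 for y in years if y >= latest_year - 1)
    return {
        "mean_year": mean(years),
        "year_range": (min(years), max(years)),
        "share_latest_two_years": 100.0 * recent / len(years),
        "mean_cites": mean(cites),
        "median_cites": median(cites),
        "cite_range": (min(cites), max(cites)),
        "unique_topics": len(topic_counts),
        "top_topics": [label for label, _ in topic_counts.most_common(top_n)],
    }


# Toy input with placeholder topic labels; not the papers shown on this page.
toy_list = [
    {"year": 2025, "citation_count": 1, "topics": ["A", "B"]},
    {"year": 2025, "citation_count": 5, "topics": ["A"]},
    {"year": 2025, "citation_count": 5, "topics": ["B", "C"]},
    {"year": 2024, "citation_count": 25, "topics": ["A"]},
]

stats = proxy_stats(toy_list, latest_year=2025, top_n=2)
print(stats)
```

These numbers describe the list itself, so they change with the compare limit and say nothing about relevance.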

Citation baseline

List size 12

Focus paper: https://openalex.org/W4415315857 is not visible in this arm.

Popularity-style baseline on the same pool: highest citations first (not a relevance judgment).

Order: citation_count DESC, year DESC, openalex_id ASC
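A rough Python sketch of this three-key baseline order follows. The row shape, the function name, and the placeholder IDs are assumptions for illustration; it also assumes citation_count and year are always present as integers, which the page does not confirm.

```python
def order_citation_baseline(rows: list, limit: int = 12) -> list:
    """Sort by citation_count descending, year descending, openalex_id ascending."""
    return sorted(
        rows,
        # Numeric keys are negated for descending order; the ID string stays ascending.
        key=lambda r: (-r["citation_count"], -r["year"], r["openalex_id"]),
    )[:limit]


# Toy rows with placeholder IDs (not real OpenAlex works).
toy_pool = [
    {"openalex_id": "W-d", "citation_count": 6, "year": 2024},
    {"openalex_id": "W-b", "citation_count": 6, "year": 2025},
    {"openalex_id": "W-c", "citation_count": 25, "year": 2024},
    {"openalex_id": "W-a", "citation_count": 6, "year": 2025},
]

baseline_ids = [r["openalex_id"] for r in order_citation_baseline(toy_pool)]
print(baseline_ids)
```

The date baseline described on this page would use the same pattern with the key reduced to year descending and openalex_id ascending.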

Mean year

2024.7

Median cites

6.0

Unique topics

10

Proxy stats (list-only; not relevance)

Recency: mean year 2024.67; min-max 2024-2025; share in latest two years 100.0%
Citations: mean 8.50; median 6.00; range 4-25
Topic mix: 10 unique labels in list; top: Music Technology and Sound Studies, Music and Audio Processing, Hearing Loss and Rehabilitation, Tactile and Sensory Interactions, Image and Video Quality Assessment

Date baseline

List size 12

Focus paper: https://openalex.org/W4415315857 appears in this arm.

Pure recency baseline on the same pool: newest year first (not a relevance judgment).

Order: year DESC, openalex_id ASC

Mean year

2026.0

Median cites

0.0

Unique topics

3

Proxy stats (list-only; not relevance)

Recency: mean year 2026.00; min-max 2026-2026; share in latest two years 100.0%
Citations: mean 0.08; median 0.00; range 0-1
Topic mix: 3 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Image Processing and 3D Reconstruction

Generated at 2026-06-26T21:26:50.576914Z. This page shows citation and date baselines plus distributional checks on short lists. For roadmap-style framing, see /api/v1/evaluation/summary.