Evaluation

Evaluation v0: ranked feed vs simple baselines

Bridge family | Distributional checks only

This page calls GET /api/v1/evaluation/compare so you can inspect the same candidate pool under three orderings: materialized ranking, citation-sorted, and date-sorted. Nothing here measures whether researchers would find the papers useful; it only shows distributional checks on short lists.
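A minimal sketch of requesting the same comparison from a script. Only the endpoint path comes from this page. The base URL, the query parameter names (`family`, `limit`), and the assumption that the response is JSON are hypothetical placeholders, so check them against the actual API before use.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

COMPARE_PATH = "/api/v1/evaluation/compare"  # endpoint path named on this page


def build_compare_url(base_url, **params):
    """Build the compare URL. Query parameter names are hypothetical."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = urljoin(base_url, COMPARE_PATH)
    return f"{url}?{query}" if query else url


def fetch_compare(base_url, **params):
    """Fetch and decode the response, assuming the endpoint returns JSON."""
    with urlopen(build_compare_url(base_url, **params)) as response:
        return json.load(response)


# Hypothetical host and parameters, for illustration only.
example_url = build_compare_url("http://localhost:8000", family="bridge", limit=12)
```

Building the URL separately from the fetch keeps the request easy to log and compare between runs.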

Pool size
528
Run label
shadow-generalization-product-candidate-ranking-v1
Embedding
shadow-generalization-text-embedding-v1
Generated at
2026-06-26

Run label filter: shadow-generalization-product-candidate-ranking-v1

Focus paper: https://openalex.org/W4405899471 | compare limit 12 | Visible in: date baseline.

Interpretation guardrails

Interpretation notes

These outputs help compare ranking behavior and expose drift; they are not expert-reviewed evidence that papers are useful to researchers.

  • Side-by-side lists share the same candidate pool for the selected recommendation family and corpus snapshot.
  • Recency, citation, and topic summaries are coarse proxies over the short lists shown; they do not measure whether a researcher would find a paper useful.
  • Topic overlap uses Jaccard similarity on topic labels attached to papers in this corpus, not semantic similarity of full text.
  • Use ranked outputs for product behavior; use this endpoint to sanity-check drift against naive orderings.

Run context

Run and pool

Run shadow-generalization-product-candidate-ranking-v1 | rank-83787b91ef | snapshot source-snapshot-shadow-generalization-v1-20260521 | embedding shadow-generalization-text-embedding-v1 | pool size 528

The pool contains all included works in the corpus snapshot (the same candidate set as the ranking run's emerging and bridge families).

Topic labels are imported metadata and can be noisy; use them as coarse navigation hints, not authoritative classifications.

Bridge family

Bridge distinctness diagnostics

  • Diagnostic only.
  • Not a researcher-reviewed usefulness benchmark.
  • Use these diagnostics to decide whether the bridge review arm deserves further evaluation.

Pinned to the same ranking run as the compare response above. Overlap and coverage are structural checks on short heads, not measures of paper usefulness.

Head overlap (Jaccard)

Full bridge vs eligible-only bridge
overlap 0; Jaccard 0.0000
Full bridge vs emerging
overlap 7; Jaccard 0.4118
Eligible-only bridge vs emerging
overlap 0; Jaccard 0.0000
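
As a worked check of the "Full bridge vs emerging" row, assuming both heads contain 12 works (the head size listed under Provenance), an overlap of 7 reproduces the reported index:

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
        = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
        = \frac{7}{12 + 12 - 7}
        = \frac{7}{17} \approx 0.4118
```

The two rows with overlap 0 give an index of 0 whatever the head sizes are, because the numerator is zero.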

Eligibility (bridge rows)

True / false / null
0 / 0 / 528

Score and signal coverage

Bridge score (non-null / null)
0 / 528
Bridge signal payload (present / missing)
0 / 528
Bridge family row count
528
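
A minimal sketch of how tallies like the eligibility and coverage counts above could be computed. The field names `bridge_eligible` and `bridge_score` are hypothetical; the page does not show the row schema.

```python
from collections import Counter


def eligibility_counts(rows):
    """Return (true, false, null) counts for a tri-state eligibility flag."""
    counts = Counter(row.get("bridge_eligible") for row in rows)
    return counts[True], counts[False], counts[None]


def score_coverage(rows):
    """Return (non_null, null) counts for the bridge score."""
    non_null = sum(1 for row in rows if row.get("bridge_score") is not None)
    return non_null, len(rows) - non_null


# Placeholder rows: 528 entries with no eligibility flag and no score,
# mirroring the all-null situation reported above.
rows = [{"bridge_eligible": None, "bridge_score": None} for _ in range(528)]
eligibility = eligibility_counts(rows)  # (0, 0, 528)
coverage = score_coverage(rows)         # (0, 528)
```

Keeping null separate from false matters here: an all-null column means eligibility was never computed, not that every row failed the check.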

Decision support (heuristic)

Suggested next step
insufficient bridge signal coverage
Eligible head differs from full bridge
Yes
Eligible head is less emerging-like than full bridge
Yes

Provenance

Active ranking run
rank-83787b91ef
Active ranking version
shadow-generalization-product-candidate-ranking-v1
Corpus snapshot
source-snapshot-shadow-generalization-v1-20260521
Embedding version
shadow-generalization-text-embedding-v1
Cluster version
not recorded
Head size
12
Generated at
2026-06-26T21:31:38.242985Z

List overlap

Topic label overlap between lists

Each value is the Jaccard index between the sets of OpenAlex topic labels that appear in the top tags of the papers in each list. High overlap means a similar topic mix, not similar intellectual content.

Ranked vs citation baseline
0.2174
Ranked vs date baseline
0.1053
Citation vs date baseline
0.1818
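
A minimal sketch of the topic-label overlap described above: take the union of labels across each list, then compute the Jaccard index of the two label sets. The `topics` field name and the sample labels are hypothetical.

```python
def topic_label_set(papers):
    """Union of topic labels across every paper in one list."""
    return {label for paper in papers for label in paper["topics"]}


def jaccard(a, b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Synthetic lists, for illustration only.
ranked = [{"topics": ["A", "B"]}, {"topics": ["B", "C"]}]
citation = [{"topics": ["B", "D"]}]
overlap = jaccard(topic_label_set(ranked), topic_label_set(citation))  # 1 / 4
```

As an arithmetic cross-check, two label sets of 18 and 10 with 5 labels in common would give 5 / (18 + 10 - 5) = 5/23, which rounds to the 0.2174 reported for ranked vs citation. The shared count of 5 is inferred from the reported numbers, not shown on this page.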

Ranked (family)

List size 12

Focus paper: https://openalex.org/W4405899471 is not visible in this arm.

Materialized ranking run: ordered by final_score descending, then work_id ascending as a stable tie-break. The blend and signals follow this run's persisted family_weights and paper_scores (the semantic signal may be used for the Emerging family when configured).

Order: final_score DESC, work_id ASC

Mean year

2025.0

Median cites

2.5

Unique topics

18

Proxy stats (list-only; not relevance)

Recency
mean year 2025.00; min-max 2024-2026; share in latest two years 91.7%
Citations
mean 2.75; median 2.50; range 0-6
Topic mix
18 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Diverse Musicological Studies, Spacecraft and Cryogenic Technologies, Fluid Dynamics Simulations and Interactions
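
A minimal sketch of how list-only proxy stats like these could be computed. It assumes that "latest two years" means the newest year in the list and the year before it, which is consistent with the three lists on this page but is not confirmed by it. The field names are hypothetical, and the sample values are synthetic, chosen only so that the output matches the summary above; the real per-paper values are not shown here.

```python
from statistics import mean, median


def proxy_stats(papers, window_years=2):
    """Summarize recency and citations for one short list."""
    years = [p["year"] for p in papers]
    cites = [p["citation_count"] for p in papers]
    newest = max(years)
    # Assumption: the recency window is anchored to the newest year in the list.
    recent = sum(1 for y in years if y > newest - window_years)
    return {
        "mean_year": mean(years),
        "year_range": (min(years), newest),
        "share_recent": recent / len(years),
        "mean_cites": mean(cites),
        "median_cites": median(cites),
        "cite_range": (min(cites), max(cites)),
    }


# Synthetic 12-item list, not the actual papers.
years = [2024] + [2025] * 10 + [2026]
cites = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6]
papers = [{"year": y, "citation_count": c} for y, c in zip(years, cites)]
stats = proxy_stats(papers)
```

With 12 items, one paper outside the window yields 11/12, or 91.7%, which shows how coarse these shares are on short lists.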

Citation baseline

List size 12

Focus paper: https://openalex.org/W4405899471 is not visible in this arm.

Popularity-style baseline on the same pool: highest citations first (not a relevance judgment).

Order: citation_count DESC, year DESC, openalex_id ASC

Mean year

2024.7

Median cites

6.0

Unique topics

10

Proxy stats (list-only; not relevance)

Recency
mean year 2024.67; min-max 2024-2025; share in latest two years 100.0%
Citations
mean 8.50; median 6.00; range 4-25
Topic mix
10 unique labels in list; top: Music Technology and Sound Studies, Music and Audio Processing, Hearing Loss and Rehabilitation, Tactile and Sensory Interactions, Image and Video Quality Assessment

Date baseline

List size 12

Focus paper: https://openalex.org/W4405899471 appears in this arm.

Pure recency baseline on the same pool: newest year first (not a relevance judgment).

Order: year DESC, openalex_id ASC
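
The three orderings on this page can be sketched as sort keys over the same pool. This is an illustration under assumptions, not the service's implementation: the row fields are taken from the "Order:" lines, work_id is assumed to be numeric, and the sample rows are synthetic.

```python
def ranked_key(row):
    # final_score DESC, work_id ASC
    return (-row["final_score"], row["work_id"])


def citation_key(row):
    # citation_count DESC, year DESC, openalex_id ASC
    return (-row["citation_count"], -row["year"], row["openalex_id"])


def date_key(row):
    # year DESC, openalex_id ASC
    return (-row["year"], row["openalex_id"])


def top_n(rows, key, limit=12):
    """Sort the shared pool with one ordering and keep the head."""
    return sorted(rows, key=key)[:limit]


# Synthetic pool, for illustration only.
pool = [
    {"work_id": 1, "openalex_id": "W3", "final_score": 0.5, "citation_count": 4, "year": 2025},
    {"work_id": 2, "openalex_id": "W1", "final_score": 0.9, "citation_count": 4, "year": 2024},
    {"work_id": 3, "openalex_id": "W2", "final_score": 0.5, "citation_count": 0, "year": 2025},
]
ranked_ids = [r["work_id"] for r in top_n(pool, ranked_key)]      # [2, 1, 3]
citation_ids = [r["work_id"] for r in top_n(pool, citation_key)]  # [1, 2, 3]
date_ids = [r["work_id"] for r in top_n(pool, date_key)]          # [3, 1, 2]
```

Negating the numeric fields gives descending order while the final identifier stays ascending, so ties always resolve the same way. String identifiers compare lexicographically here, so "W10" sorts before "W2".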

Mean year

2026.0

Median cites

0.0

Unique topics

3

Proxy stats (list-only; not relevance)

Recency
mean year 2026.00; min-max 2026-2026; share in latest two years 100.0%
Citations
mean 0.08; median 0.00; range 0-1
Topic mix
3 unique labels in list; top: Music and Audio Processing, Music Technology and Sound Studies, Image Processing and 3D Reconstruction

Generated at 2026-06-26T21:31:38.162408Z. This page shows citation and date baselines plus distributional checks on short lists. For roadmap-style framing, see /api/v1/evaluation/summary.