Paper dossier

The GigaMIDI Dataset with Features for Expressive Music Performance Detection

Paper year

2025

Paper ID: W4407236737 | core corpus | tismir

Source readout

Source and corpus status

Venue

Transactions of the International Society for Music Information Retrieval

Source slug

tismir

Corpus placement

Core corpus

Ranking readout

Where this paper lands in the current run

Run shadow-generalization-product-candidate-ranking-v1 | Top 50 surfaced

This block uses the same resolved ranking run as Recommended. Ranks here are materialized paper_scores ranks; live Emerging may be reordered by the bounded ML scorer. Family rank is global within each family, but rank is only shown when this paper lands inside the surfaced top 50.

Families present

Top 50

Run label

shadow-generalization-product-candidate-ranking-v1

Snapshot

source-snapshot-shadow-generalization-v1-20260521

Scope: family global | run rank-83787b91ef

Emerging

In top 50 at rank 18

0.475

Emerging: embedding slice fit vs included-corpus centroid (title+abstract), plus citation velocity and topic growth; not universal relevance. Bridge signal not used here.

Signals: semantic=0.8500, citation_velocity=0.1200, topic_growth=0.8160, diversity_penalty=0.0000

Why this surfaced | 3 used | 1 penalty | 1 not computed

Embedding slice fit (corpus centroid): high; used in final ranking (contribution to score: 0.1700)

Recent attention: low; used in final ranking (contribution to score: 0.0600)

Topic momentum: high; used in final ranking (contribution to score: 0.2448)

Cross-cluster signal: not computed for this run

Similarity penalty: reduces score when non-zero (contribution to score: 0.0000)
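
The Emerging score of 0.475 can be cross-checked against the printed contributions. The sketch below sums them and back-solves an implied weight per signal as contribution divided by signal value. It assumes the score is a plain sum of signed contributions; the implied weights are an inference from this readout, not documented configuration, and the dictionary names are illustrative.

```python
# Signal values and contributions copied from the Emerging readout.
signals = {"semantic": 0.8500, "citation_velocity": 0.1200, "topic_growth": 0.8160}
contributions = {"semantic": 0.1700, "citation_velocity": 0.0600, "topic_growth": 0.2448}
penalty_contribution = 0.0000  # similarity penalty, zero for this paper

# Implied weight = contribution / signal (inferred, not a documented setting).
implied_weights = {
    name: round(contributions[name] / signals[name], 4) for name in signals
}

# Assumption: the family score is the sum of signed contributions.
emerging_score = sum(contributions.values()) + penalty_contribution

print(implied_weights)
print(round(emerging_score, 3))
```

Under that assumption the sum is 0.4748, which matches the displayed 0.475 after rounding to three decimals.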

Bridge

In top 50 at rank 17

0.572

Multi-topic paper in active topics; no cluster_version on this run so bridge_score was not computed.

Signals: citation_velocity=0.1200, topic_growth=0.8160, diversity_penalty=0.0000

Why this surfaced | 2 used | 1 penalty | 2 not computed

Semantic match: not computed for this run

Recent attention: low; used in final ranking (contribution to score: 0.0420)

Topic momentum: high; used in final ranking (contribution to score: 0.5304)

Cross-cluster signal: not computed for this run

Topic breadth penalty: reduces score when non-zero (contribution to score: 0.0000)
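
For the Bridge family, two of the five signals were not computed, so the score rests on the remaining two. The sketch below shows one way such a score could be assembled while skipping missing signals. The weights 0.35 and 0.65 are hypothetical, back-solved from the printed contributions (0.0420 / 0.1200 and 0.5304 / 0.8160), and `score_from_signals` is an illustrative helper, not the run's actual scorer.

```python
# Hypothetical weights inferred from this readout; the real run config is not shown.
ASSUMED_BRIDGE_WEIGHTS = {"citation_velocity": 0.35, "topic_growth": 0.65}


def score_from_signals(signals, weights):
    """Return (per-signal contributions, total score).

    Signals that are None (not computed for the run) or that have no
    weight are skipped, so they contribute nothing to the total.
    """
    contributions = {
        name: round(weights[name] * value, 4)
        for name, value in signals.items()
        if value is not None and name in weights
    }
    return contributions, round(sum(contributions.values()), 4)


bridge_signals = {
    "semantic": None,         # not computed for this run
    "cross_cluster": None,    # no cluster_version, so no bridge_score
    "citation_velocity": 0.1200,
    "topic_growth": 0.8160,
}

contributions, total = score_from_signals(bridge_signals, ASSUMED_BRIDGE_WEIGHTS)
print(contributions)
print(total)
```

With these assumed weights the two contributions reproduce the printed 0.0420 and 0.5304, and their sum rounds to the displayed 0.572.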

Under-cited

In top 50 at rank 47

0.497

Low-cite candidate pool (see docs/candidate-pool-low-cite.md v0): core corpus, recency floor, citation ceiling, title+abstract gate; popularity penalty among pool members only. Semantic and bridge not yet modeled.

Signals: citation_velocity=0.1200, topic_growth=0.8160, diversity_penalty=0.4421

Why this surfaced | 2 used | 1 penalty | 2 not computed

Semantic match: not computed for this run

Recent attention: low; used in final ranking (contribution to score: 0.0360)

Topic momentum: high; used in final ranking (contribution to score: 0.5712)

Cross-cluster signal: not computed for this run

Pool popularity penalty: reduces score when non-zero (contribution to score: -0.1105)
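
Under-cited is the only family here with a non-zero penalty, so it is the clearest place to see what the penalty costs. The sketch below compares the score before and after the penalty and back-solves an implied penalty weight from diversity_penalty=0.4421. It assumes the score is a plain sum of signed contributions; the weight is an inference, and the variable names are illustrative.

```python
# Values copied from the Under-cited readout.
recent_attention = 0.0360
topic_momentum = 0.5712
penalty_signal = 0.4421         # diversity_penalty in the Signals line
penalty_contribution = -0.1105  # already signed in the readout

# Assumption: the family score is the sum of signed contributions.
score_before_penalty = recent_attention + topic_momentum
score_after_penalty = score_before_penalty + penalty_contribution

# Implied penalty weight (inferred from the readout, not documented).
implied_penalty_weight = round(-penalty_contribution / penalty_signal, 2)

print(round(score_before_penalty, 4))
print(round(score_after_penalty, 3))
print(implied_penalty_weight)
```

On these numbers the penalty removes about 0.11 from a pre-penalty score of roughly 0.607, which leaves the displayed 0.497.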

Abstract

The Musical Instrument Digital Interface (MIDI), introduced in 1983, revolutionized music production by allowing computers and instruments to communicate efficiently. MIDI files encode musical instructions compactly, facilitating convenient music sharing. They benefit music information retrieval (MIR), aiding in research on music understanding, computational musicology, and generative music. The GigaMIDI dataset contains over 1.4 million unique MIDI files, encompassing 1.8 billion MIDI note events and over 5.3 million MIDI tracks. GigaMIDI is currently the largest collection of symbolic music in MIDI format available for research purposes under fair dealing. Distinguishing between non‑expressive and expressive MIDI tracks is challenging, as MIDI files do not inherently make this distinction. To address this issue, we introduce a set of innovative heuristics for detecting expressive music performance. These include the distinctive note velocity ratio (DNVR) heuristic, which analyzes MIDI note velocity; the distinctive note onset deviation ratio (DNODR) heuristic, which examines deviations in note onset times; and the note onset median metric level (NOMML) heuristic, which evaluates onset positions relative to metric levels. Our evaluation demonstrates these heuristics effectively differentiate between non‑expressive and expressive MIDI tracks. Furthermore, after evaluation, we create the most substantial expressive MIDI dataset, employing our heuristic NOMML. This curated iteration of GigaMIDI encompasses expressively performed instrument tracks detected by NOMML, containing all General MIDI instruments, constituting 31% of the GigaMIDI dataset, totaling 1,655,649 tracks.
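
To make the velocity heuristic concrete, here is a minimal sketch of a DNVR-style measure. The abstract only says that DNVR analyzes MIDI note velocity; the formula used here (distinct velocity values divided by the 127 possible non-zero velocities, as a percentage) is an assumption for illustration and may differ from the authors' definition. The function name `dnvr` and both example tracks are hypothetical.

```python
def dnvr(velocities):
    """Distinctive note velocity ratio for one track, as a percentage.

    Assumption: DNVR = (number of distinct velocity values / 127) * 100.
    """
    # Keep only valid note-on velocities (1..127); 0 conventionally means note-off.
    distinct = {v for v in velocities if 1 <= v <= 127}
    return 100.0 * len(distinct) / 127


# A track with one fixed velocity, as a notation export might produce.
quantized_track = [64] * 16

# A track with varied velocities, as a recorded performance might produce.
performed_track = [52, 61, 58, 70, 66, 49, 73, 55, 68, 60, 57, 71, 63, 50, 69, 62]

print(round(dnvr(quantized_track), 2))
print(round(dnvr(performed_track), 2))
```

Under this assumed definition, a track with a single repeated velocity scores near zero, while a track with many distinct velocities scores higher. A threshold on the value would then separate the two groups.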

Authors

Keon Ju Maverick Lee
Jeff Ens
Sara Adkins
Pedro Sarmento
Mathieu Barthet
Philippe Pasquier

Neighborhood labels

Topics

3 labels

Topic labels are imported metadata and can be noisy; use them as coarse navigation hints, not authoritative classifications.

Music and Audio Processing
Music Technology and Sound Studies
Neuroscience and Music Perception

Neighbor surface

Best next moves from here

Check recommendation families

Use Recommended to see whether this paper behaves like an emerging or under-cited signal in the current ranked feed, and how it appears in the bridge preview and diagnostics view.

Inspect nearby topics

Use Trends to understand whether its attached labels are heating up or cooling down inside the curated corpus.

Cross-check evaluation baselines

Use Evaluation to compare the dossier readout against citation and recency baselines for the same resolved family run.
