mir_eval Documentation

mir_eval is a Python library which provides a transparent, standardized, and straightforward way to evaluate Music Information Retrieval systems.

If you use mir_eval in a research project, please cite the following paper:

  C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, and D. P. W. Ellis, “mir_eval: A Transparent Implementation of Common MIR Metrics”, Proceedings of the 15th International Conference on Music Information Retrieval, 2014.

Installing mir_eval

The simplest way to install mir_eval is by using pip, which will also install the required dependencies if needed. To install mir_eval using pip, simply run

pip install mir_eval

Alternatively, you can install mir_eval from source by first installing the dependencies and then running

python setup.py install

from the source directory.

If you don’t use Python and want to get started as quickly as possible, you might consider using Anaconda which makes it easy to install a Python environment which can run mir_eval.

Using mir_eval

Once you’ve installed mir_eval (see Installing mir_eval), you can import it in your Python code as follows:

import mir_eval

From here, you will typically either load in data and call the evaluate() function from the appropriate submodule like so:

reference_beats = mir_eval.io.load_events('reference_beats.txt')
estimated_beats = mir_eval.io.load_events('estimated_beats.txt')
# Scores will be a dict containing scores for all of the metrics
# implemented in mir_eval.beat.  The keys are metric names
# and values are the scores achieved
scores = mir_eval.beat.evaluate(reference_beats, estimated_beats)

or you’ll load in the data, do some preprocessing, and call specific metric functions from the appropriate submodule like so:

reference_beats = mir_eval.io.load_events('reference_beats.txt')
estimated_beats = mir_eval.io.load_events('estimated_beats.txt')
# Crop out beats before 5s, a common preprocessing step
reference_beats = mir_eval.beat.trim_beats(reference_beats)
estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
# Compute the F-measure metric and store it in f_measure
f_measure = mir_eval.beat.f_measure(reference_beats, estimated_beats)

The documentation for each metric function, found in the mir_eval section below, contains further usage information.

Alternatively, you can use the evaluator scripts, which allow you to run evaluation from the command line without writing any code. These scripts are available here:

https://github.com/craffel/mir_evaluators

mir_eval

The structure of the mir_eval Python module is as follows: Each MIR task for which evaluation metrics are included in mir_eval is given its own submodule, and each metric is defined as a separate function in each submodule. Every metric function includes detailed documentation, example usage, input validation, and references to the original paper which defined the metric (see the subsections below). The task submodules also all contain a function evaluate(), which takes as input reference and estimated annotations and returns a dictionary of scores for all of the metrics implemented (for casual users, this is the place to start). Finally, each task submodule also includes functions for common data pre-processing steps.

mir_eval also includes the following additional submodules:

  • mir_eval.io which contains convenience functions for loading in task-specific data from common file formats
  • mir_eval.util which includes miscellaneous functionality shared across the submodules
  • mir_eval.sonify which implements some simple methods for synthesizing annotations of various formats for “evaluation by ear”.
  • mir_eval.display which provides functions for plotting annotations for various tasks.

The following subsections document each submodule.

mir_eval.beat

The aim of a beat detection algorithm is to report the times at which a typical human listener might tap their foot to a piece of music. As a result, most metrics for evaluating the performance of beat tracking systems involve computing the error between the estimated beat times and some reference list of beat locations. Many metrics additionally compare the beat sequences at different metric levels in order to deal with the ambiguity of tempo.

Based on the methods described in:
Matthew E. P. Davies, Norberto Degara, and Mark D. Plumbley. “Evaluation Methods for Musical Audio Beat Tracking Algorithms”, Queen Mary University of London Technical Report C4DM-TR-09-06 London, United Kingdom, 8 October 2009.
See also the Beat Evaluation Toolbox:
https://code.soundsoftware.ac.uk/projects/beat-evaluation/

Conventions

Beat times should be provided in the form of a 1-dimensional array of beat times in seconds in increasing order. Typically, any beats which occur before 5s are ignored; this can be accomplished using mir_eval.beat.trim_beats().
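
As a minimal sketch of this convention (the beat times below are made up for illustration), a hand-written array can be trimmed with the default 5s cutoff:

>>> import numpy as np
>>> # Beat times in seconds, 1-dimensional and increasing
>>> reference_beats = np.array([0.5, 1.0, 1.5, 4.8, 5.5, 6.0, 6.5])
>>> # Drop beats occurring before 5s (the default min_beat_time)
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)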

Metrics

  • mir_eval.beat.f_measure(): The F-measure of the beat sequence, where an estimated beat is considered correct if it is sufficiently close to a reference beat
  • mir_eval.beat.cemgil(): Cemgil’s score, which computes the sum of Gaussian errors for each beat
  • mir_eval.beat.goto(): Goto’s score, a binary score which is 1 when at least 25% of the estimated beat sequence closely matches the reference beat sequence
  • mir_eval.beat.p_score(): McKinney’s P-score, which computes the cross-correlation of the estimated and reference beat sequences represented as impulse trains
  • mir_eval.beat.continuity(): Continuity-based scores which compute the proportion of the beat sequence which is continuously correct
  • mir_eval.beat.information_gain(): The Information Gain of a normalized beat error histogram over a uniform distribution

mir_eval.beat.trim_beats(beats, min_beat_time=5.0)

Removes beats before min_beat_time. A common preprocessing step.

Parameters:

beats : np.ndarray

Array of beat times in seconds.

min_beat_time : float

Minimum beat time to allow (Default value = 5.)

Returns:

beats_trimmed : np.ndarray

Trimmed beat array.

mir_eval.beat.validate(reference_beats, estimated_beats)

Checks that the input annotations to a metric look like valid beat time arrays, and throws helpful errors if not.

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

estimated beat times, in seconds

mir_eval.beat.f_measure(reference_beats, estimated_beats, f_measure_threshold=0.07)

Compute the F-measure of correctly vs. incorrectly predicted beats. “Correctness” is determined over a small window.

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

estimated beat times, in seconds

f_measure_threshold : float

Window size, in seconds (Default value = 0.07)

Returns:

f_score : float

The computed F-measure score

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> f_measure = mir_eval.beat.f_measure(reference_beats,
                                        estimated_beats)
mir_eval.beat.cemgil(reference_beats, estimated_beats, cemgil_sigma=0.04)

Cemgil’s score, which computes a Gaussian error for each estimated beat. Compares against the original beat times and all metrical variations.

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

query beat times, in seconds

cemgil_sigma : float

Sigma parameter of gaussian error windows (Default value = 0.04)

Returns:

cemgil_score : float

Cemgil’s score for the original reference beats

cemgil_max : float

The best Cemgil score for all metrical variations

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> cemgil_score, cemgil_max = mir_eval.beat.cemgil(reference_beats,
                                                    estimated_beats)
mir_eval.beat.goto(reference_beats, estimated_beats, goto_threshold=0.35, goto_mu=0.2, goto_sigma=0.2)

Calculate Goto’s score, a binary 1 or 0 depending on some specific heuristic criteria

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

query beat times, in seconds

goto_threshold : float

Threshold of beat error for a beat to be “correct” (Default value = 0.35)

goto_mu : float

The mean of the beat errors in the continuously correct track must be less than this (Default value = 0.2)

goto_sigma : float

The std of the beat errors in the continuously correct track must be less than this (Default value = 0.2)

Returns:

goto_score : float

1.0 if the specific criteria are met, 0.0 otherwise

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> goto_score = mir_eval.beat.goto(reference_beats, estimated_beats)
mir_eval.beat.p_score(reference_beats, estimated_beats, p_score_threshold=0.2)

Get McKinney’s P-score. Based on the cross-correlation of the reference and estimated beat sequences.

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

query beat times, in seconds

p_score_threshold : float

Window size will be p_score_threshold*np.median(inter_annotation_intervals) (Default value = 0.2)

Returns:

correlation : float

McKinney’s P-score

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> p_score = mir_eval.beat.p_score(reference_beats, estimated_beats)
mir_eval.beat.continuity(reference_beats, estimated_beats, continuity_phase_threshold=0.175, continuity_period_threshold=0.175)

Get metrics based on how much of the estimated beat sequence is continually correct.

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

query beat times, in seconds

continuity_phase_threshold : float

Allowable ratio of how far the estimated beat can be from the reference beat (Default value = 0.175)

continuity_period_threshold : float

Allowable distance between the inter-beat-interval and the inter-annotation-interval (Default value = 0.175)

Returns:

CMLc : float

Correct metric level, continuous accuracy

CMLt : float

Correct metric level, total accuracy (continuity not required)

AMLc : float

Any metric level, continuous accuracy

AMLt : float

Any metric level, total accuracy (continuity not required)

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> CMLc, CMLt, AMLc, AMLt = mir_eval.beat.continuity(reference_beats,
                                                      estimated_beats)
mir_eval.beat.information_gain(reference_beats, estimated_beats, bins=41)

Get the information gain - K-L divergence of the beat error histogram to a uniform histogram

Parameters:

reference_beats : np.ndarray

reference beat times, in seconds

estimated_beats : np.ndarray

query beat times, in seconds

bins : int

Number of bins in the beat error histogram (Default value = 41)

Returns:

information_gain_score : float

Entropy of beat error histogram

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> reference_beats = mir_eval.beat.trim_beats(reference_beats)
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> estimated_beats = mir_eval.beat.trim_beats(estimated_beats)
>>> information_gain = mir_eval.beat.information_gain(reference_beats,
                                                      estimated_beats)
mir_eval.beat.evaluate(reference_beats, estimated_beats, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

reference_beats : np.ndarray

Reference beat times, in seconds

estimated_beats : np.ndarray

Query beat times, in seconds

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> reference_beats = mir_eval.io.load_events('reference.txt')
>>> estimated_beats = mir_eval.io.load_events('estimated.txt')
>>> scores = mir_eval.beat.evaluate(reference_beats, estimated_beats)

mir_eval.chord

Chord estimation algorithms produce a list of intervals and labels which denote the chord being played over each timespan. They are evaluated by comparing the estimated chord labels to some reference, usually using a mapping to a chord subalphabet (e.g. minor and major chords only, all triads, etc.). There is no single ‘right’ way to compare two sequences of chord labels. Embracing this reality, every conventional comparison rule is provided. Comparisons are made over the different components of each chord (e.g. G:maj(6)/5): the root (G), the root-invariant active semitones as determined by the quality shorthand (maj) and scale degrees (6), and the bass interval (5). This submodule provides functions both for comparing sequences of chord labels according to some chord subalphabet mapping and for using these comparisons to score a sequence of estimated chords against a reference.

Conventions

A sequence of chord labels is represented as a list of strings, where each label is the chord name based on the syntax of [1]. Reference and estimated chord label sequences should be of the same length for comparison functions. When converting the chord string into its constituent parts,

  • Pitch class counting starts at C, e.g. C:0, D:2, E:4, F:5, etc.
  • Scale degree is represented as a string of the diatonic interval, relative to the root note, e.g. ‘b6’, ‘#5’, or ‘7’
  • Bass intervals are represented as strings
  • Chord bitmaps are positional binary vectors indicating active pitch classes and may be absolute or relative depending on context in the code.

If no chord is present at a given point in time, it should have the label ‘N’, which is defined in the variable mir_eval.chord.NO_CHORD.
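
As a minimal sketch of the label-sequence convention (the labels below are made up for this example), a comparison function takes equal-length reference and estimated label lists and returns one score per label pair, which can then be weighted; here equal weights stand in for the interval durations used in practice:

>>> import numpy as np
>>> ref_labels = ['N', 'C:maj', 'C:maj', 'G:min7']
>>> est_labels = ['N', 'C:maj', 'C:maj7', 'G:min']
>>> comparisons = mir_eval.chord.majmin(ref_labels, est_labels)
>>> # Equal weights for illustration only; normally pass interval durations
>>> score = mir_eval.chord.weighted_accuracy(comparisons, np.ones(len(ref_labels)))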

Metrics

  • mir_eval.chord.root(): Only compares the root of the chords.
  • mir_eval.chord.majmin(): Only compares major, minor, and “no chord” labels.
  • mir_eval.chord.majmin_inv(): Compares major/minor chords, with inversions. The bass note must exist in the triad.
  • mir_eval.chord.mirex(): An estimated chord is considered correct if it shares at least three pitch classes in common with the reference chord.
  • mir_eval.chord.thirds(): Chords are compared at the level of major or minor thirds (root and third). For example, both (‘A:7’, ‘A:maj’) and (‘A:min’, ‘A:dim’) are equivalent, as the third is major and minor in quality, respectively.
  • mir_eval.chord.thirds_inv(): Same as above, with inversions (bass relationships).
  • mir_eval.chord.triads(): Chords are considered at the level of triads (major, minor, augmented, diminished, suspended), meaning that, in addition to the root, the quality is only considered through #5th scale degree (for augmented chords). For example, (‘A:7’, ‘A:maj’) are equivalent, while (‘A:min’, ‘A:dim’) and (‘A:aug’, ‘A:maj’) are not.
  • mir_eval.chord.triads_inv(): Same as above, with inversions (bass relationships).
  • mir_eval.chord.tetrads(): Chords are considered at the level of the entire quality in closed voicing, i.e. spanning only a single octave; extended chords (9’s, 11’s and 13’s) are rolled into a single octave with any upper voices included as extensions. For example, (‘A:7’, ‘A:9’) are equivalent but (‘A:7’, ‘A:maj7’) are not.
  • mir_eval.chord.tetrads_inv(): Same as above, with inversions (bass relationships).
  • mir_eval.chord.sevenths(): Compares according to MIREX “sevenths” rules; that is, only major, major seventh, seventh, minor, minor seventh and no chord labels are compared.
  • mir_eval.chord.sevenths_inv(): Same as above, with inversions (bass relationships).
  • mir_eval.chord.overseg(): Computes the level of over-segmentation between estimated and reference intervals.
  • mir_eval.chord.underseg(): Computes the level of under-segmentation between estimated and reference intervals.
  • mir_eval.chord.seg(): Computes the minimum of over- and under-segmentation between estimated and reference intervals.

References

[1] C. Harte. Towards Automatic Extraction of Harmony Information from Music Signals. PhD thesis, Queen Mary University of London, August 2010.

exception mir_eval.chord.InvalidChordException(message='', chord_label=None)

Bases: exceptions.Exception

Exception class for suspect / invalid chord labels

mir_eval.chord.pitch_class_to_semitone(pitch_class)

Convert a pitch class to semitone.

Parameters:

pitch_class : str

Spelling of a given pitch class, e.g. ‘C#’, ‘Gbb’

Returns:

semitone : int

Semitone value of the pitch class.
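
Examples

A brief illustrative sketch; the expected values in the comments follow the pitch class convention above, where C is 0:

>>> semitone = mir_eval.chord.pitch_class_to_semitone('C#')   # expected: 1
>>> semitone = mir_eval.chord.pitch_class_to_semitone('Gbb')  # expected: 5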

mir_eval.chord.scale_degree_to_semitone(scale_degree)

Convert a scale degree to semitone.

Parameters:

scale_degree : str

Spelling of a relative scale degree, e.g. ‘b3’, ‘7’, ‘#5’

Returns:

semitone : int

Relative semitone of the scale degree, wrapped to a single octave

Raises:

InvalidChordException if `scale_degree` is invalid.
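
Examples

A brief illustrative sketch; the expected values in the comments count semitones above the root:

>>> semitone = mir_eval.chord.scale_degree_to_semitone('b3')  # expected: 3
>>> semitone = mir_eval.chord.scale_degree_to_semitone('#5')  # expected: 8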

mir_eval.chord.scale_degree_to_bitmap(scale_degree, modulo=False, length=12)

Create a bitmap representation of a scale degree.

Note that values in the bitmap may be negative, indicating that the semitone is to be removed.

Parameters:

scale_degree : str

Spelling of a relative scale degree, e.g. ‘b3’, ‘7’, ‘#5’

modulo : bool, default=False

If a scale degree exceeds the length of the bit-vector, modulo the scale degree back into the bit-vector; otherwise it is discarded.

length : int, default=12

Length of the bit-vector to produce

Returns:

bitmap : np.ndarray, in [-1, 0, 1], len=`length`

Bitmap representation of this scale degree.
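
Examples

A brief illustrative sketch; the comment describes the layout expected under the conventions above:

>>> # A length-12 vector with a 1 at the semitone of the scale degree
>>> # ('b7' corresponds to semitone 10 above the root)
>>> bitmap = mir_eval.chord.scale_degree_to_bitmap('b7')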

mir_eval.chord.quality_to_bitmap(quality)

Return the bitmap for a given quality.

Parameters:

quality : str

Chord quality name.

Returns:

bitmap : np.ndarray

Bitmap representation of this quality (12-dim).
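
Examples

A brief illustrative sketch; the commented bitmap is the expected root-relative major triad shape:

>>> # Expected: [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0] (root, major third, fifth)
>>> bitmap = mir_eval.chord.quality_to_bitmap('maj')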

mir_eval.chord.reduce_extended_quality(quality)

Map an extended chord quality to a simpler one, moving upper voices to a set of scale degree extensions.

Parameters:

quality : str

Extended chord quality to reduce.

Returns:

base_quality : str

New chord quality.

extensions : set

Scale degrees extensions for the quality.

mir_eval.chord.validate_chord_label(chord_label)

Test for well-formedness of a chord label.

Parameters:

chord_label : str

Chord label to validate.

mir_eval.chord.split(chord_label, reduce_extended_chords=False)
Parse a chord label into its four constituent parts:
  • root
  • quality shorthand
  • scale degrees
  • bass
Note: Chords lacking quality AND interval information are major.
  • If a quality is specified, it is returned.
  • If an interval is specified WITHOUT a quality, the quality field is empty.

Some examples:

'C' -> ['C', 'maj', {}, '1']
'G#:min(*b3,*5)/5' -> ['G#', 'min', {'*b3', '*5'}, '5']
'A:(3)/6' -> ['A', '', {'3'}, '6']

Parameters:

chord_label : str

A chord label.

reduce_extended_chords : bool

Whether to map the upper voicings of extended chords (9’s, 11’s, 13’s) to semitone extensions. (Default value = False)

Returns:

chord_parts : list

Split version of the chord label.

mir_eval.chord.join(chord_root, quality='', extensions=None, bass='')

Join the parts of a chord into a complete chord label.

Parameters:

chord_root : str

Root pitch class of the chord, e.g. ‘C’, ‘Eb’

quality : str

Quality of the chord, e.g. ‘maj’, ‘hdim7’ (Default value = ‘’)

extensions : list

Any added or absent scale degrees for this chord, e.g. [‘4’, ‘*3’] (Default value = None)

bass : str

Scale degree of the bass note, e.g. ‘5’. (Default value = ‘’)

Returns:

chord_label : str

A complete chord label.

mir_eval.chord.encode(chord_label, reduce_extended_chords=False, strict_bass_intervals=False)

Translate a chord label to numerical representations for evaluation.

Parameters:

chord_label : str

Chord label to encode.

reduce_extended_chords : bool

Whether to map the upper voicings of extended chords (9’s, 11’s, 13’s) to semitone extensions. (Default value = False)

strict_bass_intervals : bool

Whether to require that the bass scale degree is present in the chord. (Default value = False)

Returns:

root_number : int

Absolute semitone of the chord’s root.

semitone_bitmap : np.ndarray, dtype=int

12-dim vector of relative semitones in the chord spelling.

bass_number : int

Relative semitone of the chord’s bass note, e.g. 0=root, 7=fifth, etc.
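
Examples

A brief sketch using the label from the submodule introduction; the expected values in the comments assume the pitch class and scale degree conventions above:

>>> root, semitones, bass = mir_eval.chord.encode('G:maj(6)/5')
>>> # root is expected to be 7 (G), semitones a 12-dim bitmap of the chord's
>>> # pitch classes relative to the root, and bass 7 (the '5' scale degree)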

mir_eval.chord.encode_many(chord_labels, reduce_extended_chords=False)

Translate a set of chord labels to numerical representations for sane evaluation.

Parameters:

chord_labels : list

Set of chord labels to encode.

reduce_extended_chords : bool

Whether to map the upper voicings of extended chords (9’s, 11’s, 13’s) to semitone extensions. (Default value = False)

Returns:

root_number : np.ndarray, dtype=int

Absolute semitone of the chord’s root.

interval_bitmap : np.ndarray, dtype=int

12-dim vector of relative semitones in the given chord quality.

bass_number : np.ndarray, dtype=int

Relative semitones of the chord’s bass notes.

mir_eval.chord.rotate_bitmap_to_root(bitmap, chord_root)

Circularly shift a relative bitmap to its absolute pitch classes.

For clarity, the best explanation is an example. Given ‘G:Maj’, the root and quality map are as follows:

root=7
quality=[1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # Relative chord shape

After rotating to the root, the resulting bitmap becomes:

abs_quality = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]  # G, B, and D

Parameters:

bitmap : np.ndarray, shape=(12,)

Bitmap of active notes, relative to the given root.

chord_root : int

Absolute pitch class number.

Returns:

bitmap : np.ndarray, shape=(12,)

Absolute bitmap of active pitch classes.
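
Examples

A brief sketch reproducing the ‘G:Maj’ example above:

>>> import numpy as np
>>> maj_bitmap = np.array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])  # relative major triad
>>> g_maj = mir_eval.chord.rotate_bitmap_to_root(maj_bitmap, 7)   # absolute G, B, D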

mir_eval.chord.rotate_bitmaps_to_roots(bitmaps, roots)

Circularly shift relative bitmaps to absolute pitch classes.

See rotate_bitmap_to_root() for more information.

Parameters:

bitmaps : np.ndarray, shape=(N, 12)

Bitmaps of active notes, relative to the given roots.

roots : np.ndarray, shape=(N,)

Absolute pitch class numbers.

Returns:

bitmap : np.ndarray, shape=(N, 12)

Absolute bitmaps of active pitch classes.

mir_eval.chord.validate(reference_labels, estimated_labels)

Checks that the input annotations to a comparison function look like valid chord labels.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

mir_eval.chord.weighted_accuracy(comparisons, weights)

Compute the weighted accuracy of a list of chord comparisons.

Parameters:

comparisons : np.ndarray

List of chord comparison scores, in [0, 1] or -1

weights : np.ndarray

Weights (not necessarily normalized) for each comparison. This can be a list of interval durations

Returns:

score : float

Weighted accuracy

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> # Here, we're using the "thirds" function to compare labels
>>> # but any of the comparison functions would work.
>>> comparisons = mir_eval.chord.thirds(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.thirds(reference_labels, estimated_labels)

Compare chords along root & third relationships.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.thirds(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.thirds_inv(reference_labels, estimated_labels)

Score chords along root, third, & bass relationships.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.thirds_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.triads(reference_labels, estimated_labels)

Compare chords along triad (root & quality to #5) relationships.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.triads(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.triads_inv(reference_labels, estimated_labels)

Score chords along triad (root, quality to #5, & bass) relationships.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.triads_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.tetrads(reference_labels, estimated_labels)

Compare chords along tetrad (root & full quality) relationships.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.tetrads(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.tetrads_inv(reference_labels, estimated_labels)

Compare chords along tetrad (root, full quality, & bass) relationships.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.tetrads_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.root(reference_labels, estimated_labels)

Compare chords according to roots.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.root(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.mirex(reference_labels, estimated_labels)

Compare chords along MIREX rules.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0]

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.mirex(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.majmin(reference_labels, estimated_labels)

Compare chords along major-minor rules. Chords with qualities outside Major/minor/no-chord are ignored.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.majmin(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.majmin_inv(reference_labels, estimated_labels)

Compare chords along major-minor rules, with inversions. Chords with qualities outside Major/minor/no-chord are ignored, and the bass note must exist in the triad (bass in [1, 3, 5]).

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.majmin_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.sevenths(reference_labels, estimated_labels)

Compare chords along MIREX ‘sevenths’ rules. Chords with qualities outside [maj, maj7, 7, min, min7, N] are ignored.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.sevenths(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.sevenths_inv(reference_labels, estimated_labels)

Compare chords along MIREX ‘sevenths’ rules, with inversions (bass relationships). Chords with qualities outside [maj, maj7, 7, min, min7, N] are ignored.

Parameters:

reference_labels : list, len=n

Reference chord labels to score against.

estimated_labels : list, len=n

Estimated chord labels to score against.

Returns:

comparison_scores : np.ndarray, shape=(n,), dtype=float

Comparison scores, in [0.0, 1.0], or -1 if the comparison is out of gamut.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> est_intervals, est_labels = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, ref_intervals.min(),
...     ref_intervals.max(), mir_eval.chord.NO_CHORD,
...     mir_eval.chord.NO_CHORD)
>>> (intervals,
...  ref_labels,
...  est_labels) = mir_eval.util.merge_labeled_intervals(
...      ref_intervals, ref_labels, est_intervals, est_labels)
>>> durations = mir_eval.util.intervals_to_durations(intervals)
>>> comparisons = mir_eval.chord.sevenths_inv(ref_labels, est_labels)
>>> score = mir_eval.chord.weighted_accuracy(comparisons, durations)
mir_eval.chord.directional_hamming_distance(reference_intervals, estimated_intervals)

Compute the directional hamming distance between reference and estimated intervals as defined by [1] and used for MIREX ‘OverSeg’, ‘UnderSeg’ and ‘MeanSeg’ measures.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2), dtype=float

Reference chord intervals to score against.

estimated_intervals : np.ndarray, shape=(m, 2), dtype=float

Estimated chord intervals to score against.

Returns:

directional hamming distance : float

directional hamming distance between reference intervals and estimated intervals.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> overseg = 1 - mir_eval.chord.directional_hamming_distance(
...     ref_intervals, est_intervals)
>>> underseg = 1 - mir_eval.chord.directional_hamming_distance(
...     est_intervals, ref_intervals)
>>> seg = min(overseg, underseg)
mir_eval.chord.overseg(reference_intervals, estimated_intervals)

Compute the MIREX ‘OverSeg’ score.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2), dtype=float

Reference chord intervals to score against.

estimated_intervals : np.ndarray, shape=(m, 2), dtype=float

Estimated chord intervals to score against.

Returns:

oversegmentation score : float

Comparison score, in [0.0, 1.0], where 1.0 means no oversegmentation.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> score = mir_eval.chord.overseg(ref_intervals, est_intervals)
mir_eval.chord.underseg(reference_intervals, estimated_intervals)

Compute the MIREX ‘UnderSeg’ score.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2), dtype=float

Reference chord intervals to score against.

estimated_intervals : np.ndarray, shape=(m, 2), dtype=float

Estimated chord intervals to score against.

Returns:

undersegmentation score : float

Comparison score, in [0.0, 1.0], where 1.0 means no undersegmentation.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> score = mir_eval.chord.underseg(ref_intervals, est_intervals)
mir_eval.chord.seg(reference_intervals, estimated_intervals)

Compute the MIREX ‘MeanSeg’ score.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2), dtype=float

Reference chord intervals to score against.

estimated_intervals : np.ndarray, shape=(m, 2), dtype=float

Estimated chord intervals to score against.

Returns:

segmentation score : float

Comparison score, in [0.0, 1.0], where 1.0 means perfect segmentation.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> score = mir_eval.chord.seg(ref_intervals, est_intervals)
mir_eval.chord.merge_chord_intervals(intervals, labels)

Merge consecutive chord intervals if they represent the same chord.

Parameters:

intervals : np.ndarray, shape=(n, 2), dtype=float

Chord intervals to be merged, in the format returned by mir_eval.io.load_labeled_intervals().

labels : list, shape=(n,)

Chord labels to be merged, in the format returned by mir_eval.io.load_labeled_intervals().

Returns:

merged_ivs : np.ndarray, shape=(k, 2), dtype=float

Merged chord intervals, k <= n

mir_eval.chord.evaluate(ref_intervals, ref_labels, est_intervals, est_labels, **kwargs)

Computes weighted accuracy for all comparison functions for the given reference and estimated annotations.

Parameters:

ref_intervals : np.ndarray, shape=(n, 2)

Reference chord intervals, in the format returned by mir_eval.io.load_labeled_intervals().

ref_labels : list, shape=(n,)

reference chord labels, in the format returned by mir_eval.io.load_labeled_intervals().

est_intervals : np.ndarray, shape=(m, 2)

estimated chord intervals, in the format returned by mir_eval.io.load_labeled_intervals().

est_labels : list, shape=(m,)

estimated chord labels, in the format returned by mir_eval.io.load_labeled_intervals().

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> scores = mir_eval.chord.evaluate(ref_intervals, ref_labels,
...                                  est_intervals, est_labels)

mir_eval.melody

Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. For evaluation, an estimated pitch series is evaluated against a reference based on whether the voicing (melody present or not) and the pitch are correct (within some tolerance).

For a detailed explanation of the measures please refer to:
J. Salamon, E. Gomez, D. P. W. Ellis and G. Richard, “Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges”, IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.

Conventions

Melody annotations are assumed to be given in the format of a 1d array of frequency values which are accompanied by a 1d array of times denoting when each frequency value occurs. In a reference melody time series, a frequency value of 0 denotes “unvoiced”. In an estimated melody time series, unvoiced frames can be indicated either by 0 Hz or by a negative Hz value; negative values represent the algorithm’s pitch estimate for frames it has determined to be unvoiced, in case they are in fact voiced.

Metrics are computed using a sequence of reference and estimated pitches in cents and boolean voicing arrays, both of which are sampled to the same timebase. The function mir_eval.melody.to_cent_voicing() can be used to convert a sequence of estimated and reference times and frequency values in Hz to boolean voicing arrays and frequency arrays in the format required by the metric functions. By default, the convention is to resample the estimated melody time series to the reference melody time series’ timebase.
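
As a minimal sketch of this convention (the times and frequencies below are made up for this example):

>>> import numpy as np
>>> ref_time = np.array([0.00, 0.01, 0.02, 0.03])
>>> ref_freq = np.array([440.0, 441.0, 0.0, 0.0])      # 0 Hz marks unvoiced frames
>>> est_time = np.array([0.00, 0.01, 0.02, 0.03])
>>> est_freq = np.array([441.0, 442.0, -440.0, 0.0])   # negative = unvoiced, with a pitch guess
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time, ref_freq,
...                                                  est_time, est_freq)
>>> raw_pitch = mir_eval.melody.raw_pitch_accuracy(ref_v, ref_c, est_v, est_c)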

Metrics

  • mir_eval.melody.voicing_measures(): Voicing measures, including the recall rate (proportion of frames labeled as melody frames in the reference that are estimated as melody frames) and the false alarm rate (proportion of frames labeled as non-melody in the reference that are mistakenly estimated as melody frames)
  • mir_eval.melody.raw_pitch_accuracy(): Raw Pitch Accuracy, which computes the proportion of melody frames in the reference for which the frequency is considered correct (i.e. within half a semitone of the reference frequency)
  • mir_eval.melody.raw_chroma_accuracy(): Raw Chroma Accuracy, where the estimated and reference frequency sequences are mapped onto a single octave before computing the raw pitch accuracy
  • mir_eval.melody.overall_accuracy(): Overall Accuracy, which computes the proportion of all frames correctly estimated by the algorithm, including whether non-melody frames were labeled by the algorithm as non-melody

mir_eval.melody.validate_voicing(ref_voicing, est_voicing)

Checks that voicing inputs to a metric are in the correct format.

Parameters:

ref_voicing : np.ndarray

Reference boolean voicing array

est_voicing : np.ndarray

Estimated boolean voicing array

mir_eval.melody.validate(ref_voicing, ref_cent, est_voicing, est_cent)

Checks that voicing and frequency arrays are well-formed. To be used in conjunction with mir_eval.melody.validate_voicing()

Parameters:

ref_voicing : np.ndarray

Reference boolean voicing array

ref_cent : np.ndarray

Reference pitch sequence in cents

est_voicing : np.ndarray

Estimated boolean voicing array

est_cent : np.ndarray

Estimate pitch sequence in cents

mir_eval.melody.hz2cents(freq_hz, base_frequency=10.0)

Convert an array of frequency values in Hz to cents. 0 values are left in place.

Parameters:

freq_hz : np.ndarray

Array of frequencies in Hz.

base_frequency : float

Base frequency for conversion. (Default value = 10.0)

Returns:

cent : np.ndarray

Array of frequencies in cents, relative to base_frequency
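
Examples

A brief sketch; each nonzero value is converted as 1200 * log2(f / base_frequency), so octave-related frequencies end up 1200 cents apart:

>>> import numpy as np
>>> freqs_hz = np.array([0.0, 110.0, 220.0, 440.0])
>>> freqs_cents = mir_eval.melody.hz2cents(freqs_hz)  # the 0.0 entry is left at 0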

mir_eval.melody.freq_to_voicing(frequencies)

Convert from an array of frequency values to a frequency array + voiced/unvoiced boolean array

Parameters:

frequencies : np.ndarray

Array of frequencies. A frequency <= 0 indicates “unvoiced”.

Returns:

frequencies : np.ndarray

Array of frequencies, all >= 0.

voiced : np.ndarray

Boolean array, same length as frequencies, which indicates voiced or unvoiced
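
Examples

A brief sketch; per the conventions above, negative values are treated as unvoiced pitch estimates and are returned as their absolute values:

>>> import numpy as np
>>> freqs, voiced = mir_eval.melody.freq_to_voicing(np.array([440.0, -220.0, 0.0]))
>>> # freqs is expected to be [440., 220., 0.] and voiced [True, False, False]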

mir_eval.melody.constant_hop_timebase(hop, end_time)

Generates a time series from 0 to end_time with times spaced hop apart

Parameters:

hop : float

Spacing of samples in the time series

end_time : float

Time series will span [0, end_time]

Returns:

times : np.ndarray

Generated timebase
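
Examples

A brief sketch producing a 10 ms timebase over the first half second:

>>> times = mir_eval.melody.constant_hop_timebase(hop=0.01, end_time=0.5)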

mir_eval.melody.resample_melody_series(times, frequencies, voicing, times_new, kind='linear')

Resamples frequency and voicing time series to a new timescale. Maintains any zero (“unvoiced”) values in frequencies.

If times and times_new are equivalent, no resampling will be performed.

Parameters:

times : np.ndarray

Times of each frequency value

frequencies : np.ndarray

Array of frequency values, >= 0

voicing : np.ndarray

Boolean array which indicates voiced or unvoiced

times_new : np.ndarray

Times to resample frequency and voicing sequences to

kind : str

kind parameter to pass to scipy.interpolate.interp1d. (Default value = ‘linear’)

Returns:

frequencies_resampled : np.ndarray

Frequency array resampled to new timebase

voicing_resampled : np.ndarray, dtype=bool

Boolean voicing array resampled to new timebase

mir_eval.melody.to_cent_voicing(ref_time, ref_freq, est_time, est_freq, base_frequency=10.0, hop=None, kind='linear')

Converts reference and estimated time/frequency (Hz) annotations to sampled frequency (cent)/voicing arrays.

A zero frequency indicates “unvoiced”.

A negative frequency indicates “Predicted as unvoiced, but if it’s voiced, this is the frequency estimate”.

Parameters:

ref_time : np.ndarray

Time of each reference frequency value

ref_freq : np.ndarray

Array of reference frequency values

est_time : np.ndarray

Time of each estimated frequency value

est_freq : np.ndarray

Array of estimated frequency values

base_frequency : float

Base frequency in Hz for conversion to cents (Default value = 10.)

hop : float

Hop size, in seconds, to resample to; if None (default), the reference timebase ref_time is used

kind : str

kind parameter to pass to scipy.interpolate.interp1d. (Default value = ‘linear’)

Returns:

ref_voicing : np.ndarray, dtype=bool

Resampled reference boolean voicing array

ref_cent : np.ndarray

Resampled reference frequency (cent) array

est_voicing : np.ndarray, dtype=bool

Resampled estimated boolean voicing array

est_cent : np.ndarray

Resampled estimated frequency (cent) array

mir_eval.melody.voicing_measures(ref_voicing, est_voicing)

Compute the voicing recall and false alarm rates given two voicing indicator sequences, one as reference (truth) and the other as the estimate (prediction). The sequences must be of the same length.

Parameters:

ref_voicing : np.ndarray

Reference boolean voicing array

est_voicing : np.ndarray

Estimated boolean voicing array

Returns:

vx_recall : float

Voicing recall rate, the fraction of voiced frames in ref indicated as voiced in est

vx_false_alarm : float

Voicing false alarm rate, the fraction of unvoiced frames in ref indicated as voiced in est

Examples

>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> recall, false_alarm = mir_eval.melody.voicing_measures(ref_v,
...                                                        est_v)
mir_eval.melody.raw_pitch_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, cent_tolerance=50)

Compute the raw pitch accuracy given two pitch (frequency) sequences in cents and matching voicing indicator sequences. The first pitch and voicing arrays are treated as the reference (truth), and the second two as the estimate (prediction). All 4 sequences must be of the same length.

Parameters:

ref_voicing : np.ndarray

Reference boolean voicing array

ref_cent : np.ndarray

Reference pitch sequence in cents

est_voicing : np.ndarray

Estimated boolean voicing array

est_cent : np.ndarray

Estimate pitch sequence in cents

cent_tolerance : float

Maximum absolute deviation for a cent value to be considered correct (Default value = 50)

Returns:

raw_pitch : float

Raw pitch accuracy, the fraction of voiced frames in ref_cent for which est_cent provides a correct frequency value (within cent_tolerance cents).

Examples

>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> raw_pitch = mir_eval.melody.raw_pitch_accuracy(ref_v, ref_c,
...                                                est_v, est_c)
mir_eval.melody.raw_chroma_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, cent_tolerance=50)

Compute the raw chroma accuracy given two pitch (frequency) sequences in cents and matching voicing indicator sequences. The first pitch and voicing arrays are treated as the reference (truth), and the second two as the estimate (prediction). All 4 sequences must be of the same length.

Parameters:

ref_voicing : np.ndarray

Reference boolean voicing array

ref_cent : np.ndarray

Reference pitch sequence in cents

est_voicing : np.ndarray

Estimated boolean voicing array

est_cent : np.ndarray

Estimate pitch sequence in cents

cent_tolerance : float

Maximum absolute deviation for a cent value to be considered correct (Default value = 50)

Returns:

raw_chroma : float

Raw chroma accuracy, the fraction of voiced frames in ref_cent for which est_cent provides a correct frequency value (within cent_tolerance cents), ignoring octave errors

References

[2] J. Salamon, E. Gomez, D. P. W. Ellis, and G. Richard, “Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges”, IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.
[3] G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S. Streich, and B. Ong, “Melody transcription from music audio: Approaches and evaluation”, IEEE Transactions on Audio, Speech, and Language Processing, 15(4):1247-1256, 2007.

Examples

>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> raw_chroma = mir_eval.melody.raw_chroma_accuracy(ref_v, ref_c,
...                                                  est_v, est_c)
mir_eval.melody.overall_accuracy(ref_voicing, ref_cent, est_voicing, est_cent, cent_tolerance=50)

Compute the overall accuracy given two pitch (frequency) sequences in cents and matching voicing indicator sequences. The first pitch and voicing arrays are treated as the reference (truth), and the second two as the estimate (prediction). All 4 sequences must be of the same length.

Parameters:

ref_voicing : np.ndarray

Reference boolean voicing array

ref_cent : np.ndarray

Reference pitch sequence in cents

est_voicing : np.ndarray

Estimated boolean voicing array

est_cent : np.ndarray

Estimate pitch sequence in cents

cent_tolerance : float

Maximum absolute deviation for a cent value to be considered correct (Default value = 50)

Returns:

overall_accuracy : float

Overall accuracy, the total fraction of correctly estimated frames: voiced frames must be given a correct frequency value (within cent_tolerance cents) and unvoiced frames must be correctly labeled as unvoiced.

Examples

>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> (ref_v, ref_c,
...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
...                                                  ref_freq,
...                                                  est_time,
...                                                  est_freq)
>>> overall_accuracy = mir_eval.melody.overall_accuracy(ref_v, ref_c,
...                                                     est_v, est_c)
mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq, **kwargs)

Evaluate two melody (predominant f0) transcriptions, where the first is treated as the reference (ground truth) and the second as the estimate to be evaluated (prediction).

Parameters:

ref_time : np.ndarray

Time of each reference frequency value

ref_freq : np.ndarray

Array of reference frequency values

est_time : np.ndarray

Time of each estimated frequency value

est_freq : np.ndarray

Array of estimated frequency values

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
>>> scores = mir_eval.melody.evaluate(ref_time, ref_freq,
...                                   est_time, est_freq)

mir_eval.multipitch

The goal of multiple f0 (multipitch) estimation and tracking is to identify all of the active fundamental frequencies in each time frame in a complex music signal.

Conventions

Multipitch estimates are represented by a timebase and a corresponding list of arrays of frequency estimates. Frequency estimates may have any number of frequency values, including 0 (represented by an empty array). Time values are in units of seconds and frequency estimates are in units of Hz.
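
For concreteness, a minimal sketch of this representation (all values here are invented):

>>> import numpy as np
>>> # Four frames, 10 ms apart; each frame carries zero or more active pitches in Hz
>>> est_time = np.array([0.00, 0.01, 0.02, 0.03])
>>> est_freqs = [np.array([440.0]),
...              np.array([440.0, 220.0]),
...              np.array([]),          # no active pitches in this frame
...              np.array([221.5])]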

The timebase of the estimate time series should ideally match the timebase of the reference time series, but if this is not the case, the estimate time series is resampled using nearest neighbor interpolation to match the reference. Time values in the estimate time series that are outside of the range of the reference time series are given null (empty array) frequencies.

By default, a frequency is “correct” if it is within 0.5 semitones of a reference frequency. Frequency values are compared by first mapping them to log-2 semitone space, where the distance between semitones is constant. Chroma-wrapped frequency values are computed by taking the log-2 frequency values modulo 12 to map them down to a single octave. A chroma-wrapped frequency estimate is correct if its single-octave value is within 0.5 semitones of the single-octave reference frequency.
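
A rough sketch of this mapping for a single frequency, assuming the default 440 Hz reference used by mir_eval.multipitch.frequencies_to_midi() below:

>>> import numpy as np
>>> freq = 220.0                                 # Hz
>>> semitone = 69 + 12 * np.log2(freq / 440.0)   # continuous semitone (MIDI) value
>>> chroma = np.mod(semitone, 12)                # wrapped to a single octave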

The metrics are based on those described in [4] and [5].

Metrics

  • mir_eval.multipitch.metrics(): Precision, Recall, Accuracy, Substitution, Miss, False Alarm, and Total Error scores based both on raw frequency values and values mapped to a single octave (chroma).

References

[4]G. E. Poliner and D. P. W. Ellis, “A Discriminative Model for Polyphonic Piano Transcription”, EURASIP Journal on Advances in Signal Processing, 2007(1):154-163, Jan. 2007.
[5]M. Bay, A. F. Ehmann, and J. S. Downie, “Evaluation of Multiple-F0 Estimation and Tracking Systems”, Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp. 315-320, 2009.
mir_eval.multipitch.validate(ref_time, ref_freqs, est_time, est_freqs)

Checks that the time and frequency inputs are well-formed.

Parameters:

ref_time : np.ndarray

reference time stamps in seconds

ref_freqs : list of np.ndarray

reference frequencies in Hz

est_time : np.ndarray

estimate time stamps in seconds

est_freqs : list of np.ndarray

estimated frequencies in Hz

mir_eval.multipitch.resample_multipitch(times, frequencies, target_times)

Resamples multipitch time series to a new timescale. Values in target_times outside the range of times return no pitch estimate.

Parameters:

times : np.ndarray

Array of time stamps

frequencies : list of np.ndarray

List of np.ndarrays of frequency values

target_times : np.ndarray

Array of target time stamps

Returns:

frequencies_resampled : list of numpy arrays

List of np.ndarrays of frequency values resampled to the new timebase
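
Examples

A hedged usage sketch, assuming the files follow the conventions above:

>>> ref_time, ref_freqs = mir_eval.io.load_ragged_time_series('ref.txt')
>>> est_time, est_freqs = mir_eval.io.load_ragged_time_series('est.txt')
>>> # Resample the estimate onto the reference timebase
>>> est_freqs_resampled = mir_eval.multipitch.resample_multipitch(
...     est_time, est_freqs, ref_time)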

mir_eval.multipitch.frequencies_to_midi(frequencies, ref_frequency=440.0)

Converts frequencies to continuous MIDI values.

Parameters:

frequencies : list of np.ndarray

Original frequency values

ref_frequency : float

reference frequency in Hz.

Returns:

frequencies_midi : list of np.ndarray

Continuous MIDI frequency values.
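
Examples

A minimal sketch with invented frequency values:

>>> import numpy as np
>>> freqs = [np.array([440.0, 220.0]), np.array([])]
>>> freqs_midi = mir_eval.multipitch.frequencies_to_midi(freqs)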

mir_eval.multipitch.midi_to_chroma(frequencies_midi)

Wrap MIDI frequencies to a single octave (chroma).

Parameters:

frequencies_midi : list of np.ndarray

Continuous MIDI note frequency values.

Returns:

frequencies_chroma : list of np.ndarray

MIDI values wrapped to one octave.
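
Examples

A minimal sketch with invented MIDI values:

>>> import numpy as np
>>> freqs_midi = [np.array([69.0, 57.0]), np.array([])]
>>> freqs_chroma = mir_eval.multipitch.midi_to_chroma(freqs_midi)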

mir_eval.multipitch.compute_num_freqs(frequencies)

Computes the number of frequencies for each time point.

Parameters:

frequencies : list of np.ndarray

Frequency values

Returns:

num_freqs : np.ndarray

Number of frequencies at each time point.

mir_eval.multipitch.compute_num_true_positives(ref_freqs, est_freqs, window=0.5, chroma=False)

Compute the number of true positives in an estimate given a reference. A frequency is correct if it is within window semitones (by default 0.5 semitones, i.e. a quartertone) of a reference frequency.

Parameters:

ref_freqs : list of np.ndarray

reference frequencies (MIDI)

est_freqs : list of np.ndarray

estimated frequencies (MIDI)

window : float

Window size, in semitones

chroma : bool

If True, distances are computed modulo 12 (octave equivalence); in this case ref_freqs and est_freqs should already be wrapped modulo 12.

Returns:

true_positives : np.ndarray

Array the same length as ref_freqs containing the number of true positives.

mir_eval.multipitch.compute_accuracy(true_positives, n_ref, n_est)

Compute accuracy metrics.

Parameters:

true_positives : np.ndarray

Array containing the number of true positives at each time point.

n_ref : np.ndarray

Array containing the number of reference frequencies at each time point.

n_est : np.ndarray

Array containing the number of estimate frequencies at each time point.

Returns:

precision : float

sum(true_positives)/sum(n_est)

recall : float

sum(true_positives)/sum(n_ref)

acc : float

sum(true_positives)/sum(n_est + n_ref - true_positives)
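
Examples

A hedged sketch with invented per-frame counts:

>>> import numpy as np
>>> true_positives = np.array([1, 2, 0])
>>> n_ref = np.array([2, 2, 1])
>>> n_est = np.array([1, 3, 0])
>>> precision, recall, acc = mir_eval.multipitch.compute_accuracy(
...     true_positives, n_ref, n_est)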

mir_eval.multipitch.compute_err_score(true_positives, n_ref, n_est)

Compute error score metrics.

Parameters:

true_positives : np.ndarray

Array containing the number of true positives at each time point.

n_ref : np.ndarray

Array containing the number of reference frequencies at each time point.

n_est : np.ndarray

Array containing the number of estimate frequencies at each time point.

Returns:

e_sub : float

Substitution error

e_miss : float

Miss error

e_fa : float

False alarm error

e_tot : float

Total error

mir_eval.multipitch.metrics(ref_time, ref_freqs, est_time, est_freqs, **kwargs)

Compute multipitch metrics. All metrics are computed at the ‘macro’ level such that the frame true positive/false positive/false negative rates are summed across time and the metrics are computed on the combined values.

Parameters:

ref_time : np.ndarray

Time of each reference frequency value

ref_freqs : list of np.ndarray

List of np.ndarrays of reference frequency values

est_time : np.ndarray

Time of each estimated frequency value

est_freqs : list of np.ndarray

List of np.ndarrays of estimate frequency values

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

precision : float

Precision (TP/(TP + FP))

recall : float

Recall (TP/(TP + FN))

accuracy : float

Accuracy (TP/(TP + FP + FN))

e_sub : float

Substitution error

e_miss : float

Miss error

e_fa : float

False alarm error

e_tot : float

Total error

precision_chroma : float

Chroma precision

recall_chroma : float

Chroma recall

accuracy_chroma : float

Chroma accuracy

e_sub_chroma : float

Chroma substitution error

e_miss_chroma : float

Chroma miss error

e_fa_chroma : float

Chroma false alarm error

e_tot_chroma : float

Chroma total error

Examples

>>> ref_time, ref_freqs = mir_eval.io.load_ragged_time_series(
...     'reference.txt')
>>> est_time, est_freqs = mir_eval.io.load_ragged_time_series(
...     'estimated.txt')
>>> metrics_tuple = mir_eval.multipitch.metrics(
...     ref_time, ref_freqs, est_time, est_freqs)
mir_eval.multipitch.evaluate(ref_time, ref_freqs, est_time, est_freqs, **kwargs)

Evaluate two multipitch (multi-f0) transcriptions, where the first is treated as the reference (ground truth) and the second as the estimate to be evaluated (prediction).

Parameters:

ref_time : np.ndarray

Time of each reference frequency value

ref_freqs : list of np.ndarray

List of np.ndarrays of reference frequency values

est_time : np.ndarray

Time of each estimated frequency value

est_freqs : list of np.ndarray

List of np.ndarrays of estimate frequency values

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> ref_time, ref_freq = mir_eval.io.load_ragged_time_series('ref.txt')
>>> est_time, est_freq = mir_eval.io.load_ragged_time_series('est.txt')
>>> scores = mir_eval.multipitch.evaluate(ref_time, ref_freq,
...                                       est_time, est_freq)

mir_eval.onset

The goal of an onset detection algorithm is to automatically determine when notes are played in a piece of music. The primary method used to evaluate onset detectors is to first determine which estimated onsets are “correct”, where correctness is defined as being within a small window of a reference onset.

Based in part on this script:

Conventions

Onsets should be provided in the form of a 1-dimensional array of onset times in seconds in increasing order.
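
For example (onset times here are invented):

>>> import numpy as np
>>> estimated_onsets = np.array([0.52, 1.21, 2.03])   # seconds, in increasing order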

Metrics

  • mir_eval.onset.f_measure(): Precision, Recall, and F-measure scores based on the number of estimated onsets which are sufficiently close to reference onsets.
mir_eval.onset.validate(reference_onsets, estimated_onsets)

Checks that the input annotations to a metric look like valid onset time arrays, and throws helpful errors if not.

Parameters:

reference_onsets : np.ndarray

reference onset locations, in seconds

estimated_onsets : np.ndarray

estimated onset locations, in seconds

mir_eval.onset.f_measure(reference_onsets, estimated_onsets, window=0.05)

Compute the F-measure of correctly vs. incorrectly predicted onsets. “Correctness” is determined over a small window.

Parameters:

reference_onsets : np.ndarray

reference onset locations, in seconds

estimated_onsets : np.ndarray

estimated onset locations, in seconds

window : float

Window size, in seconds (Default value = .05)

Returns:

f_measure : float

2*precision*recall/(precision + recall)

precision : float

(# true positives)/(# true positives + # false positives)

recall : float

(# true positives)/(# true positives + # false negatives)

Examples

>>> reference_onsets = mir_eval.io.load_events('reference.txt')
>>> estimated_onsets = mir_eval.io.load_events('estimated.txt')
>>> F, P, R = mir_eval.onset.f_measure(reference_onsets,
...                                    estimated_onsets)
mir_eval.onset.evaluate(reference_onsets, estimated_onsets, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

reference_onsets : np.ndarray

reference onset locations, in seconds

estimated_onsets : np.ndarray

estimated onset locations, in seconds

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> reference_onsets = mir_eval.io.load_events('reference.txt')
>>> estimated_onsets = mir_eval.io.load_events('estimated.txt')
>>> scores = mir_eval.onset.evaluate(reference_onsets,
...                                  estimated_onsets)

mir_eval.pattern

Pattern discovery involves the identification of musical patterns (i.e. short fragments or melodic ideas that repeat at least twice) both from audio and symbolic representations. The metrics used to evaluate pattern discovery systems attempt to quantify the ability of the algorithm not only to determine the patterns present in a piece, but also to find all of their occurrences.

Based on the methods described here:
T. Collins. MIREX task: Discovery of repeated themes & sections. http://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_&_Sections, 2013.

Conventions

The input format can be obtained automatically by calling mir_eval.io.load_patterns(). It is a nested list: the outermost list collects patterns, each pattern is a list of occurrences, and each occurrence is a list of MIDI onset tuples of (onset_time, midi_note).

A pattern is a list of occurrences. The first occurrence must be the prototype of that pattern (i.e. the most representative of all the occurrences). An occurrence is a list of tuples containing the onset time and the midi note number.
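
A minimal sketch of this nested structure (onset times and MIDI note numbers are invented):

>>> pattern = [
...     [(77.0, 67), (77.5, 69), (78.0, 71)],  # prototype occurrence
...     [(93.0, 72), (93.5, 74), (94.0, 76)],  # a second occurrence
... ]
>>> reference_patterns = [pattern]             # an annotation is a list of such patterns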

Metrics

  • mir_eval.pattern.standard_FPR(): A strict metric that looks for possibly transposed patterns of exactly the same length. This is the only metric that considers transposed patterns.
  • mir_eval.pattern.establishment_FPR(): Evaluates how many patterns were successfully identified by the estimated results, no matter how many of their occurrences were found. In other words, this metric captures whether the algorithm successfully established that a pattern repeats at least twice and that this pattern is also found in the reference annotation.
  • mir_eval.pattern.occurrence_FPR(): Evaluation of how well an estimation can effectively identify all the occurrences of the found patterns, independently of how many patterns have been discovered. This metric has a threshold parameter that indicates how similar two occurrences must be in order to be considered equal. In MIREX, this evaluation is run twice, with thresholds .75 and .5.
  • mir_eval.pattern.three_layer_FPR(): Aims to evaluate the general similarity between the reference and the estimations, combining both the establishment of patterns and the retrieval of their occurrences in a single F1 score.
  • mir_eval.pattern.first_n_three_layer_P(): Computes the three-layer precision for the first N patterns only in order to measure the ability of the algorithm to sort the identified patterns based on their relevance.
  • mir_eval.pattern.first_n_target_proportion_R(): Computes the target proportion recall for the first N patterns only in order to measure the ability of the algorithm to sort the identified patterns based on their relevance.
mir_eval.pattern.validate(reference_patterns, estimated_patterns)

Checks that the input annotations to a metric look like valid pattern lists, and throws helpful errors if not.

Parameters:

reference_patterns : list

The reference patterns using the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

mir_eval.pattern.standard_FPR(reference_patterns, estimated_patterns, tol=1e-05)

Standard F1 Score, Precision and Recall.

This metric checks whether the prototype patterns of the reference match possibly translated patterns in the prototype patterns of the estimations. Since the sizes of these prototypes must be equal, this metric is quite restrictive and it tended to be 0 in most of the 2013 MIREX results.

Parameters:

reference_patterns : list

The reference patterns using the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

tol : float

Tolerance level when comparing reference against estimation. Default parameter is the one found in the original matlab code by Tom Collins used for MIREX 2013. (Default value = 1e-5)

Returns:

f_measure : float

The standard F1 Score

precision : float

The standard Precision

recall : float

The standard Recall

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.standard_FPR(ref_patterns, est_patterns)
mir_eval.pattern.establishment_FPR(reference_patterns, estimated_patterns, similarity_metric='cardinality_score')

Establishment F1 Score, Precision and Recall.

Parameters:

reference_patterns : list

The reference patterns in the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

similarity_metric : str

A string representing the metric to be used when computing the similarity matrix. Accepted values:

  • “cardinality_score”: Count of the intersection between occurrences.

(Default value = “cardinality_score”)

Returns:

f_measure : float

The establishment F1 Score

precision : float

The establishment Precision

recall : float

The establishment Recall

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.establishment_FPR(ref_patterns,
...                                              est_patterns)
mir_eval.pattern.occurrence_FPR(reference_patterns, estimated_patterns, thres=0.75, similarity_metric='cardinality_score')

Occurrence F1 Score, Precision and Recall.

Parameters:

reference_patterns : list

The reference patterns in the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

thres : float

How similar two occurrences must be in order to be considered equal (Default value = .75)

similarity_metric : str

A string representing the metric to be used when computing the similarity matrix. Accepted values:

  • “cardinality_score”: Count of the intersection between occurrences.

(Default value = “cardinality_score”)

Returns:

f_measure : float

The occurrence F1 Score

precision : float

The occurrence Precision

recall : float

The occurrence Recall

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.occurrence_FPR(ref_patterns,
...                                           est_patterns)
mir_eval.pattern.three_layer_FPR(reference_patterns, estimated_patterns)

Three Layer F1 Score, Precision and Recall. As described by Meredith.

Parameters:

reference_patterns : list

The reference patterns in the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

Returns:

f_measure : float

The three-layer F1 Score

precision : float

The three-layer Precision

recall : float

The three-layer Recall

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> F, P, R = mir_eval.pattern.three_layer_FPR(ref_patterns,
...                                            est_patterns)
mir_eval.pattern.first_n_three_layer_P(reference_patterns, estimated_patterns, n=5)

First n three-layer precision.

This metric is essentially the same as the three-layer FPR, but it is only applied to the first n estimated patterns and it only returns the precision. In MIREX, typically n = 5.

Parameters:

reference_patterns : list

The reference patterns in the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

n : int

Number of patterns to consider from the estimated results, in the order they appear in the matrix (Default value = 5)

Returns:

precision : float

The first n three-layer Precision

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> P = mir_eval.pattern.first_n_three_layer_P(ref_patterns,
...                                            est_patterns, n=5)
mir_eval.pattern.first_n_target_proportion_R(reference_patterns, estimated_patterns, n=5)

First n target proportion establishment recall metric.

This metric is similar to the establishment FPR score, but it only takes into account the first n estimated patterns and only outputs the corresponding Recall value.

Parameters:

reference_patterns : list

The reference patterns in the format returned by mir_eval.io.load_patterns()

estimated_patterns : list

The estimated patterns in the same format

n : int

Number of patterns to consider from the estimated results, in the order they appear in the matrix. (Default value = 5)

Returns:

recall : float

The first n target proportion Recall.

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> R = mir_eval.pattern.first_n_target_proportion_R(
...                                 ref_patterns, est_patterns, n=5)
mir_eval.pattern.evaluate(ref_patterns, est_patterns, **kwargs)

Load data and perform the evaluation.

Parameters:

ref_patterns : list

The reference patterns in the format returned by mir_eval.io.load_patterns()

est_patterns : list

The estimated patterns in the same format

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> ref_patterns = mir_eval.io.load_patterns("ref_pattern.txt")
>>> est_patterns = mir_eval.io.load_patterns("est_pattern.txt")
>>> scores = mir_eval.pattern.evaluate(ref_patterns, est_patterns)

mir_eval.segment

Evaluation criteria for structural segmentation fall into two categories: boundary annotation and structural annotation. Boundary annotation is the task of predicting the times at which structural changes occur, such as when a verse transitions to a refrain. Metrics for boundary annotation compare estimated segment boundaries to reference boundaries. Structural annotation is the task of assigning labels to detected segments. The estimated labels may be arbitrary strings, such as A, B, or C, and they need not describe functional concepts. Metrics for structural annotation are similar to those used for clustering data.

Conventions

Both boundary and structural annotation metrics require two-dimensional arrays with two columns, one for segment start times and one for segment end times. Structural annotation further requires lists of reference and estimated segment labels, whose length must equal the number of rows in the corresponding array of intervals. In both tasks, we assume that annotations express a partitioning of the track into intervals. The function mir_eval.util.adjust_intervals() can be used to pad or crop the segment boundaries to span the duration of the entire track.
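
For illustration, a hedged sketch of this format (interval times and labels are invented):

>>> import numpy as np
>>> ref_intervals = np.array([[ 0.0, 10.0],
...                           [10.0, 25.0],
...                           [25.0, 40.0]])
>>> ref_labels = ['A', 'B', 'A']   # one label per row of ref_intervals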

Metrics

  • mir_eval.segment.detection(): Boundary detection hit-rate: precision, recall, and F-measure of estimated boundaries which fall within a window of a reference boundary [6]
  • mir_eval.segment.deviation(): Median time from each reference boundary to the closest estimated boundary, and vice versa [6]
  • mir_eval.segment.pairwise(): Precision, recall, and F-measure of detecting whether pairs of frames belong to the same cluster [7]
  • mir_eval.segment.rand_index(): (Non-adjusted) Rand index between frame clusterings
  • mir_eval.segment.ari(): Rand index adjusted for chance
  • mir_eval.segment.mutual_information(): Standard, adjusted, and normalized mutual information between segmentations
  • mir_eval.segment.nce(): Over- and under-clustering scores based on normalized conditional entropy, combined into an F-measure [8]
  • mir_eval.segment.vmeasure(): V-measure, equivalent to nce() with marginal normalization [9]

References

[6]D. Turnbull, G. Lanckriet, E. Pampalk, and M. Goto, “A Supervised Approach for Detecting Boundaries in Music Using Difference Features and Boosting”, In ISMIR (pp. 51-54).
[7]M. Levy and M. Sandler, “Structural segmentation of musical audio by constrained clustering”, IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 318-326.
[8]H. M. Lukashevich, “Towards Quantitative Measures of Evaluating Song Segmentation”, In ISMIR (pp. 375-380).
[9]A. Rosenberg and J. Hirschberg, “V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure”, In EMNLP-CoNLL (Vol. 7, pp. 410-420).
mir_eval.segment.validate_boundary(reference_intervals, estimated_intervals, trim)

Checks that the input annotations to a segment boundary estimation metric (i.e. one that only takes in segment intervals) look like valid segment times, and throws helpful errors if not.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

trim : bool

will the start and end events be trimmed?

mir_eval.segment.validate_structure(reference_intervals, reference_labels, estimated_intervals, estimated_labels)

Checks that the input annotations to a structure estimation metric (i.e. one that takes in both segment boundaries and their labels) look like valid segment times and labels, and throws helpful errors if not.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

mir_eval.segment.detection(reference_intervals, estimated_intervals, window=0.5, beta=1.0, trim=False)

Boundary detection hit-rate.

A hit is counted whenever a reference boundary is within window of an estimated boundary. Note that each boundary is matched at most once: this is achieved by computing the size of a maximal matching between reference and estimated boundary points, subject to the window constraint.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

window : float > 0

size of the window of ‘correctness’ around reference boundaries (in seconds) (Default value = 0.5)

beta : float > 0

weighting constant for F-measure. (Default value = 1.0)

trim : boolean

if True, the first and last boundary times are ignored. Typically, these denote start (0) and end-markers. (Default value = False)

Returns:

precision : float

precision of estimated predictions

recall : float

recall of reference boundaries

f_measure : float

F-measure (weighted harmonic mean of precision and recall)

Examples

>>> ref_intervals, _ = mir_eval.io.load_labeled_intervals('ref.lab')
>>> est_intervals, _ = mir_eval.io.load_labeled_intervals('est.lab')
>>> # With 0.5s windowing
>>> P05, R05, F05 = mir_eval.segment.detection(ref_intervals,
...                                            est_intervals,
...                                            window=0.5)
>>> # With 3s windowing
>>> P3, R3, F3 = mir_eval.segment.detection(ref_intervals,
...                                         est_intervals,
...                                         window=3)
>>> # Ignoring hits for the beginning and end of track
>>> P, R, F = mir_eval.segment.detection(ref_intervals,
...                                      est_intervals,
...                                      window=0.5,
...                                      trim=True)
mir_eval.segment.deviation(reference_intervals, estimated_intervals, trim=False)

Compute the median deviations between reference and estimated boundary times.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

trim : boolean

if True, the first and last intervals are ignored. Typically, these denote start (0.0) and end-of-track markers. (Default value = False)

Returns:

reference_to_estimated : float

median time from each reference boundary to the closest estimated boundary

estimated_to_reference : float

median time from each estimated boundary to the closest reference boundary

Examples

>>> ref_intervals, _ = mir_eval.io.load_labeled_intervals('ref.lab')
>>> est_intervals, _ = mir_eval.io.load_labeled_intervals('est.lab')
>>> r_to_e, e_to_r = mir_eval.segment.deviation(ref_intervals,
...                                             est_intervals)
mir_eval.segment.pairwise(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0)

Frame-clustering segmentation evaluation by pair-wise agreement.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

frame_size : float > 0

length (in seconds) of frames for clustering (Default value = 0.1)

beta : float > 0

beta value for F-measure (Default value = 1.0)

Returns:

precision : float > 0

Precision of detecting whether frames belong in the same cluster

recall : float > 0

Recall of detecting whether frames belong in the same cluster

f : float > 0

F-measure of detecting whether frames belong in the same cluster

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> precision, recall, f = mir_eval.segment.pairwise(ref_intervals,
...                                                  ref_labels,
...                                                  est_intervals,
...                                                  est_labels)
mir_eval.segment.rand_index(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0)

(Non-adjusted) Rand index.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

frame_size : float > 0

length (in seconds) of frames for clustering (Default value = 0.1)

beta : float > 0

beta value for F-measure (Default value = 1.0)

Returns:

rand_index : float > 0

Rand index

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> rand_index = mir_eval.segment.rand_index(ref_intervals,
...                                          ref_labels,
...                                          est_intervals,
...                                          est_labels)
mir_eval.segment.ari(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1)

Adjusted Rand Index (ARI) for frame clustering segmentation evaluation.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

frame_size : float > 0

length (in seconds) of frames for clustering (Default value = 0.1)

Returns:

ari_score : float > 0

Adjusted Rand index between segmentations.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> ari_score = mir_eval.segment.ari(ref_intervals, ref_labels,
...                                  est_intervals, est_labels)
mir_eval.segment.mutual_information(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1)

Frame-clustering segmentation: mutual information metrics.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

frame_size : float > 0

length (in seconds) of frames for clustering (Default value = 0.1)

Returns:

MI : float > 0

Mutual information between segmentations

AMI : float

Adjusted mutual information between segmentations.

NMI : float > 0

Normalized mutual information between segmentations

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> mi, ami, nmi = mir_eval.segment.mutual_information(ref_intervals,
...                                                     ref_labels,
...                                                     est_intervals,
...                                                     est_labels)
mir_eval.segment.nce(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0, marginal=False)

Frame-clustering segmentation: normalized conditional entropy

Computes cross-entropy of cluster assignment, normalized by the max-entropy.

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

frame_size : float > 0

length (in seconds) of frames for clustering (Default value = 0.1)

beta : float > 0

beta for F-measure (Default value = 1.0)

marginal : bool

If False, normalize conditional entropy by uniform entropy. If True, normalize conditional entropy by the marginal entropy. (Default value = False)

Returns:

S_over

Over-clustering score:

  • For marginal=False, 1 - H(y_est | y_ref) / log(|y_est|)
  • For marginal=True, 1 - H(y_est | y_ref) / H(y_est)

If |y_est|==1, then S_over will be 0.

S_under

Under-clustering score:

  • For marginal=False, 1 - H(y_ref | y_est) / log(|y_ref|)
  • For marginal=True, 1 - H(y_ref | y_est) / H(y_ref)

If |y_ref|==1, then S_under will be 0.

S_F

F-measure for (S_over, S_under)

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> S_over, S_under, S_F = mir_eval.segment.nce(ref_intervals,
...                                             ref_labels,
...                                             est_intervals,
...                                             est_labels)
mir_eval.segment.vmeasure(reference_intervals, reference_labels, estimated_intervals, estimated_labels, frame_size=0.1, beta=1.0)

Frame-clustering segmentation: v-measure

Computes cross-entropy of cluster assignment, normalized by the marginal-entropy.

This is equivalent to nce(…, marginal=True).

Parameters:

reference_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

reference_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

estimated_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

frame_size : float > 0

length (in seconds) of frames for clustering (Default value = 0.1)

beta : float > 0

beta for F-measure (Default value = 1.0)

Returns:

V_precision

Over-clustering score: 1 - H(y_est | y_ref) / H(y_est)

If |y_est|==1, then V_precision will be 0.

V_recall

Under-clustering score: 1 - H(y_ref | y_est) / H(y_ref)

If |y_ref|==1, then V_recall will be 0.

V_F

F-measure for (V_precision, V_recall)

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> # Trim or pad the estimate to match reference timing
>>> (ref_intervals,
...  ref_labels) = mir_eval.util.adjust_intervals(ref_intervals,
...                                               ref_labels,
...                                               t_min=0)
>>> (est_intervals,
...  est_labels) = mir_eval.util.adjust_intervals(
...     est_intervals, est_labels, t_min=0, t_max=ref_intervals.max())
>>> V_precision, V_recall, V_F = mir_eval.segment.vmeasure(ref_intervals,
...                                                         ref_labels,
...                                                         est_intervals,
...                                                         est_labels)
mir_eval.segment.evaluate(ref_intervals, ref_labels, est_intervals, est_labels, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

ref_intervals : np.ndarray, shape=(n, 2)

reference segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

ref_labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

est_intervals : np.ndarray, shape=(m, 2)

estimated segment intervals, in the format returned by mir_eval.io.load_labeled_intervals().

est_labels : list, shape=(m,)

estimated segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> (ref_intervals,
...  ref_labels) = mir_eval.io.load_labeled_intervals('ref.lab')
>>> (est_intervals,
...  est_labels) = mir_eval.io.load_labeled_intervals('est.lab')
>>> scores = mir_eval.segment.evaluate(ref_intervals, ref_labels,
...                                    est_intervals, est_labels)

mir_eval.hierarchy

Evaluation criteria for hierarchical structure analysis.

Hierarchical structure analysis seeks to annotate a track with a nested decomposition of the temporal elements of the piece, effectively providing a kind of “parse tree” of the composition. Unlike the flat segmentation metrics defined in mir_eval.segment, which can only encode one level of analysis, hierarchical annotations expose the relationships between short segments and the larger compositional elements to which they belong.

Conventions

Annotations are assumed to take the form of an ordered list of segmentations. As in the mir_eval.segment metrics, each segmentation itself consists of an n-by-2 array of interval times, so that the i th segment spans time intervals[i, 0] to intervals[i, 1].

Hierarchical annotations are ordered by increasing specificity, so that the first segmentation should contain the fewest segments, and the last segmentation contains the most.

Metrics

  • mir_eval.hierarchy.tmeasure(): Precision, recall, and F-measure of the T-measures for hierarchical boundary detection [10]
  • mir_eval.hierarchy.lmeasure(): Precision, recall, and F-measure of the L-measures for hierarchical segment labeling [11]

References

[10]Brian McFee, Oriol Nieto, and Juan P. Bello. “Hierarchical evaluation of segment boundary detection”, International Society for Music Information Retrieval (ISMIR) conference, 2015.
[11]Brian McFee, Oriol Nieto, Morwaread Farbood, and Juan P. Bello. “Evaluating hierarchical structure in music annotations”, Frontiers in Psychology, 2017.
mir_eval.hierarchy.validate_hier_intervals(intervals_hier)

Validate a hierarchical segment annotation.

Parameters:

intervals_hier : ordered list of segmentations

Raises:

ValueError

If any segmentation does not span the full duration of the top-level segmentation.

If any segmentation does not start at 0.

mir_eval.hierarchy.tmeasure(reference_intervals_hier, estimated_intervals_hier, transitive=False, window=15.0, frame_size=0.1, beta=1.0)

Computes the tree measures for hierarchical segment annotations.

Parameters:

reference_intervals_hier : list of ndarray

reference_intervals_hier[i] contains the segment intervals (in seconds) for the i th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.

estimated_intervals_hier : list of ndarray

Like reference_intervals_hier but for the estimated annotation

transitive : bool

whether to compute the t-measures using transitivity or not.

window : float > 0

size of the window (in seconds). For each query frame q, result frames are only counted within q +- window.

frame_size : float > 0

length (in seconds) of frames. The frame size cannot be longer than the window.

beta : float > 0

beta parameter for the F-measure.

Returns:

t_precision : number [0, 1]

T-measure Precision

t_recall : number [0, 1]

T-measure Recall

t_measure : number [0, 1]

F-beta measure for (t_precision, t_recall)

Raises:

ValueError

If either of the input hierarchies are inconsistent

If the input hierarchies have different time durations

If frame_size > window or frame_size <= 0
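
Examples

A hedged sketch using a toy two-layer hierarchy (the same intervals as in the mir_eval.hierarchy.evaluate() example below):

>>> import numpy as np
>>> ref_i = [np.array([[0, 30], [30, 60]]),
...          np.array([[0, 15], [15, 30], [30, 45], [45, 60]])]
>>> est_i = [np.array([[0, 45], [45, 60]]),
...          np.array([[0, 15], [15, 30], [30, 45], [45, 60]])]
>>> t_prec, t_rec, t_meas = mir_eval.hierarchy.tmeasure(ref_i, est_i)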

mir_eval.hierarchy.lmeasure(reference_intervals_hier, reference_labels_hier, estimated_intervals_hier, estimated_labels_hier, frame_size=0.1, beta=1.0)

Computes the label measures (L-measures) for hierarchical segment annotations.

Parameters:

reference_intervals_hier : list of ndarray

reference_intervals_hier[i] contains the segment intervals (in seconds) for the i th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.

reference_labels_hier : list of list of str

reference_labels_hier[i] contains the segment labels for the i th layer of the annotations

estimated_intervals_hier : list of ndarray

estimated_labels_hier : list of list of str

Like reference_intervals_hier and reference_labels_hier but for the estimated annotation

frame_size : float > 0

length (in seconds) of frames.

beta : float > 0

beta parameter for the F-measure.

Returns:

l_precision : number [0, 1]

L-measure Precision

l_recall : number [0, 1]

L-measure Recall

l_measure : number [0, 1]

F-beta measure for (l_precision, l_recall)

Raises:

ValueError

If either of the input hierarchies are inconsistent

If the input hierarchies have different time durations

If frame_size <= 0
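
Examples

A hedged sketch using the toy two-layer annotations from the mir_eval.hierarchy.evaluate() example below:

>>> import numpy as np
>>> ref_i = [np.array([[0, 30], [30, 60]]),
...          np.array([[0, 15], [15, 30], [30, 45], [45, 60]])]
>>> ref_l = [['A', 'B'], ['a', 'b', 'a', 'c']]
>>> est_i = [np.array([[0, 45], [45, 60]]),
...          np.array([[0, 15], [15, 30], [30, 45], [45, 60]])]
>>> est_l = [['A', 'B'], ['a', 'a', 'b', 'b']]
>>> l_prec, l_rec, l_meas = mir_eval.hierarchy.lmeasure(ref_i, ref_l,
...                                                     est_i, est_l)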

mir_eval.hierarchy.evaluate(ref_intervals_hier, ref_labels_hier, est_intervals_hier, est_labels_hier, **kwargs)

Compute all hierarchical structure metrics for the given reference and estimated annotations.

Parameters:

ref_intervals_hier : list of list-like

ref_labels_hier : list of list of str

est_intervals_hier : list of list-like

est_labels_hier : list of list of str

Hierarchical annotations are encoded as an ordered list of segmentations. Each segmentation itself is a list (or list-like) of intervals (*_intervals_hier) and a list of lists of labels (*_labels_hier).

kwargs

additional keyword arguments to the evaluation metrics.

Returns:

scores : OrderedDict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

T-measures are computed in both the “full” (transitive=True) and “reduced” (transitive=False) modes.

Raises:

ValueError

Thrown when the provided annotations are not valid.

Examples

A toy example with two two-layer annotations

>>> ref_i = [[[0, 30], [30, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> est_i = [[[0, 45], [45, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> ref_l = [ ['A', 'B'], ['a', 'b', 'a', 'c'] ]
>>> est_l = [ ['A', 'B'], ['a', 'a', 'b', 'b'] ]
>>> scores = mir_eval.hierarchy.evaluate(ref_i, ref_l, est_i, est_l)
>>> dict(scores)
{'T-Measure full': 0.94822745804853459,
 'T-Measure reduced': 0.8732458222764804,
 'T-Precision full': 0.96569179094693058,
 'T-Precision reduced': 0.89939075137018787,
 'T-Recall full': 0.93138358189386117,
 'T-Recall reduced': 0.84857799953694923}

A more realistic example, using SALAMI pre-parsed annotations

>>> def load_salami(filename):
...     "load SALAMI event format as labeled intervals"
...     events, labels = mir_eval.io.load_labeled_events(filename)
...     intervals = mir_eval.util.boundaries_to_intervals(events)[0]
...     return intervals, labels[:len(intervals)]
>>> ref_files = ['data/10/parsed/textfile1_uppercase.txt',
...              'data/10/parsed/textfile1_lowercase.txt']
>>> est_files = ['data/10/parsed/textfile2_uppercase.txt',
...              'data/10/parsed/textfile2_lowercase.txt']
>>> ref = [load_salami(fname) for fname in ref_files]
>>> ref_int = [seg[0] for seg in ref]
>>> ref_lab = [seg[1] for seg in ref]
>>> est = [load_salami(fname) for fname in est_files]
>>> est_int = [seg[0] for seg in est]
>>> est_lab = [seg[1] for seg in est]
>>> scores = mir_eval.hierarchy.evaluate(ref_int, ref_lab,
...                                      est_int, est_lab)
>>> dict(scores)
{'T-Measure full': 0.66029225561405358,
 'T-Measure reduced': 0.62001868041578034,
 'T-Precision full': 0.66844764668949885,
 'T-Precision reduced': 0.63252297209957919,
 'T-Recall full': 0.6523334654992341,
 'T-Recall reduced': 0.60799919710921635}

mir_eval.separation

Source separation algorithms attempt to extract recordings of individual sources from a recording of a mixture of sources. Evaluation methods for source separation compare the extracted sources to the reference sources and attempt to measure the perceptual quality of the separation.

See also the bss_eval MATLAB toolbox:
http://bass-db.gforge.inria.fr/bss_eval/

Conventions

An audio signal is expected to be in the format of a 1-dimensional array where the entries are the samples of the audio signal. When providing a group of estimated or reference sources, they should be provided in a 2-dimensional array, where the first dimension corresponds to the source number and the second corresponds to the samples.
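
For illustration, a hedged sketch of this layout using purely synthetic random signals (two sources, one second of audio at 44.1 kHz):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> reference_sources = rng.randn(2, 44100)     # shape (nsrc, nsampl)
>>> estimated_sources = reference_sources + 0.1 * rng.randn(2, 44100)
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources(reference_sources,
...                                               estimated_sources)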

Metrics

  • mir_eval.separation.bss_eval_sources(): Signal to Distortion Ratio (SDR), Source to Interference Ratio (SIR), and Sources to Artifacts Ratio (SAR) for estimated source signals, along with the best matching permutation [12]
  • mir_eval.separation.bss_eval_sources_framewise(): Framewise SDR, SIR, and SAR
  • mir_eval.separation.bss_eval_images(): SDR, source Image to Spatial distortion Ratio (ISR), SIR, and SAR for estimated source images [12]
  • mir_eval.separation.bss_eval_images_framewise(): Framewise SDR, ISR, SIR, and SAR

References

[12]Emmanuel Vincent, Rémi Gribonval, and Cédric Févotte, “Performance measurement in blind audio source separation,” IEEE Trans. on Audio, Speech and Language Processing, 14(4):1462-1469, 2006.
mir_eval.separation.validate(reference_sources, estimated_sources)

Checks that the input data to a metric are valid, and throws helpful errors if not.

Parameters:

reference_sources : np.ndarray, shape=(nsrc, nsampl)

matrix containing true sources

estimated_sources : np.ndarray, shape=(nsrc, nsampl)

matrix containing estimated sources

mir_eval.separation.bss_eval_sources(reference_sources, estimated_sources, compute_permutation=True)

Ordering and measurement of the separation quality for estimated source signals in terms of filtered true source, interference and artifacts.

The decomposition allows a time-invariant filter distortion of length 512, as described in Section III.B of [12].

Passing False for compute_permutation will improve the computation performance of the evaluation; however, it is not always appropriate and is not the way that the BSS_EVAL Matlab toolbox computes bss_eval_sources.

Parameters:

reference_sources : np.ndarray, shape=(nsrc, nsampl)

matrix containing true sources (must have same shape as estimated_sources)

estimated_sources : np.ndarray, shape=(nsrc, nsampl)

matrix containing estimated sources (must have same shape as reference_sources)

compute_permutation : bool, optional

compute permutation of estimate/source combinations (True by default)

Returns:

sdr : np.ndarray, shape=(nsrc,)

vector of Signal to Distortion Ratios (SDR)

sir : np.ndarray, shape=(nsrc,)

vector of Source to Interference Ratios (SIR)

sar : np.ndarray, shape=(nsrc,)

vector of Sources to Artifacts Ratios (SAR)

perm : np.ndarray, shape=(nsrc,)

vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be [0, 1, ..., nsrc-1] if compute_permutation is False.

References

[13]Emmanuel Vincent, Shoko Araki, Fabian J. Theis, Guido Nolte, Pau Bofill, Hiroshi Sawada, Alexey Ozerov, B. Vikrham Gowreesunker, Dominik Lutter and Ngoc Q.K. Duong, “The Signal Separation Evaluation Campaign (2007-2010): Achievements and remaining challenges”, Signal Processing, 92, pp. 1928-1936, 2012.

Examples

>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources(reference_sources,
...                                               estimated_sources)
mir_eval.separation.bss_eval_sources_framewise(reference_sources, estimated_sources, window=1323000, hop=661500, compute_permutation=False)

Framewise computation of bss_eval_sources

Please be aware that this function does not compute permutations (by default) on the possible relations between reference_sources and estimated_sources due to the dangers of a changing permutation. Therefore (by default), it assumes that reference_sources[i] corresponds to estimated_sources[i]. To enable computing permutations please set compute_permutation to be True and check that the returned perm is identical for all windows.

NOTE: if reference_sources and estimated_sources would be evaluated using only a single window or are shorter than the window length, the result of mir_eval.separation.bss_eval_sources() called on reference_sources and estimated_sources (with the compute_permutation parameter passed to mir_eval.separation.bss_eval_sources()) is returned.

Parameters:

reference_sources : np.ndarray, shape=(nsrc, nsampl)

matrix containing true sources (must have the same shape as estimated_sources)

estimated_sources : np.ndarray, shape=(nsrc, nsampl)

matrix containing estimated sources (must have the same shape as reference_sources)

window : int, optional

Window length for framewise evaluation (default value is 30s at a sample rate of 44.1kHz)

hop : int, optional

Hop size for framewise evaluation (default value is 15s at a sample rate of 44.1kHz)

compute_permutation : bool, optional

compute permutation of estimate/source combinations for all windows (False by default)

Returns:

sdr : np.ndarray, shape=(nsrc, nframes)

vector of Signal to Distortion Ratios (SDR)

sir : np.ndarray, shape=(nsrc, nframes)

vector of Source to Interference Ratios (SIR)

sar : np.ndarray, shape=(nsrc, nframes)

vector of Sources to Artifacts Ratios (SAR)

perm : np.ndarray, shape=(nsrc, nframes)

vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be range(nsrc) for all windows if compute_permutation is False

Examples

>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_sources_framewise(
...      reference_sources,
...      estimated_sources)
mir_eval.separation.bss_eval_images(reference_sources, estimated_sources, compute_permutation=True)

Implementation of the bss_eval_images function from the BSS_EVAL Matlab toolbox.

Ordering and measurement of the separation quality for estimated source signals in terms of filtered true source, interference and artifacts. This method also provides the ISR measure.

The decomposition allows a time-invariant filter distortion of length 512, as described in Section III.B of [12].

Passing False for compute_permutation will improve the computation performance of the evaluation; however, it is not always appropriate and is not the way that the BSS_EVAL Matlab toolbox computes bss_eval_images.

Parameters:

reference_sources : np.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing true sources

estimated_sources : np.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing estimated sources

compute_permutation : bool, optional

compute permutation of estimate/source combinations (True by default)

Returns:

sdr : np.ndarray, shape=(nsrc,)

vector of Signal to Distortion Ratios (SDR)

isr : np.ndarray, shape=(nsrc,)

vector of source Image to Spatial distortion Ratios (ISR)

sir : np.ndarray, shape=(nsrc,)

vector of Source to Interference Ratios (SIR)

sar : np.ndarray, shape=(nsrc,)

vector of Sources to Artifacts Ratios (SAR)

perm : np.ndarray, shape=(nsrc,)

vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j). Note: perm will be [0, 1, ..., nsrc-1] if compute_permutation is False.

References

[14]Emmanuel Vincent, Shoko Araki, Fabian J. Theis, Guido Nolte, Pau Bofill, Hiroshi Sawada, Alexey Ozerov, B. Vikrham Gowreesunker, Dominik Lutter and Ngoc Q.K. Duong, “The Signal Separation Evaluation Campaign (2007-2010): Achievements and remaining challenges”, Signal Processing, 92, pp. 1928-1936, 2012.

Examples

>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images(reference_sources,
...                                               estimated_sources)
mir_eval.separation.bss_eval_images_framewise(reference_sources, estimated_sources, window=1323000, hop=661500, compute_permutation=False)

Framewise computation of bss_eval_images

Please be aware that this function does not compute permutations (by default) on the possible relations between reference_sources and estimated_sources due to the dangers of a changing permutation. Therefore (by default), it assumes that reference_sources[i] corresponds to estimated_sources[i]. To enable computing permutations please set compute_permutation to be True and check that the returned perm is identical for all windows.

NOTE: if reference_sources and estimated_sources would be evaluated using only a single window or are shorter than the window length, the result of bss_eval_images called on reference_sources and estimated_sources (with the compute_permutation parameter passed to bss_eval_images) is returned

Parameters:

reference_sources : np.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing true sources (must have the same shape as estimated_sources)

estimated_sources : np.ndarray, shape=(nsrc, nsampl, nchan)

matrix containing estimated sources (must have the same shape as reference_sources)

window : int

Window length for framewise evaluation

hop : int

Hop size for framewise evaluation

compute_permutation : bool, optional

compute permutation of estimate/source combinations for all windows (False by default)

Returns:

sdr : np.ndarray, shape=(nsrc, nframes)

vector of Signal to Distortion Ratios (SDR)

isr : np.ndarray, shape=(nsrc, nframes)

vector of source Image to Spatial distortion Ratios (ISR)

sir : np.ndarray, shape=(nsrc, nframes)

vector of Source to Interference Ratios (SIR)

sar : np.ndarray, shape=(nsrc, nframes)

vector of Sources to Artifacts Ratios (SAR)

perm : np.ndarray, shape=(nsrc, nframes)

vector containing the best ordering of estimated sources in the mean SIR sense (estimated source number perm[j] corresponds to true source number j) Note: perm will be range(nsrc) for all windows if compute_permutation is False

Examples

>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated
>>> # source
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images_framewise(
...     reference_sources,
...     estimated_sources,
...     window,
...     hop)
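
For instance, assuming reference_sources and estimated_sources have been prepared as above, one might evaluate one-second windows with a half-second hop at an assumed sampling rate of 44.1 kHz:

>>> fs = 44100                # assumed sampling rate (hypothetical)
>>> window = 1 * fs           # one-second analysis windows
>>> hop = fs // 2             # half-second hop between windows
>>> (sdr, isr, sir, sar,
...  perm) = mir_eval.separation.bss_eval_images_framewise(
...     reference_sources, estimated_sources, window=window, hop=hop)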
mir_eval.separation.evaluate(reference_sources, estimated_sources, **kwargs)

Compute all metrics for the given reference and estimated signals.

NOTE: This will always compute mir_eval.separation.bss_eval_images() for any valid input and will additionally compute mir_eval.separation.bss_eval_sources() for valid input with fewer than 3 dimensions.

Parameters:

reference_sources : np.ndarray, shape=(nsrc, nsampl[, nchan])

matrix containing true sources

estimated_sources : np.ndarray, shape=(nsrc, nsampl[, nchan])

matrix containing estimated sources

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> # reference_sources[n] should be an ndarray of samples of the
>>> # n'th reference source
>>> # estimated_sources[n] should be the same for the n'th estimated source
>>> scores = mir_eval.separation.evaluate(reference_sources,
...                                       estimated_sources)

mir_eval.tempo

The goal of a tempo estimation algorithm is to automatically detect the tempo of a piece of music, measured in beats per minute (BPM).

See http://www.music-ir.org/mirex/wiki/2014:Audio_Tempo_Estimation for a description of the task and evaluation criteria.

Conventions

Reference tempi should be strictly positive, and provided in ascending order as a numpy array of length 2. Estimated tempi are allowed to be 0, but otherwise are subject to the same constraints as reference.

The weighting value from the reference must be a float in the range [0, 1].
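
For example, annotations following these conventions could be constructed directly (the values below are illustrative only):

>>> import numpy as np
>>> reference_tempi = np.array([60.0, 120.0])  # ascending, strictly positive
>>> reference_weight = 0.5                     # both tempi equally salient
>>> estimated_tempi = np.array([60.0, 119.0])  # zeros are allowed here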

Metrics

mir_eval.tempo.validate_tempi(tempi)

Checks that there are two non-negative tempi.

Parameters:

tempi : np.ndarray

length-2 array of tempo, in bpm

mir_eval.tempo.validate(reference_tempi, reference_weight, estimated_tempi)

Checks that the input annotations to a metric look like valid tempo annotations.

Parameters:

reference_tempi : np.ndarray

reference tempo values, in bpm

reference_weight : float

perceptual weight of slow vs fast in reference

estimated_tempi : np.ndarray

estimated tempo values, in bpm

mir_eval.tempo.detection(reference_tempi, reference_weight, estimated_tempi, tol=0.08)

Compute the tempo detection accuracy metric.

Parameters:

reference_tempi : np.ndarray, shape=(2,)

Two non-negative reference tempi

reference_weight : float in [0, 1]

The relative strength of reference_tempi[0] vs reference_tempi[1].

estimated_tempi : np.ndarray, shape=(2,)

Two non-negative estimated tempi.

tol : float in [0, 1]

The maximum allowable deviation from a reference tempo to count as a hit. |est_t - ref_t| <= tol * ref_t (Default value = 0.08)

Returns:

p_score : float in [0, 1]

Weighted average of recalls: reference_weight * hits[0] + (1 - reference_weight) * hits[1]

one_correct : bool

True if at least one reference tempo was correctly estimated

both_correct : bool

True if both reference tempi were correctly estimated

Raises:

ValueError

If the input tempi are ill-formed

If the reference weight is not in the range [0, 1]

If tol <= 0 or tol > 1.
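
A minimal usage sketch, assuming MIREX-format tempo files named 'reference.txt' and 'estimated.txt' (placeholder names):

>>> reference_tempi, reference_weight = mir_eval.io.load_tempo('reference.txt')
>>> estimated_tempi, _ = mir_eval.io.load_tempo('estimated.txt')
>>> p_score, one_correct, both_correct = mir_eval.tempo.detection(
...     reference_tempi, reference_weight, estimated_tempi, tol=0.08)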

mir_eval.tempo.evaluate(reference_tempi, reference_weight, estimated_tempi, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

reference_tempi : np.ndarray, shape=(2,)

Two non-negative reference tempi

reference_weight : float in [0, 1]

The relative strength of reference_tempi[0] vs reference_tempi[1].

estimated_tempi : np.ndarray, shape=(2,)

Two non-negative estimated tempi.

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

mir_eval.transcription

The aim of a transcription algorithm is to produce a symbolic representation of a recorded piece of music in the form of a set of discrete notes. There are different ways to represent notes symbolically. Here we use the piano-roll convention, meaning each note has a start time, a duration (or end time), and a single, constant, pitch value. Pitch values can be quantized (e.g. to a semitone grid tuned to 440 Hz), but do not have to be. Also, the transcription can contain the notes of a single instrument or voice (for example the melody), or the notes of all instruments/voices in the recording. This module is instrument agnostic: all notes in the estimate are compared against all notes in the reference.

There are many metrics for evaluating transcription algorithms. Here we limit ourselves to the most simple and commonly used: given two sets of notes, we count how many estimated notes match the reference, and how many do not. Based on these counts we compute the precision, recall, f-measure and overlap ratio of the estimate given the reference. The default criteria for considering two notes to be a match are adopted from the MIREX Multiple fundamental frequency estimation and tracking, Note Tracking subtask (task 2):

“This subtask is evaluated in two different ways. In the first setup, a returned note is assumed correct if its onset is within +-50ms of a reference note and its F0 is within +- quarter tone of the corresponding reference note, ignoring the returned offset values. In the second setup, on top of the above requirements, a correct returned note is required to have an offset value within 20% of the reference note’s duration around the reference note’s offset, or within 50ms, whichever is larger.”

In short, we compute precision, recall, f-measure and overlap ratio twice: once ignoring note offsets, and once taking them into account.

For further details see Salamon, 2013 (page 186), and references therein:

Salamon, J. (2013). Melody Extraction from Polyphonic Music Signals. Ph.D. thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2013.

IMPORTANT NOTE: the evaluation code in mir_eval contains several important differences with respect to the code used in MIREX 2015 for the Note Tracking subtask on the Su dataset (henceforth “MIREX”):

  1. mir_eval uses bipartite graph matching to find the optimal pairing of reference notes to estimated notes. MIREX uses a greedy matching algorithm, which can produce sub-optimal note matching. This will result in mir_eval’s metrics being slightly higher compared to MIREX.
  2. MIREX rounds down the onset and offset times of each note to 2 decimal places using new_time = 0.01 * floor(time*100). mir_eval rounds down the note onset and offset times to 4 decimal places. This will bring our metrics down a notch compared to the MIREX results.
  3. In the MIREX wiki, the criterion for matching offsets is that they must be within 0.2 * ref_duration or 0.05 seconds from each other, whichever is greater (i.e. offset_dif <= max(0.2 * ref_duration, 0.05)). The MIREX code however only uses a threshold of 0.2 * ref_duration, without the 0.05 second minimum. Since mir_eval does include this minimum, it might produce slightly higher results compared to MIREX.

This means that differences 1 and 3 bring mir_eval’s metrics up compared to MIREX, whilst 2 brings them down. Based on internal testing, overall the effect of these three differences is that the Precision, Recall and F-measure returned by mir_eval will be higher compared to MIREX by about 1%-2%.

Finally, note that different evaluation scripts have been used for the Multi-F0 Note Tracking task in MIREX over the years. In particular, some scripts used < for matching onsets, offsets, and pitch values, whilst the others used <= for these checks. mir_eval provides both options: by default the latter (<=) is used, but you can set strict=True when calling mir_eval.transcription.precision_recall_f1_overlap() in which case < will be used. The default value (strict=False) is the same as that used in MIREX 2015 for the Note Tracking subtask on the Su dataset.

Conventions

Notes should be provided in the form of an interval array and a pitch array. The interval array contains two columns, one for note onsets and the second for note offsets (each row represents a single note). The pitch array contains one column with the corresponding note pitch values (one value per note), represented by their fundamental frequency (f0) in Hertz.
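
For illustration, a two-note reference transcription in this format could be built directly as NumPy arrays (the values are hypothetical):

>>> import numpy as np
>>> # Each row is (onset, offset) in seconds
>>> ref_intervals = np.array([[0.10, 0.50],
...                           [0.55, 1.00]])
>>> # One fundamental frequency per note, in Hz
>>> ref_pitches = np.array([440.0, 493.88])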

Metrics

  • mir_eval.transcription.precision_recall_f1_overlap(): The precision, recall, F-measure, and Average Overlap Ratio of the note transcription, where an estimated note is considered correct if its pitch, onset and (optionally) offset are sufficiently close to a reference note.
  • mir_eval.transcription.onset_precision_recall_f1(): The precision, recall and F-measure of the note transcription, where an estimated note is considered correct if its onset is sufficiently close to a reference note’s onset. That is, these metrics are computed taking only note onsets into account, meaning two notes could be matched even if they have very different pitch values.
  • mir_eval.transcription.offset_precision_recall_f1(): The precision, recall and F-measure of the note transcription, where an estimated note is considered correct if its offset is sufficiently close to a reference note’s offset. That is, these metrics are computed taking only note offsets into account, meaning two notes could be matched even if they have very different pitch values.
mir_eval.transcription.validate(ref_intervals, ref_pitches, est_intervals, est_pitches)

Checks that the input annotations to a metric look like time intervals and a pitch list, and throws helpful errors if not.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

mir_eval.transcription.validate_intervals(ref_intervals, est_intervals)

Checks that the input annotations to a metric look like time intervals, and throws helpful errors if not.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

mir_eval.transcription.match_note_offsets(ref_intervals, est_intervals, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False)

Compute a maximum matching between reference and estimated notes, only taking note offsets into account.

Given two note sequences represented by ref_intervals and est_intervals (see mir_eval.io.load_valued_intervals()), we seek the largest set of correspondences (i, j) such that the offset of reference note i has to be within offset_tolerance of the offset of estimated note j, where offset_tolerance is equal to offset_ratio times the reference note’s duration, i.e. offset_ratio * ref_duration[i] where ref_duration[i] = ref_intervals[i, 1] - ref_intervals[i, 0]. If the resulting offset_tolerance is less than offset_min_tolerance (50 ms by default) then offset_min_tolerance is used instead.

Every reference note is matched against at most one estimated note.

Note there are separate functions match_note_onsets() and match_notes() for matching notes based on onsets only or based on onset, offset, and pitch, respectively. This is because the rules for matching note onsets and matching note offsets are different.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

offset_ratio : float > 0

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or 0.05 (50 ms), whichever is greater.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined.

strict : bool

If strict=False (the default), threshold checks for offset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

Returns:

matching : list of tuples

A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.

mir_eval.transcription.match_note_onsets(ref_intervals, est_intervals, onset_tolerance=0.05, strict=False)

Compute a maximum matching between reference and estimated notes, only taking note onsets into account.

Given two note sequences represented by ref_intervals and est_intervals (see mir_eval.io.load_valued_intervals()), we seek the largest set of correspondences (i, j) such that the onset of reference note i is within onset_tolerance of the onset of estimated note j.

Every reference note is matched against at most one estimated note.

Note there are separate functions match_note_offsets() and match_notes() for matching notes based on offsets only or based on onset, offset, and pitch, respectively. This is because the rules for matching note onsets and matching note offsets are different.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

strict : bool

If strict=False (the default), threshold checks for onset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

Returns:

matching : list of tuples

A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.

mir_eval.transcription.match_notes(ref_intervals, ref_pitches, est_intervals, est_pitches, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False)

Compute a maximum matching between reference and estimated notes, subject to onset, pitch and (optionally) offset constraints.

Given two note sequences represented by ref_intervals, ref_pitches, est_intervals and est_pitches (see mir_eval.io.load_valued_intervals()), we seek the largest set of correspondences (i, j) such that:

  1. The onset of reference note i is within onset_tolerance of the onset of estimated note j.
  2. The pitch of reference note i is within pitch_tolerance of the pitch of estimated note j.
  3. If offset_ratio is not None, the offset of reference note i has to be within offset_tolerance of the offset of estimated note j, where offset_tolerance is equal to offset_ratio times the reference note’s duration, i.e. offset_ratio * ref_duration[i] where ref_duration[i] = ref_intervals[i, 1] - ref_intervals[i, 0]. If the resulting offset_tolerance is less than 0.05 (50 ms), 0.05 is used instead.
  4. If offset_ratio is None, note offsets are ignored, and only criteria 1 and 2 are taken into consideration.

Every reference note is matched against at most one estimated note.

This is useful for computing precision/recall metrics for note transcription.

Note there are separate functions match_note_onsets() and match_note_offsets() for matching notes based on onsets only or based on offsets only, respectively.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

pitch_tolerance : float > 0

The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or 0.05 (50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.

strict : bool

If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

Returns:

matching : list of tuples

A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.
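
A brief usage sketch, with placeholder file names and the default tolerances:

>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, est_pitches = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> # Match on onset, pitch and offset
>>> matching = mir_eval.transcription.match_notes(
...     ref_intervals, ref_pitches, est_intervals, est_pitches)
>>> # Ignore offsets entirely
>>> matching_no_offset = mir_eval.transcription.match_notes(
...     ref_intervals, ref_pitches, est_intervals, est_pitches,
...     offset_ratio=None)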

mir_eval.transcription.precision_recall_f1_overlap(ref_intervals, ref_pitches, est_intervals, est_pitches, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, beta=1.0)

Compute the Precision, Recall and F-measure of correctly vs incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see average_overlap_ratio()). “Correctness” is determined based on note onset, pitch and (optionally) offset: an estimated note is assumed correct if its onset is within +-50ms of a reference note and its pitch (F0) is within +- quarter tone (50 cents) of the corresponding reference note. If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correct returned note is required to have an offset value within 20% (by default, adjustable via the offset_ratio parameter) of the reference note’s duration around the reference note’s offset, or within offset_min_tolerance (50 ms by default), whichever is larger.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

pitch_tolerance : float > 0

The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.

strict : bool

If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

beta : float > 0

Weighting factor for f-measure (default value = 1.0).

Returns:

precision : float

The computed precision score

recall : float

The computed recall score

f_measure : float

The computed F-measure score

avg_overlap_ratio : float

The computed Average Overlap Ratio score

Examples

>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, est_pitches = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> (precision,
...  recall,
...  f_measure) = mir_eval.transcription.precision_recall_f1_overlap(
...      ref_intervals, ref_pitches, est_intervals, est_pitches)
>>> (precision_no_offset,
...  recall_no_offset,
...  f_measure_no_offset) = (
...      mir_eval.transcription.precision_recall_f1_overlap(
...          ref_intervals, ref_pitches, est_intervals, est_pitches,
...          offset_ratio=None))
mir_eval.transcription.average_overlap_ratio(ref_intervals, est_intervals, matching)

Compute the Average Overlap Ratio between a reference and estimated note transcription. Given a reference and corresponding estimated note, their overlap ratio (OR) is defined as the ratio between the duration of the time segment in which the two notes overlap and the time segment spanned by the two notes combined (earliest onset to latest offset):

>>> OR = ((min(ref_offset, est_offset) - max(ref_onset, est_onset)) /
...     (max(ref_offset, est_offset) - min(ref_onset, est_onset)))

The Average Overlap Ratio (AOR) is given by the mean OR computed over all matching reference and estimated notes. The metric goes from 0 (worst) to 1 (best).

Note: this function assumes the matching of reference and estimated notes (see match_notes()) has already been performed and is provided by the matching parameter. Furthermore, it is highly recommended to validate the intervals (see validate_intervals()) before calling this function, otherwise it is possible (though unlikely) for this function to attempt a divide-by-zero operation.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

matching : list of tuples

A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.

Returns:

avg_overlap_ratio : float

The computed Average Overlap Ratio score
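
Because this function expects a precomputed matching, a typical call chains it with match_notes() (assuming the interval and pitch arrays have already been loaded):

>>> matching = mir_eval.transcription.match_notes(
...     ref_intervals, ref_pitches, est_intervals, est_pitches)
>>> avg_overlap_ratio = mir_eval.transcription.average_overlap_ratio(
...     ref_intervals, est_intervals, matching)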

mir_eval.transcription.onset_precision_recall_f1(ref_intervals, est_intervals, onset_tolerance=0.05, strict=False, beta=1.0)

Compute the Precision, Recall and F-measure of note onsets: an estimated onset is considered correct if it is within +-50ms of a reference onset. Note that this metric completely ignores note offset and note pitch. This means an estimated onset will be considered correct if it matches a reference onset, even if the onsets come from notes with completely different pitches (i.e. notes that would not match with match_notes()).

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

strict : bool

If strict=False (the default), threshold checks for onset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

beta : float > 0

Weighting factor for f-measure (default value = 1.0).

Returns:

precision : float

The computed precision score

recall : float

The computed recall score

f_measure : float

The computed F-measure score

Examples

>>> ref_intervals, _ = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, _ = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> (onset_precision,
...  onset_recall,
...  onset_f_measure) = mir_eval.transcription.onset_precision_recall_f1(
...      ref_intervals, est_intervals)
mir_eval.transcription.offset_precision_recall_f1(ref_intervals, est_intervals, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, beta=1.0)

Compute the Precision, Recall and F-measure of note offsets: an estimated offset is considered correct if it is within +-50ms (or 20% of the ref note duration, whichever is greater) of a reference offset. Note that this metric completely ignores note onsets and note pitch. This means an estimated offset will be considered correct if it matches a reference offset, even if the offsets come from notes with completely different pitches (i.e. notes that would not match with match_notes()).

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined.

strict : bool

If strict=False (the default), threshold checks for offset matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

beta : float > 0

Weighting factor for f-measure (default value = 1.0).

Returns:

precision : float

The computed precision score

recall : float

The computed recall score

f_measure : float

The computed F-measure score

Examples

>>> ref_intervals, _ = mir_eval.io.load_valued_intervals(
...     'reference.txt')
>>> est_intervals, _ = mir_eval.io.load_valued_intervals(
...     'estimated.txt')
>>> (offset_precision,
...  offset_recall,
...  offset_f_measure) = mir_eval.transcription.offset_precision_recall_f1(
...      ref_intervals, est_intervals)
mir_eval.transcription.evaluate(ref_intervals, ref_pitches, est_intervals, est_pitches, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals(
...    'reference.txt')
>>> est_intervals, est_pitches = mir_eval.io.load_valued_intervals(
...    'estimate.txt')
>>> scores = mir_eval.transcription.evaluate(ref_intervals, ref_pitches,
...     est_intervals, est_pitches)

mir_eval.transcription_velocity

Transcription evaluation, as defined in mir_eval.transcription, does not take into account the velocities of reference and estimated notes. This submodule implements a variant of mir_eval.transcription.precision_recall_f1_overlap() which additionally considers note velocity when determining whether a note is correctly transcribed. This is done by defining a new function mir_eval.transcription_velocity.match_notes() which first calls mir_eval.transcription.match_notes() to get a note matching based on onset, offset, and pitch. Then, we follow the evaluation procedure described in [15] to test whether an estimated note should be considered correct:

  1. Reference velocities are re-scaled to the range [0, 1].
  2. A linear regression is performed to estimate global scale and offset parameters which minimize the L2 distance between matched estimated and (rescaled) reference notes.
  3. The scale and offset parameters are used to rescale estimated velocities.
  4. An estimated/reference note pair which has been matched according to the onset, offset, and pitch is further only considered correct if the rescaled velocities are within a predefined threshold, defaulting to 0.1.

mir_eval.transcription_velocity.match_notes() is used to define a new variant mir_eval.transcription_velocity.precision_recall_f1_overlap() which considers velocity.
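
To make the velocity-rescaling step concrete, the following standalone sketch (not the library's actual code) fits the scale and offset with an ordinary least-squares line on a pair of hypothetical matched velocity arrays:

>>> import numpy as np
>>> # Hypothetical velocities of matched reference/estimated note pairs
>>> ref_velocities = np.array([40., 80., 100., 127.])
>>> est_velocities = np.array([50., 85., 110., 120.])
>>> # 1. Rescale reference velocities to [0, 1]
>>> #    (here simply by dividing by 127; an assumption for illustration)
>>> ref_scaled = ref_velocities / 127.0
>>> # 2. Fit a global scale and offset minimizing the L2 error
>>> slope, intercept = np.polyfit(est_velocities, ref_scaled, 1)
>>> # 3. Rescale the estimated velocities
>>> est_rescaled = slope * est_velocities + intercept
>>> # 4. A matched pair counts as correct if the velocities agree within 0.1
>>> correct = np.abs(est_rescaled - ref_scaled) <= 0.1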

Conventions

This submodule follows the conventions of mir_eval.transcription and additionally requires velocities to be provided as MIDI velocities in the range [0, 127].

Metrics

References

[15] Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, and Douglas Eck, “Onsets and Frames: Dual-Objective Piano Transcription”, Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018.
mir_eval.transcription_velocity.validate(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities)

Checks that the input annotations have valid time intervals, pitches, and velocities, and throws helpful errors if not.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

mir_eval.transcription_velocity.match_notes(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, velocity_tolerance=0.1)

Match notes, taking note velocity into consideration.

This function first calls mir_eval.transcription.match_notes() to match notes according to the supplied intervals, pitches, onset, offset, and pitch tolerances. The velocities of the matched notes are then used to estimate a slope and intercept which can rescale the estimated velocities so that they are as close as possible (in the L2 sense) to their matched reference velocities. Velocities are then normalized to the range [0, 1]. An estimated note is then further only considered correct if its velocity is within velocity_tolerance of its matched (according to pitch and timing) reference note.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

pitch_tolerance : float > 0

The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or 0.05 (50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.

strict : bool

If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

velocity_tolerance : float > 0

Estimated notes are considered correct if, after rescaling and normalization to [0, 1], they are within velocity_tolerance of a matched reference note.

Returns:

matching : list of tuples

A list of matched reference and estimated notes. matching[i] == (i, j) where reference note i matches estimated note j.

mir_eval.transcription_velocity.precision_recall_f1_overlap(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, velocity_tolerance=0.1, beta=1.0)

Compute the Precision, Recall and F-measure of correctly vs incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see mir_eval.transcription.average_overlap_ratio()). “Correctness” is determined based on note onset, velocity, pitch and (optionally) offset. An estimated note is considered correct if

  1. Its onset is within onset_tolerance (default +-50ms) of a reference note
  2. Its pitch (F0) is within +/- pitch_tolerance (default one quarter tone, 50 cents) of the corresponding reference note
  3. Its velocity, after normalizing reference velocities to the range [0, 1] and globally rescaling estimated velocities to minimize the L2 distance to the matched reference velocities, is within velocity_tolerance (default 0.1) of the corresponding reference note
  4. If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correct returned note is required to have an offset value within offset_ratio (default 20%) of the reference note’s duration around the reference note’s offset, or within offset_min_tolerance (default 50 ms), whichever is larger.
Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

pitch_tolerance : float > 0

The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.

strict : bool

If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

velocity_tolerance : float > 0

Estimated notes are considered correct if, after rescaling and normalization to [0, 1], they are within velocity_tolerance of a matched reference note.

beta : float > 0

Weighting factor for f-measure (default value = 1.0).

Returns:

precision : float

The computed precision score

recall : float

The computed recall score

f_measure : float

The computed F-measure score

avg_overlap_ratio : float

The computed Average Overlap Ratio score
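
Assuming interval, pitch and MIDI-velocity arrays have already been prepared following the conventions above, a call might look like:

>>> (precision, recall, f_measure,
...  avg_overlap_ratio) = mir_eval.transcription_velocity.precision_recall_f1_overlap(
...     ref_intervals, ref_pitches, ref_velocities,
...     est_intervals, est_pitches, est_velocities)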

mir_eval.transcription_velocity.evaluate(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

mir_eval.key

Key Detection involves determining the underlying key (distribution of notes and note transitions) in a piece of music. Key detection algorithms are evaluated by comparing their estimated key to a ground-truth reference key and reporting a score according to the relationship of the keys.

Conventions

Keys are represented as strings of the form '(key) (mode)', e.g. 'C# major' or 'Fb minor'. The case of the key is ignored. Note that certain key strings are equivalent, e.g. 'C# major' and 'Db major'. The mode may only be specified as either 'major' or 'minor', no other mode strings will be accepted.

Metrics

mir_eval.key.validate_key(key)

Checks that a key is well-formatted, e.g. in the form 'C# major'.

Parameters:

key : str

Key to verify

mir_eval.key.validate(reference_key, estimated_key)

Checks that the input annotations to a metric are valid key strings and throws helpful errors if not.

Parameters:

reference_key : str

Reference key string.

estimated_key : str

Estimated key string.

mir_eval.key.split_key_string(key)

Splits a key string (of the form, e.g. 'C# major') into a tuple of (key, mode), where key is an integer representing the semitone distance from C.

Parameters:

key : str

String representing a key.

Returns:

key : int

Number of semitones above C.

mode : str

String representing the mode.
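
For example:

>>> key, mode = mir_eval.key.split_key_string('C# major')
>>> # key is the number of semitones above C (here 1), mode is 'major'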

mir_eval.key.weighted_score(reference_key, estimated_key)

Computes a heuristic score which is weighted according to the relationship of the reference and estimated key, as follows:

Relationship                                            Score
Same key                                                  1.0
Estimated key is a perfect fifth above reference key     0.5
Relative major/minor                                      0.3
Parallel major/minor                                      0.2
Other                                                     0.0

Parameters:

reference_key : str

Reference key string.

estimated_key : str

Estimated key string.

Returns:

score : float

Score representing how closely related the keys are.

Examples

>>> ref_key = mir_eval.io.load_key('ref.txt')
>>> est_key = mir_eval.io.load_key('est.txt')
>>> score = mir_eval.key.weighted_score(ref_key, est_key)
mir_eval.key.evaluate(reference_key, estimated_key, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:

reference_key : str

Reference key string.

estimated_key : str

Estimated key string.

kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:

scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.

Examples

>>> ref_key = mir_eval.io.load_key('reference.txt')
>>> est_key = mir_eval.io.load_key('estimated.txt')
>>> scores = mir_eval.key.evaluate(ref_key, est_key)

mir_eval.util

This submodule collects useful functionality required across the task submodules, such as preprocessing, validation, and common computations.

mir_eval.util.index_labels(labels, case_sensitive=False)

Convert a list of string identifiers into numerical indices.

Parameters:

labels : list of strings, shape=(n,)

A list of annotations, e.g., segment or chord labels from an annotation file.

case_sensitive : bool

Set to True to enable case-sensitive label indexing (Default value = False)

Returns:

indices : list, shape=(n,)

Numerical representation of labels

index_to_label : dict

Mapping to convert numerical indices back to labels. labels[i] == index_to_label[indices[i]]

mir_eval.util.generate_labels(items, prefix='__')

Given an array of items (e.g. events, intervals), create a synthetic label for each event of the form ‘(label prefix)(item number)’

Parameters:

items : list-like

A list or array of events or intervals

prefix : str

This prefix will be prepended to all synthetically generated labels (Default value = ‘__’)

Returns:

labels : list of str

Synthetically generated labels

mir_eval.util.intervals_to_samples(intervals, labels, offset=0, sample_size=0.1, fill_value=None)

Convert an array of labeled time intervals to annotated samples.

Parameters:

intervals : np.ndarray, shape=(n, d)

An array of time intervals, as returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals(). The i th interval spans time intervals[i, 0] to intervals[i, 1].

labels : list, shape=(n,)

The annotation for each interval

offset : float > 0

Phase offset of the sampled time grid (in seconds) (Default value = 0)

sample_size : float > 0

duration of each sample to be generated (in seconds) (Default value = 0.1)

fill_value : type(labels[0])

Object to use for the label with out-of-range time points. (Default value = None)

Returns:

sample_times : list

list of sample times

sample_labels : list

array of labels for each generated sample

Notes

Intervals will be rounded down to the nearest multiple of sample_size.
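
A brief sketch of sampling a labeled segmentation onto a 100 ms grid (the data is made up):

>>> import numpy as np
>>> intervals = np.array([[0.0, 1.0], [1.0, 2.5]])
>>> labels = ['verse', 'chorus']
>>> sample_times, sample_labels = mir_eval.util.intervals_to_samples(
...     intervals, labels, sample_size=0.1, fill_value='silence')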

mir_eval.util.interpolate_intervals(intervals, labels, time_points, fill_value=None)

Assign labels to a set of points in time given a set of intervals.

Time points that do not lie within an interval are mapped to fill_value.

Parameters:

intervals : np.ndarray, shape=(n, 2)

An array of time intervals, as returned by mir_eval.io.load_intervals(). The i th interval spans time intervals[i, 0] to intervals[i, 1].

Intervals are assumed to be disjoint.

labels : list, shape=(n,)

The annotation for each interval

time_points : array_like, shape=(m,)

Points in time to assign labels. These must be in non-decreasing order.

fill_value : type(labels[0])

Object to use for the label with out-of-range time points. (Default value = None)

Returns:

aligned_labels : list

Labels corresponding to the given time points.

Raises:

ValueError

If time_points is not in non-decreasing order.

mir_eval.util.sort_labeled_intervals(intervals, labels=None)

Sort intervals, and optionally, their corresponding labels according to start time.

Parameters:

intervals : np.ndarray, shape=(n, 2)

The input intervals

labels : list, optional

Labels for each interval

Returns:

intervals_sorted or (intervals_sorted, labels_sorted)

Labels are only returned if provided as input

mir_eval.util.f_measure(precision, recall, beta=1.0)

Compute the f-measure from precision and recall scores.

Parameters:

precision : float in (0, 1]

Precision

recall : float in (0, 1]

Recall

beta : float > 0

Weighting factor for f-measure (Default value = 1.0)

Returns:

f_measure : float

The weighted f-measure
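
This is the standard weighted harmonic mean, f = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). For example:

>>> # With beta=1 this is the usual harmonic mean of precision and recall
>>> f = mir_eval.util.f_measure(0.8, 0.6)
>>> # Weight recall more heavily
>>> f2 = mir_eval.util.f_measure(0.8, 0.6, beta=2.0)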

mir_eval.util.intervals_to_boundaries(intervals, q=5)

Convert interval times into boundaries.

Parameters:

intervals : np.ndarray, shape=(n_events, 2)

Array of interval start and end-times

q : int

Number of decimals to round to. (Default value = 5)

Returns:

boundaries : np.ndarray

Interval boundary times, including the end of the final interval

mir_eval.util.boundaries_to_intervals(boundaries)

Convert an array of event times into intervals

Parameters:

boundaries : list-like

List-like of event times. These are assumed to be unique timestamps in ascending order.

Returns:

intervals : np.ndarray, shape=(n_intervals, 2)

Start and end time for each interval
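
The two helpers above are complementary; for instance:

>>> import numpy as np
>>> boundaries = np.array([0.0, 1.5, 3.0, 4.0])
>>> intervals = mir_eval.util.boundaries_to_intervals(boundaries)
>>> # intervals is [[0.0, 1.5], [1.5, 3.0], [3.0, 4.0]]
>>> recovered = mir_eval.util.intervals_to_boundaries(intervals)
>>> # recovered is [0.0, 1.5, 3.0, 4.0]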

mir_eval.util.adjust_intervals(intervals, labels=None, t_min=0.0, t_max=None, start_label='__T_MIN', end_label='__T_MAX')

Adjust a list of time intervals to span the range [t_min, t_max].

Any intervals lying completely outside the specified range will be removed.

Any intervals lying partially outside the specified range will be cropped.

If the specified range exceeds the span of the provided data in either direction, additional intervals will be appended. If an interval is appended at the beginning, it will be given the label start_label; if an interval is appended at the end, it will be given the label end_label.

Parameters:

intervals : np.ndarray, shape=(n_events, 2)

Array of interval start and end-times

labels : list, len=n_events or None

List of labels (Default value = None)

t_min : float or None

Minimum interval start time. (Default value = 0.0)

t_max : float or None

Maximum interval end time. (Default value = None)

start_label : str or float or int

Label to give any intervals appended at the beginning (Default value = ‘__T_MIN’)

end_label : str or float or int

Label to give any intervals appended at the end (Default value = ‘__T_MAX’)

Returns:

new_intervals : np.ndarray

Intervals spanning [t_min, t_max]

new_labels : list

List of labels for new_intervals

mir_eval.util.adjust_events(events, labels=None, t_min=0.0, t_max=None, label_prefix='__')

Adjust the given list of event times to span the range [t_min, t_max].

Any event times outside of the specified range will be removed.

If the times do not span [t_min, t_max], additional events will be added with the prefix label_prefix.

Parameters:

events : np.ndarray

Array of event times (seconds)

labels : list or None

List of labels (Default value = None)

t_min : float or None

Minimum valid event time. (Default value = 0.0)

t_max : float or None

Maximum valid event time. (Default value = None)

label_prefix : str

Prefix string to use for synthetic labels (Default value = ‘__’)

Returns:

new_times : np.ndarray

Event times corrected to the given range.

mir_eval.util.intersect_files(flist1, flist2)

Return the intersection of two sets of filepaths, based on the file name (after the final ‘/’) and ignoring the file extension.

Parameters:

flist1 : list

first list of filepaths

flist2 : list

second list of filepaths

Returns:

sublist1 : list

subset of filepaths with matching stems from flist1

sublist2 : list

corresponding filepaths from flist2

Examples

>>> flist1 = ['/a/b/abc.lab', '/c/d/123.lab', '/e/f/xyz.lab']
>>> flist2 = ['/g/h/xyz.npy', '/i/j/123.txt', '/k/l/456.lab']
>>> sublist1, sublist2 = mir_eval.util.intersect_files(flist1, flist2)
>>> print(sublist1)
['/e/f/xyz.lab', '/c/d/123.lab']
>>> print(sublist2)
['/g/h/xyz.npy', '/i/j/123.txt']
mir_eval.util.merge_labeled_intervals(x_intervals, x_labels, y_intervals, y_labels)

Merge the time intervals of two sequences.

Parameters:

x_intervals : np.ndarray

Array of interval times (seconds)

x_labels : list or None

List of labels

y_intervals : np.ndarray

Array of interval times (seconds)

y_labels : list or None

List of labels

Returns:

new_intervals : np.ndarray

New interval times of the merged sequences.

new_x_labels : list

New labels for the sequence x

new_y_labels : list

New labels for the sequence y

mir_eval.util.match_events(ref, est, window, distance=None)

Compute a maximum matching between reference and estimated event times, subject to a window constraint.

Given two lists of event times ref and est, we seek the largest set of correspondences (ref[i], est[j]) such that distance(ref[i], est[j]) <= window, and each ref[i] and est[j] is matched at most once.

This is useful for computing precision/recall metrics in beat tracking, onset detection, and segmentation.

Parameters:

ref : np.ndarray, shape=(n,)

Array of reference values

est : np.ndarray, shape=(m,)

Array of estimated values

window : float > 0

Size of the window.

distance : function

function that computes the outer distance of ref and est. By default uses |ref[i] - est[j]|

Returns:

matching : list of tuples

A list of matched reference and event numbers. matching[i] == (i, j) where ref[i] matches est[j].
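
A short sketch matching two event lists within a 70 ms window (the times are arbitrary):

>>> import numpy as np
>>> ref = np.array([0.5, 1.0, 1.5, 2.0])
>>> est = np.array([0.52, 1.1, 1.51])
>>> matching = mir_eval.util.match_events(ref, est, 0.07)
>>> # matching contains (i, j) index pairs, e.g. (0, 0) for 0.5 vs 0.52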

mir_eval.util.validate_intervals(intervals)

Checks that an (n, 2) interval ndarray is well-formed, and raises errors if not.

Parameters:

intervals : np.ndarray, shape=(n, 2)

Array of interval start/end locations.

mir_eval.util.validate_events(events, max_time=30000.0)

Checks that a 1-d event location ndarray is well-formed, and raises errors if not.

Parameters:

events : np.ndarray, shape=(n,)

Array of event times

max_time : float

If an event is found above this time, a ValueError will be raised. (Default value = 30000.)

mir_eval.util.validate_frequencies(frequencies, max_freq, min_freq, allow_negatives=False)

Checks that a 1-d frequency ndarray is well-formed, and raises errors if not.

Parameters:

frequencies : np.ndarray, shape=(n,)

Array of frequency values

max_freq : float

If a frequency is found above this pitch, a ValueError will be raised. (Default value = 5000.)

min_freq : float

If a frequency is found below this pitch, a ValueError will be raised. (Default value = 20.)

allow_negatives : bool

Whether or not to allow negative frequency values.

mir_eval.util.has_kwargs(function)

Determine whether a function has **kwargs.

Parameters:

function : callable

The function to test

Returns:

True if function accepts arbitrary keyword arguments.

False otherwise.

mir_eval.util.filter_kwargs(_function, *args, **kwargs)

Given a function and args and keyword args to pass to it, call the function but using only the keyword arguments which it accepts. This is equivalent to redefining the function with an additional **kwargs to accept slop keyword args.

If the target function already accepts **kwargs parameters, no filtering is performed.

Parameters:

_function : callable

Function to call. Can take in any number of args or kwargs

mir_eval.util.intervals_to_durations(intervals)

Converts an array of n intervals to their n durations.

Parameters:

intervals : np.ndarray, shape=(n, 2)

An array of time intervals, as returned by mir_eval.io.load_intervals(). The i th interval spans time intervals[i, 0] to intervals[i, 1].

Returns:

durations : np.ndarray, shape=(n,)

Array of the duration of each interval.

mir_eval.util.hz_to_midi(freqs)

Convert Hz to MIDI numbers

Parameters:

freqs : number or ndarray

Frequency/frequencies in Hz

Returns:

midi : number or ndarray

MIDI note numbers corresponding to input frequencies. Note that these may be fractional.

mir_eval.util.midi_to_hz(midi)

Convert MIDI numbers to Hz

Parameters:

midi : number or ndarray

MIDI notes

Returns:

freqs : number or ndarray

Frequency/frequencies in Hz corresponding to midi
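
Both functions follow the standard equal-tempered mapping with A4 = 440 Hz, i.e. midi = 69 + 12 * log2(freq / 440). For example:

>>> mir_eval.util.hz_to_midi(440.0)   # 69.0, concert A
>>> mir_eval.util.midi_to_hz(60)      # ~261.63 Hz, middle C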

mir_eval.io

Functions for loading in annotations from files in different formats.

mir_eval.io.load_delimited(filename, converters, delimiter='\\s+')

Utility function for loading in data from an annotation file where columns are delimited. The number of columns is inferred from the length of the provided converters list.

Parameters:

filename : str

Path to the annotation file

converters : list of functions

Each entry in column n of the file will be cast by the function converters[n].

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

columns : tuple of lists

Each list in this tuple corresponds to values in one of the columns in the file.

Examples

>>> # Load in a one-column list of event times (floats)
>>> load_delimited('events.txt', [float])
>>> # Load in a list of labeled events, separated by commas
>>> load_delimited('labeled_events.csv', [float, str], ',')
mir_eval.io.load_events(filename, delimiter='\\s+')

Import time-stamp events from an annotation file. The file should consist of a single column of numeric values corresponding to the event times. This is primarily useful for processing events which lack duration, such as beats or onsets.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

event_times : np.ndarray

array of event times (float)

mir_eval.io.load_labeled_events(filename, delimiter='\\s+')

Import labeled time-stamp events from an annotation file. The file should consist of two columns; the first having numeric values corresponding to the event times and the second having string labels for each event. This is primarily useful for processing labeled events which lack duration, such as beats with metric beat number or onsets with an instrument label.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

event_times : np.ndarray

array of event times (float)

labels : list of str

list of labels

mir_eval.io.load_intervals(filename, delimiter='\\s+')

Import intervals from an annotation file. The file should consist of two columns of numeric values corresponding to start and end time of each interval. This is primarily useful for processing events which span a duration, such as segmentation, chords, or instrument activation.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

intervals : np.ndarray, shape=(n_events, 2)

array of event start and end times

mir_eval.io.load_labeled_intervals(filename, delimiter='\\s+')

Import labeled intervals from an annotation file. The file should consist of three columns: Two consisting of numeric values corresponding to start and end time of each interval and a third corresponding to the label of each interval. This is primarily useful for processing events which span a duration, such as segmentation, chords, or instrument activation.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

intervals : np.ndarray, shape=(n_events, 2)

array of event start and end time

labels : list of str

list of labels

mir_eval.io.load_time_series(filename, delimiter='\\s+')

Import a time series from an annotation file. The file should consist of two columns of numeric values corresponding to the time and value of each sample of the time series.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

times : np.ndarray

array of timestamps (float)

values : np.ndarray

array of corresponding numeric values (float)

mir_eval.io.load_patterns(filename)

Loads the patterns contained in the file and puts them into a list of patterns, each pattern being a list of occurrences, and each occurrence being a list of (onset, midi) pairs.

The input file must be formatted as described in MIREX 2013: http://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections

Parameters:

filename : str

The input file path containing the patterns of a given piece using the MIREX 2013 format.

Returns:

pattern_list : list

The list of patterns, containing all their occurrences, using the following format:

onset_midi = (onset_time, midi_number)
occurrence = [onset_midi1, ..., onset_midiO]
pattern = [occurrence1, ..., occurrenceM]
pattern_list = [pattern1, ..., patternN]

where N is the number of patterns, M[i] is the number of occurrences of the i-th pattern, and O[j] is the number of onsets in the j-th occurrence. E.g.:

occ1 = [(0.5, 67.0), (1.0, 67.0), (1.5, 67.0), (2.0, 64.0)]
occ2 = [(4.5, 65.0), (5.0, 65.0), (5.5, 65.0), (6.0, 62.0)]
pattern1 = [occ1, occ2]

occ1 = [(10.5, 67.0), (11.0, 67.0), (11.5, 67.0), (12.0, 64.0),
        (12.5, 69.0), (13.0, 69.0), (13.5, 69.0), (14.0, 67.0),
        (14.5, 76.0), (15.0, 76.0), (15.5, 76.0), (16.0, 72.0)]
occ2 = [(18.5, 67.0), (19.0, 67.0), (19.5, 67.0), (20.0, 62.0),
        (20.5, 69.0), (21.0, 69.0), (21.5, 69.0), (22.0, 67.0),
        (22.5, 77.0), (23.0, 77.0), (23.5, 77.0), (24.0, 74.0)]
pattern2 = [occ1, occ2]

pattern_list = [pattern1, pattern2]

mir_eval.io.load_wav(path, mono=True)

Loads a .wav file as a numpy array using scipy.io.wavfile.

Parameters:

path : str

Path to a .wav file

mono : bool

If the provided .wav has more than one channel, it will be converted to mono if mono=True. (Default value = True)

Returns:

audio_data : np.ndarray

Array of audio samples, normalized to the range [-1., 1.]

fs : int

Sampling rate of the audio data
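
For example, a hypothetical stereo file 'mixture.wav' could be loaded and downmixed to mono like so:

>>> audio, fs = mir_eval.io.load_wav('mixture.wav', mono=True)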

mir_eval.io.load_valued_intervals(filename, delimiter='\\s+')

Import valued intervals from an annotation file. The file should consist of three columns: Two consisting of numeric values corresponding to start and end time of each interval and a third, also of numeric values, corresponding to the value of each interval. This is primarily useful for processing events which span a duration and have a numeric value, such as piano-roll notes which have an onset, offset, and a pitch value.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

intervals : np.ndarray, shape=(n_events, 2)

Array of event start and end times

values : np.ndarray, shape=(n_events,)

Array of values
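
For example, assuming a hypothetical note annotation 'ref_notes.txt' with rows of the form "onset offset pitch" (times in seconds, pitch in Hz), it could be loaded like so:

>>> ref_intervals, ref_pitches = mir_eval.io.load_valued_intervals('ref_notes.txt')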

mir_eval.io.load_key(filename, delimiter='\\s+')

Load key labels from an annotation file. The file should consist of two string columns: One denoting the key scale degree (semitone), and the other denoting the mode (major or minor). The file should contain only one row.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

key : str

Key label, in the form '(key) (mode)'
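
For example, assuming hypothetical annotation files 'ref_key.txt' and 'est_key.txt' each containing a single line such as "Db major", they could be loaded and compared like so:

>>> ref_key = mir_eval.io.load_key('ref_key.txt')
>>> est_key = mir_eval.io.load_key('est_key.txt')
>>> scores = mir_eval.key.evaluate(ref_key, est_key)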

mir_eval.io.load_tempo(filename, delimiter='\\s+')

Load tempo estimates from an annotation file in MIREX format. The file should consist of three numeric columns: the first two correspond to tempo estimates (in beats-per-minute), and the third denotes the relative confidence of the first value compared to the second (in the range [0, 1]). The file should contain only one row.

Parameters:

filename : str

Path to the annotation file

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

Returns:

tempi : np.ndarray, non-negative

The two tempo estimates

weight : float [0, 1]

The relative importance of tempi[0] compared to tempi[1]
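
For example, assuming hypothetical annotation files 'ref_tempo.txt' and 'est_tempo.txt' each containing a single line such as "60.0 120.0 0.7", they could be loaded and compared like so:

>>> ref_tempi, ref_weight = mir_eval.io.load_tempo('ref_tempo.txt')
>>> est_tempi, _ = mir_eval.io.load_tempo('est_tempo.txt')
>>> scores = mir_eval.tempo.evaluate(ref_tempi, ref_weight, est_tempi)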

mir_eval.io.load_ragged_time_series(filename, dtype=float, delimiter='\\s+', header=False)

Utility function for loading in data from a delimited time series annotation file with a variable number of columns. Assumes that column 0 contains time stamps and columns 1 through n contain values. n may be variable from time stamp to time stamp.

Parameters:

filename : str

Path to the annotation file

dtype : function

Data type to apply to the value columns.

delimiter : str

Separator regular expression. By default, lines will be split by any amount of whitespace.

header : bool

Indicates whether a header row is present or not. By default, assumes no header is present.

Returns:

times : np.ndarray

array of timestamps (float)

values : list of np.ndarray

list of arrays of corresponding values

Examples

>>> # Load a ragged list of tab-delimited multi-f0 midi notes
>>> times, vals = load_ragged_time_series('multif0.txt', dtype=int,
                                          delimiter='\t')
>>> # Load a ragged list of space-delimited multi-f0 values with a header
>>> times, vals = load_ragged_time_series('labeled_events.csv',
                                          header=True)

mir_eval.sonify

Methods which sonify annotations for “evaluation by ear”. All functions return a raw signal at the specified sampling rate.

mir_eval.sonify.clicks(times, fs, click=None, length=None)

Returns a signal with the click waveform placed at each specified time

Parameters:

times : np.ndarray

times to place clicks, in seconds

fs : int

desired sampling rate of the output signal

click : np.ndarray

click signal, defaults to a 1 kHz blip

length : int

desired number of samples in the output signal, defaults to times.max()*fs + click.shape[0] + 1

Returns:

click_signal : np.ndarray

Synthesized click signal
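
For example, a hypothetical array of beat times could be rendered as an audible click track at 44.1 kHz like so:

>>> import numpy as np
>>> beat_times = np.array([0.5, 1.0, 1.5, 2.0])  # hypothetical beat times, in seconds
>>> click_track = mir_eval.sonify.clicks(beat_times, 44100)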

mir_eval.sonify.time_frequency(gram, frequencies, times, fs, function=np.sin, length=None, n_dec=1)

Reverse synthesis of a time-frequency representation of a signal

Parameters:

gram : np.ndarray

gram[n, m] is the magnitude of frequencies[n] from times[m] to times[m + 1]

Non-positive magnitudes are interpreted as silence.

frequencies : np.ndarray

array of size gram.shape[0] denoting the frequency of each row of gram

times : np.ndarray, shape=(gram.shape[1],) or (gram.shape[1], 2)

Either the start time of each column in the gram, or the time interval corresponding to each column.

fs : int

desired sampling rate of the output signal

function : function

function to use to synthesize notes; should be 2π-periodic

length : int

desired number of samples in the output signal, defaults to times[-1]*fs

n_dec : int

the number of decimals used to approximate each sonified frequency. Defaults to 1 decimal place. Higher precision will be slower.

Returns:

output : np.ndarray

synthesized version of the piano roll

mir_eval.sonify.pitch_contour(times, frequencies, fs, amplitudes=None, function=np.sin, length=None, kind='linear')

Sonify a pitch contour.

Parameters:

times : np.ndarray

time indices for each frequency measurement, in seconds

frequencies : np.ndarray

frequency measurements, in Hz. Non-positive measurements will be interpreted as un-voiced samples.

fs : int

desired sampling rate of the output signal

amplitudes : np.ndarray

amplitude measurements, nonnegative. Defaults to np.ones((length,))

function : function

function to use to synthesize notes; should be 2π-periodic

length : int

desired number of samples in the output signal, defaults to max(times)*fs

kind : str

Interpolation mode for the frequency and amplitude values. See: scipy.interpolate.interp1d for valid settings.

Returns:

output : np.ndarray

synthesized version of the pitch contour
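
For example, a melody annotation loaded with mir_eval.io.load_time_series() (hypothetical filename) could be sonified at 22050 Hz like so:

>>> times, freqs = mir_eval.io.load_time_series('est_melody.txt')
>>> audio = mir_eval.sonify.pitch_contour(times, freqs, 22050)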

mir_eval.sonify.chroma(chromagram, times, fs, **kwargs)

Reverse synthesis of a chromagram (semitone matrix)

Parameters:

chromagram : np.ndarray, shape=(12, times.shape[0])

Chromagram matrix, where each row represents a semitone [C->B], i.e., chromagram[3, j] is the magnitude of D# from times[j] to times[j + 1]

times : np.ndarray, shape=(chromagram.shape[1],) or (chromagram.shape[1], 2)

Either the start time of each column in the chromagram, or the time interval corresponding to each column.

fs : int

Sampling rate to synthesize audio data at

kwargs

Additional keyword arguments to pass to mir_eval.sonify.time_frequency()

Returns:

output : np.ndarray

Synthesized chromagram

mir_eval.sonify.chords(chord_labels, intervals, fs, **kwargs)

Synthesizes chord labels

Parameters:

chord_labels : list of str

List of chord label strings.

intervals : np.ndarray, shape=(len(chord_labels), 2)

Start and end times of each chord label

fs : int

Sampling rate to synthesize at

kwargs

Additional keyword arguments to pass to mir_eval.sonify.time_frequency()

Returns:

output : np.ndarray

Synthesized chord labels
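
For example, a chord annotation loaded with mir_eval.io.load_labeled_intervals() (hypothetical filename) could be sonified at 22050 Hz like so:

>>> intervals, labels = mir_eval.io.load_labeled_intervals('est_chords.lab')
>>> audio = mir_eval.sonify.chords(labels, intervals, 22050)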

mir_eval.display

Display functions

mir_eval.display.segments(intervals, labels, base=None, height=None, text=False, text_kw=None, ax=None, **kwargs)

Plot a segmentation as a set of disjoint rectangles.

Parameters:

intervals : np.ndarray, shape=(n, 2)

segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

base : number

The vertical position of the base of the rectangles. By default, this will be the bottom of the plot.

height : number

The height of the rectangles. By default, this will be the top of the plot (minus base).

text : bool

If true, each segment’s label is displayed in its upper-left corner

text_kw : dict

If text == True, the properties of the text object can be specified here. See matplotlib.pyplot.Text for valid parameters

ax : matplotlib.pyplot.axes

An axis handle on which to draw the segmentation. If none is provided, a new set of axes is created.

kwargs

Additional keyword arguments to pass to matplotlib.patches.Rectangle.

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes
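
For example, a structural segmentation loaded with mir_eval.io.load_labeled_intervals() (hypothetical filename) could be plotted with its labels like so:

>>> import matplotlib.pyplot as plt
>>> intervals, labels = mir_eval.io.load_labeled_intervals('ref_segments.lab')
>>> mir_eval.display.segments(intervals, labels, text=True)
>>> plt.show()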

mir_eval.display.labeled_intervals(intervals, labels, label_set=None, base=None, height=None, extend_labels=True, ax=None, tick=True, **kwargs)

Plot labeled intervals with each label on its own row.

Parameters:

intervals : np.ndarray, shape=(n, 2)

segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals().

labels : list, shape=(n,)

reference segment labels, in the format returned by mir_eval.io.load_labeled_intervals().

label_set : list

An (ordered) list of labels to determine the plotting order. If not provided, the labels will be inferred from ax.get_yticklabels(). If no yticklabels exist, then the sorted set of unique values in labels is taken as the label set.

base : np.ndarray, shape=(n,), optional

Vertical positions of each label. By default, labels are positioned at integers np.arange(len(labels)).

height : scalar or np.ndarray, shape=(n,), optional

Height for each label. If scalar, the same value is applied to all labels. By default, each label has height=1.

extend_labels : bool

If False, only values of labels that also exist in label_set will be shown.

If True, all labels are shown, with those in labels but not in label_set appended to the top of the plot. A horizontal line is drawn to indicate the separation between values in or out of label_set.

ax : matplotlib.pyplot.axes

An axis handle on which to draw the intervals. If none is provided, a new set of axes is created.

tick : bool

If True, sets tick positions and labels on the y-axis.

kwargs

Additional keyword arguments to pass to matplotlib.collections.BrokenBarHCollection.

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes

class mir_eval.display.IntervalFormatter(base, ticks)

Bases: matplotlib.ticker.Formatter

Ticker formatter for labeled interval plots.

Parameters:

base : array-like of int

The base positions of each label

ticks : array-like of string

The labels for the ticks

mir_eval.display.hierarchy(intervals_hier, labels_hier, levels=None, ax=None, **kwargs)

Plot a hierarchical segmentation

Parameters:

intervals_hier : list of np.ndarray

A list of segmentation intervals. Each element should be an n-by-2 array of segment intervals, in the format returned by mir_eval.io.load_intervals() or mir_eval.io.load_labeled_intervals(). Segmentations should be ordered by increasing specificity.

labels_hier : list of list-like

A list of segmentation labels. Each element should be a list of labels for the corresponding element in intervals_hier.

levels : list of string

Each element levels[i] is a label for the i-th segmentation. This is used in the legend to denote the levels in a segment hierarchy.

ax : matplotlib.pyplot.axes

An axis handle on which to draw the hierarchy. If none is provided, a new set of axes is created.

kwargs

Additional keyword arguments to pass to labeled_intervals().

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes

mir_eval.display.events(times, labels=None, base=None, height=None, ax=None, text_kw=None, **kwargs)

Plot event times as a set of vertical lines

Parameters:

times : np.ndarray, shape=(n,)

event times, in the format returned by mir_eval.io.load_events() or mir_eval.io.load_labeled_events().

labels : list, shape=(n,), optional

event labels, in the format returned by mir_eval.io.load_labeled_events().

base : number

The vertical position of the base of the line. By default, this will be the bottom of the plot.

height : number

The height of the lines. By default, this will be the top of the plot (minus base).

ax : matplotlib.pyplot.axes

An axis handle on which to draw the segmentation. If none is provided, a new set of axes is created.

text_kw : dict

If labels is provided, the properties of the text objects can be specified here. See matplotlib.pyplot.Text for valid parameters

kwargs

Additional keyword arguments to pass to matplotlib.pyplot.vlines.

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes
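
For example, reference and estimated onset times (hypothetical filenames) could be overlaid on the same axes, passing a label= keyword through to matplotlib.pyplot.vlines for use in a legend:

>>> import matplotlib.pyplot as plt
>>> ref_onsets = mir_eval.io.load_events('ref_onsets.txt')
>>> est_onsets = mir_eval.io.load_events('est_onsets.txt')
>>> ax = mir_eval.display.events(ref_onsets, label='Reference')
>>> mir_eval.display.events(est_onsets, ax=ax, label='Estimate')
>>> ax.legend()
>>> plt.show()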

mir_eval.display.pitch(times, frequencies, midi=False, unvoiced=False, ax=None, **kwargs)

Visualize pitch contours

Parameters:

times : np.ndarray, shape=(n,)

Sample times of frequencies

frequencies : np.ndarray, shape=(n,)

frequencies (in Hz) of the pitch contours. Voicing is indicated by sign (positive for voiced, non-positive for non-voiced).

midi : bool

If True, plot on a MIDI-numbered vertical axis. Otherwise, plot on a linear frequency axis.

unvoiced : bool

If True, unvoiced pitch contours are plotted and indicated by transparency.

Otherwise, unvoiced pitch contours are omitted from the display.

ax : matplotlib.pyplot.axes

An axis handle on which to draw the pitch contours. If none is provided, a new set of axes is created.

kwargs

Additional keyword arguments to matplotlib.pyplot.plot.

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes

mir_eval.display.multipitch(times, frequencies, midi=False, unvoiced=False, ax=None, **kwargs)

Visualize multiple f0 measurements

Parameters:

times : np.ndarray, shape=(n,)

Sample times of frequencies

frequencies : list of np.ndarray

frequencies (in Hz) of the pitch measurements. Voicing is indicated by sign (positive for voiced, non-positive for non-voiced).

times and frequencies should be in the format produced by mir_eval.io.load_ragged_time_series()

midi : bool

If True, plot on a MIDI-numbered vertical axis. Otherwise, plot on a linear frequency axis.

unvoiced : bool

If True, unvoiced pitches are plotted and indicated by transparency.

Otherwise, unvoiced pitches are omitted from the display.

ax : matplotlib.pyplot.axes

An axis handle on which to draw the pitch contours. If none is provided, a new set of axes is created.

kwargs

Additional keyword arguments to plt.scatter.

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes

mir_eval.display.piano_roll(intervals, pitches=None, midi=None, ax=None, **kwargs)

Plot a quantized piano roll as intervals

Parameters:

intervals : np.ndarray, shape=(n, 2)

timing intervals for notes

pitches : np.ndarray, shape=(n,), optional

pitches of notes (in Hz).

midi : np.ndarray, shape=(n,), optional

pitches of notes (in MIDI numbers).

At least one of pitches or midi must be provided.

ax : matplotlib.pyplot.axes

An axis handle on which to draw the intervals. If none is provided, a new set of axes is created.

kwargs

Additional keyword arguments to labeled_intervals().

Returns:

ax : matplotlib.pyplot.axes._subplots.AxesSubplot

A handle to the (possibly constructed) plot axes
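
For example, a note transcription loaded with mir_eval.io.load_valued_intervals() (hypothetical filename) could be displayed as a piano roll like so:

>>> import matplotlib.pyplot as plt
>>> intervals, pitches = mir_eval.io.load_valued_intervals('est_notes.txt')
>>> mir_eval.display.piano_roll(intervals, pitches=pitches)
>>> plt.show()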

mir_eval.display.separation(sources, fs=22050, labels=None, alpha=0.75, ax=None, **kwargs)

Source-separation visualization

Parameters:

sources : np.ndarray, shape=(nsrc, nsampl)

A list of waveform buffers corresponding to each source

fs : number > 0

The sampling rate

labels : list of strings

An optional list of descriptors corresponding to each source

alpha : float in [0, 1]

Maximum alpha (opacity) of spectrogram values.

ax : matplotlib.pyplot.axes

An axis handle on which to draw the spectrograms. If none is provided, a new set of axes is created.

kwargs

Additional keyword arguments to scipy.signal.spectrogram

Returns:

ax

The axis handle for this plot

mir_eval.display.ticker_notes(ax=None)

Set the y-axis of the given axes to MIDI notes

Parameters:

ax : matplotlib.pyplot.axes

The axes handle to apply the ticker. By default, uses the current axes handle.

mir_eval.display.ticker_pitch(ax=None)

Set the y-axis of the given axes to MIDI frequencies

Parameters:

ax : matplotlib.pyplot.axes

The axes handle to apply the ticker. By default, uses the current axes handle.
