Annotation of Soft Onsets in String Ensemble Recordings

a. Birmingham City University, Sound and Music Analysis (SoMA) Group, Birmingham, UK
b. University of Birmingham, Sensory Motor Neuroscience (SyMon) Centre, Birmingham, UK
c. WMG, University of Warwick, Coventry, UK

Abstract

Supplementary experimental results for the paper Annotation of Soft Onsets in String Ensemble Recordings.

Onset detection is the process of identifying the start points of musical note events within an audio recording. While the detection of percussive onsets is often considered a solved problem, soft onsets—as found in string instrument recordings—still pose a significant challenge for state-of-the-art algorithms. The problem is further exacerbated by a paucity of data containing expert annotations and research related to best practices for curating soft onset annotations for string instruments. To this end, we investigate inter-annotator agreement between 24 participants, extend an algorithm for determining the most consistent annotator, and compare the performance of human annotators and state-of-the-art onset detection algorithms. Experimental results reveal a positive trend between musical experience and both inter-annotator agreement and performance in comparison with automated systems. Additionally, onsets produced by changes in fingering as well as those from the cello were found to be particularly challenging for both human annotators and automatic approaches. To promote research in best practices for annotation of soft onsets, we have made all experimental data associated with this study publicly available. In addition, we also publish the ARME Virtuoso Strings Dataset, consisting of over 144 recordings of professional performances of an excerpt from Haydn's Op. 74 No. 1 Finale, each with corresponding individual instrumental note onset annotations.

BibTeX

If you use the resources on this website in your research, please cite this paper:

@article{tomczak2022annotation,
  title={Annotation of Soft Onsets in String Ensemble Recordings},
  author={Tomczak, Maciej and Li, Min Susan and Bradbury, Adrian and Elliott, Mark and Stables, Ryan and Witek, Maria and Goodman, Tom and Abdlkarim, Diar and Di Luca, Massimiliano and Wing, Alan and Hockman, Jason},
  journal={arXiv preprint arXiv:2211.08848},
  year={2022}
}

Musical Conditions

The musical conditions used in our study represent different playing styles and were chosen to span a wide range of performance types. The following are definitions of the three analysed conditions (NR, SP and DP).



For additional information please refer to the paper.

Inter-annotator Agreements for All Participants

Inter-annotator agreement results for the 12th repetition of the NR condition recordings (NR12), visualised over different tolerance windows (20-100 ms) for viola, cello, and first and second violin (VA, VC, VN1 and VN2). For more details please refer to Experiment 1 in the paper.



Viola - 24 Annotators

Pairwise agreements between 24 annotators using the F-measure score, sorted by years of musical experience.
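As a rough illustration of how such pairwise agreements can be computed, the sketch below scores one annotator's onsets against another's as an F-measure over tolerance-window matching. The function name and the greedy first-match strategy are our own simplification, not the paper's exact procedure; dedicated libraries such as mir_eval instead use a maximum bipartite matching within the window.

```python
def pairwise_f_measure(onsets_a, onsets_b, window=0.02):
    """F-measure agreement between two annotators' onset lists
    (times in seconds), using a +/- `window` tolerance.

    Greedy simplification: each onset in `onsets_a` is matched to
    the first still-unmatched onset in `onsets_b` within the window.
    """
    a, b = sorted(onsets_a), sorted(onsets_b)
    used = [False] * len(b)
    tp = 0
    for t in a:
        for j, u in enumerate(b):
            if not used[j] and abs(u - t) <= window:
                used[j] = True
                tp += 1
                break
    if tp == 0:
        return 0.0
    precision = tp / len(b)  # matched fraction of annotator B's onsets
    recall = tp / len(a)     # matched fraction of annotator A's onsets
    return 2 * precision * recall / (precision + recall)
```

Sweeping `window` from 0.02 to 0.1 s reproduces the kind of tolerance-window analysis shown in the matrices below.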

20ms Tolerance Window

70ms Tolerance Window


Cello - 24 Annotators

Pairwise agreements between 24 annotators using the F-measure score, sorted by years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


First Violin - 24 Annotators

Pairwise agreements between 24 annotators using the F-measure score, sorted by years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


Second Violin - 24 Annotators

Pairwise agreements between 24 annotators using the F-measure score, sorted by years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


Inter-annotator Agreements for a Subset of Participants

Viola - 16 Annotators

Pairwise F-measure agreements between 16 annotators with 5 or more years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


Cello - 16 Annotators

Pairwise F-measure agreements between 16 annotators with 5 or more years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


First Violin - 16 Annotators

Pairwise F-measure agreements between 16 annotators with 5 or more years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


Second Violin - 16 Annotators

Pairwise F-measure agreements between 16 annotators with 5 or more years of musical experience.

20ms Tolerance Window

70ms Tolerance Window


Inter-annotator Performance

For more details please refer to Experiment 2 in the paper.

Onset detection results from the CNN system.

Figure 1. True positive rates per participant compared to the ground truth expert annotations a_0 in NR12.


Figure 1 shows the performance of each annotator a, calculated as the percentage of TP onsets per instrument and onset category relative to the expert annotations a_0 (here considered ground truth). The reported results use a tolerance window of 25 ms and are presented for every annotation participant. The highest mean accuracies across onset types and instruments, 98%, 97% and 96%, are observed for annotators a_2, a_20 and a_23, respectively. The lowest mean performance across onset types and instruments is seen for annotators a_9, a_7 and a_16, with respective true positive rates of 12.8%, 42.5% and 55.2%. Additionally, the means across onset types and annotators are 82.0%, 72.4%, 82.8% and 81.7% for VA, VC, VN1 and VN2, respectively.
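As a simplified sketch of this per-annotator score (our own illustration, not the paper's evaluation code), the true positive rate can be estimated by checking, for each ground-truth onset, whether any annotated onset falls within the tolerance window:

```python
def tp_rate(ground_truth, annotation, window=0.025):
    """Percentage of ground-truth onsets with at least one annotated
    onset within +/- `window` seconds (25 ms by default, as used in
    Experiment 2). Simplification: a single annotated onset may be
    counted against several nearby ground-truth onsets, unlike a
    strict one-to-one matching.
    """
    if not ground_truth or not annotation:
        return 0.0
    hits = sum(
        1 for g in ground_truth
        if any(abs(g - a) <= window for a in annotation)
    )
    return 100.0 * hits / len(ground_truth)
```

Averaging this quantity over onset types and instruments for each annotator yields per-participant summaries of the kind reported above.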

Onset Detection Results For Each Instrument

For more details please refer to Experiment 3 in the paper.

Onset detection results from all systems per instrument.

Figure 2. Onset detection results comparing the onsets detected by the algorithms against annotations from annotators a_0, a_12, a_13, a_16 and a_18.


Results are reported for five algorithms using annotations from the NR, SP and DP conditions, as well as the expert annotations a_0 from NR12. The overall highest-performing algorithms across all instruments are CNN and CoF. The highest precision (0.93) is achieved by the CNN system on the VN1 recordings with the NR12 expert annotations. In the NR, SP and DP conditions, the highest precision is achieved by the CNN (0.8), CoF (0.8) and CNN (0.83) algorithms on the VN1, VN2 and VN1 recordings, respectively.

Onset Detection Results For Each Participant

For more details please refer to Experiment 3 in the paper.

Onset detection results from the CNN system.

Figure 3. Mean F-measure, precision and recall for CNN method calculated for each annotator and instrument.

Figure 3 extends the per-instrument means plotted in Figure 7 of the paper.