Pitch Analysis


A Multi-Dimensional Model

Chroma Analysis

To analyze pitch, we processed the data in several steps using frequency analysis. Before processing the signal, we load the audio as an array at a predetermined sampling rate. Our first test used a chroma function, which is essentially a short-time Fourier transform (STFT) of the signal plotted against time and folded into the twelve pitch classes. This lets us visualize the notes being played at any given time as blocks, with a colorbar indicating the magnitude of each pitch class. However, the transform underlying the plot is not the STFT but the constant-Q transform (CQT).



The CQT is better suited to representing music in the frequency domain because its output gives amplitude and phase against logarithmic frequency, which covers a wider octave range with fewer frequency bins. The results of running the guitar recording through this function are shown in Figure 1.
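For reference, the standard textbook definition of the constant-Q transform can be written as follows, where b is the number of bins per octave, f_min the lowest analysed frequency, f_s the sampling rate and w_k a window of length N_k. This is a sketch of the general definition; librosa's implementation differs in its details. The geometric spacing of the centre frequencies f_k is what produces the logarithmic frequency axis, and the window shrinks as frequency rises: with b = 12, Q ≈ 16.82, so at f_s = 22,050 Hz the window is roughly 843 samples at 440 Hz but roughly 3,370 samples at 110 Hz. The chroma line assumes b = 12 and an f_min tuned to a C, so that bin p + 12o is pitch class p in octave o.

```latex
f_k = f_{\min}\, 2^{k/b}, \qquad
Q = \frac{1}{2^{1/b} - 1}, \qquad
N_k = \left\lceil \frac{Q f_s}{f_k} \right\rceil

X^{\mathrm{CQ}}[k] = \frac{1}{N_k} \sum_{n=0}^{N_k - 1} w_k[n]\, x[n]\, e^{-j 2\pi Q n / N_k}

C[p] = \sum_{o} \bigl| X^{\mathrm{CQ}}[p + 12\,o] \bigr|, \qquad p = 0, \dots, 11
```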

Figure 1: Chromagram of the guitar recording.
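The pitch-class folding behind a chroma plot can be illustrated with a short NumPy sketch. It assumes an FFT magnitude spectrum as input (closer to librosa's chroma_stft than to the CQT-based chroma described in this section), maps each bin to the nearest equal-tempered pitch assuming A4 = 440 Hz, and sums every octave into 12 bins. fold_to_chroma and PITCH_CLASSES are hypothetical names for illustration, not librosa functions.

```python
import numpy as np

# Pitch-class labels, index 0 = C (the usual chroma ordering)
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fold_to_chroma(magnitudes, freqs, a4=440.0):
    """Fold a magnitude spectrum into 12 pitch-class bins (a simplified chroma vector)."""
    chroma = np.zeros(12)
    valid = freqs > 0  # the DC bin has no pitch
    # Nearest equal-tempered MIDI note for each bin (69 = A4, assumed tuning a4)
    midi = np.round(69 + 12 * np.log2(freqs[valid] / a4)).astype(int)
    # Every octave of the same pitch class lands in the same bin (MIDI 60 = C4 -> 0)
    np.add.at(chroma, midi % 12, magnitudes[valid])
    peak = chroma.max()
    return chroma / peak if peak > 0 else chroma  # strongest pitch class scaled to 1


# Usage: one synthetic frame containing A3, A4 and a weaker E5
sr = 22050
t = np.arange(4096) / sr
frame = (np.sin(2 * np.pi * 220.0 * t)
         + 0.5 * np.sin(2 * np.pi * 440.0 * t)
         + 0.3 * np.sin(2 * np.pi * 659.26 * t))
spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
chroma = fold_to_chroma(spectrum, freqs)
print(PITCH_CLASSES[int(np.argmax(chroma))])  # prints A
```

Because octaves collapse into one bin, the A3 and A4 partials reinforce each other, which is why chroma plots show notes as blocks per pitch class rather than per octave.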

Correlation Matrix

The correlation matrix algorithm uses the chroma features from the chromagram to correlate the chroma of the two musical pieces over time. After the constant-Q chroma is calculated, a small time-delay embedding is applied, stacking each frame with a few preceding frames so that short sequences of notes, rather than isolated frames, are compared between the two pieces. Next, librosa's cross-similarity function is run in affinity mode, producing a recurrence-style affinity matrix that correlates the two CQT chromagrams; the result is plotted in Figure 2.
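A simplified NumPy sketch of these two steps, assuming chroma matrices of shape (12, frames): stack_memory_sketch imitates the zero-padded time-delay embedding of librosa.feature.stack_memory, and plain cosine similarity stands in for the nearest-neighbour affinity that librosa.segment.cross_similarity computes. Both function names are hypothetical, and the cosine measure is a deliberate simplification rather than librosa's method.

```python
import numpy as np


def stack_memory_sketch(features, n_steps=3, delay=1):
    """Time-delay embedding: stack each frame with delayed copies of earlier frames.

    features has shape (d, t); the result has shape (d * n_steps, t).
    Frames that would come from before the start are zero-padded.
    """
    d, t = features.shape
    stacked = np.zeros((d * n_steps, t))
    for i in range(n_steps):
        shift = i * delay
        if shift >= t:
            continue  # this copy lies entirely before the signal starts
        # Copy i holds the feature vector from `shift` frames earlier
        stacked[i * d:(i + 1) * d, shift:] = features[:, :t - shift]
    return stacked


def cosine_cross_similarity(x_ref, x_comp, eps=1e-12):
    """sim[i, j] = cosine similarity of reference frame i and comparison frame j."""
    ref = x_ref / (np.linalg.norm(x_ref, axis=0, keepdims=True) + eps)
    comp = x_comp / (np.linalg.norm(x_comp, axis=0, keepdims=True) + eps)
    return ref.T @ comp  # shape (t_ref, t_comp); in [0, 1] for non-negative chroma


# Usage with random non-negative "chroma" standing in for real CQT chroma
rng = np.random.default_rng(0)
chroma_ref = rng.random((12, 40))
emb = stack_memory_sketch(chroma_ref, n_steps=3, delay=2)
sim = cosine_cross_similarity(emb, emb)
print(emb.shape, sim.shape)  # (36, 40) (40, 40)
```

Comparing a piece with itself gives ones along the main diagonal, which corresponds to the ideal y = x line discussed below.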





The figure shows how well two similar pieces are aligned and synchronized. Ideally, the plot would be fully symmetrical, with a straight line y = x of magnitude 1 passing through the origin. Such a line means the two pieces have similar chroma arriving at the same moment. An offset above/below the y = x line means the chroma of piece x arrives before/after that of piece y. Figure 2 compares two guitar pieces that are highly similar, indicating that the algorithm is at least partly successful. The correlation matrix serves as visual feedback on how well the practice piece matches the ideal one overall, in terms of both chroma and timing. Any dark lines parallel to y = x but offset by a time delay indicate that the same notes were played at different times.
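Reading a time offset off such a matrix can be sketched by averaging each diagonal and picking the best one. This assumes rows index the reference (ideal) piece and columns the practice piece; with the axes swapped, the sign of the offset flips, which is the above/below distinction described in the text. best_lag is a hypothetical helper, and the one-hot "chroma" in the example is synthetic.

```python
import numpy as np


def best_lag(sim):
    """Find the diagonal of a similarity matrix with the highest mean value.

    sim[i, j] compares reference frame i with practice frame j. A result k > 0
    means practice frame i + k best matches reference frame i (the practice
    lags by k frames); k < 0 means the practice runs ahead.
    """
    t_ref, t_comp = sim.shape
    min_len = max(1, min(t_ref, t_comp) // 2)  # ignore very short corner diagonals
    best_k, best_score = 0, -np.inf
    for k in range(-(t_ref - 1), t_comp):
        diag = np.diagonal(sim, offset=k)  # entries sim[i, i + k]
        if len(diag) < min_len:
            continue
        if diag.mean() > best_score:
            best_k, best_score = k, float(diag.mean())
    return best_k, best_score


# Usage: one-hot "chroma" for a 16-note phrase, and a practice take that
# starts 4 frames late (its first 4 frames are low-level noise)
notes = [0, 2, 4, 5, 7, 9, 11, 0, 4, 7, 11, 2, 5, 9, 1, 6]
ref = np.eye(12)[notes].T                           # shape (12, 16)
practice = np.hstack([np.full((12, 4), 0.1), ref])  # shape (12, 20)

ref_n = ref / np.linalg.norm(ref, axis=0)
practice_n = practice / np.linalg.norm(practice, axis=0)
sim = ref_n.T @ practice_n                          # shape (16, 20)
print(best_lag(sim))  # (4, 1.0)
```

The winning diagonal is the dark line parallel to y = x, and its offset is the delay between the two performances.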


Figure 2: Cross-similarity (correlation) matrix of the two guitar pieces.

Note-to-Note and Scoring

Our note-to-note comparison algorithm was largely inspired by the IEEE research article “Assessment of musical representations using a music information retrieval technique”. Several helper functions calculate the notes played in each piece by taking an FFT of each note's segment.

First, a segmenting function we wrote calls librosa's split function, which divides the audio into segments by detecting the silent stretches between notes; silence is defined by a top_db threshold of 15 dB. This causes problems when the two recordings differ in amplitude, because the same threshold then yields silent segments of different sizes. The algorithm therefore uses the split points of the ideal piece to divide the second piece in the same way. All the segments are then appended to arrays for both pieces.

Another function chooses the hop length used in the analysis: 512 if the tempo is 90 bpm or less, and 256 if it is greater than 90 bpm.

The most important function, which calculates the notes, first loads the length N of the segmented array. A window length of ⌊N/2⌋ − 1 is then stored. A frequency array is derived from the segment size and the sampling rate, and the FFT of the windowed segment is calculated. The note's frequency is taken to be that of the largest frequency coefficient, found with NumPy's argmax function, and finally librosa's hz_to_note function converts that frequency into a musical note. The algorithm stores an array of all the notes calculated in each piece and prints the two arrays side by side as columns for comparison. An example of the output is shown in Figure 3 below.
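The segmentation and note-detection steps can be sketched in plain NumPy so that the example runs without audio files. This is not the authors' code: split_nonsilent is a simplified stand-in for librosa.effects.split (disjoint frames, loudness measured relative to the loudest frame), note_name mimics librosa.hz_to_note with ASCII sharps, and using the hop length as the silence detector's frame size is an assumption made only for this sketch. The ⌊N/2⌋ − 1 window is interpreted here as keeping the positive-frequency half of the spectrum, which is also an assumption.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def choose_hop_length(tempo_bpm):
    """Hop length rule from the text: 512 up to 90 bpm, 256 above."""
    return 512 if tempo_bpm <= 90 else 256


def split_nonsilent(y, top_db=15.0, frame_length=512):
    """(start, end) sample ranges of non-silent audio, in the spirit of librosa.effects.split.

    A frame counts as silent when its RMS is more than top_db below the loudest frame.
    """
    n_frames = len(y) // frame_length
    frames = y[:n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    loud = np.concatenate(([False], db > -top_db, [False])).astype(int)
    edges = np.flatnonzero(np.diff(loud))  # alternating rising / falling edges
    return [(int(s) * frame_length, int(e) * frame_length)
            for s, e in zip(edges[0::2], edges[1::2])]


def note_name(freq_hz):
    """Nearest equal-tempered note name, e.g. 440.0 -> 'A4'."""
    midi = int(round(69 + 12 * np.log2(freq_hz / 440.0)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def detect_note(segment, sr):
    """Pick the strongest FFT bin of a segment and name its note."""
    n = len(segment)
    window = n // 2 - 1  # floor(N/2) - 1 bins: the positive-frequency half (assumed)
    spectrum = np.abs(np.fft.fft(segment))[:window]
    freqs = np.fft.fftfreq(n, d=1.0 / sr)[:window]  # bin frequencies in Hz
    peak = np.argmax(spectrum[1:]) + 1  # skip the DC bin
    return note_name(freqs[peak])


def tone(freq, n, sr, amp=1.0):
    return amp * np.sin(2 * np.pi * freq * np.arange(n) / sr)


# Usage: ideal piece plays A4 then E4; the quieter practice take plays F4 instead of E4
sr = 22050
gap, note_len = 5632, 11264  # 11 and 22 frames of 512 samples
silence = np.zeros(gap)
ideal = np.concatenate([silence, tone(440.0, note_len, sr), silence,
                        tone(329.63, note_len, sr, 0.5), silence])
practice = np.concatenate([silence, tone(440.0, note_len, sr, 0.3), silence,
                           tone(349.23, note_len, sr, 0.2), silence])

hop = choose_hop_length(80)  # 512 for a slow piece
intervals = split_nonsilent(ideal, top_db=15, frame_length=hop)  # split the ideal only
ideal_notes = [detect_note(ideal[s:e], sr) for s, e in intervals]
practice_notes = [detect_note(practice[s:e], sr) for s, e in intervals]  # reuse ideal splits
for a, b in zip(ideal_notes, practice_notes):
    print(f"{a:>4} {b:>4}")
```

Reusing the ideal piece's split points guarantees two note arrays of equal length even though the practice take is much quieter.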
The scoring simply counts how many notes match between the two columns, which gives the success rate; that rate is then weighted by 25% and added to the weighted tempo score, as seen in the bottom line of Figure 3.
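The scoring rule can be written in a few lines. The 25% note weight comes from the text; the tempo term is taken as an already-weighted input because its weighting is not described in this section, and the values in the example are hypothetical.

```python
def note_match_rate(ideal_notes, practice_notes):
    """Fraction of positions where the two note columns agree."""
    if not ideal_notes:
        return 0.0
    matches = sum(a == b for a, b in zip(ideal_notes, practice_notes))
    return matches / len(ideal_notes)


def total_score(note_rate, weighted_tempo_score, note_weight=0.25):
    """Note accuracy weighted by 25%, added to an already-weighted tempo score."""
    return note_weight * note_rate + weighted_tempo_score


# Usage with hypothetical values: one of two notes matches
rate = note_match_rate(["A4", "E4"], ["A4", "F4"])
print(rate, total_score(rate, weighted_tempo_score=0.5))  # 0.5 0.625
```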

Our code ran into a few obstacles. First, the splitting function is limited because it uses the split points of the ideal audio to slice both pieces into smaller segments; this approach nevertheless turned out to give the highest accuracy and ensured two arrays of equal length for the note-to-note analysis. Another issue we did not resolve is that some harmonics are more dominant than the note actually being played, which may be caused by the splitting function not running on each audio piece separately.