Melodic Extraction

"automatically estimating the pitch sequence of the melody directly from the audio signal ..." [1]

Paiva's algorithm

Our system comprises three main modules: pitch detection, determination of musical notes (with precise temporal boundaries, pitches, and intensity levels), and identification of melodic notes. ... Generally, most current systems, including ours, are based on a front-end for frequency analysis (e.g., Fourier Transform, autocorrelation, auditory models, multi-rate filterbanks, or Bayesian frameworks), peak-picking and tracking (in the magnitude spectrum, in a summary autocorrelation function, or in a pitch probability density function), and postprocessing for melody identification (primarily rule-based approaches based on perceptual rules of sound organization, musicological rules, path-finding in networks of notes, etc.). One exception is Poliner and Ellis (2005), where the authors follow a different strategy by approaching the melody-detection problem as a classification task using Support Vector Machines. [2]

Melody

We define melody as "the dominant individual pitched line in a musical ensemble." Also, we define "line" as a sequence of musical notes, each characterized by specific temporal boundaries as well as a corresponding MIDI note number and intensity level. [2]

[1] J. Salamon. Melody Extraction from Polyphonic Music Signals. PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain, 2013.

[2] Rui Pedro Paiva, Teresa Mendes, and Amílcar Cardoso. Melody detection in polyphonic musical signals: Exploiting perceptual rules, note salience and melodic smoothness. Computer Music Journal, 30(4):80–98, 2006.

Extra

http://ismir2004.ismir.net/melody_contest/results.html

Algorithm Structure

  1. Segmenting audio based on onset detection: this allows us to reduce the scope of the frequency analysis and enhance accuracy.

  2. Estimating the fundamental frequency of the segments

    1. Autocorrelation
    2. DFT estimation
  3. Concatenating the f0 estimates into an array of the same length as the original sound file

Load an audio file

Detect Onsets

Librosa's Onset Detection

Segment the Audio

Split the audio file into segments
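One simple way to split the signal at the detected onsets is `np.split`, given onset positions in samples (e.g. from `onset_detect(..., units="samples")`). The helper name `split_at_onsets` is an assumption; a tiny integer "signal" is used here so the result is easy to check:

```python
import numpy as np

def split_at_onsets(y, onset_samples):
    """Split signal y at the given onset sample indices: one segment
    before the first onset, one between each pair of consecutive
    onsets, and one from the last onset to the end."""
    return np.split(y, onset_samples)

# Toy example: a 10-sample "signal" split at samples 3 and 7.
y = np.arange(10)
segments = split_at_onsets(y, [3, 7])
print([seg.tolist() for seg in segments])  # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```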

Here is a function that adds 300 ms of silence onto the end of each segment and concatenates them into one signal.
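The notebook's own cell is not reproduced here; a minimal sketch of such a function (the name `concatenate_with_silence` is an assumption) could look like:

```python
import numpy as np

def concatenate_with_silence(segments, sr, silence_dur=0.3):
    """Append silence_dur seconds of silence to each segment, then
    concatenate everything into one signal."""
    silence = np.zeros(int(silence_dur * sr))
    return np.concatenate([np.concatenate([seg, silence]) for seg in segments])

# Toy example: two constant segments at an 8 kHz sample rate.
sr = 8000
segments = [np.ones(100), np.ones(50)]
out = concatenate_with_silence(segments, sr)
print(len(out))  # 100 + 2400 + 50 + 2400 = 4950
```

In a notebook, `IPython.display.Audio(out, rate=sr)` plays the result.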

Later, we will use this function to listen to each segment, perhaps sorted in a different order.

Listen to the newly concatenated signal.

Fundamental Frequency Estimation

1. Autocorrelation Method

Run the autocorrelation function and estimate the fundamental frequency of each segment.
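A minimal sketch of the autocorrelation method, assuming the estimator picks the strongest autocorrelation peak within a plausible pitch range (the function name and range limits are assumptions), verified on a synthetic 440 Hz tone:

```python
import numpy as np

def autocorrelation_f0(segment, sr, fmin=50.0, fmax=2000.0):
    """Estimate f0 as sr / lag, where lag is the position of the
    strongest autocorrelation peak inside the pitch range [fmin, fmax]."""
    r = np.correlate(segment, segment, mode="full")[len(segment) - 1:]
    lo = int(sr / fmax)  # shortest lag to consider
    hi = int(sr / fmin)  # longest lag to consider
    lag = lo + np.argmax(r[lo:hi])
    return sr / lag

# Sanity check on a 0.2 s pure tone at 440 Hz.
sr = 22050
t = np.arange(int(0.2 * sr)) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
f0_est = autocorrelation_f0(tone, sr)
print(f0_est)  # close to 440 Hz (limited by integer-lag resolution)
```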

Exercise

Write a for loop that computes the note name of every segment using librosa.hz_to_note.

Now use the generate_tone function below to modify your for loop so that it returns a "tones" array containing a synthesized tone for each segment, instead of the segments themselves.
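The notebook's own generate_tone cell is not shown here; a plausible sketch (its signature and the fade-out are assumptions) together with the modified loop:

```python
import numpy as np

def generate_tone(f0, sr, duration=0.5):
    """Plausible sketch of generate_tone: a sine at f0 with a short
    linear fade-out at the end to avoid clicks on playback."""
    t = np.arange(int(duration * sr)) / sr
    tone = 0.5 * np.sin(2 * np.pi * f0 * t)
    tone[-100:] *= np.linspace(1.0, 0.0, 100)  # fade out
    return tone

sr = 22050
f0_estimates = [261.6, 329.6, 392.0]  # stand-in f0 values, in Hz
tones = [generate_tone(f0, sr) for f0 in f0_estimates]
print(len(tones), len(tones[0]))
```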

2. DFT f0 Estimation method

Exercise

Use the dft_estimate function instead of the autocorrelation method to generate another tones array, called dft_estimation_tones, and compare the two.
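The notebook's dft_estimate cell is not shown; a plausible sketch of a DFT-based estimator (picking the strongest magnitude-spectrum bin inside the pitch range; the name and defaults are assumptions), checked on a 440 Hz tone:

```python
import numpy as np

def dft_estimate(segment, sr, fmin=50.0, fmax=2000.0):
    """Estimate f0 as the frequency of the strongest magnitude-spectrum
    bin inside the plausible pitch range [fmin, fmax]."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spectrum[band])]

# Sanity check: 1 s at 22050 Hz gives 1 Hz bin resolution.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
dft_f0 = dft_estimate(tone, sr)
print(dft_f0)  # ~440 Hz
```

Note that, unlike autocorrelation, this picks the strongest partial, so it can land on a harmonic rather than the true f0 for real instrument tones.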

Constant-Q Transform

The constant-Q transform, commonly abbreviated CQT, transforms a data series to the frequency domain. It is related to the Fourier transform and can be thought of as a series of filters logarithmically spaced in frequency.