Scholars have been encoding music into a searchable format since the 1950s (Bronson, 1951), and we might consider "the first wave" of computational music encoding to be in the 1960s, with Arthur Mendel's Josquin project at Princeton, and the DARMS project in New York (led by Stefan Bauer-Mengelberg).
Although the basic idea of encoding seems like a fairly straightforward problem (how might we represent a musical score as a text object?), it was dealt with in wildly different ways. Here are a few of the most notable formats, though there are many more.
The DARMS project chose not to encode pitch directly, only position on the staff. See the "space code" from the DARMS Manual below:
Here is an example of how pitches might be encoded:
DARMS was used quite extensively for typesetting, and was actually used until surprisingly recently in some of the major publishing houses.
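Because DARMS records where a note sits on the staff rather than its pitch, the clef has to be known before a pitch can be recovered. The sketch below illustrates that idea. It assumes the convention usually described for DARMS space codes, in which 21 is the bottom line of the staff, 22 the first space, and so on up to 29 on the top line; the function name `space_code_to_pitch` and the two-clef table are hypothetical, and real DARMS encodes much more than this handles.

```python
# Diatonic step names, in order, for counting up the staff.
LETTERS = ["C", "D", "E", "F", "G", "A", "B"]

# Assumed pitch of the bottom staff line (space code 21) for two clefs.
BOTTOM_LINE = {"treble": ("E", 4), "bass": ("G", 2)}


def space_code_to_pitch(code, clef="treble"):
    """Translate a DARMS-style staff-position code into a pitch name (sketch)."""
    letter, octave = BOTTOM_LINE[clef]
    # Each step of the code is one staff position (line -> space -> line ...).
    steps = code - 21
    index = LETTERS.index(letter) + steps
    # Wrap around the seven letter names, bumping the octave at each C.
    return f"{LETTERS[index % 7]}{octave + index // 7}"


print(space_code_to_pitch(21))          # E4, bottom line of the treble staff
print(space_code_to_pitch(29, "bass"))  # A3, top line of the bass staff
```

The same code number gives a different pitch in each clef, which is exactly the information DARMS leaves to the clef rather than to the note.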
The Plaine and Easie format has been most notably used by RISM, which catalogues millions of incipits from libraries around the world.
Kern (Huron, 1995) is the native format of the Humdrum toolkit.
This webpage does a great job of showing how the format works.
This page from CCARH is quite good, and worth a read.
The Humdrum toolkit is a set of command-line tools (originally written in AWK; the Humdrum extras are written in C++) that can be combined to ask questions of a score.
Let's start learning some Python! There are some important things to remember:
Historically, this class has been used to teach Humdrum: we would begin with symbolic analyses in Humdrum, and then move on to data analysis in R.
Python is a good all-purpose language, and it's used extensively inside academia and out. Our hope is that we can use a single language to cover text-processing of musical scores, data analysis, audio analysis, and machine learning elements this semester.
Variables are defined in a pretty straightforward way. Here we define a string ("apples"), an integer (2), and a float (3.1415). The type() function tells you what type a variable is.
## THIS IS A COMMENT! Let's declare some variables.
## Here's a string.
apples = "apples"
## here's an integer
number_of_kids = 2
## here's a float
pi = 3.1415
print(apples)
print(type(pi))
print(pi)
print(type(number_of_kids))
apples
<class 'float'>
3.1415
<class 'int'>
You can combine things into lists (also called arrays in other languages). See how the list below is defined (with square brackets).
We can also loop over the list! See how a for loop is written in Python below:
my_grocery_list = ["apples", "oranges", "toothpaste", "socks"]
for grocery in my_grocery_list:
    print(grocery)
apples
oranges
toothpaste
socks
print(my_grocery_list)
['apples', 'oranges', 'toothpaste', 'socks']
In the above, note that the first line of the loop ends with a colon, and the line after it is indented. Everything inside the loop has to be indented by the same amount. (Python accepts either tabs or spaces, but you can't mix them inconsistently within a block. Certain people prefer one over the other, and certain companies like to keep things consistent: PEP 8 recommends four spaces, and Google's style guide uses four spaces and says to never, ever use tabs.)
## a loop body can do more than print the item on its own
for grocery in my_grocery_list:
    print("I mustn't forget", grocery+"!")
I mustn't forget apples!
I mustn't forget oranges!
I mustn't forget toothpaste!
I mustn't forget socks!
Conditional statements use if, else, and elif (short for "else if"). Conditionals let your code make decisions, and that's where it becomes really powerful.
### Lists, loops and conditionals.
for grocery in my_grocery_list:
    if grocery == "socks":
        print("I mustn't forget", grocery+"!")
I mustn't forget socks!
for grocery in my_grocery_list:
    if grocery == "socks":
        print("I mustn't forget", grocery+"!")
    else:
        print(grocery+", but not as important as socks")
apples, but not as important as socks
oranges, but not as important as socks
toothpaste, but not as important as socks
I mustn't forget socks!
for grocery in my_grocery_list:
    if grocery == "socks":
        print("I mustn't forget", grocery+"!")
    elif grocery == "oranges":
        print("I mustn't forget", grocery+", it has vitamin C!!!")
    else:
        print("I mustn't forget", grocery+", but not as important as socks")
I mustn't forget apples, but not as important as socks
I mustn't forget oranges, it has vitamin C!!!
I mustn't forget toothpaste, but not as important as socks
I mustn't forget socks!
my_five_favorite_composers = ["Bach", "Beethoven", "Brahms", "Britten", "Bernstein"]
for i in my_five_favorite_composers:
    print(i)
print()
for composer in my_five_favorite_composers:
    print("This is one of my favorite composers -", composer+"!")
for composer in my_five_favorite_composers:
    if composer == "Beethoven":
        print(composer, "is the most famous composer.")
    elif composer == "Bach":
        print()
    else:
        print(composer, "is not as famous as Beethoven.")
for composer in my_five_favorite_composers:
    if composer == "Bach":
        print("However,", composer, "wrote the most famous works for solo violin!")
Bach
Beethoven
Brahms
Britten
Bernstein

This is one of my favorite composers - Bach!
This is one of my favorite composers - Beethoven!
This is one of my favorite composers - Brahms!
This is one of my favorite composers - Britten!
This is one of my favorite composers - Bernstein!

Beethoven is the most famous composer.
Brahms is not as famous as Beethoven.
Britten is not as famous as Beethoven.
Bernstein is not as famous as Beethoven.
However, Bach wrote the most famous works for solo violin!
Below is a Polish folksong. How would you print it without the global metadata (the lines that begin with "!")?
## this prints it line by line, skipping any line that begins with "!".
for line in melody:
    if not line.startswith("!"):
        print(line)
Try writing code that:
- Counts the unique key signatures
- Tallies the durations (how many quarter notes, eighth notes, etc.)
- Tallies the pitches
Copy this Colab document and create your own. You can then share the link on Carmen.
### all of this just lets us read files from Google Drive; it's not normally this complicated.
import glob
import re

import matplotlib.pyplot as plt
import numpy as np
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

## this is a function that finds all the files. We will get to functions later.
def filebrowser(pattern="/content/drive/MyDrive/python_scratch/polska/*.krn"):
    "Returns a sorted list of the files matching a glob pattern"
    return sorted(glob.glob(pattern))

file_list = filebrowser()
## this grabs a specific file, stripping trailing whitespace from each line...
with open(file_list[2], "r") as f:
    melody = [line.rstrip() for line in f]
### this counts only unique key signatures
## create an empty list
key_signature = []
for line in melody:
    if "*k[" in line:
        print(line)
        ## only add a key signature we haven't seen yet
        if line not in key_signature:
            key_signature.append(line)
len(key_signature)
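### this tallies the different durations (quarter notes, eighth notes, etc.)
import matplotlib.pyplot as plt
def duration_counter(number):
    ## read the file at position "number" in file_list
    with open(file_list[number], "r") as f:
        melody = [line.rstrip() for line in f]
    x = []
    for line in melody:
        ##only grab note data (skip comments, barlines, and interpretations)
        if "!" not in line and "=" not in line and "*" not in line:
            ## remove everything except durations (and kern's tie markers) from a line
            x.append(re.sub(r"[^0-9._\]\[]", "", line))
    plt.hist(x)
duration_counter(1)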
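### this tallies the pitches
## create an empty list
pitches_list = []
for line in melody:
    ##only grab note data
    if "!" not in line and "=" not in line and "*" not in line:
        ## remove everything except pitch letters and accidentals (# sharp, - flat, n natural)
        pitches_list.append(re.sub(r"[^A-Ga-gn#-]", "", line))
print(pitches_list)
## count how many times each pitch occurs
values, counts = np.unique(pitches_list, return_counts=True)
print(values, counts)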
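## collect the unique key signatures across every file in the folder
unique_key_signature = []
for path in file_list:
    with open(path, "r") as f:
        tune = [line.rstrip() for line in f]
    for line in tune:
        if line.startswith("*k[") and line not in unique_key_signature:
            print(line)
            unique_key_signature.append(line)
len(unique_key_signature)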
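All three exercise questions can also be answered in a single pass over a file's lines. This is a minimal sketch, assuming a single-spine (monophonic) kern file already read into a list of lines, like melody above; summarize_kern is a hypothetical helper name, and it swaps in Python's collections.Counter and str.startswith in place of the "in line" checks used above. The sample lines are the opening of the folksong printed later in these notes.

```python
import re
from collections import Counter


def summarize_kern(lines):
    """Tally key signatures, durations, and pitches in one monophonic **kern spine."""
    key_signatures = set()
    durations = Counter()
    pitches = Counter()
    for line in lines:
        if line.startswith("*k["):
            key_signatures.add(line)
        # skip blank lines, comments (!), interpretations (*), and barlines (=)
        elif line and not line.startswith(("!", "*", "=")):
            duration = re.sub(r"[^0-9.]", "", line)      # e.g. "{4.dd" -> "4."
            pitch = re.sub(r"[^a-gA-G#\-n]", "", line)   # e.g. "8ff#" -> "ff#"
            if duration:
                durations[duration] += 1
            if pitch:  # rests ("4r") leave an empty string and are skipped
                pitches[pitch] += 1
    return key_signatures, durations, pitches


sample = ["**kern", "*M4/4", "*k[f#]", "*G:", "{4g", "=1",
          "4dd", "4dd", "8b", "8g", "4gg", "*-"]
keys, durs, pits = summarize_kern(sample)
print(keys)
print(durs.most_common())
print(pits.most_common())
```

A set is used for the key signatures because it only ever keeps one copy of each, which is exactly what "unique" means here.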
Today we will cover:
Many of the first corpus studies were simply counting the occurrences of musical events. There's a lot to be said for simply counting things: it shows you which parts of a musical language are actually used, and how likely particular events are.
For example, much of the work from the 1980s through the early 2000s focused on the role of statistical learning: we learn grammars and styles through exposure to events. Children, for instance, learn languages by being exposed to words and grammars, and they are implicitly aware of likely transitions. Understanding how frequent those events are, and how likely they are to occur, might therefore tell us a great deal.
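To make "the likelihood of events" concrete, here is a minimal sketch that uses Python's built-in collections.Counter on the durations of the Polish folksong analysed later in these notes. It counts each duration, turns the counts into relative frequencies, and tallies which duration follows which (a simple transition, or bigram, count). The helper name transition_probability is made up for this example.

```python
from collections import Counter

# Kern durations (4 = quarter, 8 = eighth, 4. = dotted quarter, 2 = half)
# for one folksong, in the order they occur.
durations = ["4", "4", "4", "8", "8", "4", "4", "4", "8", "8", "8", "8", "8", "8",
             "4", "4", "4", "4", "4", "4.", "8", "8", "8", "4", "4", "4", "2", "4"]

# How often does each duration occur, and how likely is it overall?
counts = Counter(durations)
total = sum(counts.values())
probabilities = {d: n / total for d, n in counts.items()}

# How often does each duration follow each other duration?
# zip pairs every event with the one right after it.
transitions = Counter(zip(durations, durations[1:]))


def transition_probability(current, following):
    """Estimate P(following | current) from the bigram counts above."""
    from_current = sum(n for (a, _), n in transitions.items() if a == current)
    return transitions[(current, following)] / from_current


print(counts.most_common())
print(probabilities)
print(transition_probability("8", "8"))
```

The second number is the kind of implicit expectation the statistical-learning work describes: after an eighth note in this tune, another eighth note is the most likely continuation.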
In order to count things, it's helpful to tally them in lists. There are two ways of adding things to lists as you find them.
The first option is to create an empty list and then add things to it. To create an empty list, you just write something like this:
my_list = []
After this, you can add things to the list with append.
for i in range(1,15):
    my_list.append(i)
You can try this below:
### create an empty list.
my_list = []
## run a loop.
for i in range(1,15):
    my_list.append(i)
print(my_list)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Write a loop that scans the numbers from 1 to 100 and only keeps the odd ones.
Hint: Python has a modulo operator (%) that returns the remainder after division. So
15 % 4
returns "3".
The following condition is true for all even numbers:
num % 2 == 0
So the condition below is true for all numbers that don't have a remainder of 0 when divided by 2 (the odd numbers):
num % 2 != 0
my_list = []
## run a loop.
for i in range(1,101):
    if i % 2 != 0:
        my_list.append(i)
print(my_list)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99]
A more "pythonic" way to solve this problem might be with a list comprehension. The basic formula for a list comprehension is just something like:
[expression for item in iterable if condition]
So to find even numbers, you could just run the below code:
even = [i for i in range(1,101) if i % 2==0]
print(even)
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]
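To see that a list comprehension is just a compressed loop, here are both versions of the odd-number exercise side by side; the comparison at the end checks that they build the same list.

```python
# The loop version, as in the odd-number exercise.
odd_loop = []
for i in range(1, 101):
    if i % 2 != 0:
        odd_loop.append(i)

# The same thing as a list comprehension:
# [expression for item in iterable if condition]
odd_comprehension = [i for i in range(1, 101) if i % 2 != 0]

print(odd_loop == odd_comprehension)  # True
```

Reading the comprehension left to right: what to keep (i), where it comes from (range(1, 101)), and which ones to keep (i % 2 != 0).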
If we want to count how long the list is, we can use the len function, which is built into Python (it's short for "length").
all_numbers = [i for i in range(1000) if i > 0]
len(all_numbers)
999
If you find yourself rerunning the same code, it should probably be in a function. If variables are the nouns of your code, functions are the verbs: they make your code do things.
So if I don't want to run the same statement over and over again, I can just put it into a function:
def favorite_pets():
    print("dog")
favorite_pets()
dog
Functions become more powerful with arguments (in the parentheses).
def favorite_composer(selection, age):
    if selection == "Bach" and age != "65":
        print("You don't mean J.S. Bach, you mean CPE.")
    else:
        print("J.S. Bach")
favorite_composer("Bach", "65")
J.S. Bach
Write a function that finds evens or odds, depending on the parameter.
### your code here.
def evens_and_odds(choice):
    if choice == "even":
        even = [i for i in range(1,101) if i % 2 == 0]
        print(even)
    else:
        odd = [i for i in range(1,101) if i % 2 != 0]
        print(odd)
evens_and_odds("odd")
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99]
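The function above prints its result, so the list can't be reused afterwards. A variation that returns the list instead might look like this; the name evens_or_odds and the start/stop parameters (with default values) are illustrative additions, not part of the exercise.

```python
def evens_or_odds(choice="even", start=1, stop=100):
    """Return the even or odd numbers from start to stop (inclusive)."""
    # Even numbers leave a remainder of 0 when divided by 2; odd numbers leave 1.
    remainder = 0 if choice == "even" else 1
    return [i for i in range(start, stop + 1) if i % 2 == remainder]


odds = evens_or_odds("odd")
print(len(odds))                    # 50
print(evens_or_odds("even", 1, 10))  # [2, 4, 6, 8, 10]
```

Because the result is returned, you can keep working with it: count it with len, loop over it, or pass it to another function.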
How would we look at all rhythms in a Polish folksong?
###this prints everything
for line in melody:
    print(line)
!!!OTL: Die verkaufte Muellerin Es ging ein Mueller ins Wald spazieren,
!!!ARE: Europa, Osteuropa, Polen, Galizien
!!!YOR: 4, S. 344
!! 1940 gedruckt
!!!SCT: Q0086T
!!!YEM: Copyright 1995, estate of Helmut Schaffrath.
**kern
*ICvox
*Ivox
*M4/4
*k[f#]
*G:
{4g
=1
4dd
4dd
8b
8g
4gg
=2
4dd
4dd
8b
8g}
{8g
8b
=3
8dd
8dd
4dd
4ee
4gg
=4
4ee
4dd}
{4.dd
8ff#
=5
8gg
8gg
4b
4cc
4a
=6
2b
4gg}
==
!!!AGN: Ballade, Menschenhandel, Rettung, Hinrichtung
!!!ONB: ESAC (Essen Associative Code) Database: BALLADE
!!!AMT: simple quadruple
!!!AIN: vox
!!!EED: Helmut Schaffrath
!!!EEV: 1.0
*-
What about only the musical data (the notes themselves)?
###this prints only musical information
for line in melody:
    if "!" not in line and "=" not in line and "*" not in line:
        print(line)
{4g
4dd
4dd
8b
8g
4gg
4dd
4dd
8b
8g}
{8g
8b
8dd
8dd
4dd
4ee
4gg
4ee
4dd}
{4.dd
8ff#
8gg
8gg
4b
4cc
4a
2b
4gg}
It's useful to be able to substitute text, getting rid of information you don't need and cleaning things up in the process. My favorite way to do this is with "re.sub". "re" is Python's regular-expression module; regular expressions are a long-established tool for searching text with patterns rather than exact strings.
It's worth learning as much about regular expressions as you can, especially when dealing with kern files, which are plain text files.
The expression below substitutes "dogs" for part of a sentence about cats. The logic goes:
re.sub("what you want to change", "what you want to change it with", where you want to change it)
Notice that the string being searched (here, the variable pet) comes last.
pet = "cats are the best pet in the world"
corrected = re.sub("c.*w", "dogs", pet)
print(corrected)
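dogsorld
That probably isn't what you expected. Because ".*" is greedy, "c.*w" matches everything from the first "c" to the last "w" ("cats are the best pet in the w"), so all of that becomes "dogs" and only "orld" is left.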
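Here is the same sentence with two narrower patterns next to the original one. A literal pattern replaces only that exact text, and "\w*" (any run of letters, digits, or underscores) stops at the first space, so it replaces just one word.

```python
import re

pet = "cats are the best pet in the world"

# ".*" is greedy: it runs from the first "c" to the last "w" it can find.
print(re.sub("c.*w", "dogs", pet))   # dogsorld

# A literal pattern replaces only the exact text "cats".
print(re.sub("cats", "dogs", pet))   # dogs are the best pet in the world

# "\w*" matches only word characters, so the match stops at the first space.
print(re.sub(r"c\w*", "dogs", pet))  # dogs are the best pet in the world
```

The r before the last pattern makes it a "raw" string, so Python passes the backslash through to the regular expression unchanged.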
##create an empty list
x = []
## run a loop over every line
for line in melody:
    ##only grab rhythmic information
    if "!" not in line and "=" not in line and "*" not in line:
        ### this is a tough line to parse. It uses a regular expression to delete everything except digits, dots, underscores, and square brackets (kern's tie markers).
        x.append(re.sub(r"[^0-9._\]\[]", "", line))
print(x)
### count how many times each duration occurs
values, counts = np.unique(x, return_counts=True)
print(f'{values}\n{counts}')
['4', '4', '4', '8', '8', '4', '4', '4', '8', '8', '8', '8', '8', '8', '4', '4', '4', '4', '4', '4.', '8', '8', '8', '4', '4', '4', '2', '4']
['2' '4' '4.' '8']
[ 1 15  1 11]
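The substitution above keeps the durations and throws the pitches away. As a rough sketch of a different approach (an assumption, not how Humdrum itself parses kern), one regular expression with named groups can pull both apart from a single token. It handles only simple monophonic note tokens and assumes kern's octave convention, in which c is middle C (C4), cc is the C above, and C is the C below; parse_kern_token and KERN_NOTE are hypothetical names.

```python
import re

# Simplified pattern for one **kern note token: duration digits and dots,
# then repeated pitch letters, then optional accidentals.
# It ignores beams, ornaments, grace notes, and most other signifiers.
KERN_NOTE = re.compile(r"(?P<duration>\d+\.*)(?P<pitch>[a-gA-G]+)(?P<accidental>[#\-n]*)")


def parse_kern_token(token):
    """Return (duration, pitch name with octave) for a kern note token, or None."""
    match = KERN_NOTE.search(token)
    if match is None:
        return None  # e.g. rests ("4r"), barlines ("=1"), interpretations ("*M4/4")
    letters = match.group("pitch")
    step = letters[0].upper()
    if letters[0].islower():
        octave = 3 + len(letters)  # c = C4 (middle C), cc = C5, ...
    else:
        octave = 4 - len(letters)  # C = C3, CC = C2, ...
    # kern writes flats as "-"; show them as "b" in the pitch name
    accidental = match.group("accidental").replace("-", "b")
    return match.group("duration"), f"{step}{accidental}{octave}"


for token in ["{4g", "8ff#", "{4.dd"]:
    print(parse_kern_token(token))
# ('4', 'G4')
# ('8', 'F#5')
# ('4.', 'D5')
```

Named groups mean you ask for match.group("duration") by name instead of remembering which set of parentheses came first.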
import matplotlib.pyplot as plt
plt.hist(x)
(array([15., 0., 0., 11., 0., 0., 1., 0., 0., 1.]), array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ]), <a list of 10 Patch objects>)
def rhythm_counter():
    x = []
    with open(file_list[6], "r") as f:
        melody = [line.rstrip() for line in f]
    for line in melody:
        ##only grab rhythmic information
        if "!" not in line and "=" not in line and "*" not in line:
            ### this is a tough line to parse. It's using regular expressions, and substitution.
            x.append(re.sub(r"[^0-9._\]\[]", "", line))
    plt.hist(x)
rhythm_counter()
def rhythm_counter(number):
    x = []
    with open(file_list[number], "r") as f:
        melody = [line.rstrip() for line in f]
    for line in melody:
        ##only grab rhythmic information
        if "!" not in line and "=" not in line and "*" not in line:
            ### this is a tough line to parse. It's using regular expressions, and substitution.
            x.append(re.sub(r"[^0-9._\]\[]", "", line))
    ## start a new figure so each file gets its own histogram instead of drawing over the last one
    plt.figure()
    plt.hist(x)
    plt.title(file_list[number])
for number in range(10):
    rhythm_counter(number)