Pandas and Spotify Data

Goals this week.

Importing Pandas

First, we need to import pandas. We should also import some data!
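As a minimal sketch of this step, the block below imports pandas and reads a CSV into a dataframe named beyonce. The column names and every value are made-up placeholders, not the real Spotify data, and the CSV is held in a string so the example runs without an external file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file: a few made-up rows in CSV
# form. Column names and values are placeholders, not actual Spotify data.
csv_text = """name,album,release_year,danceability,tempo
Song A,Album One,2003,0.70,99.0
Song B,Album One,2003,0.65,120.5
Song C,Album Two,2011,0.58,127.0
Song D,Album Three,2016,0.52,85.5
"""

# pd.read_csv accepts a file path or any file-like object.
beyonce = pd.read_csv(io.StringIO(csv_text))
```

With a real file, the call would take the file's path instead of the StringIO object.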

Alright! Now our data is available in a dataframe.

We can display the dataframe by typing beyonce on its own, or get a plain-text view of it by typing print(beyonce). Either way, Pandas shortens a long dataframe to its first and last few rows.

For an even more succinct (and possibly more helpful) view of the dataframe, use the .info() method, which lists each column along with its non-null count and data type.
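The two views described above might look like this on a small dataframe. The rows are made-up placeholders standing in for the real data.

```python
import pandas as pd

# Made-up placeholder rows; the real dataframe would come from the data file.
beyonce = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "release_year": [2003, 2011, 2016],
    "danceability": [0.70, 0.58, 0.52],
})

# Plain-text view of the rows and columns.
print(beyonce)

# Compact overview: column names, non-null counts, dtypes, memory usage.
beyonce.info()
```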

Pandas objects come with methods and attributes that make it easier to retrieve information. Two of the most commonly used methods are .head() and .tail():

With these, you can get the first or last n rows of a dataframe. Unlike the Unix head and tail commands, which default to 10 lines, the Pandas default is 5 rows.
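A short sketch of retrieving the first and last rows, using eight made-up placeholder rows so the default of 5 is visible:

```python
import pandas as pd

# Eight made-up placeholder rows.
beyonce = pd.DataFrame({
    "name": [f"Song {i}" for i in range(1, 9)],
    "tempo": [99.0, 120.5, 127.0, 85.5, 110.0, 93.0, 140.0, 101.5],
})

first_five = beyonce.head()    # no argument: the default n is 5
last_three = beyonce.tail(3)   # explicit n

print(first_five)
print(last_three)
```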

There are also some attributes:

With the .columns attribute, you can access the column labels of the dataframe.

The .dtypes attribute returns the data type of each column in the dataframe.

The .shape attribute returns the number of rows and columns in the dataframe as a (rows, columns) tuple.
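The three attributes can be tried on a small dataframe like the one below (placeholder values). Because they are attributes rather than methods, they are written without parentheses.

```python
import pandas as pd

# Made-up placeholder rows.
beyonce = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "release_year": [2003, 2011, 2016],
    "tempo": [99.0, 120.5, 127.0],
})

columns = beyonce.columns   # Index holding the column labels
dtypes = beyonce.dtypes     # Series mapping each column to its dtype
shape = beyonce.shape       # (number of rows, number of columns)

print(list(columns))
print(dtypes)
print(shape)
```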

Picking out specific things.

.iloc (integer location) looks up rows by integer position, counting from 0.
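A few position-based lookups, sketched on made-up placeholder rows:

```python
import pandas as pd

# Made-up placeholder rows.
beyonce = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C", "Song D"],
    "tempo": [99.0, 120.5, 127.0, 85.5],
})

first_row = beyonce.iloc[0]      # a single row, returned as a Series
first_two = beyonce.iloc[0:2]    # slice end is exclusive: rows 0 and 1
last_row = beyonce.iloc[-1]      # negative positions count from the end
one_value = beyonce.iloc[2, 1]   # row position 2, column position 1
```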

.loc looks up rows and columns by label, or by a boolean condition.

With .loc we can select values based on both their row label and column name. For example:
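This sketch assumes the song name is used as the row index, which is a choice made here so that the row labels are readable. The rows are made-up placeholders.

```python
import pandas as pd

# Made-up placeholder rows, indexed by song name so row labels are strings.
beyonce = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "album": ["Album One", "Album One", "Album Two"],
    "tempo": [99.0, 120.5, 127.0],
}).set_index("name")

# One value: row label first, then column name.
tempo_b = beyonce.loc["Song B", "tempo"]

# Several rows and several columns at once, using lists of labels.
subset = beyonce.loc[["Song A", "Song C"], ["album", "tempo"]]

# A boolean condition picks the rows; the column name picks the column.
slow = beyonce.loc[beyonce["tempo"] < 100, "tempo"]
```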

We can select columns by name like so:
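Selecting by column name might look like the following (placeholder rows again). A single name gives back a Series, while a list of names gives back a dataframe.

```python
import pandas as pd

# Made-up placeholder rows.
beyonce = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "album": ["Album One", "Album One", "Album Two"],
    "tempo": [99.0, 120.5, 127.0],
})

tempo = beyonce["tempo"]                   # one column -> Series
name_and_tempo = beyonce[["name", "tempo"]]  # list of columns -> DataFrame
```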

Calculating Summary Statistics

Let's look at how to find some summary statistics first.
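A few common summary statistics, sketched on made-up placeholder values. The danceability and album column names are assumptions about how the data is laid out.

```python
import pandas as pd

# Made-up placeholder rows; column names are assumed.
beyonce = pd.DataFrame({
    "album": ["Album One", "Album One", "Album Two", "Album Three"],
    "danceability": [0.70, 0.65, 0.58, 0.52],
    "tempo": [99.0, 120.5, 127.0, 85.5],
})

mean_dance = beyonce["danceability"].mean()
median_tempo = beyonce["tempo"].median()

# describe() reports count, mean, std, min, quartiles, and max in one call.
overview = beyonce["danceability"].describe()

# The same statistic computed separately for each album.
dance_by_album = beyonce.groupby("album")["danceability"].mean()

print(overview)
print(dance_by_album)
```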

Testing a hypothesis

Let's state a hypothesis:

H0: The "danceability" of Beyonce's music does not decrease over time.

H1: Beyonce's music becomes less "danceable" over time.
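One possible way to test a hypothesis like this is a one-sided permutation test on the correlation between release year and danceability. This is only an illustrative sketch: it is not necessarily the test used here, the column names are assumed, and the values are made-up placeholders, so its output says nothing about the real data.

```python
import numpy as np
import pandas as pd

# Made-up placeholder rows; NOT real Spotify values.
beyonce = pd.DataFrame({
    "release_year": [2003, 2003, 2006, 2008, 2011, 2013, 2016, 2016],
    "danceability": [0.70, 0.65, 0.68, 0.60, 0.58, 0.61, 0.52, 0.55],
})

# Observed Pearson correlation between year and danceability.
observed_r = beyonce["release_year"].corr(beyonce["danceability"])

# Under the null hypothesis the pairing of year and danceability is
# arbitrary, so shuffle the years many times and recompute the correlation.
rng = np.random.default_rng(0)
years = beyonce["release_year"].to_numpy()
dance = beyonce["danceability"].to_numpy()
n_perm = 5000
perm_r = np.array([
    np.corrcoef(rng.permutation(years), dance)[0, 1]
    for _ in range(n_perm)
])

# One-sided p-value: share of shuffles at least as negative as observed.
p_value = (np.sum(perm_r <= observed_r) + 1) / (n_perm + 1)

print(f"r = {observed_r:.3f}, p = {p_value:.4f}")
```

A p-value below the chosen significance level (0.05 is a common choice) would be read as evidence against the null hypothesis; otherwise the null hypothesis is not rejected.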

Results:

We were not able to reject the null hypothesis.

Let's keep digging!

Is Beyonce's music faster than Taylor Swift's?
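A first look at this question could compare the average tempo per artist. The sketch below assumes both artists' tracks sit in one dataframe with artist and tempo columns. All values are made-up placeholders, so the printed numbers do not answer the question for the real data.

```python
import pandas as pd

# Made-up placeholder rows; NOT real Spotify values.
tracks = pd.DataFrame({
    "artist": ["Beyonce", "Beyonce", "Beyonce",
               "Taylor Swift", "Taylor Swift", "Taylor Swift"],
    "tempo": [99.0, 120.5, 127.0, 96.0, 160.0, 119.0],
})

# Mean tempo (beats per minute) for each artist.
mean_tempo = tracks.groupby("artist")["tempo"].mean()

# Positive means Beyonce's tracks are faster on average in this sample.
difference = mean_tempo["Beyonce"] - mean_tempo["Taylor Swift"]

print(mean_tempo)
print(difference)
```

A difference in means on its own does not show that the gap is larger than chance would produce, so a significance test would be the natural next step.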

Exercise

We've done a lot of things on a single object.

Wouldn't this be a good use of a class? How would we turn all of these ideas that we've just analyzed into a class?
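As one possible starting point for the exercise, here is a sketch of a class that wraps a dataframe of tracks. The class name ArtistTracks, its methods, and the column names are all hypothetical, and the rows are made-up placeholders.

```python
import pandas as pd


class ArtistTracks:
    """Hypothetical wrapper around one artist's dataframe of tracks."""

    def __init__(self, name, tracks):
        self.name = name
        self.tracks = tracks

    def average(self, column):
        """Mean of a numeric column, e.g. 'tempo' or 'danceability'."""
        return self.tracks[column].mean()

    def trend(self, column, over="release_year"):
        """Pearson correlation between `over` and `column`."""
        return self.tracks[over].corr(self.tracks[column])

    def is_faster_than(self, other):
        """True if this artist's mean tempo exceeds the other's."""
        return self.average("tempo") > other.average("tempo")


# Usage with made-up placeholder rows (NOT real Spotify values).
beyonce = ArtistTracks("Beyonce", pd.DataFrame({
    "release_year": [2003, 2011, 2016],
    "danceability": [0.70, 0.58, 0.52],
    "tempo": [99.0, 120.5, 127.0],
}))
taylor = ArtistTracks("Taylor Swift", pd.DataFrame({
    "release_year": [2008, 2014, 2020],
    "danceability": [0.60, 0.65, 0.55],
    "tempo": [96.0, 160.0, 119.0],
}))

print(beyonce.average("tempo"))
print(beyonce.trend("danceability"))
print(beyonce.is_faster_than(taylor))
```

Keeping the dataframe as an attribute means each analysis from earlier becomes a short method, and the same code works for any artist.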