Bonus: Plotting

Bonus lessons are optional.

Often it helps to see what data looks like. Python has a bunch o' plotting libraries.

The goal

Say you have some times for the goatathlon. Here's part of the data.

  • Goat,Running,Swimming
  • Roderick,90.77,93.58
  • Junie,80.1,79.03
  • Bea,86.04,89.8
  • Rodney,82.43,83.98
  • Weldon,86.16,88.22
  • Del,94.71,95.56
  • Charissa,90.28,89.99
  • Gail,89.08,85.32
  • ...

Let's make a chart like this:

Plot

It will show in the Plots tab.

Each point is the times for one goat. For example, Roderick's dot is at 90.77 on the x axis (running times), and 93.58 on y (swimming times).

We'll use the Pyplot module of Matplotlib, a popular library. It comes with Spyder, so you don't need to install anything.

Pyplot has the method scatter, which will create the chart. So, what does scatter want? Here's code from a tutorial:

    import matplotlib.pyplot as plt

    price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
    sales_per_day = [34, 62, 49, 22, 13, 19]

    plt.scatter(price, sales_per_day)
    plt.show()

scatter takes two lists, and makes a plot. Here's what this code draws:

Plot

The first point is at (2.5, 34), the first value from each list. You can see it on the chart. The second is at (1.23, 62), and so on.

So in...

  • plt.scatter(price, sales_per_day)

... the first list is the values for the x axis, and the second is values for y.
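To see how the two lists pair up, here's a small sketch that builds the points by hand with zip, using the tutorial's lists. It doesn't draw anything. It only prints the (x, y) pairs that scatter would plot, and the name points is just for this illustration.

```python
price = [2.50, 1.23, 4.02, 3.25, 5.00, 4.40]
sales_per_day = [34, 62, 49, 22, 13, 19]

# Pair the first value of each list, then the second, and so on.
points = list(zip(price, sales_per_day))

for x, y in points:
    print(x, y)
```

The pairing only works if both lists are the same length. scatter complains if they aren't.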

Our code will include:

  • plt.scatter(running, swimming)

We need two lists, one of running times, and the other of swimming times.

Data prep

We have a plotting method (scatter) that can do the job, but it needs data in a specific format. So we'll write code to take the data in the format we have (a list of dictionaries), and create variables with the data in the format scatter wants (two lists).

This is a common task for data analysts.

Our data is:

  • Goat,Running,Swimming
  • Roderick,90.77,93.58
  • Junie,80.1,79.03
  • Bea,86.04,89.8
  • Rodney,82.43,83.98
  • Weldon,86.16,88.22
  • Del,94.71,95.56
  • Charissa,90.28,89.99
  • Gail,89.08,85.32
  • ...

We can make the lists like this:

  • Use read_csv_data_set to read the file into a list of dictionaries.
  • Make two MT (empty) lists.
  • Loop over the goat records. For each one, put the running value in one list, and the swimming value in the other.

Here's some code to start with.

  1. import csv
  2. import matplotlib.pyplot as plt
  3.  
  4. def read_csv_data_set(file_name):
  5.     ...
  6.     return data_set
  7.  
  8. # Read the CSV file, and make a list of dictionaries.
  9. goatathlon_data_set = read_csv_data_set('goatathlon.csv')
  10. # Make a couple MT lists.
  11. running = []
  12. swimming = []
  13. # Loop over the records.
  14. for goat_record in goatathlon_data_set:
  15.     # For the current record, add the running value to one of the new lists.
  16.     running.append(float(goat_record['Running']))
  17.     # Put swimming into the other one.
  18.     swimming.append(float(goat_record['Swimming']))
  19. # Draw the chart.
  20. plt.scatter(running, swimming)
  21. plt.title('Goatathlon: Running and swimming times')
  22. plt.xlabel('Running')
  23. plt.ylabel('Swimming')
  24. plt.show()

Don't forget to use float (lines 16 and 18), since all the values in the list of dictionaries are strings.
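Here's a quick sketch of why float is needed. The body of read_csv_data_set isn't shown above, so this assumes it is built on csv.DictReader, which returns every field as a string. The sample text stands in for the CSV file, so the sketch runs without any files.

```python
import csv
import io

# Assumption: a stand-in for goatathlon.csv, using two rows from the sample data.
sample = io.StringIO(
    'Goat,Running,Swimming\n'
    'Roderick,90.77,93.58\n'
    'Junie,80.1,79.03\n'
)
records = list(csv.DictReader(sample))

first = records[0]
print(type(first['Running']))  # <class 'str'>

# Convert before doing math or plotting.
running_time = float(first['Running'])
print(type(running_time))  # <class 'float'>
```

Without the conversion, the lists would hold text like '90.77' rather than the numbers we want to plot.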

Run the code, and you get:

Plot

Looks like swimming and running times are correlated. A goat who can run fast tends to swim fast as well.
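If you want a number to go with the picture, one option is the Pearson correlation coefficient. This is a rough sketch that uses only the eight sample rows shown earlier, not the whole file, so the result is just illustrative. The function name pearson is made up for this example.

```python
from math import sqrt

# The eight sample rows from the data listing.
running = [90.77, 80.1, 86.04, 82.43, 86.16, 94.71, 90.28, 89.08]
swimming = [93.58, 79.03, 89.8, 83.98, 88.22, 95.56, 89.99, 85.32]

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two lists vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)

r = pearson(running, swimming)
print(round(r, 2))
```

A value near 1 means the times rise together, near 0 means no linear relationship, and near -1 means one goes up as the other goes down.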

Summary

  • Python has a bunch o' plotting libraries.
  • Pyplot's scatter method creates a scatter diagram.
  • scatter takes two lists, and makes a plot.
  • We write code to take the data in the format we have (a list of dictionaries), and create variables with the data in the format scatter wants (two lists).

Exercise

Curiosity vs GPA

Download this data file. Here's a sample:

  • "Goat","Curiosity","GPA"
  • "Roderick ",3.0,1.9
  • "Junie ",1.7,2.1
  • "Rodney ",4.6,3.0
  • "Weldon ",2.2,1.8

The fields are:

  • Goat: cannot be MT.
  • Curiosity: float from 0 to 5.
  • GPA: float from 0 to 4.

Make a plot like this, but only for valid records.

Output
