Not graded. So why do it?
A list of dictionaries is a good way to store a data set in memory. Now, how do we read the data into memory?
Let's write a function for that. You can reuse it as much as you like.
CSV
Data is typically stored in files or databases, or is fetched over the internet. We'll only discuss files in this course, though the processing isn't much different for the other sources. Once you've fetched the data, you analyze it the same way.
Here's what the data from before would look like in a CSV (comma-separated values) file.

    "Episode number","Title","Length"
    1,"Nobody Listens to Paula Poundstone",51.37
    2,"Maintaining friendships",50.75
    3,"Audiologist Michele Sherman talks ears",48.9
    4,"The Survivalist!",53.22

The first row gives headers for each column. Then there's a row for each entity.
Function to read a data set
Make a new project. Put this file (that's a link) in the project folder. It's the data set above.
Now make a Python file, and put this code into it.

    import csv


    def read_csv_data_set(file_name):
        '''
        Read a data set from a CSV file.

        Parameters
        ----------
        file_name : string
            Name of the CSV file in the current folder.

        Returns
        -------
        data_set : List of dictionaries.
            Data set.
        '''
        # Create a list to be the return value.
        data_set = []
        with open('./' + file_name) as file:
            file_csv = csv.DictReader(file)
            # Put each row into the return list.
            for row in file_csv:
                data_set.append(row)
        return data_set


    episodes = read_csv_data_set('episodes.csv')
    print(episodes)

Here's the file again, and the data structure read_csv_data_set creates.
[Figure: the CSV file, side by side with the data structure made from it]
As you can see, a list of dictionaries.
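To see that structure for yourself, here is a minimal sketch. It assumes the CSV text is held in a string (csv_text, a made-up name) rather than in episodes.csv, so it runs without any file on disk. The name sample_rows is also just for illustration.

```python
import csv
import io

# The same four episodes as in the CSV file above, held in a string
# so this sketch runs without a file on disk.
csv_text = '''"Episode number","Title","Length"
1,"Nobody Listens to Paula Poundstone",51.37
2,"Maintaining friendships",50.75
3,"Audiologist Michele Sherman talks ears",48.9
4,"The Survivalist!",53.22
'''

# io.StringIO makes the string behave like an open file.
# dict(row) turns each row into a plain dictionary.
sample_rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]

# Print one dictionary per line, so the structure is easy to see.
for row in sample_rows:
    print(row)
```

Each printed line is one dictionary. The keys come from the header row, and every value is a string, even the ones that look like numbers.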
How does it work?

    import csv


    def read_csv_data_set(file_name):
        '''
        Read a data set from a CSV file.

        Parameters
        ----------
        file_name : string
            Name of the CSV file in the current folder.

        Returns
        -------
        data_set : List of dictionaries.
            Data set.
        '''
        # Create a list to be the return value.
        data_set = []
        with open('./' + file_name) as file:
            file_csv = csv.DictReader(file)
            # Put each row into the return list.
            for row in file_csv:
                data_set.append(row)
        return data_set

Line 1 (import csv) imports Python's csv module. There are other ways to read CSV, but this one is the easiest to learn.
Line 19 (data_set = []) creates the thing the function will return. data_set is a list.
Line 20 (with open('./' + file_name) as file) opens a file, using the parameter you pass to the function as the file name. with closes the file automatically, as soon as its code block finishes.
About the './' part: it means "the current folder," so Python looks for the file in the folder the program is running from. That's normally the same folder as the program.
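If you'd like to check that with really does close the file, here is a small sketch. It assumes a throwaway file (demo.csv, a made-up name) in a temporary folder, so it doesn't touch your project. The variable names are for illustration only.

```python
import os
import tempfile

# Make a throwaway folder and file, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.csv')
    with open(path, 'w') as file:
        file.write('"Title"\n"The Survivalist!"\n')

    # Open the file for reading, the same way the function does.
    with open(path) as file:
        # Inside the block, the file is still open.
        open_inside_block = not file.closed
    # The block has finished, so with has closed the file.
    closed_after_block = file.closed

print(open_inside_block, closed_after_block)
```

Both values should be True: the file is open inside the block, and closed once the block ends.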
Line 21 (file_csv = csv.DictReader(file)) creates a DictReader, and puts it in the variable file_csv. DictReader is one of Python's many special types. A DictReader reads a CSV file one row at a time, and gives you each row as a dictionary, using the first row of the file for the keys.
DictReaders are a bit of a pain to work with directly, though, so lines 23 and 24 copy the data from file_csv into data_set, the thing that's returned.
Check out these lines:

    for row in file_csv:
        data_set.append(row)

Line 23 loops over the elements in file_csv. The first time through the loop, row is the first element of file_csv, that is, the first row from the CSV file. Second time through the loop, row is the second element of file_csv, that is, the second row from the CSV file. And so on.
What to do with each row? Line 24 appends it to the list data_set.
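Here is a short sketch of that loop at work, and of why copying the rows into a list is handy. It assumes a two-row CSV held in a string (csv_text, a made-up name) instead of a file. It also shows list(), which is a built-in shortcut for the same loop, not something the function above uses.

```python
import csv
import io

# A tiny CSV, held in a string so no file is needed.
csv_text = '"Title","Length"\n"The Survivalist!",53.22\n"Maintaining friendships",50.75\n'

# Version 1: the loop from the function.
looped = []
for row in csv.DictReader(io.StringIO(csv_text)):
    looped.append(row)

# Version 2: list() runs the same loop for you.
listed = list(csv.DictReader(io.StringIO(csv_text)))
same_result = looped == listed

# A DictReader can only be looped over once.
reader = csv.DictReader(io.StringIO(csv_text))
first_pass = list(reader)
second_pass = list(reader)  # Nothing left to read.

print(same_result, len(first_pass), len(second_pass))
```

The second pass over the reader comes back empty, because a DictReader is used up after one loop. A list can be looped over as often as you like, which is one reason the function returns a list.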
Run the program. You should see a list of episodes, printed by print(episodes).
Switch to the console. The program left its variables behind, so we can check out what episodes is.
Try this in the console.

    type(episodes)

It will tell you the data type of the variable episodes.
What type is episodes?

Ray
It's a list.
Right!
What type is episodes[0]? How about episodes[1]?
Answer before you try it in the console.

Ethan
They're dictionaries.
Correct!
Let's look at the individual fields in the dictionaries. Try:

    type(episodes[3]['Title'])

What's the value of episodes[3]['Title']? What's its type?

Georgina
That's the title of the fourth episode, 'The Survivalist!'. It's a string.
Good!
Without typing in the console, what's the value of episodes[0]['length']?

Ethan
It's the length of the first episode, 51.37.
Let me try it in the console...
What?! I got an error: KeyError: 'length'
What's that about?
Anyone?

Adela
I think I see it. You get the same error from episodes[0]['title'], but episodes[0]['Title'] is OK.

Ray
Huh? They're the same... Oh, you've got to be kidding. Title works, but title doesn't.
Why does Title work, but not title? Where did Title come from?

Georgina
Title comes from the first line in the CSV file, which gives the column names.
Right! Nice work. Here's the CSV file:

    "Episode number","Title","Length"
    1,"Nobody Listens to Paula Poundstone",51.37
    2,"Maintaining friendships",50.75
    3,"Audiologist Michele Sherman talks ears",48.9
    4,"The Survivalist!",53.22

The lesson:
Note
Dictionary keys are case-sensitive.
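Here is a minimal sketch of what Ethan ran into. It assumes one episode dictionary typed in by hand (episode, a made-up variable), shaped like the ones read_csv_data_set makes. The try/except is only there so the sketch keeps running after the error.

```python
# One episode, shaped like the dictionaries read_csv_data_set makes.
episode = {
    'Episode number': '1',
    'Title': 'Nobody Listens to Paula Poundstone',
    'Length': '51.37',
}

# When in doubt, look at the keys. They match the CSV header exactly.
key_names = list(episode.keys())
print(key_names)

# The lowercase key doesn't exist, so Python raises a KeyError.
try:
    episode['length']
except KeyError as error:
    message = f'No such key: {error}'
    print(message)

# The key with a capital L works.
print(episode['Length'])
```

Printing the keys is a quick way to check spelling and capitalization before you use them.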
Use the function

Ethan
The code you gave us, to read the CSV file. Should we just use it? As is?
Aye, that's why I gave it to you. You can modify it if you like, though.
There's more to CSV files, a lot more, but let's leave it at that for now. That's enough for us to get into data analysis.
Using the data
Suppose the episode data was in a file called episodes.csv.
Write a line of code that would read the episodes data into a list of dictionaries named episodes.

Ray
I got: episodes = read_csv_data_set('episodes.csv')
Right! Short and sweet.
Write two lines that will print the lengths of all the episodes.

Adela
I got this.

    for episode in episodes:
        print(episode['Length'])

Good! Once you have those values, you can do anything with them you want. Print them, add them up, whatevs.
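One thing to watch for when adding up: DictReader gives every value as a string, so '51.37' has to be turned into a number first. Here is a minimal sketch, assuming a hand-typed list (episodes_sample, a made-up name) with the same lengths as the file, so it runs without episodes.csv.

```python
# Hand-typed stand-in for the list read_csv_data_set returns.
# The lengths are strings, just as they come out of the CSV file.
episodes_sample = [
    {'Title': 'Nobody Listens to Paula Poundstone', 'Length': '51.37'},
    {'Title': 'Maintaining friendships', 'Length': '50.75'},
    {'Title': 'Audiologist Michele Sherman talks ears', 'Length': '48.9'},
    {'Title': 'The Survivalist!', 'Length': '53.22'},
]

total_length = 0.0
for episode in episodes_sample:
    # float() converts the string to a number we can add.
    total_length += float(episode['Length'])

print(round(total_length, 2))
```

Without float(), Python would complain about adding a string to a number.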
A new pattern
Let's add a data machine pattern to the pattern catalog.
A function to read a comma-separated values (CSV) file into a data set.
When you start a new project, you can use the pattern catalog to remind yourself of useful chunks of code.
Summary
- A list of dictionaries is a good way for Python to have a data set in memory.
- CSV (comma-separated values) files are commonly used in analysis.
- The function read_csv_data_set reads CSV data sets into memory. Copy-and-paste it as you need.