Not graded. So why do it?
Records
Almost all business data you'll deal with is organized like this:
Episode number | Title | Length (minutes) |
---|---|---|
1 | Nobody Listens to Paula Poundstone | 51.37 |
2 | Maintaining friendships | 50.75 |
3 | Audiologist Michele Sherman talks ears | 48.9 |
4 | The Survivalist! | 53.22 |
(It's a data set about my fave podcast, Nobody Listens to Paula Poundstone. Tell your old people about it.)
The data is about one type of thing: NLTPP episodes. Each line is called a record, row, or maybe entity. A record describes one episode.
Records are made of attributes, also called fields. Here, each episode has three attributes: number, title, and length. Every record has the same set of attributes, although some data might be missing.
Each attribute has the same data type in every record. Here:
- All episode numbers are integers.
- All titles are strings.
- All lengths are floats.
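To check those types yourself, here's a minimal sketch that stores the first episode's three attributes in three ordinary variables (the names episode_number, title, and length are my own) and asks Python for each one's type with type().

```python
# One record's attributes, stored as three separate variables.
episode_number = 1
title = 'Nobody Listens to Paula Poundstone'
length = 51.37

# type() reports the data type of a value.
print(type(episode_number))  # <class 'int'>
print(type(title))           # <class 'str'>
print(type(length))          # <class 'float'>
```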
A data set in memory
We need to have a data set in memory, so we can analyze it. There's a problem, though. Until now, each variable has held only one piece of data, like:
- The variable legs is an integer with the value 8. Or 4, or 2. legs can only hold one value, though. It can't be 8 and 4.
- The variable weight is a float with 23.1 in it. It can only have one value.
- The variable family_name is a string with "Park" in it. It can be Park, Smith, Felber, whatever, but only one name.
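A quick way to see the one-value limit: assigning a new value to a plain variable replaces the old one. A tiny sketch using legs:

```python
legs = 8
legs = 4     # The 8 is gone; legs now holds only 4.
print(legs)  # 4
```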
Now we have bunches of data. We might have a customer data set with thousands of records. What do we do?
We need variables that somehow store many values together, and make it easy to get each one when we need it. Just as we have strings, floats, ints, and booleans, we need a new data type to hold lots of data in a variable.
We'll actually need two new data types:
- Store fields in a record
- Store a collection of records
Let's do the first one.
Dictionary
Dictionaries are perfect for storing individual records. Here's an example of a Python dictionary.
```python
animal1 = {
    'common name': 'Red kangaroo',
    'species name': 'Osphranter rufus',
    'length': 1.5,
    'weight': 74,
    'url': 'https://en.wikipedia.org/wiki/Red_kangaroo'
}
```
animal1 is the variable containing one record, as a dictionary. It has attributes, each one with a key and a value. The keys are usually strings. The values can be anything. Add as many key/value pairs as you like.
Python knows it's a dictionary because of the braces (the {}). Other types use different symbols, like () and [].
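To read one field back out of a dictionary, put its key in square brackets after the variable name. A small sketch, using a copy of the animal1 dictionary from above:

```python
animal1 = {
    'common name': 'Red kangaroo',
    'species name': 'Osphranter rufus',
    'length': 1.5,
    'weight': 74,
    'url': 'https://en.wikipedia.org/wiki/Red_kangaroo'
}

# Look up values by key.
print(animal1['common name'])  # Red kangaroo
print(animal1['weight'])       # 74

# type() confirms the braces made a dictionary.
print(type(animal1))           # <class 'dict'>
```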
Another dictionary:
```python
best_pokemon = {
    'name': 'Snorlax',
    'generation': 1,
    'pokedex number': 143
}
```
(Snorlax is my spirit animal.)
That's one Pokémon record.
You set the values of individual fields like this:
```python
best_pokemon['rating'] = 10
```
Notice that rating wasn't in the original record. We've added it:
```python
best_pokemon = {
    'name': 'Snorlax',
    'generation': 1,
    'pokedex number': 143,
    'rating': 10
}
```
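Because 'rating' didn't exist until the assignment, you can watch it appear. This sketch uses Python's in operator, which tests whether a key is in a dictionary:

```python
best_pokemon = {
    'name': 'Snorlax',
    'generation': 1,
    'pokedex number': 143
}
print('rating' in best_pokemon)  # False

# Assigning to a new key adds a key/value pair.
best_pokemon['rating'] = 10
print('rating' in best_pokemon)  # True
```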
Doing more things:
```python
# Changing a value.
best_pokemon['rating'] = 11
# Appending to the name.
best_pokemon['name'] += ' (the best)'
# Testing a value.
if best_pokemon['generation'] == 1:
    print('OG!')
# Input a value.
best_pokemon['pokedex number'] = int(input("What's the Pokedex number? "))
```
Basically, you can do anything with dictionary_name[key] that you can do with a variable. Calculate with it, input it, output it, whatevs. In practice, dictionary_name[key] works just like a regular variable.
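For example, here's a sketch that calculates with a field, just as you would with a variable. It converts the kangaroo's weight to pounds, assuming the weight is in kilograms (the record doesn't say) and using 2.20462 pounds per kilogram. It uses a trimmed copy of animal1.

```python
animal1 = {
    'common name': 'Red kangaroo',
    'weight': 74  # Assumed to be kilograms.
}

# animal1['weight'] works in a calculation like any variable.
weight_pounds = animal1['weight'] * 2.20462
print(round(weight_pounds, 1))  # 163.1

# It can also be updated in place, like any variable.
animal1['weight'] += 1
print(animal1['weight'])  # 75
```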
Here's some data again.
Episode number | Title | Length (minutes) |
---|---|---|
1 | Nobody Listens to Paula Poundstone | 51.37 |
2 | Maintaining friendships | 50.75 |
3 | Audiologist Michele Sherman talks ears | 48.9 |
4 | The Survivalist! | 53.22 |
Here's the same data as four dictionaries, one for each record.
```python
an_episode = {
    'Episode number': 1,
    'Title': "Nobody Listens to Paula Poundstone",
    'Length': 51.37
}
another_episode = {
    'Episode number': 2,
    'Title': 'Maintaining friendships',
    'Length': 50.75
}
yet_another_episode = {
    'Episode number': 3,
    'Title': 'Audiologist Michele Sherman talks ears',
    'Length': 48.9
}
yet_yet_another_episode = {
    'Episode number': 4,
    'Title': 'The Survivalist!',
    'Length': 53.22
}
```
There's a problem, though. We have four variables, each containing a dictionary. But there are hundreds of episodes. Creating hundreds of different variables, one for each episode, would be a pain.
Lists
What we need is to put a bunch o' dictionaries together, in a collection. There's a data type called list that does the job.
A list is a sequence of individual values. The values can be strings, floats, dictionaries, anything. Here's a list of strings, Australian state names.
```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia']
```
I'm using individual values for now, to keep it simple. We'll bring dictionaries back in later.
Python uses [] for lists, as it uses {} for dictionaries.
states can be any size. Australia has six states. The US has 50. No problem. A list can contain as many values as we like. A thousand? No problem. 65,536? OK.
You create a list like this:
```
name = [values]
```
The list can be MT (empty), and often is at the start of a program. Like:
```python
movies = []
```
Here are things you can do with lists.
- list_name.append(thing) adds thing to the end of the list.
- len(list_name) tells you how many items are in the list.
- And lots more.
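Here's a short sketch of append and len working together, starting from an empty movies list (the movie titles are just examples):

```python
movies = []
print(len(movies))  # 0

# append() adds each item to the end of the list.
movies.append('Babe')
movies.append('Crocodile Dundee')
print(movies)       # ['Babe', 'Crocodile Dundee']
print(len(movies))  # 2
```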
You can get items from a list in two main ways. First, you can use an index, like states[3]. Indexes are always whole numbers (integers).
Paste this into the console.
```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia']
states[0]
```
What did you get?

Adela
The first value, Queensland.
Right.
What's states[1]? Answer before you try it.

Georgina
It's the second element.
So data_set[0] is the first element, and data_set[1] is the second?
Aye. The first element's index is zero. The reason for that is buried in software history. It's not relevant for this course.
So the six values in the list are:
```python
states[0]
states[1]
states[2]
states[3]
states[4]
states[5]
```
Try states[6] in the console. What happens?

Georgina
I got IndexError: list index out of range.
states[5] is the last one.
Right!
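Since indexes start at zero, the last valid index is always one less than the length of the list. A sketch that finds the last state without typing 5 yourself (last_index is a name I made up):

```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia']

# Six items, so the valid indexes are 0 through 5.
last_index = len(states) - 1
print(last_index)          # 5
print(states[last_index])  # Western Australia
```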
In your own words, explain what this program does, without running it.
```python
fruits = []
done = False
while not done:
    fruit = input('Type the name of a fruit, or bye to quit: ')
    fruit_normalized = fruit.lower().strip()
    if fruit_normalized == 'bye':
        done = True
    else:
        fruits.append(fruit)
print(fruits)
```

Adela
It asks you to type a fruit. It adds the fruit to a list. It keeps asking until you type bye. Then it shows the list.
Indeed! (Try running it.)

Ethan
About these lines:
```python
fruit = input('Type the name of a fruit, or bye to quit: ')
fruit_normalized = fruit.lower().strip()
if fruit_normalized == 'bye':
    done = True
else:
    fruits.append(fruit)
```
You get the fruit in line 4, then normalize it in line 5. But you use a new variable, fruit_normalized, instead of putting the normalized value back into fruit. Why?
Why?

Georgina
Ooo! I see it.
You want to add whatever the user typed to the list, with uppercase characters and everything. But when you normalize, which you do to check for bye, you lose things like capitalization.
So, you keep the original fruit around, and make a new variable (fruit_normalized) to normalize and test for bye.
Exactly! Nice work, Georgina.
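Here's Georgina's point in miniature. The typed values are hard-coded (standing in for whatever a user might enter) so you can see the original and normalized versions side by side.

```python
# Pretend the user typed this, with capitals.
fruit = 'Dragon Fruit'
fruit_normalized = fruit.lower().strip()

print(fruit)             # Dragon Fruit  (what goes in the list)
print(fruit_normalized)  # dragon fruit  (only used for the bye test)

# A sloppy "  BYE " still matches after normalizing.
print('  BYE '.lower().strip() == 'bye')  # True
```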
for
You saw how we can use indexes to get the values in the list directly, like states[2] gets the third value (the first one is states[0]).
Often, we want to go through all the items in a list to compute some stats, or do something else. We want to get the values one at a time.
We could use a while loop and a counter, but there's an easier way: a for loop.
for is a loop, like while. Remember, while keeps looping as long as a condition is true, like:
```python
count = 1
while count <= 5:
    print('Doggos! ')
    count += 1
```
That loop prints Doggos! five times.
Here's a loop that prints each state.
```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia']
counter = 0
while counter < len(states):
    print(states[counter])
    counter += 1
print('OK, bye!')
```
counter starts at 0, so the first value printed is states[0]. The loop keeps going while counter is less than 6, the number of items in the list. The last value of counter that passes that test is 5, so the last one printed is states[5].
We could do that, but for is easier.

Ray
Easy is good.
Aye, 'tis so.
```
for var in collection:
    Do something with var
```
... runs Do something for each item in the list. Do something can be as many lines of Python as you like.
For example:
```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia']
for state in states:
    print(state)
```
The first time through the loop, state is equal to the first element, 'Queensland' (that's where I'm from). So line 4 prints Queensland.
The second time through the loop, state is equal to the second element, 'New South Wales'. Line 4 prints New South Wales.
And so on, until the last element has run through the code block. The code block is the stuff indented inside the for loop.
Add two new states to the list. Call them what you want. Run the program again. Did it work?

Ethan
My code is:
```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia',
          'Wombatland', 'Koalaland']
for state in states:
    print(state)
```
Cool. Did the for loop change?

Ethan
No, it didn't.
Indeed! The for loop works no matter how many items there are in the list.
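To convince yourself that the for loop does the same job as the while loop with a counter, this sketch collects Ethan's states both ways into two new lists (from_while and from_for are names I made up) and compares them.

```python
states = ['Queensland', 'New South Wales', 'Victoria',
          'Tasmania', 'South Australia', 'Western Australia',
          'Wombatland', 'Koalaland']

# The while version: you manage the counter yourself.
from_while = []
counter = 0
while counter < len(states):
    from_while.append(states[counter])
    counter += 1

# The for version: Python hands you each item in turn.
from_for = []
for state in states:
    from_for.append(state)

print(from_while == from_for)  # True
```

The for version has no counter to set up, test, or forget to increase, which is why it's the usual choice for walking through a list.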
Loop over a list of dictionaries
You have a data set. Each record is a dictionary. All the records are in a list. Use a for loop to run through each record in the list.
Lists of dictionaries
Our goal is to store records and fields in memory, so we can easily read a file like this:
```
"Episode number","Title","Length"
1,"Nobody Listens to Paula Poundstone",51.37
2,"Maintaining friendships",50.75
3,"Audiologist Michele Sherman talks ears",48.9
4,"The Survivalist!",53.22
```
This is a comma-separated values (CSV) file. You'll learn about them in the next lesson.
A dictionary is a good way to store one record. How to store a bunch of records? In a list of dictionaries!
```python
episodes = [
    {
        'Episode number': 1,
        'Title': "Nobody Listens to Paula Poundstone",
        'Length': 51.37
    },
    {
        'Episode number': 2,
        'Title': 'Maintaining friendships',
        'Length': 50.75
    },
    {
        'Episode number': 3,
        'Title': 'Audiologist Michele Sherman talks ears',
        'Length': 48.9
    },
    {
        'Episode number': 4,
        'Title': 'The Survivalist!',
        'Length': 53.22
    }
]
```
Because of the [], Python knows you want a list. Each list item has {}, which is Pythonese for a dictionary.

Georgina
That's so cool!
Aye!
You can store as many items in a list as you want. So, any number of records.
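To reach one field of one record, combine the two kinds of brackets: a list index picks the record, then a key picks the field. A sketch with a trimmed copy of the episodes list (only two records, to keep it short):

```python
episodes = [
    {
        'Episode number': 1,
        'Title': 'Nobody Listens to Paula Poundstone',
        'Length': 51.37
    },
    {
        'Episode number': 2,
        'Title': 'Maintaining friendships',
        'Length': 50.75
    }
]

# episodes[1] is the second record; ['Title'] is a field in it.
print(episodes[1]['Title'])  # Maintaining friendships
print(len(episodes))         # 2
```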
What does this output? Type your answer before you run the code.
```python
episodes = [
    {
        'Episode number': 1,
        'Title': "Nobody Listens to Paula Poundstone",
        'Length': 51.37
    },
    {
        'Episode number': 2,
        'Title': 'Maintaining friendships',
        'Length': 50.75
    },
    {
        'Episode number': 3,
        'Title': 'Audiologist Michele Sherman talks ears',
        'Length': 48.9
    },
    {
        'Episode number': 4,
        'Title': 'The Survivalist!',
        'Length': 53.22
    }
]
for episode in episodes:
    print(episode['Title'])
```

Ray
I like this! It outputs:
```
Nobody Listens to Paula Poundstone
Maintaining friendships
Audiologist Michele Sherman talks ears
The Survivalist!
```
Right! Woohoo!
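The loop can do more than print. As one more sketch, this version puts an if inside the for loop to collect only the episodes longer than 50 minutes (the 50-minute cutoff and the name long_titles are just for illustration).

```python
episodes = [
    {'Episode number': 1, 'Title': 'Nobody Listens to Paula Poundstone', 'Length': 51.37},
    {'Episode number': 2, 'Title': 'Maintaining friendships', 'Length': 50.75},
    {'Episode number': 3, 'Title': 'Audiologist Michele Sherman talks ears', 'Length': 48.9},
    {'Episode number': 4, 'Title': 'The Survivalist!', 'Length': 53.22}
]

long_titles = []
for episode in episodes:
    # Test one field of the current record...
    if episode['Length'] > 50:
        # ...and keep another field from the same record.
        long_titles.append(episode['Title'])

print(long_titles)
# ['Nobody Listens to Paula Poundstone', 'Maintaining friendships', 'The Survivalist!']
```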
Summary
- Almost all business data sets are groups of records. Records are made of attributes, also called fields.
- A convenient way to represent data sets in Python is as lists of dictionaries.
- Use for loops to process lists of dictionaries.
Exercise
Sum of episode lengths
Write a program to show the average episode length. Show the number of episodes as well.
Paste this into your code. Use it without any changes:
```python
episodes = [
    {
        'Episode number': 1,
        'Title': "Nobody Listens to Paula Poundstone",
        'Length': 51.37
    },
    {
        'Episode number': 2,
        'Title': 'Maintaining friendships',
        'Length': 50.75
    },
    {
        'Episode number': 3,
        'Title': 'Audiologist Michele Sherman talks ears',
        'Length': 48.9
    },
    {
        'Episode number': 4,
        'Title': 'The Survivalist!',
        'Length': 53.22
    }
]
```
Here's what the output should be.
```
Number of episodes: 4
Average length: 51.06 minutes
```
Use a for loop.
Upload a zip of your project folder. The usual coding standards apply.