Seeing errors
Sometimes, errors in data can be hard to see. For example...
Is there anything wrong with this?
Snor1ax is the BEST!

Adela
Should be Snorlax with an l (letter l), not Snor1ax with a 1 (the digit).
Right! When you're skimming many records, that can be hard to see.
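Code can spot things that eyes miss. Here's a minimal sketch of one such check, assuming that a name should never contain a digit. The function name has_digit is made up for this example; it isn't part of Ray's code.

```python
def has_digit(text):
    # Hypothetical helper: True if any character in text is a digit.
    return any(ch.isdigit() for ch in text)


print(has_digit('Snor1ax'))  # True (the 1 is a digit)
print(has_digit('Snorlax'))  # False
```

A check like this only catches one kind of error, so it would be one test among several.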
It can help to print out which records have errors. Maybe show the Goat field of invalid records, like this:
- Bad record:
- Bad record:
- Bad record: Bertha
- Bad record: Bessie
- Bad record: Boyd
- Bad record: Bridgette
- Bad record: Carrie
- Bad record: Darell
- Bad record: Deborah
- Bad record: Gerald
- Bad record: Johnnie
- Bad record: Long
- Bad record: Vincent
Two records are missing goat names. That's what the first two lines show you.
A coupla new lines
Here's code Ray wrote:
    def clean_goat_scores(raw_goat_scores):
        # Create a new list for the clean records.
        cleaned_goat_scores = []
        # Loop over raw records.
        for raw_record in raw_goat_scores:
            # Is the record OK?
            if is_record_ok(raw_record):
                # Yes, make a new record with the right data types.
                clean_record = {
                    'Goat': raw_record['Goat'],
                    'Before': float(raw_record['Before']),
                    'After': float(raw_record['After'])
                }
                # Add the new record to the clean list.
                cleaned_goat_scores.append(clean_record)
        # Send the cleaned list back.
        return cleaned_goat_scores
Add code to show the Goat field of invalid records.

Ethan
Something like:
    for raw_record in raw_goat_scores:
        # Is the record OK?
        if is_record_ok(raw_record):
            ...
            cleaned_goat_scores.append(clean_record)
        else:
            print('Bad record: ' + str(raw_record['Goat']))
Nice!
Make it a param
You could add a param to clean_goat_scores to control whether it shows bad records. We could even make it an optional param.
    def clean_goat_scores(raw_goat_scores, show_bad_records):
        ...
        for raw_record in raw_goat_scores:
            # Is the record OK?
            if is_record_ok(raw_record):
                ...
            else:
                if show_bad_records:
                    # Show the record that has an issue.
                    print('Bad record: ' + str(raw_record['Goat']))
        ...
If you want to see the bad record goat names...
    cleaned_goat_scores = clean_goat_scores(raw_goat_scores, True)
If you don't want to see them...
    cleaned_goat_scores = clean_goat_scores(raw_goat_scores, False)
Make it optional
We can do one better, so if you call the function in the usual way...
    cleaned_goat_scores = clean_goat_scores(raw_goat_scores)
... you don't see the messages. But you can put the param in if you want to see the messages.
    cleaned_goat_scores = clean_goat_scores(raw_goat_scores, True)
Python supports optional parameters. In the def line, you give the param a default value. If the caller leaves the param out, Python uses the default.
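Here's a small sketch of a default value at work, separate from the goat code. The function greet and its param excited are made up for this example.

```python
def greet(name, excited=False):
    # excited is optional. It's False when the caller leaves it out.
    if excited:
        return 'Hi, ' + name + '!'
    return 'Hi, ' + name


print(greet('Adela'))                # Hi, Adela
print(greet('Adela', True))          # Hi, Adela!
print(greet('Adela', excited=True))  # Hi, Adela!
```

The last call names the param. That's allowed for any param, and it can make a bare True easier to understand when you read the code later.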
Here's the final code, with docstring. If it's called without the second param...
    cleaned_goat_scores = clean_goat_scores(raw_goat_scores)
... you won't get the error report.
    def clean_goat_scores(raw_goat_scores, show_bad_records=False):
        '''
        Clean score data. Optionally display the names of goats in invalid records.

        Parameters
        ----------
        raw_goat_scores : Data set (list of dictionaries)
            Data set with possible errors, and wrong types.
        show_bad_records : boolean, optional
            Identify bad records? The default is False.

        Returns
        -------
        cleaned_goat_scores : Data set (list of dictionaries)
            Valid records only.
        '''
        # Create a new list for the clean records.
        cleaned_goat_scores = []
        # Loop over raw records.
        for raw_record in raw_goat_scores:
            # Is the record OK?
            if is_record_ok(raw_record):
                # Yes, make a new record with the right data types.
                clean_record = {
                    'Goat': raw_record['Goat'],
                    'Before': float(raw_record['Before']),
                    'After': float(raw_record['After'])
                }
                # Add the new record to the clean list.
                cleaned_goat_scores.append(clean_record)
            else:
                if show_bad_records:
                    # Show the record that has an issue.
                    print('Bad record: ' + str(raw_record['Goat']))
        # Send the list back.
        return cleaned_goat_scores

Adela
Hey, that's cool!
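If you want to try the idea yourself, here's a self-contained sketch you can run. Two things in it are assumptions, not the course's code: this is_record_ok is a simple stand-in (the real one isn't shown in this lesson), and the three records are made-up sample data.

```python
def is_record_ok(record):
    # Hypothetical stand-in for the real validation function.
    # Assumes a record is OK if it has a goat name and two numeric scores.
    if str(record['Goat']).strip() == '':
        return False
    try:
        float(record['Before'])
        float(record['After'])
    except (TypeError, ValueError):
        return False
    return True


def clean_goat_scores(raw_goat_scores, show_bad_records=False):
    # Same logic as the lesson's function, with the comments trimmed.
    cleaned_goat_scores = []
    for raw_record in raw_goat_scores:
        if is_record_ok(raw_record):
            clean_record = {
                'Goat': raw_record['Goat'],
                'Before': float(raw_record['Before']),
                'After': float(raw_record['After'])
            }
            cleaned_goat_scores.append(clean_record)
        else:
            if show_bad_records:
                print('Bad record: ' + str(raw_record['Goat']))
    return cleaned_goat_scores


# Made-up sample data, for illustration only.
raw_goat_scores = [
    {'Goat': 'Bertha', 'Before': '12', 'After': 'lots'},
    {'Goat': '', 'Before': '10', 'After': '14'},
    {'Goat': 'Zed', 'Before': '9.5', 'After': '11'},
]

# Ask for the error report by passing True.
cleaned_goat_scores = clean_goat_scores(raw_goat_scores, True)
print(len(cleaned_goat_scores))
```

With this sample data, the report has two lines, one for Bertha and one with no name after the colon, and one record survives cleaning. Leave out the True, and you get the same cleaned list with no report.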
Summary
- Data errors are hard to see sometimes.
- You can add a param to a cleaning function to show bad records.
- You can make the param optional.