Write a function that takes a data set as a parameter and returns another data set containing a subset of the original records, based on criteria you choose.
You have a data set with lotsa records. You want to include some of them in an analysis. For example, you have sales data for all regions, and you just want to analyze sales in the northwest region.
Say you have data from several neighborhoods in a data set, and you want records for one neighborhood.
Write a function you can call like this:
- angora_data_set = get_neighborhood_records('Angora Acres', clean_halloween_data_set)
clean_halloween_data_set is a data set. angora_data_set is a subset of the records in clean_halloween_data_set, from one neighborhood.
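It helps to picture what a data set looks like before writing the function. Here is a minimal sketch, assuming a data set is a list of dictionaries with one dictionary per record, as Python's csv.DictReader produces. The sample rows, the Candy column, the Burmese Bend neighborhood, and the load_data_set helper are all made up for illustration.

```python
import csv
import io

# Made-up sample data. The real file and its columns may differ.
SAMPLE_CSV = """Neighborhood,Candy
Angora Acres,12
Burmese Bend,7
Angora Acres,3
"""


def load_data_set(csv_text):
    """Return a list of dictionaries, one per CSV row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)


clean_halloween_data_set = load_data_set(SAMPLE_CSV)

# Each record is a dictionary keyed by column name.
for record in clean_halloween_data_set:
    print(record['Neighborhood'], record['Candy'])
```

Note that csv.DictReader reads every value as a string, so Candy would need converting before doing any math on it.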
An example:
- def get_neighborhood_records(neighborhood_name_to_find, data_set):
-     # Normalize name to find.
-     neighborhood_name_to_find = neighborhood_name_to_find.strip().lower()
-     neighborhood_records = []
-     # Loop over records.
-     for record in data_set:
-         # Get normalized name for current record.
-         neighborhood_name_in_record = record['Neighborhood']
-         neighborhood_name_in_record = neighborhood_name_in_record.strip().lower()
-         # Is it the one we want?
-         if neighborhood_name_in_record == neighborhood_name_to_find:
-             # Aye.
-             neighborhood_records.append(record)
-     return neighborhood_records
In this example, the value to filter on (the neighborhood name) is passed in as a parameter. You might or might not do that, depending on your goals.
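For contrast, here is a sketch of the other choice, where the criterion is fixed inside the function instead of being passed in. The function name get_angora_records and the sample records are made up for illustration, and the data set is assumed to be a list of dictionaries with a Neighborhood key.

```python
def get_angora_records(data_set):
    """Return records from Angora Acres. The criterion is fixed in the code."""
    neighborhood_records = []
    for record in data_set:
        # Normalize the name in the record before comparing.
        if record['Neighborhood'].strip().lower() == 'angora acres':
            neighborhood_records.append(record)
    return neighborhood_records


# Made-up sample records, including one with messy spacing and case.
sample_data_set = [
    {'Neighborhood': 'Angora Acres', 'Candy': 12},
    {'Neighborhood': 'Burmese Bend', 'Candy': 7},
    {'Neighborhood': ' angora acres ', 'Candy': 3},
]

angora_data_set = get_angora_records(sample_data_set)
print(len(angora_data_set))
```

A fixed criterion is simpler to call, but you would need a new function for every neighborhood. Passing the value in is usually the better choice when you expect to filter on more than one value.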