Data machine: Computation

Summary

Use existing computation functions where you can, like statistics.mean. When nothing packaged does the job, such as identifying the records with the lowest or highest values in a data set, write your own function that loops over the data.
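For the packaged-function case, here is a minimal sketch. It assumes the data is a list of dictionaries with a numeric After field, like the goat example further down this page. The sample records and goat names are made up for illustration.

```python
import statistics

# Hypothetical sample data: one dictionary per goat.
clean_goat_scores = [
    {'Goat': 'Billy', 'After': 7},
    {'Goat': 'Nanny', 'After': 9},
    {'Goat': 'Gruff', 'After': 5},
]

# Collect the After values into a list.
after_values = []
for record in clean_goat_scores:
    after_values.append(record['After'])

# Let the packaged function do the computation.
mean_after = statistics.mean(after_values)
print(mean_after)
```

No custom loop logic is needed beyond gathering the values, since statistics.mean handles the arithmetic.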

Situation

You need an analysis that packaged functions, like statistics.mean, can't do.

Action

Write a function that loops over a data set, doing the computations you need. The details vary, depending on what you want to compute.

Here's an example function that finds the goat with the largest After value. It returns both the goat's name and the value.

  1. def find_largest_after(clean_goat_scores):
  2.     largest_after_value = -99999999999
  3.     largest_after_name = ''
  4.     for record in clean_goat_scores:
  5.         goat_name = record['Goat']
  6.         goat_after_value = record['After']
  7.         if goat_after_value > largest_after_value:
  8.             # Remember the new large value.
  9.             largest_after_value = goat_after_value
  10.             # Remember the name for that record.
  11.             largest_after_name = goat_name
  12.     return largest_after_name, largest_after_value

Line 7 compares the After value for the current record with the largest so far. For that to work, you need to initialize the largest-value-so-far variable to something smaller than any value that could appear in the data (line 2). The first time through the loop, the first After value will be greater than the largest-value-so-far, so it becomes the new largest-value-so-far.
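The same pattern works for the lowest value. The sketch below is a variation, not part of the original example: the function name find_smallest_after and the sample records are hypothetical. It flips the comparison and starts from float('inf'), Python's positive infinity, so any real value in the data is smaller than the starting point.

```python
def find_smallest_after(clean_goat_scores):
    # Start at positive infinity, so the first record always replaces it.
    smallest_after_value = float('inf')
    smallest_after_name = ''
    for record in clean_goat_scores:
        goat_name = record['Goat']
        goat_after_value = record['After']
        if goat_after_value < smallest_after_value:
            # Remember the new small value and the name that goes with it.
            smallest_after_value = goat_after_value
            smallest_after_name = goat_name
    return smallest_after_name, smallest_after_value


# Hypothetical sample data: one dictionary per goat.
clean_goat_scores = [
    {'Goat': 'Billy', 'After': 7},
    {'Goat': 'Nanny', 'After': 9},
    {'Goat': 'Gruff', 'After': 5},
]

name, value = find_smallest_after(clean_goat_scores)
print(name, value)
```

In the same way, float('-inf') could stand in for the very small starting value when finding the largest, which avoids guessing how small the data might go.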

Where referenced