Goat influencers

Challenge
No

More and more goats are following YouTube influencers. But what type of content are they most interested in? Write a program to work out which category of influencer has the most growth.

Download this data set. Here's part of it:

  • Influencer,Category,Last year,This year
  • Aisha,entertainment,84789,131902
  • Andreas,entertainment,60528,103209
  • August,tech,89564,103189
  • Bertha,lifestyle,91941,137909

Each record has four fields. Here they are, with their validation rules.

  • Goat name. Cannot be empty.
  • Content category. One of entertainment, lifestyle, or tech. Extra leading or trailing spaces are OK, and case doesn't matter. So " Tech " is valid, but "t3ch" is not.
  • Last year's subscribers: number, zero or more.
  • This year's subscribers: number, zero or more.

Only include valid records in your analysis.
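Here's a rough sketch of how the validation rules above could be checked in plain Python. It is not the textbook's data machine code. The function names are made up for illustration, and it assumes each record is a dict keyed by the CSV column names. It also assumes a name that is only spaces counts as empty, and that "number" allows decimals; adjust if your reading of the rules differs.

```python
VALID_CATEGORIES = {"entertainment", "lifestyle", "tech"}


def normalize_category(raw):
    """Strip surrounding spaces and lowercase, so ' Tech ' becomes 'tech'."""
    return (raw or "").strip().lower()


def is_count(raw):
    """Return True if raw can be read as a number that is zero or more."""
    try:
        return float(raw) >= 0
    except (TypeError, ValueError):
        return False


def is_record_ok(record):
    """Check one record (hypothetical dict layout) against the four rules."""
    # Assumption: a name made only of spaces is treated as empty.
    if (record.get("Influencer") or "").strip() == "":
        return False
    if normalize_category(record.get("Category")) not in VALID_CATEGORIES:
        return False
    return is_count(record.get("Last year")) and is_count(record.get("This year"))


good = {"Influencer": "August", "Category": " Tech ",
        "Last year": "89564", "This year": "103189"}
bad = {"Influencer": "August", "Category": "t3ch",
       "Last year": "89564", "This year": "103189"}
print(is_record_ok(good))  # True
print(is_record_ok(bad))   # False
```

Counting how many records pass and how many fail gives you the valid and invalid counts for the report.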

Write a program to show the average changes in subscribers for each category. Use at least three functions. Using the data machine functions from this textbook might be easiest, for tasks like cleaning the record set, adding the change in subscribers to the data set as a computed field, getting a record subset for each category, and so on. Use the statistics module if you want; I did in my solution.
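To make the subset-then-average idea concrete, here's a minimal sketch using the statistics module. The helper names are hypothetical, not the textbook's data machine functions, and it assumes the records are already cleaned and already carry a computed "Change" field. The data is just the four sample rows shown earlier, so the numbers won't match the full data set.

```python
import statistics


def get_category_subset(records, category):
    """Return only the records in the given (already normalized) category."""
    return [r for r in records if r["Category"] == category]


def mean_change(records):
    """Average the computed 'Change' field over a list of records."""
    return statistics.mean([r["Change"] for r in records])


# The four sample rows, with Change = this year minus last year.
sample = [
    {"Influencer": "Aisha", "Category": "entertainment", "Change": 47113},
    {"Influencer": "Andreas", "Category": "entertainment", "Change": 42681},
    {"Influencer": "August", "Category": "tech", "Change": 13625},
    {"Influencer": "Bertha", "Category": "lifestyle", "Change": 45968},
]

means = {}
for category in ["entertainment", "lifestyle", "tech"]:
    means[category] = mean_change(get_category_subset(sample, category))

# Pick the category whose mean is largest.
best = max(means, key=means.get)
print(best)  # lifestyle (for these four rows only)
```

With all the valid records from the file, the same steps should give the figures in the expected output.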

Here's what your program's output should be:

  • Goats Influencers
  • ===== ===========
  •  
  • Subscriber changes by category
  •  
  • Counts
  • ------
  • Valid records: 45
  • Invalid records: 5
  • Total records: 50
  •  
  • Category mean changes
  • -------- ---- -------
  • Entertainment: 36453.7
  • Lifestyle: 15022.7
  • Tech: 14326.8
  •  
  • Category with the highest change: Entertainment

No, don't write a program with just a bunch of print statements. Someone tries that every so often. Your program's output should change if the data changes.

Include the record counts as shown.

The "Category mean changes" are average changes in subscribers between last year and this year, that is, this year's subscribers minus last year's. So for this record...

  • Aisha,entertainment,84789,131902

... the value to be analyzed is 131902 - 84789. Check the computed fields lesson if that is not clear.
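As a quick illustration of that subtraction, here's one way a computed field could be added to a single record. The function name and the "Change" key are my own choices for this sketch, and it assumes the record has already passed validation and holds whole-number counts.

```python
def add_change(record):
    """Return a copy of the record with a computed 'Change' field."""
    updated = dict(record)
    # Assumption: both counts are valid whole numbers stored as text.
    updated["Change"] = int(record["This year"]) - int(record["Last year"])
    return updated


aisha = {"Influencer": "Aisha", "Category": "entertainment",
         "Last year": "84789", "This year": "131902"}
print(add_change(aisha)["Change"])  # 47113
```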

Write a program to read all the data in the CSV file, perform the calculations, and output the results in the format shown. The averages should be to one decimal place, as you can see. The usual coding standards apply.
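If reading the file and the one-decimal format are the sticking points, this sketch shows both with the standard csv module. It reads the sample rows from a string so it runs on its own. In your program you'd open the downloaded file instead (its name isn't given here, so that part is up to you). The function names are hypothetical.

```python
import csv
import io

# The sample rows from above, standing in for the real file.
SAMPLE = """Influencer,Category,Last year,This year
Aisha,entertainment,84789,131902
Andreas,entertainment,60528,103209
August,tech,89564,103189
Bertha,lifestyle,91941,137909
"""


def read_records(csv_file):
    """Read every row of an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def format_mean_line(category, mean_change):
    """Build one report line, with the mean shown to one decimal place."""
    return f"{category.capitalize()}: {mean_change:.1f}"


records = read_records(io.StringIO(SAMPLE))
print(len(records))  # 4

# August is the only tech row in the sample: 103189 - 89564 = 13625.
print(format_mean_line("tech", 13625))  # Tech: 13625.0
```

The `:.1f` format spec does the rounding for display, so there's no need to round the mean yourself first.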

Upload your solution here, as usual, not to Moodle.
