Roger Goodell’s Magic Prediction Machine

What do I want to accomplish by doing this program?

For the past year, I have been very interested in data science, analytics, Python, and any other buzzword you can think of related to the field. My end goal is to gain experience and apply this knowledge to my current job and all future work. I will consider this a success if I accomplish the following:

  • build enough momentum, in both basic skills and interest, to keep improving and applying my knowledge after the program is over
  • follow through with a full project
  • use pandas to combine various data sets, matplotlib to visualize data, and sklearn to make predictions

Problem Statement

Can I use machine learning to predict if an NFL game will score more or less total points than the line set on gambling sites?



The NFL is a massive business, and there are many forms of gambling on it: fantasy football, game outcomes, and in-game betting. The American Gaming Association estimated that there is over $2.5 billion in legal NFL gambling and potentially hundreds of billions (emphasis on the B) in illegal gambling each year. The NFL is also extremely well documented, with lots of publicly available data. Additionally, there are several studies on the efficiency of these betting markets.


One very specific type of gambling is betting on the over/under of total points scored. For example, the Chicago Bears played the Green Bay Packers earlier this month, and the game ended with the Packers winning 24-23. The over/under line set by various gambling sites was 45. Since 24 + 23 = 47, and 47 is greater than 45, anyone who picked the over would have won.
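The over/under check in this example is just arithmetic: add both teams' scores and compare the total to the line. Here is a minimal Python sketch of that labeling step. The function name `over_under_result` is hypothetical, and it assumes a total exactly equal to the line counts as a push (neither side wins), which is how a whole-number line is usually settled.

```python
def over_under_result(home_score: int, away_score: int, line: float) -> str:
    """Label a game 'over', 'under', or 'push' relative to the betting line."""
    total = home_score + away_score
    if total > line:
        return "over"
    if total < line:
        return "under"
    # Assumption: a total exactly equal to the line is a push (no winner).
    return "push"


# Packers 24, Bears 23, line of 45: 47 > 45, so the over wins.
print(over_under_result(24, 23, 45))  # over
```

Half-point lines such as 44.5 exist precisely so that a push cannot happen.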


Data Gathering

Kaggle has a very good data set with game outcomes dating all the way back to 1966. Not all of the data is complete, and betting data is not captured until the late ’80s, but this will be my primary starting point for data collection. Additionally, I plan to use summary statistics of each team by year from
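Combining the Kaggle game results with team-by-year summary statistics is a pandas merge on season and team, done once for the home side and once for the away side. The sketch below uses made-up placeholder rows, and every column name (`score_home`, `over_under_line`, `ppg`, `papg`, and so on) is an assumption rather than the actual schema of either source. It also assumes each game should be paired with the previous season's stats, since a full-season average includes games that had not been played yet.

```python
import pandas as pd

# Made-up placeholder rows; real column names in the Kaggle file may differ.
games = pd.DataFrame({
    "season": [2018, 2018],
    "home_team": ["Team A", "Team C"],
    "away_team": ["Team B", "Team D"],
    "score_home": [24, 17],
    "score_away": [23, 20],
    "over_under_line": [45.0, 41.5],
})

# Made-up team-by-season summary statistics from a second source.
team_stats = pd.DataFrame({
    "season": [2017, 2017, 2017, 2017],
    "team": ["Team A", "Team B", "Team C", "Team D"],
    "ppg": [25.0, 20.0, 18.0, 22.0],    # offensive points per game
    "papg": [21.0, 23.0, 19.0, 24.0],   # points allowed per game
})

# Shift seasons forward by one so each game is paired with the PREVIOUS
# season's stats, which were known before kickoff.
prior = team_stats.assign(season=team_stats["season"] + 1)


def add_side_stats(df: pd.DataFrame, side: str) -> pd.DataFrame:
    """Left-join prior-season stats for the home or away team, prefixing columns."""
    renamed = prior.rename(columns={
        "team": f"{side}_team",
        "ppg": f"{side}_ppg",
        "papg": f"{side}_papg",
    })
    return df.merge(renamed, on=["season", f"{side}_team"], how="left")


merged = add_side_stats(add_side_stats(games, "home"), "away")
print(merged[["home_team", "away_team", "home_ppg", "away_ppg"]])
```

Using `how="left"` keeps every game even when a team has no prior-season row (for example, an expansion team's first year), leaving `NaN` values to handle during data cleaning.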

One consideration is how the game has changed over the years, and whether there is significant variability over time in scoring, plays, or field/weather conditions that may affect the outcomes. Anecdotally, football has transitioned to a fast-paced, high-scoring game with more pass plays than run plays, but the data may show something different. To add more samples, I may also try to include college football games with similar variables.
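Whether scoring has really drifted upward can be checked directly once the games are loaded: compute each game's combined score and average it by season. This is a minimal pandas sketch on made-up placeholder rows (the column names are assumptions), not a result from the real data.

```python
import pandas as pd

# Made-up placeholder games; in practice this would be the full historical file.
games = pd.DataFrame({
    "season": [1990, 1990, 2000, 2000, 2010, 2010],
    "score_home": [20, 24, 24, 14, 20, 21],
    "score_away": [13, 17, 21, 17, 17, 18],
})

# Combined points scored in each game.
games["total_points"] = games["score_home"] + games["score_away"]

# Average combined score and number of games per season; a rising mean
# would support the "higher-scoring game" anecdote, a flat one would not.
by_season = games.groupby("season")["total_points"].agg(["mean", "count"])
print(by_season)
```

With matplotlib installed, `by_season["mean"].plot()` would turn the same table into a quick line chart of scoring over time.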

Key Variables

Some of the variables that I think are important include offensive points per game, wins, losses, location, weather, week of the season, defensive points allowed per game, the pass/run breakdown of each team, and many others.
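Once variables like these are assembled into one row per game, the problem becomes binary classification: did the game go over the line or not? Below is a minimal scikit-learn sketch using logistic regression, chosen here only as a simple baseline and not as the model this project will necessarily use. The data is randomly generated stand-in data, and the feature names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data; real features would come from the merged game table.
df = pd.DataFrame({
    "season": rng.integers(2000, 2018, size=n),  # seasons 2000-2017
    "week": rng.integers(1, 18, size=n),
    "home_ppg": rng.normal(22, 4, size=n),
    "away_ppg": rng.normal(22, 4, size=n),
    "home_papg": rng.normal(22, 4, size=n),
    "away_papg": rng.normal(22, 4, size=n),
    "over_under_line": rng.normal(44, 4, size=n),
})

# Synthetic totals loosely tied to the features, plus noise.
df["total_points"] = (
    (df["home_ppg"] + df["away_papg"]) / 2
    + (df["away_ppg"] + df["home_papg"]) / 2
    + rng.normal(0, 10, size=n)
)

# Target: 1 if the game went over the line, 0 otherwise.
df["went_over"] = (df["total_points"] > df["over_under_line"]).astype(int)

features = ["week", "home_ppg", "away_ppg", "home_papg", "away_papg", "over_under_line"]

# Split by season rather than at random, so the model is always evaluated
# on games played after the ones it learned from.
train = df[df["season"] < 2015]
test = df[df["season"] >= 2015]

model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["went_over"])
preds = model.predict(test[features])
print(f"hold-out accuracy: {accuracy_score(test['went_over'], preds):.3f}")
```

Splitting on season instead of shuffling mirrors how the model would actually be used: predicting upcoming games from past ones. On this synthetic data the printed accuracy means nothing; on real games, the bar to clear is the accuracy needed to profit after the sportsbook's cut, which is somewhat above 50%.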


Date – Task – Milestone

08/26/2018 – Problem Statement
09/02/2018 – Problem Statement
09/09/2018 – Problem Statement – Sept 13 – Headshot due
09/16/2018 – Data Gathering / Data Cleaning – Sept 20 – Workshop 1 / Blog 1
09/23/2018 – Data Gathering / Data Cleaning
09/30/2018 – Data Gathering / Data Cleaning
10/07/2018 – Data Cleaning / Algorithm Selection
10/14/2018 – Algorithm Selection – Oct 18 – Workshop 2 / Blog 2
10/21/2018 – Algorithm Implementation
10/28/2018 – Algorithm Implementation / Improve Parameters
11/04/2018 – Improve Parameters / Develop Presentation
11/11/2018 – Nov 15 – Workshop 3 / Blog 3
11/25/2018 – Nov 25 – Final Presentations Set
12/09/2018 – Dec 13 – Final Presentation




