What do I want to accomplish by doing this program?
For the past year, I have been very interested in data science, analytics, Python, and any other buzzword you can think of related to the field. My end goal is to gain experience and apply this knowledge to my current job and all future work. I will consider this a success if i complete the following:
- build momentum to sustain the basic skill and interest to continue improving and apply my knowledge after the program is over
- follow through with a full project
- use pandas to combine various data sets, matplotlib to visualize data, and sklearn to make predictions
Can I use machine learning to predict if an NFL game will score more or less total points than the line set on gambling sites?
The NFL is a massive business and there are many forms of gambling: fantasy football, game outcomes, and in-game betting. The American gambling association estimated that there is over $2.5 billion in legal NFL gambling and potentially hundreds of billions (emphasis on the B) in illegal gambling each year. The NFL is also extremely well documented with lots of public data, available for use. Additionally, there are several studies related to the efficiency of these betting markets.
One very specific type of gambling is betting the over/under of total points scored. For example, the Chicago Bears played Green Bay Packers earlier this month and the game ended with the Packers winning 24-23. The over/under betting line set by various gambling sites was set at 45, so 24 + 23 = 47 and 47 is great than 45, so if you picked the over, than you would have won.
Kaggle has a very good data set with outcomes dating all the way back to 1966. Not all of the data is complete and the betting data is not capture until the late 80’s but this will be my primary starting point for data collection. Additionally, I plan to use summary statistics of each team by year from NFL.com.
One consideration is related to how the game has changed over the years, and if there would be any significant variability over time on scoring, plays, or field/weather conditions that may impact the outcomes. Anecdotally, football has transition to a fast paced, high scoring game, with more pass plays than run plays, but the data may show something different. To add additional samples I may also try to include college football games with similar variables.
Some of the variables that I think are important include: offensive points per game, wins, losses, location, weather, week of the season, defensive points allowed per game, pass/run breakdown of each team, and many others.
Date – Task – Milestone
08/26/2018 – Problem Statement
09/02/2018 – Problem Statement
09/09/2018 – Problem Statement – Sept 13 – Headshot due
09/16/2018 – Data Gathering / Data Cleaning Sept 20 – Workshop 1 / Blog 1
09/23/2018 – Data Gathering / Data Cleaning
09/30/2018 – Data Gathering / Data Cleaning
10/07/2018 – Data Cleaning / Algorithm Selection
10/14/2018 – Algorithm Selection – Oct 18 – Workshop 2 / Blog 2
10/21/2018 – Algorithm Implementation
10/28/2018 – Algorithm Implementation / Improve Parameters
11/04/2018 – Improve Parameters / Develop Presentation
11/11/2018 – Nov 15 – Workshop 3 / Blog 3
11/25/2018 – Nov 25 – Final Presentations Set
12/09/2018 – Dec 13 – Final Presentation