1. TV, halftime shows, and the Big Game

Whether or not you like football, the Super Bowl is a spectacle, and there's a little something for everyone at your Super Bowl party. Sports fans get drama in the form of blowouts, comebacks, and controversy. There are the ridiculously expensive ads, some hilarious, others gut-wrenching, thought-provoking, or just weird. And there are the halftime shows with the biggest musicians in the world, sometimes riding giant mechanical tigers or leaping from the roof of the stadium. It's a show, baby. In this notebook, we're going to find out how some of the elements of this show interact with each other. After exploring and cleaning our data a little, we're going to answer questions like:

  • What are the most extreme game outcomes?
  • How does the game affect television viewership?
  • How have viewership, TV ratings, and ad cost evolved over time?
  • Who are the most prolific musicians in terms of halftime show performances?

Left Shark Steals The Show. Katy Perry performing at halftime of Super Bowl XLIX. Photo by Huntley Paton. Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0).

The dataset we'll use was scraped and polished from Wikipedia. It is made up of three CSV files, one with game data, one with TV data, and one with halftime musician data for all 52 Super Bowls through 2018. Let's take a look, using display() instead of print() since its output is much prettier in Jupyter Notebooks.

In [93]:
# Import pandas
import pandas as pd

# Load the CSV data into DataFrames
super_bowls = pd.read_csv('datasets/super_bowls.csv')
tv = pd.read_csv('datasets/tv.csv')
halftime_musicians = pd.read_csv('datasets/halftime_musicians.csv')

# Display each DataFrame (Jupyter truncates long ones)
display(super_bowls)
display(tv)
display(halftime_musicians)
date super_bowl venue city state attendance team_winner winning_pts qb_winner_1 qb_winner_2 coach_winner team_loser losing_pts qb_loser_1 qb_loser_2 coach_loser combined_pts difference_pts
0 2018-02-04 52 U.S. Bank Stadium Minneapolis Minnesota 67612 Philadelphia Eagles 41 Nick Foles NaN Doug Pederson New England Patriots 33 Tom Brady NaN Bill Belichick 74 8
1 2017-02-05 51 NRG Stadium Houston Texas 70807 New England Patriots 34 Tom Brady NaN Bill Belichick Atlanta Falcons 28 Matt Ryan NaN Dan Quinn 62 6
2 2016-02-07 50 Levi's Stadium Santa Clara California 71088 Denver Broncos 24 Peyton Manning NaN Gary Kubiak Carolina Panthers 10 Cam Newton NaN Ron Rivera 34 14
3 2015-02-01 49 University of Phoenix Stadium Glendale Arizona 70288 New England Patriots 28 Tom Brady NaN Bill Belichick Seattle Seahawks 24 Russell Wilson NaN Pete Carroll 52 4
4 2014-02-02 48 MetLife Stadium East Rutherford New Jersey 82529 Seattle Seahawks 43 Russell Wilson NaN Pete Carroll Denver Broncos 8 Peyton Manning NaN John Fox 51 35
5 2013-02-03 47 Mercedes-Benz Superdome New Orleans Louisiana 71024 Baltimore Ravens 34 Joe Flacco NaN John Harbaugh San Francisco 49ers 31 Colin Kaepernick NaN Jim Harbaugh 65 3
6 2012-02-05 46 Lucas Oil Stadium Indianapolis Indiana 68658 New York Giants 21 Eli Manning NaN Tom Coughlin New England Patriots 17 Tom Brady NaN Bill Belichick 38 4
7 2011-02-06 45 Cowboys Stadium Arlington Texas 103219 Green Bay Packers 31 Aaron Rodgers NaN Mike McCarthy Pittsburgh Steelers 25 Ben Roethlisberger NaN Mike Tomlin 56 6
8 2010-02-07 44 Sun Life Stadium Miami Gardens Florida 74059 New Orleans Saints 31 Drew Brees NaN Sean Payton Indianapolis Colts 17 Peyton Manning NaN Jim Caldwell 48 14
9 2009-02-01 43 Raymond James Stadium Tampa Florida 70774 Pittsburgh Steelers 27 Ben Roethlisberger NaN Mike Tomlin Arizona Cardinals 23 Kurt Warner NaN Ken Whisenhunt 50 4
10 2008-02-03 42 University of Phoenix Stadium Glendale Arizona 71101 New York Giants 17 Eli Manning NaN Tom Coughlin New England Patriots 14 Tom Brady NaN Bill Belichick 31 3
11 2007-02-04 41 Dolphin Stadium Miami Gardens Florida 74512 Indianapolis Colts 29 Peyton Manning NaN Tony Dungy Chicago Bears 17 Rex Grossman NaN Lovie Smith 46 12
12 2006-02-05 40 Ford Field Detroit Michigan 68206 Pittsburgh Steelers 21 Ben Roethlisberger NaN Bill Cowher Seattle Seahawks 10 Matt Hasselbeck NaN Mike Holmgren 31 11
13 2005-02-06 39 Alltel Stadium Jacksonville Florida 78125 New England Patriots 24 Tom Brady NaN Bill Belichick Philadelphia Eagles 21 Donovan McNabb NaN Andy Reid 45 3
14 2004-02-01 38 Reliant Stadium Houston Texas 71525 New England Patriots 32 Tom Brady NaN Bill Belichick Carolina Panthers 29 Jake Delhomme NaN John Fox 61 3
15 2003-01-26 37 Qualcomm Stadium San Diego California 67603 Tampa Bay Buccaneers 48 Brad Johnson NaN Jon Gruden Oakland Raiders 21 Rich Gannon NaN Bill Callahan 69 27
16 2002-02-03 36 Louisiana Superdome New Orleans Louisiana 72922 New England Patriots 20 Tom Brady NaN Bill Belichick St. Louis Rams 17 Kurt Warner NaN Mike Martz 37 3
17 2001-01-28 35 Raymond James Stadium Tampa Florida 71921 Baltimore Ravens 34 Trent Dilfer NaN Brian Billick New York Giants 7 Kerry Collins NaN Jim Fassel 41 27
18 2000-01-30 34 Georgia Dome Atlanta Georgia 72625 St. Louis Rams 23 Kurt Warner NaN Dick Vermeil Tennessee Titans 16 Steve McNair NaN Jeff Fisher 39 7
19 1999-01-31 33 Pro Player Stadium Miami Gardens Florida 74803 Denver Broncos 34 John Elway NaN Mike Shanahan Atlanta Falcons 19 Chris Chandler NaN Dan Reeves 53 15
20 1998-01-25 32 Qualcomm Stadium San Diego California 68912 Denver Broncos 31 John Elway NaN Mike Shanahan Green Bay Packers 24 Brett Favre NaN Mike Holmgren 55 7
21 1997-01-26 31 Louisiana Superdome New Orleans Louisiana 72301 Green Bay Packers 35 Brett Favre NaN Mike Holmgren New England Patriots 21 Drew Bledsoe NaN Bill Parcells 56 14
22 1996-01-28 30 Sun Devil Stadium Tempe Arizona 76347 Dallas Cowboys 27 Troy Aikman NaN Barry Switzer Pittsburgh Steelers 17 Neil O'Donnell NaN Bill Cowher 44 10
23 1995-01-29 29 Joe Robbie Stadium Miami Gardens Florida 74107 San Francisco 49ers 49 Steve Young NaN George Seifert San Diego Chargers 26 Stan Humphreys NaN Bobby Ross 75 23
24 1994-01-30 28 Georgia Dome Atlanta Georgia 72817 Dallas Cowboys 30 Troy Aikman NaN Jimmy Johnson Buffalo Bills 13 Jim Kelly NaN Marv Levy 43 17
25 1993-01-31 27 Rose Bowl Pasadena California 98374 Dallas Cowboys 52 Troy Aikman NaN Jimmy Johnson Buffalo Bills 17 Jim Kelly Frank Reich Marv Levy 69 35
26 1992-01-26 26 Metrodome Minneapolis Minnesota 63130 Washington Redskins 37 Mark Rypien NaN Joe Gibbs Buffalo Bills 24 Jim Kelly NaN Marv Levy 61 13
27 1991-01-27 25 Tampa Stadium Tampa Florida 73813 New York Giants 20 Jeff Hostetler NaN Bill Parcells Buffalo Bills 19 Jim Kelly NaN Marv Levy 39 1
28 1990-01-28 24 Louisiana Superdome New Orleans Louisiana 72919 San Francisco 49ers 55 Joe Montana NaN George Seifert Denver Broncos 10 John Elway NaN Dan Reeves 65 45
29 1989-01-22 23 Joe Robbie Stadium Miami Gardens Florida 75129 San Francisco 49ers 20 Joe Montana NaN Bill Walsh Cincinnati Bengals 16 Boomer Esiason NaN Sam Wyche 36 4
30 1988-01-31 22 Jack Murphy Stadium San Diego California 73302 Washington Redskins 42 Doug Williams NaN Joe Gibbs Denver Broncos 10 John Elway NaN Dan Reeves 52 32
31 1987-01-25 21 Rose Bowl Pasadena California 101063 New York Giants 39 Phil Simms NaN Bill Parcells Denver Broncos 20 John Elway NaN Dan Reeves 59 19
32 1986-01-26 20 Louisiana Superdome New Orleans Louisiana 73818 Chicago Bears 46 Jim McMahon NaN Mike Ditka New England Patriots 10 Tony Eason Steve Grogan Raymond Berry 56 36
33 1985-01-20 19 Stanford Stadium Palo Alto California 84059 San Francisco 49ers 38 Joe Montano NaN Bill Walsh Miami Dolphins 16 Dan Marino NaN Don Shula 54 22
34 1984-01-22 18 Tampa Stadium Tampa Florida 72920 Los Angeles Raiders 38 Jim Plunkett NaN Tom Flores Washington Redskins 9 Joe Theismann NaN Joe Gibbs 47 29
35 1983-01-30 17 Rose Bowl Pasadena California 103667 Washington Redskins 27 Joe Theismann NaN Joe Gibbs Miami Dolphins 17 David Woodley NaN Don Shula 44 10
36 1982-01-24 16 Pontiac Silverdome Pontiac Michigan 81270 San Francisco 49ers 26 Joe Montana NaN Bill Walsh Cincinnati Bengals 21 Ken Anderson NaN Forrest Gregg 47 5
37 1981-01-25 15 Louisiana Superdome New Orleans Louisiana 76135 Oakland Raiders 27 Jim Plunkett NaN Tom Flores Philadelphia Eagles 10 Ron Jaworski NaN Dick Vermeil 37 17
38 1980-01-20 14 Rose Bowl Pasadena California 103985 Pittsburgh Steelers 31 Terry Bradshaw NaN Chuck Noll Los Angeles Rams 19 Vince Ferragamo NaN Ray Malavasi 50 12
39 1979-01-21 13 Orange Bowl Miami Florida 79484 Pittsburgh Steelers 35 Terry Bradshaw NaN Chuck Noll Dallas Cowboys 31 Roger Staubach NaN Tom Landry 66 4
40 1978-01-15 12 Superdome New Orleans Louisiana 76400 Dallas Cowboys 27 Roger Staubach NaN Tom Landry Denver Broncos 10 Craig Morton NaN Red Miller 37 17
41 1977-01-09 11 Rose Bowl Pasadena California 103438 Oakland Raiders 32 Kenny Stabler NaN John Madden Minnesota Vikings 14 Fran Tarkenton NaN Bud Grant 46 18
42 1976-01-18 10 Orange Bowl Miami Florida 80187 Pittsburgh Steelers 21 Terry Bradshaw NaN Chuck Noll Dallas Cowboys 17 Roger Staubach NaN Tom Landry 38 4
43 1975-01-12 9 Tulane Stadium New Orleans Louisiana 80997 Pittsburgh Steelers 16 Terry Bradshaw NaN Chuck Noll Minnesota Vikings 6 Fran Tarkenton NaN Bud Grant 22 10
44 1974-01-13 8 Rice Stadium Houston Texas 71882 Miami Dolphins 24 Bob Griese NaN Don Shula Minnesota Vikings 7 Fran Tarkenton NaN Bud Grant 31 17
45 1973-01-14 7 Memorial Coliseum Los Angeles California 90182 Miami Dolphins 14 Bob Griese NaN Don Shula Washington Redskins 7 Bill Kilmer NaN George Allen 21 7
46 1972-01-16 6 Tulane Stadium New Orleans Louisiana 81023 Dallas Cowboys 24 Roger Staubach NaN Tom Landry Miami Dolphins 3 Bob Griese NaN Don Shula 27 21
47 1971-01-17 5 Orange Bowl Miami Florida 79204 Baltimore Colts 16 Earl Morrall Johnny Unitas Don McCafferty Dallas Cowboys 13 Craig Morton NaN Tom Landry 29 3
48 1970-01-11 4 Tulane Stadium New Orleans Louisiana 80562 Kansas City Chiefs 23 Len Dawson Mike Livingston Hank Stram Minnesota Vikings 7 Joe Kapp NaN Bud Grant 30 16
49 1969-01-12 3 Orange Bowl Miami Florida 75389 New York Jets 16 Joe Namath NaN Weeb Ewbank Baltimore Colts 7 Earl Morrall Johnny Unitas Don Shula 23 9
50 1968-01-14 2 Orange Bowl Miami Florida 75546 Green Bay Packers 33 Bart Starr NaN Vince Lombardi Oakland Raiders 14 Daryle Lamonica NaN John Rauch 47 19
51 1967-01-15 1 Memorial Coliseum Los Angeles California 61946 Green Bay Packers 35 Bart Starr NaN Vince Lombardi Kansas City Chiefs 10 Len Dawson NaN Hank Stram 45 25
super_bowl network avg_us_viewers total_us_viewers rating_household share_household rating_18_49 share_18_49 ad_cost
0 52 NBC 103390000 NaN 43.1 68 33.4 78.0 5000000
1 51 Fox 111319000 172000000.0 45.3 73 37.1 79.0 5000000
2 50 CBS 111864000 167000000.0 46.6 72 37.7 79.0 5000000
3 49 NBC 114442000 168000000.0 47.5 71 39.1 79.0 4500000
4 48 Fox 112191000 167000000.0 46.7 69 39.3 77.0 4000000
5 47 CBS 108693000 164100000.0 46.3 69 39.7 77.0 4000000
6 46 NBC 111346000 163500000.0 47.0 71 40.5 NaN 3500000
7 45 Fox 111041000 162900000.0 46.0 69 39.9 NaN 3100000
8 44 CBS 106476000 153400000.0 45.0 68 38.6 NaN 2800000
9 43 NBC 98732000 151600000.0 42.0 64 36.7 NaN 3000000
10 42 Fox 97448000 148300000.0 43.1 65 37.5 NaN 2699963
11 41 CBS 93184000 139800000.0 42.6 64 35.2 NaN 2385365
12 40 ABC 90745000 141400000.0 41.6 62 NaN NaN 2500000
13 39 Fox 86072000 NaN 41.1 62 NaN NaN 2400000
14 38 CBS 89795000 144400000.0 41.4 63 NaN NaN 2302200
15 37 ABC 88637000 138500000.0 40.7 61 NaN NaN 2200000
16 36 Fox 86801000 NaN 40.4 61 NaN NaN 2200000
17 35 CBS 84335000 NaN 40.4 61 NaN NaN 2200000
18 34 ABC 88465000 NaN 43.3 63 37.9 NaN 2100000
19 33 Fox 83720000 NaN 40.2 61 36.4 NaN 1600000
20 32 NBC 90000000 NaN 44.5 67 NaN NaN 1291100
21 31 Fox 87870000 NaN 43.3 65 NaN NaN 1200000
22 30 NBC 94080000 NaN 46.0 68 41.2 NaN 1085000
23 29 ABC 83420000 NaN 41.3 62 NaN NaN 1150000
24 28 NBC 90000000 NaN 45.5 66 NaN NaN 900000
25 27 NBC 90990000 NaN 45.1 66 NaN NaN 850000
26 26 CBS 79590000 NaN 40.3 61 NaN NaN 850000
27 25 ABC 79510000 NaN 41.9 63 NaN NaN 800000
28 24 CBS 73852000 NaN 39.0 67 NaN NaN 700400
29 23 NBC 81590000 NaN 43.5 68 NaN NaN 675000
30 22 ABC 80140000 NaN 41.9 62 NaN NaN 645000
31 21 CBS 87190000 NaN 45.8 66 NaN NaN 600000
32 20 NBC 92570000 NaN 48.3 70 NaN NaN 550000
33 19 ABC 85530000 NaN 46.4 63 NaN NaN 525000
34 18 CBS 77620000 NaN 46.4 71 NaN NaN 368200
35 17 NBC 81770000 NaN 48.6 69 NaN NaN 400000
36 16 CBS 85240000 NaN 49.1 73 NaN NaN 324300
37 15 NBC 68290000 NaN 44.4 63 NaN NaN 275000
38 14 CBS 76240000 NaN 46.3 67 NaN NaN 222000
39 13 NBC 74740000 NaN 47.1 74 NaN NaN 185000
40 12 CBS 78940000 NaN 47.2 67 NaN NaN 162300
41 11 NBC 62050000 NaN 44.4 73 NaN NaN 125000
42 10 CBS 57710000 NaN 42.3 78 NaN NaN 110000
43 9 NBC 56050000 NaN 42.4 72 NaN NaN 107000
44 8 CBS 51700000 NaN 41.6 73 NaN NaN 103500
45 7 NBC 53320000 NaN 42.7 72 NaN NaN 88100
46 6 CBS 56640000 NaN 44.2 74 NaN NaN 86100
47 5 NBC 46040000 NaN 39.9 75 NaN NaN 72500
48 4 CBS 44270000 NaN 39.4 69 NaN NaN 78200
49 3 NBC 41660000 NaN 36.0 70 NaN NaN 55000
50 2 CBS 39120000 NaN 36.8 68 NaN NaN 54500
51 1 CBS 26750000 51180000.0 22.6 43 NaN NaN 42500
52 1 NBC 24430000 NaN 18.5 36 NaN NaN 37500
super_bowl musician num_songs
0 52 Justin Timberlake 11.0
1 52 University of Minnesota Marching Band 1.0
2 51 Lady Gaga 7.0
3 50 Coldplay 6.0
4 50 Beyoncé 3.0
5 50 Bruno Mars 3.0
6 50 Mark Ronson 1.0
7 50 University of California Marching Band 3.0
8 50 Youth Orchestra Los Angeles 3.0
9 50 Gustavo Dudamel 3.0
10 49 Katy Perry 8.0
11 49 Lenny Kravitz 1.0
12 49 Missy Elliott 3.0
13 49 Arizona State University Sun Devil Marching Band NaN
14 48 Bruno Mars 6.0
15 48 Red Hot Chili Peppers 1.0
16 47 Beyoncé 7.0
17 47 Destiny's Child 2.0
18 47 Kelly Rowland 1.0
19 47 Michelle Williams 1.0
20 46 Madonna 5.0
21 46 LMFAO 1.0
22 46 Nicki Minaj 1.0
23 46 M.I.A. 1.0
24 46 Cee Lo Green 2.0
25 45 The Black Eyed Peas 6.0
26 45 Slash 1.0
27 45 Usher 1.0
28 45 will.i.am 1.0
29 45 Fergie 1.0
... ... ... ...
104 14 Up with People NaN
105 14 Grambling State University Tiger Marching Band NaN
106 13 Ken Hamilton NaN
107 13 Gramacks NaN
108 12 Tyler Junior College Apache Band NaN
109 12 Pete Fountain NaN
110 12 Al Hirt NaN
111 11 Los Angeles Unified School District All City H... NaN
112 10 Up with People NaN
113 9 Mercer Ellington NaN
114 9 Grambling State University Tiger Marching Band NaN
115 8 University of Texas Longhorn Band NaN
116 8 Judy Mallett NaN
117 7 University of Michigan Marching Band NaN
118 7 Woody Herman NaN
119 7 Andy Williams NaN
120 6 Ella Fitzgerald NaN
121 6 Carol Channing NaN
122 6 Al Hirt NaN
123 6 United States Air Force Academy Cadet Chorale NaN
124 5 Southeast Missouri State Marching Band NaN
125 4 Marguerite Piazza NaN
126 4 Doc Severinsen NaN
127 4 Al Hirt NaN
128 4 The Human Jukebox NaN
129 3 Florida A&M University Marching 100 Band NaN
130 2 Grambling State University Tiger Marching Band NaN
131 1 University of Arizona Symphonic Marching Band NaN
132 1 Grambling State University Tiger Marching Band NaN
133 1 Al Hirt NaN

134 rows × 3 columns

2. Taking note of dataset issues

For the Super Bowl game data, the dataset appears whole except for missing values in the backup quarterback columns (qb_winner_2 and qb_loser_2), which makes sense given that most starting QBs in the Super Bowl (qb_winner_1 and qb_loser_1) play the entire game.
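To see the exceptions, we can filter for games where either backup-QB column is filled in. This is a minimal sketch; `rows_with_any_value` is a hypothetical helper name, not part of pandas. Reading the table above, it should return five games: Super Bowls III, IV, V, XX and XXVII.

```python
import pandas as pd


def rows_with_any_value(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return the rows where at least one of `columns` is not missing."""
    return df[df[columns].notna().any(axis=1)]


# Usage in the notebook (assumes super_bowls was loaded above):
# display(rows_with_any_value(super_bowls, ['qb_winner_2', 'qb_loser_2']))
```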

The TV and halftime musicians data are a different story. NaN values already show up in several columns of the displays above, and since the halftime display is truncated, there are likely more that we can't see. The Super Bowl goes all the way back to 1967, and the more granular columns (e.g., the number of songs for halftime musicians) probably weren't tracked reliably over time. Wikipedia is great but not perfect.

An inspection of the .info() output for tv and halftime_musicians shows us that there are multiple columns with null values.

In [95]:
# Summary of the TV data to inspect
tv.info()

print('\n')

# Summary of the halftime musician data to inspect
halftime_musicians.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53 entries, 0 to 52
Data columns (total 9 columns):
super_bowl          53 non-null int64
network             53 non-null object
avg_us_viewers      53 non-null int64
total_us_viewers    15 non-null float64
rating_household    53 non-null float64
share_household     53 non-null int64
rating_18_49        15 non-null float64
share_18_49         6 non-null float64
ad_cost             53 non-null int64
dtypes: float64(4), int64(4), object(1)
memory usage: 3.8+ KB


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 134 entries, 0 to 133
Data columns (total 3 columns):
super_bowl    134 non-null int64
musician      134 non-null object
num_songs     88 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 3.2+ KB

3. Combined points distribution

For the TV data, the following columns have missing values and a lot of them:

  • total_us_viewers (number of U.S. viewers who watched at least some part of the broadcast)
  • rating_18_49 (average % of U.S. adults 18-49 who live in a household with a TV and were watching for the entire broadcast)
  • share_18_49 (average % of U.S. adults 18-49 who live in a household with a TV in use and were watching for the entire broadcast)

For the halftime musician data, there are missing numbers of songs performed (num_songs) for about a third of the performances.
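The `.info()` counts above can be turned into missing-value fractions directly. A minimal sketch, where `missing_summary` is a hypothetical helper name: for `halftime_musicians`, 46 of 134 `num_songs` values are missing (134 entries minus 88 non-null), about 34%, which is where "about a third" comes from.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and fraction of missing values for every column that has any."""
    n_missing = df.isna().sum()
    summary = pd.DataFrame({
        'n_missing': n_missing,
        'frac_missing': n_missing / len(df),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary['n_missing'] > 0].sort_values('n_missing', ascending=False)


# Usage in the notebook (assumes the DataFrames loaded above):
# display(missing_summary(tv))
# display(missing_summary(halftime_musicians))
```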

There are a lot of potential reasons for these missing values. Was the data ever tracked? Was it lost in history? Is the research effort to make this data whole worth it? Maybe. Watching every Super Bowl halftime show to get song counts would be pretty fun. But we don't have the time to do that kind of stuff now! Let's take note of where the dataset isn't perfect and start uncovering some insights.

Let's start by looking at combined points for each Super Bowl by visualizing the distribution. Let's also pinpoint the Super Bowls with the highest and lowest scores.

In [97]:
# Import matplotlib and set plotting style
from matplotlib import pyplot as plt
%matplotlib inline
# 'seaborn' was renamed 'seaborn-v0_8' in matplotlib 3.6; use 'seaborn' on older versions
plt.style.use('seaborn-v0_8')

# Plot a histogram of combined points
plt.hist(super_bowls.combined_pts)
plt.xlabel('Combined Points')
plt.ylabel('Number of Super Bowls')
plt.show()

# Display the Super Bowls with the highest and lowest combined scores
display(super_bowls[super_bowls['combined_pts'] > 70])
display(super_bowls[super_bowls['combined_pts'] < 25])
date super_bowl venue city state attendance team_winner winning_pts qb_winner_1 qb_winner_2 coach_winner team_loser losing_pts qb_loser_1 qb_loser_2 coach_loser combined_pts difference_pts
0 2018-02-04 52 U.S. Bank Stadium Minneapolis Minnesota 67612 Philadelphia Eagles 41 Nick Foles NaN Doug Pederson New England Patriots 33 Tom Brady NaN Bill Belichick 74 8
23 1995-01-29 29 Joe Robbie Stadium Miami Gardens Florida 74107 San Francisco 49ers 49 Steve Young NaN George Seifert San Diego Chargers 26 Stan Humphreys NaN Bobby Ross 75 23
date super_bowl venue city state attendance team_winner winning_pts qb_winner_1 qb_winner_2 coach_winner team_loser losing_pts qb_loser_1 qb_loser_2 coach_loser combined_pts difference_pts
43 1975-01-12 9 Tulane Stadium New Orleans Louisiana 80997 Pittsburgh Steelers 16 Terry Bradshaw NaN Chuck Noll Minnesota Vikings 6 Fran Tarkenton NaN Bud Grant 22 10
45 1973-01-14 7 Memorial Coliseum Los Angeles California 90182 Miami Dolphins 14 Bob Griese NaN Don Shula Washington Redskins 7 Bill Kilmer NaN George Allen 21 7
49 1969-01-12 3 Orange Bowl Miami Florida 75389 New York Jets 16 Joe Namath NaN Weeb Ewbank Baltimore Colts 7 Earl Morrall Johnny Unitas Don Shula 23 9

4. Point difference distribution

Most combined scores are around 40-50 points, with the extremes being roughly equal distance away in opposite directions. Going up to the highest combined scores at 74 and 75, we find two games featuring dominant quarterback performances. One even happened recently in 2018's Super Bowl LII where Tom Brady's Patriots lost to Nick Foles' underdog Eagles 41-33 for a combined score of 74.

Going down to the lowest combined scores, we have Super Bowls III and VII, which featured dominant defenses. We also have Super Bowl IX in New Orleans in 1975, whose 16-6 score can be attributed to inclement weather. The field was slick from overnight rain, and it was cold at 46 °F (8 °C), making it hard for the Steelers and Vikings to do much offensively. This was the second-coldest Super Bowl ever and the last to be played in inclement weather for over 30 years. The NFL realized people like points, I guess.

UPDATE: In Super Bowl LIII in 2019, the Patriots and Rams broke the record for the lowest-scoring Super Bowl with a combined score of 16 points (13-3 for the Patriots).
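The filters used to find these games rely on hand-picked thresholds (> 70 and < 25). A sketch of an alternative that avoids guessing cutoffs, using pandas' `DataFrame.nlargest`/`nsmallest`; `extreme_games` is a hypothetical helper, and `keep='all'` is chosen so tied games are kept rather than silently dropped (which means more than `n` rows can come back).

```python
import pandas as pd


def extreme_games(df: pd.DataFrame, column: str, n: int = 3):
    """Return (highest, lowest) rows of `df` ranked by `column`, keeping ties."""
    highest = df.nlargest(n, column, keep='all')
    lowest = df.nsmallest(n, column, keep='all')
    return highest, lowest


# Usage in the notebook (assumes super_bowls was loaded above):
# top, bottom = extreme_games(super_bowls, 'combined_pts')
# display(top)
# display(bottom)
```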

Let's take a look at point difference now.

In [99]:
# Plot a histogram of point differences
plt.hist(super_bowls.difference_pts)
plt.xlabel('Point Difference')
plt.ylabel('Number of Super Bowls')
plt.show()

# Display the closest game(s) and biggest blowouts
display(super_bowls[super_bowls['difference_pts'] == 1])
display(super_bowls[super_bowls['difference_pts'] >= 35])
date super_bowl venue city state attendance team_winner winning_pts qb_winner_1 qb_winner_2 coach_winner team_loser losing_pts qb_loser_1 qb_loser_2 coach_loser combined_pts difference_pts
27 1991-01-27 25 Tampa Stadium Tampa Florida 73813 New York Giants 20 Jeff Hostetler NaN Bill Parcells Buffalo Bills 19 Jim Kelly NaN Marv Levy 39 1
date super_bowl venue city state attendance team_winner winning_pts qb_winner_1 qb_winner_2 coach_winner team_loser losing_pts qb_loser_1 qb_loser_2 coach_loser combined_pts difference_pts
4 2014-02-02 48 MetLife Stadium East Rutherford New Jersey 82529 Seattle Seahawks 43 Russell Wilson NaN Pete Carroll Denver Broncos 8 Peyton Manning NaN John Fox 51 35
25 1993-01-31 27 Rose Bowl Pasadena California 98374 Dallas Cowboys 52 Troy Aikman NaN Jimmy Johnson Buffalo Bills 17 Jim Kelly Frank Reich Marv Levy 69 35
28 1990-01-28 24 Louisiana Superdome New Orleans Louisiana 72919 San Francisco 49ers 55 Joe Montana NaN George Seifert Denver Broncos 10 John Elway NaN Dan Reeves 65 45
32 1986-01-26 20 Louisiana Superdome New Orleans Louisiana 73818 Chicago Bears 46 Jim McMahon NaN Mike Ditka New England Patriots 10 Tony Eason Steve Grogan Raymond Berry 56 36

5. Do blowouts translate to lost viewers?

The point differences skew toward small margins, so most Super Bowls are relatively close games. Makes sense: both teams are likely to be deserving if they've made it this far. The closest game ever was the New York Giants' 1-point win over the Buffalo Bills in 1991, best remembered for Scott Norwood's last-second missed field goal attempt that went wide right, kicking off four Bills Super Bowl losses in a row. Poor Scott. The biggest point difference ever was 45 points (!), when Hall of Famer Joe Montana led the San Francisco 49ers to victory in 1990, one year before the closest game ever.
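To put a number on "relatively close", one could compute the fraction of games decided by at most some margin. A minimal sketch; `share_within` is a hypothetical helper, and the 8-point cutoff in the usage line is only an illustrative assumption (a touchdown plus a two-point conversion, i.e. one score in today's NFL).

```python
import pandas as pd


def share_within(margins: pd.Series, max_margin: int) -> float:
    """Fraction of games whose point difference is at most `max_margin`."""
    # The boolean Series averages to the share of True values
    return float((margins <= max_margin).mean())


# Usage in the notebook (assumes super_bowls was loaded above):
# share_within(super_bowls['difference_pts'], 8)
```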

I remember watching the Seahawks crush the Broncos by 35 points (43-8) in 2014, which was a boring experience in my opinion. The game was never really close. I'm pretty sure we changed the channel at the end of the third quarter. Let's combine our game data and TV data to see if this is a universal phenomenon. Do large point differences translate to lost viewers? We can plot household share (average percentage of U.S. households with a TV in use that were watching for the entire broadcast) vs. point difference to find out.

In [101]:
# Join game and TV data, filtering out SB I because it was split over two networks
games_tv = pd.merge(tv[tv['super_bowl'] > 1], super_bowls, on='super_bowl')

# Import seaborn
import seaborn as sns

# Create a scatter plot with a linear regression model fit
sns.regplot(x='difference_pts', y='share_household', data=games_tv)
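Super Bowl I appears twice in `tv` (once for CBS and once for NBC), so without the `> 1` filter the merge would silently duplicate that game. A minimal sketch of guarding against this with the `validate` argument of `pd.merge`, which raises `pandas.errors.MergeError` when the key is not unique on both sides; `merge_one_to_one` is a hypothetical wrapper name.

```python
import pandas as pd


def merge_one_to_one(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Inner-merge on `key`, raising MergeError if either side has duplicate keys."""
    return pd.merge(left, right, on=key, validate='one_to_one')


# Usage in the notebook (assumes tv and super_bowls were loaded above):
# games_tv = merge_one_to_one(tv[tv['super_bowl'] > 1], super_bowls, 'super_bowl')
```

Dropping the filter in the usage line should make the call fail loudly instead of producing an extra Super Bowl I row.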

6. Viewership and the ad industry over time

The downward-sloping regression line and its 95% confidence interval suggest that bailing on the game when it's a blowout is common. Though this matches our intuition, we must take it with a grain of salt: the linear relationship in the data is weak, and our sample is small (51 games, since Super Bowl I was excluded from the merge).
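To put a number on "weak", one could compute the Pearson correlation together with the ordinary least-squares line that `sns.regplot` draws by default. A sketch using `np.polyfit` and `Series.corr`; `linear_fit_summary` is a hypothetical helper. An `r` near 0 would confirm a weak relationship, and `slope` estimates the change in household share per extra point of margin.

```python
import numpy as np
import pandas as pd


def linear_fit_summary(df: pd.DataFrame, x: str, y: str) -> dict:
    """Sample size, Pearson r, and least-squares slope/intercept of y on x."""
    data = df[[x, y]].dropna()
    # np.polyfit returns coefficients from the highest degree down
    slope, intercept = np.polyfit(data[x], data[y], deg=1)
    r = data[x].corr(data[y])  # Pearson correlation by default
    return {'n': len(data), 'r': r, 'slope': slope, 'intercept': intercept}


# Usage in the notebook (assumes games_tv from the merge above):
# linear_fit_summary(games_tv, 'difference_pts', 'share_household')
```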

Regardless of the score, though, I bet most people stick it out for the halftime show, which is good news for the TV networks and advertisers. A 30-second spot costs a pretty $5 million now, but has it always been that way? And how have the number of viewers and household ratings trended alongside ad cost? We can find out using line plots that share a "Super Bowl" x-axis.
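How much has ad cost actually grown? A worked example using two figures from the `tv` table: $42,500 for a CBS spot at Super Bowl I in 1967 (NBC's listed price was $37,500) and $5,000,000 at Super Bowl LII in 2018. `growth_summary` is a hypothetical helper, and these figures appear to be nominal prices, not adjusted for inflation.

```python
def growth_summary(start_cost: float, end_cost: float, years: int) -> tuple:
    """Return (fold increase, compound annual growth rate) between two prices."""
    fold = end_cost / start_cost
    cagr = fold ** (1 / years) - 1
    return fold, cagr


# Ad cost figures taken from the tv table (SB I on CBS, 1967; SB LII, 2018)
fold, cagr = growth_summary(42_500, 5_000_000, 2018 - 1967)
print(f'{fold:.0f}x overall, about {cagr:.1%} per year')
# prints: 118x overall, about 9.8% per year
```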

In [103]:
# Create a figure with a 3x1 grid of subplots and activate the top one
plt.subplot(3, 1, 1)
plt.plot(tv.super_bowl, tv.avg_us_viewers, color='#648FFF')
plt.title('Average Number of US Viewers')

# Activate the middle subplot
plt.subplot(3, 1, 2)
plt.plot(tv.super_bowl, tv.rating_household, color='#DC267F')
plt.title('Household Rating')

# Activate the bottom subplot
plt.subplot(3, 1, 3)
plt.plot(tv.super_bowl, tv.ad_cost, color='#FFB000')
plt.title('Ad Cost')
plt.xlabel('SUPER BOWL')

# Improve the spacing between subplots
plt.tight_layout()
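The `plt.subplot` calls above create three independent axes, so they don't actually share an x-axis; they line up only because all three series cover the same Super Bowls. A sketch under the assumption that a genuinely shared axis is wanted: `plt.subplots(3, 1, sharex=True)` links the x-axes and hides the tick labels on all but the bottom panel. `plot_tv_trends` is a hypothetical helper, and the figure size is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_tv_trends(tv: pd.DataFrame):
    """Three stacked line plots that share one Super Bowl x-axis."""
    panels = [
        ('avg_us_viewers', 'Average Number of US Viewers', '#648FFF'),
        ('rating_household', 'Household Rating', '#DC267F'),
        ('ad_cost', 'Ad Cost', '#FFB000'),
    ]
    # Sort by Super Bowl so each line is drawn left to right
    data = tv.sort_values('super_bowl')
    fig, axes = plt.subplots(3, 1, sharex=True, figsize=(8, 8))
    for ax, (column, title, color) in zip(axes, panels):
        ax.plot(data['super_bowl'], data[column], color=color)
        ax.set_title(title)
    axes[-1].set_xlabel('SUPER BOWL')
    fig.tight_layout()
    return fig, axes


# Usage in the notebook (assumes tv was loaded above):
# fig, axes = plot_tv_trends(tv)
# plt.show()
```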

7. Halftime shows weren't always this great

We can see viewers increased before ad costs did. Maybe the networks weren't very data savvy and were slow to react? Makes sense since DataCamp didn't exist back then.
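One rough way to check the claim that viewers rose before ad costs: find the first Super Bowl at which each series reaches some fraction of its all-time maximum. This is a sketch, not an established method; `first_reaching` and the 50% cutoff are illustrative assumptions.

```python
import pandas as pd


def first_reaching(df: pd.DataFrame, column: str, fraction: float = 0.5) -> int:
    """First Super Bowl number at which `column` reaches `fraction` of its maximum."""
    data = df.sort_values('super_bowl')
    threshold = fraction * data[column].max()
    reached = data.loc[data[column] >= threshold, 'super_bowl']
    return int(reached.iloc[0])


# Usage in the notebook (assumes tv was loaded above):
# {col: first_reaching(tv, col) for col in ['avg_us_viewers', 'ad_cost']}
```

Applied to the `tv` table shown earlier, a 50% cutoff should point to Super Bowl X for `avg_us_viewers` versus Super Bowl XL for `ad_cost`.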

Another hypothesis: maybe halftime shows weren't that good in the earlier years? The modern spectacle of the Super Bowl has a lot to do with the cultural prestige of big halftime acts. I went down a YouTube rabbit hole and it turns out the old ones weren't up to today's standards. Some offenders:

  • Super Bowl XXVI in 1992: A Frosty The Snowman rap performed by children.
  • Super Bowl XXIII in 1989: An Elvis impersonator that did magic tricks and didn't even sing one Elvis song.
  • Super Bowl XXI in 1987: Tap dancing ponies. (Okay, that's pretty awesome actually.)

It turns out Michael Jackson's Super Bowl XXVII performance, one of the most-watched events in American TV history, was when the NFL realized the value of Super Bowl airtime and decided they needed to sign big-name acts from then on. The halftime shows before MJ indeed weren't that impressive, which we can see by filtering our halftime_musicians data.

In [105]:
# Musicians who performed at Super Bowl XXVII (Michael Jackson's show) or earlier
halftime_musicians[halftime_musicians['super_bowl'] <= 27].musician
Out[105]:
80                                       Michael Jackson
81                                        Gloria Estefan
82                 University of Minnesota Marching Band
83                                 New Kids on the Block
84                                         Pete Fountain
85                                          Doug Kershaw
86                                           Irma Thomas
87                       Pride of Nicholls Marching Band
88                                     The Human Jukebox
89                                     Pride of Acadiana
90                                          Elvis Presto
91                                        Chubby Checker
92            San Diego State University Marching Aztecs
93                                        Spirit of Troy
94        Grambling State University Tiger Marching Band
95                                        Spirit of Troy
96                                        Up with People
97                                          Tops In Blue
98     The University of Florida Fightin' Gator March...
99          The Florida State University Marching Chiefs
100    Los Angeles Unified School District All City H...
101                                       Up with People
102                                    The Human Jukebox
103                                      Helen O'Connell
104                                       Up with People
105       Grambling State University Tiger Marching Band
106                                         Ken Hamilton
107                                             Gramacks
108                     Tyler Junior College Apache Band
109                                        Pete Fountain
110                                              Al Hirt
111    Los Angeles Unified School District All City H...
112                                       Up with People
113                                     Mercer Ellington
114       Grambling State University Tiger Marching Band
115                    University of Texas Longhorn Band
116                                         Judy Mallett
117                 University of Michigan Marching Band
118                                         Woody Herman
119                                        Andy Williams
120                                      Ella Fitzgerald
121                                       Carol Channing
122                                              Al Hirt
123        United States Air Force Academy Cadet Chorale
124               Southeast Missouri State Marching Band
125                                    Marguerite Piazza
126                                       Doc Severinsen
127                                              Al Hirt
128                                    The Human Jukebox
129             Florida A&M University Marching 100 Band
130       Grambling State University Tiger Marching Band
131        University of Arizona Symphonic Marching Band
132       Grambling State University Tiger Marching Band
133                                              Al Hirt
Name: musician, dtype: object
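The output above is one long list. Grouping by `super_bowl` shows each early lineup on its own row, which makes it easier to see which shows were all marching bands. A minimal sketch; `lineups_through` is a hypothetical helper, and 27 is the Super Bowl XXVII cutoff used above.

```python
import pandas as pd


def lineups_through(musicians: pd.DataFrame, last_super_bowl: int = 27) -> pd.Series:
    """Map each Super Bowl up to `last_super_bowl` to its list of performers."""
    early = musicians[musicians["super_bowl"] <= last_super_bowl]
    # groupby sorts the keys and keeps each group's rows in their original order
    return early.groupby("super_bowl")["musician"].agg(list)


# Usage: lineups_through(halftime_musicians)
```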

8. Who has the most halftime show appearances?

Lots of marching bands. American jazz clarinetist Pete Fountain. Miss Texas 1973 playing a violin. Nothing against those performers, they're simply not Beyoncé. To be fair, no one is.

Let's count each musician's halftime show appearances and see who has performed more than once.

In [107]:
# Count halftime show appearances for each musician and sort them from most to least
halftime_appearances = halftime_musicians.groupby('musician').count()['super_bowl'].reset_index()
halftime_appearances = halftime_appearances.sort_values('super_bowl', ascending=False)

# Display musicians with more than one halftime show appearance
halftime_appearances[halftime_appearances['super_bowl'] > 1]
Out[107]:
musician super_bowl
28 Grambling State University Tiger Marching Band 6
104 Up with People 4
1 Al Hirt 4
83 The Human Jukebox 3
76 Spirit of Troy 2
25 Florida A&M University Marching 100 Band 2
26 Gloria Estefan 2
102 University of Minnesota Marching Band 2
10 Bruno Mars 2
64 Pete Fountain 2
5 Beyoncé 2
36 Justin Timberlake 2
57 Nelly 2
44 Los Angeles Unified School District All City H... 2
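The groupby, count, and sort chain above can also be written with `Series.value_counts()`, which counts rows per value and already sorts from most to least. The two agree as long as no row has a missing `super_bowl` (since `count()` skips NaN). `repeat_performers` is a hypothetical helper name.

```python
import pandas as pd


def repeat_performers(musicians: pd.DataFrame, min_appearances: int = 2) -> pd.Series:
    """Appearance counts for musicians listed at least `min_appearances` times."""
    counts = musicians["musician"].value_counts()  # sorted from most to least
    return counts[counts >= min_appearances]


# Usage: repeat_performers(halftime_musicians)
```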

9. Who performed the most songs in a halftime show?

The world-famous Grambling State University Tiger Marching Band takes the crown with six appearances. Beyoncé, Justin Timberlake, Nelly, and Bruno Mars are the only post-Y2K musicians with multiple appearances (two each).
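To check the post-Y2K claim directly, we can restrict to Super Bowl XXXIV (played in January 2000) and later, then count distinct shows per musician. Treating "post-Y2K" as "both appearances at Super Bowl 34 or later" is our reading of the claim, and `repeat_performers_since` is a hypothetical name.

```python
import pandas as pd


def repeat_performers_since(musicians: pd.DataFrame, first_super_bowl: int = 34) -> list:
    """Sorted names of musicians with 2+ distinct shows from `first_super_bowl` on."""
    recent = musicians[musicians["super_bowl"] >= first_super_bowl]
    # nunique counts distinct Super Bowls, so a duplicate row for one show is not double-counted
    shows = recent.groupby("musician")["super_bowl"].nunique()
    return sorted(shows[shows > 1].index)


# Usage: repeat_performers_since(halftime_musicians)
```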

From our previous inspections, the num_songs column has lots of missing values:

  • A lot of the marching bands don't have num_songs entries.
  • For non-marching bands, missing data starts occurring at Super Bowl XX.
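Both observations can be checked in a few lines. This sketch flags bands with the same "Marching"/"Spirit" name heuristic that the filtering step below relies on, then reports the share of missing `num_songs` values for bands and non-bands, plus the most recent Super Bowl with a missing value among non-bands (which, if the observation above holds, should line up with the Super Bowl XX cutoff). The helper name `missing_num_songs_summary` is hypothetical.

```python
import pandas as pd


def missing_num_songs_summary(musicians: pd.DataFrame):
    """Share of missing num_songs for bands vs. non-bands, plus the most
    recent Super Bowl with a missing value among non-bands."""
    # Name heuristic: "Marching" or "Spirit" in the name means a band
    is_band = musicians["musician"].str.contains("Marching|Spirit", regex=True, na=False)
    group = is_band.map({True: "band", False: "non-band"})
    missing = musicians["num_songs"].isna()
    # Mean of a boolean Series is the fraction of True values
    share_missing = missing.groupby(group).mean()
    non_band_missing = musicians.loc[~is_band & missing, "super_bowl"]
    latest = int(non_band_missing.max()) if not non_band_missing.empty else None
    return share_missing, latest


# Usage: share, latest = missing_num_songs_summary(halftime_musicians)
```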

Let's filter out marching bands by removing musicians whose names contain the word "Marching" or the word "Spirit" (a common naming convention for marching bands is "Spirit of [something]"). Then we'll keep only Super Bowls after Super Bowl XX to address the missing data issue and see who has performed the most songs.

In [110]:
# Filter out most marching bands
no_bands = halftime_musicians[~halftime_musicians.musician.str.contains('Marching')]
no_bands = no_bands[~no_bands.musician.str.contains('Spirit')]

# Keep only Super Bowls after Super Bowl XX to sidestep the missing num_songs entries
no_bands = no_bands[no_bands.super_bowl > 20]

# Plot a histogram of number of songs per performance
# (Series.max() skips NaN, unlike Python's built-in max())
most_songs = int(no_bands['num_songs'].max())
plt.hist(no_bands.num_songs.dropna(), bins=most_songs)
plt.xlabel('Number of Songs Per Halftime Show Performance')
plt.ylabel('Number of Musicians')
plt.show()

# Sort the non-band musicians by number of songs per appearance
no_bands = no_bands.sort_values('num_songs', ascending=False)
display(no_bands.head(15))
super_bowl musician num_songs
0 52 Justin Timberlake 11.0
70 30 Diana Ross 10.0
10 49 Katy Perry 8.0
2 51 Lady Gaga 7.0
90 23 Elvis Presto 7.0
33 41 Prince 7.0
16 47 Beyoncé 7.0
14 48 Bruno Mars 6.0
3 50 Coldplay 6.0
25 45 The Black Eyed Peas 6.0
20 46 Madonna 5.0
30 44 The Who 5.0
80 27 Michael Jackson 5.0
64 32 The Temptations 4.0
36 39 Paul McCartney 4.0

10. Conclusion

So most non-band musicians perform 1-3 songs per halftime show. It's important to note that the halftime show has a roughly fixed duration (about 12 minutes), so songs per performance is more a measure of how many hit songs you have. JT went off in 2018, wow. 11 songs! Diana Ross comes in second with 10 in her medley in 1996.
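The "1-3 songs" reading comes from eyeballing the histogram; it can also be computed directly. A minimal sketch, meant to be applied to `no_bands['num_songs']` from the filtering cell; `share_with_song_range` is a hypothetical helper, and missing values are dropped rather than counted.

```python
import pandas as pd


def share_with_song_range(num_songs: pd.Series, low: int = 1, high: int = 3) -> float:
    """Fraction of known song counts that fall in [low, high], inclusive."""
    known = num_songs.dropna()
    if known.empty:
        return float("nan")
    # between() includes both endpoints by default
    return float(known.between(low, high).mean())


# Usage: share_with_song_range(no_bands['num_songs'])
```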

In this notebook, we loaded, cleaned, then explored Super Bowl game, television, and halftime show data. We visualized the distributions of combined points, point differences, and songs per halftime show performance using histograms. We used line plots to see how ad cost increases lagged behind viewership increases. And we discovered that blowouts do appear to lead to a drop in viewers.

This year's Big Game will be here before you know it. Who do you think will win Super Bowl LIII?

UPDATE: Spoiler alert.

In [ ]:
# 2018-2019 conference champions
patriots = 'New England Patriots'
rams = 'Los Angeles Rams'

# Who will win Super Bowl LIII?
super_bowl_LIII_winner = ...
print('The winner of Super Bowl LIII will be the', super_bowl_LIII_winner)