By: Sterling Terrell
Introduction
This paper seeks to make a contribution by exploring more closely an enigma that has frustrated economists and the proletariat alike: What are the major determinants of points scored in a college football game? Further, on average, do the conventional methods and measurements matter? And finally, what are the implications of the results?
While much empirical analysis has been done on the economics of sports in general and baseball in particular (see Moneyball for baseball and The Blind Side for football, both by Michael Lewis), much less has been done on football. This is due, in part, to cause and effect being much more difficult to assign in football than in baseball. For example, in baseball, if a player makes an error, blame is easily assigned; credit for a home run or an RBI is assigned just as easily. In football, however, when a quarterback throws an incomplete pass, the “error” is charged to his passing percentage, while the actual cause of the incompletion could be any number of things: maybe the receiver dropped the pass, the receiver was tripped or fell, the pass was tipped at the line, or the quarterback was rushed because of an ineffective offensive line. In short, the quarterback with the highest passing percentage is not necessarily the best passing quarterback.
Willoughby (2002) found that significant differences exist among the factors that influence winning and scoring in Canadian football. For instance, interceptions matter more than fumble recoveries in determining winners and losers. One shortcoming of his analysis, as he notes, is that it predicts only a binary outcome: a team can only win or lose. This fails to account for the margin by which a team wins or loses.
Schwartz and Barsky (1977) conducted an extensive analysis of the “home advantage” across a variety of sports. With regard to football, they found that the advantage exists in a significant way for collegiate football but not for professional football. The direction of this effect in college football is consistent with my data below, although the home coefficient estimated there is not statistically significant at the 5% level.
Gray and Gray (1997) showed that most predictive models used in organized sports betting tend to overemphasize recent performance while underweighting overall performance. One way to allow for this in a predictive version of the model below would be to average over a longer window of data rather than only recent games. For instance, in predicting the scoring of an upcoming game, it might be more useful to use a team's season-long averages of passing yards, rushing yards, turnovers, and penalties rather than the averages from the last two or three games.
Description of Data and Methods
Using a standard multiple linear regression technique (estimated by OLS), I analyze the major determinants of college football scores. For the sample, I have chosen all the games played by top 25 NCAA Division I teams on Saturday, 22 November 2008 (week 13 of the regular season). In this sample, 12 games were played by a total of 24 teams. The sample is small enough to be reported in its entirety and is as follows:*
TEAM            PTS   RUSH   PASS   TURNO   PENALT   H
Texas Tech       21     45    361       3       47   0
Oklahoma         65    299    326       1       96   1
Citadel          19    103    214       3        0   0
Florida          70    394    311       0       25   1
Brigham Young    24    214    205       6       85   0
Utah             48    108    307       0       51   1
Mich St          18     35    287       2       51   0
Penn St          49    138    419       0       38   1
Boise St         41     70    414       4       50   0
Nevada           34    144    241       0       25   1
Michigan          7    111     87       2       15   0
Ohio St          42    232    184       1       20   1
USAF             10    150     11       0       23   0
TCU              44    183    321       0       30   1
Mississippi      31    201    307       1       51   0
LSU              13     37    178       2       32   1
Pitt             21     35    229       2       50   0
Cincinnati       28     87    309       1       46   1
Oregon St        19    166    224       0       32   0
Arizona          17    139    158       0       30   1
NC State         41    187    279       0       55   0
N Carolina       10     56    147       6       20   1
FL State         37    172    160       0       40   0
Maryland          7    103    149       4        5   1
*(Texas Tech fans are asked to ignore the first two lines of data – an outlier, no doubt)
Where PTS = number of points scored, RUSH = number of rushing yards, PASS = number of passing yards, TURNO = number of turnovers, PENALT = number of yards penalized, and H = home game. H is a dummy variable, equal to 1 for the home team and 0 for the away team. A second dummy, A, for away games cannot be added alongside H: because H + A = 1 for every observation, the two dummies together would be perfectly collinear with the intercept (the dummy variable trap), and the coefficients could not be separately estimated. All of this gives a regression of the form:
PTS = B0 + B1(RUSH) + B2(PASS) + B3(TURNO) + B4(PENALT) + B5(H) + u
where u is the error term.
It is hypothesized from a priori analysis and common sense that RUSH, PASS, and H will all have positive coefficients, while TURNO and PENALT will each be negative.
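As a rough sketch of how the regression form above could be estimated, the following plain-Python example re-enters the sample from the table and solves the OLS normal equations directly. The helper names (`solve`, `ols`) are illustrative and are not the author's code, and the original analysis was presumably run in a statistics package. The printed values should be compared with, rather than assumed to reproduce, the figures reported in the text.

```python
# One row per team, in the same order as the table:
# (PTS, RUSH, PASS, TURNO, PENALT, H)
ROWS = [
    (21, 45, 361, 3, 47, 0), (65, 299, 326, 1, 96, 1),
    (19, 103, 214, 3, 0, 0), (70, 394, 311, 0, 25, 1),
    (24, 214, 205, 6, 85, 0), (48, 108, 307, 0, 51, 1),
    (18, 35, 287, 2, 51, 0), (49, 138, 419, 0, 38, 1),
    (41, 70, 414, 4, 50, 0), (34, 144, 241, 0, 25, 1),
    (7, 111, 87, 2, 15, 0), (42, 232, 184, 1, 20, 1),
    (10, 150, 11, 0, 23, 0), (44, 183, 321, 0, 30, 1),
    (31, 201, 307, 1, 51, 0), (13, 37, 178, 2, 32, 1),
    (21, 35, 229, 2, 50, 0), (28, 87, 309, 1, 46, 1),
    (19, 166, 224, 0, 32, 0), (17, 139, 158, 0, 30, 1),
    (41, 187, 279, 0, 55, 0), (10, 56, 147, 6, 20, 1),
    (37, 172, 160, 0, 40, 0), (7, 103, 149, 4, 5, 1),
]
NAMES = ["B0", "RUSH", "PASS", "TURNO", "PENALT", "H"]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(y, regressors):
    """Return (coefficients, design matrix) for y = X b with an intercept."""
    X = [[1.0] + [float(v) for v in row] for row in regressors]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty), X


y = [row[0] for row in ROWS]
beta, X = ols(y, [row[1:] for row in ROWS])

# Goodness-of-fit summaries: R-squared, adjusted R-squared, overall F.
resid = [yi - sum(b * x for b, x in zip(beta, xi)) for yi, xi in zip(y, X)]
y_bar = sum(y) / len(y)
sse = sum(e * e for e in resid)
sst = sum((yi - y_bar) ** 2 for yi in y)
n, k = len(y), len(beta) - 1
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

for name, b in zip(NAMES, beta):
    print(f"{name:>6}: {b: .4f}")
print(f"adjusted R-squared = {adj_r2:.3f}, F = {f_stat:.2f}")
```

Solving the normal equations by hand keeps the sketch free of dependencies; in practice a regression routine from a statistics library would also report standard errors and p-values.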
Results
To begin, the adjusted R-squared of .842 tells us that our model is highly descriptive of points scored. The F statistic of 25.43 allows us to quickly and strongly reject the null hypothesis that all of the slope coefficients in the model are zero. RUSH, PASS, TURNO, PENALT, and H are indeed jointly significant at the 5% significance level.
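One quick consistency check on these two figures, sketched here under the assumption that the model has n = 24 observations and k = 5 slope coefficients (as the sample and the regression form suggest), is to back out R-squared from the adjusted R-squared and compute the overall F statistic it implies. The function name `implied_f` is hypothetical.

```python
def implied_f(adj_r2: float, n: int, k: int) -> tuple[float, float]:
    """Back out R-squared from adjusted R-squared, then compute the overall F."""
    # adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1), solved for R2
    r2 = 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)
    # F = (R2 / k) / ((1 - R2) / (n - k - 1))
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, f_stat


r2, f_stat = implied_f(adj_r2=0.842, n=24, k=5)
print(f"R-squared = {r2:.3f}, implied F = {f_stat:.2f}")
```

With the rounded inputs this gives an F of roughly 25.5, in line with the reported 25.43; a gap of that size is what rounding the adjusted R-squared to three decimals would produce.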
Further, as noted by the p-values reported, RUSH and PASS are both statistically significant, while TURNO, PENALT, and H are individually insignificant at the 5% level of significance. More specifically, the coefficient estimates tell us the following, in each case holding the other variables constant: 100 rushing yards in a game will on average add 10.9 points to a team’s final score. 100 passing yards in a game will on average add 9.1 points to a team’s final score. One turnover has the effect of subtracting 1.5 points from the final score. Receiving a 10 yard penalty has the effect of adding 0.8 points to a team’s final score. And finally, a team playing at home gains about a 5 point advantage; conversely, a team playing an away game will begin the game, in effect, with a 5.06 point deficit. To better illustrate the descriptive ability of each variable, each has been regressed individually against PTS and graphed. These graphs are found in the Appendix and are consistent with the results above.

The intercept is negative and has no meaningful interpretation; it is said to be outside the scope of the model. Intuitively, a game with no passing, rushing, turnovers, penalties, or place to play should have a zero score, which is what a regression through the origin would impose. The main drawback of the no-intercept model, as seen when it is estimated, is that the standard error of every coefficient increases.
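To make the coefficient interpretations above concrete, here is a small sketch that adds up each variable's contribution for a made-up stat line. The per-unit values are the rounded effects quoted in the text (10.9 points per 100 rushing yards becomes 0.109 per yard, and so on). The stat line is hypothetical, and the intercept is left out because its value is not given here, so the total is a contribution above the intercept rather than a predicted score.

```python
# Per-unit effects implied by the rounded figures quoted in the text.
EFFECTS = {
    "RUSH": 10.9 / 100,   # points per rushing yard
    "PASS": 9.1 / 100,    # points per passing yard
    "TURNO": -1.5,        # points per turnover
    "PENALT": 0.8 / 10,   # points per penalty yard
    "H": 5.06,            # points for playing at home
}


def contributions(stats: dict[str, float]) -> dict[str, float]:
    """Multiply each game statistic by its estimated per-unit effect."""
    return {name: EFFECTS[name] * stats[name] for name in EFFECTS}


# Hypothetical home-team stat line, chosen only for illustration.
stat_line = {"RUSH": 150, "PASS": 250, "TURNO": 2, "PENALT": 40, "H": 1}

parts = contributions(stat_line)
total = sum(parts.values())
for name, pts in parts.items():
    print(f"{name:>6}: {pts:+6.2f}")
print(f" total: {total:+6.2f} (excluding the intercept)")
```

Because TURNO, PENALT, and H are individually insignificant, their contributions in a calculation like this are far less reliable than those of RUSH and PASS.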
Turning to heteroskedasticity, the Breusch-Pagan test gives a critical value of 30.14 and a calculated value of 15.78, which allows us with confidence to “fail to reject the null” that the residuals are homoskedastic. The White test, with a critical value of 16.2 and a calculated value of 4.98, likewise allows us to “fail to reject the null” that the residuals are homoskedastic. With both tests in agreement, there is no evidence that heteroskedasticity is an issue.
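For reference, one common form of the Breusch-Pagan statistic is the LM version; it is an assumption that this is the variant used here. It regresses the squared OLS residuals on the explanatory variables and scales the R-squared of that auxiliary regression by the sample size:

```latex
\hat{u}_i^2 = \delta_0 + \delta_1\,\mathrm{RUSH}_i + \delta_2\,\mathrm{PASS}_i
            + \delta_3\,\mathrm{TURNO}_i + \delta_4\,\mathrm{PENALT}_i
            + \delta_5\,\mathrm{H}_i + v_i,
\qquad
LM = n \cdot R^2_{\hat{u}^2} \;\overset{a}{\sim}\; \chi^2_{q}
```

Here q is the number of regressors in the auxiliary regression. The null of homoskedasticity is rejected only when the calculated statistic exceeds the chi-squared critical value, which is the comparison made above.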
The model is linear in parameters. Assuming the draw was indeed a random sample, that the explanatory variables are not exactly linearly related, that the error term has an expected value of zero given the explanatory variables, and that the error term’s variance does not depend on the values of any explanatory variable, the model fulfills the requirements of the Gauss-Markov Theorem, and OLS is the “best linear unbiased estimator.”
Conclusion
What can be taken from all of this? As shown in the results section, on average putting rushing and passing yards on the board leads to more points, while committing a turnover is almost like giving the other team a free safety. However, on average, the data say that a 10 yard penalty will add about 0.8 points to your score. One possible explanation for the PENALT coefficient having a positive value (other than the sample not being truly random or large enough) is that teams that draw more penalties are more aggressive, and the team that plays more aggressively usually scores more points: penalties could be correlated with effort, and effort is correlated with scoring. Finally, the best way for a team to score nearly a touchdown with zero effort is to play the game in its home stadium.

More than doubling the sample size by adding out-of-sample observations to the data set shows that the coefficients for RUSH and PASS remain virtually unchanged, while the coefficient for TURNO changes to 1.9, PENALT changes to .01, and H moves closer to 3. These estimates are probably closer to their true values, as estimates generally become more precise with larger sample sizes; the limited sample size in this particular study was constrained by time.
Further research could be done, and some of it has been done, on many different aspects of football in general and the determinants of football scoring in particular. For example: Does the type of turnover matter (interception vs. fumble)? See Willoughby (2002). Does the weight of the offensive line have an effect on scoring or winning? Does that of the defensive line? How does the average 40 yard dash time of the starting receivers affect points scored? What about the average 40 yard dash time of the starting safeties? How does the coach’s win percentage play into the analysis? Is the number of third down conversions made in a game highly descriptive of the final score? And from Michael Lewis in The Blind Side: How is total points scored related to the time the quarterback is allowed to stand in the pocket? How is the efficiency of an offense related to a weighted score (weight, speed, strength) of the starting left tackle? Of the right defensive end? Of the entire offensive line? Of the entire defensive line?
All of these questions are entertaining to think about and would be immensely fun to research and write about. Is it too late to switch from Ag. to Sports economics?
References
Gray, Phillip and Gray, Stephen. (1997). Testing Market Efficiency: Evidence from the NFL Sports Betting Market. The Journal of Finance, 52(4), 1725-1737.
Lewis, Michael. (2003). Moneyball. W. W. Norton & Company.
Lewis, Michael. (2007). The Blind Side: Evolution of a Game. W. W. Norton & Company.
Schwartz, Barry and Barsky, Steven. (1977). The Home Advantage. Social Forces, 55(3), 641-661.
Willoughby, Keith. (2002). Winning Games in Canadian Football: A Logistic Regression Analysis. The College Mathematics Journal, 33(3), 215-220.