Thursday, September 8, 2011

Beyond 'Moneyball':

The first of a three-part series I am writing was published in Analytics Magazine.

Beyond ‘Moneyball’:
The rapidly evolving world of sports analytics, Part I

 By Benjamin Alamar and Vijay Mehrotra

Over the past few years, the world of sports has experienced an explosion in the use of analytics. In this three-part series, we reflect on the current state of sports analytics and consider what the future of sports analytics may look like.
We define sports analytics as “the management of structured historical data, the application of predictive analytic models that utilize that data, and the use of information systems to inform decision makers and enable them to help their organizations in gaining a competitive advantage on the field of play.”


Friday, July 8, 2011

Can Alex Smith Save the 49ers?

I recently broke the news to my son that Alex Smith appears to be slotted in as the starting QB for his beloved 49ers. I took the dejected look on his face as a challenge to provide him some hope. Going to the data on QBs is always an iffy proposition, as QB stats are the product of the efforts of an entire team, not just one player. Given the woeful play of the 49ers over the last few years, I was expecting this to be a rather fruitless effort in restoring hope.

I pulled up Smith's page on Pro-Football Reference and checked on his performance. I then took his age, years in the league, Yds/Att and interception rate for last season and plugged them into PFR's useful Player Season Finder. I frankly did not believe what I found. Besides Smith, the Player Season Finder found three other players who had a similar performance in Yds/Att and interception rate during their 5th season, at age 26: Drew Brees, Peyton Manning, and Ben Roethlisberger. Looking at a broader range of stats, Smith actually fits in with this group fairly well.

On Yds/Att, QB rating, TD% (indexed by PFR), Int% (indexed by PFR), and completion percentage, Smith fits right in. He is better than Big Ben in TD%, QB rating, and Compl%, better than Brees in Int%, and just behind all three on Yds/Att. The other three, of course, all have Super Bowl rings and multiple Pro Bowl appearances on their resumes. So what is going on? How can the much-maligned Alex Smith, playing for multiple head coaches and offensive coordinators, have delivered a performance similar to these shining examples of QB play?

To try to answer that question, I expanded my search. I removed all the filters on performance and just looked for QBs in their 5th season while they were 26 years old. Since 1980, there are 33 QBs not named Alex Smith who played their 5th season at age 26. After eliminating all the QBs with fewer than 8 games played, I loaded their advanced passing stats into two different clustering analyses to find groupings of players with similar performances across all of the stats. The results of the two analyses were essentially the same and put Smith in with the following group:

  1. Tommy Kramer
  2. Steve McNair 
  3. Rich Gannon       
  4. Bernie Kosar       
  5. Neil Lomax         
  6. Gary Hogeboom      
  7. Trent Dilfer       
  8. David Whitehurst   
  9. Randall Cunningham 
  10. Paul McDonald      
  11. Kyle Boller        
  12. Tim Couch          
  13. Ben Roethlisberger
So this list is rather more satisfying: some players who have had success (Gannon, McNair, Boller, Roethlisberger) and some who haven't (Hogeboom, Couch). For the most part, this list should provide 49er fans with some hope. There are no spectacular QBs here, but plenty who have played good, efficient football, and several who achieved more later in their careers (Gannon, Cunningham, McNair).
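For readers curious what a clustering step like the one above looks like in practice, here is a minimal k-means sketch. The stat lines below are illustrative placeholders, not the actual Pro-Football-Reference numbers, and k-means is only one of several clustering methods the analysis could have used.

```python
import numpy as np

# Illustrative placeholder stat lines (Yds/Att, TD%, Int%, Compl%) --
# NOT the actual Pro-Football-Reference numbers used in the post.
qbs = ["Smith", "Roethlisberger", "Couch", "Marino"]
stats = np.array([
    [7.1, 4.1, 2.1, 59.6],
    [7.2, 4.5, 2.0, 61.7],
    [6.5, 3.4, 4.3, 59.9],
    [8.6, 6.3, 2.9, 64.2],
])

# Standardize each stat so no single one dominates the distance metric.
X = (stats - stats.mean(axis=0)) / stats.std(axis=0)

def kmeans(X, k, iters=50, seed=0):
    """Tiny k-means: assign points to the nearest center, recompute centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(X, k=2)
for name, label in zip(qbs, labels):
    print(f"{name}: cluster {label}")
```

Running the same data through a second method (e.g., hierarchical clustering) and comparing, as the post did, is a good sanity check: groupings that are stable across methods are more trustworthy.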

For some context, the rest of the QBs fall into the following groups:

Stars: Dan Marino, Peyton Manning, Chris Miller, Drew Brees, Dave Krieg, Brett Favre, Daunte Culpepper, Aaron Rodgers

Flashes: Vince Evans, Mike McMahon

No Impact: Dave M. Brown, Mike Moroski

Ok, maybe Chris Miller is not really a star, but for the most part these groupings seem fairly reasonable. So, 49er fans, have some hope: Alex Smith may just be Trent Dilfer.

Saturday, July 2, 2011

Initial Poll Results

As data starts to arrive from the entirely unscientific poll posted below, one result jumps out immediately. The very first question of the poll asks which area within analytics will have the biggest impact on teams. The choices offered were: collecting new data, better data management, new metrics, integration of statistical analysis into decision making, and better information systems for decision makers. In writing that question, I had imagined that the "new metrics" answer would be fairly popular; that is the area that draws most practitioners initially, as it is the use of statistical tools to measure and learn about a sport. The results did not at all match that expectation.

The chart above indicates that integration of analysis and better information systems are the areas that respondents felt would have the biggest impact, while none of them selected new metrics. What this result suggests (and again, this was hardly a rigorous poll) is that the biggest perceived area of growth in analytics is finding ways to help decision makers use the analysis that has already been done. This can take the form of better reporting, more in-depth conversation, better explanations, and/or better information systems (that put the decision makers in control of the analysis to some degree), to name a few.

Still time to take the poll if you have not already.

Wednesday, June 29, 2011

Future of Sports Analytics

Five questions on the future of sports analytics. What are your thoughts?


Tuesday, June 28, 2011

Addicted to Math?

My first reaction after reading Jonah Lehrer's piece on Grantland was to try to ignore it. It's the same debate we've been having since Lewis published Moneyball, right? But there is something different in this latest railing against sports analytics (and please, can we stop referring to statistics applied to any sport as sabermetrics? That is a baseball term; nobody in basketball or any other sport refers to their work that way). The difference here is that instead of being accused of not being relevant, we geeks are being accused of being too relevant. Much as my grandfather accused me of ruining sports, Lehrer is accusing decision makers in sports of ruining the game because they only care about the numbers.

This is a rather bizarre charge, as any of us who have worked for teams not named the Houston Rockets can tell you. Decision makers (GMs, coaches, etc.) are not exactly waiting breathlessly for the latest pronouncements from their resident geeks. Do some teams factor good analysis into their decision making? Sure they do. By my count, 11 of the 16 NBA playoff teams employed analysts, at least to some degree, but all of those teams also have serious scouting departments that employ a lot more people, and the scouts are not ignored. Personnel and coaching decisions are looked at from every angle imaginable, and it is rather silly to suggest that somehow math has put all of these other sources of information in the back seat.

I do want to address two specifics of the article, though. The first is the use of the Mavericks as the counterexample, and the second is the phrase "The underlying assumption is that a team is just the sum of its players, and that the real world works a lot like a fantasy league." Both points actually tie back to the same idea.

First, the Mavericks: the Mavericks are one of the most innovative teams when it comes to analytics, employing the first statistician who actually travels with the coaching staff and works with them on a daily basis. Roland Beech is one of the best statisticians in the world when it comes to basketball, and while I do not know what Roland's numbers looked like for Barea, I can bet that he had input on the decision. Data may not have made the decision but, as it should be, was a factor. To continue with the Mavericks example, Lehrer notes that "According to one statistical analysis, the Los Angeles Lakers had four of the top five players in the series. The Miami Heat had three of the top four." (cleverly linking back to Beech's own site to make his point). The problem is that not all analysis is created equal. What is published on the internet may not always be the cutting edge of basketball analysis (or football or baseball or soccer...). Wayne Winston once famously said that, because of his analysis, he would advise a GM not to sign a young Kevin Durant at any price. Just because someone puts a number down does not mean it is a valuable number.

On the second point, regarding the assumption that a team is somehow the sum of its parts, I would suggest that most GMs, coaches, and even statisticians are sophisticated enough to know that this is simply not the case. I have never heard a serious statistician argue (outside of baseball, at least) that you can simply add metrics together and get a result that predicts the outcome of adding a specific player.

What this comes down to, I believe, is a general misunderstanding about how sports analytics is both practiced and utilized. Sports analytics is, at times, a set of sophisticated tools that can help provide insight into the games we love. It can even be applied, as Dean Oliver has done, to issues like team fit, but any good statistician will also be the first to explain to a decision maker the limits of the analysis. Are there people who take their analysis too far and draw conclusions that are not supported by their own work? Of course; there are irresponsible people in every profession. That does not mean, however, that the tool is being overused or is in any way shifting a decision maker's focus away from the important variables.

Let us not have this debate any longer. Live and let live. Statisticians, scouts, fans, coaches, and general managers can choose how much stock they put into various types of analysis, but let's not dismiss an entire field that is, honestly, still in its infancy.

Monday, June 20, 2011

Sports Analytics: Should Fans Care?

I have not posted for a while, but the passing of Father's Day yesterday has inspired me. The main reason Father's Day could have this effect on me is that I am occasionally and forcefully reminded that most of the world does not care one lick about sports analytics, and can even be offended by it. This reminder was given to me by, of all people, my own grandfather. My grandfather is a rather opinionated man (as any 91-year-old has a right to be) and likes to occasionally make proclamations about the world, and about how I am personally functioning in it.

His most recent proclamation was that people like me are ruining sports. The normal argument about how advanced statistical analysis is bad for sports rests squarely on the idea that it is not an effective tool. As my grandfather is never one to take the road well traveled, he instead took the opposite tack: sports statistics is too effective a tool. Yes, sports statistics is ruining sports, because the analysis takes all of the uncertainty out of sports. We are too good at what we do.

I thanked him for the compliment, but it reminded me of some work I had done with Matt Futterman at the Wall Street Journal. In that analysis, I looked at how well baseball standings on June 1 predicted whether a team would make the playoffs. It turns out they are very predictive (teams below .500 on June 1 have only a 9% chance of making the playoffs). This seems to be in line with my grandfather's argument: fans of teams that are below .500 in MLB right now might as well give up. Which leaves me with the question: did a simple correlation calculation suck all of the fun out of the baseball season for a large chunk of fans?

In my defense, I would like to offer two ideas that, regardless of how well statisticians can predict outcomes, should inspire fans to want more, not less analysis.

The first idea is that all leagues collect and report statistics. These statistics are used all the time by writers and announcers to tell stories about a player, a team, or a season. The problem is that many of these statistics are misleading, incomplete, or just plain wrong. For example, one of the most convoluted statistics in sports is the NFL QB rating. If a QB completes a 4-yard pass on 3rd and 3, he has made a good play, keeping his team's drive alive, but his QB rating is likely to go down (I say likely because the exact calculation depends upon the QB's performance up to that point in the game). If instead a QB completes an 8-yard pass on 3rd and 10, the team has to give up the ball, but the QB's rating probably went up. Having watched the two plays, you may know that one QB made a good play while the other didn't, but if you didn't watch the plays and just looked at the reported QB ratings, you might get the wrong idea about the performance of the two players. Since we are going to be given numbers to look at, and they are going to be used to tell stories, better to have the right numbers.
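To make the quirk concrete, here is the standard NFL passer rating formula applied to each of those two hypothetical plays in isolation. Treating each play as a one-attempt sample is a simplification (the in-game rating accumulates over all of a QB's attempts), but it shows the direction of the effect:

```python
def passer_rating(comp, att, yds, td, interceptions):
    """Standard NFL passer rating; each component is clamped to [0, 2.375]."""
    clamp = lambda x: max(0.0, min(2.375, x))
    a = clamp((comp / att - 0.3) * 5)               # completion percentage
    b = clamp((yds / att - 3) * 0.25)               # yards per attempt
    c = clamp((td / att) * 20)                      # touchdown rate
    d = clamp(2.375 - (interceptions / att) * 25)   # interception rate
    return (a + b + c + d) / 6 * 100

# A 4-yard completion on 3rd and 3 (converts the first down)...
print(passer_rating(1, 1, 4, 0, 0))   # about 83.3
# ...rates lower than an 8-yard completion on 3rd and 10 (drive ends).
print(passer_rating(1, 1, 8, 0, 0))   # 100.0
```

The formula sees only yards per attempt, not the down-and-distance context, which is exactly the problem the paragraph above describes.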

The second idea has to do with one of the many reasons we love sports: the incredible. Whether it be an incredible play, an incredible game, or an incredible season, the truly spectacular and unexpected moment is a unique aspect of sports. We rarely, if ever, get the truly spectacularly unexpected in any other form of entertainment. What good analysis does is allow us to recognize, again and again, how truly uncommon a moment was. The statistics give us context for understanding how rare something we just saw really was. Returning to baseball, the Pittsburgh Pirates are currently 2 games below .500, which means they have, at best, a 9% chance of making the playoffs. But what if they went on a run, maybe picked someone up at the trade deadline, and made the playoffs? The analysis allows us to understand how unlikely and special that would really be, and to look back 4 years from now and remember how special that Pirates team that beat the odds really was.

I would not try to argue these points with my grandfather; we have plenty of other battles to fight. But I would suggest that most fans can appreciate, and in fact desire, more and better analysis. When writers and broadcasters embrace new statistics and analysis, fans can understand the games they love better, and identify and appreciate the truly spectacular in a deeper and longer-lasting manner.

Wednesday, April 27, 2011

Risk and Drafting a QB

What kind of football-related blog would this be without some kind of post on the upcoming NFL draft? It was not difficult to know that a post was necessary; the challenge was finding an angle that has not been done to death by the football media, which, outside of some truly exciting court rulings, has had nothing else to talk about recently.

After surveying the landscape, and seeing yet another discussion of how many first-round busts there have been (particularly at the QB position) but no QB in the current crop of potential first-rounders being labeled as the potential bust, I thought I would take up the mantle of uncertainty in the NFL draft.

As a believer in on-field performance above all else (pro day workouts, combine performances, etc.), I dug into the on-field performances of all the QBs that received a draft grade of 2 or higher from Sports Illustrated, to see which player's performance might have the highest probability of being a mirage. The most direct numerical way to look at this is to build the 95% confidence interval around a QB's completion percentage. Using a player's number of attempts along with his completion percentage, I calculated a range of values for each QB's "true" completion percentage. The wider the range, the riskier the pick.

Based on this admittedly crude measure, we can see that Cam Newton has the highest level of uncertainty around his college completion percentage. In his career, Cam completed 65.4% of his passes, but due to the small sample of attempts, his "true" completion percentage is somewhere between 59.9% and 70.9%. This is a fairly wide range of potential values, which suggests that we likely do not have enough information to properly evaluate Cam at this point. To be clear, I do not mean just that statisticians do not have enough information, but that talent evaluators of all types do not have enough information to determine Cam's true level of performance.

Cam has the highest standard error in this group at +/- 5.5%, while the standard error of the next-highest QB to receive a draft grade higher than 3 is Ryan Mallett, at +/- 3.1%. This suggests that the evaluations of players like Mallett and Gabbert are more likely to be accurate than the evaluation of Newton. This is not to suggest that Cam will not be a successful NFL QB, but that NFL teams should be careful to factor in the high level of uncertainty when drafting him relative to the other highly regarded QBs.

Player  School  Grade Compl% High Low Std Err.
Cam Newton  Auburn  3.12 65.4% 70.9% 59.9% 5.5%
Jeff Van Camp  Florida Atlantic  2.14 57.2% 61.4% 53.0% 4.2%
Scott Tolzien  Wisconsin  2.26 68.1% 71.8% 64.4% 3.7%
Ryan Colburn  Fresno State  2.11 62.1% 65.8% 58.4% 3.7%
Greg McElroy  Alabama  2.34 66.3% 69.9% 62.7% 3.6%
Jordan La Secla  San Jose State  2.1 59.1% 62.7% 55.5% 3.6%
Tyrod Taylor  Virginia Tech  2.36 57.2% 60.5% 53.9% 3.3%
Ricky Stanzi  Iowa  2.5 59.8% 63.0% 56.6% 3.2%
Ryan Mallett  Arkansas  3.02 57.8% 60.9% 54.7% 3.1%
Blaine Gabbert  Missouri  3.14 60.9% 64.0% 57.8% 3.1%
Christian Ponder  Florida State  2.83 61.8% 64.9% 58.7% 3.1%
Ben Chappell  Indiana  2.27 61.1% 64.0% 58.2% 2.9%
Jerrod Johnson  Texas A&M  2.33 58.6% 61.5% 55.7% 2.9%
Jake Locker  Washington  2.92 53.9% 56.8% 51.0% 2.9%
Taylor Potts  Texas Tech  2.22 66.3% 69.1% 63.5% 2.8%
Colin Kaepernick  Nevada  2.9 58.2% 60.9% 55.5% 2.7%
TJ Yates  North Carolina  2.38 62.3% 65.0% 59.6% 2.7%
Andy Dalton  TCU  2.88 61.7% 64.3% 59.1% 2.6%
Nathan Enderle  Idaho  2.41 54.6% 57.2% 52.0% 2.6%
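For anyone who wants to reproduce the table, the intervals follow from the normal approximation to the binomial; the "Std Err." column appears to be the 95% margin of error (1.96 times the standard error). The attempt count used for Newton below is a guess chosen only to reproduce the table's +/- 5.5%, since the post does not list attempt counts.

```python
import math

def ci_95(p, n):
    """95% confidence interval for a completion rate p observed over n
    attempts, using the normal approximation to the binomial."""
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin

# Cam Newton: 65.4% career completion rate. 290 attempts is a hypothetical
# figure chosen to illustrate the calculation, not his actual total.
low, high, margin = ci_95(0.654, 290)
print(f"{low:.1%} to {high:.1%} (+/- {margin:.1%})")  # roughly 59.9% to 70.9% (+/- 5.5%)
```

Because the margin shrinks with the square root of attempts, a QB with four times as many passes has half the uncertainty, which is why the low-attempt passers cluster at the top of the table.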

Tuesday, April 19, 2011

Accepting an 18 Game Schedule

The NFL owners have made it clear that they want an 18-game regular season and will push the players hard to get that concession. The primary objection to the 18-game schedule from the players' point of view is, of course, the increased risk of debilitating injury, which would likely shorten careers and lower the total earning potential of individual players. While the incidence of injury within a season will almost certainly rise with more games, much of the discussion also turns on the issue of concussions, which is not well understood. After much public pressure and increasing evidence of the effect of repeated brain injury on football players, the NFL is taking steps to at least better handle concussions on the field. The question remains, though, of just how damaging concussions and other more minor repeated brain injuries really are, and that uncertainty allows the NFL to be less than fully aggressive not only in minimizing the impact of specific injuries, but in monitoring and understanding the long-term effects of these injuries.

If the players are serious about understanding the impact of repeated brain injury, as I suspect at least some of them are, then they should take a lesson from the state attorneys general from across the country who sued the tobacco companies, and insist on disclosure. Disclosure has been a part of many settlements with the tobacco industry, in which the tobacco companies agreed to make their internal documents available to anyone, including researchers in a broad array of fields such as chemistry, biology, marketing, economics, and political science. These documents have allowed scientists to learn not only about the behavior of the companies, but to gain a greater understanding of the broader health and policy issues related to tobacco.

The deal for the players should actually require a lot less from the NFL than those settlements required of the tobacco companies. Instead of forcing the disclosure of previously secret memos and financial information, the players should request the collection and disclosure of new information. In exchange for the 18-game regular season, players should request the following three items:

  1. Brain scans at the beginning and end of the season for every NFL player.
  2. Insertion of a chip into every player's helmet that tracks the impact that the player's brain is exposed to both on a play by play basis and on a cumulative basis.
  3. Disclosure at the end of the season of all the data (anonymized) to any and all interested parties.
This data would very quickly help researchers understand the full impact that repeated brain injury has on the brain over the course of an NFL season, as well as allow all interested parties to see the effect of various policies and equipment on reducing that impact. It would go a long way toward moving the debate from its current stage of arguing about how serious the issue is, to figuring out the best way to protect players from these types of injuries.

Wednesday, April 13, 2011

Blind Side Project Help

Last season I ran a pilot study on valuing offensive linemen. Some of the results are described here. The results are promising, but are clearly limited in usefulness, as only a sample of games from a few teams is covered. In order to move the process forward, I am looking for a few reliable volunteers who would be willing to assist in collecting the data. If you are willing to spend hours watching and re-watching plays to collect the needed data, I want to hear from you. All consistently contributing volunteers will have access to the complete data set.

The data collection requires some basic understanding of offensive line play, as well as some training on a specific game to ensure high-quality and consistent data. If you are interested, please send an email to: quantsports at gmail dot com. In the email, please rate your knowledge of offensive line play and explain your rating. Also, please indicate which team(s) you would be able to cover (you must have your own video access that can rewind and pause the games you chart) and how many games you would be able to do in a week (games typically take 2 to 3 hours to complete for experienced trackers).

Thank you to all willing volunteers.

Sunday, March 27, 2011

Business Intelligence on YouTube

After reading Freedarko extol the wonders of YouTube and what it has done for basketball, I thought it might make sense to see what YouTube has done for analytics/business intelligence. What I learned is that a lot of companies like to advertise their products on YouTube, and that some folks have no idea how to pair their quality audio content with meaningful video content. I also learned that there are some interesting BI-related videos available on the web. Here are a few:

Saturday, March 26, 2011

The Undisputed Guide to FREEDARKO

Sitting on my sofa with a beer next to me as the battle for a seat in the Final Four of college basketball unfolds seems about as near to an ideal setting for writing a review of Freedarko's Undisputed Guide to Pro Basketball History as it gets. Before discussing the book, though, I must explain why a review of it is appropriate on a blog devoted to analytics; I mean, nowhere in its 223 pages does it give so much as a shout-out to Dean Oliver, and the only advanced metric it utilizes is the Jazz-O-Meter. The motivation is to remind the analytically minded among us that there are elements of the sports we analyze that we can't get at with our metrics. There are stories that lie outside of our experiences, and a richness to the sport that can be forgotten when we focus solely on the data.

My adventure with the Undisputed Guide began on my trip to the Sloan Sports Analytics Conference earlier this month. As I was packing, I looked at the ever-growing stack of books that I need/want to read and whittled it down to two choices: The Book of Basketball: The NBA According to The Sports Guy or FreeDarko Presents: The Undisputed Guide to Pro Basketball History. Deciding which hardcover to slip into my computer bag was easy, as Simmons's book would have required me to pack my back brace along with my computer, while FD's slim 223-page tome was significantly more portable. Once the decision was made and I found myself taxiing for takeoff, I opened it up to soak in the knowledge.

What I got was an entirely unique view into the history of professional basketball. Starting with the inception of the sport, the FD team brings the reader from the barnstorming years (as a side note, some day there has to be a movie that features a game in which the All American Red Heads triumph over the Terrible Swedes; it has to happen) to LeBron, with unique insight, humor, and totally original visualizations.

The visualizations are truly the key that separates FD from the rest, not just because of what each individual graphic communicates (e.g., 20 years of draft history color-coded for college experience and post-draft value), but because their inclusion drives home the point that what FD provides is the color. I am an analyst and I live in the data. FD brings the game outside of the data and reminds me that it is not just about delving deeper and deeper into the data to find the right answer; what we often refer to as "noise" is actually full of great stories, if not definitive answers.

One near-perfect example of this from the book is the chapter on the ABA, subtitled "What the Hell Was the ABA?". This is a provocative question, and it largely goes unanswered, mostly because there is no definitive answer. The ABA had a profound effect on the NBA in terms of salary and style of play, but it was also fueled expressly by businessmen looking to cash in. Large contracts were mirages, and while the league was often at the cutting edge of marketing, it could also be as hokey as they come. As an analyst, I am used to asking questions and delivering the best answer that I can. FD presents a host of interesting questions, and reminds us that some of them are so multi-dimensional that they really don't have answers.

After reading a series of analytics and related books, settling back into the wonders of sport is a welcome reminder of my original motivation for being involved in sports: the fun of it all.

Differentiating Sports Analytics

I had a conversation recently with USF analytics professor Vijay Mehrotra that resulted in Prof. Mehrotra's column in the January issue of Analytics Magazine. As part of this conversation, I discussed with Vijay the general issues that arise when trying to implement an analytics program in a sports organization. After I shared my experiences, Vijay shared his wealth of experience with logistics and analytics in a variety of business sectors, and we were both struck by the similarities. A very common story, in sports and in business generally, is for leaders with a wealth of domain knowledge to be told how to change the way they operate by people who have nowhere near the same experience or "feel" for the industry. Understandably, domain experts are hesitant to embrace insights and new management styles from those who have not had the same deep experiences in the industry that they have.

This story in sports of course is a coach or general manager being told how to change their strategy or which players to add to their roster based on a series of calculations that they may or may not understand. For a business leader, the story is a CEO being told that their past "gut" level decisions have often been wrong and that they should change how they view their industry by looking more closely at information that is the result of a series of calculations that they may or may not understand. Clearly the data is different and the techniques used may be different, but there are many overlapping principles in how to both technically and culturally implement a strong analytics program in sports and in other industries.

The three elements of analytic systems (data management, predictive analysis, and information systems) are the same in sports and other industries. Many of the cultural hurdles are also the same, so is sports analytics a specialty within the general field of analytics or is it a field unto itself that draws on a similar set of tools? The answer is yes. Sports analytics is simultaneously a specialty within the broader field and a unique field that draws on a similar set of tools.

Before explaining how sports analytics can occupy that space, let's explore why it is important to understand the uniqueness of sports analytics as a field. An analytics program has the highest probability of success when, going in, all parties involved understand the possibilities and value of sports analytics. If sports analytics is viewed as either just a specialty or as completely distinct, some of the potential value of the program can be lost.

Sports Analytics as a Sub-Field
When sports analytics is viewed as a specialty within the field of analytics, there is a tendency to seek out experts in analytics and ask them to build a product for a sports organization. The basic problem with this approach is that consultancies that specialize in building analytic systems, such as Accenture, have built their businesses by building systems that help managers maximize revenues. This can take many different forms, and different industries have their own components, divisions, and peculiar industry structures, but they are all focused on profits. While I would never argue that professional sports teams are not trying to maximize profits, the sport side of the organization tends to be evaluated not on a profit basis, but on a wins basis.

Coaches who win keep their jobs regardless of how profitable the larger organization is; general managers who build consistent winners become presidents of the organization, regardless of their ability to manage the business side of the team. Therefore, the analytic systems built for the sport side of the organization need to be focused on maximizing wins, and the difference between the objectives (wins and profits) is not trivial.

Wins vs. Profits
Wins are a fixed resource. In the NFL, for example, there are only 256 games, and each team plays only 16. A win for one team means a loss for another. Each game is also a binary event: you either win or you lose. This creates a very different set of objectives and manner of evaluation. In business, a project can be evaluated on a clear and consistent basis: does the investment in the project have an expected return on the invested resources that is high enough? On a team, the question is not what kind of return on invested assets an activity has, but rather whether the activity can increase the odds that the team wins one more game. The binary nature of the payoff has a significant impact on how information is used, the types of questions that get asked, and the impact that information can have.

Type of Information
The other major difference between sports and most businesses is the type of information that is used. Even the least analytic CEO has read a P&L; they have seen and used quantitative information on a regular basis throughout their careers. While coaches and GMs have certainly seen box scores their entire lives, the information that is far more important to them is the film. Coaches review film for tendencies and strategic information, and player personnel folks watch hours and hours of film to evaluate players. This creates a unique situation in which the best analytic systems need to incorporate film as an additional type of data, and any new metrics that are developed need to jibe with what the decision makers see on the film. In one respect, the analytic systems serve as a mechanism to help decision makers more efficiently process the film that they see (i.e., point them to the most important and descriptive film).

Sports Analytics as a Separate Field
The thoughts above may suggest that sports analytics is its own unique field and is only superficially connected to the application of analytic tools in business. This of course overlooks the commonalities of the fields and how these commonalities can be utilized to increase the effectiveness of both fields. The danger for the fields is that they, like information in many organizations, become siloed and the advances in one field never get leveraged to advance the other. While the data and objectives may be different, there is significant overlap in many of the tools and techniques used throughout the process.

Data visualization, for example, is a rapidly developing field, and there is significant experimentation going on in business, academia, and sports. Feedback from end users will be gathered on the effectiveness and utility of various techniques in all areas, and only if the fields are connected will these advances be rapidly deployed.

So in the end sports analytics is the same as regular analytics, but different.

Wednesday, March 23, 2011

Impact of defensive pressure, distance on pass completion

The 2010 Brazil Serie A data set provides information on passing that has not been previously available. The first rule of good analysis is that when you get new data, the first thing to do is look at it. So I started by generating a histogram of the distance of all 118,445 passes in the data set. The result (below) is much as expected, with a mean pass distance of 21.6 yards and more passes below 20 yards than above 30 yards.
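
That "look at the data first" step can be sketched as follows. The actual Serie A data set is not public, so this uses simulated pass distances with a right-skewed distribution (the gamma parameters are my own assumption, chosen to give a mean near the observed 21.6 yards):

```python
# Sketch of a first-look histogram on simulated pass distances.
import numpy as np

rng = np.random.default_rng(0)
# Simulated distances in yards, skewed toward shorter passes
# (shape/scale are illustrative assumptions, mean = 20 yards).
distances = rng.gamma(shape=2.0, scale=10.0, size=118_445)

mean_dist = distances.mean()
counts, bin_edges = np.histogram(distances, bins=range(0, 101, 10))

for lo, n in zip(bin_edges[:-1], counts):
    print(f"{lo:3d}-{lo + 9:3d} yds: {n}")
print(f"mean distance: {mean_dist:.1f} yards")
```

Even on simulated data, the same qualitative pattern appears: the mass of the distribution sits below 20 yards, with a long tail of longer passes.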

Click here for the rest of the article.

Thursday, March 17, 2011

Who's Your Analyst?

Teams, like all organizations, can easily become a group of silos. Each group is so focused on its own tasks that it is very difficult to make time to interact with other groups. This is usually not a good structure, because one hand then rarely knows what the other is doing, and instead of one team focused on complementing each other, the groups become independent actors focused only on doing what they do best. To put this in basketball terms, most organizations have a bunch of Allen Iversons - great at what they do, but not focused on the overall team goal. From an analytics perspective, this kind of structure can lead not just to inefficient management, but to poor analysis and often wasted time and money.

As teams look to expand their analytic capability, it is tempting to have a draft analyst, a pro player analyst, a game strategy analyst, etc., and have each of these analysts sit under a manager in those departments. This initially seems like the proper solution: analysts become embedded in a functional area in which they are supposed to become expert, and on a daily basis they assist those who are trying to utilize and learn from the information the analyst generates. But this structure leads to the silo inefficiencies mentioned above. The best organizations have, instead of putting an analyst function within each department, created a department of analytics that acts principally as a consulting group for internal clients.

Businesses that have structured themselves this way have generally found that analysts are able to communicate with each other on a more regular basis, allowing for more sharing of techniques and creative brainstorming around challenging analytic problems. Additionally, within the context of a sports team, having a central analytic consulting group allows for consistency in the language and style in which the analysis is presented.

The consistency of the message is particularly important in an organization that is trying to incorporate a type of information that it has not utilized in the past. If every analyst has their own style and manner of presenting data, then instead of one core institutional language, each function within the team will have its own language. On a team where, eventually, the scouts, coaches, trainers, etc. all have to get on the same page, having one core analytic language means that no one has to translate between the groups.

An additional benefit of this structure is the ability to rapidly and efficiently deploy the analytic capacity within the organization. At different, and predictable, times during the season, different departments may require more analytic capability than their normal baseline needs. For example, as the draft approaches, the amateur personnel department may have a greater need for analysts than it does in the months after the draft, or coaches may want extra analytic firepower if the team makes the playoffs.

To be fair, though, most sports teams have yet to reach the point where this is an issue. Their analytic capabilities are at this point mostly one or two individuals, who often do work as an internal consulting group. But as teams begin to invest more in their capabilities in this area, these issues become inevitable and must be carefully thought through. Happily for those starting up now, much of the research in this area has been done and the results are clear.

Monday, March 14, 2011

Rethinking the NFL

The current labor dispute in the NFL will be negotiated and litigated over the coming months. At its core, though, it is not just players against owners, but rather a three-way negotiation among the players, the small-market owners, and the large-market owners.

This of course is not an original insight, but it is a point that is important to keep in mind as we think about what the NFL could look like. If the litigation actually goes to judges and juries for final decisions and the appeals are all exhausted, one possible outcome is that the NFL is declared, by law, to be acting as an illegal monopoly in violation of antitrust law. If this occurs, then what have become standard labor practices, such as the draft and the salary cap, will essentially be banished forever. It is worth considering what the NFL becomes in that situation, if for no other reason than for all parties to understand what is at stake so they can get back to the bargaining table and put a new CBA together.

Once all 32 NFL teams are prohibited from working together to set the labor market for professional football players, some teams will go high (imagine Jerry Jones with no salary cap) and some will go low (imagine Mike Brown with no salary floor). Economic theory and common sense both agree on the outcome here - a league with a few very good teams that dominate competition and a few awful teams that do not even have a reasonable chance to win 4 or 5 games a season.

Once this situation is reality, the league will either become unwatchable (how many times do you really need to see the Globetrotters beat the Generals?) or it can move to a radically different league structure. There is, of course, a workable structure for a league with this type of financial makeup that is in practice in many parts of the world: hello, relegation!

In an NFL with relegation, we could have three divisions, with the top 12 teams in the first division, the next 12 in the second division, and whoever is left, or wants to start a team, in the third division.

In a relegation/promotion league, teams can play anyone and earn points based on the quality of teams they beat (a win against a first division opponent is worth more than a win against a third division opponent). Teams that earn the most points either stay in the first division or get promoted up to the first division and teams that do not maintain a high enough point total over a season get relegated down to lower divisions. Each division can have its own playoff system so even the Bengals can have a shot at a playoff game every once in a while.
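
The point system described above can be sketched in a few lines. The weights here are my own invention for illustration; any scheme where beating a stronger opponent is worth more would work the same way:

```python
# Hypothetical scoring for the promotion/relegation idea: a win is
# weighted by the division of the opponent beaten.

POINTS_BY_OPPONENT_DIVISION = {1: 3, 2: 2, 3: 1}

def season_points(wins):
    """wins: list of opponent divisions beaten, e.g. [1, 1, 3]."""
    return sum(POINTS_BY_OPPONENT_DIVISION[d] for d in wins)

# Two 10-win seasons can earn very different totals:
tough_schedule = season_points([1] * 8 + [2] * 2)  # 8*3 + 2*2 = 28
soft_schedule = season_points([3] * 10)            # 10*1 = 10
print(tough_schedule, soft_schedule)
```

The gap between the two totals is what makes promotion and relegation work: a team cannot climb, or stay up, by padding its record against weak opponents.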

Then there is what I'll refer to here as the Ellison Effect. Imagine if anyone could just start a professional football team and begin competing in the third division. Would Larry Ellison (or any of the other billionaires lying about) be interested in perhaps starting a team or two or three in Los Angeles? Perhaps an extra team in Chicago? Sure, they would take some losses as they started up and tried to move up the division ladder, but the financial promise of the first division would be more than enough to tempt a few wealthy business folks to give it a try.

I could not begin to put a probability on this scenario, but it is one that is intriguing to me, because I think the drama of teams starting up and recruiting players as well as seeing teams compete to keep their place in the higher divisions would be exhilarating.

From the perspective of most of the owners, though, this is probably a less than enticing scenario. They have grown to enjoy their multi-billion dollar TV contracts and packed stadiums. The prospect of hoping to fill a 20,000-seat stadium in the third division (I'm looking at you, Buffalo) is probably not something that too many owners really want. Players too, at least as a collective, probably don't love the idea that half of the high-paying (and high minimum contract) jobs could be gone. So with that incentive, I invite the various groups of owners and players to continue their negotiations, and not make the judges decide the future of the NFL.

Saturday, March 12, 2011

Analytics and Communication

In his excellent book on the research behind the theory of deliberate practice, Geoff Colvin distills volumes of often complicated academic research into clear, understandable prose that is easily accessible to a wide audience. Colvin uses a series of carefully chosen anecdotes to explain the various dimensions of this complicated theory (yes, there is more to it than 10,000 hours, by the way). What is remarkable about the book is that it accurately reflects the messages and strength of the research that has been done. There are varying levels of evidence from the research for different aspects of the theory, and those points are clearly made. The goal of this post is not to sell more copies of Talent is Overrated (though I have provided a link in case you are interested), but rather to make the point that clearly communicating research is at least as important as the work itself. This is just as true for statistical analysis as it is for scholarly research.

At the 2010 Sloan Sports Analytics Conference, I was sitting next to a high-level NBA executive at a research presentation. The work being presented was interesting, if not revolutionary. When the presentation was over, I went to the front of the room to ask a few questions of the presenter, but was beaten to the punch by the exec I was sitting with, who said (and I paraphrase here a bit), "Oh my God, you can talk." And it was true: the presenter had distilled some very complicated analysis down to its core message and accurately conveyed the strengths, potential, and limits of the work in such a way that the audience could clearly understand it. If the presenter had not been able to communicate his research to a non-geek audience (OK, it was SSAC 2010, so non-geek is overstating it), he would not have been speaking to a full room by the end of his presentation, never mind having extended conversations with NBA execs about it.

Communicating statistical analysis is a careful balance between the strengths and limits of the analysis. The story of the analysis has to be conveyed in such a way that a non-geek user of the information can use the analysis properly. Your projection may show that a player is going to improve their rebounding by 20% over the next three years, but you also need to convey the risk associated with that analysis. What is the range of likely outcomes? What are the risks?
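
One way to answer the "range of likely outcomes" question is to report a projection as an interval rather than a point estimate. The sketch below uses entirely hypothetical numbers: a player at 8 rebounds per game, projected to improve 20% on average, with real uncertainty around that growth rate:

```python
# Reporting a projection as a range, not a point estimate.
import numpy as np

rng = np.random.default_rng(42)

current_rpg = 8.0                        # current rebounds per game
# Simulated growth rates: mean +20%, std dev 10 percentage points
# (both numbers are illustrative assumptions).
growth = rng.normal(0.20, 0.10, 10_000)
projected = current_rpg * (1 + growth)

point_estimate = projected.mean()
low, high = np.percentile(projected, [10, 90])

print(f"point estimate: {point_estimate:.1f} rpg")
print(f"80% of simulated outcomes fall between {low:.1f} and {high:.1f} rpg")
```

Presenting the interval alongside the point estimate is exactly the hedge the analyst needs later: when the player lands inside the stated range but below the headline number, the analysis was still right.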

It is tempting when working with team executives to make it all too simple and speak in absolutes, especially when others are making similar statements about their point of view. It is incumbent on the analyst not to fall into this trap, though, because our analysis does contain variance and we will be wrong. When we are wrong, it becomes easy to dismiss the analysis if we have spoken in absolutes, while if we have clearly communicated (and accepted ourselves) what our research actually says, then, while we may not win every argument, we will win more over the long term.

It is also possible to become so enamored of the clever techniques we have used to solve a problem that we lose sight of the problem we were trying to solve. It is rare that you will run into an exec who understands or truly cares about how cutting edge the techniques are. They want to know that they are getting good information that they can have confidence in, not that you used some slick new neural network algorithm in R to get the results. This is one of the reasons the communication piece can be so tricky. We have confidence in our results because of the techniques used, but while you may want to have a one-sentence description of what you actually did ready in case they ask, management will only gain confidence by seeing the results.

After spending hours and hours carefully constructing your analysis, be sure to put a significant amount of time into deciding how to present it. Think like your audience, and consider what will help them use the analysis properly. If you don't communicate the analysis effectively, then the analysis will be wasted.

Wednesday, March 9, 2011

Players: One Definition

Ask a businessperson to define a customer, and the definition will depend largely on their function within the business. If they are in sales, they will talk about points of contact and sales histories, but if they are in product development, they will talk about demand for new features and usability. Finance, marketing, and HR would also likely have different definitions of a customer. It is not hard to see, however, how a business could benefit from having one comprehensive definition of what a customer is, one that brings all of that information together in one place. R&D could then look at sales records to see if the new features under development match the needs and wants of the most profitable customers.

A sports team is no different. Every function within the sport side of a professional team has its own definition of a player. Coaches are often focused on current performance, while the personnel side is often focused on scouting information, and trainers are often focused on health-related information. While each of these groups has an interest in the information the other groups have, it is not their focus, and it is often difficult for the various types of information to be synthesized.
Coaches collect, process, and analyze a wide variety of information. Game data includes quantitative data such as performance metrics related to play calls, video, and qualitative grades. Practice data often includes video, specific measurements, and observations. Classroom information includes information on preparedness, and the ability to understand and process game plans.

The personnel operation collects information from a variety of sources. Intelligence is generally qualitative information on a player's background and any current personal issues. Personality information may include quantitative or qualitative psychological assessments as well as anecdotal information gathered from other players/coaches/friends. Specific skill data can be quantitative or qualitative information on a player's strengths and weaknesses.

The trainers and medical staff focus on a rich set of qualitative and quantitative information regarding a player's injury history, including type, treatment, and recovery times. They often use information on how players train and the frequency with which they train, as well as their pre- and post-training routines. They also monitor nutrition and hydration.

These are all, of course, gross simplifications of the broad classes of information utilized by different functions on a team. What is important is not the specific types of information, but rather the synergistic value of keeping one definition of a player that is updated and analyzed by all functions within the organization.
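
As a rough sketch, "one definition of a player" is a single record that coaching, personnel, and medical all read and update, rather than three separate files. The field names below are invented for illustration:

```python
# Sketch of a unified player record shared across team functions.
from dataclasses import dataclass, field

@dataclass
class Player:
    name: str
    # Coaching view
    performance_grades: dict = field(default_factory=dict)  # play -> grade
    classroom_notes: list = field(default_factory=list)
    # Personnel view
    scouting_reports: list = field(default_factory=list)
    psych_assessment: str = ""
    # Medical / training view
    injury_history: list = field(default_factory=list)
    training_habits: str = ""

# Different functions update the same record...
p = Player(name="Prospect A")
p.classroom_notes.append("slow to process new game plans")   # coaches
p.training_habits = "skips post-practice recovery work"      # trainers

# ...so a GM asking why the prospect is not developing sees both at once.
print(p.classroom_notes[0], "/", p.training_habits)
```

The value is not in the particular fields but in the shape: one record, many writers, so cross-departmental questions can be answered without a request to another function.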

Once all the types of raw data and information on a player are consistently collected in one place, the coach who is wondering how to better motivate a player can see from intelligence information what other coaches have done in the past, or the general manager who is wondering why a high-potential prospect is not developing can see that the player is not processing information in the classroom well and has poor post-training habits.

While all of this already occurs, it usually occurs when a decision maker in one function asks for information from another function. A response may take 5 minutes, or it may take 3 days - by which time the original thought that led to the request is gone. An organization that has one definition of a player has a system that allows thorough and creative analysis to flow freely, unconstrained by the response time of other members of the organization.

The bottom line is that having one definition of what a player is focuses an organization on what it believes is most important about a player, and drives the resources, strategic thinking, and tactical analysis of the organization through that definition. That process gives the organization's long-term strategy the best probability of success.