Fancy Stats: The Good, the Bad, and the Outlook
A little look at why I like advanced stats... and where they still fall short.
I first encountered so-called "advanced" stats on this very website. I don't remember which article specifically, but I clearly remember looking up things like Corsi and Fenwick after seeing them mentioned here years ago. As someone who's always loved stats in general, I found it a fun new way of looking at the game I'd always loved to watch. Hockey could satisfy my need for number-crunching while also satisfying my need for slick playmaking and bruising hits. Now, I don't consider myself an expert in these stats, and I know I never will be. Lots of people far more knowledgeable than me are continually designing new ways to present stats, or tweaking old stats to better represent the flow of play.
I also realize that not everybody loves the stats. That's fine. I think there's definitely room for a middle ground here. The stats aren't the be-all and end-all. If anything, the study of stats in hockey has shown that there is always room to improve them. For example, people who study Corsi (shot attempt differential) have measured it at even strength, then at 5-on-5 specifically, then only in close-score situations, and the current preference seems to be adjusting for the score situation. Another thing the study of stats has shown is that a lot of the game is due to luck, or at least to things the stats can't yet explain. The stats developed so far hardly explain the whole picture. It doesn't work when the no-stats crowd shouts down the statistophiles as nerds who need to watch the game. It also doesn't work when the pro-stats crowd shouts down the statistophobes as dumb meatheads who only hate stats because they failed Grade 8 math.
Still, if you're stats-curious, here's a little look at why I like the stats -- and where there's still room to improve.
The Good
The first main thing I like is the idea of correlation. I'm guessing that if you've ever taken even an hour of stats instruction in your life, you know the statistical mantra: "correlation does not imply causation".
Source: xkcd.com
The flip side of this argument, though, is that if there's no correlation at all, a causal link is very unlikely. That's part of why Corsi and Fenwick (unblocked shot attempt differential) are seen as more predictive than traditional stats. A player with a positive Corsi or Fenwick differential is quite likely to post a positive one again the following year. A player's plus-minus is much harder to predict from one year to the next.
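For the stats-curious, here's a minimal sketch in Python of how those two differentials fall out of raw shot counts. The ShotAttempts record, the function names, and the numbers are all made up for illustration; this isn't how any particular stats site computes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotAttempts:
    """Shot attempts by one side over some stretch of play (hypothetical record)."""
    on_goal: int   # attempts that reached the net (saves and goals)
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked before reaching the net

    @property
    def all_attempts(self) -> int:
        # Corsi counts every shot attempt, blocked or not.
        return self.on_goal + self.missed + self.blocked

    @property
    def unblocked_attempts(self) -> int:
        # Fenwick leaves out the blocked attempts.
        return self.on_goal + self.missed


def corsi_differential(team: ShotAttempts, opponent: ShotAttempts) -> int:
    """Shot attempts for minus shot attempts against."""
    return team.all_attempts - opponent.all_attempts


def fenwick_differential(team: ShotAttempts, opponent: ShotAttempts) -> int:
    """Unblocked shot attempts for minus unblocked shot attempts against."""
    return team.unblocked_attempts - opponent.unblocked_attempts


# Made-up numbers for a single game.
team = ShotAttempts(on_goal=30, missed=12, blocked=14)
opponent = ShotAttempts(on_goal=25, missed=10, blocked=9)

print(corsi_differential(team, opponent))    # 56 - 44 = 12
print(fenwick_differential(team, opponent))  # 42 - 35 = 7
```

The same arithmetic works for a single player if you only count attempts taken while that player is on the ice.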
Teams follow the same trend. In this article by Travis Yost of TSN, he looks at the correlation between some team stats from year to year. The most valuable table of his piece is included here:
Perfect repeatability would get a score of 1. So as you can see, Fenwick and Corsi aren't great, but they are far superior to some of their counterparts. The fact that there is no correlation between shooting percentage from one year to the next suggests that it's almost completely luck-based. Only a handful of players can sustain a higher-than-average shooting percentage, which means that, on average, no team can. It's also important to note that teams rarely keep the same players between years. Changes in players and coaches feed into these calculations, so if a team saw no turnover, the repeatability might be a little higher. A team's success should be evaluated based on something that is likely to occur again in the coming year. Hence the importance of stats.
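A repeatability score like this is a correlation between one season's numbers and the next. As a rough sketch, assuming a plain Pearson correlation (I'm not claiming that's the exact method behind the table), it could be computed like this. The pearson_r helper and the team values are hypothetical, not real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and sums of squared deviations for each series.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a series never varies")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five teams, one entry per team, in the same order.
corsi_pct_year1 = [52.1, 48.3, 50.5, 46.9, 53.4]
corsi_pct_year2 = [51.0, 49.2, 49.8, 47.5, 52.6]

# Closer to 1 means last year's number tells you more about this year's.
print(round(pearson_r(corsi_pct_year1, corsi_pct_year2), 2))
```

A stat that lands near 0 here is one where last season tells you next to nothing about the next one.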
The Bad
The biggest problem with the stats is that they only give an idea of how good a player or team is. They don't suggest how to improve. Bobby Ryan raised a classic objection to shots-based metrics:
@mikemorris1234 I can shoot a million pucks from the wall.. Is it better than one from the middle?
— Bobby Ryan (@b_ryan9) November 1, 2014
And the thing is, he has a point. Corsi can tell you whether or not what you're doing is working, but you don't become a better team just by forcing more shots to pad it.
A good analogy I think is temperature. Imagine you feel sick, and you go to a doctor. They take your temperature, and tell you that you have a fever. Now imagine you ask them what to do about it, and they tell you to get colder. Eating ice, making snow angels, and swimming in Lake Superior aren't going to cure your fever. Your temperature is a measuring stick; it tells you there's a problem, but it doesn't tell you how to fix it.
Corsi and Fenwick are like temperature. They can tell you there's a problem, but they can't tell you how to fix it. They can tell you you're doing something well, but they don't tell you what.
The Outlook
People have been putting a lot of work into the "how" portion of advanced stats. One example is people looking at zone entries, and how controlled zone entries tend to work better than dump-and-chase for generating shots. However, the Los Angeles Kings are typically one of the top possession teams in the league year after year, and Darryl Sutter plays a notorious dump-and-chase system. So clearly zone entries are just one small piece of the puzzle. Forechecking, defensive systems, puck battles, and dozens of other things can play a role in how a team performs.
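To give a flavour of what the zone-entry work involves, here's a small sketch of the tally behind a "controlled entries generate more shots" comparison. The record format (entry type, shot attempts that followed) and the numbers are my own assumptions for illustration, not anyone's actual tracking scheme or results.

```python
from collections import defaultdict


def shots_per_entry(entries):
    """Average shot attempts generated per zone entry, grouped by entry type.

    `entries` is an iterable of (entry_type, shot_attempts) pairs,
    one pair per hand-tracked zone entry.
    """
    counts = defaultdict(int)  # entry type -> number of entries
    shots = defaultdict(int)   # entry type -> total shot attempts that followed
    for entry_type, attempts in entries:
        counts[entry_type] += 1
        shots[entry_type] += attempts
    return {entry_type: shots[entry_type] / counts[entry_type] for entry_type in counts}


# Made-up tracking data: six entries from one period.
tracked = [
    ("controlled", 1), ("dump", 0), ("controlled", 0),
    ("dump", 1), ("controlled", 2), ("dump", 0),
]

rates = shots_per_entry(tracked)
print(rates["controlled"])  # 3 attempts over 3 entries = 1.0
```

Every one of those pairs has to come from someone watching the play and writing it down.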
There have been several projects in which people scrutinize hockey games, trying to quantify specific aspects of the game. Very interesting results have come out of these. One problem, though, is that the work is extremely labour-intensive. It takes hours to watch a single hockey game to gather statistics on one type of play, such as zone entries. Gathering a meaningful amount of data can take years, and few people can afford to spend years on this type of task. Another problem I've heard about from a couple of people I've met who do this is that watching hockey starts to lose its enjoyment. When you spend hours watching games to see who wins a puck battle, and whether it leads to a shot attempt soon after, it becomes hard to watch hockey for fun.
So what will happen? One thing is for certain: if a team starts hiring video teams to analyze these types of plays, we as fans will never see the results. The potential competitive advantage is huge. And I wouldn't be surprised to see any fan who makes it far along this path get scooped up by an NHL team, as happened with so many notable stats bloggers in summer 2014.
So where does this leave us? I don't really know. I love the stats, and I know some people don't. I know the stats tell us lots, but there's still lots more they don't tell us. I don't know if there will be more major breakthroughs in the near future, or if it will just be incremental improvement. I do know, though, that advanced stats have changed the way I view hockey. And I expect they will continue to shape the way I watch it for years to come.