Beware of Advanced Stats In The Hands of Less-Advanced Statisticians: The Fallacy of Tom-aruhi Gilbert-miya
Follow me @SharkCircle
If you’re not familiar with the anime the title is referencing, just pretend I titled it “The Tom Gilbert Fallacy Saga” or that I made a Vampire Diaries joke instead. Something like that was the more sensible title I was going to go with before I decided to be ridiculous instead.
Always choose ridiculous.
I should probably hashtag that. And maybe copyright it for an ad campaign, maybe for like a Katy Perry fashion line.
“Always Choose Ridiculous™” © 2013 – 2050. There. I’ve covered all my bases. Copyrighted. Katy Perry here I come!
Please don’t read that the wrong way.
Anyway, (changes to official-sounding voice), hello and welcome to a series of blogs I will be doing over the next year, looking at some of the misinformation going around the hockey blogosphere under the guise of “advanced statistics.” My stance on these “advanced statistics,” such as corsi-based stats, is that they are helpful tools which, when used correctly, can tell you certain things about teams and players that traditional NHL stats cannot. However, they are not the perfect, all-illuminating representation of a team’s or player’s value that some people, who champion these stats as if they were a new religion, would have you believe.
Unfortunately, the more mainstream the use of these stats becomes, the more I see the stats being used incorrectly, where those presenting them often arrive at erroneous conclusions about teams or players because of mathematical or deductive errors, and then present their erroneous conclusions as “facts” to their audience.
And the truth is, even if many of the “advanced stats” champions were not messing up their arithmetic en route to their conclusions, many of their findings still would not be proven, greater truths about a given player, team, or the NHL game.
For example, if an NHL player has the best shots-taken/shots-taken-against differential by his team while he is on the ice in the NHL, also known as a corsi-rating, that does not make him the proven best player in the NHL. What it makes him is the player with the best team-shots-taken/team-shots-taken-against differential while he is on the ice, period.
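For readers who like to see the arithmetic, here is a minimal sketch of what an on-ice corsi rating boils down to. The attempt totals are made-up numbers, purely for illustration:

```python
def corsi_rating(attempts_for: int, attempts_against: int) -> tuple:
    """On-ice corsi: the shot-attempt differential and Corsi-For % (CF%)
    for a player's team while he is on the ice."""
    diff = attempts_for - attempts_against
    cf_pct = round(100.0 * attempts_for / (attempts_for + attempts_against), 1)
    return diff, cf_pct

# Hypothetical season: 950 attempts for vs. 880 against while on the ice
print(corsi_rating(950, 880))  # (70, 51.9)
```

Note that nothing in this calculation knows anything about the quality of those attempts, which is exactly the limitation discussed below.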
Shots are not goals. Unfortunately for statisticians, goals are not the ultimate measure of a player’s worth, either, especially for defensemen, and they are often subject to chance, which is why the pioneers of this new set of statistics turned to shots in the first place. Shots are a much more frequent occurrence in the NHL than goals, and not as affected by chance and goaltending quality. Thus it makes perfect sense that a new set of statistics based around shots was developed.
The only error is in thinking that shot-based stats are infallible, that all the chance and sample size issues which cloud goal-based stats are completely absent from shot-based stats, and that crunching a couple of corsi-related statistics on a player will enlighten you to some irrefutable greater truth about his value. For example, a little more than a year ago I read somewhere that Martin Havlat was a great “two-way player” because of his corsi. Not so true.
My goal here is not to discredit this new set of statistics, as I actually believe they can be a valuable tool in analyzing our game.
When used correctly.
And that’s really what this series of blogs is about. I’m just tired of reading innumerable articles where the writer presents his opinion as if it is a fact simply because he learned what “corsi” meant a few days earlier and decided to embed some of it in his blog.
Although, in this, my first blog on the subject, I will actually examine the “advanced stats”-based analysis of some blogs that have been using these stats for a while, because even the veterans seem to be fooled by the complexity of these stats at times.
There is no shame in that and everyone makes mistakes; I just hope my look at their analyses here will teach people to not just accept what they read about a player as fact simply because it has some charts attached to it. In my experience, most of the bloggers who use these stats as the basis of all their analysis make mathematical mistakes which render their conclusions inaccurate.
Again, I’m not trying to point fingers or assign blame; my point is simply that trying to reduce a player’s performance to a numerical value through these new statistics is at best a complicated, inexact science, and I think all of us who either present or consume these statistics need to be more mindful of that when attempting to come to conclusions about a player, coach, or team based solely on these statistics.
In today’s example, I will look at a blog posted on Fear The Fin by “The Neutral” which states that “Tom Gilbert was fantastic last season”, and links back to a Copper & Blue blog post as proof for this stated fact.
The blog posted on Fear The Fin is about the author’s picks for the US Olympic Hockey Team in 2014, and he has Tom Gilbert on his first pairing, with Kevin Shattenkirk not on the team at all. Keith Yandle is on the team, but only as a seventh defenseman, while Tom Gilbert, Matt Carle, and Cam Fowler are all ahead of him and taking a spot on the roster over Shattenkirk.
I personally believe Kevin Shattenkirk and Keith Yandle are both better than all of Tom Gilbert, Matt Carle, and Cam Fowler, but for the purposes of this blog, I’m only going to focus on the author’s assertion that Tom Gilbert not only deserves to be on this team ahead of the likes of Shattenkirk (and Yandle on the depth chart), but that “he was fantastic last season.”
First, why does Fear The Fin writer The Neutral believe that and present it to his readers as fact, with a link attached that is supposed to provide the proof? Well, apparently the reason is because Copper & Blue writer Derek Zona did a blog titled “Edmonton Oilers’ Zonestart Adjusted Scoring Chances” in which Zona asserts that Gilbert had the best Adjusted Scoring Chance Differential per 15 Minutes of Ice Time of any defenseman on the Oilers team.
This means that Zona is saying Gilbert was on the ice for more scoring chances by his team as compared to scoring chances against his team per 15 minutes ice time than any other Oilers’ defenseman, once adjusted for zone starts.
What are zone starts, or Offensive Zone Start %? It is the percentage of total faceoffs taken in the offensive zone when the player is on the ice, without counting neutral zone faceoffs. In other words, the ratio of offensive zone faceoffs to defensive zone faceoffs when the player is on the ice, expressed as a percentage of total non-neutral-zone faceoffs.
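As a quick sketch of that definition (with made-up faceoff counts for illustration), the calculation is just:

```python
def zone_start_pct(oz_faceoffs: int, dz_faceoffs: int) -> float:
    """Offensive Zone Start %: the offensive-zone share of all
    non-neutral-zone faceoffs taken with the player on the ice."""
    return round(100.0 * oz_faceoffs / (oz_faceoffs + dz_faceoffs), 1)

# Hypothetical sheltered player: 300 OZ faceoffs vs. 200 DZ faceoffs
print(zone_start_pct(300, 200))  # 60.0
```

A number above 50% suggests a player deployed in offense-friendly situations; below 50% suggests tougher, own-zone deployment.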
Now you’re probably noticing that the “advanced stats” I mentioned earlier in the blog were corsi-based, which is what most of these bloggers use. However, Copper & Blue used something different but similar in their article here: “Scoring Chances.”
What constitutes a scoring chance, you ask? It’s subjective, but apparently some of these bloggers decided to record scoring chances themselves for the entire year, with the help of, it appears, anyone who wanted to help as part of a “Scoring Chance Project.” Sounds foolproof.
However, the “scoring chance” stat is not based on hard, objective data, and the numbers they used are only as reliable as the people recording the data, so coming to concrete conclusions like “he was fantastic last year” off of just this one blog, like Fear The Fin did, even if Copper & Blue had done everything else right after recording the “scoring chance” data, is questionable at best. Especially when you consider it wasn’t the same person with the same standards recording scoring chances for every team, but different people with different standards for almost every team. A fun, informative group-project to be sure, but a scientific study worthy of being pointed to as definitive proof for the value of a player? No.
But a bigger problem than the subjectivity involved in their data gathering is this. First those championing the corsi-based stats want us all to accept that shot quality doesn’t exist, that it always evens out, and that therefore we only have to look at shot quantity, also known as corsi, in order to determine a player or team’s value. This is of course convenient when it comes to their stats since shot “quality” is a lot harder to measure numerically than “quantity,” and if they did have to admit shot quality mattered, it would render their shot-quantity-based stats and the analysis gleaned from them flawed.
But fine, ignoring shot quality is one thing with corsi, and I won’t get into the issues with that in this blog. But now all of a sudden it’s the same with the scoring chance data these people subjectively recorded? We’re going to ignore quality and only look at scoring chance quantity here, too? Wasn’t the whole point of this “Scoring Chance Project” to get a better read on the quality of chances teams were creating than corsi allowed for? (That, incidentally, is one of the few admissions I’ve seen from the corsi-based stats community that their stats are flawed. Usually you hear the opposite.)
Here is why that is so absurd in this case. According to Copper & Blue’s article, Tom Gilbert was on the ice for 3.984 Adjusted Scoring Chances For per 15 minutes of Ice Time vs. 3.806 Adjusted Chances Against per 15 minutes of Ice Time, a difference of only 0.178 or 4.46%.
So you’re really going to look at a 4.46% difference of plus-0.178 (adjusted) scoring chances per 15 minutes of play and say that means a player had a fantastic season? A player who everyone who actually watched him thought played poorly?
Here’s the huge problem with that. How is a 0.178 difference per 15 minutes, or 4.46%, not within the margin of error, either in terms of scoring chance quality, or in terms of the subjective process of recording scoring chances? Even if it were absolutely true that all shot quality evens out eventually (corsi), and that even all scoring chance quality evens out eventually, too, it still wouldn’t all even out enough over one single season’s worth of one player’s shifts to make a difference of 0.178 scoring chances per 15 minutes significant. With that small a difference in scoring chance quantity, the quality of those chances is all the more important, especially for a player whose critics’ chief complaint against him is that he makes bad decisions and gives the puck away in high-scoring areas for the opposition.
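To put a rough number on that margin of error, here is a back-of-the-envelope simulation. The chance rate (~3.9 per 15 minutes each way) is rounded from Copper & Blue’s figures above, but the ice-time assumption (~1,500 even-strength minutes, i.e. 100 fifteen-minute blocks) and the model itself (chances as independent Poisson events) are my own simplifications. It asks: how often would a perfectly neutral player, with zero true impact, post a differential of +0.178 or better over one season purely by luck?

```python
import math
import random

random.seed(42)

def poisson(lam: float) -> int:
    """Knuth's Poisson sampler (fine for a small lambda like ours)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

RATE = 3.9     # true chances per 15 min, identical for and against (neutral player)
BLOCKS = 100   # assumed ~1,500 even-strength minutes, in 15-minute blocks
TRIALS = 2000  # simulated seasons

lucky = 0
for _ in range(TRIALS):
    chances_for = sum(poisson(RATE) for _ in range(BLOCKS))
    chances_against = sum(poisson(RATE) for _ in range(BLOCKS))
    if (chances_for - chances_against) / BLOCKS >= 0.178:
        lucky += 1

# Roughly a quarter of purely-neutral seasons clear the +0.178 bar by luck alone
print(f"{100.0 * lucky / TRIALS:.1f}% of simulated neutral seasons beat +0.178 per 15 min")
```

If anything, this understates the noise, since it ignores the additional subjectivity of hand-recorded scoring chances and the quality question entirely.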
In other words, what people say made him bad was never necessarily a huge quantity of scoring chances he gave up while on the ice, but the quality of those chances. Therefore to point to a blog that found he was on the ice for 0.178 more Adjusted (I’ll get to that later) Scoring Chances For per 15 minutes Ice Time than (Adjusted) Scoring Chances Against as proof all the critics were wrong about him being prone to awful giveaways and poor decisions is ridiculous because it does not disprove or even address the main criticism against the player in the first place.
Explained a different way: these blogs suggest Tom Gilbert was fantastic because he was on the ice for 4.46% more Adjusted Scoring Chances For than Adjusted Scoring Chances Against. 4.46%. Even if the quality of the shots generated from these scoring chances they recorded always would even out eventually to an exact mean over the sample size of multiple seasons across the entire team, they almost certainly would not even out to the mean during the sample size of only one season’s worth of one player’s shifts.
Meaning that, over the course of one season, it is completely possible for any player to be on the ice for scoring chances against their team that are on average of a 4.46% higher quality than the scoring chances their own team generates, let alone a player like Tom Gilbert with a reputation for frequent bad giveaways and bad decisions that result in higher-than-average-quality scoring chances against his team.
And as any scientist or statistician will tell you, once you admit that is at least possible, and likely even probable, you can no longer come to any hard conclusions based only on that data. Because if that were the case with Gilbert, his slight 4.46% positive-value in team-scoring chance (quantity) differential while he was on the ice would be completely negated by negative-4.46% value in team-scoring chance quality differential while he was on the ice.
Yet Fear The Fin is ready to name him to the US Olympic Team over Kevin Shattenkirk and Keith Yandle, players who produced much better traditional stats than Gilbert last season (like goals, points, +/-, and other meaningless dinosaur stats), while representing unproven conjecture as fact with statements like “he was fantastic last season. He can move the puck efficiently.” Why? Because of the “Scoring Chance Project”? That trumps goals, points, and +/-? Really? Fear The Fin even goes on to question Minnesota Wild coach Mike Yeo’s decisions regarding his team’s defensive pairings. Clearly, he doesn’t know what they know, what they have proven.
Except their entire basis for questioning this coach is incorrect. I realize it wasn’t intentional, but it nevertheless spreads erroneous information about players or coaches, presented as fact. And many hockey fans who read “findings” like this just assume it’s proven fact as well, because reducing a player to a numeric value relative to other players gives that magic number an assumed objectivity and importance, as everyone understands numbers to be factual, and who can argue that three is more than two?
But it is the often-flawed process that awards one player the value of three and another the value of two, for example, that most hockey fans don’t have time to analyze and double-check for errors.
So we are seeing, over and over again, people publishing faulty statistics and representing them as fact, indoctrinating the hockey fans who read these conclusions with false information. This also affects the players, teams, and coaches, because these blogs create false perceptions about many of them among NHL fans, positive or negative, once again presented as fact, when the methods used to reach these conclusions at best involve margins of error within the stats and missing information outside of what they capture, and at worst include errors in math or deduction which render entirely false conclusions instead of just incomplete or inexact ones.
That’s a perception I’m hoping to change with this blog.
Let’s move forward, as unfortunately there are other problems with these blogs on Tom Gilbert, too. Copper & Blue also asserts that Gilbert putting up the team’s best zone-start-adjusted scoring chance differential despite facing the toughest quality of competition makes him just that much more amazing, yet they completely leave out the quality of his teammates.
You can’t introduce a variable like his quality of competition without contextualizing it against the average quality of his teammates. For all we know he always got to play with the Oilers’ four best players, and those players are the real reason for any success Gilbert had on the corsi or scoring-chance meters, not his own play. Probably not, but that’s the point: we don’t know, because Copper & Blue only included the contextual information that served their point and made Gilbert look better, without including the information that should always go side-by-side with quality of competition: quality of teammates.
Moreover, Fear The Fin is equating “best on the Oilers” (in adjusted scoring chance differential per 15 minutes) to being “fantastic,” but the Oilers defense was terrible last year, ranking 23rd in Goals Allowed, and even that doesn’t do justice to how bad they were as a group. They were among the worst in the NHL. Only one other defenseman on the Oilers even had a positive adjusted scoring-chance differential last season according to the data these bloggers compiled, and that’s Andy Sutton. Well, he’s second best, so maybe he’s not fantastic, but he must at least be great, right? One of the sixty best defensemen in the NHL, if I’m doing my math correctly?
Then there’s the fact that Gilbert’s numbers did not look nearly as sunny in the first place before the “adjustment” happened. Copper & Blue (either themselves or using someone else’s model that they linked to; I can’t be bothered to go through it all) decided that one offensive zone start “would be worth 0.425 scoring chances,” which seems awfully high to me. Since Tom Gilbert started more shifts in the defensive zone than the offensive zone, they added +0.425 Scoring Chances For to his overall tally for every extra shift he started in the defensive zone, which is probably the only reason he even ends up with a positive chance-differential in their chart.
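To make the mechanics of that adjustment concrete, here is a sketch using their 0.425 conversion; the raw differential and faceoff counts are hypothetical, since I don’t have Gilbert’s actual totals in front of me:

```python
CONVERSION = 0.425  # Copper & Blue's assumed scoring-chance value of one OZ faceoff

def zone_start_adjust(raw_diff: float, oz_starts: int, dz_starts: int) -> float:
    """Credit the player CONVERSION Scoring Chances For per extra
    defensive-zone start, per the method described above."""
    return round(raw_diff + CONVERSION * (dz_starts - oz_starts), 2)

# Hypothetical player: raw differential of -10 chances, 40 extra DZ starts
print(zone_start_adjust(-10.0, 260, 300))  # -10 + 0.425 * 40 = 7.0
```

Notice how a clearly negative raw differential turns positive purely on the strength of the conversion constant, which is why the value of that constant matters so much.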
But then when you look in the comments, there is another blogger, also SB Nation, from http://www.nucksmisconduct.com/behind-the-numbers, saying that he found Copper & Blue’s conversion-number to be too high.
The author of that comment, “DanTheStatMan1,” says he found that an offensive zone faceoff would only be worth 0.23 scoring chances, not 0.425. That’s a huge difference.
So who are we to believe? If DanTheStatMan1 is right and the conversion-number Copper & Blue used is wrong, Tom Gilbert probably doesn’t even have a positive chance differential anymore, even after being adjusted for zone starts, and every conclusion Copper & Blue presented about Gilbert, and then Fear The Fin represented as fact based on Copper & Blue’s conclusions, is absolutely, 100% wrong.
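The sensitivity to that single conversion number is easy to demonstrate. With the same kind of hypothetical raw numbers (a -10 raw differential and 40 extra defensive-zone starts; my illustrative figures, not Gilbert’s actual totals), the sign of the adjusted differential flips depending on whose conversion you trust:

```python
def adjusted_diff(raw_diff: float, extra_dz_starts: int, conversion: float) -> float:
    """Zone-start-adjusted differential under a given OZ-faceoff conversion."""
    return round(raw_diff + conversion * extra_dz_starts, 2)

raw, extra_dz = -10.0, 40  # hypothetical player

print(adjusted_diff(raw, extra_dz, 0.425))  # Copper & Blue's number:  7.0 (positive)
print(adjusted_diff(raw, extra_dz, 0.23))   # DanTheStatMan1's number: -0.8 (negative)
```

Same player, same season, opposite conclusions, with nothing changed but a disputed constant.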
And a closer look into the number Copper & Blue used, 0.425, reveals another problem. Without looking into whether that number was calculated correctly (I assume it wasn’t because it sounds too high), or if the number they should have used was 0.23 like the commenter on their blog claimed, Copper & Blue does reveal that they got the number 0.425 from a “scoring chance project for the Rangers”, plus data from the Washington Capitals, which got averaged and then calculated into a conversion of scoring chance differential per offensive zone faceoff.
What’s the problem? Even if they had properly calculated the conversion of scoring chance differential per offensive zone faceoff for those teams, that data is only specific to those teams! You can’t use that data for Tom Gilbert and the Edmonton Oilers! The New York Rangers and Washington Capitals averaged plus-0.425 scoring chances for every offensive zone faceoff, not the Oilers! Not Tom Gilbert.
You can’t use scoring chance differential as the basis for saying Tom Gilbert is a “fantastic” player, the implication being that Tom Gilbert actually has an effect on team scoring chances for and against while he is on the ice, and then use scoring chance data from other teams that he has nothing to do with to change his scoring chance numbers. You are literally giving Tom Gilbert credit for scoring chances created and prevented by the New York Rangers and Washington Capitals. Those teams were better offensively and defensively, and better at winning faceoffs, than the Oilers last year, which means it’s quite probable they converted offensive zone faceoffs into positive scoring-chance differentials at a higher rate than Tom Gilbert and the Oilers did. And even if they didn’t, then you are punishing Tom Gilbert with the Rangers’ and Capitals’ offensive zone faceoff scoring-chance differentials, and you still don’t have the accurate data for Gilbert.
You simply cannot use the scoring chance data for other teams to change Tom Gilbert’s numbers. That’s a basic mistake. And even if you used the Edmonton Oilers’ average rate of converting offensive zone faceoffs into positive-scoring-chance-differentials, you still couldn’t use that to supplement Gilbert’s data, because he’s not on the ice for every Edmonton Oilers offensive zone faceoff, so even then you would be polluting his data with the impact of other players, just this time it would mostly be his teammates with a splash of his own impact as opposed to players on completely different teams with nothing to do with him.
What Copper & Blue should have done was figure out the Oilers’ scoring-chance differential per offensive zone faceoff when Gilbert was on the ice and used that as the conversion rate, provided they actually calculated that number properly this time.
Even then they would have run into even more sample size and scoring chance quality issues, but at least they wouldn’t be supplementing Gilbert’s data with the accomplishments or failures of other teams.
In conclusion, does everyone see the problem now with these stats-based analyses being presented as irrefutable fact? Correction: the many problems? In coming to their conclusions, Copper & Blue and Fear The Fin relied on so many improbable assumptions, omitted so many variables, and made so many mathematical mistakes that their ultimate findings were doomed to be incorrect from the start. Analyzing advanced stats in hockey is essentially akin to solving an algebra problem, where a player’s overall value is X, each available data category is a variable, and you need to figure out how each piece of data relates to the others and how much weight to give each category in order to condense them into one value: the player’s value, or X. But just like in algebra, if you leave out one of the variables or simply forget it exists (like when Copper & Blue included quality of competition in their analysis but forgot quality of teammates, to take one of the lesser examples), you will come to an incorrect solution. You simply can’t type all the wrong numbers and formulas into a calculator and expect to get the right answer.
This is what Copper & Blue and Fear The Fin did, and then to top it off, they forgot that even if Copper & Blue hadn’t omitted variables and come to conclusions based off unproven assumptions instead of hard data, in other words even if they had done everything right up until the end of their analysis, their conclusions still would not have been irrefutable fact because there is always a margin of error when it comes to shooting percentage or scoring chance quality over small sample sizes, like one year’s worth of one player’s shifts.
And then to make matters even worse, we have a commenter from this same family of SB Nation blogs calling into question the formula they used for the very basis of their analysis, meaning it’s possible their whole blog on the topic would have been sabotaged from the start even if they hadn’t made all the errors I outlined and used scoring chance data from other teams to change Tom Gilbert’s data.
Moral of the story: Don’t always believe what you read without verifying it for yourself, and by golly, please beware of advanced stats in the hands of less-advanced statisticians, (including myself!).
For those who have made it to the end, thank you for reading! You deserve a medal! The length is because I wanted to address all the errors in these blogs regarding Tom Gilbert, just to illustrate how many are currently being made in the “advanced stats” blogosphere. If it only takes one mistake like the ones I’ve pointed out here to skew a statistical analysis, and there are this many in just one or two blogs, you have to assume that most of the analysis you read out there based on these advanced stats contains at least one mistake, and therefore its conclusions are often going to be faulty.
That’s why you have to take what you read with a grain of salt and confirm for yourself the process the author used to come to his conclusions before taking them at face value. Thanks again for reading! And if you’re interested and your eyes haven’t burned out of your head yet, be sure to check out my other blogs from this week on the home page.
Written by Shark Circle