By Joshua Iversen
The MLB First-Year Player Draft has always been something of a shot in the dark. Every year, over 1,200 amateur players are selected across 40 rounds. Only a tiny fraction of them will ever make it to Triple-A, let alone the Major Leagues. But as difficult and sometimes random as the draft can be, it is one of the most significant ways teams acquire talent. It is imperative that teams draft well – especially those with lower payrolls.
As important as the draft is, teams are working with finite resources. Each team’s scouting department works tirelessly to evaluate thousands of high school and college players across the nation, but they are often stretched thin. Front offices can only trust a scout’s eye so much, and, at some point in the decision-making process, they must sit down and look at the numbers that players are producing to help make their choice.
But, when doing so, are teams taking into account the environments these amateurs are playing in?
Robert Frey, the Southwest Assistant Regional Manager for Evolution Metrix, recently used data from the past two seasons of NCAA college baseball to compute park factors for each Division 1 college ballpark.
“My motivation was to find a deeper understanding of why player statistics are what they are,” Frey said. “Secondly, I want to bring awareness to college baseball and would like to see more advanced statistics in that department as well. The process wasn’t easy at first, there wasn’t a program that would allow me to quickly obtain a team’s park factor, but I was able to write a program in R to get a team’s park factor with a single click of a button.”
Frey’s park factors are an important development when it comes to evaluating college players. The average baseball fan knows that the Colorado Rockies have an incredibly hitter-friendly ballpark in Coors Field, while Oracle Park in San Francisco is much more favorable towards Giants pitchers. When evaluating the success (or lack thereof) of Major League players, it is essential to keep in mind the environment in which they play all of their home games and how it might affect their numbers.
But for college, even though these park environments are less well known, this process might be even more important. Scouts are scattered across the country, and no team can see every single draft-eligible player. By the late rounds of the draft, teams are often taking players they have rarely seen in person and might even be drafting by the numbers. But, when doing so, are they accounting for how park factors might inflate or diminish these statistics?
“Some teams I have spoken with have an internal database of these park effects and use them to some degree in evaluating collegiate players,” Frey said.
To find out if teams were doing so accurately, I used data from the 2017 and 2018 First-Year Player Drafts, as those were the two seasons Frey used to compute his park factors. Using Baseball America’s Draft Database, I matched each drafted college player with the park factor for his home ballpark. I separated hitters and pitchers, as each should experience their ballpark’s impact in opposite ways. I then ran a linear regression for each group, with park factor as the predictor and draft position as the response, to find the impact of park factor on draft position.
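The matching and regression steps described above can be sketched in a few lines. This is an illustration in Python, not the code used for this study: the school names, park factors, and pick numbers are made up, and `match_park_factors` and `fit_line` are hypothetical helper names.

```python
from statistics import mean

# Hypothetical park factors keyed by school (100 = neutral); not Frey's values.
PARK_FACTORS = {"School A": 112, "School B": 95, "School C": 103, "School D": 88}

# Hypothetical draftees: (name, school, role, overall pick number).
DRAFTEES = [
    ("Hitter 1", "School A", "hitter", 45),
    ("Hitter 2", "School B", "hitter", 610),
    ("Hitter 3", "School C", "hitter", 230),
    ("Hitter 4", "School D", "hitter", 900),
    ("Pitcher 1", "School A", "pitcher", 720),
    ("Pitcher 2", "School D", "pitcher", 88),
    ("Pitcher 3", "School E", "pitcher", 301),  # no park factor on file
]


def match_park_factors(players, factors, role):
    """Pair each drafted player of one role with his home park factor."""
    return [
        (factors[school], pick)
        for _name, school, player_role, pick in players
        if player_role == role and school in factors
    ]


def fit_line(pairs):
    """Least-squares fit of pick = intercept + slope * park_factor."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hitters and pitchers are fit separately, as in the article.
hitters = match_park_factors(DRAFTEES, PARK_FACTORS, "hitter")
pitchers = match_park_factors(DRAFTEES, PARK_FACTORS, "pitcher")
hitter_slope, hitter_intercept = fit_line(hitters)
```

With this made-up data the hitter slope comes out negative, meaning hitters from more hitter-friendly parks carry lower pick numbers. That says nothing about the real draft data; it only shows how the sign of the slope is read.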
I elected to include all drafted players whether they signed or not, because even if a player doesn’t sign, the selection shows that a team was willing to use the pick on him. This means that the handful of players who were drafted in 2017, did not sign, and were drafted again in 2018 are counted twice. I also chose to exclude notable two-way players Brendan McKay and Tanner Dodson, as both were drafted and are currently being used as both hitters and pitchers. I did not exclude players such as Hunter Greene who have shown two-way skills but have not been used two ways.
If teams are in fact overlooking collegiate park factors, then the linear regression graphs for hitters should show a negative correlation. That would mean that as the park factor increased, the draft pick number would decrease – or, the more hitter-friendly a ballpark rated, the earlier in the draft the hitter was taken. The opposite would be true for pitchers: the more hitter-friendly a ballpark, the later pitchers would be taken, producing a positive correlation.
I do not expect incredibly high Adjusted R-Squared values for any of these analyses. This is because many variables go into each draft pick, including, but not limited to, scouting reports, makeup, defensive performance, and expected signing bonus. As a result, even a fairly low Adjusted R-Squared value can be notable.
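For readers unfamiliar with the statistic, here is a minimal sketch of the standard Adjusted R-Squared formula. The function name and the example inputs are illustrative, not figures from this study.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors=1):
    """Adjusted R-squared: R-squared penalized for the number of predictors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical fit: a raw R-squared of 0.001 on 604 observations, one predictor.
example = adjusted_r_squared(0.001, 604)
print(example)  # slightly below zero
```

Because of the penalty term, the adjusted value drops below zero whenever the raw R-squared is smaller than the number of predictors divided by (n - 1). That is how a regression with one predictor and several hundred players can report a negative Adjusted R-Squared.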
The first test I ran included all the 2017 and 2018 data, totaling 604 hitters and 724 pitchers from 296 D1 schools. First, the hitters:
The results showed an Adjusted R-Squared value of 0.001491, which is very low. There did not appear to be any significant relationship between the two variables, but the negative slope of the line of best fit does match our hypothesis. Next, the pitchers:
Nothing significant here at all, with an Adjusted R-Squared of -0.00071. At first glance, when including all players drafted in 2017 and 2018, there doesn’t appear to be a notable relationship between college park factor and draft position.
Next, I decided to isolate the extreme ballparks. The more neutral college environments likely wouldn’t make any notable impact on a player’s statistics and therefore wouldn’t impact a team’s draft decisions. However, the more extreme parks could certainly make a difference. I selected only players from the top 75 most hitter-friendly parks and the top 75 most pitcher-friendly parks, and again began with the hitters:
With an Adjusted R-Squared of 0.003822, this appears to be the strongest relationship so far. This test also had our lowest p-value so far, at 0.1483, making it the closest to statistical significance, though still short of the conventional 0.05 threshold. The pitchers:
Again, we see the draft position for pitchers showing no relationship whatsoever with park factors. The Adjusted R-Squared value for this test was negative once again at -0.00157, meaning the fitted line is no more useful for predicting draft position than simply using the mean.
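For readers who want to check significance figures like the ones quoted in these tests, the sketch below computes the slope, its t-statistic, and an approximate two-sided p-value for a one-predictor regression. It uses only the Python standard library and substitutes a normal approximation for the exact t distribution, which is an assumption that is reasonable only for samples in the hundreds. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def slope_significance(xs, ys):
    """Return (slope, t_statistic, approx_two_sided_p) for a simple regression."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err
    # Assumption: normal approximation to the t distribution (large samples).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return slope, t_stat, p_value


# Hypothetical park factors and pick numbers, for illustration only.
park = [90, 95, 100, 105, 110, 115]
pick = [700, 300, 650, 200, 500, 250]
slope, t_stat, p_value = slope_significance(park, pick)
```

A larger absolute t-statistic corresponds to a smaller p-value, so the test with the lowest p-value is the one closest to significance.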
The first few rounds of the draft are the most important for teams to get right. Thus, most selections in these first handful of rounds are scouted heavily. Teams are less likely to simply scout the stat line when making these choices, meaning they could be influencing our data. So, next I returned to the original complete 2017-18 data set and removed all players selected in the first 300 picks of their draft. The hitters:
Surprisingly, this was the least successful test for the hitters. With an Adjusted R-Squared of -0.00209 and a very high p-value, as well as a positively sloped line of best fit, this test showed essentially no relationship. Next, the pitchers:
Another negative Adjusted R-Squared, this time at -0.00198. It still appears that there is little to no relationship between draft position and park factor for pitchers.
Finally, I combined the two qualifiers and used only players from the most extreme environments selected later than 300th overall. This was obviously my smallest data sample of the tests, but would hopefully provide the most telling and significant results. The hitters:
Again, this test proved unsuccessful, with an Adjusted R-Squared of -0.00354. The pitchers:
The fourth and final test for pitchers yet again had a negative Adjusted R-Squared at -0.00354, identical to that of the hitters. The pitchers also managed a negatively sloped line of best fit in all four tests, contrary to our hypothesis.
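The four samples used in these tests come from two filters applied separately and then together: keep only the most extreme parks, and keep only picks after the 300th. A rough sketch of that selection logic follows. The helper names, schools, and values are hypothetical, and the default `k=75` and `pick_cutoff=300` mirror the thresholds described in the text.

```python
def extreme_schools(park_factors, k):
    """Schools with the k lowest and k highest park factors."""
    if k <= 0:
        return set()
    ranked = sorted(park_factors, key=park_factors.get)
    return set(ranked[:k]) | set(ranked[-k:])


def build_samples(players, park_factors, k=75, pick_cutoff=300):
    """Split (school, pick) records into the four samples described above."""
    extremes = extreme_schools(park_factors, k)
    matched = [(s, p) for s, p in players if s in park_factors]
    return {
        "all": matched,
        "extreme_parks": [(s, p) for s, p in matched if s in extremes],
        "late_picks": [(s, p) for s, p in matched if p > pick_cutoff],
        "extreme_and_late": [
            (s, p) for s, p in matched if s in extremes and p > pick_cutoff
        ],
    }


# Hypothetical data: five schools, one drafted player each.
factors = {"School A": 112, "School B": 95, "School C": 103,
           "School D": 88, "School E": 100}
players = [("School A", 45), ("School B", 610), ("School C", 230),
           ("School D", 900), ("School E", 350)]
samples = build_samples(players, factors, k=1, pick_cutoff=300)
```

Each filter shrinks the sample, and the combined filter shrinks it the most, which is why the final test has the least statistical power of the four.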
Unsurprisingly, between our four tests there does not appear to be a very strong correlation between park factors and draft position. This is especially true when it comes to pitchers. This could be because, when drafting pitchers, teams may be more focused on scouting reports and velocity numbers. It makes sense that pitchers would be looked at differently than hitters, especially with many teams taking gambles on projectability rather than performance.
For the hitters, however, there does seem to be something here. The correlation was small, but still worth mentioning. In our first two tests, using the full data set and that of only the most extreme environments, we did see that the more hitter-friendly a batter’s ballpark was, the more likely a team was to use an earlier draft pick on him. In the draft, every little bit of an advantage counts. Even with such a small relationship, if teams can correct for this and pay closer attention to park factors for hitters, they could see their drafts become more successful.
Potential sources of error for this study include sample size (using only two years of draft data) and the possibility that parks were changed between 2017 and 2018 – moving fences back or in, adjusting foul territory, etc.
In conclusion, teams appear to be properly accounting for the player’s environment when drafting college pitchers. However, when it comes to hitters, they tend to slightly favor those who get the benefit of a favorable hitting environment. These teams could stand to benefit by accounting for Frey’s park factors, which could certainly have other uses within the industry as well.
“I want people to take my research and expand upon it,” Frey said. “I definitely see it being applied in terms of strength of competition or applying it within a given state or conference.”