Exploratory Data Analysis using Minor League Batting Statistics

As with my graphical look at the Nationals minor league pitching stats, I wanted to do the same with their minor league hitting stats by team. I decided to look at how each Nationals minor league team's OPS compares to its league and level. OPS is a player's OBP (on-base percentage) added to their SLG (slugging percentage). It measures how well a player is doing offensively when those two metrics are taken into account together.
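To make the definition concrete, here's a minimal sketch of OPS computed from raw counting stats using the standard OBP and SLG formulas. The stat line is made up for illustration and isn't a real player from the data set.

[code language="r"]
# Hypothetical season line for one hitter (made-up numbers, illustration only)
h <- 50; doubles <- 10; triples <- 2; hr <- 5   # hits and extra-base hits
bb <- 20; hbp <- 2; sf <- 3; ab <- 180          # walks, hit by pitch, sac flies, at-bats

# Total bases: singles count once, doubles twice, and so on
singles <- h - doubles - triples - hr
total_bases <- singles + 2 * doubles + 3 * triples + 4 * hr

# Standard definitions of on-base percentage and slugging percentage
obp <- (h + bb + hbp) / (ab + bb + hbp + sf)
slg <- total_bases / ab

# OPS is simply the sum of the two
ops <- obp + slg
print(round(c(OBP = obp, SLG = slg, OPS = ops), 3))
[/code]

Because OBP and SLG use different denominators, OPS isn't a true rate stat, but it's a handy one-number summary.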

Since pitchers also bat, I needed to do some data cleaning or the numbers wouldn’t make sense. To clean the data I removed every player who didn’t have more than 20 plate appearances this season. The original data set had 3255 data points. After adding that filter I got down to 2384 data points. Here’s the layout by level:

Level Data Points
SS-A 273
A 514
A+ 507
AA 518
AAA 572
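Here's a small sketch of that filter on toy data, so you can see what gets dropped. The player names and values are hypothetical. Only the PA and Lvl column names come from my real data set.

[code language="r"]
library(dplyr)

# Toy data (hypothetical players); PA and Lvl mirror the real column names
toy <- data.frame(
  Name = c("Hitter A", "Hitter B", "Pitcher C", "Hitter D", "Pitcher E"),
  Lvl  = c("A", "A", "A", "AA", "AA"),
  PA   = c(250, 21, 4, 180, 20),
  stringsAsFactors = FALSE
)

# Strictly more than 20 PA, so a player with exactly 20 is dropped
kept <- filter(toy, PA > 20)

print(nrow(toy))        # rows before the filter
print(nrow(kept))       # rows after the filter
print(table(kept$Lvl))  # remaining data points per level
[/code]

The cutoff of 20 is a judgment call. A higher threshold would remove more noise but would also shrink the short-season sample even further.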

For each league at each level I wanted to get the average OPS and compare it to how the Nationals affiliates are doing. The table below shows those numbers.


League/Team Level OPS
New York Penn League SS-A .628
Northwest League SS-A .653
Auburn Doubledays SS-A .599
Midwest A .645
South Atlantic A .675
Hagerstown Suns A .736
California A+ .702
Carolina A+ .672
Florida State A+ .653
Potomac Nationals A+ .679
Eastern AA .699
Texas AA .669
Southern AA .678
Harrisburg AA .685
Pacific Coast AAA .727
International AAA .677
Syracuse AAA .631

Note: Data covers the season up to 7/1/2016
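If you want to put a number on the gaps, here's a rough sketch that compares each affiliate to the average of the leagues at its level. The OPS values are copied from the table above. Taking a simple unweighted mean of the league averages is my own simplification for illustration, not something the real data requires.

[code language="r"]
# League OPS values copied from the table above
leagues <- data.frame(
  lg  = c("New York Penn", "Northwest", "Midwest", "South Atlantic",
          "California", "Carolina", "Florida State",
          "Eastern", "Texas", "Southern", "Pacific Coast", "International"),
  lvl = c("SS-A", "SS-A", "A", "A", "A+", "A+", "A+",
          "AA", "AA", "AA", "AAA", "AAA"),
  ops = c(.628, .653, .645, .675, .702, .672, .653,
          .699, .669, .678, .727, .677)
)

# Nationals affiliate OPS values copied from the same table
affiliates <- data.frame(
  tm  = c("Auburn", "Hagerstown", "Potomac", "Harrisburg", "Syracuse"),
  lvl = c("SS-A", "A", "A+", "AA", "AAA"),
  ops = c(.599, .736, .679, .685, .631)
)

# Unweighted mean of the league averages at each level (a simplification)
lvl_avg <- aggregate(ops ~ lvl, data = leagues, FUN = mean)
names(lvl_avg)[2] <- "lvl_ops"

# Join each affiliate to its level average and compute the gap
cmp <- merge(affiliates, lvl_avg, by = "lvl")
cmp$diff <- cmp$ops - cmp$lvl_ops

# Best to worst relative to level
print(cmp[order(-cmp$diff), c("tm", "lvl", "ops", "lvl_ops", "diff")])
[/code]

A positive diff means the affiliate is hitting better than the typical league at its level, and a negative diff means worse.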

Here’s a look at the data graphically:


Auburn is a small sample size, so I wouldn’t pay too much attention to the short-season portion of the graph just yet. Hagerstown is our best-performing offensive team based on OPS. Their team OPS is better than the average OPS for the two leagues at their level (South Atlantic League and Midwest League). Overall, Hagerstown (.735) has the second-best team OPS in their league (1st is Asheville at .756) and the third-best OPS at their level (1st is Bowling Green at .764). Harrisburg and Potomac are each performing a little above the average for their level. On the other end of the spectrum from Hagerstown, Syracuse has a bottom-five team OPS.

Here’s a look at only the Nationals minor league affiliates’ OPS:


In my next blog post I’m going to look at two of the catalysts of the Hagerstown offense, Max Schrock and Victor Robles.

Part 2 Source Code:

[code language="r"]

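# Load plyr before dplyr, as recommended when both are used together
library(plyr)      # ddply()
library(dplyr)     # filter()
library(reshape2)  # melt()
library(ggplot2)   # ggplot()

# getDfFromDir() is not defined in this listing (assumed to be a helper that
# reads the data files in a directory into one data frame)
minors_batting <- getDfFromDir("dataDir")
# Only use cases with more than 20 PAs
minors_batting <- filter(minors_batting, PA > 20)
# Drop rows with missing values
minors_batting <- minors_batting[complete.cases(minors_batting), ]

# Team, league, and affiliate OPS, each an unweighted mean of player OPS
minors_tm_ops <- ddply(minors_batting, .(Tm, Lg, Lvl, Franchise), plyr::summarise, ops = mean(OPS))
minors_lg_ops <- ddply(minors_batting, .(Lg, Lvl), plyr::summarise, ops = mean(OPS))
aff_ops <- ddply(minors_batting[minors_batting$Franchise == "Washington Nationals", ], .(Tm, Lvl), plyr::summarise, ops = mean(OPS))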

lg_melt_data <- melt(minors_lg_ops)
aff_melt_data <- melt(aff_ops)

#Rename vars so you can bind
colnames(lg_melt_data) <- c("lg_tm", "lvl", "variable", "value")
colnames(aff_melt_data) <- c("lg_tm", "lvl", "variable", "value")
total_melt_data <- rbind(lg_melt_data,aff_melt_data)

# Leagues and Nationals affiliates side by side at each level
ops_graph <- ggplot(data = total_melt_data, aes(x = lvl, y = value, fill = lg_tm)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("WSH Minors OPS Per Level")
print(ops_graph)

# Graph of only the Nationals affiliates
nats_ops_graph <- ggplot(data = aff_melt_data, aes(x = lvl, y = value, fill = lg_tm)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("WSH Minors OPS")
print(nats_ops_graph)

