Learning wOBA

Learning wOBA

As I continue to learn R and go down the road of becoming a data scientist. I need to learn how to use and compute advanced statistics. The first advanced analytic I’m going to learn how to compute is weighted on-base average(wOBA). Weighted on-base average combines all the parts of a players offensive game and gives them all appropriate weights for their impact on the game. For example, a HR is given more weight than a BB or a Single because a HR is guarantees a team atleast one run while a BB/Single only scores a run a much lower percentage of the time. General statistics like BA/SLG/OBP only look at a piece of a players offensive game which is why wOBA is a better tool for looking at a players offensive contribution. Fangraphs does a great job of describing the details in the following links:

The links above give a lot of insight on why you should use wOBA. Tom Tango in The Book describes the formula. He also has the standard formula for wOBA that includes and doesn’t include speed on his site . I used the standard formula from his site for my initial calculations with R.

The following R code has the functions I use to calculate speed and non speed wOBA:

 

[code language=”r”]
#Link for standard WOBA formula
#http://tangotiger.com/index.php/site/article/standard-woba
woba_calc_basic <- function(row){
bb <- as.numeric(row["BB"])
hbp <- as.numeric(row["HBP"])
doubles <- as.numeric(row["X2B"])
triples <- as.numeric(row["X3B"])
hr <- as.numeric(row["HR"])
hits <- as.numeric(row["H"])
singles <- hits – triples – doubles – hr
pa <- as.numeric(row["PA"])
ibb <- as.numeric(row["IBB"])
sh <- as.numeric(row["SH"])

numerator <- .7*(bb+hbp) + .9*(singles) + 1.3*(doubles + triples) + 2*hr
denominator <- pa – ibb – sh

return(numerator/denominator)
}

woba_calc_speed <- function(row){
bb <- as.numeric(row["BB"])
hbp <- as.numeric(row["HBP"])
doubles <- as.numeric(row["X2B"])
triples <- as.numeric(row["X3B"])
hr <- as.numeric(row["HR"])
hits <- as.numeric(row["H"])
singles <- hits – triples – doubles – hr
pa <- as.numeric(row["PA"])
ibb <- as.numeric(row["IBB"])
sh <- as.numeric(row["SH"])
sb <- as.numeric(row["SB"])
cs <- as.numeric(row["CS"])

numerator <- .7*(bb+hbp) + .9*(singles) + 1.3*(doubles + triples) + 2*hr + .25*sb + -.5*cs
denominator <- pa – ibb – sh

return(numerator/denominator)
}
[/code]

Once I had those two formula’s in place I retrieved some data from Nationals data from baseball-reference . Then used the code below to apply the two functions to a dataframe.

 

[code language=”r”]
data = read.csv("data/FG_MLB_Batting_Std_All.csv")

#Apply basic woba
data$woba_basic = apply(data, 1, woba_calc_basic)

#Apply speed woba
data$woba_speed = apply(data, 1, woba_calc_speed)

[/code]

Once I had that in place I noticed that Fangraphs kept mentioning ‘wOBA scale’ and how it and wOBA could be used to generate another stat called Weighted Runs Above Average . After doing some digging I found an old article written by Tom Tango on how to compute yearly wOBA from scratch . Above I’ve just used some standard weights to compute a standard wOBA. But standard wOBA doesn’t take into account the yearly offensive environment that players are playing in. Basically some years an offensive action is worth more than others based upon how the whole league is performing. Tom Tango’s article on computing yearly wOBA teaches you how to get the yearly weights and calculate wOBA scale. Continuing down this R, wOBA rabbit hole I transcribe his SQL script into R functions.

To compute the offensive environment(Step 1 and Step 2) in Tom Tangos script I use this function on pitching data:

 

[code language=”r”]
get_runvalues <- function(pitching_data){
#Calculate runs per out
stat <- sum(pitching_data["R"])/(sum(pitching_data["IP"])*3)
rPerOut <- c(stat)

#Calculate runs per bb
stat <- rPerOut + .14
rPerBB <- c(stat)

#Runs per HB
stat <- rPerBB + .025
rPerHB <- c(stat)

#Runs Per Single
stat <- rPerBB + .155
rPer1B <- c(stat)

#Runs Per Double
stat <- rPer1B + .3
rPer2B <- c(stat)

#Runs Per Triple
stat <- rPer2B + .27
rPer3B <- c(stat)

#Runs per HR
rHR <- 1.4

rSB <- .2

rCS <- 2*rPerOut + .075

df <- data.frame(rPerOut, rPerBB, rPerHB, rPer1B, rPer2B, rPer3B, rHR, rSB, rCS)

return(df)
}

[/code]

Step 3 in the SQL script pulls uses the run values to generate the necessary weights to calculate wOBA. The following R methods accomplish this:

[code language=”r”]
get_woba_runsMinus <- function(batting, runValues){
bbNumerator <- as.numeric(runValues["rPerBB"]) * (sum(batting["BB"]) – sum(batting["IBB"]))
hbpNumerator <- as.numeric(runValues["rPerHB"]) * sum(batting["HBP"])

singles <- sum(batting["H"]) – sum(batting["X2B"]) – sum(batting["X3B"]) – sum(batting["HR"])
singleNumerator <- as.numeric(runValues["rPer1B"]) * singles
doubleNumerator <- as.numeric(runValues["rPer2B"]) * sum(batting["X2B"])
tripleNumerator <- as.numeric(runValues["rPer3B"]) * sum(batting["X3B"])
hrNumerator <- as.numeric(runValues["rHR"]) * sum(batting["HR"])
sbNumerator <- as.numeric(runValues["rSB"]) * sum(batting["SB"])
csNumerator <- as.numeric(runValues["rCS"]) * sum(batting["CS"])

numerator <- bbNumerator + hbpNumerator + singleNumerator + doubleNumerator + tripleNumerator + hrNumerator + sbNumerator – csNumerator

#print(paste("numerator is, ", numerator))

denom <- sum(batting["AB"]) – sum(batting["H"]) + sum(batting["SF"])
#print(paste("denominator is, ", denom))

return(numerator/denom)
}

get_woba_runsPlus <- function(batting, runValues){
#Same as runMinus should probable combine and output df
bbNumerator <- as.numeric(runValues["rPerBB"]) * (sum(batting["BB"]) – sum(batting["IBB"]))
hbpNumerator <- as.numeric(runValues["rPerHB"]) * sum(batting["HBP"])

singles <- sum(batting["H"]) – sum(batting["X2B"]) – sum(batting["X3B"]) – sum(batting["HR"])
singleNumerator <- as.numeric(runValues["rPer1B"]) * singles
doubleNumerator <- as.numeric(runValues["rPer2B"]) * sum(batting["X2B"])
tripleNumerator <- as.numeric(runValues["rPer3B"]) * sum(batting["X3B"])
hrNumerator <- as.numeric(runValues["rHR"]) * sum(batting["HR"])
sbNumerator <- as.numeric(runValues["rSB"]) * sum(batting["SB"])
csNumerator <- as.numeric(runValues["rCS"]) * sum(batting["CS"])

numerator <- bbNumerator + hbpNumerator + singleNumerator + doubleNumerator + tripleNumerator + hrNumerator + sbNumerator – csNumerator
denom <- (sum(batting["BB"]) – sum(batting["IBB"])) + sum(batting["HBP"]) + sum(batting["H"])

return(numerator/denom)
}

woba_calc_lgAvgSpeed <- function(batting, runValues){
numerator <- sum(batting["BB"]) – sum(batting["IBB"]) + sum(batting["HBP"]) + sum(batting["H"]) + sum(batting["SB"]) – sum(batting["CS"])
denom <- sum(batting["AB"]) + sum(batting["BB"]) – sum(batting["IBB"]) + sum(batting["HBP"]) + sum(batting["SF"])

return(numerator/denom)
}

woba_calc_lgAvgReg <- function(batting, runValues){
numerator <- sum(batting["BB"]) – sum(batting["IBB"]) + sum(batting["HBP"]) + sum(batting["H"])
denom <- sum(batting["AB"]) + sum(batting["BB"]) – sum(batting["IBB"]) + sum(batting["HBP"]) + sum(batting["SF"])

return(numerator/denom)
}

get_woba_weights <- function(pitching, batting){
#Get the run values
run_values <- get_runvalues(pitching)

#Compute runs minus
runs_minus <- get_woba_runsMinus(batting, run_values)

runs_plus <- get_woba_runsPlus(batting, run_values)

#woba <- woba_calc_lgAvgSpeed(batting, run_values)
woba <- woba_calc_lgAvgReg(batting, run_values)

wobaScale <- 1/(runs_plus + runs_minus)

wobaBB <- (as.numeric(run_values["rPerBB"]) + runs_minus)*wobaScale
wobaHB <- (as.numeric(run_values["rPerHB"]) + runs_minus)*wobaScale
woba1B <- (as.numeric(run_values["rPer1B"]) + runs_minus)*wobaScale
woba2B <- (as.numeric(run_values["rPer2B"]) + runs_minus)*wobaScale
woba3B <- (as.numeric(run_values["rPer3B"]) + runs_minus)*wobaScale
wobaHR <- (as.numeric(run_values["rHR"]) + runs_minus)*wobaScale
wobaSB <- as.numeric(run_values["rSB"])*wobaScale
wobaCS <- as.numeric(run_values["rCS"])*wobaScale

df <- data.frame(run_values, runs_minus, runs_plus, woba, wobaScale, wobaBB, wobaHB, woba1B, woba3B, wobaHR, wobaSB, wobaCS)

return(df)
}
[/code]

With all this in place I now needed to test my functions against the table he has here . The data I ended up using for the test is from Fangraphs . The code below calculates the wOBA weights:

[code language=”r”]
fg_batting_data = read.csv("data/FG_MLB_Batting_Std_All.csv")
fg_pitching_data = read.csv("data/FG_MLB_Pitching_Std_All.csv")
min_pa = subset(fg_batting_data, PA > 80)
wobaWeights = get_woba_weights(fg_pitching_data, fg_batting_data)
wobaWeightsFiltered = get_woba_weights(fg_pitching_data, min_pa)
wobaWeights
[/code]

Here’s the output for the wOBA weights for 2007:

[code]
rPerOut rPerBB rPerHB rPer1B rPer2B rPer3B rHR rSB rCS
1 0.1796024 0.3196024 0.3446024 0.4746024 0.7746024 1.044602 1.4 0.2 0.4342048
runs_minus runs_plus woba wobaScale wobaBB wobaHB woba1B woba2B
1 0.2792543 0.5642809 0.3310523 1.185487 0.7099368 0.739574 0.8936873 1.249333
woba3B wobaHR wobaSB wobaCS
1 1.569415 1.990734 0.2370974 0.5147441
[/code]

These numbers line up with the numbers from Tom Tangos table for 2007 I linked above. Here’s a look at what the side by side numbers:

label original calc my calc
year 2007 2007
rPerOut .179 0.1796024
rPerBB .32 .3196024
rPerHB .34 .3446024
rPer1B .47 .4746024
rPer2B .77 .7746024
rPer3B 1.04 1.044602
rHR 1.40 1.4
rSB .2 .2
rCS .43 .4342048
rMinus .28 .2792543
rPlus .56 .5642809
wOBA .331 .3310523
wobaScale 1.19 1.185487
wobaBB .71 .7099368
wobaHB .74 .739574
woba1B .89 .8936873
woba2B 1.25 1.249333
woba3B 1.57 1.569415
wobaHR 1.99 1.990734
wobaSB .24 .2370974
wobaCS .51 .5147441

With some simplification the weights I produced with the R code above would line up with Tom Tango’s original chart for 2007. Since I’m now able to calculate wOBA weights from scratch I’ll be able to calculate wRAA and apply wOBA in other offensive environments.

2 thoughts on “Learning wOBA

    1. Tom thanks for the feedback I really appreciate it. Got this up late last night and was actually planning on asking you for your thoughts on this. Going to clean up the final output(put the constants in a table) and will look at the equations as well tonight.

Leave a Reply

Your email address will not be published. Required fields are marked *