Enriching Datasets with R

If you have a simple dataset and want to add a few derived statistics to it, you can do that easily with R. As an example, I'm going to add FIP, wOBA, wRC, and wRAA to a couple of baseball datasets.

To calculate FIP I first needed the following R functions:

[code language="r"]
#Calculate FIP Constant
fip_constant_calc <- function(pitching_data){
  #FIP Constant = lgERA - (((13*lgHR)+(3*(lgBB+lgHBP))-(2*lgK))/lgIP)
  era = sum(pitching_data["ER"])/sum(pitching_data["IP"]) * 9
  lgHR = sum(pitching_data["HR"])
  lgBB = sum(pitching_data["BB"])
  lgHBP = sum(pitching_data["HBP"])
  lgK = sum(pitching_data["SO"])
  lgIP = sum(pitching_data["IP"])

  fipConstant = era - (((13*lgHR)+(3*(lgBB+lgHBP))-(2*lgK))/lgIP)
  #print(paste("ERA", era))
  return(fipConstant)
}

#Calculate FIP for a single row of pitching data
fip_calc <- function(pitching_data, fipConstant){
  #FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant
  hr = as.numeric(pitching_data["HR"])
  bb = as.numeric(pitching_data["BB"])
  hbp = as.numeric(pitching_data["HBP"])
  k = as.numeric(pitching_data["SO"])
  ip = as.numeric(pitching_data["IP"])

  fip = ((13*hr)+(3*(bb+hbp))-2*(k))/ip + fipConstant
  return(fip)
}
[/code]


Once you have the functions needed to calculate the stat, you can use apply in R to run the function against each row of your dataframe.


[code language="r"]
#Get Data
fg_pitching_data = read.csv("data/2007/FG_MLB_Pitching_Std_All.csv")
#League-wide constant, computed once from the whole dataset
constant <- fip_constant_calc(fg_pitching_data)
#MARGIN = 1 calls fip_calc once per row
fg_pitching_data$fip <- apply(fg_pitching_data, 1, fip_calc, fipConstant=constant)
[/code]
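
To sanity-check the row-wise approach, here is a small self-contained sketch on made-up numbers. The three pitcher rows and the helper names `fip_constant`, `fip_vectorized`, and `fip_row` are hypothetical and not part of the original code. One thing worth knowing: apply converts the dataframe to a matrix first, and if any column is non-numeric (player names, for example) every value arrives in the function as text, which is why the functions above wrap each lookup in as.numeric. Because R arithmetic works on whole columns, the same stat can also be computed without apply, and the sketch compares the two.

```r
# Toy pitching data; every value here is made up for illustration.
pitching <- data.frame(
  Name = c("A", "B", "C"),
  ER   = c(80, 60, 100),
  IP   = c(200, 180, 150),
  HR   = c(20, 15, 25),
  BB   = c(50, 40, 70),
  HBP  = c(5, 3, 8),
  SO   = c(180, 200, 100),
  stringsAsFactors = FALSE
)

# League-level FIP constant, using the same formula as the post.
fip_constant <- function(d) {
  lg_era <- sum(d$ER) / sum(d$IP) * 9
  lg_era - (13 * sum(d$HR) + 3 * (sum(d$BB) + sum(d$HBP)) - 2 * sum(d$SO)) / sum(d$IP)
}

# Hypothetical vectorized version: operates on whole columns at once.
fip_vectorized <- function(d, constant) {
  (13 * d$HR + 3 * (d$BB + d$HBP) - 2 * d$SO) / d$IP + constant
}

# Row-wise version in the style of the post; `row` is a named character
# vector here because the Name column forces a character matrix.
fip_row <- function(row, constant) {
  x <- as.numeric(row[c("HR", "BB", "HBP", "SO", "IP")])
  (13 * x[1] + 3 * (x[2] + x[3]) - 2 * x[4]) / x[5] + constant
}

constant <- fip_constant(pitching)
pitching$fip_apply <- apply(pitching, 1, fip_row, constant = constant)
pitching$fip <- fip_vectorized(pitching, constant)

print(pitching[, c("Name", "fip", "fip_apply")])
```

On a dataset this size the two approaches are interchangeable. The vectorized form mainly helps on larger files and avoids the round trip through text.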

Here are some R functions to calculate wOBA, wRC, and wRAA.

[code language="r"]
#Calculate wOBA based on weights using FG formula
woba_calc_weights_fg <- function(row, weights){
  bb <- as.numeric(row["BB"])
  hbp <- as.numeric(row["HBP"])
  doubles <- as.numeric(row["X2B"])
  triples <- as.numeric(row["X3B"])
  hr <- as.numeric(row["HR"])
  hits <- as.numeric(row["H"])
  singles <- hits - triples - doubles - hr
  ab <- as.numeric(row["AB"])
  ibb <- as.numeric(row["IBB"])
  sf <- as.numeric(row["SF"])

  #Weighted sum of offensive events; intentional walks are excluded
  numerator <- as.numeric(weights["wobaBB"])*(bb-ibb) +
    as.numeric(weights["wobaHB"])*hbp +
    as.numeric(weights["woba1B"])*singles +
    as.numeric(weights["woba2B"])*doubles +
    as.numeric(weights["woba3B"])*triples +
    as.numeric(weights["wobaHR"])*hr
  denom <- ab + bb - ibb + sf + hbp

  return(numerator/denom)
}

#Calculate wRAA = ((wOBA - lgwOBA)/wOBAScale) * PA
wraa_calc_fg <- function(row, weights){
  numerator <- as.numeric(row["woba"]) - as.numeric(weights["woba"])
  denom <- as.numeric(weights["wobaScale"])

  result = (numerator/denom) * (as.numeric(row["PA"]))
  return(result)
}

#Calculate wRC = (((wOBA - lgwOBA)/wOBAScale) + (lgR/lgPA)) * PA
wrc_calc <- function(row, weights){
  woba = as.numeric(row["woba"])
  lgWOBA = as.numeric(weights["woba"])
  wobaScale = as.numeric(weights["wobaScale"])
  lgR = as.numeric(weights["lgR"])
  lgPA = as.numeric(weights["lgPA"])
  pa = as.numeric(row["PA"])
  wrc = (((woba-lgWOBA)/wobaScale)+(lgR/lgPA))*pa
  return(wrc)
}
[/code]



Once you have the necessary functions in place, you can add the statistics to your dataset using this code:

[code language="r"]
fg_batting_data = read.csv("data/2007/FG_MLB_Batting_Std_All.csv")
wobaWeights = get_woba_weights(fg_pitching_data, fg_batting_data)
#woba must be added first; wraa and wrc read it back from each row
fg_batting_data$woba = apply(fg_batting_data, 1, woba_calc_weights_fg, weights=wobaWeights)
fg_batting_data$wraa = apply(fg_batting_data, 1, wraa_calc_fg, weights=wobaWeights)
fg_batting_data$wrc = apply(fg_batting_data, 1, wrc_calc, weights=wobaWeights)
[/code]

Note: ‘get_woba_weights’ was addressed in the Learning wOBA blog post.
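
Since `get_woba_weights` lives in another post, here is a rough sketch of the batting formulas with a hand-built weights vector, so the arithmetic can be followed end to end. The element names match what the functions above look up, but I am assuming a named numeric vector (the real return type may differ), and the weight values and the single batter line are illustrative placeholders, not actual FanGraphs constants or real player data.

```r
# Hypothetical weights: names mirror the lookups used in the post,
# values are placeholders chosen only to make the arithmetic easy to follow.
weights <- c(wobaBB = 0.70, wobaHB = 0.73, woba1B = 0.90, woba2B = 1.25,
             woba3B = 1.60, wobaHR = 2.00,
             woba = 0.330, wobaScale = 1.20, lgR = 700, lgPA = 6000)

# One made-up batter line.
batting <- data.frame(
  Name = "Batter A",
  PA = 600, AB = 530, H = 160, X2B = 30, X3B = 5, HR = 25,
  BB = 60, IBB = 5, HBP = 5, SF = 5,
  stringsAsFactors = FALSE
)

# Singles are hits that were not doubles, triples, or home runs.
singles <- batting$H - batting$X2B - batting$X3B - batting$HR

# wOBA: weighted events over AB + BB - IBB + SF + HBP.
woba_num <- weights[["wobaBB"]] * (batting$BB - batting$IBB) +
  weights[["wobaHB"]] * batting$HBP +
  weights[["woba1B"]] * singles +
  weights[["woba2B"]] * batting$X2B +
  weights[["woba3B"]] * batting$X3B +
  weights[["wobaHR"]] * batting$HR
woba_den <- batting$AB + batting$BB - batting$IBB + batting$SF + batting$HBP
batting$woba <- woba_num / woba_den

# wRAA and wRC share the same (wOBA - lgwOBA) / wOBAScale term.
runs_per_pa <- (batting$woba - weights[["woba"]]) / weights[["wobaScale"]]
batting$wraa <- runs_per_pa * batting$PA
batting$wrc  <- (runs_per_pa + weights[["lgR"]] / weights[["lgPA"]]) * batting$PA

print(batting[, c("Name", "woba", "wraa", "wrc")])
```

A handy consistency check is that wRC minus wRAA always equals (lgR / lgPA) * PA, the runs a league-average hitter would create in the same number of plate appearances.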
