
2017 Goals


It’s 2017 and time for new yearly goals. Here’s a list of some of the goals I have this year that I’m making public:

  • Write 2 blog posts a month (24 for the year)
  • Average 1 YouTube video a month
  • Read 24 books of the Bible
  • Get down to <=185 lbs
  • Beat my 2:50:49 half marathon time
  • Run a Spartan Race
  • Read 4 technical books (currently reading D3 in Action)
  • Contribute to an open source project
  • Add a dashboard to the Hitter Prediction Tool
How To Make a Custom NiFi Processor

I read a couple of forum posts (link1, link2) about converting CSV data to JSON with Apache NiFi. Solutions are already proposed in both of those links, one of which is writing your own custom Processor. Since I already have code to convert data from CSV to JSON (see my post), I decided to write a NiFi Processor that accomplishes the same thing. This blog entry shows how that was done.

NiFi has a developer guide that reviews several topics, including the Processor API. The NiFi team also has a Confluence page that documents the archetype command needed to generate a template processor project. The Maven archetype command for creating your processor template is:

mvn archetype:generate -DarchetypeGroupId=org.apache.nifi -DarchetypeArtifactId=nifi-processor-bundle-archetype -DarchetypeVersion=1.0.0 -DnifiVersion=1.1.0

Once you run the command, you’ll be prompted for input. Guidance for what to put in each field can be found in the developer guide or the Confluence page linked above. If you make a mistake, you can always refactor the names later. After the command completes, you’ll have a project structure that looks similar to this:

NiFi Processor Project Structure. Picture retrieved from this blog post: http://bryanbende.com/development/2015/02/04/custom-processors-for-apache-nifi

 

Before moving forward, I want to note the importance of the ‘org.apache.nifi.processor.Processor’ file located in ‘src/main/resources/META-INF/services’. This file lists the fully qualified name of the processor class you’ll be writing, and it’s how NiFi discovers your processor. If you rename the Java file for your processor, make sure you also update the name in this file. With the project structure in place, you can begin developing your first processor. Going forward, I will be using code snippets from the processor I developed to discuss properties, relationships, the onTrigger method, and testing the processor. The full code can be found here.
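For reference, the file is just a single line containing the fully qualified class name of the processor (the package shown here is illustrative; use whatever package your processor actually lives in):

com.example.processors.csv.ConvertCSVToJSON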

The developer guide is helpful for understanding the concepts used in NiFi processors. However, the best way to learn how to do anything is by looking at the way other people have done it, so I decided to look at the NiFi source to see how its developers write processors. One of the first things I noticed was that each processor defines properties and relationships. When making my ConvertCSVToJSON processor, I knew I needed two properties:

  • header: whether the incoming CSV contains a header
  • field names: The field names to use for the incoming CSV when converting it to flat JSON. If the incoming file already has headers, this can be empty.

Below is the code I wrote for defining those properties.

public static final PropertyDescriptor HEADER = new PropertyDescriptor
        .Builder().name("header")
        .displayName("header")
        .description("Whether or not a header exists in the incoming CSV file.(default true)")
        .required(true)
        .allowableValues("true", "false")
        .defaultValue("true")
        .build();

public static final PropertyDescriptor FIELD_NAMES = new PropertyDescriptor
        .Builder().name("Field Names")
        .displayName("Field Names")
        .description("Names of the fields in the CSV if no header exists. Field names must be in order.")
        .required(false)
        .addValidator(StandardValidators.NON_BLANK_VALIDATOR)
        .build();

The properties for my processor are now defined. Next, I need to add relationships. NiFi defines Relationships as:

Relationship: Each Processor has zero or more Relationships defined for it. These Relationships are named to indicate the result of processing a FlowFile. After a Processor has finished processing a FlowFile, it will route (or “transfer”) the FlowFile to one of the Relationships. A DFM is then able to connect each of these Relationships to other components in order to specify where the FlowFile should go next under each potential processing result.

User Guide

In general you’ll have two relationships: a SUCCESS relationship for all data that is successfully processed by your Processor, and a FAILURE relationship for all data that fails processing. Relationships aren’t a requirement, though; a processor can have anywhere from zero to N of them. Below is a code snippet where I define my relationships.

public static final Relationship REL_SUCCESS = new Relationship.Builder()
        .name("success")
        .description("Successfully converted incoming CSV file to JSON")
        .build();

public static final Relationship REL_FAILURE = new Relationship.Builder()
        .name("failure")
        .description("Failed to convert incoming CSV file to JSON")
        .build();

Once you define your properties and relationships, you’ll need to add them to the ‘descriptors’ and ‘relationships’ collections. The generated MyProcessor.java class already contains a template for this in its ‘init’ method; update it with any additional properties or relationships you add. See the example below:

@Override
protected void init(final ProcessorInitializationContext context) {
    final List<PropertyDescriptor> descriptors = new ArrayList<PropertyDescriptor>();
    descriptors.add(HEADER);
    descriptors.add(FIELD_NAMES);
    this.descriptors = Collections.unmodifiableList(descriptors);

    final Set<Relationship> relationships = new HashSet<Relationship>();
    relationships.add(REL_SUCCESS);
    relationships.add(REL_FAILURE);
    this.relationships = Collections.unmodifiableSet(relationships);

    csvMapper = new CsvMapper();
}

I decided to use the ‘onScheduled’ method to initialize my CsvSchema object. The schema only needs to be created once the processor is configured, so the ‘onScheduled’ lifecycle method is a good fit here. You can read more about the component lifecycle in the developer guide. Below is my ‘onScheduled’ method:

@OnScheduled
public void onScheduled(final ProcessContext context) throws ConfigurationException {
    //Retrieve properties from context
    Boolean header = context.getProperty(HEADER).asBoolean();
    String fieldNames = context.getProperty(FIELD_NAMES).getValue();

    /*
     * Create Schema based on properties from user.
     */
    if(!header && fieldNames!=null){
        Builder build = CsvSchema.builder();
        for(String field : fieldNames.split(",")){
            build.addColumn(field, CsvSchema.ColumnType.NUMBER_OR_STRING);
        }
        schema = build.build();
    }else if(header && fieldNames!=null && !fieldNames.equals("")){
        schema = this.buildCsvSchema(fieldNames, header);
    }else if(!header && fieldNames==null){
        throw new ConfigurationException("File must either contain headers or you must provide them.");
    }else{
        schema = CsvSchema.emptySchema().withHeader();
    }
}
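The ‘buildCsvSchema’ helper referenced above lives in the full source linked earlier. As a minimal sketch (assuming the incoming file has a header line that should be skipped in favor of the user-supplied field names), it might look like this:

private CsvSchema buildCsvSchema(String fieldNames, Boolean header) {
    //Build columns from the user supplied, comma separated field names
    Builder build = CsvSchema.builder();
    for (String field : fieldNames.split(",")) {
        build.addColumn(field, CsvSchema.ColumnType.NUMBER_OR_STRING);
    }

    //The file still has a header line, so skip it instead of parsing it as data
    return build.build().withSkipFirstDataRow(header);
}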

With all the setup out of the way, I can get down to adding the code that actually does the conversion. The work of a processor is all done in the ‘onTrigger’ method. You can see my ‘onTrigger’ method below:

public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    FlowFile flowFile = session.get();
    if ( flowFile == null ) {
        return;
    }

    try {
        //Read in Data
        InputStream stream = session.read(flowFile);
        String csv = IOUtils.toString(stream, "UTF-8");
        stream.close();

        //Convert CSV data to JSON
        List<Map<?,?>> objects = this.readObjectsFromCsv(csv);

        //Convert to JSON String
        String json = this.writeAsJson(objects);

        //Output Flowfile
        FlowFile output = session.write(flowFile, new OutputStreamCallback(){
            @Override
            public void process(OutputStream outputStream) throws IOException {
                IOUtils.write(json, outputStream, "UTF-8");
            }
        });

        output = session.putAttribute(output, CoreAttributes.MIME_TYPE.key(), APPLICATION_JSON);
        //TODO: May want to have a better default name....
        output = session.putAttribute(output, CoreAttributes.FILENAME.key(), UUID.randomUUID().toString()+".json");

        session.transfer(output, REL_SUCCESS);
    } catch (IOException e) {
        getLogger().error("Unable to convert CSV to JSON for this file "+flowFile.getAttributes().get(CoreAttributes.FILENAME.key()));
        session.transfer(flowFile, REL_FAILURE);
    }
}
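The ‘readObjectsFromCsv’ and ‘writeAsJson’ helpers are in the full source; minimal versions built on Jackson’s CsvMapper/MappingIterator and a plain ObjectMapper might look like the following (the method names match the calls above, the bodies are a sketch):

private List<Map<?, ?>> readObjectsFromCsv(String csv) throws IOException {
    //Use the schema built in onScheduled to map each CSV row to a Map
    MappingIterator<Map<?, ?>> rows = csvMapper.readerFor(Map.class).with(schema).readValues(csv);
    return rows.readAll();
}

private String writeAsJson(List<Map<?, ?>> data) throws IOException {
    //A plain Jackson ObjectMapper turns the list of maps into a JSON array
    return new ObjectMapper().writeValueAsString(data);
}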

The next step is to test the code. NiFi has a TestRunner interface that’s already set up for running unit tests. Just tell the TestRunner which processor class to use. Below is an example:

@Before
public void init() {
    testRunner = TestRunners.newTestRunner(ConvertCSVToJSON.class);
}

Then you can go ahead and write your first unit test for your processor:

@Test
public void testWithHeader() throws FileNotFoundException, UnsupportedEncodingException {
    //Set Headers
    testRunner.setProperty(ConvertCSVToJSON.HEADER, "true");
    testRunner.enqueue(new FileInputStream(new File("src/test/resources/WithHeader.csv")));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ConvertCSVToJSON.REL_SUCCESS, 1);
    List<MockFlowFile> successFiles = testRunner.getFlowFilesForRelationship(ConvertCSVToJSON.REL_SUCCESS);
    for(MockFlowFile mockFile : successFiles){
        System.out.println(new String(mockFile.toByteArray(), "UTF-8"));
    }
}
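If you also want to exercise the no-header path, a second test along the same lines could set the FIELD_NAMES property and enqueue raw CSV bytes directly (the field names and rows here are made up for illustration):

@Test
public void testWithoutHeader() throws UnsupportedEncodingException {
    //Tell the processor there's no header and supply the field names instead
    testRunner.setProperty(ConvertCSVToJSON.HEADER, "false");
    testRunner.setProperty(ConvertCSVToJSON.FIELD_NAMES, "first,last,team");
    testRunner.enqueue("jose,altuve,astros\nanthony,rendon,nationals".getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ConvertCSVToJSON.REL_SUCCESS, 1);
}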

You can find more examples of unit tests for processors here. With the code written and tested, the only thing left is to deploy it to your NiFi instance so you can use it. To deploy the code, copy the ‘nar’ file produced by your build into $NIFI_HOME/lib. Once you start/restart your NiFi instance, you’ll be able to access your processor:

Insert Custom Processor Screenshot
Processor in Workspace
Configure Processor properties

Helpful Links:

Updates

I haven’t written a post in a while; I’ve been working on moving the site over and building the hitter predictor tool, among other things. I switched my site from being hosted on WordPress.com to hosting it on my own server at Digital Ocean, with Google Domains as my registrar. It’s definitely the more challenging option, but it’s been a rewarding experience, and I’ll go into more detail in a future post. I’m planning to keep blogging, but also to provide video examples with some of my posts to either help get the point across or avoid the ‘TL;DR’ phenomenon. Thanks for stopping by; more content to come.

 

Combining R and Java

I was curious whether there were any libraries out there for combining R and Java, so I did some research to figure out the best one. Why? Since I already know Java, combining it directly with R is something I was interested in. One use case: you have a mathematician who is great at using R to produce models but not so good at writing Java code to tie those models into your application. Combining R and Java (assuming your app is in Java) would be an easy way to let the mathematician do their job and let the Java developers easily integrate the mathematician’s model.

Libraries

Helpful Links

Example

Thoughts

Libraries

Below are some of the libraries available for combining R and Java.

  • Renjin: This is the library I went with. It appears to be actively developed, and it has a nice website and good docs teaching you how to use it.
  • RCaller: I was close to choosing this one. The library is actively developed and has good docs (I didn’t try them, but they appear intuitive).
  • JRI
  • RServer
  • rJava

Helpful Links

Example

I ended up using Renjin as my Java-to-R library. This example is a simple web service that lets a user RESTfully input ‘x’ and ‘y’ coordinates and generate a model using R. The web service also provides functionality for making predictions based on your model. I used Jetty, CXF, Renjin, Jackson, and Maven in the example. The source can be found here. Here are some screenshots showing the example endpoints:

Using REST Client to post data for my model.
Hitting the GET endpoint to see my model.
Passing in input to the model to get a prediction.
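To give a feel for the library, here is a minimal, self-contained sketch of the kind of thing the service does internally: fitting an R linear model from Java through Renjin’s standard javax.script engine. It assumes the renjin-script-engine dependency is on the classpath, and the data points are made up.

[code language="java"]
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

import org.renjin.sexp.DoubleVector;

public class RenjinLmExample {
    public static void main(String[] args) throws ScriptException {
        // Renjin registers itself with the standard javax.script API
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("Renjin");

        // Build a data frame from x/y points and fit a simple linear model in R
        engine.eval("df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 8.1, 9.8))");
        engine.eval("model <- lm(y ~ x, data = df)");

        // Pull the coefficients back into Java as a Renjin DoubleVector
        DoubleVector coefficients = (DoubleVector) engine.eval("coef(model)");
        double intercept = coefficients.getElementAsDouble(0);
        double slope = coefficients.getElementAsDouble(1);
        System.out.println("y = " + slope + " * x + " + intercept);
    }
}
[/code]

The coefficients come back as a Renjin DoubleVector, which is one of the custom data types I mention in the Thoughts section below.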

Thoughts

Renjin was pretty easy to use, but I wish it had more helper methods for getting a Vector or its other custom data types into native Java objects. I also don’t like that the toString() implementations only list a certain number of values (e.g. if you have a Vector and want to see all the ‘names’ it has). At some point I’ll sit down and figure out how to use Shiny, but this could be a possible interim solution for using an R script directly in your webapp.

Modeling Hit Rates Between Minor League Levels

I’m working on figuring out the hit rates for minor league batters between levels. I’d like to take the hit rates (i.e. singles (1B/PA), doubles (2B/PA), triples (3B/PA), and home runs (HR/PA)) a player had at their previous minor league level and use that data to predict how the player will do at the following level. The data is similar to what was used in the previous articles on walk rates and strikeout rates: this data set covers 2011-2015, and only players with a minimum of 200 PAs were included in the resulting models. Below are the graphs for each level, the models, and some thoughts.

A to A+

A to A+ Hit Rates

A theme throughout the graphs: the correlation numbers for singles and home runs are high, but very low for doubles and triples. The same low correlation numbers for doubles and triples were found in previous research by Matt Klassen at Fangraphs.

Linear models:

  • A+ Single Rate = (A Single Rate)*0.53520 + 0.07452
  • A+ Double Rate = (A Double Rate)*0.36379 + 0.02929
  • A+ Triple Rate = (A Triple Rate)*0.403826 + 0.004743
  • A+ HR Rate = (A HR Rate)*0.633131 + 0.0006235

A+ to AA

A+ to AA Hit Rates

Linear models:

  • AA Single Rate = (A+ Single Rate)*0.48235 + 0.07969
  • AA Double Rate = (A+ Double Rate)*0.22680 + 0.03389
  • AA Triple Rate = (A+ Triple Rate)*0.377505 + 0.003751
  • AA HR Rate = (A+ HR Rate)*0.534897 + 0.007925

AA to AAA

AA to AAA Hit Rate

Linear models:

  • AAA Single Rate = (AA Single Rate)*0.52767 + 0.07912
  • AAA Double Rate = (AA Double Rate)*0.248769 + 0.03645
  • AAA Triple Rate = (AA Triple Rate)*0.355865 + 0.003757
  • AAA HR Rate = (AA HR Rate)*0.58037 + 0.00881
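The models are simple enough to apply anywhere. For example, here is a small Java sketch (with a made-up input rate) of projecting an A+ single rate from an A-ball single rate using the A to A+ coefficients above:

[code language="java"]
public class HitRateProjection {
    public static void main(String[] args) {
        // Hypothetical A-ball single rate (1B/PA)
        double aSingleRate = 0.150;

        // A to A+ single rate model from the list above
        double projectedAPlusSingleRate = 0.53520 * aSingleRate + 0.07452;

        System.out.printf("Projected A+ single rate: %.4f%n", projectedAPlusSingleRate);
    }
}
[/code]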

What's Next:

  • Perform some validation on the above models
  • Combine the models you’ve generated to predict OBP/SLG/OPS
  • Make models that skip levels
  • Make code more efficient so you can do this faster

 

 

Modeling Strikeout Rate between minor league levels

In this post I’ll go over my results for predicting strikeout rates between minor league levels. This article will cover the following:

Data

Data Wrangling

Graphs and Correlation

Model and Evaluation

Data

This time around I’ve changed my approach so I can do some cross-validation. The article covers data from 2004-2015, but I’ll be training my model on data from 2004-2013 and evaluating it with the 2014-2015 data. The data set consists of 39,349 data points and came from Baseball Reference. The data points represent minor league data from Short Season A (SS-A) to AAA ball. I end up removing the SS-A data because I’m currently only modeling data between the full season leagues (A-AAA). Also, players’ data points were only included if they had >=200 plate appearances.

Data Wrangling

In order to model the data between minor league levels I need to do some data wrangling to get the dataframe into the format I need. The original data has each player’s season as a separate entry.

Snippet from the original dataframe. Each entry represents a year and minor league level the stats are for.

In order to graph and get correlation values between minor league levels, I need all of this data on one row, with the stats for each level represented by a column. Below is a snippet of the dataframe I use for my analysis:

Snippet of the correlation dataframe.

Notice how in the dataframe above all the stats I need for each level have been merged into one row.

Graphs and Correlation

Graphs showing the scatter plot and regression lines for the levels of minor league data I modeled.

As you can see from the graphs above, a positive linear relationship exists for strikeout rate between the minor league levels (A to A+, A+ to AA, AA to AAA) I analyzed. Here are the correlation values for each level:

  • A to A+: 0.7532319
  • A+ to AA: 0.7717004
  • AA to AAA: 0.7666475

From the numbers and graphs above, you can see a ‘strong’ positive correlation exists for strikeout rate between levels.

Model and Evaluation

The models for the regression line in the graphs above are:

  • A to A+ : A+ SO Rate = .7598*(A SO Rate) + .04591
  • A+ to AA: AA SO Rate = .83204*(A+ SO Rate) + .03608
  • AA to AAA: AAA SO Rate = .80664*(AA SO Rate) + .04147

The ‘Doing Data Science‘ book suggests using R-squared, p-values, and cross-validation to validate linear models. For this article I’ll be using R-squared and cross-validation. The R-squared values for each model are:

  • A to A+: .5674
  • A+ to AA: .5955
  • AA to AAA: .5877

To do cross-validation I’m going to use the data from 2014-2015. This dataset consists of 8,198 points. I performed the same steps described in the data wrangling section above, which brought the dataframe I do my analysis on down to 427 points. The correlation numbers remained strong for each level:

  • A to A+: 0.7366793
  • A+ to AA: 0.729288
  • AA to AAA: 0.7794951

Here is a graph showing the regression line against the 2014-2015 data:


To tell how often I’m correct, I once again used the classification provided by Fangraphs in this chart:

fangraphsbbrate
Picture retrieved from http://www.fangraphs.com/library/offense/rate-stats/

This time I used the average difference between classifications of K%, which came out to .0291667. So if my model is more than ~.03 off the actual strikeout rate, I count it as wrong for that data point. Here are my results for each level:

A to A+:

  • Incorrect: 48
  • Correct: 66
  • Percentage Correct: 57.89%

A+ to AA:

  • Incorrect: 78
  • Correct: 93
  • Percentage Correct: 54.39%

AA to AAA:

  • Incorrect: 52
  • Correct: 74
  • Percentage Correct: 58.73%

 

Modeling Walk Rate between minor league levels

After reading through Projecting X by Mike Podhorzer, I decided to try to predict some rate statistics between minor league levels. Mike states in his book, “Projecting rates makes it dramatically easier to adjust a forecast if necessary.” Therefore, if a player is injured or will only have a certain number of plate appearances that year, I can still attempt to project performance. The first rate statistic I’m going to attempt to project is walk rate between minor league levels. This article will cover the following:

Raw Data

Data Cleaning

Correlation and Graphs

Model and Results

Examples

Raw Data

For my model I used data from Baseball Reference, covering the last 7 years of minor league data (2009-2015). Accounting for the Short Season A (SS-A) to AAA affiliates, I ended up with over 28,316 data points for my analysis.

Data Cleaning

I’m using R, and the original dataframe I had put all the data from each year in different rows. In order to do the calculations I wanted, I needed to move each player’s career minor league data to the same row. I also noticed I needed to filter on plate appearances during a season to get rid of noise, for example a player on a rehab assignment in the minor leagues, or a player who was injured for most of the year and only had 50-100 plate appearances. The minimum plate appearances I settled on was 200 for a player to be factored into the model. Another thing I did to remove noise was to only model player performance between the full season leagues (A, A+, AA, AAA). Once the cleaning of the data was done I had the following data points for each level:

  • A to A+ : 1129
  • A+ to AA: 1023
  • AA to AAA: 705

Correlation and Graphs

I was able to get strong correlation numbers for walk rate between minor league levels. You can see the results below:

  • A to A+: .6301594
  • A+ to AA: .6141332
  • AA to AAA: .620662

Here are the graphs for each level:

A to A+ BB% graph

A+ to AA BB% graph

AA to AAA BB% graph

Model and Results

The linear models for each level are:

  • A to A+: A+ BB% = .63184*(A BB%) + .02882
  • A+ to AA: AA BB% = .6182*(A+ BB%) + .0343
  • AA to AAA: AAA BB% = .5682*(AA BB%) + .0342

In order to interpret the success or failure of my results I compared how close I was to getting the actual walk rate. Fangraphs has a great rating scale for walk rate at the Major League level:

Image from Fangraphs

The image above gives a classification for multiple levels of walk rates. While it’s based on major league data, it’s a good starting point for deciding on a margin of error for my model. The mean difference between each level in the Fangraphs table is .0183333; I rounded and made my margin of error .02. So if my predicted value for a player’s walk rate was within .02 of the actual value, I counted the model as correct for that player, and if the error was greater than that, it was wrong (a sketch of this check appears after the results below). Here are the model’s results for each level:

  • A to A+
    • Incorrect: 450
    • Correct: 679
    • Percentage Correct: ~.6014
    • A+ to AA
    • Incorrect: 445
    • Correct: 578
    • Percentage Correct: ~.565
  • AA to AAA
    • Incorrect: 278
    • Correct: 427
    • Percentage Correct: ~.6056

When I moved the cutoff up a percentage point to .03, the model’s results drastically improve:

  • A to A+
    • Incorrect: 228
    • Correct: 901
    • Percentage Correct: ~.798
  • A+ to AA
    • Incorrect: 246
    • Correct: 777
    • Percentage Correct: ~.7595
  • AA to AAA
    • Incorrect: 144
    • Correct: 561
    • Percentage Correct: ~.7957
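Here is a small sketch of the accuracy check described above: apply the A to A+ model to a made-up A-level walk rate and count the prediction as correct if it lands within the chosen margin of the actual A+ rate.

[code language="java"]
public class WalkRateEvaluation {
    // A prediction is "correct" when it's within the margin of the actual rate
    static boolean withinMargin(double predicted, double actual, double margin) {
        return Math.abs(predicted - actual) <= margin;
    }

    public static void main(String[] args) {
        double aWalkRate = 0.10;                               // hypothetical A-level BB%
        double predictedAPlus = 0.63184 * aWalkRate + 0.02882; // A to A+ model from above
        double actualAPlus = 0.095;                            // hypothetical observed A+ BB%

        System.out.println(withinMargin(predictedAPlus, actualAPlus, 0.02));
        System.out.println(withinMargin(predictedAPlus, actualAPlus, 0.03));
    }
}
[/code]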

Examples

Numbers are cool, but where are the actual examples? OK, let’s start off with my worst prediction. The largest error I had between levels was A to A+, and the error was >10% (~.1105). The player in this case was Joey Gallo; a quick glance at his player page shows his A walk rate was only .1076 and his A+ walk rate was .2073, a 10% improvement between levels. So why did this happen, and why didn’t my model do a better job of predicting it? Currently the model only accounts for the previous season’s walk rate, but what if a player is getting a lot of hits at one level and stops swinging as much at the next? In Gallo’s case he only had a .245 BA in his year at A ball, so that wasn’t it. More investigation is required to see how the model can get closer on edge cases like this.

Gallo Dataframe Snippet

The lowest I was able to set the error to and still get results back was ~.00004417. That very close prediction belongs to Erik Gonzalez. I don’t know Erik Gonzalez, so I continued to look for results; setting the minimum error to .0002 brought back Stephen Lombardozzi as one of my six results. Lombo is interesting to hardcore Nats fans (like myself), but I wanted to keep looking for a more notable name. Finally, after upping the number to .003 for the A to A+ data, I was able to see that the model successfully predicted the walk rate of Houston Astros multi-time All-Star 2B Jose Altuve within a .003 margin of error.

Altuve Dataframe snippet

 

What's Next:

  • Improve model to get a lower max error
  • Predict Strike out rate between levels
  • Predicting more advanced statistics like woba/ops/wrc

 

Correlation between Salary Cap and Winning?

After writing my initial blog post looking at how much each team spends per position group, I wanted to see whether there was any correlation between how much teams spend on a position group and winning. To do this I needed to merge the cap data from Spotrac and season summary data from Pro-Football-Reference. I merged these datasets over the last 5 years, but it’d be interesting to try to find data going back to when the salary cap was put in place (1994). Here’s a graph of my yearly findings for the last 5 years (2011-2015).


Quick review on correlation from Pearson:

  • .00-.19 “very weak”
  • .20-.39 “weak”
  • .40-.59 “moderate”
  • .60-.79 “strong”
  • .80-1.0 “very strong”

As you can see from the graph, the correlation numbers aren’t exactly high. I believe that’s because the best players aren’t necessarily getting paid the most money. For example, before last year Russell Wilson was on his rookie contract, and the Seahawks were making the playoffs year after year while only paying Russell $749,176 a year. He no doubt had a lot to do with the Seahawks winning then and going forward, but his salary before his new contract wouldn’t have correlated much with winning. Examples like this can be found at every position. This is why it’s necessary to have a good front office that continually brings in young talent that can contribute at a lower price. Looking at actual on-the-field stats and trying to correlate those with winning would be a much better exercise than merging cap data with win totals.

Position Correlation
DB 0.05356608
DL -0.10102064
LB 0.14434313
OL -0.0783075
QB 0.08256776
RB -0.0917516
ST -0.05109986
TE -0.04013766
WR 0.07491824
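For anyone curious how the numbers in the table are produced, the statistic is the Pearson correlation coefficient. A minimal Java sketch with made-up spending and win totals looks like this:

[code language="java"]
public class PearsonExample {
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt(n * sumXX - sumX * sumX) * Math.sqrt(n * sumYY - sumY * sumY);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] qbCapMillions = {18.2, 22.5, 9.7, 15.1, 12.3}; // hypothetical QB cap spending
        double[] wins = {10, 12, 7, 9, 8};                      // hypothetical season wins
        System.out.println(correlation(qbCapMillions, wins));
    }
}
[/code]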

Over the last 5 years there’s not one correlation that’s greater than ‘very weak’. But the positions that do have positive correlations, DB, LB, QB, and WR, are the positions that GMs over the last 5 years have been willing to pay. Left tackle is another position that gets paid very well, because left tackles are usually the ones protecting a QB’s blind side.

This data emphasizes the importance of drafting well, because spending money on a particular position does not correlate with your team winning. It’s also interesting to note that the positions with positive correlations may be that way because those players have made it to their second or third contracts and are now getting the big deals. It makes me wonder whether DB, LB, QB, and WR are the positions with the longest careers in the NFL.

 

 

 

An Ideal Home Media Setup

I love movies and have a bunch of them, but I didn’t want to have to go upstairs and grab a DVD/Blu-ray every time I wanted access to my movies or any other media I own, so I began searching for an ideal home media setup. I think I’ve found it and figured I’d write about it.

Everything is moving toward streaming, so of course you need a streaming device on all your home TVs. You can do this with a Smart TV (all TVs are smart now, although the independent streaming devices are usually much better), Roku, Amazon Fire Stick, Apple TV, Chromecast, Xbox One, etc. I went with Roku; it’s one of the best rated streaming devices and supports numerous apps. You can get a simple Roku Stick for $49.99 at Best Buy. The sticks are also very convenient for travel because you can plug them right into the HDMI port at your hotel and get access to all your streaming apps.

Now that you have your streaming device, what app should you use to host your media at home? After doing some research I went with Plex: it has apps on most streaming devices and will essentially turn your movie collection into a Netflix-like interface. Plex uses a traditional client-server setup. You’ll need to run the Plex server on your laptop/desktop/NAS device, and your media should be accessible from the machine that hosts the server. The Plex server provides a web interface where you add libraries representing the directories your media is stored in. Assuming your movies have appropriate names (i.e. a title representing the movie), Plex will find the movie and all the necessary metadata for you. For optimal performance you should follow the suggested naming format:
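As a rough illustration (check Plex’s current docs for the exact conventions, this layout is my own example), movies generally go in a folder and file named after the title with the year in parentheses, and TV shows are broken out by season and episode:

Movies/
    The Matrix (1999)/
        The Matrix (1999).mkv
TV Shows/
    Band of Brothers/
        Season 01/
            Band of Brothers - s01e01.mkv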

Once you have the Plex server running, all you need to do is install the Plex client on your favorite streaming device. The Plex client then displays all your media in a well-organized format for your viewing pleasure.

With this setup you’ll be able to sit anywhere in your home and have access to all your media. With the right setup you can even have access to your home media remotely or share your home media with family and friends through Plex.

 

 

Jackson based JAX-RS Providers (JSON, XML, CSV) Example

This blog post discusses returning multiple data formats from your RESTful endpoints using Jackson JAX-RS providers. Jackson offers a number of providers, which you can see here. The providers allow you to return a POJO (plain old Java object) from your REST-annotated methods and give back the appropriate media type. Jackson also gives you the ability to make your own entity providers; documentation for how to do that is here. Below I’ll show a generic way to provide CSV along with XML and JSON.

Pom

Sample Model Objects

REST Resource

Custom Entity Provider

Configuration

Instructions

Screen Shots

Pom

[code language="xml"]
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>datadidit.helpful.hints</groupId>
        <artifactId>parent</artifactId>
        <version>1.0.0-SNAPSHOT</version>
    </parent>
    <artifactId>jetty-cxf</artifactId>
    <name>jetty-cxf</name>
    <url>http://maven.apache.org</url>
    <dependencies>
        <dependency>
            <groupId>org.apache.cxf</groupId>
            <artifactId>cxf-rt-frontend-jaxrs</artifactId>
        </dependency>
        <dependency>
            <groupId>javax.ws.rs</groupId>
            <artifactId>javax.ws.rs-api</artifactId>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.eclipse.jetty</groupId>
                <artifactId>jetty-maven-plugin</artifactId>
                <version>${jetty.version}</version>
            </plugin>
        </plugins>
    </build>
</project>
[/code]

Model

Simple model object.

[code language="java"]
package datadidit.helpful.hints.csv.test.model;

import java.util.Date;

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="SimpleSample")
public class SimpleSample {
    private String firstName;

    private String lastName;

    private Date dob;

    public SimpleSample(){}

    public SimpleSample(String firstName, String lastName, Date dob){
        this.dob = dob;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public Date getDob() {
        return dob;
    }

    public void setDob(Date dob) {
        this.dob = dob;
    }
}
[/code]

A POJO with a data structure (Map) embedded in it. This implements the CSVTransformer interface so the CSV entity provider below knows how to flatten the POJO.

[code language="java"]
package datadidit.helpful.hints.csv.test.model;

import java.util.HashMap;
import java.util.Map;

import javax.xml.bind.annotation.XmlRootElement;

import datadidit.helpful.hints.csv.provider.CSVTransformer;

@XmlRootElement(name="ComplexSample")
public class ComplexSample implements CSVTransformer{
    private String studentId;

    private Map<String, String> grades;

    public ComplexSample(){}

    public ComplexSample(String studentId, Map<String, String> grades){
        this.studentId = studentId;
        this.grades = grades;
    }

    @Override
    public Map<?, ?> flatten() {
        Map<String, Object> myMap = new HashMap<>();
        myMap.put("studentId", studentId);
        myMap.putAll(grades);

        return myMap;
    }

    public String getStudentId() {
        return studentId;
    }

    public void setStudentId(String studentId) {
        this.studentId = studentId;
    }

    public Map<String, String> getGrades() {
        return grades;
    }

    public void setGrades(Map<String, String> grades) {
        this.grades = grades;
    }
}
[/code]

REST

REST Endpoint defining URLs for the Web Service.

[code language="java"]
package datadidit.helpful.hints.rest;

import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

import datadidit.helpful.hints.csv.provider.CSVTransformer;
import datadidit.helpful.hints.csv.test.model.ComplexSample;
import datadidit.helpful.hints.csv.test.model.SimpleSample;

@Path("CustomProvider")
public class CXFSampleResource {
    @GET
    @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML, "application/csv"})
    @Path("test/{caseToUse}")
    public List<?> doSomething(@PathParam("caseToUse") @DefaultValue("simple") String caseToUse){
        List<Object> test = new ArrayList<>();
        if(caseToUse.equalsIgnoreCase("simple")){
            for(SimpleSample samp : this.generateSimpleSample())
                test.add(samp);
        }else{
            for(ComplexSample samp : this.generateComplexSample())
                test.add(samp);
        }

        System.out.println("Hello: "+test);
        return test;
    }

    public List<SimpleSample> generateSimpleSample(){
        List<SimpleSample> samples = new ArrayList<>();
        samples.add(new SimpleSample("hello", "world", new Date()));
        samples.add(new SimpleSample("hello", "chad", new Date()));
        samples.add(new SimpleSample("hello", "marcus", new Date()));
        samples.add(new SimpleSample("hello", "joy", new Date()));
        samples.add(new SimpleSample("hello", "mom", new Date()));

        return samples;
    }

    public List<ComplexSample> generateComplexSample(){
        Map<String, String> grades = new HashMap<>();
        grades.put("Class1", "A");
        grades.put("Class2", "B");
        grades.put("Class3", "C");
        grades.put("Class4", "D");

        List<ComplexSample> samples = new ArrayList<>();
        samples.add(new ComplexSample(UUID.randomUUID().toString(), grades));
        samples.add(new ComplexSample(UUID.randomUUID().toString(), grades));
        samples.add(new ComplexSample(UUID.randomUUID().toString(), grades));
        samples.add(new ComplexSample(UUID.randomUUID().toString(), grades));

        return samples;
    }
}
[/code]

Custom Entity Provider

Generic interface that provides a method for flattening a POJO so that it can be converted to a CSV.

[code language="java"]
package datadidit.helpful.hints.csv.provider;

import java.util.Map;

public interface CSVTransformer {
    /**
     * Utility method to flatten a POJO so that it can be converted into a CSV
     * @return
     */
    Map<?,?> flatten();
}
[/code]

Generic entity provider to generate a CSV file from a List of POJOs. Uses Jackson's CSV data format.

[code language="java"]
package datadidit.helpful.hints.csv.provider;

import java.io.IOException;
import java.io.OutputStream;
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.lang.reflect.Type;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

import javax.ws.rs.Produces;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.MultivaluedMap;
import javax.ws.rs.ext.MessageBodyWriter;

import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import com.fasterxml.jackson.dataformat.csv.CsvSchema.Builder;

@Produces("application/csv")
public class CSVBodyWriter implements MessageBodyWriter<Object>{
    Logger logger = Logger.getLogger(CSVBodyWriter.class.getName());

    public long getSize(Object myCollectionOfObjects, Class type, Type genericType, Annotation[] annotations,
            MediaType arg4) {
        return 0;
    }

    public boolean isWriteable(Class type, Type genericType, Annotation[] annotations,
            MediaType arg3) {
        return true;
    }

    public void writeTo(Object myCollectionOfObjects, Class type, Type genericType, Annotation[] annotations,
            MediaType mediaType, MultivaluedMap httpHeaders, OutputStream entityHeaders)
            throws IOException, WebApplicationException {
        //Whatever makes it in here should be a list
        List<?> myList = new ArrayList<>();
        if(myCollectionOfObjects instanceof List && ((myList=(List<?>)myCollectionOfObjects).size()>0)){
            CsvMapper csvMapper = new CsvMapper();
            CsvSchema schema = null;

            /*
             * If it's not a flat POJO it must implement
             * CSVTransformer
             */
            if(implementsCSVTransformer(myList.get(0).getClass())){
                Class[] params = {};
                try {
                    Method meth = CSVTransformer.class.getDeclaredMethod("flatten", params);

                    /*
                     * Create a new list using the flatten() function
                     */
                    List<Map<String, ?>> listOfMaps = new ArrayList<>();
                    Set<String> headers = null;
                    for(Object obj : myList){
                        Map<String, ?> keyVals = (Map<String, ?>) meth.invoke(obj, params);

                        if(schema==null){
                            schema = this.buildSchemaFromKeySet(keyVals.keySet());
                            headers = keyVals.keySet();
                        }

                        //Validate that latest headers are the same as the original ones
                        if(headers.equals(keyVals.keySet()))
                            listOfMaps.add(keyVals);
                        else
                            logger.warning("Headers should be the same for each object in the list, excluding this object "+keyVals);
                    }

                    csvMapper.writer(schema).writeValue(entityHeaders, listOfMaps);
                } catch (Exception e) {
                    throw new IOException("Unable to retrieve flatten() "+e.getMessage());
                }
            }else{
                schema = csvMapper.schemaFor(myList.get(0).getClass()).withHeader();
                csvMapper.writer(schema).writeValue(entityHeaders, myList);
            }
        }else if(myList.isEmpty()){
            logger.warning("Nothing in list to convert to CSV....");
            entityHeaders.write(myList.toString().getBytes(Charset.forName("UTF-8")));
        }else{
            throw new IOException("Not in proper format, must pass a java.util.List to use this MessageBodyWriter...");
        }
    }

    public CsvSchema buildSchemaFromKeySet(Set<String> keySet){
        Builder build = CsvSchema.builder();
        for(String field : keySet){
            build.addColumn(field);
        }
        CsvSchema schema = build.build().withHeader();
        return schema;
    }

    public Boolean implementsCSVTransformer(Class arg1){
        Class[] interfaces = arg1.getInterfaces();
        for(Class aClass : interfaces){
            if(aClass.getName().equals(CSVTransformer.class.getName()))
                return true;
        }

        return false;
    }
}
[/code]

Configuration

This XML file configures the CXF servlet, extension mappings, and providers for your web service. Some good docs on this configuration file can be found here.

[code language="xml"]
<?xml version="1.0" encoding="UTF-8"?>
<web-app id="WebApp_ID" version="2.4"
    xmlns="http://java.sun.com/xml/ns/j2ee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee
    http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
    <display-name>CSV Provider Test</display-name>
    <servlet>
        <servlet-name>MyApplication</servlet-name>
        <servlet-class>org.apache.cxf.jaxrs.servlet.CXFNonSpringJaxrsServlet</servlet-class>
        <!-- Name of the resource -->
        <init-param>
            <param-name>jaxrs.serviceClasses</param-name>
            <param-value>
                datadidit.helpful.hints.rest.CXFSampleResource,
            </param-value>
        </init-param>
        <!-- Name of the providers -->
        <init-param>
            <param-name>jaxrs.providers</param-name>
            <param-value>
                com.fasterxml.jackson.jaxrs.xml.JacksonJaxbXMLProvider,
                com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider,
                datadidit.helpful.hints.csv.provider.CSVBodyWriter
            </param-value>
        </init-param>
        <!-- Name of the extensions -->
        <init-param>
            <param-name>jaxrs.extensions</param-name>
            <param-value>
                csv=application/csv
                json=application/json
                xml=application/xml
            </param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet-mapping>
        <servlet-name>MyApplication</servlet-name>
        <url-pattern>/*</url-pattern>
    </servlet-mapping>
</web-app>
[/code]

Instructions

  1. From the root of the project run ‘mvn jetty:run’
  2. For the simple example run:
    • xml: http://localhost:8080/CustomProvider/test/simple
    • json: http://localhost:8080/CustomProvider/test/simple.json
    • csv: http://localhost:8080/CustomProvider/test/simple.csv
  3. For the complex example run:
    • xml: http://localhost:8080/CustomProvider/test/complex
    • json: http://localhost:8080/CustomProvider/test/complex.json
    • csv: http://localhost:8080/CustomProvider/test/complex.csv

Screen Shots

When you hit the ‘.csv’ extension, depending on your browser, you may notice that the CSV is simply downloaded as a file.

Screenshots: simple XML, simple JSON, complex XML, and complex JSON responses.