
Open sourcing our Spark Job Server


Written by Evan Chan, 12/19/2013

Here at Ooyala, we have been investing heavily in Spark. We wrote a REST server for submitting, running, and managing Spark jobs and contexts. From the beginning, we thought of the job server as a generic piece of infrastructure that many organizations could benefit from, and thus a good candidate for open sourcing. At the just-concluded Spark Summit 2013, we were happy to announce that we had started open sourcing our job server (video of our talk). The pull request is available for anyone to review.

Our vision for Spark is as a multi-team big data service. A quick look at what every team deploying Hadoop/Spark jobs ends up repeating shows:

  • A bastion box for running Hadoop/Spark jobs
  • Deployment and process monitoring
  • Tracking and serializing job status, progress, and job results
  • Job input validation
  • No easy way to kill jobs, short of hunting for the Hadoop job ID in the log output

We wrote the Spark job server as a RESTful service that provides much of this repeated infrastructure for productionizing Spark jobs. It can run ad-hoc Spark jobs, but it is especially helpful for sharing the RDDs in one SparkContext among multiple jobs: for example, you can spin up a long-running SparkContext, run one job to load the RDDs, then run multiple low-latency query jobs against those same RDDs.
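
To make that pattern concrete, here is a minimal sketch of what such a pair of jobs could look like. It follows the job API in the open-sourced code, where a job implements a SparkJob trait with validate and runJob methods and can mix in NamedRddSupport to share RDDs by name; the config keys ("input.path", "pattern") and the RDD name "records" are made up for this example, so treat the details as illustrative and check the project's README for the exact interface.

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

    // Job 1: run once per context to load data into a cached RDD,
    // published under a name so that later jobs can find it.
    object LoadRecordsJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        if (config.hasPath("input.path")) SparkJobValid
        else SparkJobInvalid("missing config setting: input.path")

      override def runJob(sc: SparkContext, config: Config): Any = {
        val records = sc.textFile(config.getString("input.path")).cache()
        namedRdds.update("records", records) // share the RDD with later jobs by name
        records.count()                      // materialize the cache; the count becomes the job result
      }
    }

    // Job 2: a low-latency query that reuses the cached RDD instead of reloading it.
    object CountMatchesJob extends SparkJob with NamedRddSupport {
      override def validate(sc: SparkContext, config: Config): SparkJobValidation =
        if (config.hasPath("pattern")) SparkJobValid
        else SparkJobInvalid("missing config setting: pattern")

      override def runJob(sc: SparkContext, config: Config): Any = {
        val pattern = config.getString("pattern")
        namedRdds.get[String]("records") match {
          case Some(records) => records.filter(_.contains(pattern)).count()
          case None          => throw new IllegalStateException("run LoadRecordsJob first")
        }
      }
    }

A client drives all of this over HTTP: upload the job jar, create a named long-lived context, run the loader job once against it, then submit query jobs to that same context so they hit the cached RDD (the project's README documents the exact /jars, /contexts, and /jobs routes).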

Our job server also enables our polyglot technology stack: because it exposes a REST API, any language can drive it. We have Ruby scripts doing job scheduling, and an API server written in Go hits the job server as well.

The response has been great. From talking to other folks at the Spark Summit, it seems that just about everybody is building something similar, so why repeat the effort everywhere? There is already an offer from others to contribute a scheduling component, and somebody else is writing a UI on top of what we just open sourced. We are also not planning to just dump this over the wall: there are many features we look forward to bringing to the job server, including exploring high-availability (HA) driver/SparkContext support, HA job server support, and more.

We are looking forward to working with the wider Spark community!