Spark Local Pipelines

Asher Krim Sun, 12 Mar 2017 15:21:50 -0700

Hi All,

I spent a lot of time at Spark Summit East this year talking with Spark
developers and committers about the challenges of productizing Spark. One of
the biggest shortcomings I've encountered in Spark ML pipelines is the lack
of a way to serve single requests with reasonable performance.
SPARK-10413 explores adding methods for single-item prediction, but I'd
like to explore a more holistic approach: a separate local API, with
models that support transformations without depending on Spark at all.


I've written up a doc
<https://docs.google.com/document/d/1Ha4DRMio5A7LjPqiHUnwVzbaxbev6ys04myyz6nDgI4/edit?usp=sharing>
detailing the approach, and I'm happy to discuss alternatives. If this
gains traction, I can create a branch with a minimal example on a simple
transformer (probably something like CountVectorizerModel) so we have
something concrete to continue the discussion on.
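
To make the idea a bit more concrete, here is a rough sketch of what a
Spark-free local transformer could look like. It is only an illustration
and is not taken from the doc: the class name LocalCountVectorizer, its
constructor, and the (size, indices, values) return shape are all
hypothetical, and Python is used purely for brevity.

```python
from collections import Counter
from typing import Dict, List, Tuple


class LocalCountVectorizer:
    """Hypothetical local counterpart of a fitted CountVectorizerModel.

    Holds only the learned vocabulary and has no dependency on Spark.
    """

    def __init__(self, vocabulary: List[str]) -> None:
        self.vocabulary = list(vocabulary)
        # Map each term to its position in the vocabulary for O(1) lookup.
        self._index: Dict[str, int] = {
            term: i for i, term in enumerate(self.vocabulary)
        }

    def transform(self, tokens: List[str]) -> Tuple[int, List[int], List[float]]:
        """Turn one tokenized document into a sparse count vector.

        Returns (vector size, sorted term indices, counts). Tokens that
        are not in the vocabulary are ignored.
        """
        counts = Counter(self._index[t] for t in tokens if t in self._index)
        indices = sorted(counts)
        values = [float(counts[i]) for i in indices]
        return len(self.vocabulary), indices, values


# Single-request usage: no SparkContext, no DataFrame.
model = LocalCountVectorizer(["spark", "ml", "local"])
size, indices, values = model.transform(["local", "spark", "spark", "unknown"])
```

The point of the sketch is that a single request is just a function call
on plain in-memory data, so serving latency is not tied to job scheduling
or DataFrame construction.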

Thanks,
Asher Krim
Senior Software Engineer
