Currently there is no direct way in Spark to serve models without bringing
in all of Spark as a dependency.

For Spark ML, there is also no way to do it independently of DataFrames
(which makes single-instance prediction sub-optimal). That is covered
here: https://issues.apache.org/jira/browse/SPARK-10413

So your options (in Scala) are things like MLeap, PredictionIO, or "roll
your own". Alternatively, you can export to another format such as PMML or
PFA. Some MLlib models support PMML export, but for ML it is still missing
(see https://issues.apache.org/jira/browse/SPARK-11171).

There is also an external project for PMML (check its license before
adopting it) - https://github.com/jpmml/jpmml-sparkml - which is by now
quite comprehensive. It shows that PMML can represent a fairly large subset
of typical ML pipeline functionality.

On the Python side, sadly, there is even less - I would say your options
are currently pretty much "roll your own", or export to PMML or PFA.
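
To make "roll your own" a bit more concrete, here is a minimal sketch for
the simplest case, a logistic regression model. It assumes you have already
written the coefficients and intercept out of Spark as JSON; the JSON layout,
the sample values, and the function names are all made up for illustration
and are not part of any Spark API. Any feature transformers in your pipeline
would have to be re-implemented in the same way.

```python
import json
import math

# Hypothetical export. In the Spark job you would write out something like
# {"coefficients": model.coefficients.toArray().tolist(),
#  "intercept": model.intercept}
# The values below are invented for the example.
EXPORTED_MODEL_JSON = '{"coefficients": [0.5, -1.0, 2.0], "intercept": 0.25}'


def load_model(payload):
    """Parse the exported parameters into a plain dict."""
    params = json.loads(payload)
    return {
        "coefficients": [float(c) for c in params["coefficients"]],
        "intercept": float(params["intercept"]),
    }


def predict_probability(model, features):
    """Score one feature vector with the logistic function, no Spark needed."""
    if len(features) != len(model["coefficients"]):
        raise ValueError("feature vector length does not match the model")
    margin = model["intercept"] + sum(
        w * x for w, x in zip(model["coefficients"], features)
    )
    return 1.0 / (1.0 + math.exp(-margin))


model = load_model(EXPORTED_MODEL_JSON)
probability = predict_probability(model, [1.0, 0.0, 0.5])
```

A web app (Flask or otherwise) could call load_model once at startup and
predict_probability per request. This only works because the model is a
handful of numbers; tree ensembles and full pipelines take a lot more work,
which is where PMML/PFA or something like MLeap comes in.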

Finally, part of the motivation for "mllib-local" was to enable this kind
of local model serving (for some initial discussion about the future see
https://issues.apache.org/jira/browse/SPARK-16365).

N

On Thu, 11 Aug 2016 at 06:28 Michael Allman <mich...@videoamp.com> wrote:

> Nick,
>
> Check out MLeap: https://github.com/TrueCar/mleap. It's not python, but
> we use it in production to serve a random forest model trained by a Spark
> ML pipeline.
>
> Thanks,
>
> Michael
>
> On Aug 10, 2016, at 7:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
> Are there any existing JIRAs covering the possibility of serving up Spark
> ML models via, for example, a regular Python web app?
>
> The story goes like this: You train your model with Spark on several TB of
> data, and now you want to use it in a prediction service that you’re
> building, say with Flask <http://flask.pocoo.org/>. In principle, you
> don’t need Spark anymore since you’re just passing individual data points
> to your model and looking for it to spit some prediction back.
>
> I assume this is something people do today, right? I presume Spark needs
> to run in their web service to serve up the model. (Sorry, I’m new to the
> ML side of Spark. 😅)
>
> Are there any JIRAs discussing potential improvements to this story? I did
> a search, but I’m not sure what exactly to look for. SPARK-4587
> <https://issues.apache.org/jira/browse/SPARK-4587> (model import/export)
> looks relevant, but doesn’t address the story directly.
>
> Nick
