MLeap is another option (Apache licensed) https://github.com/TrueCar/mleap
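For illustration, a rough sketch of what MLeap-style scoring looks like: export the fitted PipelineModel as a bundle, then evaluate it with no SparkContext. The API shown follows later ml.combust MLeap releases rather than the TrueCar snapshot linked above, and the path, schema, and column names are made up, so treat the details as assumptions rather than a recipe. `pipelineModel` and `trainingDf` stand for the fitted pipeline and training DataFrame from the thread below.

```scala
// Export side (runs on Spark): serialize the fitted PipelineModel to an MLeap bundle.
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

val ctx = SparkBundleContext().withDataset(pipelineModel.transform(trainingDf))
for (bf <- managed(BundleFile("jar:file:/tmp/pipeline.zip"))) {   // hypothetical path
  pipelineModel.writeBundle.save(bf)(ctx).get
}

// Serving side (no SparkContext): load the bundle once, then score single rows.
import ml.combust.mleap.runtime.MleapSupport._
import ml.combust.mleap.core.types.{ScalarType, StructField, StructType}
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}

val mleapPipeline = (for (bf <- managed(BundleFile("jar:file:/tmp/pipeline.zip"))) yield {
  bf.loadMleapBundle().get.root
}).opt.get

// Hypothetical input schema: the raw columns the pipeline expects, not the assembled features.
val schema = StructType(StructField("text", ScalarType.String),
                        StructField("amount", ScalarType.Double)).get
val frame  = DefaultLeapFrame(schema, Seq(Row("some text", 12.5)))
val scored = mleapPipeline.transform(frame).get.select("prediction").get
```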
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action

> On 21 Jul 2016, at 06:47, Simone <simone.mirag...@gmail.com> wrote:
>
> Thanks for your reply.
>
> I cannot rely on jpmml due to licensing issues.
> I can evaluate writing my own prediction code, but I am looking for a more
> general-purpose approach.
>
> Any other thoughts?
> Best
> Simone
>
> From: Peyman Mohajerian <mohaj...@gmail.com>
> Sent: 20/07/2016 21:55
> To: Simone Miraglia <simone.mirag...@gmail.com>
> Cc: User <user@spark.apache.org>
> Subject: Re: ML PipelineModel to be scored locally
>
> One option is to save the model in parquet or json format and then build your
> own prediction code. Some also use:
> https://github.com/jpmml/jpmml-sparkml
> Whether this works or not depends on the model (e.g. ml vs. mllib) and other
> factors. A couple of weeks ago there was a long discussion on this topic.
>
>
> On Wed, Jul 20, 2016 at 7:08 AM, Simone Miraglia <simone.mirag...@gmail.com> wrote:
> Hi all,
>
> I am working on the following use case involving ML Pipelines.
>
> 1. I created a Pipeline composed of a set of stages
> 2. I called the "fit" method on my training set
> 3. I validated my model by calling "transform" on my test set
> 4. I stored my fitted Pipeline to a shared folder
>
> Then I have a very low-latency interactive application (say, a kind of web
> service) that should work as follows:
> 1. The app receives a request
> 2. A score needs to be computed, according to my fitted PipelineModel
> 3. The app sends the score back to the caller, synchronously
>
> Is there a way to call the .transform method of the PipelineModel on a
> single Row?
>
> I definitely do not want to parallelize a single record into a DataFrame, nor
> rely on Spark Streaming, due to the latency requirements.
> I would like to use something similar to mllib's .predict(Vector) method, which
> does not rely on a SparkContext and performs all the computation locally.
>
> Thanks in advance
> Best
>
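For concreteness, a minimal sketch of the "save the model and build your own prediction code" route suggested above. It assumes, purely as an example, that the final pipeline stage is a LogisticRegressionModel; the model path is hypothetical. The fitted parameters are extracted once (that part still needs Spark), after which per-request scoring is plain JVM math with no SparkContext or DataFrame:

```scala
// Offline, once (still uses Spark): pull the learned parameters out of the
// fitted PipelineModel saved in step 4. Path and final-stage type are examples.
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel

val fitted    = PipelineModel.load("/shared/models/my-pipeline")
val lr        = fitted.stages.last.asInstanceOf[LogisticRegressionModel]
val weights   = lr.coefficients.toArray   // persist these, e.g. as JSON
val intercept = lr.intercept

// Online, per request (no Spark at all): score a single feature vector.
def score(features: Array[Double]): Double = {
  require(features.length == weights.length, "feature/weight length mismatch")
  val margin = intercept + weights.zip(features).map { case (w, x) => w * x }.sum
  1.0 / (1.0 + math.exp(-margin))          // probability of the positive class
}
```

The obvious cost is that every feature-engineering stage ahead of the model (tokenizers, assemblers, scalers, ...) must also be re-implemented in the service, which is exactly the gap jpmml-sparkml and MLeap try to close.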