Thanks for your reply. 

I cannot rely on JPMML due to licensing issues.
I could consider writing my own prediction code, but I am looking for a more
general-purpose approach.

Any other thoughts?
Best
Simone

----- Original Message -----
From: "Peyman Mohajerian" <mohaj...@gmail.com>
Sent: 20/07/2016 21:55
To: "Simone Miraglia" <simone.mirag...@gmail.com>
Cc: "User" <user@spark.apache.org>
Subject: Re: ML PipelineModel to be scored locally

One option is to save the model in Parquet or JSON format and then build your
own prediction code. Some people also use:
https://github.com/jpmml/jpmml-sparkml
Whether this works depends on the model (e.g. ml vs. mllib) and other factors.
A couple of weeks ago there was a long discussion on this topic.
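For the "build your own prediction code" option, here is a minimal sketch. It assumes a binary logistic regression whose coefficients and intercept have already been copied out of the saved model into a small JSON document. The field names `coefficients` and `intercept` and the class `LocalLogisticRegression` are hypothetical: Spark's own on-disk layout (metadata JSON plus Parquet data) varies by model type and version and would need its own reader.

```python
import json
import math

# Hypothetical export format: parameters of a fitted binary logistic
# regression, copied out of the saved Spark model by some offline step.
MODEL_JSON = '{"coefficients": [0.8, -1.2, 0.05], "intercept": -0.3}'


class LocalLogisticRegression:
    """Scores a single feature vector in-process, with no SparkContext."""

    def __init__(self, coefficients, intercept):
        self.coefficients = [float(w) for w in coefficients]
        self.intercept = float(intercept)

    @classmethod
    def from_json(cls, text):
        params = json.loads(text)
        return cls(params["coefficients"], params["intercept"])

    def raw_margin(self, features):
        if len(features) != len(self.coefficients):
            raise ValueError("feature vector has the wrong length")
        return self.intercept + sum(w * x for w, x in zip(self.coefficients, features))

    def probability(self, features):
        # Probability of the positive class, via a numerically stable sigmoid.
        m = self.raw_margin(features)
        if m >= 0:
            return 1.0 / (1.0 + math.exp(-m))
        e = math.exp(m)
        return e / (1.0 + e)

    def predict(self, features, threshold=0.5):
        return 1.0 if self.probability(features) > threshold else 0.0


model = LocalLogisticRegression.from_json(MODEL_JSON)
print(model.predict([2.0, 0.5, 1.0]))  # prints 1.0
```

The model is parsed once and then each call is plain arithmetic, which is what makes this route attractive for low latency. The cost is that every model type needs its own hand-written scorer, and the results should be checked against Spark's `transform` on a test set.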


On Wed, Jul 20, 2016 at 7:08 AM, Simone Miraglia <simone.mirag...@gmail.com> wrote:

Hi all,


I am working on the following use case involving ML Pipelines.


1. I created a Pipeline composed of a set of stages
2. I called the "fit" method on my training set
3. I validated my model by calling "transform" on my test set
4. I saved my fitted Pipeline to a shared folder


Then I have a very low-latency interactive application (say, a kind of web
service) that should work as follows:
1. The app receives a request
2. A score needs to be computed with my fitted PipelineModel
3. The app sends the score back to the caller synchronously


Is there a way to call the .transform method of the PipelineModel on a single
Row?


Because of the latency requirements, I definitely do not want to parallelize a
single record into a DataFrame, nor rely on Spark Streaming.
I would like to use something similar to the mllib .predict(Vector) method,
which does not rely on a SparkContext and performs all the computation locally.
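One way to approximate a `.predict(Vector)`-style call for a whole pipeline, without a SparkContext, is to mirror each fitted stage as a plain in-process object and chain them over one record. This is a minimal sketch, not a Spark API: it assumes a pipeline of a feature assembler, a standardization step (centering and scaling, i.e. both mean and std applied) and a linear model, with all fitted parameters already exported. Every class name and parameter value below is hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


class AssembleFeatures:
    """Hypothetical stand-in for a VectorAssembler: named columns -> one list."""

    def __init__(self, input_cols: Sequence[str], output_col: str = "features"):
        self.input_cols = list(input_cols)
        self.output_col = output_col

    def transform(self, row: Row) -> Row:
        out = dict(row)  # copy, so the caller's row is never mutated
        out[self.output_col] = [float(row[c]) for c in self.input_cols]
        return out


class Standardize:
    """Centers and scales each feature with exported training means/std devs."""

    def __init__(self, mean: Sequence[float], std: Sequence[float], col: str = "features"):
        self.mean, self.std, self.col = list(mean), list(std), col

    def transform(self, row: Row) -> Row:
        out = dict(row)
        out[self.col] = [
            (x - m) / s if s != 0 else 0.0  # zero-variance features map to 0.0
            for x, m, s in zip(row[self.col], self.mean, self.std)
        ]
        return out


class LinearScore:
    """Adds a 'prediction' column: intercept + weights . features."""

    def __init__(self, weights: Sequence[float], intercept: float, col: str = "features"):
        self.weights, self.intercept, self.col = list(weights), intercept, col

    def transform(self, row: Row) -> Row:
        out = dict(row)
        out["prediction"] = self.intercept + sum(
            w * x for w, x in zip(self.weights, row[self.col])
        )
        return out


class LocalPipelineModel:
    """Applies each fitted stage in order to a single row, entirely in-process."""

    def __init__(self, stages: List[object]):
        self.stages = stages

    def transform(self, row: Row) -> Row:
        for stage in self.stages:
            row = stage.transform(row)
        return row


# Built once at service start-up, then reused for every request.
pipeline = LocalPipelineModel([
    AssembleFeatures(["age", "income"]),
    Standardize(mean=[40.0, 50000.0], std=[10.0, 20000.0]),
    LinearScore(weights=[0.5, 1.5], intercept=2.0),
])

request = {"id": 7, "age": 50, "income": 30000}
print(pipeline.transform(request)["prediction"])  # prints 1.0
```

The design mirrors the stage-by-stage structure of a PipelineModel so that each stage can be ported and tested on its own; keeping each port numerically consistent with Spark's implementation of that stage remains the user's responsibility.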


Thanks in advance
Best
