One option is to save the model in Parquet or JSON format and then write
your own prediction code. Some people also use:

https://github.com/jpmml/jpmml-sparkml

Whether this works depends on the model (e.g. ml vs. mllib) and other
factors. A couple of weeks ago there was a long discussion on this
topic.
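
To make the "build your own prediction code" idea concrete, here is a
minimal sketch in plain Python that scores one record locally, with no
SparkContext involved. It assumes the fitted model is a binary logistic
regression whose coefficients and intercept you exported yourself; the
JSON layout and the names MODEL_JSON, load_model and predict_probability
are hypothetical and are not Spark's own save format. Any feature
transformers in your Pipeline would have to be reimplemented the same way.

```python
import json
import math

# Hypothetical export of a fitted logistic regression. This layout is an
# assumption for illustration only, not the format Spark writes on save.
MODEL_JSON = '{"coefficients": [0.5, -0.25], "intercept": 0.1}'


def load_model(text):
    """Parse the exported parameters into (coefficients, intercept)."""
    params = json.loads(text)
    return params["coefficients"], params["intercept"]


def predict_probability(features, coefficients, intercept):
    """Score a single feature vector locally using the logistic function."""
    if len(features) != len(coefficients):
        raise ValueError("feature vector length does not match the model")
    # Linear margin: intercept plus the dot product of weights and features.
    margin = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-margin))


# Load once at application start-up, then reuse for every request.
coefficients, intercept = load_model(MODEL_JSON)
score = predict_probability([2.0, 4.0], coefficients, intercept)
```

Since the parameters are loaded once and each call is just a dot product,
the per-request cost should be small, but this only covers the final
estimator and only under the assumptions above.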

On Wed, Jul 20, 2016 at 7:08 AM, Simone Miraglia <simone.mirag...@gmail.com>
wrote:

> Hi all,
>
> I am working on the following use case involving ML Pipelines.
>
> 1. I created a Pipeline composed of a set of stages
> 2. I called the "fit" method on my training set
> 3. I validated my model by calling "transform" on my test set
> 4. I stored my fitted Pipeline to a shared folder
>
> Then I have a very low latency interactive application (say, a kind of web
> service) that should work as follows:
> 1. The app receives a request
> 2. A scoring needs to be made, according to my fitted PipelineModel
> 3. The app sends the score to the caller, in a synchronous fashion
>
> Is there a way to call the .transform method of the PipelineModel over a
> single Row?
>
> I definitely do not want to parallelize a single record into a DataFrame,
> nor rely on Spark Streaming, due to latency requirements.
> I would like to use something similar to the mllib .predict(Vector) method,
> which does not rely on the SparkContext and performs all the computation
> locally.
>
> Thanks in advance
> Best
>
