MLeap is another option (Apache licensed): https://github.com/TrueCar/mleap


-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 





> On 21 Jul 2016, at 06:47, Simone <simone.mirag...@gmail.com> wrote:
> 
> Thanks for your reply. 
> 
> I cannot rely on JPMML due to licensing issues.
> I could write my own prediction code, but I am looking for a more 
> general-purpose approach. 
> 
> Any other thoughts?
> Best
> Simone
> From: Peyman Mohajerian <mohaj...@gmail.com>
> Sent: 20/07/2016 21:55
> To: Simone Miraglia <simone.mirag...@gmail.com>
> Cc: User <user@spark.apache.org>
> Subject: Re: ML PipelineModel to be scored locally
> 
> One option is to save the model in parquet or json format and then build your 
> own prediction code. Some also use: 
> https://github.com/jpmml/jpmml-sparkml 
> Whether this works depends on the model (e.g. spark.ml vs. spark.mllib) 
> and other factors. A couple of weeks ago there was a long discussion on 
> this topic.
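The "build your own prediction code" route suggested above can be sketched as follows. This is a minimal illustration, not an official Spark API: it assumes a binary logistic regression model whose coefficients and intercept have already been extracted offline from the saved model files; the function name and all numbers here are hypothetical.

```python
import math

def predict_local(coefficients, intercept, features):
    """Score one record locally, with no SparkContext.

    Mirrors what a binary LogisticRegressionModel computes:
    the sigmoid of the linear combination of the features.
    """
    margin = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical parameters, as if read from the stored model.
coefficients = [0.8, -1.2, 0.3]
intercept = 0.1

probability = predict_local(coefficients, intercept, [1.0, 0.5, 2.0])
```

Since this is plain arithmetic over precomputed parameters, it can run inside any low-latency service with no Spark dependency at all.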
> 
> 
> On Wed, Jul 20, 2016 at 7:08 AM, Simone Miraglia <simone.mirag...@gmail.com> wrote:
> Hi all,
> 
> I am working on the following use case involving ML Pipelines.
> 
> 1. I created a Pipeline composed from a set of stages
> 2. I called "fit" method on my training set
> 3. I validated my model by calling "transform" on my test set
> 4. I stored my fitted Pipeline to a shared folder
> 
> Then I have a very low-latency interactive application (say, a kind of web 
> service) that should work as follows:
> 1. The app receives a request
> 2. A scoring needs to be made, according to my fitted PipelineModel
> 3. The app sends the score to the caller, in a synchronous fashion
> 
> Is there a way to call the .transform method of the PipelineModel on a 
> single Row?
> 
> I definitely do not want to parallelize a single record into a DataFrame, 
> nor rely on Spark Streaming, due to the latency requirements.
> I would like something similar to mllib's .predict(Vector) method, which 
> does not rely on a SparkContext and performs all the computation locally.
> 
> Thanks in advance
> Best
> 
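
The SparkContext-free, single-row scoring Simone describes could be sketched in plain Python along these lines. This is only an illustration of the pattern, not a supported Spark API: it assumes the fitted PipelineModel happens to reduce to a StandardScaler stage (configured to both center and scale) followed by a logistic regression stage, and that the means, standard deviations, coefficients, and intercept were extracted from the stored model offline. The class name and all parameter values are made up.

```python
import math

class LocalPipeline:
    """Hypothetical local re-implementation of a fitted two-stage
    pipeline: StandardScaler followed by logistic regression.
    Parameters are extracted once from the stored PipelineModel."""

    def __init__(self, means, stds, coefficients, intercept):
        self.means = means
        self.stds = stds
        self.coefficients = coefficients
        self.intercept = intercept

    def transform_row(self, row):
        # Stage 1: center and scale each feature, as the fitted
        # StandardScalerModel would.
        scaled = [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
        # Stage 2: logistic regression score on the scaled features.
        margin = self.intercept + sum(
            w * x for w, x in zip(self.coefficients, scaled))
        return 1.0 / (1.0 + math.exp(-margin))

# Parameters would come from the saved PipelineModel; made up here.
pipeline = LocalPipeline(
    means=[2.0, 10.0], stds=[1.0, 5.0],
    coefficients=[0.5, -0.25], intercept=0.0)

score = pipeline.transform_row([3.0, 5.0])  # one request, one row
```

Loading such extracted parameters once at service startup gives synchronous per-request scoring with no DataFrame, no job scheduling, and no Spark runtime on the serving side.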
