+1 on that. It would be useful to use the model outside of Spark.
_____________________________
From: DB Tsai <[email protected]>
Sent: Wednesday, November 11, 2015 11:57 PM
Subject: Re: thought experiment: use spark ML to real time prediction
To: Nirmal Fernando <[email protected]>
Cc: Andy Davidson <[email protected]>, Adrian Tanase
<[email protected]>, user @spark <[email protected]>
Do you think it would be useful to separate those models and the model
loader/writer code into another spark-ml-common jar without any Spark platform
dependencies, so users can load the models trained by Spark ML in their
application and run the prediction?
Sincerely,
DB Tsai
----------------------------------------------------------
Web: https://www.dbtsai.com
PGP Key ID: 0xAF08DF8D
On Wed, Nov 11, 2015 at 3:14 AM, Nirmal Fernando <[email protected]>
wrote:
As of now, we are basically serializing the ML model and then
deserializing it for prediction in real time.
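
A minimal sketch of that serialize-then-deserialize round trip, assuming the
model is a linear model whose parameters can be copied out as plain numbers.
The JSON layout and the function names serialize_model and deserialize_model
are hypothetical; the message above does not say which serialization format
is actually used.

```python
import json


def serialize_model(weights, intercept):
    """Encode linear-model parameters as a JSON string (hypothetical layout)."""
    return json.dumps({"weights": list(weights), "intercept": float(intercept)})


def deserialize_model(payload):
    """Decode the JSON string back into a (weights, intercept) pair."""
    data = json.loads(payload)
    return [float(w) for w in data["weights"]], float(data["intercept"])


# Made-up parameters, standing in for values copied from a trained model.
payload = serialize_model([0.5, -1.25, 2.0], 0.1)
weights, intercept = deserialize_model(payload)
```

The serving process then only needs the payload and a JSON parser, not a
Spark runtime.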
On Wed, Nov 11, 2015 at 4:39 PM, Adrian Tanase
<[email protected]> wrote:
I don’t think this answers your question, but here’s how you would evaluate the
model in real time in a streaming app:
https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html
Maybe you can find a way to extract portions of MLlib and run them outside of
Spark, loading the precomputed model and calling .predict on it…
-adrian
From: Andy Davidson
Date: Tuesday, November 10, 2015 at 11:31 PM
To: "user @spark"
Subject: thought experiment: use spark ML to real time prediction
Let's say I have used Spark ML to train a linear model. I know I can save the
model to disk and load it again. I am not sure how I can use the model in a
real-time environment. For example, I do not think I can easily return a
“prediction” to the client using Spark Streaming. Also, for some applications
the extra latency created by the batch process might not be acceptable.
If I were not using Spark, I would re-implement the model I trained in my batch
environment in a language like Java and implement a REST service that uses the
model to create a prediction and return it to the client. Many models make
predictions using linear algebra. Implementing predictions is relatively easy
if you have a good vectorized LA package. Is there a way to use a model I
trained using Spark ML outside of Spark?
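
To illustrate why prediction for a linear model is easy to re-implement, here
is a rough sketch in plain Python, assuming the trained weights and intercept
have already been exported somehow. The function names predict_linear and
predict_logistic and the sample numbers are hypothetical; this is not Spark
ML's own code.

```python
import math


def predict_linear(weights, intercept, features):
    """Return the dot product of weights and features, plus the intercept."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + intercept


def predict_logistic(weights, intercept, features):
    """Map the linear margin to a probability with the logistic function."""
    margin = predict_linear(weights, intercept, features)
    # Choose the branch that avoids overflow in exp() for large |margin|.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


# Made-up model parameters and one feature vector.
weights = [2.0, -1.0]
intercept = 0.5
score = predict_linear(weights, intercept, [1.0, 3.0])
```

A REST handler could call predict_linear directly on each request, so the
latency is a single dot product rather than a batch interval.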
As a motivating example: even if it's possible to return data to the client
using Spark Streaming, I think the mini-batch latency would not be acceptable
for a high-frequency stock trading system.
Kind regards
Andy
P.S. The examples I have seen so far use Spark Streaming to “preprocess”
predictions. For example, a recommender system might use what current users are
watching to calculate “trending recommendations”. These are stored on disk and
served up to users when they use the “movie guide”. If a recommendation were a
couple of minutes old, it would not affect the end user's experience.
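
The precompute-and-serve pattern described in the P.S. could be sketched
roughly as follows, assuming an in-memory dictionary stands in for the on-disk
store. The class RecommendationCache, its methods, and the five-minute
freshness limit are all hypothetical choices for illustration.

```python
class RecommendationCache:
    """Holds precomputed results and serves them while they are fresh enough."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self._entries = {}

    def store(self, key, items, computed_at):
        """Record the items produced by a batch or streaming job."""
        self._entries[key] = (list(items), computed_at)

    def lookup(self, key, now):
        """Return stored items, or None if they are missing or too old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        items, computed_at = entry
        if now - computed_at > self.max_age_seconds:
            return None
        return items


# Timestamps are plain seconds, passed in explicitly to keep the sketch simple.
cache = RecommendationCache(max_age_seconds=300)
cache.store("trending", ["movie-a", "movie-b"], computed_at=1000.0)
fresh = cache.lookup("trending", now=1120.0)
stale = cache.lookup("trending", now=1400.0)
```

Serving is a lookup rather than a model evaluation, which is why a result that
is a couple of minutes old is acceptable here but not in the trading example.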
--
Thanks & regards,
Nirmal
Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/