I have a similar issue. I want to load a model saved by a Spark machine
learning job in a web application:
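    // In the Spark batch job: persist the trained model.
    model.save(jsc.sc(), "myModelPath");

    // In the web application: load it back (requires a SparkContext).
    LogisticRegressionModel model =
        LogisticRegressionModel.load(jsc.sc(), "myModelPath");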
When I do that, I need to pass a Spark context to load the model. The
model is small and can be saved to the local file system, so is there any way
to use it without the Spark context? Creating a Spark context looks like an
expensive step that internally starts a Jetty server, and I do not want to
start one more web server inside a web application.
A solution that I received (pasted below) was to export the model into a
generic format such as PMML. I haven't tried it, and I am hoping to find a way
to use the model without adding a lot more dependencies and code to the project.
On Oct 30, 2015, at 2:11 PM, Stefano Baghino wrote:
One possibility would be to export the model as PMML (Predictive Model Markup
Language, an XML-based standard to describe predictive models) and then use it
in your Web app (using something like JPMML, https://github.com/jpmml, for
example). You can directly export (some) models (including LinReg) since Spark
1.4: https://databricks.com/blog/2015/07/02/pmml-support-in-spark-mllib.html
For more info on PMML support on MLlib (including model support):
https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
For more info on the PMML standard:
http://dmg.org/pmml/v4-2-1/GeneralStructure.html
Thanks
Viju
From: Andy Davidson
Sent: Tuesday, November 10, 2015 1:32 PM
To: user @spark
Subject: thought experiment: use spark ML to real time prediction
Let's say I have used Spark ML to train a linear model. I know I can save and
load the model to disk. I am not sure how I can use the model in a real-time
environment. For example, I do not think I can easily return a "prediction" to
the client using Spark Streaming. Also, for some applications the extra
latency created by the batch process might not be acceptable.
If I were not using Spark, I would re-implement the model I trained in my batch
environment in a language like Java and implement a REST service that uses the
model to create a prediction and return it to the client. Many models make
predictions using linear algebra. Implementing predictions is relatively easy
if you have a good vectorized linear algebra package. Is there a way to use
a model I trained using Spark ML outside of Spark?
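
To make the "re-implement the prediction" idea concrete, here is a minimal
sketch in plain Java with no Spark dependency. It assumes a binary logistic
regression model whose weights and intercept were written out by the batch job
(for an MLlib LogisticRegressionModel these would presumably come from
model.weights() and model.intercept()). The class name LinearModelScorer and
its methods are hypothetical, made up for illustration; they are not part of
Spark.

```java
// Hypothetical helper, not part of Spark: scores a binary logistic
// regression model from its exported weights and intercept.
final class LinearModelScorer {
    private final double[] weights;
    private final double intercept;

    LinearModelScorer(double[] weights, double intercept) {
        this.weights = weights.clone(); // defensive copy
        this.intercept = intercept;
    }

    /** Raw margin: dot(weights, features) + intercept. */
    double margin(double[] features) {
        if (features.length != weights.length) {
            throw new IllegalArgumentException(
                "expected " + weights.length + " features, got " + features.length);
        }
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    /** Probability of the positive class: sigmoid of the margin. */
    double probability(double[] features) {
        return 1.0 / (1.0 + Math.exp(-margin(features)));
    }

    /** Returns 1.0 if the probability exceeds the threshold, else 0.0. */
    double predict(double[] features, double threshold) {
        return probability(features) > threshold ? 1.0 : 0.0;
    }
}
```

A REST handler could keep one LinearModelScorer instance in memory and call
predict(features, 0.5) per request. Whether the result matches Spark exactly
depends on using the same feature preprocessing and threshold as the batch
job, which this sketch does not cover.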
As a motivating example, even if it's possible to return data to the client
using Spark Streaming, I think the mini-batch latency would not be acceptable
for a high-frequency stock trading system.
Kind regards
Andy
P.S. The examples I have seen so far use Spark Streaming to "preprocess"
predictions. For example, a recommender system might use what current users are
watching to calculate "trending recommendations". These are stored on disk and
served up to users when they use the "movie guide". If a recommendation were a
couple of minutes old, it would not affect the end user's experience.