Hello, I'm building a simple web service that works with Spark and lets users train random forest models (the MLlib API) and use them for prediction. Trained models are stored on the local file system (the web service and a one-worker Spark instance run on the same machine). I'm concerned about prediction performance, so I set up a small load test to measure prediction latency. That's just the initial setup; later I will move to HDFS and a bigger Spark cluster.
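For context, the training endpoint does roughly the following (a minimal sketch of the old mllib tree API; the data path, model path, and hyperparameters below are placeholders, not my real values):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext(
  new SparkConf().setAppName("rf-train").setMaster("local[*]"))

// Train a small random forest on a toy LIBSVM dataset.
val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
val model = RandomForest.trainClassifier(
  data,
  numClasses = 2,
  categoricalFeaturesInfo = Map[Int, Int](),
  numTrees = 10,
  featureSubsetStrategy = "auto",
  impurity = "gini",
  maxDepth = 4,
  maxBins = 32)

// save() should write <path>/metadata (JSON) and <path>/data (Parquet).
model.save(sc, "file:///tmp/models/rf-1")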
First I train 5 really small models (all of them finish within 30 seconds). Then my perf-testing framework waits for a minute and starts calling the prediction method. Sometimes I see that not all 5 models were saved to disk: the metadata folder is there for each of them, but some are missing the data directory that actually contains the model's Parquet files. I've looked through Spark's JIRA but haven't found anything similar. Has anyone experienced something like this? Could you recommend where to look? Might it be something with the files not being flushed to disk immediately (just a wild idea...)? Thanks in advance.

--
Be well!
Jean Morozov
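P.S. As a workaround I'm considering replacing the fixed one-minute sleep with an explicit readiness check before the latency test starts, along these lines (a sketch only; modelReady is my own hypothetical helper, and path here is a plain local filesystem path, not a file:// URI):

import java.io.File
import scala.util.Try
import org.apache.spark.SparkContext
import org.apache.spark.mllib.tree.model.RandomForestModel

// Treat a model as "ready" only once both directories exist on disk
// and the model actually loads back without errors.
def modelReady(sc: SparkContext, path: String): Boolean = {
  val metadataExists = new File(path, "metadata").exists()
  val dataExists = new File(path, "data").exists()
  metadataExists && dataExists &&
    Try(RandomForestModel.load(sc, path)).isSuccess
}

But even with that in place, I'd still like to understand why the data directory is sometimes missing in the first place.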