Hi Sebastian,

You can save models to disk and load them back. In the snippet below
(copied out of a working Databricks notebook), I train a model, save it to
disk, and then load it back from disk into model2.

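import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

// data, numClasses, categoricalFeaturesInfo, etc. are defined earlier in the notebook (not shown)
val model = RandomForest.trainClassifier(data, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed)

// Writes the model as a directory; this fails if the path already exists
model.save(sc, inputDir + "models/randomForestModel")

// Reads the same directory back into a new RandomForestModel instance
val model2 = RandomForestModel.load(sc, inputDir + "models/randomForestModel")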
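If you want to try the same round trip outside a notebook, here is a minimal self-contained sketch. It assumes a Spark 1.x/2.x build with spark-mllib on the classpath; the toy data, the hyperparameter values, the temp-directory path and the names RandomForestSaveLoad and roundTrip are all hypothetical stand-ins for the notebook variables above. It also checks that the reloaded model gives the same predictions as the original.

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel

object RandomForestSaveLoad {

  /** Trains a small forest, saves it to modelPath, loads it back, and returns
    * the number of training points where the two models disagree. */
  def roundTrip(sc: SparkContext, modelPath: String): Long = {
    // Hypothetical toy data: label is 1.0 when the first feature is large
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(0.0, Vectors.dense(0.1, 0.9)),
      LabeledPoint(1.0, Vectors.dense(0.9, 0.1)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0))
    ))

    // Illustrative hyperparameters, not tuned values
    val numClasses = 2
    val categoricalFeaturesInfo = Map[Int, Int]() // all features continuous
    val numTrees = 5
    val featureSubsetStrategy = "auto"
    val impurity = "gini"
    val maxDepth = 3
    val maxBins = 32
    val seed = 42

    val model = RandomForest.trainClassifier(data, numClasses, categoricalFeaturesInfo,
      numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed)

    // save() refuses to overwrite, so modelPath must not exist yet
    model.save(sc, modelPath)
    val model2 = RandomForestModel.load(sc, modelPath)

    // Count points where the original and reloaded models disagree
    data.filter(p => model.predict(p.features) != model2.predict(p.features)).count()
  }

  def main(args: Array[String]): Unit = {
    // Fall back to local mode when no master is supplied (e.g. outside spark-submit)
    val conf = new SparkConf()
      .setAppName("RandomForestSaveLoad")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)
    try {
      // Fresh, non-existent path inside a new temp directory
      val modelPath = Files.createTempDirectory("rf-model").resolve("randomForestModel").toString
      println(s"Mismatched predictions after reload: ${roundTrip(sc, modelPath)}")
    } finally {
      sc.stop()
    }
  }
}
```

In a prediction job you would only need the load call, pointing it at the directory written by the training job, so the model is fitted once and reused.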


I'm not sure whether there is PMML support. The model is saved as a directory
structure that looks like this:

data/
  _SUCCESS
  _common_metadata
  _metadata
  part-r-*.gz.parquet (multiple files)
metadata/
  _SUCCESS
  part-00000
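If you want to look inside that directory, both parts can be read back with ordinary Spark APIs. This is a sketch assuming that metadata/ holds a single line of JSON text (written with saveAsTextFile) and that data/ holds the tree nodes as Parquet, which is what the part-00000 and *.parquet files above suggest. It also assumes sc and sqlContext are available, as they are in a Databricks notebook. The object and method names are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object InspectSavedModel {

  /** Returns the JSON metadata line stored under <modelPath>/metadata
    * (model class name, format version, ensemble settings). */
  def readMetadata(sc: SparkContext, modelPath: String): String =
    sc.textFile(modelPath + "/metadata").first()

  /** Loads the Parquet node table under <modelPath>/data, prints its schema
    * and the first few rows, and returns the total number of rows. */
  def showNodes(sqlContext: SQLContext, modelPath: String): Long = {
    val nodes = sqlContext.read.parquet(modelPath + "/data")
    nodes.printSchema()
    nodes.show(5)
    nodes.count()
  }
}
```

For example, in the notebook: println(InspectSavedModel.readMetadata(sc, inputDir + "models/randomForestModel")). This is only useful for inspection. Loading a model for prediction should still go through RandomForestModel.load.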


HTH

-sujit




On Thu, Oct 22, 2015 at 5:33 AM, Sebastian Kuepers <sebastian.kuep...@publicispixelpark.de> wrote:

> Hey,
>
> I am trying to figure out the best practice for saving and loading models
> that have been fitted with the ML package, specifically with the
> RandomForest classifier.
>
> As far as I know, there is PMML support in the MLlib package but not in ML.
> Is that correct?
>
> How do you approach this, so that you do not have to fit your model before
> every prediction job?
>
> Thanks,
> Sebastian
>
>
> Sebastian Küpers
> Account Director
>
> Publicis Pixelpark
> Leibnizstrasse 65, 10629 Berlin
> T +49 30 5058 1838
> M +49 172 389 28 52
> sebastian.kuep...@publicispixelpark.de
> Web: publicispixelpark.de, Twitter: @pubpxp
> Facebook: publicispixelpark.de/facebook
> Publicis Pixelpark, a brand of Pixelpark AG
> Executive Board: Horst Wagner (Chairman), Dirk Kedrowitsch
> Chairman of the Supervisory Board: Pedro Simko
> Register court: Amtsgericht Charlottenburg, HRB 72163
>