Re: How To Save TF-IDF Model In PySpark

Andy Davidson Fri, 15 Jan 2016 16:34:32 -0800

Are you using 1.6.0 or an older version?

I think I remember something in 1.5.1 saying save() was not implemented in
Python.



The current docs do not say anything about save():
http://spark.apache.org/docs/latest/mllib-feature-extraction.html#tf-idf

http://spark.apache.org/docs/latest/ml-guide.html#saving-and-loading-pipelines
"Often times it is worth it to save a model or a pipeline to disk for later
use. In Spark 1.6, a model import/export functionality was added to the
Pipeline API. Most basic transformers are supported as well as some of the
more basic ML models. Please refer to the algorithm's API documentation to
see if saving and loading is supported."

andy




From:  Asim Jalis <asimja...@gmail.com>
Date:  Friday, January 15, 2016 at 4:02 PM
To:  "user @spark" <user@spark.apache.org>
Subject:  How To Save TF-IDF Model In PySpark

> Hi,
> 
> I am trying to save a TF-IDF model in PySpark. Looks like this is not
> supported. 
> 
> Using `model.save()` causes:
> 
> AttributeError: 'IDFModel' object has no attribute 'save'
> 
> Using `pickle` causes:
> 
> TypeError: can't pickle lock objects
> 
> Does anyone have suggestions?
> 
> Thanks!
> 
> Asim
> 
> Here is the full repro. Start pyspark shell and then run this code in
> it.
> 
> ```
> # Imports
> from pyspark import SparkContext
> from pyspark.mllib.feature import HashingTF
> 
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.linalg import Vectors
> from pyspark.mllib.feature import IDF
> 
> # Create some data
> n = 4
> freqs = [
>     Vectors.sparse(n, (1, 3), (1.0, 2.0)),
>     Vectors.dense([0.0, 1.0, 2.0, 3.0]),
>     Vectors.sparse(n, [1], [1.0])]
> data = sc.parallelize(freqs)
> idf = IDF()
> model = idf.fit(data)
> tfidf = model.transform(data)
> 
> # View
> for r in tfidf.collect(): print(r)
> 
> # Try to save it
> model.save("foo.model")
> 
> # Try to save it with Pickle
> import pickle
> pickle.dump(model, open("model.p", "wb"))
> pickle.dumps(model)
> ```
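
One possible workaround, sketched here as an assumption rather than anything confirmed in this thread: the pickle error most likely comes from the model wrapping a JVM object, so instead of pickling the model itself you could pickle only its IDF weights as plain Python floats and apply them yourself. In PySpark the weights would presumably come from something like `model.idf().toArray().tolist()`; that call is not exercised below. To keep the sketch runnable without Spark, the weights are recomputed in plain Python using the formula the MLlib docs give for IDF, log((m + 1) / (df + 1)). The helper names `idf_weights` and `apply_idf` and the file name `idf_weights.p` are hypothetical.

```python
import math
import os
import pickle
import tempfile


def idf_weights(docs, n):
    """Return one IDF weight per term index: log((m + 1) / (df + 1))."""
    m = len(docs)
    df = [0] * n
    for doc in docs:
        for j, value in enumerate(doc):
            if value > 0:
                df[j] += 1  # count the documents that contain term j
    return [math.log((m + 1.0) / (d + 1.0)) for d in df]


def apply_idf(weights, tf):
    """Scale a dense term-frequency vector element-wise by the IDF weights."""
    return [w * x for w, x in zip(weights, tf)]


# The same three term-frequency vectors as the repro, written out densely.
freqs = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
]

# With Spark available, this list would instead be taken from the fitted
# model (assumption: model.idf().toArray().tolist()).
weights = idf_weights(freqs, 4)

# A plain list of floats pickles fine because it holds no JVM handles or locks.
path = os.path.join(tempfile.mkdtemp(), "idf_weights.p")
with open(path, "wb") as f:
    pickle.dump(weights, f)

# Later, possibly in another process: reload the weights and apply them.
with open(path, "rb") as f:
    restored = pickle.load(f)

tfidf = [apply_idf(restored, tf) for tf in freqs]
for row in tfidf:
    print(row)
```

The trade-off is that only the weights are persisted, not an `IDFModel` object, so the transform step has to be reimplemented (here as a simple element-wise product on dense vectors).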
