Are you using 1.6.0 or an older version? I think I remember something in 1.5.1 saying save() was not implemented in Python.
The current doc does not say anything about save():

http://spark.apache.org/docs/latest/mllib-feature-extraction.html#tf-idf
http://spark.apache.org/docs/latest/ml-guide.html#saving-and-loading-pipelines

"Often times it is worth it to save a model or a pipeline to disk for later use. In Spark 1.6, a model import/export functionality was added to the Pipeline API. Most basic transformers are supported as well as some of the more basic ML models. Please refer to the algorithm's API documentation to see if saving and loading is supported."

andy

From: Asim Jalis <asimja...@gmail.com>
Date: Friday, January 15, 2016 at 4:02 PM
To: "user @spark" <user@spark.apache.org>
Subject: How To Save TF-IDF Model In PySpark

> Hi,
>
> I am trying to save a TF-IDF model in PySpark. Looks like this is not
> supported.
>
> Using `model.save()` causes:
>
> AttributeError: 'IDFModel' object has no attribute 'save'
>
> Using `pickle` causes:
>
> TypeError: can't pickle lock objects
>
> Does anyone have suggestions
>
> Thanks!
>
> Asim
>
> Here is the full repro. Start pyspark shell and then run this code in
> it.
>
> ```
> # Imports
> from pyspark import SparkContext
> from pyspark.mllib.feature import HashingTF
>
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.regression import Vectors
> from pyspark.mllib.feature import IDF
>
> # Create some data
> n = 4
> freqs = [
>     Vectors.sparse(n, (1, 3), (1.0, 2.0)),
>     Vectors.dense([0.0, 1.0, 2.0, 3.0]),
>     Vectors.sparse(n, [1], [1.0])]
> data = sc.parallelize(freqs)
> idf = IDF()
> model = idf.fit(data)
> tfidf = model.transform(data)
>
> # View
> for r in tfidf.collect(): print(r)
>
> # Try to save it
> model.save("foo.model")
>
> # Try to save it with Pickle
> import pickle
> pickle.dump(model, open("model.p", "wb"))
> pickle.dumps(model)
> ```
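
One possible workaround for the `model.save()` failure in the repro above, as a sketch only: an IDF model is essentially one weight per feature, so you can persist the weights yourself instead of the model object. The code below is plain Python with no Spark dependency. It recomputes the weights with the formula given in the MLlib feature-extraction docs, log((m + 1) / (df + 1)), writes them to JSON, reloads them, and rescales term-frequency vectors. The helper names (`compute_idf`, `save_idf`, `load_idf`, `apply_idf`) and the file name are hypothetical. In PySpark I believe the same weights are available from the fitted model via `model.idf()`, but please check that against the API docs for your version.

```python
import json
import math
import os
import tempfile


def compute_idf(docs, num_features):
    """Return IDF weights, log((m + 1) / (df + 1)), for dense TF vectors."""
    m = len(docs)
    df = [0] * num_features
    for doc in docs:
        for i, value in enumerate(doc):
            if value > 0:
                df[i] += 1  # count documents in which feature i occurs
    return [math.log((m + 1.0) / (d + 1.0)) for d in df]


def save_idf(weights, path):
    """Persist the weight vector as a JSON list of floats."""
    with open(path, "w") as f:
        json.dump(list(weights), f)


def load_idf(path):
    """Load a weight vector written by save_idf."""
    with open(path) as f:
        return json.load(f)


def apply_idf(weights, tf):
    """Scale one term-frequency vector element-wise by the IDF weights."""
    return [w * x for w, x in zip(weights, tf)]


# Same three vectors as the repro, written out densely.
docs = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
]

weights = compute_idf(docs, 4)

# Hypothetical location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "idf_weights.json")
save_idf(weights, path)

restored = load_idf(path)
tfidf = [apply_idf(restored, doc) for doc in docs]
```

If this approach fits, the Spark side would only need to turn the fitted model's IDF vector into a list of floats before calling `save_idf`, and to multiply each TF vector by the reloaded weights later. Feature 1 appears in every document, so its weight is log(4/4) = 0 and it drops out of every TF-IDF vector.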