Can you save it to Parquet with the vector in one field?
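
To make the "vector in one field" idea concrete, here is a minimal sketch in plain Python rather than Spark. It assumes the only state worth keeping from the model is the IDF weight vector, computed with the log((m + 1) / (df + 1)) formula given in the MLlib feature-extraction docs. It also swaps a JSON file in for Parquet so that it runs without a cluster. The helper names (fit_idf, save_idf, load_idf, transform) are made up for illustration and are not PySpark APIs. In PySpark I believe the weights can be read from the fitted model with model.idf(), but I have not checked that on every version.

```python
import json
import math
import os
import tempfile


def fit_idf(docs):
    """Return IDF weights log((m + 1) / (df + 1)) for dense term-frequency rows."""
    m = len(docs)
    n = len(docs[0])
    # Document frequency: number of rows in which term j appears at all.
    doc_freq = [sum(1 for d in docs if d[j] > 0) for j in range(n)]
    return [math.log((m + 1.0) / (df + 1.0)) for df in doc_freq]


def save_idf(weights, path):
    """Store the whole weight vector in a single field named 'idf'."""
    with open(path, "w") as f:
        json.dump({"idf": weights}, f)


def load_idf(path):
    """Read the weight vector back from the single 'idf' field."""
    with open(path) as f:
        return json.load(f)["idf"]


def transform(weights, doc):
    """Scale each term frequency by its IDF weight."""
    return [tf * w for tf, w in zip(doc, weights)]


# Dense equivalents of the three vectors in the repro.
docs = [
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 0.0],
]

weights = fit_idf(docs)
path = os.path.join(tempfile.mkdtemp(), "idf.json")
save_idf(weights, path)

restored = load_idf(path)
tfidf = [transform(restored, d) for d in docs]
for row in tfidf:
    print(row)
```

The point of the sketch is that the saved artifact is just one vector, so any format that can hold an array in one field would do. Re-applying it is an element-wise multiply, which sidesteps both model.save() and pickle.
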

> On 15 Jan, 2016, at 7:33 pm, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>
> Are you using 1.6.0 or an older version?
>
> I think I remember something in 1.5.1 saying save was not implemented in
> python.
>
> The current doc does not say anything about save()
> http://spark.apache.org/docs/latest/mllib-feature-extraction.html#tf-idf
>
> http://spark.apache.org/docs/latest/ml-guide.html#saving-and-loading-pipelines
> "Often times it is worth it to save a model or a pipeline to disk for later
> use. In Spark 1.6, a model import/export functionality was added to the
> Pipeline API. Most basic transformers are supported as well as some of the
> more basic ML models. Please refer to the algorithm’s API documentation to
> see if saving and loading is supported."
>
> andy
>
> From: Asim Jalis <asimja...@gmail.com>
> Date: Friday, January 15, 2016 at 4:02 PM
> To: "user @spark" <user@spark.apache.org>
> Subject: How To Save TF-IDF Model In PySpark
>
> Hi,
>
> I am trying to save a TF-IDF model in PySpark. Looks like this is not
> supported.
>
> Using `model.save()` causes:
>
> AttributeError: 'IDFModel' object has no attribute 'save'
>
> Using `pickle` causes:
>
> TypeError: can't pickle lock objects
>
> Does anyone have suggestions?
>
> Thanks!
>
> Asim
>
> Here is the full repro. Start pyspark shell and then run this code in
> it.
>
> ```
> # Imports
> from pyspark import SparkContext
> from pyspark.mllib.feature import HashingTF
>
> from pyspark.mllib.regression import LabeledPoint
> # Vectors is defined in pyspark.mllib.linalg
> from pyspark.mllib.linalg import Vectors
> from pyspark.mllib.feature import IDF
>
> # Create some data (sc is the SparkContext provided by the pyspark shell)
> n = 4
> freqs = [
>     Vectors.sparse(n, (1, 3), (1.0, 2.0)),
>     Vectors.dense([0.0, 1.0, 2.0, 3.0]),
>     Vectors.sparse(n, [1], [1.0])]
> data = sc.parallelize(freqs)
> idf = IDF()
> model = idf.fit(data)
> tfidf = model.transform(data)
>
> # View
> for r in tfidf.collect(): print(r)
>
> # Try to save it
> # (raises AttributeError: 'IDFModel' object has no attribute 'save')
> model.save("foo.model")
>
> # Try to save it with Pickle
> # (raises TypeError: can't pickle lock objects)
> import pickle
> pickle.dump(model, open("model.p", "wb"))
> pickle.dumps(model)
> ```