http://stackoverflow.com/questions/37723308/spark-ml-word2vec-serialization-issues
I recently refactored our Word2Vec code to use the DataFrame-based ml
models, but I am having problems serializing the model and loading it
locally.
I am able to successfully:
1. Fit the dataframe and create the model.
2. Retrieve synonyms.
When I try to serialize the model locally, the vectors are not serialized,
so the resulting file is far too small: approximately 2 KB for 10 GB of data.
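import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// Write the fitted model with plain Java serialization; the stream is closed automatically.
try (ObjectOutputStream so = new ObjectOutputStream(new FileOutputStream("/tmp/word2vec"))) {
    so.writeObject(word2VecModel);
    so.flush();
}
logger.info("Word2Vec model saved");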
Loading the model and then calling findSynonyms() results in the
following exception:
java.lang.NullPointerException at
org.apache.spark.ml.feature.Word2VecModel.transform(Word2Vec.scala:224)
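One possible explanation, stated here as an assumption rather than something confirmed against the Spark source, is that the model holds its word vectors in a field marked transient, which Java serialization skips. The self-contained sketch below uses a hypothetical ToyModel class (not Spark's Word2VecModel) to reproduce both symptoms: a tiny serialized payload and a null vector field after reloading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for a fitted model; this is NOT Spark's Word2VecModel.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;

        String uid;
        // Assumption for illustration: the large payload lives in a transient field,
        // so ObjectOutputStream does not write it.
        transient float[] vectors;

        ToyModel(String uid, float[] vectors) {
            this.uid = uid;
            this.vectors = vectors;
        }
    }

    // Serialize an object to an in-memory byte array with plain Java serialization.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Read a ToyModel back from the bytes produced by serialize().
    static ToyModel deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // One million floats is roughly 4 MB in memory.
        ToyModel model = new ToyModel("toy-w2v", new float[1_000_000]);
        byte[] data = serialize(model);
        ToyModel restored = deserialize(data);

        // The payload is far smaller than the vectors, because they were never written.
        System.out.println("serialized bytes: " + data.length);
        // The transient field comes back as null; dereferencing it would throw
        // a NullPointerException.
        System.out.println("vectors after reload: " + restored.vectors);
    }
}
```

If the same thing is happening with the real model, any method that touches the vectors after deserialization would fail with a NullPointerException, which would be consistent with the trace above.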
Is there a way to save the model locally?