The C implementation of Word2Vec updates the model using multiple threads
without locking. This approach is hard to implement in a distributed way. In
the MLlib implementation, each worker holds the entire model in memory
and outputs the part of the model that gets updated. The driver still needs
to collect and aggre
I was wondering if there was any chance of getting a more distributed word2vec
implementation. I seem to be running out of memory from big local data
structures such as
val syn1Global = new Array[Float](vocabSize * vectorSize)
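To put a rough number on it, here is a minimal sketch of the arithmetic for one array of that shape. The object name, the helper names, and the vocabulary and vector sizes are hypothetical, picked only for illustration; it counts the Float payload only and ignores JVM object overhead.

```scala
// Back-of-the-envelope estimate for one dense Float array of
// vocabSize * vectorSize elements, as in the allocation above.
// All names and sizes here are illustrative assumptions.
object Word2VecMemoryEstimate {
  // A JVM Float occupies 4 bytes.
  val BytesPerFloat = 4L

  /** Payload bytes for one vocabSize x vectorSize Float array. */
  def arrayBytes(vocabSize: Int, vectorSize: Int): Long =
    vocabSize.toLong * vectorSize.toLong * BytesPerFloat

  /** True if the element count still fits in an Int, which is an upper
    * bound on what a single JVM array can index. */
  def withinIntIndexRange(vocabSize: Int, vectorSize: Int): Boolean =
    vocabSize.toLong * vectorSize.toLong <= Int.MaxValue.toLong
}
```

With an assumed vocabulary of 1,000,000 words and 100-dimensional vectors, `arrayBytes(1000000, 100)` comes to 400,000,000 bytes, so about 400 MB for each array of this shape held on the driver and on every worker. With an assumed 30,000,000 words at the same vector size, `withinIntIndexRange` returns false, meaning the product `vocabSize * vectorSize` overflows an Int before the allocation is even attempted.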
Is there any chance of getting a version where these are all pu