, May 19, 2015 1:25 PM
To: Shilad Sen
Cc: user
Subject: Re: Word2Vec with billion-word corpora
With a vocabulary size of 4M and a vector size of 400, you need 400 * 4M = 1.6B
floats to store the model. At 4 bytes per float, that is about 6.4GB. We store
the model on the driver node in the current implementation, so I don't think it
would work. You might try increasing the minCount to decrease the vocabulary
size, and reducing the vector size to shrink the model further.
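
If it helps, here is a rough sketch of what I mean, assuming the RDD-based
MLlib API and a Spark version where Word2Vec exposes setMinCount; the paths,
app name, and parameter values below are just placeholders, not
recommendations:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.Word2Vec
import org.apache.spark.rdd.RDD

object Word2VecSizingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Word2VecSizingSketch")
    val sc = new SparkContext(conf)

    // Rough driver-side estimate: vocabSize * vectorSize * 4 bytes per float.
    // 4M words * 400 dims * 4 bytes ~= 6.4GB, which has to fit on the driver.
    val estimatedBytes = 4000000L * 400 * 4L

    // Hypothetical corpus location; each line is split into a sentence of words.
    val corpus: RDD[Seq[String]] = sc
      .textFile("hdfs:///path/to/corpus")
      .map(_.split(" ").toSeq)

    val word2vec = new Word2Vec()
      .setVectorSize(200)   // smaller vectors => proportionally smaller model
      .setMinCount(25)      // drop rare words => smaller vocabulary
      .setNumPartitions(16)

    val model = word2vec.fit(corpus)
    println(s"vocabulary size after minCount filtering: ${model.getVectors.size}")
    println(s"naive size estimate for the original setup: $estimatedBytes bytes")
  }
}

Since the model size scales linearly with both the vocabulary size and the
vector size, cutting either one roughly proportionally cuts the memory needed
on the driver.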