Word2Vec distributed?

Carsten Schnober Wed, 08 Jul 2015 00:45:45 -0700

Hi,
I've been experimenting with the Spark Word2Vec implementation in the
MLlib package.
It seems to me that only the preparatory steps, i.e. stages 0-2 that
prepare the data, are actually performed in a distributed way. In
stage 3 (mapPartitionsWithIndex at Word2Vec.scala:312), only one node
seems to be working, using a single CPU.
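
For context, here is a minimal sketch of the kind of experiment described
above. It is not the exact job: the toy in-memory corpus, the app name, the
local[4] master and all parameter values are assumptions chosen for
illustration. One setting that seems worth checking is setNumPartitions,
which is documented as defaulting to 1. If the training data ends up in a
single partition, a mapPartitionsWithIndex stage would have only one task,
which would match what I see in stage 3.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.Word2Vec

object Word2VecPartitions {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup; on a real cluster the master is set elsewhere.
    val conf = new SparkConf()
      .setAppName("word2vec-partitions")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Toy corpus (assumption): a few sentences repeated so that every
    // partition receives some data after repartitioning.
    val base = Seq(
      "spark runs word2vec on a cluster",
      "word2vec learns vectors for words",
      "a cluster has many nodes and many cpus"
    )
    val sentences = sc.parallelize(Seq.fill(200)(base).flatten)
      .map(_.split(" ").toSeq)

    val model = new Word2Vec()
      .setVectorSize(10)    // small vectors, for illustration only
      .setMinCount(1)       // keep every word of the toy corpus
      .setNumPartitions(4)  // documented default is 1
      .fit(sentences)

    // Number of words for which a vector was learned.
    println(model.getVectors.size)

    sc.stop()
  }
}
```

The API documentation recommends a small number of partitions for accuracy,
so raising the value is presumably a trade-off rather than a free speed-up.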


I suppose this is related to the discussion in [1], which essentially
states that the original algorithm allows for multi-threading, but not
for distributed computation, because of its frequent internal
communication.

As far as I understand, this issue has not been fully resolved in
Spark, or has it? I would just like to know whether I am interpreting
the current situation correctly.

Thanks!
Carsten

[1] https://issues.apache.org/jira/browse/SPARK-2510

-- 
Carsten Schnober
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP) Lab
FB 20 / Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
[email protected]
www.ukp.tu-darmstadt.de

Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
(AIPHES): www.aiphes.tu-darmstadt.de
PhD program: Knowledge Discovery in Scientific Literature (KDSL)
www.kdsl.tu-darmstadt.de
