[ https://issues.apache.org/jira/browse/FLINK-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952951#comment-15952951 ]

Lev Konstantinovskiy commented on FLINK-2094:
---------------------------------------------

Apologies, that feature in Gensim has been misnamed. It is not really about
online training but about vocabulary expansion. It also has not been evaluated
thoroughly yet: there has been no research on how good the vectors for new
words seen 10 times are compared with words seen 1000 times in the initial
training. Even without vocabulary expansion, word2vec depends on the order in
which it sees documents, because the learning rate is scaled down over the
course of training. So having it learn "truly online", without knowing the size
of the dataset, would be interesting new territory.
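To illustrate the order dependence, here is a minimal sketch of a linear learning-rate decay of the kind used by the original word2vec C tool and, in similar form, by Gensim. The function name `linear_alpha` and the defaults (0.025 start, 0.0001 floor) are illustrative assumptions, not a quote of either codebase. Note that the schedule needs `total_words` up front, which is exactly what a truly online learner would not know.

```python
def linear_alpha(words_seen, total_words, start_alpha=0.025, min_alpha=0.0001):
    """Hypothetical helper: learning rate after `words_seen` of `total_words`.

    Decays linearly from start_alpha toward min_alpha as training progresses.
    Requires the total corpus size in advance.
    """
    progress = min(words_seen / total_words, 1.0)  # fraction of training done
    return max(min_alpha, start_alpha - (start_alpha - min_alpha) * progress)


# The same document gets a very different step size depending on where it
# appears in a (hypothetical) 1,000,000-word training stream.
total = 1_000_000
alpha_if_first = linear_alpha(0, total)        # full starting rate, 0.025
alpha_if_last = linear_alpha(999_000, total)   # close to the min_alpha floor
print(alpha_if_first, alpha_if_last)
```

Because updates from early documents are made with a step size roughly 200 times larger than updates from late ones here, shuffling the input changes the resulting vectors.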

> Implement Word2Vec
> ------------------
>
>                 Key: FLINK-2094
>                 URL: https://issues.apache.org/jira/browse/FLINK-2094
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Nikolaas Steenbergen
>            Assignee: Nikolaas Steenbergen
>            Priority: Minor
>              Labels: ML
>
> implement Word2Vec
> http://arxiv.org/pdf/1402.3722v1.pdf



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
