[ https://issues.apache.org/jira/browse/FLINK-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952951#comment-15952951 ]
Lev Konstantinovskiy commented on FLINK-2094:
---------------------------------------------

Apologies, that feature in Gensim was not named correctly. It is not really about online training, but about vocabulary expansion. It also has not been evaluated thoroughly yet. There has been no research on how good the vectors are for new words seen 10 times compared to words seen 1000 times in the initial training.

Even without vocabulary expansion, word2vec depends on the order in which it sees documents, because of the learning rate scaling. So having it learn "truly online", without knowing the size of the dataset, would be interesting new territory.

> Implement Word2Vec
> ------------------
>
>                 Key: FLINK-2094
>                 URL: https://issues.apache.org/jira/browse/FLINK-2094
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Nikolaas Steenbergen
>            Assignee: Nikolaas Steenbergen
>            Priority: Minor
>              Labels: ML
>
> implement Word2Vec
> http://arxiv.org/pdf/1402.3722v1.pdf

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
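
To illustrate the learning rate point made in the comment: common word2vec implementations decay the learning rate linearly with the fraction of the corpus already processed. The sketch below is a simplified illustration of that idea, not code from Gensim or Flink. The function name, parameter names, and default values are assumptions chosen for the example.

```python
def linear_decay_alpha(words_seen, total_words, start_alpha=0.025, min_alpha=0.0001):
    """Learning rate after `words_seen` of `total_words` training words.

    Hypothetical helper: decays linearly from start_alpha down to min_alpha.
    Note that it needs total_words, i.e. the corpus size, up front.
    """
    progress = words_seen / total_words
    alpha = start_alpha - (start_alpha - min_alpha) * progress
    # Never drop below the floor, even if words_seen overshoots total_words.
    return max(min_alpha, alpha)


total_words = 1_000_000

# The same document is weighted very differently depending on
# whether it is seen at the start or near the end of training.
alpha_early = linear_decay_alpha(0, total_words)
alpha_late = linear_decay_alpha(990_000, total_words)
print(alpha_early, alpha_late)
```

Because the schedule depends on `total_words`, a document processed early moves the vectors far more than the same document processed late, and a stream of unknown length would have no obvious value to plug in.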