Sorry, guys, I need to finish this letter first. The full version will come shortly.

Mon, Feb 6, 2017 at 12:49, Katherin Eri <katherinm...@gmail.com>:

> Hello, guys.
> Theodore, last week I started reviewing the PR
> https://github.com/apache/flink/pull/2735, related to *word2Vec for Flink*.
>
> During this review I asked myself: why do we need to implement such a
> popular algorithm as *word2vec* one more time, when a Java implementation
> is already available in the deeplearning4j.org
> <https://deeplearning4j.org/word2vec> library (DL4J, Apache 2 licence)?
> This library promotes itself actively, there is hype around it in the ML
> sphere, and it has been integrated with Apache Spark to provide scalable
> deep learning calculations.
> That's why I thought: could we also integrate this library with Flink?
> 1) Personally, I think that supporting the deployment of deep learning
> algorithms/models in Flink is a promising and attractive feature, because:
> a) during the last two years deep learning has proved its efficiency, and
> these algorithms are used in many applications. For example, *Spotify* uses
> DL-based algorithms for music content extraction in their music
> recommendations: "Recommending music on Spotify with deep learning",
> August 5, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>.
> Doing this in a natively scalable way is very attractive.
>
> I have investigated the implementation of the DL4J integration with Apache
> Spark, and have several points:
>
> 1) It seems that building our own implementation of word2vec is not such a
> bad solution, because the integration of DL4J with Spark is too strongly
> coupled to the Spark API, and it would take time on the DL4J side to adapt
> this integration to Flink. Also, I had expected that we would be able to
> just call some API, but that is not the case.
> 2)
>
> https://deeplearning4j.org/use_cases
> https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/
>
> Thu, Jan 19, 2017 at 13:29, Till Rohrmann <trohrm...@apache.org>:
>
> Hi Katherin,
>
> welcome to the Flink community. Always great to see new people joining the
> community :-)
>
> Cheers,
> Till
>
> On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko <katherinm...@gmail.com>
> wrote:
>
> > ok, I've got it.
> > I will take a look at https://github.com/apache/flink/pull/2735.
> >
> > Tue, Jan 17, 2017 at 14:36, Theodore Vasiloudis <
> > theodoros.vasilou...@gmail.com>:
> >
> > > Hello Katherin,
> > >
> > > Welcome to the Flink community!
> > >
> > > The ML component definitely needs a lot of work, you are correct. We are
> > > facing similar problems to CEP, which we'll hopefully resolve with the
> > > restructuring Stephan has mentioned in that thread.
> > >
> > > If you'd like to help out with PRs, we have many open; one I started
> > > reviewing but got side-tracked from is the Word2Vec one [1].
> > >
> > > Best,
> > > Theodore
> > >
> > > [1] https://github.com/apache/flink/pull/2735
> > >
> > > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske <fhue...@gmail.com>
> > > wrote:
> > >
> > > > Hi Katherin,
> > > >
> > > > welcome to the Flink community!
> > > > Help with reviewing PRs is always very welcome and a great way to
> > > > contribute.
> > > >
> > > > Best, Fabian
> > > >
> > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko <katherinm...@gmail.com>:
> > > >
> > > > > Thank you, Timo.
> > > > > I have started the analysis of the topic.
> > > > > And if necessary, I will try to review other pull requests :)
> > > > >
> > > > > Tue, Jan 17, 2017 at 13:09, Timo Walther <twal...@apache.org>:
> > > > >
> > > > > > Hi Katherin,
> > > > > >
> > > > > > great to hear that you would like to contribute! Welcome!
> > > > > >
> > > > > > I gave you contributor permissions. You can now assign issues to
> > > > > > yourself. I assigned FLINK-1750 to you.
> > > > > > Right now there are many open ML pull requests; you are very
> > > > > > welcome to review the code of others, too.
> > > > > >
> > > > > > Timo
> > > > > >
> > > > > > On 17/01/17 at 10:39, Katherin Sotenko wrote:
> > > > > > > Hello, All!
> > > > > > >
> > > > > > > I'm Kate Eri. I'm a Java developer with 6 years of enterprise
> > > > > > > experience, and I also have some experience with Scala (half a
> > > > > > > year).
> > > > > > >
> > > > > > > For the last 2 years I have participated in several Big Data
> > > > > > > projects related to Machine Learning (time series analysis,
> > > > > > > recommender systems, social networking) and ETL. I have
> > > > > > > experience with Hadoop, Apache Spark and Hive.
> > > > > > >
> > > > > > > I'm fond of the ML topic, and I see that the Flink project
> > > > > > > requires some work in this area. That's why I would like to join
> > > > > > > Flink, and I ask you to assign the ticket
> > > > > > > https://issues.apache.org/jira/browse/FLINK-1750
> > > > > > > to me.