Thank you, Trevor! You have shared very valuable points; I will consider them.
So I think I should finally create a ticket in Flink's JIRA, at least for Flink's GPU support, and move the related discussion there. I will contact Suneel regarding DL4J, thanks!

On Fri, Feb 10, 2017 at 17:44, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Also RE: DL4J integration.

Suneel had done some work on this a while back and ran into issues. You might want to chat with him about the pitfalls and 'gotchas' there.

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:

Sorry for chiming in late.

GPUs on Flink: Till raised a good point, you need to be able to fall back to non-GPU resources if they aren't available.

Fun fact: this has already been developed for Flink vis-a-vis the Apache Mahout project.

In short, Mahout exposes a number of tensor functions (vector %*% matrix, matrix %*% matrix, etc.). If compiled for GPU support, those operations are completed via GPU; if no GPUs are in fact available, Mahout math falls back to CPUs (and finally back to the JVM).

How this should work is: Flink takes care of shipping data around the cluster, and when data arrives at the local node it is dumped out to the GPU for calculation, loaded back up, and shipped back around the cluster. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs, but because for calculating any sort of complex algorithm we expect to be able to cache intermediate results).

+1 to FLINK-1730

Everything in Mahout is modular: distributed engine (Flink/Spark/write-your-own), native solvers (OpenMP / ViennaCL / CUDA / write-your-own), algorithms, etc.
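For orientation, the fallback cascade Trevor describes (GPU, then native CPU solver, then plain JVM) can be sketched in a few lines. This is a hedged illustration only: the probe methods and all names below are hypothetical stand-ins, not Mahout's actual API (real Mahout wires this up through its ViennaCL/OpenMP modules).

```java
// Hedged sketch of Mahout-style solver selection: prefer GPU, then a
// native CPU solver, then plain JVM math. The probe methods and names
// are hypothetical; real Mahout probes its ViennaCL/OpenMP bindings.
class SolverCascade {

    interface MatMul {
        double[][] apply(double[][] a, double[][] b);
    }

    // Plain-JVM matrix multiply: the fallback that always works.
    static final MatMul JVM_MAT_MUL = (a, b) -> {
        int n = a.length, m = b[0].length, inner = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < inner; k++)
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    };

    // Stand-in probes; a real implementation would try to load the
    // native bindings here and catch link errors.
    static boolean gpuAvailable() { return false; }
    static boolean nativeCpuAvailable() { return false; }

    static MatMul pickSolver() {
        if (gpuAvailable()) return JVM_MAT_MUL;       // would be a GPU kernel
        if (nativeCpuAvailable()) return JVM_MAT_MUL; // would be OpenMP/ViennaCL
        return JVM_MAT_MUL;                           // JVM fallback
    }
}
```

With both probes returning false, the cascade bottoms out in the JVM path, which is exactly the behavior described above when no GPUs are present.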
So to sum up: you're noting the redundancy between ML packages in terms of algorithms. I would recommend checking out Mahout before rolling your own GPU integration (else you risk redundantly integrating GPUs). If nothing else, it should give you some valuable insight regarding design considerations. Also, FYI, the goal of the Apache Mahout project is to address that problem precisely: implement an algorithm once in a mathematically expressive DSL, which is abstracted above the engine so the same code easily ports between engines / native solvers (i.e. CPU/GPU).

https://github.com/apache/mahout/tree/master/viennacl-omp
https://github.com/apache/mahout/tree/master/viennacl

Best,
tg

Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things." -Virgil*

On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri <katherinm...@gmail.com> wrote:

Thank you, Felix, for the provided information.

I am currently analyzing the provided integration of Flink with SystemML, and also gathering information for the ticket FLINK-1730 <https://issues.apache.org/jira/browse/FLINK-1730>; maybe we will take it on, to unblock the SystemML/Flink integration.

On Thu, Feb 9, 2017 at 0:17, Felix Neutatz <neut...@googlemail.com.invalid> wrote:

Hi Kate,

1) - Broadcast: https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
- Caching: https://issues.apache.org/jira/browse/FLINK-1730

2) I have no idea about the GPU implementation. The SystemML mailing list will probably help you out there.
Best regards,
Felix

2017-02-08 14:33 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:

Thank you, Felix, for your point; it is quite interesting.

I will take a look at the code of the provided Flink integration.

1) You have these problems with Flink: "we realized that the lack of a caching operator and a broadcast issue highly affect the performance". Have you already asked the community about this? If yes, please provide a reference to the ticket or the subject of the letter.

2) You have said that SystemML provides GPU support. I have seen SystemML's source code and would like to ask: why did you decide to implement your own CUDA integration? Did you consider ND4J, or do you maintain your own implementation because ND4J is younger?

On Tue, Feb 7, 2017 at 18:35, Felix Neutatz <neut...@googlemail.com> wrote:

Hi Katherin,

we are also working in a similar direction. We implemented a prototype to integrate with SystemML: https://github.com/apache/incubator-systemml/pull/119
SystemML provides many different matrix formats, operations, GPU support and a couple of DL algorithms. Unfortunately, we realized that the lack of a caching operator and a broadcast issue highly affect the performance (e.g. compared to Spark). At the moment I am trying to tackle the broadcast issue. But caching is still a problem for us.

Best regards,
Felix

2017-02-07 16:22 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:

Thank you, Till.
1) Regarding ND4J: I didn't know about such an unfortunate and critical restriction of it (the lack of sparsity optimizations), and you are right: this issue is still open for them. I saw that Flink uses Breeze, but I thought its usage was due to historical reasons.

2) Regarding integration with DL4J: I have read the source code of the DL4J/Spark integration; that's why I have set aside my idea of reusing their word2vec implementation for now, for example. I can perform a deeper investigation of this topic if it is required.

So I feel that we have the following picture:

1) DL integration investigation could be part of Apache Bahir. I can perform further investigation of this topic, but I think we need a separate ticket to track this activity.

2) GPU support, required for DL, is interesting, but requires ND4J, for example.

3) ND4J couldn't be incorporated because it doesn't support sparsity <https://deeplearning4j.org/roadmap.html> [1].

Regarding ND4J: is this the single blocker for its incorporation, or are others known?

[1] https://deeplearning4j.org/roadmap.html

On Tue, Feb 7, 2017 at 16:26, Till Rohrmann <trohrm...@apache.org> wrote:

Thanks for initiating this discussion, Katherin. I think you're right that in general it does not make sense to reinvent the wheel over and over again.
Especially if you only have limited resources at hand. So if we could integrate Flink with some existing library, that would be great.

In the past, however, we couldn't find a good library which provided enough freedom to integrate it with Flink. Especially if you want to have distributed and somewhat high-performance implementations of ML algorithms, you would have to take Flink's execution model (capabilities as well as limitations) into account. That is mainly the reason why we started implementing some of the algorithms "natively" on Flink.

If I remember correctly, the problem with ND4J was and still is that it does not support sparse matrices, which was a requirement from our side. As far as I know, it is quite common to have sparse data structures when dealing with large-scale problems. That's why we built our own abstraction which can have different implementations. Currently, the default implementation uses Breeze.

I think the support for GPU-based operations and the actual resource management are two orthogonal things. The implementation would have to work with no GPUs available anyway. If the system detects that GPUs are available, then ideally it would exploit them. Thus, we could add this feature later and maybe integrate it with FLINK-5131 [1].

Concerning the integration with DL4J, I think that Theo's proposal to do it in a separate repository (maybe as part of Apache Bahir) is a good idea.
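Till's sparsity requirement is easy to make concrete. Below is a toy sketch (not Breeze's or FlinkML's actual types, just an illustration of the layout ND4J lacked): store only the non-zero entries, so storage and dot products cost O(non-zeros) instead of O(dimension).

```java
import java.util.HashMap;
import java.util.Map;

// Toy sparse vector: store only the non-zero entries. Not Breeze's or
// FlinkML's API; purely an illustration of why sparsity matters.
class SparseVec {
    final int size;                                   // logical dimension
    final Map<Integer, Double> entries = new HashMap<>(); // non-zeros only

    SparseVec(int size) { this.size = size; }

    SparseVec set(int i, double v) { entries.put(i, v); return this; }

    int nnz() { return entries.size(); }  // stored entries vs. logical size

    double dot(SparseVec other) {
        if (size != other.size) throw new IllegalArgumentException("dimension mismatch");
        // Iterate only over the smaller non-zero set.
        SparseVec small = nnz() <= other.nnz() ? this : other;
        SparseVec big = (small == this) ? other : this;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entries.entrySet())
            sum += e.getValue() * big.entries.getOrDefault(e.getKey(), 0.0);
        return sum;
    }
}
```

A bag-of-words document over a million-term vocabulary stores only its few hundred observed terms this way; a dense layout would store a million doubles (about 8 MB) per document, which is the cost being objected to at scale.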
We're currently thinking about outsourcing some of Flink's libraries into sub-projects. This could also be an option for the DL4J integration then. In general I think it should be feasible to run DL4J on Flink, given that it also runs on Spark. Have you already looked at it more closely?

[1] https://issues.apache.org/jira/browse/FLINK-5131

Cheers,
Till

On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri <katherinm...@gmail.com> wrote:

Thank you, Theodore, for your reply.

1) Regarding GPU: your point is clear and I agree with it; ND4J looks appropriate. But my current understanding is that we also need to cover some resource management questions: when we provide GPU support, we also need to manage the GPU as a resource. For example, Mesos already supports GPUs as a resource item: Initial support for GPU resources <https://issues.apache.org/jira/browse/MESOS-4424?jql=text%20~%20GPU>. Flink uses Mesos as a cluster manager, and this means that this feature of Mesos could be reused. Memory management questions in Flink regarding GPUs should also be clarified.

2) Regarding integration with DL4J: what stops us from creating a ticket and starting the discussion around this topic? Do we need some user story, or is the community not sure that DL is really helpful?
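Treating the GPU "like a resource", as suggested above, amounts to the scalar-resource model Mesos uses for MESOS-4424: an offer advertises GPU capacity next to CPU capacity, and a task is placed only where its demand fits. A hypothetical sketch (these classes are illustrative, not Mesos or Flink API):

```java
// GPUs as a schedulable scalar resource alongside CPUs, in the spirit
// of MESOS-4424. Illustrative types only, not Mesos or Flink API.
class GpuScheduling {

    static final class Offer {
        final double cpus, gpus;
        Offer(double cpus, double gpus) { this.cpus = cpus; this.gpus = gpus; }
    }

    static final class Task {
        final String name;
        final double cpus, gpus;
        Task(String name, double cpus, double gpus) {
            this.name = name; this.cpus = cpus; this.gpus = gpus;
        }
    }

    // A task fits an offer only if both its CPU and GPU demands are
    // covered; a GPU-optional task asks for gpus = 0 and runs anywhere.
    static boolean fits(Task task, Offer offer) {
        return task.cpus <= offer.cpus && task.gpus <= offer.gpus;
    }
}
```

The fall-back requirement discussed earlier in the thread translates naturally here: a job that can run without GPUs submits its gpus = 0 variant when no GPU-bearing offer arrives.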
Why did the discussion with Adam Gibson end with no implementation of any idea? What concerns do we have?

On Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:

Hello all,

This is a point that has come up in the past: given the multitude of ML libraries out there, should we have native implementations in FlinkML or try to integrate other libraries instead?

We haven't managed to reach a consensus on this before. My opinion is that there is definitely value in having ML algorithms written natively in Flink, both for performance optimization, but more importantly for engineering simplicity: we don't want to force users to use yet another piece of software to run their ML algos (at least for a basic set of algorithms).

We have in the past discussed integrations with DL4J (particularly ND4J) with Adam Gibson, the core developer of the library, but we never got around to implementing anything.

Whether it makes sense to have an integration with DL4J as part of the Flink distribution would be up for discussion.
I would suggest making it an independent repo to allow for faster dev/release cycles, and because it wouldn't be directly related to the core of Flink, it would only add extra reviewing burden to an already overloaded group of committers.

Natively supporting GPU calculations in Flink would be much better achieved through a library like ND4J; the engineering burden would be too much otherwise.

Regards,
Theodore

On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri <katherinm...@gmail.com> wrote:

Hello, guys.

Theodore, last week I started the review of the PR https://github.com/apache/flink/pull/2735, related to *word2Vec for Flink*.

During this review I have asked myself: why do we need to implement such a very popular algorithm like *word2vec one more time*, when there is already an available Java implementation provided by the deeplearning4j.org <https://deeplearning4j.org/word2vec> library (DL4J, Apache 2 licence)? This library tries to promote itself, there is hype around it in the ML sphere, and it was integrated with Apache Spark to provide scalable deep learning calculations.
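For context, the data-preparation step that any word2vec implementation shares, whether DL4J's or a native Flink one, is skip-gram pair generation: slide a window over the token stream and emit (center, context) training pairs. A self-contained sketch of that step follows; a Flink version would do this inside a flatMap, and this is not the actual code of the PR or of DL4J.

```java
import java.util.ArrayList;
import java.util.List;

// Skip-gram pair generation: for each token, emit a (center, context)
// pair for every neighbor within the window. This is the common
// word2vec preprocessing step, sketched on a plain List; it is NOT
// the Flink PR's or DL4J's actual code.
class SkipGram {

    static List<String[]> pairs(List<String> tokens, int window) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(tokens.size() - 1, i + window);
            for (int j = lo; j <= hi; j++)
                if (j != i) out.add(new String[]{tokens.get(i), tokens.get(j)});
        }
        return out;
    }
}
```

The pairs then feed the embedding training itself (negative sampling or hierarchical softmax), which is where the GPU question from earlier in the thread becomes relevant.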
*That's why I thought: could we also integrate this library with Flink?*

1) Personally, I think providing support and deployment of *deep learning (DL) algorithms/models in Flink* is a promising and attractive feature, because:

a) During the last two years DL has proved its efficiency, and these algorithms are used in many applications. For example, *Spotify* uses DL-based algorithms for music content extraction: Recommending music on Spotify with deep learning, AUGUST 05, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>, for their music recommendations. Developers need to scale up DL manually, which causes a lot of work; that's why platforms like Flink should support the deployment of these models.

b) Here is the scope of deep learning use cases <https://deeplearning4j.org/use_cases>; many of these scenarios could be supported on Flink.

2) But DL raises questions such as:

a) scaling up calculations over machines;

b) performing these calculations both over CPU and GPU. GPU is required to train big DL models; otherwise the learning process could have very slow convergence.
3) I have checked this DL4J library, which already has rich support for many attractive DL models, like recurrent networks and LSTMs, convolutional networks (CNN), restricted Boltzmann machines (RBM) and others. So we won't need to implement them independently, but only provide the ability to execute these models over a Flink cluster, in quite a similar way to how it was integrated with Apache Spark.

Because of all of this I propose:

1) To create a new ticket in Flink's JIRA for the integration of Flink with DL4J and decide on which side this integration should be implemented.
2) To natively support GPU resources in Flink and allow calculations over them, as described in this publication: https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus

*Regarding the original issue Implement Word2Vec <https://issues.apache.org/jira/browse/FLINK-2094> in Flink:* I have investigated its implementation in DL4J and the implementation of the DL4J integration with Apache Spark, and got several points:

It seems that the idea of building our own implementation of word2vec in Flink is not such a bad solution, because DL4J was forced to reimplement its original word2vec over Spark. I have checked the integration of DL4J with Spark and found that it is too strongly coupled with the Spark API, so that it is impossible to just take some DL4J API and reuse it; instead we would need to implement an independent integration for Flink.

*That's why we should simply finish the implementation of the current PR **independently** from the DL4J integration.*

Could you please share your opinion regarding my questions and points? What do you think about them?

On Mon, Feb 6, 2017
at 12:51, Katherin Eri <katherinm...@gmail.com> wrote:

Sorry, guys, I need to finish this letter first.
The full version of it will come shortly.

On Mon, Feb 6, 2017 at 12:49, Katherin Eri <katherinm...@gmail.com> wrote:

Hello, guys.
Theodore, last week I started the review of the PR https://github.com/apache/flink/pull/2735, related to *word2Vec for Flink*.

During this review I have asked myself: why do we need to implement such a very popular algorithm like *word2vec one more time*, when there is already an available Java implementation provided by the deeplearning4j.org <https://deeplearning4j.org/word2vec> library (DL4J, Apache 2 licence)? This library tries to promote itself, there is hype around it in the ML sphere, and it was integrated with Apache Spark to provide scalable deep learning calculations.
That's why I thought: could we also integrate this library with Flink?
1) Personally, I think providing support and deployment of deep learning algorithms/models in Flink is a promising and attractive feature, because:
a) During the last two years deep learning has proved its efficiency, and these algorithms are used in many applications.
For example, *Spotify* uses DL-based algorithms for music content extraction: Recommending music on Spotify with deep learning, AUGUST 05, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>, for their music recommendations. Doing this natively scalable is very attractive.

I have investigated that implementation of the DL4J integration with Apache Spark, and got several points:

1) It seems that the idea of building our own implementation of word2vec is not such a bad solution, because the integration of DL4J with Spark is too strongly coupled with the Spark API, and it would take time on the DL4J side to adapt this integration to Flink. Also, I had expected that we would be able to just call some API; it is not such a thing.
2)

https://deeplearning4j.org/use_cases
https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/

On Thu, Jan 19, 2017 at 13:29, Till Rohrmann <trohrm...@apache.org> wrote:

Hi Katherin,

welcome to the Flink community.
Always great to see new people joining the community :-)

Cheers,
Till

On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko <katherinm...@gmail.com> wrote:

OK, I've got it.
I will take a look at https://github.com/apache/flink/pull/2735.

On Tue, Jan 17, 2017 at 14:36, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:

Hello Katherin,

Welcome to the Flink community!

The ML component definitely needs a lot of work, you are correct; we are facing similar problems to CEP, which we'll hopefully resolve with the restructuring Stephan has mentioned in that thread.

If you'd like to help out, we have many open PRs; one I have started reviewing but got side-tracked on is the Word2Vec one [1].

Best,
Theodore

[1] https://github.com/apache/flink/pull/2735

On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske <fhue...@gmail.com> wrote:

Hi Katherin,

welcome to the Flink community!
Help with reviewing PRs is always very welcome and a great way to contribute.

Best, Fabian

2017-01-17 11:17 GMT+01:00 Katherin Sotenko <katherinm...@gmail.com>:

Thank you, Timo.
I have started the analysis of the topic.
And if necessary, I will try to review other pull requests as well.

On Tue, Jan 17, 2017 at 13:09, Timo Walther <twal...@apache.org> wrote:

Hi Katherin,

great to hear that you would like to contribute! Welcome!

I gave you contributor permissions. You can now assign issues to yourself. I assigned FLINK-1750 to you.
Right now there are many open ML pull requests; you are very welcome to review the code of others, too.

Timo

Am 17/01/17 um 10:39 schrieb Katherin Sotenko:
Hello, All!
I'm Kate Eri. I'm a Java developer with 6 years of enterprise experience, and I also have some expertise with Scala (half a year).

During the last 2 years I have participated in several BigData projects that were related to Machine Learning (time series analysis, recommender systems, social networking) and ETL. I have experience with Hadoop, Apache Spark and Hive.

I'm fond of the ML topic, and I see that the Flink project requires some work in this area; that's why I would like to join Flink and ask you to grant the assignment of the ticket https://issues.apache.org/jira/browse/FLINK-1750 to me.