Hi Kate, that's great news. This would help to boost ML on Flink a lot :)
Best regards,
Felix

2017-02-13 14:09 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
> Hello guys,
>
> It seems that issue FLINK-1730 <https://issues.apache.org/jira/browse/FLINK-1730> significantly impacts the integration of Flink with SystemML.
>
> They have checked several integrations, and Flink's integration is the slowest <https://github.com/apache/incubator-systemml/pull/119#issuecomment-222059794>:
>
> - MR: LinregDS: 147s (2 jobs); LinregCG w/ 6 iterations: 361s (8 jobs) w/ mmchain; 628s (14 jobs) w/o mmchain
> - Spark: LinregDS: 71s (3 jobs); LinregCG w/ 6 iterations: 41s (8 jobs) w/ mmchain; 48s (14 jobs) w/o mmchain
> - Flink: LinregDS: 212s (3 jobs); LinregCG w/ 6 iterations: 1,047s (14 jobs) w/o mmchain
>
> As Felix already said, this is caused by two issues:
>
> 1) FLINK-1730 <https://issues.apache.org/jira/browse/FLINK-1730>
>
> 2) FLINK-4175 <https://issues.apache.org/jira/browse/FLINK-4175>
>
> Since FLINK-1730 is not assigned to anyone, we would like to take on this ticket (my colleagues could try to implement it).
>
> I would like to continue the discussion related to FLINK-1730 in the appropriate ticket.
>
> Fri, Feb 10, 2017 at 19:57, Katherin Eri <katherinm...@gmail.com>:
> > I have created a ticket to discuss GPU-related questions further:
> > https://issues.apache.org/jira/browse/FLINK-5782
> >
> > Fri, Feb 10, 2017 at 18:16, Katherin Eri <katherinm...@gmail.com>:
> > Thank you, Trevor!
> >
> > You have shared very valuable points; I will consider them.
> >
> > So I think I should finally create a ticket in Flink's JIRA, at least for Flink's GPU support, and move the related discussion there?
> >
> > I will contact Suneel regarding DL4J, thanks!
> >
> > Fri, Feb 10, 2017 at 17:44, Trevor Grant <trevor.d.gr...@gmail.com>:
> > Also RE: DL4J integration.
> >
> > Suneel had done some work on this a while back and ran into issues.
> > You might want to chat with him about the pitfalls and 'gotchyas' there.
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> >
> > On Fri, Feb 10, 2017 at 7:37 AM, Trevor Grant <trevor.d.gr...@gmail.com> wrote:
> > > Sorry for chiming in late.
> > >
> > > GPUs on Flink: Till raised a good point, you need to be able to fall back to non-GPU resources if they aren't available.
> > >
> > > Fun fact: this has already been developed for Flink via the Apache Mahout project.
> > >
> > > In short, Mahout exposes a number of tensor functions (vector %*% matrix, matrix %*% matrix, etc.). If compiled for GPU support, those operations are completed via the GPU, and if no GPUs are in fact available, Mahout math falls back to CPUs (and finally back to the JVM).
> > >
> > > How this should work: Flink takes care of shipping data around the cluster, and when data arrives at the local node it is dumped out to the GPU for calculation, loaded back up, and shipped back around the cluster. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs, but because for any sort of complex algorithm we expect to be able to cache intermediate results).
> > >
> > > +1 to FLINK-1730
> > >
> > > Everything in Mahout is modular: distributed engine (Flink/Spark/write-your-own), native solvers (OpenMP / ViennaCL / CUDA / write-your-own), algorithms, etc.
> > >
> > > To sum up: you're noting the redundancy between ML packages in terms of algorithms; I would recommend checking out Mahout before rolling your own GPU integration (or else risk redundantly integrating GPUs).
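The GPU-to-CPU-to-JVM fallback chain described above can be sketched roughly as follows. This is a minimal plain-Java illustration of the selection pattern only; all class and method names here are hypothetical, not Mahout's or ND4J's actual API.

```java
// Minimal sketch of a solver-selection chain with JVM fallback.
// Names (MatrixBackend, CudaBackend, ...) are illustrative, NOT Mahout's real API.
import java.util.List;

interface MatrixBackend {
    String name();
    boolean available();
    double[][] multiply(double[][] a, double[][] b);
}

class JvmBackend implements MatrixBackend {
    public String name() { return "jvm"; }
    public boolean available() { return true; } // plain JVM math always works
    public double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int x = 0; x < k; x++)
                    c[i][j] += a[i][x] * b[x][j];
        return c;
    }
}

class CudaBackend implements MatrixBackend {
    private final boolean gpuPresent;
    CudaBackend(boolean gpuPresent) { this.gpuPresent = gpuPresent; }
    public String name() { return "cuda"; }
    public boolean available() { return gpuPresent; }
    public double[][] multiply(double[][] a, double[][] b) {
        throw new UnsupportedOperationException("would dispatch to the GPU");
    }
}

class Solver {
    // Pick the first usable backend: GPU first, JVM as the final fallback.
    static MatrixBackend select(List<MatrixBackend> chain) {
        return chain.stream().filter(MatrixBackend::available).findFirst()
                .orElseThrow(() -> new IllegalStateException("no backend"));
    }
}
```

With no GPU present, `Solver.select` falls through to the JVM backend, which is the kind of behaviour Trevor describes Mahout's native solvers providing out of the box.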
> > > If nothing else, it should give you some valuable insight regarding design considerations. Also, FYI, the goal of the Apache Mahout project is to address that problem precisely: implement an algorithm once in a mathematically expressive DSL, which is abstracted above the engine so that the same code easily ports between engines / native solvers (i.e. CPU/GPU).
> > >
> > > https://github.com/apache/mahout/tree/master/viennacl-omp
> > > https://github.com/apache/mahout/tree/master/viennacl
> > >
> > > Best,
> > > tg
> > >
> > > Trevor Grant
> > > Data Scientist
> > > https://github.com/rawkintrevo
> > > http://stackexchange.com/users/3002022/rawkintrevo
> > > http://trevorgrant.org
> > >
> > > *"Fortunate is he, who is able to know the causes of things." -Virgil*
> > >
> > > On Fri, Feb 10, 2017 at 7:01 AM, Katherin Eri <katherinm...@gmail.com> wrote:
> > > > Thank you, Felix, for the provided information.
> > > >
> > > > I am currently analyzing the provided integration of Flink with SystemML.
> > > >
> > > > I am also gathering information for the ticket FLINK-1730 <https://issues.apache.org/jira/browse/FLINK-1730>; maybe we will take it on, to unblock the SystemML/Flink integration.
> > > >
> > > > Thu, Feb 9, 2017 at 0:17, Felix Neutatz <neut...@googlemail.com.invalid>:
> > > > > Hi Kate,
> > > > >
> > > > > 1) - Broadcast: https://cwiki.apache.org/confluence/display/FLINK/FLIP-5%3A+Only+send+data+to+each+taskmanager+once+for+broadcasts
> > > > > - Caching: https://issues.apache.org/jira/browse/FLINK-1730
> > > > >
> > > > > 2) I have no idea about the GPU implementation. The SystemML mailing list will probably help you out there.
> > > > >
> > > > > Best regards,
> > > > > Felix
> > > > >
> > > > > 2017-02-08 14:33 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
> > > > > > Thank you, Felix, for your point; it is quite interesting.
> > > > > >
> > > > > > I will take a look at the code of the provided Flink integration.
> > > > > >
> > > > > > 1) You have these problems with Flink: "we realized that the lack of a caching operator and a broadcast issue highly affect the performance". Have you already asked the community about this? If yes, please provide a reference to the ticket or the subject of the letter.
> > > > > >
> > > > > > 2) You have said that SystemML provides GPU support. I have seen SystemML's source code and would like to ask: why did you decide to implement your own CUDA integration? Did you consider ND4J, or do you maintain your own implementation because ND4J is younger?
> > > > > >
> > > > > > Tue, Feb 7, 2017 at 18:35, Felix Neutatz <neut...@googlemail.com>:
> > > > > > > Hi Katherin,
> > > > > > >
> > > > > > > we are also working in a similar direction. We implemented a prototype to integrate with SystemML: https://github.com/apache/incubator-systemml/pull/119
> > > > > > > SystemML provides many different matrix formats, operations, GPU support and a couple of DL algorithms. Unfortunately, we realized that the lack of a caching operator and a broadcast issue highly affect the performance (e.g. compared to Spark). At the moment I am trying to tackle the broadcast issue. But caching is still a problem for us.
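The cost of the missing caching operator can be illustrated with a toy model in plain Java (not Flink code; `expensiveSource` is a hypothetical stand-in for re-reading and re-shuffling the input): without a persist/cache primitive, every iteration of an algorithm like LinregCG re-evaluates the upstream pipeline, whereas a cached dataset is materialized only once.

```java
// Toy model of iterative execution with and without caching.
// "expensiveSource" is a hypothetical stand-in for re-running the input pipeline.
import java.util.function.Supplier;

class CachingDemo {
    static int evaluations = 0;

    static double[] expensiveSource() {
        evaluations++; // each call models a full re-execution of the input pipeline
        return new double[]{1, 2, 3};
    }

    // Run n iterations that each consume the dataset.
    static void iterate(int n, Supplier<double[]> dataset) {
        for (int i = 0; i < n; i++) {
            double[] x = dataset.get();
            // ... one solver step (e.g. a conjugate-gradient update) would go here ...
        }
    }

    // A one-shot memo: evaluate the source once, then serve the cached result.
    static Supplier<double[]> cached(Supplier<double[]> s) {
        return new Supplier<>() {
            double[] memo;
            public double[] get() {
                if (memo == null) memo = s.get();
                return memo;
            }
        };
    }
}
```

Six iterations trigger six source evaluations without the cache and one with it; roughly this gap is what FLINK-1730 is about.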
> > > > > > > Best regards,
> > > > > > > Felix
> > > > > > >
> > > > > > > 2017-02-07 16:22 GMT+01:00 Katherin Eri <katherinm...@gmail.com>:
> > > > > > > > Thank you, Till.
> > > > > > > >
> > > > > > > > 1) Regarding ND4J, I didn't know about such an unfortunate and critical restriction of it -> the lack of sparsity optimizations; and you are right, this issue is still open for them. I saw that Flink uses Breeze, but I thought its usage was due to historical reasons.
> > > > > > > >
> > > > > > > > 2) Regarding integration with DL4J, I have read the source code of the DL4J/Spark integration; that's why I have declined my idea of reusing their word2vec implementation for now, for example. I can perform a deeper investigation of this topic if required.
> > > > > > > >
> > > > > > > > So I feel that we have the following picture:
> > > > > > > >
> > > > > > > > 1) DL integration investigation could be part of Apache Bahir. I can investigate this topic further, but I think we need a separate ticket to track this activity.
> > > > > > > >
> > > > > > > > 2) GPU support, required for DL, is interesting, but requires ND4J, for example.
> > > > > > > >
> > > > > > > > 3) ND4J couldn't be incorporated because it doesn't support sparsity <https://deeplearning4j.org/roadmap.html> [1].
> > > > > > > >
> > > > > > > > Regarding ND4J: is this the single blocker for incorporating it, or are there other known ones?
> > > > > > > >
> > > > > > > > [1] https://deeplearning4j.org/roadmap.html
> > > > > > > >
> > > > > > > > Tue, Feb 7, 2017 at 16:26, Till Rohrmann <trohrm...@apache.org>:
> > > > > > > > > Thanks for initiating this discussion, Katherin. I think you're right that in general it does not make sense to reinvent the wheel over and over again, especially if you only have limited resources at hand. So if we could integrate Flink with some existing library, that would be great.
> > > > > > > > >
> > > > > > > > > In the past, however, we couldn't find a good library which provided enough freedom to integrate it with Flink. Especially if you want to have distributed and somewhat high-performance implementations of ML algorithms, you have to take Flink's execution model (capabilities as well as limitations) into account. That is mainly the reason why we started implementing some of the algorithms "natively" on Flink.
> > > > > > > > >
> > > > > > > > > If I remember correctly, the problem with ND4J was and still is that it does not support sparse matrices, which was a requirement from our side. As far as I know, it is quite common to have sparse data structures when dealing with large-scale problems. That's why we built our own abstraction, which can have different implementations. Currently, the default implementation uses Breeze.
> > > > > > > > >
> > > > > > > > > I think the support for GPU-based operations and the actual resource management are two orthogonal things. The implementation would have to work with no GPUs available anyway.
If the system detects that GPUs are available, then ideally it would exploit them. Thus, we could add this feature later and maybe integrate it with FLINK-5131 [1].
> > > > > > > > >
> > > > > > > > > Concerning the integration with DL4J, I think that Theo's proposal to do it in a separate repository (maybe as part of Apache Bahir) is a good idea. We're currently thinking about outsourcing some of Flink's libraries into sub-projects. This could also be an option for the DL4J integration then. In general I think it should be feasible to run DL4J on Flink, given that it also runs on Spark. Have you already looked at it more closely?
> > > > > > > > >
> > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-5131
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Tue, Feb 7, 2017 at 11:47 AM, Katherin Eri <katherinm...@gmail.com> wrote:
> > > > > > > > > > Thank you, Theodore, for your reply.
> > > > > > > > > >
> > > > > > > > > > 1) Regarding GPU, your point is clear and I agree with it; ND4J looks appropriate. But my current understanding is that we also need to cover some resource management questions -> when we provide GPU support, we also need to manage GPUs as a resource. For example, Mesos already supports GPUs as a resource type: Initial support for GPU resources.
> > > > > > > > > > <https://issues.apache.org/jira/browse/MESOS-4424?jql=text%20~%20GPU>
> > > > > > > > > > Flink uses Mesos as a cluster manager, and this means that this feature of Mesos could be reused. Memory management questions in Flink regarding GPUs should also be clarified.
> > > > > > > > > >
> > > > > > > > > > 2) Regarding integration with DL4J: what stops us from creating a ticket and starting the discussion around this topic? Do we need some user story, or is the community not sure that DL is really helpful? Why did the discussion with Adam Gibson end without any idea being implemented? What concerns do we have?
> > > > > > > > > >
> > > > > > > > > > Mon, Feb 6, 2017 at 15:01, Theodore Vasiloudis <theodoros.vasilou...@gmail.com>:
> > > > > > > > > > > Hello all,
> > > > > > > > > > >
> > > > > > > > > > > This is a point that has come up in the past: given the multitude of ML libraries out there, should we have native implementations in FlinkML or try to integrate other libraries instead?
> > > > > > > > > > >
> > > > > > > > > > > We haven't managed to reach a consensus on this before.
My opinion is that there is definitely value in having ML algorithms written natively in Flink, both for performance optimization and, more importantly, for engineering simplicity: we don't want to force users to use yet another piece of software to run their ML algos (at least for a basic set of algorithms).
> > > > > > > > > > >
> > > > > > > > > > > We have in the past discussed integrations with DL4J (particularly ND4J) with Adam Gibson, the core developer of the library, but we never got around to implementing anything.
> > > > > > > > > > >
> > > > > > > > > > > Whether it makes sense to have the DL4J integration as part of the Flink distribution would be up for discussion. I would suggest making it an independent repo, both to allow for faster dev/release cycles and because, not being directly related to the core of Flink, it would add an extra reviewing burden to an already overloaded group of committers.
> > > > > > > > > > >
> > > > > > > > > > > Natively supporting GPU calculations in Flink would be much better achieved through a library like ND4J; the engineering burden would be too much otherwise.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Theodore
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 6, 2017 at 11:26 AM, Katherin Eri <katherinm...@gmail.com> wrote:
> > > > > > > > > > > > Hello, guys.
> > > > > > > > > > > > Theodore, last week I started the review of the PR: https://github.com/apache/flink/pull/2735, related to *word2Vec for Flink*.
> > > > > > > > > > > >
> > > > > > > > > > > > During this review I asked myself: why do we need to implement such a very popular algorithm like *word2vec one more time*, when there is already an implementation in Java available from the deeplearning4j.org <https://deeplearning4j.org/word2vec> library (DL4J -> Apache 2 licence)? This library promotes itself actively, there is hype around it in the ML sphere, and it was integrated with Apache Spark to provide scalable deep learning calculations.
> > > > > > > > > > > >
> > > > > > > > > > > > *That's why I thought: could Flink also integrate with this library or not?*
> > > > > > > > > > > >
> > > > > > > > > > > > 1) Personally I think providing support for and deployment of *deep learning (DL) algorithms/models in Flink* is a promising and attractive feature, because:
> > > > > > > > > > > >
> > > > > > > > > > > > a) During the last two years DL has proved its efficiency, and these algorithms are used in many applications.
For example, *Spotify* uses DL-based algorithms for music content extraction: Recommending music on Spotify with deep learning, August 05, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>. Developers need to scale DL up manually, which causes a lot of work; that's why platforms like Flink should support the deployment of these models.
> > > > > > > > > > > >
> > > > > > > > > > > > b) Here is the scope of deep learning use cases <https://deeplearning4j.org/use_cases>; many of them map to scenarios that could be supported on Flink.
> > > > > > > > > > > >
> > > > > > > > > > > > 2) But DL raises such questions as:
> > > > > > > > > > > >
> > > > > > > > > > > > a) scaling calculations up over machines;
> > > > > > > > > > > >
> > > > > > > > > > > > b) performing these calculations both on CPU and GPU. A GPU is required to train big DL models; otherwise the learning process can converge very slowly.
> > > > > > > > > > > >
> > > > > > > > > > > > 3) I have checked the DL4J library, which already has rich support for many attractive DL models, like Recurrent Networks and LSTMs, Convolutional Networks (CNN), Restricted Boltzmann Machines (RBM) and others.
> > > > > > > > > > > > So we won't need to implement them ourselves, but only provide the ability to execute these models on a Flink cluster, in quite a similar way to how it was integrated with Apache Spark.
> > > > > > > > > > > >
> > > > > > > > > > > > Because of all of this, I propose:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) To create a new ticket in Flink's JIRA for the integration of Flink with DL4J and decide on which side this integration should be implemented.
> > > > > > > > > > > >
> > > > > > > > > > > > 2) To natively support GPU resources in Flink and allow calculations on them, as described in this publication: https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
> > > > > > > > > > > >
> > > > > > > > > > > > *Regarding the original issue Implement Word2Vec <https://issues.apache.org/jira/browse/FLINK-2094> in Flink,* I have investigated its implementation in DL4J and the implementation of the DL4J integration with Apache Spark, and got several points:
> > > > > > > > > > > >
> > > > > > > > > > > > It seems that the idea of building our own implementation of word2vec in Flink is not such a bad solution, because DL4J was forced to reimplement its original word2Vec on Spark.
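As a point of reference for the scope of such a reimplementation: the core preprocessing step of word2vec, generating (center, context) skip-gram training pairs, is compact enough to sketch directly. This is an illustrative plain-Java sketch, not the code from PR 2735; the window size is an assumed parameter.

```java
// Illustrative sketch of skip-gram (center, context) pair generation,
// the first step of a word2vec implementation. Not the PR 2735 code.
import java.util.ArrayList;
import java.util.List;

class SkipGram {
    // For each position i, emit (tokens[i], tokens[j]) for all j within the window.
    static List<String[]> pairs(String[] tokens, int window) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(tokens.length - 1, i + window);
            for (int j = lo; j <= hi; j++)
                if (j != i) out.add(new String[]{tokens[i], tokens[j]});
        }
        return out;
    }
}
```

In a distributed setting, the expensive parts are not this pair generation but sharing and updating the embedding vectors across workers, which is exactly where the caching and broadcast issues discussed earlier in the thread bite.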
> > > > > > > > > > > > I have checked the integration of DL4J with Spark and found that it is too strongly coupled with the Spark API, so that it is impossible to just take some DL4J API and reuse it; instead we would need to implement an independent integration for Flink.
> > > > > > > > > > > >
> > > > > > > > > > > > *That's why we should simply finish the implementation of the current PR **independently **of the DL4J integration.*
> > > > > > > > > > > >
> > > > > > > > > > > > Could you please provide your opinion regarding my questions and points? What do you think about them?
> > > > > > > > > > > >
> > > > > > > > > > > > Mon, Feb 6, 2017 at 12:51, Katherin Eri <katherinm...@gmail.com>:
> > > > > > > > > > > > > Sorry, guys, I need to finish this letter first.
> > > > > > > > > > > > > The full version of it will come shortly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mon, Feb 6, 2017 at 12:49, Katherin Eri <katherinm...@gmail.com>:
> > > > > > > > > > > > > Hello, guys.
> > > > > > > > > > > > > Theodore, last week I started the review of the PR: https://github.com/apache/flink/pull/2735, related to *word2Vec for Flink*.
> > > > > > > > > > > > > During this review I asked myself: why do we need to implement such a very popular algorithm like *word2vec one more time*, when there is already an available implementation in Java provided by the deeplearning4j.org <https://deeplearning4j.org/word2vec> library (DL4J -> Apache 2 licence)? This library promotes itself actively, there is hype around it in the ML sphere, and it was integrated with Apache Spark to provide scalable deep learning calculations.
> > > > > > > > > > > > > That's why I thought: could Flink also integrate with this library or not?
> > > > > > > > > > > > > 1) Personally I think providing support for and deployment of deep learning algorithms/models in Flink is a promising and attractive feature, because:
> > > > > > > > > > > > > a) During the last two years deep learning has proved its efficiency, and these algorithms are used in many applications. For example, *Spotify* uses DL-based algorithms for music content extraction: Recommending music on Spotify with deep learning, August 05, 2014 <http://benanne.github.io/2014/08/05/spotify-cnns.html>, for their music recommendations. Doing this natively scalable is very attractive.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have investigated the implementation of the DL4J integration with Apache Spark, and got several points:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) It seems that the idea of building our own implementation of word2vec is not such a bad solution, because the integration of DL4J with Spark is too strongly coupled with the Spark API, and it would take time on the DL4J side to adapt this integration to Flink. I had also expected that we would be able to just call some API; it is not such a simple thing.
> > > > > > > > > > > > > 2)
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://deeplearning4j.org/use_cases
> > > > > > > > > > > > > https://www.analyticsvidhya.com/blog/2017/01/t-sne-implementation-r-python/
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thu, Jan 19, 2017 at 13:29, Till Rohrmann <trohrm...@apache.org>:
> > > > > > > > > > > > > > Hi Katherin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > welcome to the Flink community. Always great to see new people joining the community :-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > Till
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jan 17, 2017 at 1:02 PM, Katherin Sotenko <katherinm...@gmail.com> wrote:
> > > > > > > > > > > > > > > ok, I've got it.
> > > > > > > > > > > > > > > I will take a look at https://github.com/apache/flink/pull/2735.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tue, Jan 17, 2017 at 14:36, Theodore Vasiloudis <theodoros.vasilou...@gmail.com>:
> > > > > > > > > > > > > > > > Hello Katherin,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Welcome to the Flink community!
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > You are correct that the ML component definitely needs a lot of work; we are facing problems similar to CEP, which we'll hopefully resolve with the restructuring Stephan has mentioned in that thread.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If you'd like to help out with PRs we have many open; one I have started reviewing but got side-tracked on is the Word2Vec one [1].
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Theodore
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > [1] https://github.com/apache/flink/pull/2735
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > Hi Katherin,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > welcome to the Flink community!
> > > > > > > > > > > > > > > > > Help with reviewing PRs is always very welcome and a great way to contribute.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best, Fabian
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko <katherinm...@gmail.com>:
> > > > > > > > > > > > > > > > > > Thank you, Timo.
> > > > > > > > > > > > > > > > > > I have started the analysis of the topic.
> > > > > > > > > > > > > > > > > > And if necessary, I will try to review other pull requests as well.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Tue, Jan 17, 2017 at 13:09, Timo Walther <twal...@apache.org>:
> > > > > > > > > > > > > > > > > > > Hi Katherin,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > great to hear that you would like to contribute! Welcome!
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I gave you contributor permissions. You can now assign issues to yourself. I assigned FLINK-1750 to you.
> > > > > > > > > > > > > > > > > > > Right now there are many open ML pull requests; you are very welcome to review the code of others, too.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Timo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On 17/01/17 at 10:39, Katherin Sotenko wrote:
> > > > > > > > > > > > > > > > > > > > Hello, All!
> > > > > > > > > > > > > > > > > > > > I'm Kate Eri. I'm a Java developer with 6 years of enterprise experience, and I also have some expertise with Scala (half a year).
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > In the last 2 years I have participated in several BigData projects related to Machine Learning (time series analysis, recommender systems, social networking) and ETL. I have experience with Hadoop, Apache Spark and Hive.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I'm fond of the ML topic, and I see that the Flink project requires some work in this area; that's why I would like to join Flink and ask to be granted the assignment of the ticket https://issues.apache.org/jira/browse/FLINK-1750 to me.