Thank you, Theodore. In short, I vote for: 1) Online learning, 2) Low-latency prediction serving -> Offline learning with the batch API.
In detail:

1) If streaming is Flink's strong side, let's use it: try to support online learning or lightweight in-memory learning algorithms, and build a pipeline for them.

2) I think Flink should be part of the production ecosystem, and since production now requires ML support, deployment of multiple models, and so on, we should serve that need. But in my opinion we shouldn't compete with projects like PredictionIO; rather, we should serve them and act as an execution core. That implies a lot:

  a. Offline training should be supported, because most ML algorithms are designed for offline training.
  b. The model lifecycle should be supported: ETL + transformation + training + scoring + quality monitoring in production.

I understand that the batch world is full of competitors, but for me that doesn't mean batch should be ignored. I think that separate streaming/batch applications cause additional deployment and operational overhead that teams typically try to avoid. So in my opinion we should draw the community's attention to this problem.

On Fri, 3 Mar 2017 at 15:34, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:

Hello all,

From our previous discussion started by Stavros, we decided to start a planning document [1] to figure out possible next steps for ML on Flink. Our concerns were mainly ensuring active development while satisfying the needs of the community.

We have listed a number of proposals for future work in the document. In short they are:

- Offline learning with the batch API
- Online learning
- Offline learning with the streaming API
- Low-latency prediction serving

I see there are a number of people willing to work on ML for Flink, but the truth is that we cannot cover all of these suggestions without fragmenting the development too much. So my recommendation is to pick two of these options, create design documents, and build prototypes for each library.
We can then assess their viability and, together with the community, decide whether we should try to include one (or both) of them in the main Flink distribution.

So I invite people to express their opinion about which task they would be willing to contribute to, and hopefully we can settle on two of these options. Once that is done, we can decide how we do the actual work.

Since this is highly experimental, I would suggest we work in repositories where we have complete control. For that purpose I have created an organization [2] on GitHub which we can use to create repositories and teams that work on them in an organized manner. Once enough work has accumulated, we can start discussing contributing the code to the main distribution.

Regards,
Theodore

[1] https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/
[2] https://github.com/flinkml

--
Yours faithfully,
Kate Eri.
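P.S. For concreteness, the kind of lightweight in-memory online learner I mention in point 1 could start as simply as per-element SGD updates inside a stateful streaming operator. Below is a minimal, framework-free Python sketch of that idea; the class name, learning rate, and simulated stream are illustrative assumptions, not Flink API:

```python
# Illustrative sketch only: a linear model trained one stream element at a
# time with plain SGD. In a real Flink job this update logic would live
# inside a stateful streaming operator; here it is framework-free.

class OnlineLinearRegression:
    """Hypothetical lightweight in-memory online learner."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features  # weights
        self.b = 0.0                 # bias
        self.lr = lr                 # learning rate

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """One SGD step on a single (features, label) stream element."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        return err

# Simulated noiseless stream following y = 2x + 1.
model = OnlineLinearRegression(n_features=1, lr=0.01)
for i in range(2000):
    x = [float(i % 10)]
    model.update(x, 2.0 * x[0] + 1.0)

print(model.w[0], model.b)  # should approach 2.0 and 1.0
```

The point is that the per-element update needs only the current model state, which is exactly the kind of small keyed state a streaming operator can hold, so no separate batch training job is required for this class of algorithms.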