Hi Christophe, it is true that FlinkML only targets batch workloads. Also, there has not been any active development for a long time.
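That said, the core of what you describe (scoring events with a model and swapping the model in at runtime) is already expressible with the DataStream API today, by connecting a data stream with a model-update stream. Below is a minimal, framework-free Java sketch of that pattern; the `Model` and `ScoringFunction` names are hypothetical, and in an actual Flink job the two methods would correspond to the `flatMap1`/`flatMap2` methods of a `CoFlatMapFunction` on the connected streams:

```java
import java.util.function.Function;

// Framework-free sketch of stream-side model serving. In a real Flink job,
// score() and updateModel() would be the flatMap1/flatMap2 methods of a
// CoFlatMapFunction applied to dataStream.connect(modelUpdateStream).
public class ModelServingSketch {

    /** A model maps a feature vector to a score. */
    interface Model extends Function<double[], Double> {}

    /** Holds the currently served model; scores events and accepts updates. */
    static class ScoringFunction {
        private Model current;

        ScoringFunction(Model initial) { this.current = initial; }

        /** Like flatMap1: score an incoming record with the current model. */
        double score(double[] features) {
            return current.apply(features);
        }

        /** Like flatMap2: swap in a newly (batch-)trained model. */
        void updateModel(Model newModel) {
            this.current = newModel;
        }
    }

    public static void main(String[] args) {
        // Initial model trained in batch: a linear model with weights (1.0, 2.0).
        ScoringFunction scorer =
            new ScoringFunction(f -> 1.0 * f[0] + 2.0 * f[1]);
        System.out.println(scorer.score(new double[]{1.0, 1.0})); // 3.0

        // A retrained model arrives on the update stream: weights (0.5, 0.5).
        scorer.updateModel(f -> 0.5 * f[0] + 0.5 * f[1]);
        System.out.println(scorer.score(new double[]{1.0, 1.0})); // 1.0
    }
}
```

In a production job you would additionally keep the current model in Flink operator state so it survives failures and restarts; standardizing this kind of pattern is what the FLIP discussed below is about.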
In March last year, a discussion was started on the dev mailing list about different machine learning features for stream processing [1]. One result of this discussion was FLIP-23 [2], which will add a model-serving library to Flink, i.e., it can load (and update) machine learning models and evaluate them on a stream. If you dig through the mailing list thread, you'll find a link to a Google doc that discusses other possible directions.

Best, Fabian

[1] https://lists.apache.org/thread.html/eeb80481f3723c160bc923d689416a352d6df4aad98fe7424bf33132@%3Cdev.flink.apache.org%3E
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving

2018-02-05 16:43 GMT+01:00 Christophe Jolif <cjo...@gmail.com>:
> Hi all,
>
> Sorry, this is me again with another question.
>
> Maybe I did not search deep enough, but it seems the FlinkML API is still
> purely batch.
>
> If I read
> https://cwiki.apache.org/confluence/display/FLINK/FlinkML%3A+Vision+and+Roadmap
> it seems there was the intent to "exploit the streaming nature of Flink,
> and provide functionality designed specifically for data streams", but
> from my external point of view, I don't see much happening here. Is there
> work in progress towards that?
>
> I would personally see two use-cases around streaming: the first is
> updating an existing model that was built in batch; the second is
> triggering prediction not through a batch job but in a stream job.
>
> Are these things that are in the works? Or maybe already feasible despite
> the API looking purely batch-branded?
>
> Thanks,
> --
> Christophe