Hello Roman,

Thank you for bringing up this topic. I think it's a great idea to think through how ML and Druid can be integrated more tightly.

I think one of the most beneficial approaches for ML training would be easy and intuitive integration with the ML ecosystem: easy access to Druid data sources from Python in general and pandas specifically, along with Jupyter notebooks and other popular ML projects.

However, I don't see how models can be incorporated into Druid. Unlike Spark or Flink, Druid is not designed to execute user-programmable code. From my perspective, trying to execute ML logic on the Druid side would be similar to the "stored procedures" approach, which would most likely hurt scalability.

However, please don't take my point too seriously here; I'm not an in-depth expert on Druid.

Best,
Sayat

On Fri, Jan 10, 2020 at 6:41 AM Roman Leventov <leventov...@gmail.com> wrote:

> Hello Druid developers, what do you think about the future of Druid &
> machine learning?
>
> Druid has been great at complex aggregations. Could (should?) it make
> inroads into ML? Perhaps aggregators which apply the rows against some
> pre-trained model and summarize results.
>
> Should model training stay completely external to Druid, or it could be
> incorporated into Druid's data lifecycle on a conceptual level, such as a
> recurring "indexing" task which stores the result (the model) in Druid's
> deep storage, the model automatically loaded on historical nodes as needed
> (just like segments) and certain aggregators pick up the latest model?
>
> Does this make any sense? In what cases Druid & ML will and will not work
> well together, and ML should stay a Spark's prerogative?
>
> I would be very interested to hear any thoughts on the topic, vague ideas
> and questions.
>

--
Best Regards,
Sayat