Hello all,

I've started thinking about online learning in Flink, and one of the issues that has come up in other frameworks is the ability to prioritize "control" events over "data" events in iterations.
To set an example, say we develop an ML model that ingests events in parallel, performs an aggregation to update the model, and then broadcasts the updated model back through an iteration/back edge. Using the above nomenclature, the events being ingested would be "data" events, and the model update would be a "control" event.

I talked about this scenario a bit with a couple of people (Paris and Gianmarco), and one thing we would like to have is the ability to prioritize the ingestion of control events over data events. If my understanding is correct, there is currently a buffer/queue of events waiting to be processed for each operator, and each incoming event ends up at the end of that queue. If our data source is fast and the model updates are slow, a lot of data events might be buffered/scheduled to be processed before each model update, because of the speed difference between the two streams. But we would like the model that is used to process data events to be updated as soon as the newest version becomes available.

Is it somehow possible to make the control events "jump" the queue and be processed as soon as they arrive, ahead of the data events?

Regards,
Theodore

P.S. This is still very much a theoretical problem; I haven't looked at how such a pipeline would be implemented in Flink.
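
To make the scenario a bit more concrete, here is a very rough, untested sketch of how such a pipeline might look with Flink's DataStream API. Everything specific in it is a placeholder: the Double "model", the count-of-4 window, the simple averaging, and the forced parallelism of 1 (with parallelism > 1 the model update on the feedback edge would also need to be broadcast to every parallel instance). The point is only to show where the data and control events meet in one operator.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream.ConnectedIterativeStreams;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;

public class OnlineLearningSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep the sketch single-threaded so the feedback edge trivially matches
        // the parallelism of the iteration head.
        env.setParallelism(1);

        // "Data" events; in practice this would be an unbounded external source.
        DataStream<Long> events = env.fromElements(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L);

        // Open an iteration whose feedback edge carries model updates
        // ("control" events). The model is just a Double here for illustration.
        ConnectedIterativeStreams<Long, Double> loop =
                events.iterate(5000).withFeedbackType(Double.class);

        // flatMap1 handles data events using the current local model copy;
        // flatMap2 swaps in a new model when an update arrives on the back edge.
        // Flink gives neither input priority over the other, which is exactly the
        // issue raised above: many data events may be queued ahead of the update.
        DataStream<Double> scored = loop.flatMap(new CoFlatMapFunction<Long, Double, Double>() {
            private double model = 0.0;

            @Override
            public void flatMap1(Long event, Collector<Double> out) {
                out.collect(model * event);  // "score" the data event with the current model
            }

            @Override
            public void flatMap2(Double newModel, Collector<Double> out) {
                model = newModel;            // pick up the newest model once it is processed
            }
        });

        // Placeholder aggregation: fold every 4 scored events into a new "model"
        // and feed it back through the iteration edge.
        DataStream<Double> modelUpdates = scored
                .countWindowAll(4)
                .apply(new AllWindowFunction<Double, Double, GlobalWindow>() {
                    @Override
                    public void apply(GlobalWindow window, Iterable<Double> values,
                                      Collector<Double> out) {
                        double sum = 0.0;
                        int n = 0;
                        for (double v : values) { sum += v; n++; }
                        out.collect(n == 0 ? 0.0 : sum / n);
                    }
                });

        loop.closeWith(modelUpdates);

        scored.print();
        env.execute("online learning sketch");
    }
}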