What is the best way to amortize heavy operations across elements in Flink? (that is what bundles are for, basically)
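To make the amortization point concrete, here is a minimal back-of-the-envelope sketch (the cost numbers are illustrative assumptions, not measurements from any runner): a fixed per-bundle cost, such as opening a connection or loading a model in start_bundle, is paid once and spread across every element in the bundle.

```python
# Illustrative sketch: why bundles amortize fixed per-bundle costs.
# SETUP_COST_MS and PER_ELEMENT_MS are assumed numbers, not real data.

SETUP_COST_MS = 500   # assumed fixed cost paid once per bundle
PER_ELEMENT_MS = 2    # assumed marginal cost per element

def cost_per_element(bundle_size: int) -> float:
    """Average processing cost per element for a given bundle size."""
    return SETUP_COST_MS / bundle_size + PER_ELEMENT_MS

# A bundle of 1 pays the full setup cost on every element; a bundle of
# 1000 shrinks the amortized setup to 0.5 ms per element.
print(cost_per_element(1))      # -> 502.0
print(cost_per_element(1000))   # -> 2.5
```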
On Fri, Sep 22, 2023 at 5:09 AM Jan Lukavský <je...@seznam.cz> wrote:

> Flink defines bundles in terms of number of elements and processing time,
> by default 1000 elements or 1000 milliseconds, whichever happens first. But
> bundles are not a "natural" concept in Flink; it uses them merely to comply
> with the Beam model. By default, checkpoints are unaligned with bundles.
>
> Jan
>
> On 9/22/23 01:58, Robert Bradshaw via dev wrote:
>
> Dataflow uses a work-stealing protocol. The FnAPI has a protocol to ask
> the worker to stop at a certain element that has already been sent.
>
> On Thu, Sep 21, 2023 at 4:24 PM Joey Tran <joey.t...@schrodinger.com>
> wrote:
>
>> I'm writing a runner, and my first strategy for determining bundle size
>> was to start with a bundle size of one and double it until we reach a size
>> that we expect to take some target per-bundle runtime (e.g. maybe 10
>> minutes). I realize this isn't the greatest strategy for high fixed-cost
>> transforms. I'm curious what kinds of strategies other runners take?
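Jan's description of Flink's bundle cutoff (finish at 1000 elements or 1000 ms, whichever comes first) can be sketched roughly like this; the class and method names are hypothetical, only the two defaults come from the message above:

```python
# Hypothetical sketch of a count-or-time bundle cutoff, in the style
# Jan describes for Flink: defaults of 1000 elements / 1000 ms.
import time

class BundleTracker:
    def __init__(self, max_elements: int = 1000, max_ms: float = 1000.0):
        self.max_elements = max_elements
        self.max_ms = max_ms
        self.count = 0
        self.start = time.monotonic()

    def record_element(self) -> bool:
        """Record one element; return True if the bundle should finish."""
        self.count += 1
        elapsed_ms = (time.monotonic() - self.start) * 1000.0
        return self.count >= self.max_elements or elapsed_ms >= self.max_ms
```

Whichever limit trips first ends the bundle, so a slow trickle of elements still gets flushed on the time bound.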
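For reference, the doubling heuristic Joey describes can be sketched as below; the function name and the 600-second (10-minute) target are illustrative, taken from the example in his message rather than from any real runner:

```python
# Sketch of the doubling heuristic from the quoted message: start at a
# bundle size of 1 and double until a bundle's observed runtime reaches
# the target per-bundle runtime.

def next_bundle_size(current_size: int, observed_runtime_s: float,
                     target_runtime_s: float = 600.0) -> int:
    """Double the bundle size until bundles take ~target_runtime_s."""
    if observed_runtime_s < target_runtime_s:
        return current_size * 2
    return current_size

# Simulated feedback loop: sizes go 1 -> 2 -> 4 -> 8 -> 16 -> 32, then
# stabilize once a bundle hits the 600 s target.
size = 1
for runtime in [0.5, 1.2, 2.5, 40.0, 90.0, 700.0]:
    size = next_bundle_size(size, runtime)
```

As noted in the message, this converges slowly when the very first bundles are cheap, and a high fixed cost per bundle makes the early tiny bundles disproportionately expensive.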