Thanks, that all makes sense! On Wed, Jan 27, 2021 at 7:00 PM Jark Wu <imj...@gmail.com> wrote:
> Hi Rex, > > Could you share your query here? It would be helpful to identify the root > cause if we have the query. > > 1) watermark > The framework automatically adds a node (the MiniBatchAssigner) to > generate watermark events as the mini-batch id to broadcast and trigger > mini-batch in the pipeline. > > 2) MiniBatchAssigner(interval=[1000ms], mode=[ProcTime] > It generates a new mini-batch id in an interval of 1000ms in system time. > The mini-batch id is represented by the watermark event. > > 3) TWO_PHASE optimization > If users want to have TWO_PHASE optimization, it requires the aggregate > functions all support the merge() method and the mini-batch is enabled. > > Best, > Jark > > > > > On Tue, 26 Jan 2021 at 19:01, Dawid Wysakowicz <dwysakow...@apache.org> > wrote: > >> I am pulling Jark and Godfrey who are more familiar with the planner >> internals. >> >> Best, >> >> Dawid >> On 22/01/2021 20:11, Rex Fenley wrote: >> >> Hello, >> >> Does anyone have any more information here? >> >> Thanks! >> >> On Wed, Jan 20, 2021 at 9:13 PM Rex Fenley <r...@remind101.com> wrote: >> >>> Hi, >>> >>> Our job was experiencing high write amplification on aggregates so we >>> decided to give mini-batch a go. There's a few things I've noticed that are >>> different from our previous job and I would like some clarification. >>> >>> 1) Our operators now say they have Watermarks. We never explicitly added >>> watermarks, and our state is essentially unbounded across all time since it >>> consumes from Debezium and reshapes our database data into another store. >>> Why does it say we have Watermarks then? >>> >>> 2) In our sources I see MiniBatchAssigner(interval=[1000ms], >>> mode=[ProcTime], what does that do? >>> >>> 3) I don't really see anything else different yet in the shape of our >>> plan even though we've turned on >>> configuration.setString( >>> "table.optimizer.agg-phase-strategy", >>> "TWO_PHASE" >>> ) >>> is there a way to check that this optimization is on? We use user >>> defined aggregate functions, does it work for UDAF? >>> >>> Thanks! >>> >>> -- >>> >>> Rex Fenley | Software Engineer - Mobile and Backend >>> >>> >>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >>> <https://www.facebook.com/remindhq> >>> >> >> >> -- >> >> Rex Fenley | Software Engineer - Mobile and Backend >> >> >> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >> <https://www.facebook.com/remindhq> >> >> -- Rex Fenley | Software Engineer - Mobile and Backend Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>