Hi all,

Thanks for keeping the discussion running while I was on holidays! I am catching up currently and I will post in the voting thread if I have any comments :)

Cheers,
Kostas

On Wed, Sep 16, 2020 at 11:25 AM David Anderson <da...@alpinegizmo.com> wrote:
>
> Aljoscha,
>
> Thanks for the thorough response. I'm still wanting to think about and
> discuss the Trigger topic some more, but I'm content with where you've left
> it for now. Everything else seems good.
>
> David
>
> On Fri, Sep 11, 2020 at 2:08 PM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> > Thanks for the thoughtful comments! I'll try and address them inline
> > below. I'm hoping to start a VOTE thread soon if there are no other
> > comments by the end of today.
> >
> > On 10.09.20 15:40, David Anderson wrote:
> > > Having just re-read FLIP-134, I think it mostly makes sense, though I'm
> > > not exactly looking forward to figuring out how to explain it without
> > > making it seem overly complicated.
> >
> > Which are the points where you see the explanation could become too
> > complex? For me, the only difference in behaviour is processing-time
> > timers, which will fail hard in BATCH execution mode. Things like
> > shuffle-mode and schedule-mode should be transparent and I would not
> > mention them in the documentation except in an advanced section.
> >
> > > I'm a bit confused by the discussion around custom window Triggers. Yes,
> > > I agree that complex, mixed Triggers are sometimes useful. And I buy into
> > > the argument that we want to FAIL hard for processing-time on BATCH. But
> > > why not go ahead and FAIL Triggers that can't work, rather than ignoring
> > > all custom Triggers?
> >
> > The motivation is to allow the same program to work on BATCH and on
> > STREAMING, and in reality DataStream programs often have Triggers that
> > you wouldn't need for BATCH execution.
> >
> > I do think that this topic is too important to have it as a sub-section
> > in this FLIP. I will remove it and write another FLIP just about this
> > topic. This will mean that DataStream programs that have Triggers that
> > use processing-time will simply fail hard. Which is acceptable for an
> > initial version, I think.
> >
> > > I do think it's critical that bounded streaming has the same
> > > configuration as unbounded streaming. Users expect/need things like
> > > processing time timers in bounded streaming during development. If I've
> > > understood the proposal correctly, this will be the case.
> >
> > If you're referring to the case where you have STREAMING execution mode
> > but your sources are bounded (for development), then yes, I think we're
> > on the same page.
> >
> > > I would prefer WARN over IGNORE as the default for cases where users
> > > have explicitly specified something that isn’t going to happen. (I would
> > > also like to see a warning given for any job that uses event time timers
> > > without having a watermark strategy, though that's unrelated to the
> > > topic at hand.)
> >
> > Agreed, that's why I'm proposing pipeline.processing-time.allow: FAIL as
> > the default setting for BATCH execution mode. Is there another setting
> > where we currently propose IGNORE but you think it should be FAIL? There
> > is pipeline.processing-time.end-of-input: IGNORE, which is in line with
> > the current behaviour, and failing when timers are set means there won't
> > be any to fire in BATCH execution mode.
> >
> > Aljoscha
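
To make the FAIL / WARN / IGNORE options discussed in this thread a bit more concrete, here is a minimal sketch of how a processing-time timer registration could react to them. This is only a toy model, not Flink code: the names `ExecutionMode`, `Policy`, `ProcessingTimeNotAllowed` and `register_processing_time_timer` are all hypothetical, and the assumption that WARN and IGNORE both drop the timer (with and without a message) is mine rather than something stated in the thread.

```python
from enum import Enum


class ExecutionMode(Enum):
    STREAMING = "STREAMING"
    BATCH = "BATCH"


class Policy(Enum):
    # Candidate values for the proposed pipeline.processing-time.allow setting.
    FAIL = "FAIL"
    WARN = "WARN"
    IGNORE = "IGNORE"


class ProcessingTimeNotAllowed(RuntimeError):
    """Raised when a processing-time timer is registered under FAIL."""


def register_processing_time_timer(mode, allow, warnings):
    """Return True if the timer is registered, False if it is dropped.

    Hypothetical model: STREAMING always registers the timer, also when
    the sources are bounded; in BATCH the `allow` policy decides.
    """
    if mode is ExecutionMode.STREAMING:
        return True
    if allow is Policy.FAIL:
        # The default proposed in the thread for BATCH execution mode.
        raise ProcessingTimeNotAllowed(
            "processing-time timers are not allowed in BATCH execution mode"
        )
    if allow is Policy.WARN:
        # Assumption: WARN drops the timer but tells the user about it.
        warnings.append("processing-time timer dropped in BATCH execution mode")
    # Assumption: IGNORE drops the timer silently.
    return False
```

With FAIL as the BATCH default, no processing-time timer is ever registered, which is why an end-of-input setting of IGNORE has nothing left to fire in that mode.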