Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

David Anderson Wed, 16 Sep 2020 02:25:47 -0700

Aljoscha,

Thanks for the thorough response. I'm still wanting to think about and
discuss the Trigger topic some more, but I'm content with where you've left
it for now. Everything else seems good.


David

On Fri, Sep 11, 2020 at 2:08 PM Aljoscha Krettek <aljos...@apache.org>
wrote:

> Thanks for the thoughtful comments! I'll try and address them inline
> below. I'm hoping to start a VOTE thread soon if there are no other
> comments by the end of today.
>
> On 10.09.20 15:40, David Anderson wrote:
> > Having just re-read FLIP-134, I think it mostly makes sense, though I'm not
> > exactly looking forward to figuring out how to explain it without making it
> > seem overly complicated.
>
> Which are the points where you see the explanation could become too
> complex? For me, the only difference in behaviour is processing-time
> timers, which will fail hard in BATCH execution mode. Things like
> shuffle-mode and schedule-mode should be transparent and I would not
> mention them in the documentation except in an advanced section.
>
> > I'm a bit confused by the discussion around custom window Triggers. Yes, I
> > agree that complex, mixed Triggers are sometimes useful. And I buy into the
> > argument that we want to FAIL hard for processing-time on BATCH. But why
> > not go ahead and FAIL Triggers that can't work, rather than ignoring all
> > custom Triggers?
>
> The motivation is to allow the same program to work on BATCH and on
> STREAMING, and in reality DataStream programs often have Triggers that
> you wouldn't need for BATCH execution.
>
> I do think that this topic is too important to have it as a sub-section
> in this FLIP. I will remove it and write another FLIP just about this
> topic. This will mean that DataStream programs that have Triggers that
> use processing-time will simply fail hard. Which is acceptable for an
> initial version, I think.
>
> > I do think it's critical that bounded streaming has the same configuration
> > as unbounded streaming. Users expect/need things like processing time
> > timers in bounded streaming during development. If I've understood the
> > proposal correctly, this will be the case.
>
> If you're referring to the case where you have STREAMING execution mode
> but your sources are bounded (for development), then yes, I think we're
> on the same page.
>
> > I would prefer WARN over IGNORE as the default for cases where users have
> > explicitly specified something that isn’t going to happen. (I would also
> > like to see a warning given for any job that uses event time timers without
> > having a watermark strategy, though that's unrelated to the topic at hand.)
>
> Agreed, that's why I'm proposing pipeline.processing-time.allow: FAIL as
> the default setting for BATCH execution mode. Is there another setting
> where we currently propose IGNORE but you think it should be FAIL? There
> is pipeline.processing-time.end-of-input: IGNORE, which is in line with
> the current behaviour, and failing when timers are set means there won't
> be any to fire in BATCH execution mode.
>
> Aljoscha
>
>
