I think this looks reasonable, but just a more general note on
compatibility -- I think it's worth trying to clarify what types of
compatibility we're trying to achieve. Guozhang's 1 & 2 give an important
breakdown (compile vs runtime compatibility). For the type of change
described here, I think it makes sense to clarify the compatibility goals.
The (pure) compile time compatibility vs (pure) runtime compatibility
aren't the only options -- you have some additional intermediate choices as
well.

The proposal describes a change which requires recompiling the plugin
(TimestampExtractor) *and* substituting a runtime library (kafka-streams
jar) to get correct updated behavior. This can get interesting if you
already have N streams apps sharing the same TimestampExtractor. You now
*must* update all of them to the new streams jar if any are to be updated
for the new TimestampExtractor API. For folks with a monolithic
repo/versioning setup, this could potentially be painful since they're
forced to update all apps at once. It's at least not too bad since it can
be isolated to a single commit (without deployment needing to be
coordinated, for example), but if the # of apps gets > 4 or 5, these types
of updates start to be a real pain.

I think this API change is an acceptable (albeit annoying) API
incompatibility right now, but wanted to raise this in the discussion of
this KIP so we consider this moving forward. There definitely are
alternatives that add the new functionality but maintain compatibility
better. In particular, it's possible to define the new interface to require
both APIs:

// new interface
public interface TimestampExtractor {
    long extract(ConsumerRecord<Object, Object> record);
    long extract(ConsumerRecord<Object, Object> record, long
previousTimestamp);
}

which requires more effort for the implementor of the new API, but
maintains compatibility if you want to use a new jar including the
TimestampExtractor even with the old version of streams/the
TimestampExtractor interface (since it will correctly dispatch to the old
implementation). It requires more effort on the part of the framework since
it needs to catch runtime exceptions when the second version of extract()
is missing and fall back to the first version. But in some cases that might
be warranted for the sake of compatibility.

I suspect this update won't cause too much pain right now just because the
number of streams app any user has won't be too large quite yet, but this
seems important to consider moving forward. I think we had some similar
concerns & discussion around the changes to the consumer APIs when trying
to generalize the collection types used in those APIs.

-Ewen


On Mon, Nov 28, 2016 at 10:46 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Done.
>
> If there is no further comments, I would like to start a voting thread
> in a separate email.
>
> -Matthias
>
> On 11/28/16 9:08 AM, Guozhang Wang wrote:
> > Yes it does not include these, again in my previous previous email I
> meant
> > when you say "This is a breaking, incompatible change" people may
> interpret
> > it differently. So better explain it more clearly.
> >
> >
> > Guozhang
> >
> > On Thu, Nov 24, 2016 at 10:31 PM, Matthias J. Sax <matth...@confluent.io
> >
> > wrote:
> >
> >> That does make sense. But KIP-93 does not change anything like this, so
> >> there is nothing to mention, IMHO.
> >>
> >> Or do you mean, the KIP should include that the change is backward
> >> compatible with this regard?
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 11/24/16 5:31 PM, Guozhang Wang wrote:
> >>> What I meant is that, for some changes (e.g. say we change the
> >>> auto-repartition behavior that caused using different name conventions,
> >> or
> >>> some changes that involve changing the underlying state store names,
> etc)
> >>> the existing internal state including the stores and topics will
> probably
> >>> not valid. Some users consider this also as a "backward incompatible
> >>> change" since they cannot just swipe in the new jar and restart.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Wed, Nov 23, 2016 at 3:20 PM, Matthias J. Sax <
> matth...@confluent.io>
> >>> wrote:
> >>>
> >>>> Thanks for the feedback. I updated the KIP for (1) and (2).
> >>>>
> >>>> However not for (3): Why should it be required to reset an
> application?
> >>>> If user processed "good" data with valid timestamps, behavior does not
> >>>> change. If user tried to process "bad" data with invalid timestamps,
> the
> >>>> application does fail currently anyway, so there is nothing to reset.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 11/22/16 9:53 AM, Guozhang Wang wrote:
> >>>>> Regarding the "compatibility" section, I would suggest being a bit
> more
> >>>>> specific about why it is a breaking change. For Streams, it could
> mean
> >>>>> different things:
> >>>>>
> >>>>> 1. User need code change when switching library dependency on the new
> >>>>> version, otherwise it won't compile(I think this is the case for this
> >>>> KIP).
> >>>>> 2. User need code change when switching library dependency on the new
> >>>>> version, otherwise runtime exception will be thrown.
> >>>>> 3. Existing application state as well as internal topics need to be
> >>>> swiped
> >>>>> and the program need to restart from zero.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>> On Fri, Nov 18, 2016 at 12:27 PM, Matthias J. Sax <
> >> matth...@confluent.io
> >>>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I want to start a discussion about KIP-93:
> >>>>>>
> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>> 93%3A+Improve+invalid+timestamp+handling+in+Kafka+Streams
> >>>>>>
> >>>>>> Looking forward to your feedback.
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>


-- 
Thanks,
Ewen

Reply via email to