Hi all,
Just some updates. Below is the vote thread:
https://sematext.com/opensee/m/Kafka/uyzND1h1NPW1tLVQR?subj=+VOTE+KIP+557+Add+emit+on+change+support+for+Kafka+Streams
It would be great if we can include this change in Kafka. :)
Cheers,
Richard
On Thu, Feb 27, 2020 at 6:45 PM Richard Yu
wr
Hi all,
@John Will add some notes accordingly.
To all: Thanks for all your input!
It looks like we can wrap up this discussion thread then.
I've started a vote thread, so please feel free to cast your vote there!
We should be pretty close. :)
Cheers,
Richard
On Thu, Feb 27, 2020 at 2:34 PM Jo
Hi Richard,
Thanks for the update!
I read it over, and overall it looks good!
I have only a minor concern about the rate metric definition:
> The rate option indicates the ratio of records dropped to actual volume of
> records passing through the task
That's not the definition of a "rate". It s
Hi all,
I might've made a minor mistake. The processor node level is level 3, not
level 1.
I will correct the KIP accordingly.
After looking over things, I decided to start the voting thread this
afternoon.
Cheers,
Richard
On Thu, Feb 27, 2020 at 12:29 PM Richard Yu
wrote:
> Hi Bruno, Hi John
Hi Bruno, Hi John,
Thanks for your comments! I updated the KIP accordingly, and it looks like,
for quite a few points, I was beating around the bush in ways that
could've been avoided.
Looks like we can reduce the metric to Level 1 (per processor node) then.
I've cleaned up most of the unnece
Hi John,
I agree with you. It is better to measure the metric on processor node
level. The users can do the rollup to task-level by themselves.
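As a rough illustration of such a rollup, here is a minimal sketch. It assumes
the per-node counts of dropped records are available in a map keyed by
"<taskId>/<processorNodeId>". That key format and the class name
TaskRollupSketch are made up for this example and are not the Kafka Streams
metrics API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Kafka Streams.
public class TaskRollupSketch {

    // Sums per-processor-node counts into per-task totals.
    // Keys are assumed to look like "<taskId>/<processorNodeId>".
    public static Map<String, Long> rollUpToTask(Map<String, Long> perNode) {
        Map<String, Long> perTask = new TreeMap<>();
        for (Map.Entry<String, Long> entry : perNode.entrySet()) {
            String key = entry.getKey();
            int slash = key.indexOf('/');
            // A key without a separator is treated as a task id on its own.
            String taskId = slash < 0 ? key : key.substring(0, slash);
            perTask.merge(taskId, entry.getValue(), Long::sum);
        }
        return perTask;
    }

    public static void main(String[] args) {
        Map<String, Long> perNode = Map.of(
            "0_0/node-1", 3L,
            "0_0/node-2", 2L,
            "0_1/node-1", 5L);
        System.out.println(rollUpToTask(perNode));
    }
}
```

Keeping the metric at the finer level loses nothing, since the coarser view
is a simple sum over the nodes of a task.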
Best,
Bruno
On Thu, Feb 27, 2020 at 12:09 AM John Roesler wrote:
>
> Hi Richard,
>
> I've been making a final pass over the KIP.
>
> Re: Proposed Behav
Hi Richard,
I've been making a final pass over the KIP.
Re: Proposed Behavior Change:
I think this point is controversial and probably doesn't need to be there at
all:
> 2.b. In certain situations where there is a high volume of idempotent
> updates throughout the Streams DAG, it will be recomm
Hi Richard,
1. Could you change "idempotent update operations will only be dropped
from KTables, not from other classes." -> idempotent update operations
will only be dropped from materialized KTables? For non-materialized
KTables -- as they can occur after optimization of the topology -- we
canno
Hi John,
Sounds good. It looks like we are close to wrapping things up. If there
aren't any other revisions that need to be made (if there are, please
comment in the thread), I will start the voting process this Thursday
(Pacific Standard Time).
Cheers,
Richard
On Tue, Feb 25, 2020 at 11:59 AM John R
Hi Richard,
Sorry for the slow reply. I actually think we should avoid checking
equals() for now. Your reasoning is good, but the truth is that
depending on the implementation of equals() is non-trivial,
semantically, and (though I proposed it before), I'm not convinced
it's worth the risk. Much b
nteresting one and boils down to my statement from
>>> above
>>> > > > > > > >>> about
>>> > > > > > > > "what can we actually implement". What I don't understand
>>> is:
>>> > > > > >
not sure about this point. In fact, we have already
>> some
>> > > > > > > >>> no-ops
>> > > > > > > > in Kafka Streams in our join-operators and don't report any
>> of
>> > > > > > > >>>
> > >>>>>>>>>> Hi Tommy,
> > > > > > > >>>>>>>>>>
> > > > > > > >>>>>>>>>> Thanks for the context. I can see the attraction of
> > > > > > co
> > > > >>>>>>>>>> We only care about one attribute from the Info table
> > > > > > >>>>>>>>>> (name),
> > > > > > >>> and
> > > > > > >>>>
>>>>
> > > > > >>>>>>>>> Ahh yes I see. This works, but in the case where you're
> > > > > >>>>>>>>> using
> > > > > >>>>>>>>> schemas as we
ifferent situation, though. Reading between the lines a
> > > > >>> little,
> > > > >>>>>>>>>> it sounds like: in contrast to the example above, in which we
> > > > >>> are
> > > > >>>>>>
> > > >>>>>>>>>> It does seem handy to be able to plug in a custom
> > > >>> ChangeDetector
> > > >>>>>>>>>> for this purpose, but I worry about the API complexity. Maybe
> > > >>> you
>
prior" value at all for a large number of
> > >>>>>>>>>> operations. Admittedly, if we extend the proposal to include
> > >>> no-op
> > >>>>>>>>>> detection for stateless operations, we'd pro
>>> 3. As a final optimization, after serializing and before
> >>> sending
> >>>>>>>>>> repartition records, compare the serialized data and drop
> >>>>>>>>>> no-ops.
> >>>>>>>>>>
>
> implementation. But if we can leverage equals(), then the
>>> "right
>>>>>>>>>> thing" happens automatically.
>>>>>>>>>
>>>>>>>>> I still don't totally follow why the individual components
>
> fields), have we not covered our bases?
> > >> > > >
> > >> > > >>
> > >> > > >> This dovetails in with my primary UX concern; where would the
> > >> > > >> ChangeDetector actually be registered? None
ure
> >> > > >> that all operations actually preserve the metadata alongside the
> >> > > >> data (e.g., don't accidentally add a mapValues like I did, or you
> >> > > >> drop the metadata). 2. implement a ChangeDetector for every
> >> >
>> same as your current solution.
>> > > >
>> > > > I definitely see your point regarding configuration. I was
>> > > > originally thinking about this when the deduplication was going to
>> > > > be opt-in, and it seemed very natura
behavior and not having any configuration
> > > > options, it's harder to find a place for this.
> > > >
> > > >
> > > >
> > > >> A final thought; if it really is a metadata question, can we just
> > > >> plan to
his seems to
> > >> give us a way to support your use case without adding to the
> > >> mental overhead of using Streams for simple things.
> > >
> > > Agree headers could be a decent fit for this particular case
> > > because it
ings should be
> >> possible.
> >>
> >> What are your thoughts? Thanks, -John
> >>
> >>
> >> On Mon, Feb 3, 2020, at 07:19, Thomas Becker wrote:
> >>
> >> Hi John, Can you describe how you'd use filtering/mapping to
> >
ore we also suppress updates that only touch those
>> fields.
>>
>> -Tommy
>>
>>
>> On Fri, 2020-01-31 at 19:30 -0600, John Roesler wrote: [EXTERNAL
>> EMAIL] Attention: This email was sent from outside TiVo. DO NOT
>> CLICK any links or attachments unles
the operation being performed.
________
From: John Roesler <vvcep...@apache.org>
Sent: Friday, January 31, 2020 4:51 PM
To: dev@kafka.apache.org
Subject: Re: [KAFKA-557] Add emit on
> > > out-of-order data. Atm, I don't see any issue in particular, but it
> > > would be great if everybody could think about out-of-order handling and
> > > if/how it affects emit-on-change behavior. Also note, that KIP-280 is
> > > allowing a timestamp-based compa
cords the skipped
> (duplicate) values.
>
> This way, it is easier to observe the effect when this feature is in
> production.
>
> Cheers
>
>
> >
> > -- Forwarded message -
> > From: Richard Yu
> > Date: Sun, Feb 2, 2020 at 10:2
s feel about allowing the mechanism by which no-ops are
> > detected to be pluggable? Meaning use something like a hash by default, but
> > you could optionally provide an implementation of something to use instead,
> > like a ChangeDetector. This could be useful for example to
v@kafka.apache.org>
mailto:dev@kafka.apache.org>>
Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
is feature is in production.
Cheers
>
> -- Forwarded message -
> From: Richard Yu
> Date: Sun, Feb 2, 2020 at 10:21 AM
> Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
> To:
>
>
> Hi Bruno,
>
> Thanks for the reply!
>
> > > -Matthias
> > >
> > >
> > > On 1/31/20 5:30 PM, John Roesler wrote:
> > > > Hi Thomas and yuzhihong,
> > > >
> > > > That’s an interesting idea. Can you help think of a use case tha
> > On Fri, Jan 31, 2020, at 18:56, yuzhih...@gmail.com wrote:
> > >> I think this is good idea.
> > >>
> > >>> On Jan 31, 2020, at 4:49 PM, Thomas Becker
> > wrote:
> > >>>
> > >>> How do folks feel about allowing th
a hash by default, but
> you could optionally provide an implementation of something to use instead,
> like a ChangeDetector. This could be useful for example to ignore changes
> to certain fields, which may not be relevant to the operation being
> performed.
> >>
is could be useful for example to ignore changes
>>> to certain fields, which may not be relevant to the operation being
>>> performed.
>>> ________________
>>> From: John Roesler
>>> Sent: Friday, January 31, 2020 4:51 PM
>>> To
iday, January 31, 2020 4:51 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
> >
PM
> To: dev@kafka.apache.org
> Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
>
>
>
certain
fields, which may not be relevant to the operation being performed.
From: John Roesler
Sent: Friday, January 31, 2020 4:51 PM
To: dev@kafka.apache.org
Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams
[EXTERNAL EMAIL] Attention: This
Hello all,
Sorry for my silence. It seems like we are getting close to consensus.
Hopefully, we could move to a vote soon!
All of the reasoning from Matthias and Bruno around timestamp is compelling. I
would be strongly in favor of stating a few things very clearly in the KIP:
1. Streams will dro
Hello to all,
I've finished making some initial modifications to the KIP.
I have decided to keep the implementation section in the KIP for
record-keeping purposes.
For now, we should focus on only the proposed behavior changes instead.
See if you have any comments!
Cheers,
Richard
On Sat, Jan
Hi all,
Thanks for all the discussion!
@John and @Bruno I will survey other possible systems and see what I can do.
Just a question, by systems, I suppose you would mean the pros and cons of
different reporting strategies?
I'm not completely certain on this point, so it would be great if you can
Thank you Matthias for the use cases!
Looking at both use cases, I think you need to elaborate on them in
the KIP, Richard.
Emit from plain KTable:
I agree with Matthias that the lower timestamp makes sense because it
marks the start of the validity of the record. Idempotent records with
a higher
IMHO, the question about semantics depends on the use case, in
particular on the origin of a KTable.
If there is a changelog topic that one reads directly into a KTable,
emit-on-change does actually make sense, because the timestamp indicates
_when_ the update was _effective_. For this case, it is
Hi Bruno,
Thanks for that idea. I hadn't considered that
option before, and it does seem like that would be
the right place to put it if we think it might be
semantically important to control on a
table-by-table basis.
I had been thinking of it less semantically and
more practically. In the conte
Hi Richard,
Thank you for the KIP.
I agree with John that we should focus on the interface and behavior
change in a KIP. We can discuss the implementation later.
I am also +1 for the survey.
I had a thought about this. Couldn't we consider emit-on-change to be
one config of suppress (like `unti
Hi Richard,
Thanks for picking this up! I know of at least one large community member
for which this feature is absolutely essential.
If I understand your two options, it seems like the proposal is to implement
it as a behavior change regardless, and the question is whether to provide
an opt-out
Hi everybody!
I'd like to propose a change that we probably should've added a long
time ago.
The key benefit of this KIP would be reduced traffic in Kafka Streams since
a lot of no-op results would no longer be sent downstream.
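To make the idea concrete, here is a minimal sketch. It is not taken from the
KIP. It assumes each update is serialized, compared byte-for-byte with the
value previously stored for the same key, and forwarded only when the bytes
differ. The class EmitOnChangeSketch and its methods are hypothetical names
for illustration and are not part of the Kafka Streams API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Kafka Streams code.
public class EmitOnChangeSketch {
    // Last serialized value seen per key (stands in for a materialized table).
    private final Map<String, byte[]> store = new HashMap<>();
    // Records that would be sent downstream.
    private final List<String> forwarded = new ArrayList<>();
    // Number of no-op updates that were suppressed.
    private long dropped = 0;

    public void update(String key, String value) {
        byte[] newBytes = value.getBytes(StandardCharsets.UTF_8);
        byte[] oldBytes = store.get(key);
        if (oldBytes != null && Arrays.equals(oldBytes, newBytes)) {
            dropped++; // same bytes as before: a no-op, so nothing is emitted
            return;
        }
        store.put(key, newBytes);
        forwarded.add(key + "=" + value);
    }

    public List<String> forwarded() {
        return forwarded;
    }

    public long dropped() {
        return dropped;
    }

    public static void main(String[] args) {
        EmitOnChangeSketch table = new EmitOnChangeSketch();
        table.update("a", "1");
        table.update("a", "1"); // repeated value, suppressed
        table.update("a", "2");
        System.out.println(table.forwarded() + " dropped=" + table.dropped());
    }
}
```

With many repeated values per key, only the first occurrence of each distinct
value travels downstream, which is where the traffic reduction would come from.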
Here is the KIP for reference.
https://cwiki.apache.org/confluen