Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-28 Thread Richard Yu
Hi all, Just some updates. Below is the vote thread: https://sematext.com/opensee/m/Kafka/uyzND1h1NPW1tLVQR?subj=+VOTE+KIP+557+Add+emit+on+change+support+for+Kafka+Streams It would be great if we can include this change to Kafka. :) Cheers, Richard On Thu, Feb 27, 2020 at 6:45 PM Richard Yu wr

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Richard Yu
Hi all, @John Will add some notes accordingly. To all: Thanks for all your input! It looks like we can wrap up this discussion thread then. I've started a vote thread, so please feel free to cast your vote there! We should be pretty close. :) Cheers, Richard On Thu, Feb 27, 2020 at 2:34 PM Jo

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread John Roesler
Hi Richard, Thanks for the update! I read it over, and overall it looks good! I have only a minor concern about the rate metric definition: "The rate option indicates the ratio of records dropped to actual volume of records passing through the task" That's not the definition of a "rate". It s
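The snippet is cut off before John says what he would prefer, so the following is only a generic illustration of the distinction he raises, written as a small Python sketch. The function names and the numbers are hypothetical and are not taken from the KIP or from the Kafka Streams metrics API.

```python
def drop_ratio(dropped: int, total: int) -> float:
    """Fraction of records that were dropped (dimensionless, 0.0 to 1.0)."""
    return dropped / total if total else 0.0


def drop_rate(dropped: int, window_seconds: float) -> float:
    """Records dropped per second over a measurement window."""
    return dropped / window_seconds


# Hypothetical numbers: 30 of 120 records were dropped in a 60-second window.
print(drop_ratio(30, 120))   # 0.25 (a ratio: dropped / total)
print(drop_rate(30, 60.0))   # 0.5  (a rate: dropped records per second)
```

A ratio compares dropped records to total volume, while a rate measures dropped records per unit of time; the quoted KIP text describes the former.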

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Richard Yu
Hi all, I might've made a minor mistake. The processor node level is level 3, not level 1. I will correct the KIP accordingly. After looking over things, I decided to start the voting thread this afternoon. Cheers, Richard On Thu, Feb 27, 2020 at 12:29 PM Richard Yu wrote: Hi Bruno, Hi John

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Richard Yu
Hi Bruno, Hi John, Thanks for your comments! I updated the KIP accordingly, and it looks like for quite a few points I was beating around the bush, which could've been avoided. Looks like we can reduce the metric to Level 1 (per processor node) then. I've cleaned up most of the unnece

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-27 Thread Bruno Cadonna
Hi John, I agree with you. It is better to measure the metric on processor node level. The users can do the rollup to task-level by themselves. Best, Bruno On Thu, Feb 27, 2020 at 12:09 AM John Roesler wrote: Hi Richard, I've been making a final pass over the KIP. Re: Proposed Behav

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-26 Thread John Roesler
Hi Richard, I've been making a final pass over the KIP. Re: Proposed Behavior Change: I think this point is controversial and probably doesn't need to be there at all: 2.b. In certain situations where there is a high volume of idempotent updates throughout the Streams DAG, it will be recomm

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-26 Thread Bruno Cadonna
Hi Richard, 1. Could you change "idempotent update operations will only be dropped from KTables, not from other classes." -> idempotent update operations will only be dropped from materialized KTables? For non-materialized KTables -- as they can occur after optimization of the topology -- we canno

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-25 Thread Richard Yu
Hi John, Sounds good. It looks like we are close to wrapping things up. If there aren't any other revisions that need to be made (if so, please comment in the thread), I will start the voting process this Thursday (Pacific Standard Time). Cheers, Richard On Tue, Feb 25, 2020 at 11:59 AM John R

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-25 Thread John Roesler
Hi Richard, Sorry for the slow reply. I actually think we should avoid checking equals() for now. Your reasoning is good, but the truth is that depending on the implementation of equals() is non-trivial, semantically, and (though I proposed it before), I'm not convinced it's worth the risk. Much b

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-22 Thread Richard Yu
nteresting one and boils down to my statement from above about "what can we actually implement". What I don't understand is:

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-21 Thread Richard Yu
not sure about this point. In fact, we have already some no-ops in Kafka Streams in our join-operators and don't report any of

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-21 Thread Richard Yu
Hi Tommy, Thanks for the context. I can see the attraction of co

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-21 Thread John Roesler
We only care about one attribute from the Info table (name), and

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-20 Thread Bruno Cadonna
Ahh yes I see. This works, but in the case where you're using schemas as we

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-19 Thread John Roesler
ifferent situation, though. Reading between the lines a little, it sounds like: in contrast to the example above, in which we are

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-19 Thread Bruno Cadonna
It does seem handy to be able to plug in a custom ChangeDetector for this purpose, but I worry about the API complexity. Maybe you

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-18 Thread Richard Yu
prior" value at all for a large number of operations. Admittedly, if we extend the proposal to include no-op detection for stateless operations, we'd pro

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-18 Thread John Roesler
3. As a final optimization, after serializing and before sending repartition records, compare the serialized data and drop no-ops.
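The optimization in this snippet can be sketched roughly as follows. This is a plain-Python illustration, not Kafka Streams code: the class `SerializedNoOpFilter`, the helper `to_json_bytes`, and the choice to compare against the bytes last sent for the same key are all assumptions made for the example, since the snippet does not say what the serialized data is compared with.

```python
import json
from typing import Any, Callable, Dict


def to_json_bytes(obj: Any) -> bytes:
    """Hypothetical serializer: deterministic JSON encoded as UTF-8."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


class SerializedNoOpFilter:
    """Drops a record when its serialized value matches the last one sent for that key."""

    def __init__(self, serialize: Callable[[Any], bytes]) -> None:
        self._serialize = serialize
        self._last_sent: Dict[bytes, bytes] = {}

    def should_send(self, key: Any, value: Any) -> bool:
        key_bytes = self._serialize(key)
        value_bytes = self._serialize(value)
        # Byte-wise comparison: no reliance on the value type's own equality.
        if self._last_sent.get(key_bytes) == value_bytes:
            return False  # no-op update, drop it
        self._last_sent[key_bytes] = value_bytes
        return True


no_op_filter = SerializedNoOpFilter(to_json_bytes)
print(no_op_filter.should_send("user-1", {"name": "a"}))  # True  (first value for the key)
print(no_op_filter.should_send("user-1", {"name": "a"}))  # False (identical bytes)
print(no_op_filter.should_send("user-1", {"name": "b"}))  # True  (bytes changed)
```

Comparing bytes only works as a no-op check if the serializer is deterministic, which is why the hypothetical serializer sorts keys.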

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-17 Thread Matthias J. Sax
implementation. But if we can leverage equals(), then the "right thing" happens automatically. I still don't totally follow why the individual components

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-17 Thread Richard Yu
fields), have we not covered our bases? This dovetails in with my primary UX concern; where would the ChangeDetector actually be registered? None

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-17 Thread John Roesler
ure that all operations actually preserve the metadata alongside the data (e.g., don't accidentally add a mapValues like I did, or you drop the metadata). 2. implement a ChangeDetector for every

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-11 Thread Richard Yu
same as your current solution. I definitely see your point regarding configuration. I was originally thinking about this when the deduplication was going to be opt-in, and it seemed very natura

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-06 Thread Richard Yu
behavior and not having any configuration options, it's harder to find a place for this. A final thought; if it really is a metadata question, can we just plan to

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-05 Thread John Roesler
his seems to give us a way to support your use case without adding to the mental overhead of using Streams for simple things. Agree headers could be a decent fit for this particular case because it

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-05 Thread Ted Yu
ings should be possible. What are your thoughts? Thanks, -John On Mon, Feb 3, 2020, at 07:19, Thomas Becker wrote: Hi John, Can you describe how you'd use filtering/mapping to

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-04 Thread Matthias J. Sax
ore we also suppress updates that only touch those fields. -Tommy On Fri, 2020-01-31 at 19:30 -0600, John Roesler wrote:

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-04 Thread Thomas Becker
the operation being performed. From: John Roesler Sent: Friday, January 31, 2020 4:51 PM To: dev@kafka.apache.org Subject: Re: [KAFKA-557] Add emit on

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread John Roesler
out-of-order data. Atm, I don't see any issue in particular, but it would be great if everybody could think about out-of-order handling and if/how it affects emit-on-change behavior. Also note, that KIP-280 is allowing a timestamp-based compa

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread John Roesler
cords the skipped (duplicate) values. This way, it is easier to observe the effect when this feature is in production. Cheers -- Forwarded message - From: Richard Yu Date: Sun, Feb 2, 2020 at 10:2

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread John Roesler
s feel about allowing the mechanism by which no-ops are detected to be pluggable? Meaning use something like a hash by default, but you could optionally provide an implementation of something to use instead, like a ChangeDetector. This could be useful for example to

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-03 Thread Thomas Becker

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-02 Thread Ted Yu
is feature is in production. Cheers -- Forwarded message - From: Richard Yu Date: Sun, Feb 2, 2020 at 10:21 AM Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams To: Hi Bruno, Thanks for the reply!

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-02 Thread Richard Yu
-Matthias On 1/31/20 5:30 PM, John Roesler wrote: Hi Thomas and yuzhihong, That’s an interesting idea. Can you help think of a use case tha

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-02 Thread Bruno Cadonna
On Fri, Jan 31, 2020, at 18:56, yuzhih...@gmail.com wrote: I think this is a good idea. On Jan 31, 2020, at 4:49 PM, Thomas Becker wrote: How do folks feel about allowing th

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-02-01 Thread Richard Yu
a hash by default, but you could optionally provide an implementation of something to use instead, like a ChangeDetector. This could be useful for example to ignore changes to certain fields, which may not be relevant to the operation being performed.
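The pluggable idea quoted here could look something like the sketch below. It is plain Python rather than Kafka Streams code, and every name in it (`ChangeDetector` as a callable type, `hash_change_detector`, `ignoring_fields`, the `updated_at` field) is hypothetical; the thread only names the concept and does not define an interface.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical contract: (old_value, new_value) -> True if the update is a real change.
ChangeDetector = Callable[[Any, Any], bool]


def _digest(value: Any) -> str:
    """Hash of a deterministic JSON encoding of the value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def hash_change_detector(old: Any, new: Any) -> bool:
    """Default detector: any difference in the hashed value counts as a change."""
    return _digest(old) != _digest(new)


def ignoring_fields(*fields: str) -> ChangeDetector:
    """Custom detector that ignores changes to the named fields."""
    def detector(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
        def strip(record: Dict[str, Any]) -> Dict[str, Any]:
            return {k: v for k, v in record.items() if k not in fields}
        return strip(old) != strip(new)
    return detector


old = {"name": "a", "updated_at": 1}
new = {"name": "a", "updated_at": 2}
print(hash_change_detector(old, new))           # True  (default sees the timestamp change)
print(ignoring_fields("updated_at")(old, new))  # False (only an ignored field changed)
```

The custom detector treats an update that touches only `updated_at` as a no-op, which is the kind of use case the quoted message describes.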

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread Matthias J. Sax
is could be useful for example to ignore changes to certain fields, which may not be relevant to the operation being performed. From: John Roesler Sent: Friday, January 31, 2020 4:51 PM To

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread John Roesler

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread yuzhihong

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread Thomas Becker
certain fields, which may not be relevant to the operation being performed. From: John Roesler Sent: Friday, January 31, 2020 4:51 PM To: dev@kafka.apache.org Subject: Re: [KAFKA-557] Add emit on change support for Kafka Streams

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-31 Thread John Roesler
Hello all, Sorry for my silence. It seems like we are getting close to consensus. Hopefully, we could move to a vote soon! All of the reasoning from Matthias and Bruno around timestamp is compelling. I would be strongly in favor of stating a few things very clearly in the KIP: 1. Streams will dro

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-27 Thread Richard Yu
Hello to all, I've finished making some initial modifications to the KIP. I have decided to keep the implementation section in the KIP for record-keeping purposes. For now, we should focus on only the proposed behavior changes instead. See if you have any comments! Cheers, Richard On Sat, Jan

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-25 Thread Richard Yu
Hi all, Thanks for all the discussion! @John and @Bruno I will survey other possible systems and see what I can do. Just a question, by systems, I suppose you would mean the pros and cons of different reporting strategies? I'm not completely certain on this point, so it would be great if you can

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread Bruno Cadonna
Thank you Matthias for the use cases! Looking at both use cases, I think you need to elaborate on them in the KIP, Richard. Emit from plain KTable: I agree with Matthias that the lower timestamp makes sense because it marks the start of the validity of the record. Idempotent records with a higher

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread Matthias J. Sax
IMHO, the question about semantics depends on the use case, in particular on the origin of a KTable. If there is a changelog topic that one reads directly into a KTable, emit-on-change does actually make sense, because the timestamp indicates _when_ the update was _effective_. For this case, it is

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread John Roesler
Hi Bruno, Thanks for that idea. I hadn't considered that option before, and it does seem like that would be the right place to put it if we think it might be semantically important to control on a table-by-table basis. I had been thinking of it less semantically and more practically. In the conte

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread Bruno Cadonna
Hi Richard, Thank you for the KIP. I agree with John that we should focus on the interface and behavior change in a KIP. We can discuss the implementation later. I am also +1 for the survey. I had a thought about this. Couldn't we consider emit-on-change to be one config of suppress (like `unti

Re: [KAFKA-557] Add emit on change support for Kafka Streams

2020-01-24 Thread John Roesler
Hi Richard, Thanks for picking this up! I know of at least one large community member for which this feature is absolutely essential. If I understand your two options, it seems like the proposal is to implement it as a behavior change regardless, and the question is whether to provide an opt-out

[KAFKA-557] Add emit on change support for Kafka Streams

2020-01-10 Thread Richard Yu
Hi everybody! I'd like to propose a change that we probably should've added a long time ago. The key benefit of this KIP would be reduced traffic in Kafka Streams since a lot of no-op results would no longer be sent downstream. Here is the KIP for reference. https://cwiki.apache.org/confluen
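To make the stated benefit concrete, here is a minimal Python sketch of emit-on-change. It is not Kafka Streams code: the function name `emit_on_change` and the sample updates are hypothetical, and plain value equality stands in for whatever no-op check the KIP adopts (the replies in this thread debate that choice).

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def emit_on_change(updates: Iterable[Tuple[Hashable, Any]]) -> List[Tuple[Hashable, Any]]:
    """Forward an update only if it changes the latest value stored for its key."""
    latest: Dict[Hashable, Any] = {}
    forwarded: List[Tuple[Hashable, Any]] = []
    for key, value in updates:
        if key in latest and latest[key] == value:
            continue  # no-op result: nothing changed, so nothing goes downstream
        latest[key] = value
        forwarded.append((key, value))
    return forwarded


updates = [("a", 1), ("a", 1), ("b", 2), ("a", 3), ("b", 2)]
print(emit_on_change(updates))  # [('a', 1), ('b', 2), ('a', 3)]
```

Two of the five updates repeat the current value for their key, so only three records are forwarded.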