Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-09 Thread Guozhang Wang
Thanks Matthias, will update the KIP accordingly.
On Thu, Aug 9, 2018 at 11:26 AM, Matthias J. Sax wrote:
> @Guozhang, I think you can start the VOTE for this KIP? I don't have any further comments.
>
> One more nit: we should explicitly state that the new config is wall-clock time based.

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-09 Thread John Roesler
I also have no comments. The KIP looks good to me.
-John
On Thu, Aug 9, 2018 at 1:26 PM Matthias J. Sax wrote:
> @Guozhang, I think you can start the VOTE for this KIP? I don't have any further comments.
>
> One more nit: we should explicitly state that the new config is wall-clock time bas

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-09 Thread Bill Bejeck
@Guozhang, I've read the KIP and I don't have any further comments in addition to what's already been discussed.
Thanks, Bill
On Thu, Aug 9, 2018 at 2:26 PM Matthias J. Sax wrote:
> @Guozhang, I think you can start the VOTE for this KIP? I don't have any further comments.
>
> One more nit: we

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-09 Thread Matthias J. Sax
@Guozhang, I think you can start the VOTE for this KIP? I don't have any further comments.
One more nit: we should explicitly state that the new config is wall-clock time based.
-Matthias
On 8/7/18 12:59 PM, Matthias J. Sax wrote:
> Correct. It's not about reordering. Records will still be pr

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Matthias J. Sax
Correct. It's not about reordering. Records will still be processed in offset-order per partition. For a multi-partition task (like a join), we use the timestamp of the "head" record of each partition to determine which record to process first (to process records across partitions in timestamp order
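To make the selection rule described in this message concrete, here is a minimal sketch in Python. It is not the Kafka Streams implementation: the function name `next_record`, the `buffers` layout, and the tie-breaking by partition name are all assumptions made for illustration. It only shows that comparing "head" timestamps across partitions never reorders records within a partition.

```python
from collections import deque


def next_record(buffers):
    """Pick the next record for a multi-partition task (illustrative only).

    `buffers` maps a partition name to a deque of (timestamp, value)
    records held in offset order. Only the head of each deque is a
    candidate, so per-partition offset order is always preserved.
    """
    # Collect (head timestamp, partition name) for every non-empty buffer.
    candidates = [(buf[0][0], name) for name, buf in buffers.items() if buf]
    if not candidates:
        return None
    # Smallest head timestamp wins; ties fall back to the partition name
    # (an arbitrary choice made for this sketch).
    _, chosen = min(candidates)
    return chosen, buffers[chosen].popleft()


# Hypothetical data: "left" carries an out-of-order timestamp (3 after 5).
buffers = {
    "left": deque([(5, "a"), (3, "b")]),
    "right": deque([(4, "x")]),
}
order = []
while (picked := next_record(buffers)) is not None:
    order.append(picked)
print(order)
```

Note that the record with timestamp 3 is processed last: it sits behind the record with timestamp 5 in its own partition, and only head records are compared.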

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Thomas Becker
In typing up a scenario to illustrate my question, I think I found the answer ;) We are not assuming timestamps will be strictly increasing within a topic, and we are trying to make processing order deterministic even in the face of that. Thanks for making me think about it (or please correct me if I'm

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Matthias J. Sax
@Thomas, just to rephrase (from my understanding):
> So in the scenario you describe, where one topic has vastly lower throughput, you're saying that when the lower throughput topic is fully caught up (no messages in the buffer), the task will idle rather than using the timestamp of th

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Guozhang Wang
@Tommy Yes, that's the intent. Again, note that the current behavior is indeed "just using the timestamp of the last message I saw" and continuing to process what's in the buffer from the other streams, but this may introduce out-of-order processing.
Guozhang
On Tue, Aug 7, 2018 at 9:59 AM, Thomas Becker wr
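The difference between the current behavior and the proposed idling can be sketched as follows. This is a hedged illustration, not the KIP's actual design or code: the class `IdleGate`, its method `may_process`, and the parameter `max_idle_ms` are hypothetical names, and the wall-clock time is passed in explicitly so the example is deterministic. A bound of 0 corresponds to the current behavior described above, where the task keeps processing whatever is buffered.

```python
class IdleGate:
    """Decide whether a task may process while some input buffer is empty."""

    def __init__(self, max_idle_ms):
        self.max_idle_ms = max_idle_ms  # wall-clock bound on idling (assumed name)
        self.idle_since_ms = None       # when a buffer was first seen empty

    def may_process(self, buffers, now_ms):
        if not any(buffers.values()):
            return False                # nothing buffered at all
        if all(buffers.values()):
            self.idle_since_ms = None   # every partition has data: reset the timer
            return True
        if self.idle_since_ms is None:
            self.idle_since_ms = now_ms # start idling on the first empty buffer
        # Process anyway once the wall-clock bound has elapsed.
        return now_ms - self.idle_since_ms >= self.max_idle_ms


gate = IdleGate(max_idle_ms=100)
partial = {"fast": ["r1", "r2"], "slow": []}
print(gate.may_process(partial, now_ms=0))    # False: wait for the slow topic
print(gate.may_process(partial, now_ms=100))  # True: the bound has elapsed
```

Waiting trades latency for a better chance of processing records across partitions in timestamp order; once the bound elapses, the task proceeds and out-of-order processing becomes possible again.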

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Thomas Becker
Thanks Guozhang. So in the scenario you describe, where one topic has vastly lower throughput, you're saying that when the lower throughput topic is fully caught up (no messages in the buffer), the task will idle rather than using the timestamp of the last message it saw from that topic? Initial

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Guozhang Wang
@Ted Yes, I will update the KIP mentioning this as a separate consideration. @Thomas The idle period may be happening during the processing as well. Think: if you are joining two streams with very different throughput traffic, say for an extreme case, one stream comes in as 100K messages / sec,

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-07 Thread Thomas Becker
This looks like a big step in the right direction IMO. So am I correct in assuming this idle period would only come into play after startup when waiting for initial records to be fetched? In other words, once we have seen records from all topics and have established the stream time processing wi

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-03 Thread Ted Yu
Guozhang: I agree. Probably note this on your KIP.
Thanks
On Fri, Aug 3, 2018 at 6:08 PM Guozhang Wang wrote:
> Hello Ted,
>
> I think dynamic configuration itself would be worth an independent KIP, if you meant to allow users to change the config on-the-fly without bouncing the instance.

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-03 Thread Guozhang Wang
Hello Ted,
I think dynamic configuration itself would be worth an independent KIP, if you meant to allow users to change the config on-the-fly without bouncing the instance.
Guozhang
On Fri, Aug 3, 2018 at 3:33 PM, Ted Yu wrote:
> Guozhang:
>
> Do you plan to support dynamic config for the new

Re: [DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-03 Thread Ted Yu
Guozhang: Do you plan to support dynamic config for the new config entry?
Cheers
On Fri, Aug 3, 2018 at 2:00 PM Guozhang Wang wrote:
> Hello all,
>
> I would like to kick off a discussion on the following KIP, to allow users to control when a task can be processed based on its buffered records

[DISCUSS] KIP-353: Allow Users to Configure Kafka Streams Timestamp Synchronization

2018-08-03 Thread Guozhang Wang
Hello all,
I would like to kick off a discussion on the following KIP, to allow users to control when a task can be processed based on its buffered records, and how the stream time of a task is advanced.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Syn