Thanks Matthias, will update the KIP accordingly.
On Thu, Aug 9, 2018 at 11:26 AM, Matthias J. Sax
wrote:
> @Guozhang, I think you can start the VOTE for this KIP? I don't have any
> further comments.
>
> One more nit: we should explicitly state that the new config is
> wall-clock time based.
>
I also have no comments. The KIP looks good to me.
-John
On Thu, Aug 9, 2018 at 1:26 PM Matthias J. Sax
wrote:
> @Guozhang, I think you can start the VOTE for this KIP? I don't have any
> further comments.
>
> One more nit: we should explicitly state that the new config is
> wall-clock time bas
@Guozhang, I've read the KIP and I don't have any further comments in
addition to what's already been discussed.
Thanks,
Bill
On Thu, Aug 9, 2018 at 2:26 PM Matthias J. Sax
wrote:
> @Guozhang, I think you can start the VOTE for this KIP? I don't have any
> further comments.
>
> One more nit: we
@Guozhang, I think you can start the VOTE for this KIP? I don't have any
further comments.
One more nit: we should explicitly state that the new config is
wall-clock time based.
-Matthias
On 8/7/18 12:59 PM, Matthias J. Sax wrote:
> Correct. It's not about reordering. Records will still be pr
Correct. It's not about reordering. Records will still be processed in
offset-order per partition.
For multi-partition tasks (like joins), we use the timestamp of the
"head" record of each partition to determine which record to process
first (to process records across partitions in timestamp order
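
For illustration, the selection rule described here might be sketched as
follows. This is a minimal sketch, not the Kafka Streams implementation:
`HeadRecordChooser`, `Rec`, `choose`, `drain`, and `sample` are hypothetical
names, and the buffers are plain in-memory queues. It picks the partition
whose head record has the smallest timestamp, while each partition is still
consumed strictly in offset order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeadRecordChooser {
    /** Hypothetical stand-in for a buffered record (not a Kafka class). */
    public static final class Rec {
        final String partition;
        final long offset;
        final long timestamp;

        Rec(String partition, long offset, long timestamp) {
            this.partition = partition;
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    /** Returns the partition whose head record has the smallest timestamp, or null if all buffers are empty. */
    public static String choose(Map<String, Deque<Rec>> buffers) {
        String best = null;
        long bestTs = Long.MAX_VALUE;
        for (Map.Entry<String, Deque<Rec>> e : buffers.entrySet()) {
            Rec head = e.getValue().peekFirst();
            // Only the head of each partition is compared; later records are never looked at.
            if (head != null && (best == null || head.timestamp < bestTs)) {
                best = e.getKey();
                bestTs = head.timestamp;
            }
        }
        return best;
    }

    /** Drains all buffers and returns the processing order as "partition@offset" strings. */
    public static List<String> drain(Map<String, Deque<Rec>> buffers) {
        List<String> order = new ArrayList<>();
        String p;
        while ((p = choose(buffers)) != null) {
            Rec r = buffers.get(p).pollFirst();
            order.add(r.partition + "@" + r.offset);
        }
        return order;
    }

    /** Sample data: partition A has a non-increasing timestamp at offset 1. */
    public static Map<String, Deque<Rec>> sample() {
        Deque<Rec> a = new ArrayDeque<>();
        a.add(new Rec("A", 0, 10));
        a.add(new Rec("A", 1, 5));
        a.add(new Rec("A", 2, 30));
        Deque<Rec> b = new ArrayDeque<>();
        b.add(new Rec("B", 0, 7));
        b.add(new Rec("B", 1, 20));
        Map<String, Deque<Rec>> buffers = new LinkedHashMap<>();
        buffers.put("A", a);
        buffers.put("B", b);
        return buffers;
    }

    public static void main(String[] args) {
        System.out.println(drain(sample()));
    }
}
```

With the sample data, the order is B@0, A@0, A@1, B@1, A@2: A@1 carries an
older timestamp than A@0 but is still processed after it, so nothing is
reordered within a partition.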
In typing up a scenario to illustrate my question, I think I found the answer
;) We are not assuming timestamps will be strictly increasing within a topic
and are trying to make processing order deterministic even in the face of that.
Thanks for making me think about it (or please correct me if I'm
@Thomas, just to rephrase (from my understanding):
> So in the scenario you describe, where one topic has
> vastly lower throughput, you're saying that when the lower throughput topic
> is fully caught up (no messages in the buffer), the task will idle rather
> than using the timestamp of th
@Tommy
Yes, that's the intent. Again, note that the current behavior is indeed "just
using the timestamp of the last message I saw" and continuing to process
what's in the buffer from other streams, but this may introduce
out-of-order processing.
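
As a rough sketch of the intended behavior as discussed in this thread (not
the actual implementation): a task with at least one empty input buffer waits
up to a wall-clock bound before processing what it has. `IdleGate`,
`isProcessable`, and `maxIdleMs` are hypothetical names, and the clock value
is passed in explicitly to keep the example deterministic; a real caller
would presumably pass something like `System.currentTimeMillis()`.

```java
public class IdleGate {
    private final long maxIdleMs;   // hypothetical wall-clock bound on idling
    private long idleStartMs = -1L; // -1 means "not currently idling"

    public IdleGate(long maxIdleMs) {
        this.maxIdleMs = maxIdleMs;
    }

    /**
     * Decides whether the task may process a record now.
     *
     * @param nowMs          current wall-clock time in milliseconds
     * @param bufferedCounts number of buffered records per input partition
     */
    public boolean isProcessable(long nowMs, int[] bufferedCounts) {
        boolean anyData = false;
        boolean allData = true;
        for (int c : bufferedCounts) {
            if (c > 0) {
                anyData = true;
            } else {
                allData = false;
            }
        }
        if (!anyData) {
            return false; // nothing buffered at all, nothing to process
        }
        if (allData) {
            idleStartMs = -1L; // every partition has a head record: no need to idle
            return true;
        }
        // Some partition is empty: start (or continue) the idle timer.
        if (idleStartMs < 0) {
            idleStartMs = nowMs;
        }
        // Once the bound has elapsed, process anyway with what is buffered.
        return nowMs - idleStartMs >= maxIdleMs;
    }

    public static void main(String[] args) {
        IdleGate gate = new IdleGate(100);
        System.out.println(gate.isProcessable(0, new int[] {3, 0}));
        System.out.println(gate.isProcessable(100, new int[] {3, 0}));
    }
}
```

In this sketch a bound of 0 approximates the current behavior described
above: the task proceeds immediately with whatever is buffered, at the risk
of out-of-order processing across partitions.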
Guozhang
On Tue, Aug 7, 2018 at 9:59 AM, Thomas Becker
wr
Thanks Guozhang. So in the scenario you describe, where one topic has vastly
lower throughput, you're saying that when the lower throughput topic is fully
caught up (no messages in the buffer), the task will idle rather than using the
timestamp of the last message it saw from that topic? Initial
@Ted
Yes, I will update the KIP to mention this as a separate consideration.
@Thomas
The idle period may occur during processing as well. Think: if
you are joining two streams with very different throughput, say for
an extreme case, one stream comes in as 100K messages / sec,
This looks like a big step in the right direction IMO. So am I correct in
assuming this idle period would only come into play after startup when waiting
for initial records to be fetched? In other words, once we have seen records
from all topics and have established the stream time processing wi
Guozhang:
I agree.
Probably worth noting this in your KIP.
Thanks
On Fri, Aug 3, 2018 at 6:08 PM Guozhang Wang wrote:
> Hello Ted,
>
> I think dynamic configuration itself would be worth an independent KIP, if you
> meant to allow users to change the config on-the-fly without bouncing the
> instance.
>
Hello Ted,
I think dynamic configuration itself would be worth an independent KIP, if you
meant to allow users to change the config on-the-fly without bouncing the
instance.
Guozhang
On Fri, Aug 3, 2018 at 3:33 PM, Ted Yu wrote:
> Guozhang:
>
> Do you plan to support dynamic config for the new
Guozhang:
Do you plan to support dynamic config for the new config entry?
Cheers
On Fri, Aug 3, 2018 at 2:00 PM Guozhang Wang wrote:
> Hello all,
>
> I would like to kick off a discussion on the following KIP, to allow users to
> control when a task can be processed based on its buffered records
Hello all,
I would like to kick off a discussion on the following KIP, to allow users
to control when a task can be processed based on its buffered records, and how
the stream time of a task is advanced.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Syn