Thanks for the KIP.

However, I have some doubts that this would work.

In the example, you mention 4 records

> r1(2), r2(3), r3(7), and r4(9)

and that if r3 and r4 would be filtered, the downstream task would not
advance partition time from 3 to 9.

However, if r3 and r4 are filtered, no record will be sent downstream at
all -- hence, there is no record to which partition-time could be
piggy-bagged onto its header.

When we sent r2, we don't know anything about r3 and r4 yet. And if
there is an r5 that is not filtered, r5 would advance partition time
anyway based on its own timestamp, hence no header is required.

Also, I am not a big fan of adding headers in general, as they "leak"
internal implementation details.


Overall, my personal opinion is, that we should change Kafka's message
format and allow for "heartbeat" messages, that don't carry any data,
but only a timestamp. By default, those messages would not be exposed to
an application but would be considered "internal" similar to transaction
markers. However, changing the message format is a mayor change and
hence, I am not sure if it worth doing at all atm.


-Matthias



On 5/20/19 7:20 PM, Richard Yu wrote:
> Hello,
> 
> I wish to introduce a minor addition present in RecordContext (a public
> facing API). This addition works to both provide the user with
> more information regarding the processing state of the partition, but also
> help resolve a bug which Kafka is currently experiencing.
> Here is the KIP Link:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-472%3A+%5BSTREAMS%5D+Add+partition+time+field+to+RecordContext
> 
> Cheers,
> Richard Yu
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to