Thanks for the KIP. However, I have some doubts that this would work.
In the example, you mention 4 records > r1(2), r2(3), r3(7), and r4(9) and that if r3 and r4 would be filtered, the downstream task would not advance partition time from 3 to 9. However, if r3 and r4 are filtered, no record will be sent downstream at all -- hence, there is no record to which partition-time could be piggy-bagged onto its header. When we sent r2, we don't know anything about r3 and r4 yet. And if there is an r5 that is not filtered, r5 would advance partition time anyway based on its own timestamp, hence no header is required. Also, I am not a big fan of adding headers in general, as they "leak" internal implementation details. Overall, my personal opinion is, that we should change Kafka's message format and allow for "heartbeat" messages, that don't carry any data, but only a timestamp. By default, those messages would not be exposed to an application but would be considered "internal" similar to transaction markers. However, changing the message format is a mayor change and hence, I am not sure if it worth doing at all atm. -Matthias On 5/20/19 7:20 PM, Richard Yu wrote: > Hello, > > I wish to introduce a minor addition present in RecordContext (a public > facing API). This addition works to both provide the user with > more information regarding the processing state of the partition, but also > help resolve a bug which Kafka is currently experiencing. > Here is the KIP Link: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-472%3A+%5BSTREAMS%5D+Add+partition+time+field+to+RecordContext > > Cheers, > Richard Yu >
signature.asc
Description: OpenPGP digital signature