Hi Martijn,

Thanks for referencing both related FLIPs and providing a recommendation. That 
was already helpful. I need to find some time to investigate this topic 
further. So far I agree that using the Kafka timestamp to carry the event time 
is probably the most reasonable approach.

Thank you very much!

Kind Regards,
Niklas

> On 13. Dec 2022, at 23:53, Martijn Visser <martijnvis...@apache.org> wrote:
> 
> Hi Niklas,
> 
> On your confirmations:
> 
> a1) The default behaviour is documented at 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#event-time-and-watermarks
>  - Flink uses the timestamp embedded in the Kafka ConsumerRecord. 
> a2) I'm not 100% sure: since Flink 1.15, support for FLIP-182 [1] has been 
> added. However, if you have a situation where a single source operator reads 
> from multiple partitions, you can still run into issues. That's why FLIP-217 
> [2] has been created, but that will only be released with Flink 1.17.
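> 
> To make that concrete, here's a rough sketch (imports omitted; broker, topic 
> and group names are placeholders). By default the source emits the Kafka 
> record timestamp as the event timestamp, so no TimestampAssigner is needed, 
> and the FLIP-182 alignment is opt-in on the WatermarkStrategy:
> 
>   KafkaSource<String> source = KafkaSource.<String>builder()
>       .setBootstrapServers("broker:9092")
>       .setTopics("my-topic")
>       .setGroupId("my-group")
>       .setValueOnlyDeserializer(new SimpleStringSchema())
>       .build();
> 
>   DataStream<String> stream = env.fromSource(
>       source,
>       // a1: watermarks are derived from the Kafka record timestamp.
>       WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
>           // a2 (FLIP-182, Flink 1.15+): sources in the same group pause
>           // readers that run ahead by more than the allowed drift.
>           .withWatermarkAlignment("alignment-group",
>               Duration.ofSeconds(20), Duration.ofSeconds(1)),
>       "kafka-source");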
> 
> I think answering your question ultimately depends on your business 
> requirements, which I can't extract from your email. My assumption is that 
> your business logic requires you to act on the event time, but then it also 
> depends on how late and how out of order your data could be and how quickly 
> you need to make a decision. If you need to act on the actual event time, I 
> think it makes the most sense that your data producers actually set the event 
> time as the timestamp of the Kafka records they produce.
> 
> Best regards,
> 
> Martijn
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
> [2] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits
> 
> On Thu, Dec 8, 2022 at 6:21 PM Niklas Wilcke <niklas.wil...@uniberg.com> wrote:
>> Hi Flink Community,
>> 
>> I have a few questions regarding the new KafkaSource and event time, which I 
>> wasn't able to answer myself by checking the docs, but please point me to 
>> the right pages in case I missed something. I'm not entirely sure whether my 
>> knowledge holds for the new KafkaSource, because it is rather new to me. 
>> Please correct me if I'm making false assumptions.
>> 
>> I make the following assumptions, which you can hopefully confirm.
>> 
>> a1) The KafkaSource by default uses the Kafka timestamp to extract watermarks
>> a2) There is a mechanism in place to synchronise the consumption from all 
>> Kafka partitions based on the event time / watermarks (important in case we 
>> are facing a noticeable consumer lag)
>> 
>> Now let's assume the following scenario:
>> 
>> s1) The Kafka timestamp doesn't contain the event time, but only the 
>> timestamp when the record was written to the topic
>> s2) The event timestamp is buried deep in the message payload
>> 
>> My question now is: what is the best practice for extracting the event time 
>> timestamp? I see the following options for handling this problem.
>> 
>> o1) Implement the parsing of the message in the Deserializer and directly 
>> extract the event time timestamp. This comes with the drawback of putting 
>> complex parsing logic into the Deserializer.
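>> 
>> For illustration, o1 would roughly look like this (imports omitted; MyEvent 
>> and its parse method are made up). The Deserializer does the expensive 
>> parsing once, and the timestamp assigner then only reads the already-parsed 
>> field:
>> 
>>   // Parses the raw payload into a typed event inside the source.
>>   public class MyEventDeserializer
>>           extends AbstractDeserializationSchema<MyEvent> {
>>       @Override
>>       public MyEvent deserialize(byte[] message) {
>>           return MyEvent.parse(message); // hypothetical payload parser
>>       }
>>   }
>> 
>>   WatermarkStrategy<MyEvent> strategy = WatermarkStrategy
>>       .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
>>       // Replace the default (Kafka timestamp) with the parsed event time.
>>       .withTimestampAssigner((event, kafkaTimestamp) -> event.getEventTime());
>> 
>> If I understand correctly, extracting the timestamp at the source like this 
>> keeps the per-partition watermarking from a2 intact.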
>> 
>> o2) Let the producer set the Kafka timestamp to the actual event time. This 
>> comes with the following drawbacks:
>>   - The Kafka write timestamp is lost
>>   - The producer (which is in our case only forwarding messages) needs to 
>> extract the timestamp on its own from the message payload
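>> 
>> On the producer side (plain kafka-clients API; names are placeholders), o2 
>> would mean something like:
>> 
>>   // Forwarding producer that stamps each record with the event time
>>   // taken from the payload instead of the wall-clock write time.
>>   long eventTime = extractEventTime(payload); // hypothetical helper
>>   producer.send(new ProducerRecord<>(
>>       "my-topic",
>>       null,       // partition: let the partitioner decide
>>       eventTime,  // record timestamp = event time
>>       key,
>>       payload));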
>> 
>> o3) Leave the Kafka timestamp as it is and create watermarks downstream in 
>> the pipeline after parsing. This comes with the following drawback:
>>   - Scenarios where the event time and the Kafka timestamp drift apart are 
>> possible, and therefore the synchronisation mechanism described in a2 isn't 
>> really guaranteed to work properly.
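>> 
>> For completeness, our current o3 setup looks roughly like this (again with 
>> the hypothetical MyEvent):
>> 
>>   DataStream<MyEvent> parsed = rawStream.map(MyEvent::parse);
>> 
>>   // Watermarks are regenerated downstream, after parsing, and are
>>   // therefore decoupled from the source's per-partition watermarking.
>>   DataStream<MyEvent> withEventTime = parsed.assignTimestampsAndWatermarks(
>>       WatermarkStrategy
>>           .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
>>           .withTimestampAssigner((event, kafkaTimestamp) ->
>>               event.getEventTime()));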
>> 
>> Currently we are using o3, and I'm investigating whether o1 or o2 would be a 
>> better alternative that doesn't involve sacrificing a2. Can you share any 
>> best practices here? Is it recommended to always use the Kafka timestamp to 
>> hold the event time? Are there any other points I missed that make one of 
>> the options preferable?
>> 
>> Thank you for taking the time.
>> 
>> Kind Regards,
>> Niklas
