Hi Flink Community,

I have a few questions regarding the new KafkaSource and event time that I wasn't able to answer myself by checking the docs. Please point me to the right pages in case I missed something. I'm not entirely sure whether my knowledge still holds for the new KafkaSource, because it is rather new to me. Please correct me if I'm making false assumptions.
I make the following assumptions, which you can hopefully confirm:

a1) The KafkaSource by default uses the Kafka record timestamp as the timestamp from which watermarks are generated.
a2) There is a mechanism in place to synchronise the consumption from all Kafka partitions based on the event time / watermarks (important in case we are facing a noticeable consumer lag).

Now let's assume the following scenario:

s1) The Kafka timestamp doesn't contain the event time, but only the time at which the record was written to the topic.
s2) The event timestamp is buried deep in the message payload.

My question is: what is the best practice for extracting the event-time timestamp? I see the following options to handle this problem:

o1) Implement the parsing of the message in the Deserializer and directly extract the event-time timestamp there. This comes with the drawback of putting complex parsing logic into the Deserializer.

o2) Let the producer set the Kafka timestamp to the actual event time. This comes with the following drawbacks:
- The Kafka write timestamp is lost.
- The producer (which in our case only forwards messages) needs to extract the timestamp from the message payload on its own.

o3) Leave the Kafka timestamp as it is and create watermarks downstream in the pipeline, after the parsing. This comes with the following drawback:
- Event time and Kafka timestamp can drift apart, and therefore the synchronisation mechanism described in a2 isn't guaranteed to work properly.

Currently we are using o3, and I'm investigating whether o1 and o2 are better alternatives that don't involve sacrificing a2.

Can you share any best practices here? Is it recommended to always use the Kafka timestamp to hold the event time? Are there any other points I missed that make one of the options preferable?

Thank you for taking the time.

Kind Regards,
Niklas
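
P.S. To make o1 and a2 a bit more concrete, here is a minimal plain-Java sketch of the behaviour I have in mind. It is only a model of my understanding and does not use any Flink or Kafka API: the class EventTimeSketch, the "eventTime" payload field and the 5-second out-of-orderness bound are all made up for illustration. It reads the event time from the payload (falling back to the Kafka timestamp), tracks the highest event time per partition, and takes the minimum across partitions as the combined watermark.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model only; none of these names are Flink or Kafka API.
final class EventTimeSketch {

    // Hypothetical payload layout: JSON-like text with a nested "eventTime" field (epoch millis).
    private static final Pattern EVENT_TIME = Pattern.compile("\"eventTime\"\\s*:\\s*(\\d+)");

    // Assumed tolerance for out-of-order records, in milliseconds.
    static final long MAX_OUT_OF_ORDERNESS_MS = 5_000L;

    // Highest event time seen so far, keyed by partition.
    private final Map<Integer, Long> maxEventTimePerPartition = new HashMap<>();

    // o1: take the event time from the payload; fall back to the Kafka timestamp if absent.
    static long extractEventTime(String payload, long kafkaTimestamp) {
        Matcher m = EVENT_TIME.matcher(payload);
        return m.find() ? Long.parseLong(m.group(1)) : kafkaTimestamp;
    }

    // Register one record and keep the highest event time for its partition.
    void onRecord(int partition, String payload, long kafkaTimestamp) {
        long eventTime = extractEventTime(payload, kafkaTimestamp);
        maxEventTimePerPartition.merge(partition, eventTime, Math::max);
    }

    // a2: the combined watermark is held back by the slowest partition.
    long currentWatermark() {
        return maxEventTimePerPartition.values().stream()
                .mapToLong(t -> t - MAX_OUT_OF_ORDERNESS_MS)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        EventTimeSketch sketch = new EventTimeSketch();
        // Both records were written to Kafka at almost the same time,
        // but their event times in the payload are far apart.
        sketch.onRecord(0, "{\"meta\":{\"eventTime\":100000}}", 999_000L);
        sketch.onRecord(1, "{\"meta\":{\"eventTime\":20000}}", 999_500L);
        System.out.println(sketch.currentWatermark()); // prints 15000
    }
}
```

In this model the watermark follows the payload event time of the lagging partition rather than the Kafka write time, which is the property I would like to keep when choosing between o1, o2 and o3.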