Hi Martijn, thanks for referencing both related FLIPs and providing a recommendation. That was already helpful. I need to find some time to further investigate this topic. So far I agree that it might be the most reasonable approach to just use the Kafka timestamp to carry the event time.
Thank you very much! Kind Regards, Niklas > On 13. Dec 2022, at 23:53, Martijn Visser <martijnvis...@apache.org> wrote: > > Hi Niklas, > > On your confirmations: > > a1) The default behaviour is documented at > https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/#event-time-and-watermarks > - Flink uses the timestamp embedded in the Kafka ConsumerRecord. > a2) I'm not 100% sure: since Flink 1.15, there's FLIP-182 [1] support added. > However, if you have a situation where a single source operator reads from > multiple partitions, you can still run into issues. That's why FLIP-217 [2] > has been created, but that will only be released with Flink 1.17 > > I think answering your question ultimately depends on your business > requirements, which I can't extract from your email. My assumption is that > your business logic requires you to act on the event time, but then I think > it also depends on how late and how out of order your date could be and how > quickly you need to make a decision. If you need to act on actual event time, > I think it makes the most sense that your data producers actually set the > event time as the value in the ConsumerRecord. > > Best regards, > > Martijn > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-217%3A+Support+watermark+alignment+of+source+splits > > On Thu, Dec 8, 2022 at 6:21 PM Niklas Wilcke <niklas.wil...@uniberg.com > <mailto:niklas.wil...@uniberg.com>> wrote: >> Hi Flink Community, >> >> I have a few questions regarding the new KafkaSource and event time, which I >> wasn't able to answer myself via checking the docs, but please point me to >> the right pages in case I missed something. I'm not entirely whether my >> knowledge entirely holds for the new KafkaSource, because it is rather new >> to me. Please correct me, when I'm making false assumptions. >> >> I make the following assumptions, which you can hopefully confirm. >> >> a1) The KafkaSource by default uses the Kafka timestamp to extract watermarks >> a2) There is a mechanism in place to synchronise the consumption from all >> Kafka partitions based on the event time / watermarks (important in case we >> are facing a noticeable consumer lag) >> >> Now lets assume the following scenario >> >> s1) The Kafka timestamp doesn't contain the event time, but only the >> timestamp when the record was written to the topic >> s2) The event timestamp is buried deep in the message payload >> >> My question is now what is best practice to extract the event time >> timestamp? I see the following options to handle this problem. >> >> o1) Implement the parsing of the message in the Deserializer and directly >> extract the event time timestamp. This comes with the drawback putting a >> complex parsing logic into the Deserializer. >> >> o2) Let the producer set the Kafka timestamp to the actual event time. This >> comes with the following drawbacks: >> - The Kafka write timestamp is lost >> - The producer (which is in our case only forwarding messages) needs to >> extract the timestamp on its own from the message payload >> >> o3) Leave the Kafka timestamp as it is and create Watermarks downstream in >> the pipeline after the parsing. This comes with the following drawbacks: >> - Scenarios where event time and Kafka timestamp are drifting apart are >> possible and therefor the synchronisation mechanism described in a2 isn't >> really guaranteed to work properly. >> >> Currently we are using o3 and I'm investigating, whether o1 and o2 are >> better alternatives, which doesn't involve sacrificing a2. Can you share any >> best practices here? Is it recommended to always use the Kafka timestamp to >> hold the event time? Are there any other points I missed, which makes one of >> the options preferable? >> >> Thank you for taking the time. >> >> Kind Regards, >> Niklas
smime.p7s
Description: S/MIME cryptographic signature