I would like to request a feature for reading data from the Kafka source based
on a timestamp, so that an application that needs to process data from a
certain point in time is able to do so. I agree that checkpoints already let a
stream resume where it left off, but what if I want to rewind to a point
before the checkpoint?
According to Spark experts, it's not advisable to edit checkpoints, and
finding the right offsets to replay from is tricky. Replaying from a certain
timestamp is a lot easier, at least with a decent monitoring system that shows
the time when things started to fall apart (for example, a buggy push or a bad
setting change).

The Kafka consumer API supports this through the offsetsForTimes method, which
returns the matching offset for each partition given a timestamp:

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/Consumer.html#offsetsForTimes
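
To make the lookup concrete, here is a small pure-Python illustration of the rule offsetsForTimes applies per partition: return the earliest offset whose timestamp is greater than or equal to the requested one, or nothing if no such record exists. It does not call Kafka; `offset_for_time` and `partition_log` are hypothetical names invented for this sketch.

```python
def offset_for_time(records, target_ts):
    """Return the earliest offset whose timestamp is >= target_ts, or None.

    records: list of (offset, timestamp_ms) pairs in offset order,
    standing in for one partition's log (illustrative data only).
    """
    for offset, ts in records:
        if ts >= target_ts:
            return offset
    # No record at or after the requested timestamp.
    return None


# Hypothetical partition contents: (offset, timestamp in ms)
partition_log = [(100, 1000), (101, 1500), (102, 2000), (103, 2500)]

print(offset_for_time(partition_log, 1600))  # prints 102
```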

Similar to the existing startingOffsets and endingOffsets options, the Kafka
source could support startTimestamp and endTimestamp options.
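
Until something like that exists, one possible workaround is to resolve the offsets by timestamp outside Spark and pass them in through startingOffsets, which accepts a JSON string of topic -> partition -> offset. The sketch below only builds that JSON string; `starting_offsets_json` is a hypothetical helper, the input dict is assumed to come from a separate timestamp lookup, and mapping a missing offset to -1 (latest) is my own choice rather than anything Spark prescribes.

```python
import json


def starting_offsets_json(offsets_by_partition):
    """Build a JSON string in the shape the startingOffsets option expects.

    offsets_by_partition: dict mapping (topic, partition) -> offset,
    or None when no record exists at or after the requested timestamp.
    Assumption: None is mapped to -1, which the option treats as "latest".
    """
    result = {}
    for (topic, partition), offset in offsets_by_partition.items():
        # Partition numbers are string keys in the JSON form.
        result.setdefault(topic, {})[str(partition)] = -1 if offset is None else offset
    return json.dumps(result, sort_keys=True)


# Hypothetical lookup result for a topic named "events"
resolved = {("events", 0): 42, ("events", 1): None}
print(starting_offsets_json(resolved))  # prints {"events": {"0": 42, "1": -1}}
```

The resulting string could then be supplied as the startingOffsets option when creating the Kafka source for a fresh query (one without an existing checkpoint).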

In a SaaS environment where data flows continuously, small tweaks like these
can help us repair our systems. Spark Structured Streaming is already great,
but features like this would help keep things under control in a live
production environment.

Cheers,
Puneet


