I would like to request a feature for reading data from the Kafka source based on a timestamp, so that an application that needs to process data from a certain point in time is able to do so. I agree that checkpoints give us continuation of a stream, but what if I want to rewind to a point before the checkpoint? According to Spark experts, editing checkpoints is not advised, and finding the right offsets to replay from is tricky. Replaying from a certain timestamp is a lot easier, at least with a decent monitoring system that shows when things started to fall apart (for example, a buggy push or a bad setting change).
The Kafka consumer API already supports this through the offsetsForTimes method, which returns the right offsets for a given timestamp: https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/consumer/Consumer.html#offsetsForTimes

Similar to startingOffsets and endingOffsets, the Kafka source could support startTimestamp and endTimestamp options.

In a SaaS environment, where data keeps flowing continuously, small tweaks like these can help us repair our systems. Spark Structured Streaming is already great, but features like these will help keep things under control in a live production environment.

Cheers,
Puneet
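
P.S. To make the idea concrete, here is a minimal Python sketch of the manual workaround: translate a timestamp into per-partition offsets and pass them as a startingOffsets-style JSON string. It only simulates the lookup on toy data. In practice the lookup would be done by the consumer's offsetsForTimes call, which returns the earliest offset whose timestamp is greater than or equal to the target. The function names, the topic name "events", and the sample records are all hypothetical, and falling back to -1 (latest) when a partition has no matching record is my assumption, not existing Spark behaviour.

```python
import json


def offset_for_time(records, target_ms):
    """Return the earliest offset whose timestamp is >= target_ms, or None.

    `records` is a list of (offset, timestamp_ms) pairs in offset order.
    This mimics the offsetsForTimes semantics with a plain scan.
    """
    for offset, timestamp_ms in records:
        if timestamp_ms >= target_ms:
            return offset
    return None


def starting_offsets_json(topic, partitions, target_ms):
    """Build a startingOffsets-style JSON string for a single topic."""
    offsets = {}
    for partition, records in partitions.items():
        found = offset_for_time(records, target_ms)
        # Assumption: use -1 (latest) when no record is new enough.
        offsets[str(partition)] = -1 if found is None else found
    return json.dumps({topic: offsets}, sort_keys=True)


# Toy data, made up for illustration: partition -> [(offset, timestamp_ms)].
partitions = {
    0: [(10, 1000), (11, 2000), (12, 3000)],
    1: [(5, 500), (6, 900)],
}

result = starting_offsets_json("events", partitions, 1500)
print(result)
```

A startTimestamp option would let the source do this translation itself, instead of every application having to look up offsets and build the JSON by hand.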