There is no automated way to do this today, but you are on the right track
for a hack. If you delete both the entries in _spark_metadata and the
corresponding entries from the checkpoint/offsets of the streaming query,
it will reprocess the corresponding section of the Kafka stream.
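To make that concrete, here is a rough sketch of the manual cleanup, assuming the FileSink writes to hdfs:///data/out with checkpointLocation hdfs:///chk and that everything from batch 42 onwards should be replayed (all of these paths and numbers are placeholders, not your actual setup). Stop the query first, and note that the output data files produced by the bad batches still have to be removed separately:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ReplayBatches {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Hypothetical example: roll back everything from batch 42 onwards.
    val firstBadBatch = 42L
    val sinkMetadata  = new Path("hdfs:///data/out/_spark_metadata")
    val checkpoint    = new Path("hdfs:///chk")

    // 1. Delete the FileSink metadata entries for the bad batches so the sink
    //    no longer considers them committed.
    fs.listStatus(sinkMetadata)
      .filter { s =>
        val digits = s.getPath.getName.takeWhile(_.isDigit)
        digits.nonEmpty && digits.toLong >= firstBadBatch
      }
      .foreach(s => fs.delete(s.getPath, false))

    // 2. Delete the corresponding offset (and, if present, commit) log entries
    //    from the checkpoint so the query re-reads those Kafka offsets on restart.
    Seq("offsets", "commits").foreach { dir =>
      val p = new Path(checkpoint, dir)
      if (fs.exists(p)) {
        fs.listStatus(p)
          .filter(s => s.getPath.getName.forall(_.isDigit) &&
                       s.getPath.getName.toLong >= firstBadBatch)
          .foreach(s => fs.delete(s.getPath, false))
      }
    }
  }
}

This is an unsupported hack, so test it on a copy of the checkpoint and sink metadata before touching a production query.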
On Wed, Sep
Hi,
We are using Structured Streaming (Spark 2.2.0) for processing data from
Kafka. We read from a Kafka topic, do some conversions and computation, and
then use the FileSink to store data to a partitioned path in HDFS. We have
enabled checkpointing (using a directory in HDFS).
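For reference, our query looks roughly like the following sketch (the topic name, paths, and the toy conversion are simplified placeholders):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()
    import spark.implicits._

    // Read from a Kafka topic.
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Example conversion: cast the value to a string and derive a date
    // column used to partition the output.
    val converted = input
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")
      .withColumn("dt", to_date($"timestamp"))

    // Write with the FileSink to a partitioned HDFS path, with checkpointing.
    val query = converted.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/out")
      .option("checkpointLocation", "hdfs:///chk")
      .partitionBy("dt")
      .start()

    query.awaitTermination()
  }
}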
For cases when there is a bad code push, is there a way to reprocess the
corresponding section of the Kafka stream?