Deleting the latest .compact file would lose the ability to achieve exactly-once
and cause Spark to fail to read from the output directory. If you're reading
the output directory from a non-Spark application, then the metadata on the output
directory doesn't matter to you, but there's also no exactly-once guarantee
(exactly-once is achieved by leveraging the metadata on the output directory).
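For reference, here is a shell-style sketch of the internal settings that govern the file sink metadata log (the _spark_metadata directory and its .compact files). These names and defaults come from Spark's SQLConf source rather than the public configuration docs, so treat them as assumptions that may change between versions:

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: internal file-sink metadata log knobs, names as found in SQLConf.
val spark = SparkSession.builder()
  .appName("file-sink-metadata-sketch")
  // how many delta log files are merged into each .compact file (default: 10)
  .config("spark.sql.streaming.fileSink.log.compactInterval", "10")
  // whether expired log files are deleted at all (default: true)
  .config("spark.sql.streaming.fileSink.log.deletion", "true")
  // how long expired log files are kept around before deletion (default: 10 minutes)
  .config("spark.sql.streaming.fileSink.log.cleanupDelay", "10m")
  // minimum number of batches whose metadata/state must be retained (default: 100)
  .config("spark.sql.streaming.minBatchesToRetain", "100")
  .getOrCreate()
```

Note that compaction carries earlier entries forward into each new .compact file, which is why the latest .compact file itself must not be deleted.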
SEE: http://spark.apache.org/docs/2.3.1/streaming-programming-guide.html#checkpointing
"Note that checkpointing of RDDs incurs the cost of saving to reliable storage.
This may cause an increase in the processing time of those batches where RDDs
get checkpointed."
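That note refers to the DStream API, where the checkpoint interval is tunable per stream. A shell-style sketch (the app name, socket source, and paths are placeholders, not from the original post) of raising the interval to amortize that cost, following the guide's suggestion of 5-10x the batch interval:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("checkpoint-interval-sketch")
val ssc  = new StreamingContext(conf, Seconds(10))        // 10-second batch interval
ssc.checkpoint("hdfs:///tmp/checkpoints/sketch")          // reliable storage for checkpoint data

val counts = ssc.socketTextStream("localhost", 9999)      // placeholder source
  .flatMap(_.split(" "))
  .map((_, 1L))
  .updateStateByKey[Long]((values, state) => Some(state.getOrElse(0L) + values.sum))

// Stateful DStreams are checkpointed periodically by default; setting the interval
// to 5-10x the batch interval (per the guide) amortizes the cost of saving RDDs
// to reliable storage.
counts.checkpoint(Seconds(50))
counts.print()

ssc.start()
ssc.awaitTermination()
```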
As far as I know, the official
Are Spark Structured Streaming checkpoint files expected to grow over time
indefinitely? Is there a recommended way to safely age-off old checkpoint data?
Currently we have a Spark Structured Streaming process reading from Kafka and
writing to an HDFS sink, with checkpointing enabled and writing