There are two kinds of checkpointing going on here - metadata and data. The
100 seconds you have configured is the data checkpoint interval (expensive,
large data), where the RDD state data is written to HDFS. The 10-second one
is the metadata checkpoint (cheap, small data), where the metadata of the
streaming computation (configuration, DStream operations, and queued but
incomplete batches) is saved so the driver can recover after a failure.
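To make the split concrete, here is a minimal sketch (the checkpoint path
and the socket source are stand-ins for illustration; the real job reads
from Kafka): ssc.checkpoint() sets the directory where metadata is written
every batch, while DStream.checkpoint(Seconds(100)) controls how often the
state RDDs themselves are persisted.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object CheckpointIntervals {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("checkpoint-intervals"), Seconds(10))

        // Metadata checkpointing: driver recovery info (configuration,
        // DStream graph, queued batches) is written here every batch,
        // i.e. every 10 seconds.
        ssc.checkpoint("hdfs:///tmp/checkpoints")  // placeholder path

        // Stand-in source; the job in question reads from Kafka 0-10.
        val words = ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split(" "))
          .map(w => (w, 1L))

        val counts = words.updateStateByKey[Long] { (batch, state) =>
          Some(batch.sum + state.getOrElse(0L))
        }

        // Data checkpointing: write the state RDDs to HDFS every
        // 100 seconds (every 10th batch) instead of the default.
        counts.checkpoint(Seconds(100))

        counts.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

Note that the data checkpoint interval passed to DStream.checkpoint() must
be a multiple of the batch interval, so Seconds(100) is valid here.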
I'm running into a weird issue with a stateful streaming job. (Spark 2.1.0
reading from a Kafka 0-10 input stream.)
From what I understand from the docs, by default the checkpoint interval
for stateful streaming is 10 * batchInterval. Since I'm running a batch
interval of 10 seconds, I expected the state to be checkpointed every 100
seconds, but I'm seeing writes to the checkpoint directory every 10 seconds
as well.
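For reference, a minimal sketch of the kind of input described (Kafka 0-10
direct stream on Spark 2.1.0; the broker address, group id, and topic are
placeholders):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-sketch"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",  // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "stateful-job",             // placeholder group id
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream against the Kafka 0-10 integration, as in the
    // job described above.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("events"), kafkaParams)  // placeholder topic
    )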