I have a Kerberized HDFS cluster. When I use Structured Streaming with Kafka
(with SASL_SSL/PLAINTEXT), I believe I'm blocked by KAFKA-5294.
It looks like the fix version is the 0.11.0.0 Kafka client library. I have a Spark 2.3
cluster, and it's using the 0.10.0.1 Kafka client library. Do you know if I can
pull in a newer Kafka client library (0.11.0.0 or later) with Spark 2.3?
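In case it helps, this is roughly how I was thinking of pinning the newer client in
the application build. Treat it as a sketch: I don't know whether the Spark 2.3
Kafka source actually works against a 0.11.x client, and the versions below are
the ones I would try, not something I've verified.

// build.sbt (sketch): keep the Spark 2.3 Kafka source but force a newer
// kafka-clients onto the application classpath. Whether 0.11.0.0 is
// compatible with spark-sql-kafka-0-10 in 2.3 is exactly what I'm unsure about.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0",
  "org.apache.kafka"  % "kafka-clients"        % "0.11.0.0"
)
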
Are Spark Structured Streaming checkpoint files expected to grow indefinitely
over time? Is there a recommended way to safely age off old checkpoint data?
Currently we have a Spark Structured Streaming process reading from Kafka and
writing to an HDFS sink, with checkpointing enabled and the checkpoint data
written to HDFS.
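For context, the job is essentially the following (broker, topic, and paths are
illustrative, not our real values):

// Sketch of the current job: Kafka source -> Parquet files on HDFS,
// with the checkpoint directory (which keeps growing) also on HDFS.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-to-hdfs").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9093")  // illustrative broker
  .option("subscribe", "events")                      // illustrative topic
  .load()

df.writeStream
  .format("parquet")
  .option("path", "hdfs:///data/events")                       // illustrative sink path
  .option("checkpointLocation", "hdfs:///checkpoints/events")  // this directory grows
  .start()
  .awaitTermination()
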
Has it been tested whether this fix
(https://issues.apache.org/jira/browse/SPARK-23541) is backward compatible with
2.3.2? I see the fix version in Jira is 2.4.0, but from a quick review of the
pull request (https://github.com/apache/spark/pull/20698), it looks like all of
the code changes are limited to spark-sql.
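If the change really is confined to spark-sql, the approach I had in mind is to
cherry-pick the commit onto a local 2.3.2 build of that module, publish it to
our internal repository, and override just that one artifact. A sketch of what
I mean, where "2.3.2-patched" is a hypothetical version I would publish myself:

// build.sbt (sketch): swap in a locally built, patched spark-sql while keeping
// the rest of the stock 2.3.2 dependencies. "2.3.2-patched" is a hypothetical
// internally published version, not a real Apache release.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.2-patched",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.2"
)

Would that be enough, or would the spark-sql jars already on the cluster take
precedence at runtime?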