Spark 2.3 and Kafka client library version

2020-04-28 Thread Ahn, Daniel
I have a Kerberized HDFS cluster. When I use Structured Streaming with Kafka (with SASL_SSL/PLAINTEXT), I believe I’m blocked by KAFKA-5294. It looks like the fix version is 0.11.0.0 of the Kafka client library. I have a Spark 2.3 cluster, and it’s using the 0.10.0.1 Kafka client library. Do you know if I can

[Structured Streaming] Checkpoint file compact file grows big

2020-04-15 Thread Ahn, Daniel
Are Spark Structured Streaming checkpoint files expected to grow indefinitely over time? Is there a recommended way to safely age off old checkpoint data? Currently we have a Spark Structured Streaming process reading from Kafka and writing to an HDFS sink, with checkpointing enabled and writing

[Spark SS] Spark-23541 Backward Compatibility on 2.3.2

2019-09-26 Thread Ahn, Daniel
Has it been tested whether this fix (https://issues.apache.org/jira/browse/SPARK-23541) is backward compatible with 2.3.2? I see that the fix version is 2.4.0 in Jira. But from a quick review of the pull request (https://github.com/apache/spark/pull/20698), it looks like all the code changes are limited to spark-sql