The most interesting part is that you've added this: kafka-clients-0.10.2.2.jar Spark 3.0.1 uses Kafka clients 2.4.1. Downgrading with such a big step doesn't help. Please remove that also togrther w/ Spark-Kafka dependency.
G On Thu, 21 Jan 2021, 22:45 gshen, <[email protected]> wrote: > Thanks for the tips! > > I think I figured out what might be causing it. It's the checkpointing to > Microsoft Azure Data Lake Storage (ADLS). > > When I use "local checkpointing" it works, but when i use fails when > there's > a groupBy in the stream. Weirdly it works when there is no groupBy clause > in > the stream. > > It's able to create the checkpoint location and the base files on ADLS, but > it can't write any commits. Hence, why I see the first batch of records, > but > then it crashes on writing the first commit. > > Here's what my checkpointing location looks like: > > abfss://"+container_name+"@"+storage_account_name+".dfs.core.windows.net/ > "+file_name > > and my SparkSession: > > spark = pyspark.sql.SparkSession.builder\ > .appName("pyspark-stream-read")\ > .master("spark://spark-master:7077")\ > .config("spark.executor.memory", "512m")\ > .config("spark.jars.packages", > > "io.delta:delta-core_2.12:0.7.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1,org.apache.hadoop:hadoop-azure:3.2.1") > \ > .config("spark.sql.extensions", > "io.delta.sql.DeltaSparkSessionExtension") \ > .config("spark.sql.catalog.spark_catalog", > "org.apache.spark.sql.delta.catalog.DeltaCatalog") \ > .config("spark.delta.logStore.class", > "org.apache.spark.sql.delta.storage.AzureLogStore") \ > .getOrCreate() > > > > > > > -- > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ > > --------------------------------------------------------------------- > To unsubscribe e-mail: [email protected] > >
