Hello,
I'm trying to run a simple Python client against a Spark Connect server
running in Kubernetes as a proof of concept. The client writes a couple
of records to a local Iceberg table. The Iceberg runtime is provisioned
via the "--packages" argument to "start-connect-server.sh", and the logs
confirm that the package is downloaded and added to the classpath of
both the driver and the executor. Nevertheless, the job fails because
the class "org.apache.iceberg.spark.source.SparkWrite$WriterFactory"
cannot be found.
I'm running Spark 3.5.5 with Java 17. The Spark Connect package is
already provisioned in my Spark image, and I use the following Spark
properties:
# ---
spark.driver.defaultJavaOptions=-Djava.security.properties\=/stackable/spark/conf/security.properties\
-Dlog4j.configurationFile\=/stackable/log_config/log4j2.properties\
-Dmy.custom.jvm.arg\=customValue
spark.driver.extraClassPath=/tmp/ivy2/jars/*\:/stackable/spark/extra-jars/*\:/stackable/spark/connect/spark-connect_2.12-3.5.5.jar
spark.driver.host=spark-connect-server
spark.executor.defaultJavaOptions=-Djava.security.properties\=/stackable/spark/conf/security.properties\
-Dlog4j.configurationFile\=/stackable/log_config/log4j2.properties
spark.executor.instances=1
spark.executor.memory=1024M
spark.executor.memoryOverhead=1m
spark.jars.ivy=/tmp/ivy2
spark.kubernetes.authenticate.driver.serviceAccountName=spark-connect-serviceaccount
spark.kubernetes.driver.container.image=oci.stackable.tech/sdp/spark-k8s\:3.5.5-stackable0.0.0-dev
spark.kubernetes.driver.pod.name=${env\:HOSTNAME}
spark.kubernetes.executor.container.image=oci.stackable.tech/sdp/spark-k8s\:3.5.5-stackable0.0.0-dev
spark.kubernetes.executor.limit.cores=1
spark.kubernetes.executor.podTemplateContainerName=spark
spark.kubernetes.executor.podTemplateFile=/stackable/spark/conf/template.yaml
spark.kubernetes.executor.request.cores=1
spark.kubernetes.namespace=kuttl-test-renewed-redfish
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=/tmp/warehouse
spark.sql.defaultCatalog=local
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
# ---
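One thing I am unsure about: spark.driver.extraClassPath explicitly
includes /tmp/ivy2/jars/*, but there is no matching entry on the
executor side. If that is the problem, I would expect something like
the following to be needed (untested; assuming the ivy directory is
available at the same path in the executor pods):
# ---
spark.executor.extraClassPath=/tmp/ivy2/jars/*
# ---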
The Connect server is started with:
# ---
start-connect-server.sh --deploy-mode client \
  --master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS} \
--properties-file /stackable/spark/conf/spark-defaults.conf \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1
# ---
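So far I have only verified the package download through the server
logs; a direct check on a pod would look something like this (namespace
and pod name are placeholders):
# ---
kubectl -n <namespace> exec <executor-pod> -- ls /tmp/ivy2/jars | grep iceberg
# ---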
Am I missing something? I would be grateful for any hints.
Thanks.
PS: here is the full stack trace from the driver:
# ---
2025-04-06T10:16:14,103 WARN [task-result-getter-0] org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) (10.42.0.2 executor 1): java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory
    at org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:124)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:467)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71)
    at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2034)
    at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1898)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2224)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
    at java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2157)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1721)
    at java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2606)
    at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2457)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
    at java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2606)
    at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2457)
    at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
    at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:509)
    at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:467)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory
    at java.base/java.lang.ClassLoader.findClass(ClassLoader.java:723)
    at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
    at org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
    ... 34 more
# ---