Hello,

I'm trying to run a simple Python client against a Spark Connect server running in Kubernetes as a proof of concept. The client writes a couple of records to a local Iceberg table. The Iceberg runtime is provisioned via the "--packages" argument to "start-connect-server.sh", and the logs show that the package is downloaded and added to the classpath of both the driver and the executor successfully.
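
For reference, the client does essentially the following (a minimal sketch: the Connect endpoint, table name and schema are placeholders; "local" is the catalog configured in the properties below):

# ---
from pyspark.sql import SparkSession

# Connect to the Spark Connect server (hostname/port are assumptions for illustration).
spark = SparkSession.builder.remote("sc://spark-connect-server:15002").getOrCreate()

# Hypothetical table in the "local" Iceberg catalog.
spark.sql("CREATE TABLE IF NOT EXISTS local.db.demo (id INT, name STRING) USING iceberg")

# Append a couple of records via the DataFrameWriterV2 API; this is the write that fails.
spark.createDataFrame([(1, "a"), (2, "b")], ["id", "name"]).writeTo("local.db.demo").append()
# ---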

Nevertheless, the job fails because the class "org.apache.iceberg.spark.source.SparkWrite$WriterFactory" cannot be found on the executor.

I'm running Spark 3.5.5 with Java 17. The Spark Connect package is already provisioned in my Spark image, and I use the following Spark properties:

# ---

spark.driver.defaultJavaOptions=-Djava.security.properties\=/stackable/spark/conf/security.properties\ -Dlog4j.configurationFile\=/stackable/log_config/log4j2.properties\ -Dmy.custom.jvm.arg\=customValue
spark.driver.extraClassPath=/tmp/ivy2/jars/*\:/stackable/spark/extra-jars/*\:/stackable/spark/connect/spark-connect_2.12-3.5.5.jar
spark.driver.host=spark-connect-server
spark.executor.defaultJavaOptions=-Djava.security.properties\=/stackable/spark/conf/security.properties\ -Dlog4j.configurationFile\=/stackable/log_config/log4j2.properties
spark.executor.instances=1
spark.executor.memory=1024M
spark.executor.memoryOverhead=1m
spark.jars.ivy=/tmp/ivy2
spark.kubernetes.authenticate.driver.serviceAccountName=spark-connect-serviceaccount
spark.kubernetes.driver.container.image=oci.stackable.tech/sdp/spark-k8s\:3.5.5-stackable0.0.0-dev
spark.kubernetes.driver.pod.name=${env\:HOSTNAME}
spark.kubernetes.executor.container.image=oci.stackable.tech/sdp/spark-k8s\:3.5.5-stackable0.0.0-dev
spark.kubernetes.executor.limit.cores=1
spark.kubernetes.executor.podTemplateContainerName=spark
spark.kubernetes.executor.podTemplateFile=/stackable/spark/conf/template.yaml
spark.kubernetes.executor.request.cores=1
spark.kubernetes.namespace=kuttl-test-renewed-redfish
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=/tmp/warehouse
spark.sql.defaultCatalog=local
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
# ---

The Connect server is started with:

# ---
start-connect-server.sh --deploy-mode client \
--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS} \
--properties-file /stackable/spark/conf/spark-defaults.conf  \
--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.1
# ---

Am I missing something? I'd be grateful for any hints.

Thanks.


PS: the full stack trace on the driver:

# ---
2025-04-06T10:16:14,103 WARN [task-result-getter-0] org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0) (10.42.0.2 executor 1): java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory
        at org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:124)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:467)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:71)
        at java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2034)
        at java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1898)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2224)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
        at java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2157)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1721)
        at java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2606)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2457)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
        at java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2606)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2457)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:509)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:467)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory
        at java.base/java.lang.ClassLoader.findClass(ClassLoader.java:723)
        at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:592)
        at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
        at org.apache.spark.executor.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:109)
        ... 34 more

# ---


