It seems to be happening on the executors of the SC server, as I see the error in the executor logs. We did verify that there is only one version of iceberg-spark-runtime present. We do include a custom catalog implementation jar. Though it's a shaded jar, I don't see "org/apache/iceberg/Table" or any other Iceberg classes when I run "jar -tvf" on it.
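For reference, this is roughly the check I ran (the jar name here stands in for our shaded catalog jar):

  # List the shaded jar's entries and look for bundled Iceberg classes;
  # no matches means the jar does not repackage org.apache.iceberg.* itself.
  jar -tvf custom-catalog-shaded.jar | grep 'org/apache/iceberg' \
    || echo "no iceberg classes bundled"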
I see both jars in three Spark configs: spark.repl.local.jars, spark.yarn.dist.jars and spark.yarn.secondary.jars.

I suspected a classloading issue as well, since the initial error was pointing to one:

pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @6819e13c; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @15fb0c43)

Although ChildFirstURLClassLoader is a child of MutableURLClassLoader, the error shouldn't be related to that. I still tried adding a Spark flag (--conf "spark.executor.userClassPathFirst=true") when starting the Spark Connect server. With that, both classes seem to get loaded by the same ClassLoader class, but the error still happens:

pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @a41c33c; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @16f95afb)

I see "ClassLoader @<some_id>" in the logs. Are those object IDs? (It's been a while since I worked with Java.) I'm wondering if multiple instances of the same ClassLoader are being initialized by SC. Maybe running with -verbose:class or taking a heap dump would help verify? (I've put the minimal repro and what I plan to try next at the bottom of this mail, below the quoted thread.)

On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <b...@tabular.io> wrote:

> I think it looks like a version mismatch, perhaps between the SC client
> and the server or between where planning occurs and the executors. The
> error is that the `SerializableTableWithSize` is not a subclass of `Table`,
> but it definitely should be. That sort of problem is usually caused by
> class loading issues. Can you double-check that you have only one Iceberg
> runtime in the Environment tab of your Spark cluster?
>
> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:
>
>> PS - the issue doesn't happen if we don't use spark-connect and instead
>> just use spark-shell or pyspark, as the OP on GitHub said as well.
>> However, the stacktrace doesn't seem to point to any class from the
>> spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>
>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>>
>>> Hi,
>>> We are testing spark-connect with Iceberg.
>>> We tried Spark 3.5 with the Iceberg 1.4.x versions (all of the
>>> iceberg-spark-runtime-3.5_2.12-1.4.x jars).
>>>
>>> With all of the 1.4.x jars we hit the following issue when running
>>> Iceberg queries from a SparkSession created using spark-connect
>>> (--remote "sc://remote-master-node"):
>>>
>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast
>>> to org.apache.iceberg.Table
>>>   at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>   at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>   at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>>   at ...
>>>
>>> Someone else has reported this issue on GitHub as well:
>>> https://github.com/apache/iceberg/issues/8978
>>>
>>> It currently works with Spark 3.4 and Iceberg 1.3. However, ideally it
>>> would be nice to get it working with Spark 3.5 as well, since 3.5 has
>>> many improvements in spark-connect.
>>>
>>> Thanks,
>>> Nirav
>>>
>>
>
> --
> Ryan Blue
> Tabular
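P.S. For anyone who wants to reproduce this, the minimal path looks like the following (host, catalog and table names are placeholders):

  # Start a client session against the Spark Connect server; any Iceberg
  # scan then fails with the ClassCastException on the first action.
  pyspark --remote "sc://remote-master-node"
  # inside the pyspark shell:
  #   spark.sql("SELECT * FROM my_catalog.db.my_table LIMIT 10").show()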
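And here is what I plan to try next for the class-loading question. The server script path, package versions and application id are assumptions based on our setup, so treat it as a sketch:

  # Restart the Spark Connect server with class-loading tracing enabled on
  # the executors.
  ./sbin/start-connect-server.sh \
    --packages org.apache.spark:spark-connect_2.12:3.5.0,org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3 \
    --conf "spark.executor.extraJavaOptions=-verbose:class" \
    --conf "spark.executor.userClassPathFirst=true"

  # Then pull the executor logs from YARN and check which jar (and which
  # loader) each copy of org.apache.iceberg.Table came from.
  yarn logs -applicationId <application_id> | grep 'org.apache.iceberg.Table'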