Hi Ryan, I updated the Spark JIRA I opened with more information I found after taking a heap dump:
https://issues.apache.org/jira/browse/SPARK-46762

The class `org.apache.iceberg.Table` is loaded twice: once by ChildFirstURLClassLoader and once by MutableURLClassLoader. The issue doesn't happen with Spark 3.4 and Iceberg 1.3, as I mentioned in the ticket. Do you think it's still a Spark Connect issue? I noticed there are somewhat bigger migration changes in the Iceberg repo going from 1.3 to 1.4 in order to support Spark 3.5. Do you think something might have been missed there?
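For illustration, two loader instances over the same jar are enough to produce
exactly this kind of ClassCastException. A contrived, self-contained Scala
sketch (the jar path is hypothetical - any jar containing
org.apache.iceberg.Table would do):

import java.net.{URL, URLClassLoader}

// hypothetical local path to the Iceberg runtime jar
val jar = Array(new URL("file:/tmp/iceberg-spark-runtime-3.5_2.12-1.4.3.jar"))

// parent = null, so the two loaders are siblings and never delegate to each other
val loaderA = new URLClassLoader(jar, null)
val loaderB = new URLClassLoader(jar, null)

val tableA = loaderA.loadClass("org.apache.iceberg.Table")
val tableB = loaderB.loadClass("org.apache.iceberg.Table")

println(tableA == tableB)                // false: same name, different defining loaders
println(tableA.isAssignableFrom(tableB)) // false: casting an instance across them throws ClassCastException

The JVM keys a class on (name, defining loader), so the two Table classes are
unrelated types even though they come from the same bytes.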
Thanks
Nirav

On Thu, Jan 18, 2024 at 9:46 AM Nirav Patel <nira...@gmail.com> wrote:

> Classloading does seem like an issue, though only when using Spark Connect
> 3.5 with Iceberg >= 1.4.
>
> It's weird because, as I also mentioned in the previous email, after adding
> the Spark property (spark.executor.userClassPathFirst=true) both classes get
> loaded from the same classloader - org.apache.spark.util.ChildFirstURLClassLoader.
> Not sure why the error would still happen.
>
> java.lang.ClassCastException: class
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to
> class org.apache.iceberg.Table (org.apache.iceberg.spark.source.
> *SerializableTableWithSize* is in unnamed module of loader
> org.apache.spark.util.*ChildFirstURLClassLoader* @a41c33c;
> org.apache.iceberg.*Table* is in unnamed module of loader
> org.apache.spark.util.*ChildFirstURLClassLoader* @16f95afb)
>
>
> On Tue, Jan 16, 2024 at 12:53 PM Ryan Blue <b...@tabular.io> wrote:
>
>> It looks to me like the classloader is the problem. The "child first"
>> classloader is apparently loading `Table`, but Spark is loading
>> `SerializableTableWithSize` from the parent classloader. Because delegation
>> isn't happening properly, you're getting two incompatible classes from the
>> same classpath, depending on where a class was loaded for the first time.
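>>
>> For reference, the child-first strategy is roughly the following (an
>> illustrative sketch only, not Spark's actual ChildFirstURLClassLoader
>> implementation):
>>
>> import java.net.{URL, URLClassLoader}
>>
>> class ChildFirstLoader(urls: Array[URL], parent: ClassLoader)
>>     extends URLClassLoader(urls, parent) {
>>   override def loadClass(name: String, resolve: Boolean): Class[_] =
>>     Option(findLoadedClass(name)).getOrElse {
>>       try findClass(name)                // search this loader's URLs first
>>       catch {
>>         case _: ClassNotFoundException =>
>>           super.loadClass(name, resolve) // only then delegate to the parent
>>       }
>>     }
>> }
>>
>> If `Table` is found by the child but `SerializableTableWithSize` was first
>> defined by the parent, the parent's copy is linked against the parent's
>> `Table`, which the JVM treats as a different class from the child's
>> `Table` - hence the failed cast.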
>>
>> On Fri, Jan 12, 2024 at 5:30 PM Nirav Patel <nira...@gmail.com> wrote:
>>
>>> It seems to be happening on an executor of the SC server, as I see the
>>> error in the executor logs. We did verify that there is only one version
>>> of iceberg-spark-runtime present.
>>> We do include a custom catalog impl jar. Though it's a shaded jar, I don't
>>> see "org/apache/iceberg/Table" or any other Iceberg classes when I run
>>> "jar -tvf" on it.
>>>
>>> I see both jars in 3 Spark configs: spark.repl.local.jars,
>>> spark.yarn.dist.jars, and spark.yarn.secondary.jars.
>>>
>>> I suspected a classloading issue as well, since the initial error was
>>> pointing to it:
>>>
>>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>>> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0
>>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>>> java.lang.ClassCastException: class
>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to
>>> class org.apache.iceberg.Table
>>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
>>> module of loader org.apache.spark.util.*MutableURLClassLoader*
>>> @6819e13c; org.apache.iceberg.Table is in unnamed module of loader
>>> org.apache.spark.util.*ChildFirstURLClassLoader* @15fb0c43)
>>>
>>> Although *ChildFirstURLClassLoader* is a child of MutableURLClassLoader,
>>> the error shouldn't be related to that. I still tried adding the Spark
>>> flag (--conf "spark.executor.userClassPathFirst=true") when starting the
>>> Spark Connect server. It seems both classes get loaded by the same
>>> ClassLoader class, but the error still happens:
>>>
>>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>>> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0
>>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>>> java.lang.ClassCastException: class
>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to
>>> class org.apache.iceberg.Table
>>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
>>> module of loader org.apache.spark.util.*ChildFirstURLClassLoader*
>>> @a41c33c; org.apache.iceberg.Table is in unnamed module of loader
>>> org.apache.spark.util.*ChildFirstURLClassLoader* @16f95afb)
>>>
>>> I see "ClassLoader @ <some_id>" in the logs. Are those object ids? (It's
>>> been a while since I worked with Java.) I'm wondering if multiple
>>> instances of the same ClassLoader are being initialized by SC. Maybe
>>> running with -verbose:class or taking a heap dump would help to verify?
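>>>
>>> The kind of check I have in mind is something like this - a minimal
>>> sketch (assuming a job can be run on the same executors, e.g. from
>>> spark-shell on the SC server's cluster) that prints which loader instance
>>> defined each side of the failing cast:
>>>
>>> // tasks run on the executors; getClassLoader's default toString prints
>>> // the loader class plus a hash, e.g. ChildFirstURLClassLoader@15fb0c43
>>> sc.parallelize(1 to 4, 4).map { _ =>
>>>   val table = Class.forName("org.apache.iceberg.Table")
>>>   val swts  = Class.forName("org.apache.iceberg.spark.source.SerializableTableWithSize")
>>>   s"Table: ${table.getClassLoader}, SerializableTableWithSize: ${swts.getClassLoader}"
>>> }.distinct().collect().foreach(println)
>>>
>>> And -verbose:class (e.g. via spark.executor.extraJavaOptions) should make
>>> each executor JVM log every class it loads and from which jar.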
>>>
>>> On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> I think it looks like a version mismatch, perhaps between the SC client
>>>> and the server, or between where planning occurs and the executors. The
>>>> error is that `SerializableTableWithSize` is not a subclass of `Table`,
>>>> but it definitely should be. That sort of problem is usually caused by
>>>> class loading issues. Can you double-check that you have only one Iceberg
>>>> runtime in the Environment tab of your Spark cluster?
>>>>
>>>> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>
>>>>> PS - the issue doesn't happen if we don't use Spark Connect and instead
>>>>> just use spark-shell or pyspark, as the OP on GitHub noted as well.
>>>>> However, the stacktrace doesn't seem to point to any class from the
>>>>> spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>>>>
>>>>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>> We are testing Spark Connect with Iceberg.
>>>>>> We tried Spark 3.5 with the Iceberg 1.4.x releases (all of the
>>>>>> iceberg-spark-runtime-3.5_2.12-1.4.x.jar builds).
>>>>>>
>>>>>> With all of the 1.4.x jars we hit the following issue when running
>>>>>> Iceberg queries from a SparkSession created using Spark Connect
>>>>>> (--remote "sc://remote-master-node"):
>>>>>>
>>>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be
>>>>>> cast to org.apache.iceberg.Table at
>>>>>> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>>>> at
>>>>>> org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>>>> at
>>>>>> org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>>>> at
>>>>>> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>>>> at
>>>>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>>>> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at
>>>>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown
>>>>>> Source) at
>>>>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown
>>>>>> Source) at
>>>>>>
>>>>>> Someone else has reported this issue on GitHub as well:
>>>>>> https://github.com/apache/iceberg/issues/8978
>>>>>>
>>>>>> It's currently working with Spark 3.4 and Iceberg 1.3. However, it
>>>>>> would be nice to get it working with Spark 3.5 as well, since 3.5 has
>>>>>> many improvements in Spark Connect.
>>>>>>
>>>>>> Thanks
>>>>>> Nirav
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>