I think it looks like a version mismatch, perhaps between the Spark Connect client and the server, or between where planning occurs and the executors. The error says that `SerializableTableWithSize` cannot be cast to `Table`, but it definitely is a subclass. That sort of problem is usually caused by class loading issues. Can you double-check that you have only one Iceberg runtime jar listed in the Environment tab of your Spark cluster?
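For reference, here is a minimal, self-contained sketch of how that kind of "cannot be cast" error can happen even when the class really is a subclass: if two classloaders each define the same class (e.g. from two copies of the Iceberg runtime jar), the JVM treats them as distinct types. The class names below are illustrative only, not Iceberg's:

```java
import java.io.InputStream;

// Demonstrates that a class loaded by two different classloaders yields two
// distinct runtime types, so instanceof/casts fail despite identical names.
// This mirrors what two Iceberg runtime jars on one cluster can cause.
public class ClassLoaderDemo {
    public static class Payload {}

    // A classloader that defines Payload itself instead of delegating
    // to its parent (simulating a second copy of the same jar).
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Payload.class.getName())) {
                return super.loadClass(name, resolve); // delegate everything else
            }
            try (InputStream in = getResourceAsStream(name.replace('.', '/') + ".class")) {
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length); // redefine locally
            } catch (Exception e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> duplicate = new IsolatingLoader(ClassLoaderDemo.class.getClassLoader())
                .loadClass(Payload.class.getName());
        Object instance = duplicate.getDeclaredConstructor().newInstance();

        // Same fully-qualified name, but different defining loaders:
        System.out.println("same name:  " + duplicate.getName().equals(Payload.class.getName()));
        System.out.println("instanceof: " + (instance instanceof Payload));
        // prints "same name:  true" then "instanceof: false"
    }
}
```

In the Iceberg case, the `SparkInputPartition.table` cast fails for the same structural reason, which is why checking for a single runtime jar in the Environment tab is the first thing to rule out.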
On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:
> PS - the issue doesn't happen if we don't use spark-connect and instead just
> use spark-shell or pyspark, as the OP on GitHub said as well. However, the
> stack trace doesn't seem to point to any class from the spark-connect jar
> (org.apache.spark:spark-connect_2.12:3.5.0).
>
> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>
>> Hi,
>> We are testing spark-connect with iceberg.
>> We tried Spark 3.5 with Iceberg 1.4.x versions (all of the
>> iceberg-spark-runtime-3.5_2.12-1.4.x.jar releases).
>>
>> With all of the 1.4.x jars we hit the following issue when running
>> iceberg queries from a SparkSession created using spark-connect (--remote
>> "sc://remote-master-node"):
>>
>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast
>> to org.apache.iceberg.Table
>>   at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>   at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>   at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>   at
>>
>> Someone else has reported this issue on GitHub as well:
>> https://github.com/apache/iceberg/issues/8978
>>
>> It's currently working with Spark 3.4 and Iceberg 1.3. However, ideally
>> it'd be nice to get it working with Spark 3.5 as well, since 3.5 has many
>> improvements in spark-connect.
>>
>> Thanks
>> Nirav
>>

-- 
Ryan Blue
Tabular