This looks like a version mismatch, probably between the Spark Connect client
and the server, or between where planning occurs and the executors. The error
says that `SerializableTableWithSize` is not a subclass of `Table`, but it
definitely is. That sort of problem is usually caused by class loading
issues. Can you double-check that you have only one Iceberg runtime jar in
the Environment tab of your Spark cluster?
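
To illustrate the failure mode, here is a minimal, self-contained sketch (plain Java, not Iceberg code, run with a JDK): when two class loaders each load their own copy of a class, the JVM treats the copies as distinct types, so an instance from one loader fails a cast against the other. This is the same shape of failure as `SerializableTableWithSize` → `Table` when two Iceberg runtime jars end up on the classpath. The class name `Dup` and the temp-directory setup are purely illustrative.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Compile a trivial class into a temp directory
        // (requires a JDK so the system compiler is available).
        Path dir = Files.createTempDirectory("cl-demo");
        Path src = dir.resolve("Dup.java");
        Files.writeString(src, "public class Dup {}");
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        javac.run(null, null, null, src.toString());

        URL[] urls = { dir.toUri().toURL() };
        // Two independent loaders with a null parent, so each one
        // loads its own copy of Dup from the same class file.
        try (URLClassLoader a = new URLClassLoader(urls, null);
             URLClassLoader b = new URLClassLoader(urls, null)) {
            Class<?> dupA = a.loadClass("Dup");
            Class<?> dupB = b.loadClass("Dup");
            Object obj = dupA.getDeclaredConstructor().newInstance();
            // Same bytes on disk, but two distinct runtime types:
            System.out.println("same Class object? " + (dupA == dupB));
            // An instance from loader A is not an instance of B's copy,
            // so a cast would throw ClassCastException:
            System.out.println("dupB.isInstance(obj)? " + dupB.isInstance(obj));
        }
    }
}
```

Running it prints `false` for both checks, which is why a duplicate Iceberg runtime jar on the classpath makes a class fail a cast against what is nominally its own supertype.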

On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:

> PS - the issue doesn't happen if we don't use spark-connect and instead
> just use spark-shell or pyspark, as the OP on GitHub said as well. However,
> the stacktrace doesn't seem to point to any of the classes from the
> spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>
> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>
>> Hi,
>> We are testing spark-connect with Iceberg.
>> We tried Spark 3.5 with the Iceberg 1.4.x versions (all of
>> iceberg-spark-runtime-3.5_2.12-1.4.x.jar).
>>
>> With all of the 1.4.x jars we hit the following issue when running
>> Iceberg queries from a SparkSession created using spark-connect (--remote
>> "sc://remote-master-node"):
>>
>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast
>> to org.apache.iceberg.Table
>>   at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>   at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>   at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>   at
>>
>> Someone else has reported this issue on github as well:
>> https://github.com/apache/iceberg/issues/8978
>>
>> It currently works with Spark 3.4 and Iceberg 1.3. However, ideally it'd
>> be nice to get it working with Spark 3.5 as well, since 3.5 has many
>> improvements in spark-connect.
>>
>> Thanks
>> Nirav
>>
>

-- 
Ryan Blue
Tabular
