Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-02-28 Thread Nirav Patel
Thanks for sharing those issues. It does seem related to me, based on similar test case failures they had internally. I could try dropping iceberg-runtime into Spark's jars dir to see if that helps avoid this, as the classloading issue seems to come up when the jar is loaded via the --jars arg with spark-connect.

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-02-23 Thread Eduard Tudenhoefner
I wonder if this is somewhat related to https://github.com/apache/spark/commit/6d0fed9a18ff87e73fdf1ee46b6b0d2df8dd5a1b / SPARK-43744, which appears to have fixed issues similar to the ones you are experiencing with Spark 3.5, but maybe some other place

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-02-22 Thread Nirav Patel
Hi Ryan, I updated the Spark JIRA I opened with more information I found after taking a heap dump: https://issues.apache.org/jira/browse/SPARK-46762. Class `org.apache.iceberg.Table` is loaded twice: once by ChildFirstURLClassLoader and once by MutableURLClassLoader. Issue doesn't happen with spa
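
[Editor's sketch] The message above reports finding the duplicate load through a heap dump. A lighter check is to print which classloader defined a class and which jar it came from. The helper below is only an illustration: `ClassOriginReport`, `loaderChain`, and `codeSource` are hypothetical names, not part of Spark or Iceberg, and it assumes you can run arbitrary code inside the JVM you want to inspect.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Spark or Iceberg.
class ClassOriginReport {

    /** Class names of the defining loader and each of its parents, child first. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        // A null loader means the bootstrap loader, which ends every chain.
        chain.add("bootstrap");
        return chain;
    }

    /** Location (jar or directory) the class was defined from, if the JVM knows it. */
    static String codeSource(Class<?> type) {
        CodeSource src = type.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(unknown)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Replace with the class under suspicion, e.g. an Iceberg class on the classpath.
        Class<?> type = ClassOriginReport.class;
        System.out.println(type.getName() + " loaders: " + loaderChain(type));
        System.out.println(type.getName() + " source:  " + codeSource(type));
    }
}
```

Running the same report for both classes involved in the failure would show whether they come from different loaders, from different jars, or both.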

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-01-18 Thread Nirav Patel
Classloading does seem to be an issue, though only when using Spark Connect 3.5 with Iceberg >= 1.4. It's weird because, as I also mentioned in my previous email, after adding the Spark property (spark.executor.userClassPathFirst=true) both classes get loaded from the same classloader - org.apache.

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-01-16 Thread Ryan Blue
It looks to me like the classloader is the problem. The "child first" classloader is apparently loading `Table`, but Spark is loading `SerializableTableWithSize` from the parent classloader. Because delegation isn't happening properly, you're getting two incompatible classes from the same classpath
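
[Editor's sketch] The failure mode described above can be reproduced in plain Java without Spark or Iceberg. In the sketch below, `Table` and `SerializableTable` are hypothetical stand-ins for `org.apache.iceberg.Table` and `SerializableTableWithSize`, and `ChildFirstLoader` is a toy loader written for this example, not Spark's implementation. It defines `Table` itself instead of delegating to its parent, so the JVM ends up with two distinct `Table` classes that share one name. It assumes Java 9+ and classes compiled with javac.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-ins for the Iceberg types named in the thread.
class ClassIdentityDemo {
    interface Table {}
    static class SerializableTable implements Table {}

    /** Toy loader: defines one named class itself and delegates everything else to its parent. */
    static class ChildFirstLoader extends ClassLoader {
        private final String childFirstName;

        ChildFirstLoader(ClassLoader parent, String childFirstName) {
            super(parent);
            this.childFirstName = childFirstName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                if (!name.equals(childFirstName)) {
                    return super.loadClass(name, resolve); // normal parent-first delegation
                }
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Re-define the class from the same bytes the parent would use.
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads Table through a child-first loader whose parent already knows Table. */
    static Class<?> loadTableChildFirst() throws ClassNotFoundException {
        ClassLoader parent = ClassIdentityDemo.class.getClassLoader();
        String tableName = Table.class.getName();
        return new ChildFirstLoader(parent, tableName).loadClass(tableName);
    }

    public static void main(String[] args) throws Exception {
        Class<?> childTable = loadTableChildFirst();
        // Same fully qualified name...
        System.out.println(childTable.getName().equals(Table.class.getName())); // true
        // ...but a different class, because the defining loader differs.
        System.out.println(childTable == Table.class);                            // false
        // The parent's SerializableTable implements the parent's Table only.
        System.out.println(Table.class.isAssignableFrom(SerializableTable.class)); // true
        System.out.println(childTable.isAssignableFrom(SerializableTable.class));  // false
    }
}
```

The last line is the toy version of "`SerializableTableWithSize` is not a subclass of `Table`": the subclass was linked against the parent's copy of the interface, while the check runs against the child's copy.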

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-01-12 Thread Nirav Patel
It seems to be happening on the executor of the SC server, as I see the error in the executor logs. We did verify that there was only one version of iceberg-spark-runtime at the moment. We do include a custom catalog impl jar. Though it's a shaded jar, I don't see "org/apache/iceberg/Table" or other Iceberg classes

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-01-12 Thread Ryan Blue
I think it looks like a version mismatch, perhaps between the SC client and the server or between where planning occurs and the executors. The error is that the `SerializableTableWithSize` is not a subclass of `Table`, but it definitely should be. That sort of problem is usually caused by class loa

Re: Iceberg 1.4/spark3.5 seem to have some breaking issue with spark-connect

2024-01-09 Thread Nirav Patel
PS - the issue doesn't happen if we don't use spark-connect and instead just use spark-shell or pyspark, as the OP on GitHub said as well. However, the stack trace doesn't seem to point to any class from the spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).