Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-23 Thread Dmitri Bourlatchkov
Hi Yun, I'm thinking about this approach: * We compile against unrelocated jars. * We relocate Jackson the same way as Iceberg in our build. * We publish the relocated Spark Client jar (thin), without any bundled classes. * The relocated Spark Client jar depends on Iceberg Spark runtime. * The re

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, Thanks a lot for the information! So it seems that after my previous PR [1857], which reuses the current shadowJar publish, it just publishes the shadow jar, which is included in the module files. It turns out that the POM file we generate has the following once the shadow plugin is used

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, I do not see a non-bundle jar published to my local Maven repo .m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT maven-metadata-local.xml polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, I think there might be a misunderstanding about how jars and packages are published. The shadowJar job is used to publish the bundle jar for the jar use case, where all dependencies are packed and users use it with Spark as follows: --jar polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSH

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, Re: --packages, what I meant to say is that even with PR 1908, the published version has the "bundle" classifier. org.apache.polaris polaris-spark-3.5_2.12 20250620185923 true bundle jar 1.1.0-incubating-SNAPSHOT 20250

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, Regarding this question: *Current docs [1] suggest using `--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908 produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar` (note: bundle, disregard version).* The version number used in the bundle jar is

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
It's simpler to maintain one version for the same dependency instead of two. There is no confusion for developers -- I can foresee anyone looking at the build script asking which Jackson the Spark client eventually ships. Upgrading the version is straightforward. But I'd like to know more detail

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
In any case, IMHO, even updating jackson version numbers in two places is preferable to compiling against shaded packages. On Fri, Jun 20, 2025 at 3:25 PM Dmitri Bourlatchkov wrote: > I suppose we should be able to get the version of Jackson used by Iceberg > from Iceberg POM information, right?

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, Current docs [1] suggest using `--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908 produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar` (note: bundle, disregard version). At least this is what I saw in my local build. Is that a concern? [1] https://
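
For readers following the coordinate question, here is a minimal Python sketch of how a Maven coordinate maps to a jar file name under the standard repository layout. The helpers `jar_file_name` and `parse_coordinate` are hypothetical, written only for illustration; they are not part of Polaris, Spark, or Maven. The sketch shows why a jar published only with the `bundle` classifier has a different file name from the one a plain `groupId:artifactId:version` coordinate refers to.

```python
from typing import Optional, Tuple


def jar_file_name(artifact_id: str, version: str, classifier: Optional[str] = None) -> str:
    """Build a jar file name following the standard Maven layout.

    Hypothetical helper for illustration only.
    """
    suffix = f"-{classifier}" if classifier else ""
    return f"{artifact_id}-{version}{suffix}.jar"


def parse_coordinate(coordinate: str) -> Tuple[str, str, str]:
    """Split a 'groupId:artifactId:version' coordinate into its three parts."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    return parts[0], parts[1], parts[2]


# Coordinate taken from the docs quoted in the thread.
group_id, artifact_id, version = parse_coordinate(
    "org.apache.polaris:polaris-spark-3.5_2.12:1.0.0"
)

# File name that a plain coordinate (no classifier) corresponds to.
plain = jar_file_name(artifact_id, version)

# File name of the same artifact when it carries the "bundle" classifier.
bundled = jar_file_name(artifact_id, version, classifier="bundle")

print(plain)
print(bundled)
```

The two names differ only by the `-bundle` suffix, which is the mismatch raised in the message above.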

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
I suppose we should be able to get the version of Jackson used by Iceberg from Iceberg POM information, right? Cheers, Dmitri. On Fri, Jun 20, 2025 at 3:08 PM Yufei Gu wrote: > That's an interesting idea. But it requires us to maintain the consistency > of the Jackson version in two places inst

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
That's an interesting idea. But it requires us to maintain the consistency of the Jackson version in two places instead of one. The original Jackson version has to match the one shaded in the Iceberg Spark runtime. Every time we update one, we have to remember to update the other. I'm not sure if it

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun and Yufei, > Specifically, why does CreateGenericTableRESTRequest use the shaded Jackson? As discussed off list, request / response payload classes have to work with the version of Jackson included with the Iceberg Spark jars (because they own the RESTClient). That in itself is fine. I'd

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
*-- What is the maven artifact that Spark can automatically pull (via --packages)* Our Spark client pulls the following: org.apache.polaris#polaris-spark-3.5_2.12 org.apache.polaris#polaris-core org.apache.polaris#polaris-api-management-model org.apache.iceberg#iceberg-spark-runtime-3.5_2.12

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
> > ... but I'm not sure that running different code with --jar and --packages > is a good idea, even if the differences are only in references to shaded > classes. If one works without shading, why can the other not work without shading? I agree that it should work consistently for both. That'

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
I'd really prefer to get to the bottom of this issue and support --packages. Using --jar is inconvenient in many cases as it requires manually changing the spark installation on disk. Cheers, Dmitri. On Fri, Jun 20, 2025 at 1:07 PM Yufei Gu wrote: > A bit more context on what [1908] is trying

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
I definitely agree that we should resolve this issue for 1.0. ... but I'm not sure that running different code with --jar and --packages is a good idea, even if the differences are only in references to shaded classes. If one works without shading, why can the other not work without shading? Tha

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Thanks for the quick response, Yun! > org.apache.polaris#polaris-core > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12 IIRC, polaris-core uses Jackson. iceberg-spark-runtime also uses Jackson, but it shades it. I believe I saw issues with using both shaded and non-shaded Jackson in the same S
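
To illustrate why shaded and non-shaded Jackson end up as unrelated classes, here is a small Python sketch that models package relocation as a prefix rewrite on class names. The target prefix `org.apache.iceberg.shaded.` is an assumption about how the Iceberg runtime jar relocates Jackson, and `relocate` is a hypothetical helper, not a real build-tool API.

```python
from typing import Dict

# Assumed relocation rule: original package prefix -> relocated prefix.
# The target prefix is an assumption for illustration; the Iceberg build
# is the authority on the real value.
RELOCATIONS: Dict[str, str] = {
    "com.fasterxml.jackson.": "org.apache.iceberg.shaded.com.fasterxml.jackson.",
}


def relocate(class_name: str, rules: Dict[str, str] = RELOCATIONS) -> str:
    """Rewrite a fully qualified class name if it matches a relocation rule."""
    for original_prefix, shaded_prefix in rules.items():
        if class_name.startswith(original_prefix):
            return shaded_prefix + class_name[len(original_prefix):]
    # Classes outside the relocated packages keep their names.
    return class_name


original = "com.fasterxml.jackson.databind.ObjectMapper"
shaded = relocate(original)

print(original)
print(shaded)
```

Because the two fully qualified names differ, the JVM treats them as distinct types, so code compiled against one cannot pass its objects or annotations to code expecting the other.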

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
A bit more context on what [1908] is trying to resolve: some Iceberg table operations may fail when the `--packages` config is used to pull the Polaris Spark client. IIRC, the write to an Iceberg table failed due to jar conflicts. The details are in the PR description: "the iceberg requires avro 1.12.

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
As for the following point I believe that regardless of the method of including the Client into Spark runtime, the code has to be exactly the same and I doubt it is the same now. WDYT? The code included in the jar for Spark Client is different now with the change, because it now uses a class i

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, Sorry, but I do not think this is just about 1.0. I'd like to clarify what is actually happening with the Spark Client jars. Please see my other (recent) message. Thanks, Dmitri. On Fri, Jun 20, 2025 at 12:18 PM yun zou wrote: > Hi Dmitri, > > Thanks for bringing that up! > > Yes, I th

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, Thanks for bringing that up! Yes, I think it makes sense to put this in 1.0. I can work on the cherry-pick once everything is addressed. Best Regards, Yun On Fri, Jun 20, 2025 at 7:17 AM Dmitri Bourlatchkov wrote: > Hi All, > > Re: PR [1908] let's use this thread to clarify the pro

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Some questions for clarification: * What is the maven artifact that Spark can automatically pull (via --packages)? * Does that artifact use shaded dependencies? * Does that artifact depend on the Iceberg Spark bundle? * Is the _code_ running in Spark the same when the Polaris Spark Client is pulle