[DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi All, Re: PR [1908] let's use this thread to clarify the problems we're trying to solve and options for solutions. As for me, it looks like some refactoring in the way the Spark Client is built and published may be needed. I think it makes sense to clarify this before 1.0 to avoid changes to M

Re: Discussion Regarding Events Instrumentation - GH PR #1904

2025-06-20 Thread Robert Stupp
Thanks for bringing this up. I referenced the concerns that were mentioned on the PR about * the approach not using a `@Decorator`, mixing concerns. * exception/failure handling. For me, these are important topics that need to be considered in the PR. It would be good to respect those concerns

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
As for the following point I believe that regardless of the method of including the Client into Spark runtime, the code has to be exactly the same and I doubt it is the same now. WDYT? The code included in the jar for Spark Client is different now with the change, because it now uses a class i

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
I definitely agree that we should resolve this issue for 1.0. ... but I'm not sure that running different code with --jar and --packages is a good idea, even if the differences are only in references to shaded classes. If one works without shading, why can the other not work without shading? Tha

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
I'd really prefer to get to the bottom of this issue and support --packages. Using --jar is inconvenient in many cases as it requires manually changing the spark installation on disk. Cheers, Dmitri. On Fri, Jun 20, 2025 at 1:07 PM Yufei Gu wrote: > A bit more context on what [1908] is trying

Re: [DISCUSS] Prepare for 1.0 Release

2025-06-20 Thread Prashant Singh
Hey folks, I want to thank the whole community for jumping in on the reviews of the Rollback Compaction on conflicts feature here. I am happy to share that it has now merged. Since the 1.0 boat has not sailed, I cherr

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Thanks for the quick response, Yun! > org.apache.polaris#polaris-core > org.apache.iceberg#iceberg-spark-runtime-3.5_2.12 IIRC, polaris-core uses Jackson. iceberg-spark-runtime also uses Jackson, but it shades it. I believe I saw issues with using both shaded and non-shaded Jackson in the same S

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
A bit more context on what [1908] is trying to resolve: some Iceberg table operations may fail when the `--packages` option is used to pull the Polaris Spark client. IIRC, the write to an Iceberg table failed due to jar conflicts. The details are in the PR description: "the iceberg requires avro 1.12.

Re: [DISCUSS] Prepare for 1.0 Release

2025-06-20 Thread Dmitri Bourlatchkov
Thanks for the heads up, Prashant! I agree that it was a good idea to pull Compaction Rollback into 1.0. Do we want to document this feature, or just mention it in release notes? Cheers, Dmitri. On Fri, Jun 20, 2025 at 1:17 PM Prashant Singh wrote: > Hey folks, > I want to thank the whole com

Re: [DISCUSS] Prepare for 1.0 Release

2025-06-20 Thread Yufei Gu
+1 on documenting on the site, which I don't think is a 1.0 blocker. It's been added to the release notes [1]. [1] https://docs.google.com/document/d/1JDVdQraoEhOIv7agy7WzIuBQdW0_16jW-DBrnanuW7A/edit?tab=t.0 Yufei On Fri, Jun 20, 2025 at 11:08 AM Dmitri Bourlatchkov wrote: > Thanks for th

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
*-- What is the maven artifact that Spark can automatically pull (via --packages)* Our spark client pulls the following: org.apache.polaris#polaris-spark-3.5_2.12 org.apache.polaris#polaris-core org.apache.polaris#polaris-api-management-model org.apache.iceberg#iceberg-spark-runtime-3.5_2.12

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
> > ... but I'm not sure that running different code with --jar and --packages > is a good idea, even if the differences are only in references to shaded > classes. If one works without shading, why can the other not work without shading? I agree that it should work for both consistently. That'

Re: Polaris Community Sync on Events

2025-06-20 Thread Robert Stupp
Let me first second your point on frustration about long standing proposals - I completely feel that pain. I also think that it is frustrating for reviewers when concerns are not addressed. But it is also worth noting that getting to a consensus takes time. Getting something into an OSS project

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun and Yufei, > Specifically, why does CreateGenericTableRESTRequest use the shaded Jackson? As discussed off list, request / response payload classes have to work with the version of Jackson included with the Iceberg Spark jars (because they own the RESTClient). That in itself is fine. I'd
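To make the shaded-Jackson point concrete, here is a minimal sketch of what package relocation does to class names. The relocation prefix `org.apache.iceberg.shaded.` and the helper `relocate` are assumptions for illustration, not taken from the thread or from either build.

```python
from typing import Iterable

# Assumption: the prefix the Iceberg Spark runtime jar relocates bundled
# libraries under. Verify against the actual jar before relying on it.
RELOCATION_PREFIX = "org.apache.iceberg.shaded."


def relocate(class_name: str, relocated_packages: Iterable[str], prefix: str) -> str:
    """Return the class name as it appears after shading (hypothetical helper)."""
    for pkg in relocated_packages:
        if class_name == pkg or class_name.startswith(pkg + "."):
            return prefix + class_name
    return class_name


# Annotation a payload class compiled against plain Jackson would reference.
plain_annotation = "com.fasterxml.jackson.annotation.JsonProperty"

# Annotation a relocated (shaded) Jackson would look for at runtime.
shaded_annotation = relocate(
    plain_annotation, ["com.fasterxml.jackson"], RELOCATION_PREFIX
)

# Different fully qualified names are different classes to the JVM, so a
# shaded mapper does not recognize annotations from the unshaded package.
annotations_compatible = plain_annotation == shaded_annotation
```

Under these assumptions `annotations_compatible` is false, which is one way to read why the payload classes must be tied to whichever Jackson the REST client actually uses.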

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
That's an interesting idea. But it requires us to maintain the consistency of the Jackson version in two places instead of one. The original Jackson version has to match with the one shaded in Iceberg spark runtime. Every time we update one, we have to remember to update another. I'm not sure if it

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
I suppose we should be able to get the version of Jackson used by Iceberg from Iceberg POM information, right? Cheers, Dmitri. On Fri, Jun 20, 2025 at 3:08 PM Yufei Gu wrote: > That's an interesting idea. But it requires us to maintain the consistency > of the Jackson version in two places inst
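As a rough sketch of the "read it from the POM" idea, the snippet below extracts a dependency version from POM XML and compares it with a locally pinned one. The sample POM, its coordinates, its version numbers, and the helper `dependency_version` are all hypothetical. Whether the published Iceberg POM actually lists Jackson this way is an assumption that would need checking.

```python
import xml.etree.ElementTree as ET
from typing import Optional

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM used only for illustration; not a real published POM.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>example-lib</artifactId>
  <version>0.0.1</version>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.0.0</version>
    </dependency>
  </dependencies>
</project>
"""


def dependency_version(pom_xml: str, group_id: str, artifact_id: str) -> Optional[str]:
    """Return the declared version of a direct dependency, or None if absent."""
    root = ET.fromstring(pom_xml.strip())
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        same_group = dep.findtext("m:groupId", namespaces=POM_NS) == group_id
        same_artifact = dep.findtext("m:artifactId", namespaces=POM_NS) == artifact_id
        if same_group and same_artifact:
            return dep.findtext("m:version", namespaces=POM_NS)
    return None


upstream_jackson = dependency_version(
    SAMPLE_POM, "com.fasterxml.jackson.core", "jackson-databind"
)
local_jackson = "2.0.0"  # hypothetical version pinned in the client build
versions_match = upstream_jackson == local_jackson
```

A build-time check along these lines could fail the build when the two versions drift, which would address the "remember to update both places" concern without compiling against shaded packages.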

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, Thanks for bringing that up! Yes, I think it makes sense to put this in 1.0. I can work on the cherry-pick once everything is addressed. Best Regards, Yun On Fri, Jun 20, 2025 at 7:17 AM Dmitri Bourlatchkov wrote: > Hi All, > > Re: PR [1908] let's use this thread to clarify the pro

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Some questions for clarification: * What is the maven artifact that Spark can automatically pull (via --packages)? * Does that artifact use shaded dependencies? * Does that artifact depend on the Iceberg Spark bundle? * Is the _code_ running in Spark the same when the Polaris Spark Client is pulle

Re: [DISCUSS] Prepare for 1.0 Release

2025-06-20 Thread Dmitri Bourlatchkov
Hi All, Posting here for visibility: https://lists.apache.org/thread/0z30f3cfvm41hxlbxgp4fqdpv7mfgnv8 I opened that discussion thread about the new Spark Client plugin. My concern is that the linked PR looks like it may require changing our approach to how we publish Maven artifacts for that clie

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, Sorry, but I do not think this is just about 1.0. I'd like to clarify what is actually happening with the Spark Client jars. Please see my other (recent) message. Thanks, Dmitri. On Fri, Jun 20, 2025 at 12:18 PM yun zou wrote: > Hi Dmitri, > > Thanks for bringing that up! > > Yes, I th

Re: Polaris Community Sync on Events

2025-06-20 Thread Adnan Hemani
> I also think that it is frustrating for reviewers when concerns are not > addressed. and > But there are strong and serious objections, not just from me, around the > technical approach. These objections have not been addressed. I generally agree with these statements regarding frustration from unresolved c

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, Thanks a lot for the information! So it seems that after my previous PR [1857], which reuses the current shadowJar publish, only the shadow jar is published, which is included in the module files. It turns out that the POM file we generate has the following once the shadow plugin is used

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, Current docs [1] suggest using `--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908 produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar` (note: bundle, disregard version). At least this is what I saw in my local build. Is that a concern? [1] https://

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
In any case, IMHO, even updating jackson version numbers in two places is preferable to compiling against shaded packages. On Fri, Jun 20, 2025 at 3:25 PM Dmitri Bourlatchkov wrote: > I suppose we should be able to get the version of Jackson used by Iceberg > from Iceberg POM information, right?

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Yufei Gu
It's simpler to maintain one version for the same dependency instead of two. There is no confusion for developers -- I can foresee anyone looking at the build script asking which Jackson the Spark client eventually ships. Upgrading the version is straightforward. But I'd like to know more detail

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, I think there might be a misunderstanding about how jars and packages are published. The shadowJar job is used to publish the bundle jar for the jar use case, where all dependencies are packed and users use it with Spark as follows: --jar polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSH

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, I do not see a non-bundle jar published to my local Maven repo .m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT maven-metadata-local.xml polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread yun zou
Hi Dmitri, Regarding this question: *Current docs [1] suggest using `--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908 produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar` (note: bundle, disregard version).* The version number used in the bundle jar is

Re: Discussion Regarding Events Instrumentation - GH PR #1904

2025-06-20 Thread Adnan Hemani
Hi Robert, I’ve already responded to those concerns on the PR itself, prior to your comment asking for this email thread. Please take a look and feel free to respond either on the PR or this email thread, now that we’ve alrea

Re: [DISCUSS[ Spark Client jars: maven coordinates and shading

2025-06-20 Thread Dmitri Bourlatchkov
Hi Yun, Re: --packages, what I meant to say is that even with PR 1908, the published version has the "bundle" classifier. org.apache.polaris polaris-spark-3.5_2.12 20250620185923 true bundle jar 1.1.0-incubating-SNAPSHOT 20250
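The classifier concern can be illustrated with a short sketch of the standard Maven file-naming convention, `artifactId-version[-classifier].extension`. The helper names are hypothetical, and the sketch assumes that a plain `group:artifact:version` coordinate resolves the artifact without a classifier. Timestamped snapshot names in remote repositories are ignored here.

```python
from typing import Optional, Tuple


def parse_coordinate(coordinate: str) -> Tuple[str, str, str]:
    """Split a 'groupId:artifactId:version' coordinate (hypothetical helper)."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    return parts[0], parts[1], parts[2]


def artifact_file_name(
    artifact_id: str,
    version: str,
    classifier: Optional[str] = None,
    extension: str = "jar",
) -> str:
    """Build the conventional Maven file name for an artifact."""
    suffix = f"-{classifier}" if classifier else ""
    return f"{artifact_id}-{version}{suffix}.{extension}"


_, artifact_id, version = parse_coordinate(
    "org.apache.polaris:polaris-spark-3.5_2.12:1.1.0-incubating-SNAPSHOT"
)

# File a classifier-less coordinate would be expected to resolve to.
expected_main_jar = artifact_file_name(artifact_id, version)

# File name seen in the local build discussed in this thread.
published_bundle_jar = artifact_file_name(artifact_id, version, classifier="bundle")

names_differ = expected_main_jar != published_bundle_jar
```

If only the `bundle`-classified jar is published, a coordinate without a classifier has no matching main jar to resolve under this convention, which appears to be the mismatch raised above.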