Hi Yun,
I'm thinking about this approach:
* We compile against unrelocated jars.
* We relocate Jackson the same way as Iceberg in our build.
* We publish the relocated Spark Client jar (thin), without any bundled
classes.
* The relocated Spark Client jar depends on Iceberg Spark runtime.
* The re
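To make that concrete, here is a rough Gradle (Kotlin DSL) sketch of the idea.
The plugin id, the placeholder versions, and the thin-jar filtering are
assumptions on my part (and depend on the shadow plugin version), not the
actual Polaris build:

plugins {
  `java-library`
  id("com.github.johnrengelman.shadow") // assumption: whichever shadow plugin the build already uses
}

dependencies {
  // Compile against plain, unrelocated Jackson (placeholder version; must match Iceberg's).
  compileOnly("com.fasterxml.jackson.core:jackson-databind:2.18.2")
  // At runtime Jackson comes from the Iceberg Spark runtime bundle.
  api("org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.0") // placeholder version
}

tasks.shadowJar {
  // Relocate our own Jackson references to the same prefix Iceberg uses.
  relocate("com.fasterxml.jackson", "org.apache.iceberg.shaded.com.fasterxml.jackson")
  // Keep only this module's classes -- no bundled third-party code (thin jar).
  dependencies { exclude { true } }
  // Publish the relocated jar as the main artifact.
  archiveClassifier.set("")
}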
Hi Dmitri,
Thanks a lot for the information! So it seems that after my previous PR
[1857], which reuses the current shadowJar publish, it just publishes the
shadow jar, which is included in the module files.
It turns out that the generated POM file has something like the following
once the shadow plugin is used
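For context, the usual way a shadow-based publication gets wired into
Gradle's maven-publish plugin looks roughly like the following (a generic
Kotlin DSL sketch, not necessarily what the Polaris build or PR [1857] does);
the generated POM then describes that publication:

publishing {
  publications {
    create<MavenPublication>("maven") {
      // The shadow plugin's helper attaches the shadow jar to the publication and
      // writes a POM whose dependency section only lists what was not bundled.
      project.shadow.component(this)
    }
  }
}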
Hi Yun,
I do not see a non-bundle jar published to my local Maven
repo
.m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT
maven-metadata-local.xml
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.
Hi Dmitri,
I think there might be a misunderstanding about how the jars and packages
are published. The shadowJar job is used to publish the bundle jar for the
jar use case, where all dependencies are packed and users use it with Spark
like the following:
--jar polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSH
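For anyone following along, the bundle side of this is roughly the following
shadow configuration (a generic Gradle Kotlin DSL sketch with assumed names,
not the exact Polaris build script):

tasks.shadowJar {
  // Merge the whole runtime classpath into one fat jar and publish it with the
  // "bundle" classifier; that jar is what the --jar usage above points at.
  archiveClassifier.set("bundle")
  mergeServiceFiles()
}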
Hi Yun,
Re: --packages, what I meant to say is that even with PR 1908, the
published version has the "bundle" classifier.
<groupId>org.apache.polaris</groupId>
<artifactId>polaris-spark-3.5_2.12</artifactId>
<lastUpdated>20250620185923</lastUpdated>
<localCopy>true</localCopy>
<snapshotVersion>
  <classifier>bundle</classifier>
  <extension>jar</extension>
  <value>1.1.0-incubating-SNAPSHOT</value>
  <updated>20250
Hi Dmitri,
Regarding this question:
*Current docs [1] suggest using
`--packages org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908
produces `polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar`
(note: bundle, disregard version).*
The version number used in the bundle jar is
It's simpler to maintain one version for the same dependency instead of
two, and there is no confusion for developers -- otherwise I can foresee
anyone looking at the build script asking which Jackson version the Spark
client eventually ships. Upgrading the version is straightforward. But I'd
like to know more detail
In any case, IMHO, even updating the Jackson version number in two places
is preferable to compiling against shaded packages.
On Fri, Jun 20, 2025 at 3:25 PM Dmitri Bourlatchkov
wrote:
> I suppose we should be able to get the version of Jackson used by Iceberg
> from Iceberg POM information, right?
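If it helps, one way to do that in Gradle (Kotlin DSL) is to resolve Iceberg
in a throwaway configuration and read back the Jackson version its POM pulls
in. This is only a sketch; the coordinates, the Iceberg version, and the
compileOnly wiring are placeholders, not the Polaris build:

// Probe iceberg-core's published POM for the Jackson version it brings in.
val icebergProbe = configurations.detachedConfiguration(
  dependencies.create("org.apache.iceberg:iceberg-core:1.9.0") // placeholder Iceberg version
)
val icebergJacksonVersion: String = icebergProbe.resolvedConfiguration.resolvedArtifacts
  .map { it.moduleVersion.id }
  .firstOrNull { it.group == "com.fasterxml.jackson.core" && it.name == "jackson-databind" }
  ?.version
  ?: error("could not determine the Jackson version from Iceberg's POM")

dependencies {
  // Single source of truth: compile against the same Jackson version Iceberg uses.
  compileOnly("com.fasterxml.jackson.core:jackson-databind:$icebergJacksonVersion")
}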
Hi Yun,
Current docs [1] suggest using `--packages
org.apache.polaris:polaris-spark-3.5_2.12:1.0.0` but PR 1908 produces
`polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar` (note:
bundle, disregard version).
At least this is what I saw in my local build. Is that a concern?
[1]
https://
I suppose we should be able to get the version of Jackson used by Iceberg
from Iceberg POM information, right?
Cheers,
Dmitri.
On Fri, Jun 20, 2025 at 3:08 PM Yufei Gu wrote:
> That's an interesting idea. But it requires us to maintain the consistency
> of the Jackson version in two places inst
That's an interesting idea. But it requires us to maintain the consistency
of the Jackson version in two places instead of one. The original Jackson
version has to match the one shaded in the Iceberg Spark runtime. Every
time we update one, we have to remember to update the other. I'm not sure if
it
Hi Yun and Yufei,
> Specifically, why does CreateGenericTableRESTRequest use the shaded
Jackson?
As discussed off list, request / response payload classes have to work with
the version of Jackson included with the Iceberg Spark jars (because they
own the RESTClient).
That in itself is fine.
I'd
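To illustrate that constraint with a toy example (Kotlin, not the actual
Polaris class, and assuming Iceberg's usual org.apache.iceberg.shaded.*
relocation prefix):

// A payload class (de)serialized by the RESTClient that ships inside
// iceberg-spark-runtime has to reference the Jackson classes that jar actually
// contains, i.e. the relocated packages rather than plain com.fasterxml.jackson.*.
import org.apache.iceberg.shaded.com.fasterxml.jackson.annotation.JsonCreator
import org.apache.iceberg.shaded.com.fasterxml.jackson.annotation.JsonProperty

data class CreateGenericTableRequestSketch @JsonCreator constructor(
  @JsonProperty("name") val name: String,
  @JsonProperty("format") val format: String
)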
*-- What is the maven artifact that Spark can automatically pull (via
--packages)?*
Our Spark client pulls the following:
org.apache.polaris#polaris-spark-3.5_2.12
org.apache.polaris#polaris-core
org.apache.polaris#polaris-api-management-model
org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
>
> ... but I'm not sure that running different code with --jar and --packages
> is a good idea, even if the differences are only in references to shaded
> classes.
If one works without shading, why can the other not work without shading?
I agree that it should work for both consistently. That'
I'd really prefer to get to the bottom of this issue and support --packages.
Using --jar is inconvenient in many cases as it requires manually changing
the Spark installation on disk.
Cheers,
Dmitri.
On Fri, Jun 20, 2025 at 1:07 PM Yufei Gu wrote:
> A bit more context on what [1908] is trying
I definitely agree that we should resolve this issue for 1.0.
... but I'm not sure that running different code with --jar and --packages
is a good idea, even if the differences are only in references to shaded
classes.
If one works without shading, why can the other not work without shading?
Tha
Thanks for the quick response, Yun!
> org.apache.polaris#polaris-core
> org.apache.iceberg#iceberg-spark-runtime-3.5_2.12
IIRC, polaris-core uses Jackson. iceberg-spark-runtime also uses Jackson,
but it shades it.
I believe I saw issues with using both shaded and non-shaded Jackson in the
same S
A bit more context on what [1908] is trying to resolve: some Iceberg table
operations may fail when the `--packages` config is used to pull the Polaris
Spark client. IIRC, a write to an Iceberg table failed due to jar conflicts.
The details are in the PR description: "the iceberg requires avro
1.12.
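Just to illustrate the kind of conflict being described (this is not what PR
1908 actually does), aligning a transitive version in Gradle (Kotlin DSL)
looks something like:

dependencies {
  constraints {
    // Hypothetical: pin Avro to the line Iceberg expects so that an older
    // transitive copy cannot win conflict resolution.
    implementation("org.apache.avro:avro:1.12.0") {
      because("iceberg-spark-runtime expects Avro 1.12")
    }
  }
}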
As for the following point:
I believe that regardless of the method of including the Client into Spark
runtime, the code has to be exactly the same and I doubt it is the same
now. WDYT?
The code included in the jar for the Spark Client is different now with
the change, because it now uses a class i
Hi Yun,
Sorry, but I do not think this is just about 1.0. I'd like to clarify what
is actually happening with the Spark Client jars. Please see my other
(recent) message.
Thanks,
Dmitri.
On Fri, Jun 20, 2025 at 12:18 PM yun zou wrote:
> Hi Dmitri,
>
> Thanks for bringing that up!
>
> Yes, I th
Hi Dmitri,
Thanks for bringing that up!
Yes, I think it makes sense to put this in 1.0. I can work on the
cherry-pick once everything is addressed.
Best Regards,
Yun
On Fri, Jun 20, 2025 at 7:17 AM Dmitri Bourlatchkov
wrote:
> Hi All,
>
> Re: PR [1908] let's use this thread to clarify the pro
Some questions for clarification:
* What is the maven artifact that Spark can automatically pull (via
--packages)?
* Does that artifact use shaded dependencies?
* Does that artifact depend on the Iceberg Spark bundle?
* Is the _code_ running in Spark the same when the Polaris Spark Client is
pulle