So we basically agree to rename the Spark artifacts. Are there any other concerns about this PR: https://github.com/apache/iceberg/pull/4158/ ?
On Wed, Feb 23, 2022 at 1:48 AM Ryan Blue <b...@tabular.io> wrote:

> I initially supported not renaming for the reason that Jeff raised, but
> now I'm more convinced by Kyle's argument. This is confusing and it isn't
> that big of a problem to use a different Jar. +1 to renaming.
>
> On Sun, Feb 20, 2022 at 10:57 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Agreed with Kyle. An artifact name for Spark 3.0 like
>> iceberg-spark-runtime-3.0_2.12-0.13.1.jar is more accurate and
>> consistent, and less confusing for users.
>>
>> On Sun, Feb 20, 2022 at 10:47 PM Kyle Bendickson <k...@tabular.io> wrote:
>>
>>> Thanks for bringing this up Jeff!
>>>
>>> Normally I agree, it's not good practice to change an artifact name.
>>> However, in this case, the artifact has changed already. The
>>> "spark3-runtime" used to be for all versions of Spark 3 (at the time
>>> Spark 3.0 and 3.1). It no longer is, as it's only tested / used with
>>> Spark 3.0.
>>>
>>> I encounter many users who have upgraded to newer versions of Spark but
>>> have not switched to the new Spark-versioned artifact names, because
>>> "spark3-runtime" sounds like it encompasses all versions. They then
>>> encounter subtle bugs, and debugging an upgrade that way is not a great
>>> user experience.
>>>
>>> These users are, however, updating the Iceberg artifact to the new
>>> versions.
>>>
>>> So I think in this case, breaking the naming has benefits. When users go
>>> to upgrade after a new Iceberg version is released and their dependency
>>> is not found, they will hopefully check Maven and see the new naming
>>> convention / artifacts.
>>>
>>> So I support option 2 also, with naming that includes the Spark and
>>> Scala versions. Otherwise, we will continue to see people using the old
>>> "spark3-runtime" as they upgrade Spark versions and encounter subtle
>>> errors (class not found, wrong type signatures due to version mismatch).
>>>
>>> Users eventually have to update their pom if / when they upgrade Spark,
>>> due to incompatibility. This way, at least, the breakage will be loud,
>>> as there won't be a new Iceberg version under the old artifact name.
>>>
>>> Is it possible to mark the old spark3-runtime / spark-runtime as
>>> deprecated or otherwise point to the new artifacts in Maven?
>>>
>>> - Kyle
>>>
>>> On Sun, Feb 20, 2022 at 9:41 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> I don't think it is best practice to just change the artifact name of
>>>> published jars. Unless we publish a new version with the new naming
>>>> convention.
>>>>
>>>> On Mon, Feb 21, 2022 at 12:36 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> I think option 2 is ideal, but I don't know if there is any hard
>>>>> requirement from the ASF/Maven Central side for us to keep backwards
>>>>> compatibility of package names published in Maven. If there is such a
>>>>> requirement then we cannot change it.
>>>>>
>>>>> As a mitigation, I stated in
>>>>> https://iceberg.apache.org/multi-engine-support that the Spark 2.4 and
>>>>> 3.0 jar names do not follow the naming convention of newer versions,
>>>>> for backwards compatibility.
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>> On Sun, Feb 20, 2022 at 7:03 PM OpenInx <open...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone
>>>>>>
>>>>>> The current Spark 2.4 and Spark 3.0 runtime artifact names are not
>>>>>> aligned with those of the newer Spark versions:
>>>>>>
>>>>>> # Spark 2.4
>>>>>> iceberg-spark-runtime-0.13.1.jar
>>>>>> # Spark 3.0
>>>>>> iceberg-spark3-runtime-0.13.1.jar
>>>>>> # Spark 3.1
>>>>>> iceberg-spark-runtime-3.1_2.12-0.13.1.jar
>>>>>> # Spark 3.2
>>>>>> iceberg-spark-runtime-3.2_2.12-0.13.1.jar
>>>>>>
>>>>>> From the Spark 3.1 and Spark 3.2 runtime artifact names, we can
>>>>>> easily recognize:
>>>>>> 1. What's the Spark major version that the runtime jar is attached to
>>>>>> 2. What's the Scala version that the runtime jar is compiled with
>>>>>>
>>>>>> But for Spark 3.0 and Spark 2.4, it's not easy to work out that
>>>>>> information. I think we kept those legacy names because they were
>>>>>> introduced in older Iceberg releases and we wanted to avoid changing
>>>>>> the modules that users depend on, so we opted not to rename, but they
>>>>>> are indeed causing confusion for new community users.
>>>>>>
>>>>>> In general, we have two options:
>>>>>>
>>>>>> Option#1: Keep the current artifact names. That means Spark 2.4 &
>>>>>> Spark 3.0 will always use iceberg-spark-runtime-<iceberg-version>.jar
>>>>>> and iceberg-spark3-runtime-<iceberg-version>.jar until they are
>>>>>> retired in the official Apache Iceberg repo.
>>>>>> Option#2: Change the Spark 2.4 & Spark 3.0 artifact names to the
>>>>>> generic name format:
>>>>>> iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar.
>>>>>> This gives all the Spark versions one consistent name format.
>>>>>>
>>>>>> Personally, I'd prefer option#2 because it looks more friendly for
>>>>>> new community users (although it will require existing users to
>>>>>> change their pom.xml to the new artifact name).
>>>>>>
>>>>>> What is your preference?
>>>>>>
>>>>>> Reference:
>>>>>> 1. Created a PR to change the artifact names and we had a few
>>>>>> discussions there. https://github.com/apache/iceberg/pull/4158
>>>>>> 2.
>>>>>> https://github.com/apache/iceberg-docs/pull/27#discussion_r800297155
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>>
>>> --
>> Best,
>>
>> Yufei
>>
>> `This is not a contribution`
>>
>
>
> --
> Ryan Blue
> Tabular
>
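
For reference, the generic name format from Option#2 quoted above can be sketched in a few lines of Python. This is only an illustration and not part of the Iceberg build: the helper names runtime_jar_name and parse_runtime_jar_name are hypothetical, and the version strings are the ones used as examples in the thread.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the generic format quoted in the thread:
# iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar
_GENERIC_NAME = re.compile(r"iceberg-spark-runtime-(\d+\.\d+)_(\d+\.\d+)-(.+)\.jar")


def runtime_jar_name(spark_version: str, scala_version: str, iceberg_version: str) -> str:
    """Build a runtime jar file name in the generic (Option#2) format."""
    return f"iceberg-spark-runtime-{spark_version}_{scala_version}-{iceberg_version}.jar"


def parse_runtime_jar_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (spark, scala, iceberg) versions, or None if the name is not in the generic format."""
    match = _GENERIC_NAME.fullmatch(name)
    return match.groups() if match else None


# The Spark 3.0 example from the thread, in the generic format.
print(runtime_jar_name("3.0", "2.12", "0.13.1"))

# The legacy names carry no Spark or Scala version, so they do not parse.
for jar in ("iceberg-spark-runtime-3.2_2.12-0.13.1.jar",
            "iceberg-spark3-runtime-0.13.1.jar",
            "iceberg-spark-runtime-0.13.1.jar"):
    print(jar, "->", parse_runtime_jar_name(jar))
```

The point of the sketch is that a name in the generic format can be decoded mechanically into Spark, Scala, and Iceberg versions, while the two legacy names cannot.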