So we basically agree to rename the Spark artifacts. Are there any
other concerns with this PR: https://github.com/apache/iceberg/pull/4158/ ?
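To make the rename concrete, here is roughly how a Spark 3.0 user's Maven dependency would change under option 2 (a sketch based on the 0.13.1 example names in this thread; the org.apache.iceberg group id is assumed to stay unchanged):

```xml
<!-- Before: legacy name, no Spark/Scala version in the artifact id -->
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-spark3-runtime</artifactId>
  <version>0.13.1</version>
</dependency>

<!-- After: Spark and Scala versions encoded in the artifact id -->
<dependency>
  <groupId>org.apache.iceberg</groupId>
  <artifactId>iceberg-spark-runtime-3.0_2.12</artifactId>
  <version>0.13.1</version>
</dependency>
```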

On Wed, Feb 23, 2022 at 1:48 AM Ryan Blue <b...@tabular.io> wrote:

> I initially supported not renaming for the reason that Jeff raised, but
> now I'm more convinced by Kyle's argument. This is confusing and it isn't
> that big of a problem to use a different Jar. +1 to renaming.
>
> On Sun, Feb 20, 2022 at 10:57 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> Agreed with Kyle. A Spark 3.0 artifact name like
>> iceberg-spark-runtime-3.0_2.12-0.13.1.jar is more accurate, more
>> consistent, and less confusing for users.
>>
>> On Sun, Feb 20, 2022 at 10:47 PM Kyle Bendickson <k...@tabular.io> wrote:
>>
>>> Thanks for bringing this up, Jeff!
>>>
>>> Normally I'd agree that it's not good practice to change an artifact
>>> name. However, in this case the artifact has already changed: the
>>> “spark3-runtime” jar used to cover all versions of Spark 3 (at the time,
>>> Spark 3.0 and 3.1). It no longer does, as it's only tested and used with
>>> Spark 3.0.
>>>
>>> I encounter many users who have upgraded to newer versions of Spark but
>>> have not switched the artifact to the new per-Spark-version naming
>>> scheme, because “spark3-runtime” sounds like it encompasses all versions.
>>> They then hit subtle bugs, and debugging an upgrade that way is not a
>>> great user experience.
>>>
>>> These users are, however, updating the Iceberg artifact to the new
>>> versions.
>>>
>>> So I think in this case, breaking the naming has benefits. When users go
>>> to upgrade after a new Iceberg version is released and their dependency
>>> is not found, they will hopefully check Maven and see the new naming
>>> convention and artifacts.
>>>
>>> So I also support option 2, naming with both the Spark and Scala
>>> versions. Otherwise, we'll continue to see people using the old
>>> “spark3-runtime” as they upgrade Spark versions and encounter subtle
>>> errors (class not found, wrong type signatures due to version mismatch).
>>>
>>> Users eventually have to update their pom if and when they upgrade
>>> Spark, due to incompatibility. This way, at least, the break will be
>>> loud, since new Iceberg versions won't be published under the old names.
>>>
>>> Is it possible to mark the old spark3-runtime / spark-runtime artifacts
>>> as deprecated, or otherwise point to the new artifacts in Maven?
>>>
>>> - Kyle
>>>
>>> On Sun, Feb 20, 2022 at 9:41 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> I don't think it is best practice to simply change the artifact name of
>>>> published jars, unless we publish a new version under the new naming
>>>> convention.
>>>>
>>>> On Mon, Feb 21, 2022 at 12:36 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>
>>>>> I think option 2 is ideal, but I don't know if there is any hard
>>>>> requirement from the ASF/Maven Central side for us to keep backwards
>>>>> compatibility of artifact names published in Maven. If there is such a
>>>>> requirement, then we cannot change them.
>>>>>
>>>>> As a mitigation, I noted at
>>>>> https://iceberg.apache.org/multi-engine-support that the Spark 2.4 and
>>>>> 3.0 jar names do not follow the naming convention of newer versions,
>>>>> for backwards compatibility.
>>>>>
>>>>> Best,
>>>>> Jack Ye
>>>>>
>>>>> On Sun, Feb 20, 2022 at 7:03 PM OpenInx <open...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone
>>>>>>
>>>>>> The current Spark 2.4 and Spark 3.0 modules have the following
>>>>>> unaligned runtime artifact names:
>>>>>>
>>>>>> # Spark 2.4
>>>>>> iceberg-spark-runtime-0.13.1.jar
>>>>>> # Spark 3.0
>>>>>> iceberg-spark3-runtime-0.13.1.jar
>>>>>> # Spark 3.1
>>>>>> iceberg-spark-runtime-3.1_2.12-0.13.1.jar
>>>>>> # Spark 3.2
>>>>>> iceberg-spark-runtime-3.2_2.12-0.13.1.jar
>>>>>>
>>>>>> From the Spark 3.1 and Spark 3.2 runtime artifact names, we can
>>>>>> easily recognize:
>>>>>> 1. The Spark version that the runtime jar targets
>>>>>> 2. The Scala version that the runtime jar is compiled with
>>>>>>
>>>>>> But for Spark 3.0 and Spark 2.4, that information is not easy to read
>>>>>> from the name. I think we kept those legacy names because they were
>>>>>> introduced in older Iceberg releases and we wanted to avoid changing
>>>>>> the modules that users depend on, but they are indeed causing
>>>>>> confusion for new community users.
>>>>>>
>>>>>> In general, we have two options:
>>>>>>
>>>>>> Option#1: Keep the current artifact names. That means Spark 2.4 &
>>>>>> Spark 3.0 will always use iceberg-spark-runtime-<iceberg-version>.jar
>>>>>> and iceberg-spark3-runtime-<iceberg-version>.jar until they are
>>>>>> retired from the official Apache Iceberg repo.
>>>>>> Option#2: Change the Spark 2.4 & Spark 3.0 artifact names to the
>>>>>> generic format:
>>>>>> iceberg-spark-runtime-<spark-major.minor>_<scala-version>-<iceberg-version>.jar.
>>>>>> That gives all Spark versions a consistent name format.
>>>>>>
>>>>>> Personally, I'd prefer option#2 because it is friendlier for new
>>>>>> community users (although it will require existing users to change
>>>>>> their pom.xml to the new artifact names).
>>>>>>
>>>>>> What is your preference?
>>>>>>
>>>>>> References:
>>>>>> 1. A PR to change the artifact names, where we had a few
>>>>>> discussions: https://github.com/apache/iceberg/pull/4158
>>>>>> 2.
>>>>>> https://github.com/apache/iceberg-docs/pull/27#discussion_r800297155
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>>
>>> --
>> Best,
>>
>> Yufei
>>
>> `This is not a contribution`
>>
>
>
> --
> Ryan Blue
> Tabular
>
