Re: [DISCUSS] Spark 2.5 release

Sean Owen Fri, 20 Sep 2019 11:26:05 -0700

I don't know enough about DSv2 to comment on this part, but, any
theoretical 2.5 is still a ways off. Does waiting for 3.0 to 'stabilize' it
as much as is possible help?


I say that because, re: Java 11, the main breaking changes are probably the
Hive 2 / Hadoop 3 dependency, JPMML (minor), the general classloader
changes, and the handling of off-heap memory. These aren't big breaks, but
they are probably going to break some things. I think we'd want to see a
'proof of concept' branch to evaluate just how much has to change to get it
working, and that is why I think a 2.5 release would still need more
investigation.

On Fri, Sep 20, 2019 at 1:19 PM Ryan Blue <rb...@netflix.com.invalid> wrote:

> > DSv2 is far from stable right?
>
> No, I think it is reasonably stable and very close to being ready for a
> release.
>
> > All the actual data types are unstable and you guys have completely
> > ignored that.
>
> I think what you're referring to is the use of `InternalRow`. That's a
> stable API and there has been no work to avoid using it. In any case, I
> don't think that anyone is suggesting that we delay 3.0 until a replacement
> for `InternalRow` is added, right?
>
> While I understand the motivation for a better solution here, I think the
> pragmatic solution is to continue using `InternalRow`.
>
> > If the goal is to make DSv2 work across 3.x and 2.x, that seems too
> > invasive of a change to backport once you consider the parts needed to
> > make dsv2 stable.
>
> I believe that those of us working on DSv2 are confident about the current
> stability. We set goals for what to get into the 3.0 release months ago and
> have very nearly reached the point where we are ready for that release.
>
> I don't think instability would be a problem in maintaining compatibility
> between the 2.5 version and the 3.0 version. If we find that we need to
> make API changes (other than additions) then we can make those in the 3.1
> release. The goals we set for the 3.0 release have been reached with the
> current API, so if we are ready to release 3.0, we can release a 2.5 with
> the same API.
>
> On Fri, Sep 20, 2019 at 11:05 AM Reynold Xin <r...@databricks.com> wrote:
>
>> DSv2 is far from stable right? All the actual data types are unstable and
>> you guys have completely ignored that. We'd need to work on that and that
>> will be a breaking change. If the goal is to make DSv2 work across 3.x and
>> 2.x, that seems too invasive of a change to backport once you consider the
>> parts needed to make dsv2 stable.
>>
>>
>>
>> On Fri, Sep 20, 2019 at 10:47 AM, Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> In the DSv2 sync this week, we talked about a possible Spark 2.5 release
>>> based on the latest Spark 2.4, but with DSv2 and Java 11 support added.
>>>
>>> A Spark 2.5 release with these two additions will help people migrate to
>>> Spark 3.0 when it is released because they will be able to use a single
>>> implementation for DSv2 sources that works in both 2.5 and 3.0. Similarly,
>>> upgrading to 3.0 won't also require updating to Java 11 because users
>>> could update to Java 11 with the 2.5 release and have fewer major changes.
>>>
>>> Another reason to consider a 2.5 release is that many people are
>>> interested in a release with the latest DSv2 API and support for DSv2 SQL.
>>> I'm already going to be backporting DSv2 support to the Spark 2.4 line, so
>>> it makes sense to share this work with the community.
>>>
>>> This release line would just consist of backports like DSv2 and Java 11
>>> that assist compatibility, to keep the scope of the release small. The
>>> purpose is to assist people moving to 3.0 and not distract from the 3.0
>>> release.
>>>
>>> Would a Spark 2.5 release help anyone else? Are there any concerns about
>>> this plan?
>>>
>>>
>>> rb
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
