We made a lot of "breaking" changes to data source v2 in 2.4, though I agree SPARK-24882 is the most "breaking" of them.
I don't agree that SPARK-24882 is half-baked. But I'm willing to revert it if we have a bunch of data source v2 users who are not willing to make extensive updates to their implementations before the data source v2 API is stabilized.

On Mon, Sep 10, 2018 at 2:55 PM Arun Mahadevan <ar...@apache.org> wrote:

> Ryan's proposal makes a lot of sense. It's better not to release half-baked
> changes in 2.4 which not only break a lot of the APIs released in 2.3, but
> are also expected to change further due to redesigns before 3.0, so I don't
> see much value in releasing them in 2.4.
>
> On Sun, 9 Sep 2018 at 22:42, Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> Strictly speaking, data source v2 is always half-finished until we mark
>> it as stable. We need some small milestones to move forward step by step.
>>
>> The redesign also happens in an incremental way. SPARK-24882 mostly focuses
>> on the "RDD" part of the API: the separation of reader factory and input
>> partitions, the introduction of ScanConfig, etc. Then we focus on the
>> high-level abstraction and want to change the "table" part of the API.
>>
>> In my understanding, each PR should be self-contained. If we are OK to
>> have SPARK-24882 in master as an individual commit, I think it's also OK to
>> have it in branch 2.4.
>>
>> I've created https://issues.apache.org/jira/browse/SPARK-25390 to track
>> the new abstraction. It doesn't change the API a lot, but updates the
>> streaming execution engine quite a bit.
>>
>> Thanks,
>> Wenchen
>>
>> On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Wenchen, can you hold off on the first RC?
>>>
>>> The half-finished changes from the redesign of the DataSourceV2 API are
>>> in master, added in SPARK-24882
>>> <https://github.com/apache/spark/pull/22009>, and are now in the 2.4
>>> branch. We've had a lot of good discussion since that PR was merged to
>>> update and fix the design, plus only one of the follow-ups on
>>> SPARK-25186 <https://issues.apache.org/jira/browse/SPARK-25186> is
>>> done. Clearly, the redesign was too large to get into 2.4 in so little time
>>> -- it was proposed about 10 days before the original branch date -- and I
>>> don't think it is a good idea to release half-finished major changes.
>>>
>>> The easiest solution is to revert SPARK-24882 in the release branch.
>>> That way we have minor changes in 2.4 and major changes in the next
>>> release, instead of major changes in both. What does everyone think?
>>>
>>> rb
>>>
>>> On Fri, Sep 7, 2018 at 10:37 AM shane knapp <skn...@berkeley.edu> wrote:
>>>
>>>> ++joshrosen (thanks for the help w/deploying the jenkins configs)
>>>>
>>>> the basic 2.4 builds are deployed and building!
>>>>
>>>> i haven't created (a) build(s) yet for scala 2.12... i'll be
>>>> coordinating this w/the databricks folks next week.
>>>>
>>>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, Shane! :D
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp <skn...@berkeley.edu>
>>>>> wrote:
>>>>>
>>>>>> i'll try and get to the 2.4 branch stuff today...
>>>>
>>>> --
>>>> Shane Knapp
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix