Re: Branch 2.4 is cut

2018-09-09 Thread Ryan Blue
Wenchen, can you hold off on the first RC?

The half-finished changes from the redesign of the DataSourceV2 API are in
master, added in SPARK-24882, and are now in the 2.4 branch. We've had a lot
of good discussion since that PR was merged to update and fix the design,
plus only one of the follow-ups on SPARK-25186 is done. Clearly, the
redesign was too large to get into 2.4 in so little time -- it was proposed
about 10 days before the original branch date -- and I don't think it is a
good idea to release half-finished major changes.

The easiest solution is to revert SPARK-24882 in the release branch. That
way we have minor changes in 2.4 and major changes in the next release,
instead of major changes in both. What does everyone think?

rb

On Fri, Sep 7, 2018 at 10:37 AM shane knapp wrote:

> ++joshrosen  (thanks for the help w/deploying the jenkins configs)
>
> the basic 2.4 builds are deployed and building!
>
> i haven't created (a) build(s) yet for scala 2.12...  i'll be coordinating
> this w/the databricks folks next week.
>
> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun wrote:
>
>> Thank you, Shane! :D
>>
>> Bests,
>> Dongjoon.
>>
>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp wrote:
>>
>>> i'll try and get to the 2.4 branch stuff today...
>>>
>>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Ryan Blue
Software Engineer
Netflix


Re: Branch 2.4 is cut

2018-09-09 Thread Wenchen Fan
Strictly speaking, data source v2 is always half-finished until we mark it
as stable. We need some small milestones to move forward step by step.

The redesign also happens in an incremental way. SPARK-24882 mostly focuses
on the "RDD" part of the API: the separation of reader factory and input
partitions, the introduction of ScanConfig, etc. Then we focus on the
high-level abstraction and want to change the "table" part of the API.

In my understanding, each PR should be self-contained. If we are OK to
have SPARK-24882 in master as an individual commit, I think it's also OK to
have it in branch 2.4.

I've created https://issues.apache.org/jira/browse/SPARK-25390 to track the
new abstraction. It doesn't change the API a lot, but updates the streaming
execution engine quite a bit.

Thanks,
Wenchen

On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue wrote:

> Wenchen, can you hold off on the first RC?
>
> The half-finished changes from the redesign of the DataSourceV2 API are in
> master, added in SPARK-24882, and are now in the 2.4 branch. We've had a lot
> of good discussion since that PR was merged to update and fix the design,
> plus only one of the follow-ups on SPARK-25186 is done. Clearly, the
> redesign was too large to get into 2.4 in so little time -- it was proposed
> about 10 days before the original branch date -- and I don't think it is a
> good idea to release half-finished major changes.
>
> The easiest solution is to revert SPARK-24882 in the release branch. That
> way we have minor changes in 2.4 and major changes in the next release,
> instead of major changes in both. What does everyone think?
>
> rb
>
> On Fri, Sep 7, 2018 at 10:37 AM shane knapp wrote:
>
>> ++joshrosen  (thanks for the help w/deploying the jenkins configs)
>>
>> the basic 2.4 builds are deployed and building!
>>
>> i haven't created (a) build(s) yet for scala 2.12...  i'll be
>> coordinating this w/the databricks folks next week.
>>
>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun wrote:
>>
>>> Thank you, Shane! :D
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp wrote:
>>>
>>>> i'll try and get to the 2.4 branch stuff today...
>>>>
>>>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: Branch 2.4 is cut

2018-09-09 Thread Arun Mahadevan
Ryan's proposal makes a lot of sense. It's better not to release half-baked
changes in 2.4 that not only break a lot of the APIs released in 2.3, but
are also expected to change further due to redesigns before 3.0, so I don't
see much value in releasing them in 2.4.

On Sun, 9 Sep 2018 at 22:42, Wenchen Fan wrote:

> Strictly speaking, data source v2 is always half-finished until we mark it
> as stable. We need some small milestones to move forward step by step.
>
> The redesign also happens in an incremental way. SPARK-24882 mostly focuses
> on the "RDD" part of the API: the separation of reader factory and input
> partitions, the introduction of ScanConfig, etc. Then we focus on the
> high-level abstraction and want to change the "table" part of the API.
>
> In my understanding, each PR should be self-contained. If we are OK to
> have SPARK-24882 in master as an individual commit, I think it's also OK to
> have it in branch 2.4.
>
> I've created https://issues.apache.org/jira/browse/SPARK-25390 to track
> the new abstraction. It doesn't change the API a lot, but updates the
> streaming execution engine quite a bit.
>
> Thanks,
> Wenchen
>
> On Mon, Sep 10, 2018 at 4:20 AM Ryan Blue wrote:
>
>> Wenchen, can you hold off on the first RC?
>>
>> The half-finished changes from the redesign of the DataSourceV2 API are
>> in master, added in SPARK-24882, and are now in the 2.4 branch. We've had
>> a lot of good discussion since that PR was merged to update and fix the
>> design, plus only one of the follow-ups on SPARK-25186 is done. Clearly,
>> the redesign was too large to get into 2.4 in so little time -- it was
>> proposed about 10 days before the original branch date -- and I don't think
>> it is a good idea to release half-finished major changes.
>>
>> The easiest solution is to revert SPARK-24882 in the release branch. That
>> way we have minor changes in 2.4 and major changes in the next release,
>> instead of major changes in both. What does everyone think?
>>
>> rb
>>
>> On Fri, Sep 7, 2018 at 10:37 AM shane knapp wrote:
>>
>>> ++joshrosen  (thanks for the help w/deploying the jenkins configs)
>>>
>>> the basic 2.4 builds are deployed and building!
>>>
>>> i haven't created (a) build(s) yet for scala 2.12...  i'll be
>>> coordinating this w/the databricks folks next week.
>>>
>>> On Fri, Sep 7, 2018 at 9:53 AM, Dongjoon Hyun wrote:
>>>
>>>> Thank you, Shane! :D
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>> On Fri, Sep 7, 2018 at 9:51 AM shane knapp wrote:
>>>>
>>>>> i'll try and get to the 2.4 branch stuff today...
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>