Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Dongjoon Hyun
Hi, Xiao.

I agree.

> Merging the feature work after the branch cut should not be
> encouraged in general, although some committers did make some exceptions
> based on their own judgement. We should try to avoid merging the feature
> work after the branch cut.

The Apache Spark community has already accepted your request for a delay
(from early November to early December):

- https://github.com/apache/spark-website/commit/0cd0bdc80503882b4737db7e77cc8f9d17ec12ca

I don't think the branch cut should be delayed again. We don't need to have
two weeks after Hyukjin's email.

Given the delay, I'd strongly recommend cutting the branch on 1st December.

If Hyukjin is busy, I'll create `branch-3.1` on 1st December so that we can
start stabilizing.

Again, it will not block you if you have an exceptional request.

However, it would be helpful for all of us if you make it clear what
features you are waiting for now.

We are creating Apache Spark together.

Bests,
Dongjoon.


On Thu, Nov 19, 2020 at 11:38 PM Xiao Li  wrote:

> Correction:
>
> Merging the feature work after the branch cut should not be encouraged in
> general, although some committers did make some exceptions based on their
> own judgement. We should try to avoid merging the feature work after the
> branch cut.
>
> This email is a good reminder message. At least, we have two weeks
> ahead of the proposed branch cut date. I hope each feature owner might
> hurry up and try to finish it before the branch cut.
>
> Xiao
>
> On Thu, Nov 19, 2020 at 11:36 PM Xiao Li wrote:
>
>> We should try to merge the feature work after the branch cut. This should
>> not be encouraged in general, although some committers did make some
>> exceptions based on their own judgement.
>>
>> This email is a good reminder message. At least, we have two weeks
>> ahead of the proposed branch cut date. I hope each feature owner might
>> hurry up and try to finish it before the branch cut.
>>
>> Xiao
>>
>> On Thu, Nov 19, 2020 at 4:02 PM Dongjoon Hyun wrote:
>>
>>> Thank you for your volunteering!
>>>
>>> Since the previous branch cuts were always soft code freezes, which still
>>> allowed committers to merge to the new branches for a while, I believe
>>> 1st December will be better for stabilization.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Thu, Nov 19, 2020 at 3:50 PM Hyukjin Kwon 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I think we haven’t yet decided the exact branch-cut date, code freeze,
>>>> and release manager.
>>>>
>>>> As we planned in https://spark.apache.org/versioning-policy.html
>>>>
>>>> Early Dec 2020: Code freeze. Release branch cut.
>>>>
>>>> Code freeze and branch cutting are coming.
>>>>
>>>> Therefore, we should finish any remaining work for Spark 3.1 and
>>>> switch to QA mode soon.
>>>> I think it’s time to set a date to keep it on track, and I would like to
>>>> volunteer to help drive this process.
>>>>
>>>> I am currently thinking of 4th Dec as the branch-cut date.
>>>>
>>>> Any thoughts?
>>>>
>>>> Thanks all.




Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Xiao Li
Hi, Dongjoon,

Thank you for your feedback. I think *Early December* does not mean we will
cut the branch on Dec 1st. I do not think the difference between Dec 1st and
Dec 4th is a big deal. Normally, it would be nice to give enough buffer. Based
on my understanding, this email is just a *proposal* and a *reminder*. In the
past, we often got mixed feedback.

Anyway, we are collecting feedback from the whole community. Input from
everyone else is welcome.

Thanks,

Xiao



Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Xiao Li
https://github.com/apache/spark/pull/28026 is the major feature I am
tracking. It is painful to keep two sets of CREATE TABLE DDLs with
different behaviors. Based on what I have heard, this hurts usability for
our SQL users. Unfortunately, this PR missed the Spark 3.0 release. Now, I
think we should try our best to address it in 3.1.

Thanks,

Xiao



Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Dongjoon Hyun
Thank you for sharing, Xiao.

I hope we are able to reach agreement on the CREATE TABLE DDLs, too.

Bests,
Dongjoon.



Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Ryan Blue
I think we should be able to get the CREATE TABLE changes in. Now that the
main blocker (EXTERNAL) has been decided, it's just a matter of normal
review comments.


-- 
Ryan Blue
Software Engineer
Netflix


Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Dongjoon Hyun
It sounds great! :)

Thanks, Ryan.


Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Xiao Li
Thank you, Ryan!

Xiao


[SS] full outer stream-stream join

2020-11-20 Thread real-cheng-su
Hi,
 
Stream-stream join in Spark Structured Streaming currently supports the INNER,
LEFT OUTER, RIGHT OUTER, and LEFT SEMI join types. It does not support
FULL OUTER join, and we are working on adding it in
https://github.com/apache/spark/pull/30395 .
 
Given that LEFT OUTER and RIGHT OUTER stream-stream joins are already
supported, the code needed for FULL OUTER join is actually quite
straightforward:

* For a left-side input row, check whether there is a match in the right-side
state store. If there is a match, output the joined row; otherwise, output
nothing. Put the row in the left-side state store.
* For a right-side input row, check whether there is a match in the left-side
state store. If there is a match, output the joined row; otherwise, output
nothing. Put the row in the right-side state store.
* State store eviction: evict rows below the watermark from the left-side and
right-side state stores, and output the rows that were never matched (a
combination of left outer and right outer join).
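
As an illustration only, the three steps above can be modeled with a small
in-memory Python sketch. This is not the code in the PR: the class name,
method names, and tuple layout are hypothetical, the join condition is
assumed to be plain key equality, and the watermark is treated as a simple
event-time threshold.

```python
class FullOuterStreamJoin:
    """Toy model of a full outer stream-stream join (not Spark's implementation)."""

    def __init__(self):
        # One "state store" per side: key -> list of [event_time, value, matched]
        self.state = {"left": {}, "right": {}}

    def process(self, side, key, event_time, value):
        """Handle one input row from `side`; return the joined rows to emit now."""
        other = "right" if side == "left" else "left"
        out = []
        matched = False
        for entry in self.state[other].get(key, []):
            entry[2] = True  # remember that the buffered row found a partner
            matched = True
            if side == "left":
                out.append((key, value, entry[1]))
            else:
                out.append((key, entry[1], value))
        # Always buffer the new row so that later rows from the other side can match it.
        self.state[side].setdefault(key, []).append([event_time, value, matched])
        return out

    def evict(self, watermark):
        """Drop rows older than `watermark`; emit null-padded rows that never matched."""
        out = []
        for side in ("left", "right"):
            for key in list(self.state[side]):
                kept = []
                for event_time, value, matched in self.state[side][key]:
                    if event_time >= watermark:
                        kept.append([event_time, value, matched])
                    elif not matched:
                        if side == "left":
                            out.append((key, value, None))
                        else:
                            out.append((key, None, value))
                if kept:
                    self.state[side][key] = kept
                else:
                    del self.state[side][key]
        return out
```

For example, after `process("left", "a", 1, "L1")` and
`process("right", "a", 2, "R1")`, the second call emits the joined row for
key `a`. An unmatched `process("right", "b", 3, "R2")` emits nothing until
`evict(watermark=10)`, which outputs it padded with a null left side. Each
side buffers exactly the rows it would buffer for an inner or one-sided outer
join; only the eviction step differs.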

FULL OUTER join consumes the same amount of space in the state store as
INNER/LEFT OUTER/RIGHT OUTER join and is pretty easy to add, so I don’t see
any reason from a system perspective why FULL OUTER join should not be added.

I am wondering whether there is any major blocker to adding FULL OUTER
stream-stream join. I am asking on the dev mailing list in case we are
missing anything besides PR review participation. Thanks.

Cheng Su






Re: Spark 3.1 branch cut 4th Dec?

2020-11-20 Thread Hyukjin Kwon
Just for the record, I'll stick to the date we documented at
https://spark.apache.org/versioning-policy.html

It would be best to stick to what we wrote there, given that we already
delayed once.

