Re: [Apache Spark Jenkins] build system shutting down Dec 23rd, 2021

2021-12-07 Thread shane knapp ☠
> Will you be nuking all the Jenkins-related code in the repo after the 23rd?

probably not right away...  but soon after jenkins is shut down.  bits of
the docs and spark website will need to be updated as well.

shane
-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Time for Spark 3.2.1?

2021-12-07 Thread Dongjoon Hyun
+1 for new releases.

Dongjoon.

On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:

> +1 to make new maintenance releases for all 3.x branches.
>
> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>
>> Always fine by me if someone wants to roll a release.
>>
>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new
>> release of those wouldn't hurt either, if any of our release managers have
>> the time or inclination. 3.0.x is reaching unofficial end-of-life around
>> now anyway.
>>
>>
>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>> many bug fixes and regressions. What do you guys think about rolling Spark
>>> 3.2.1 release?
>>>
>>> cc @huaxin gao  FYI who I happened to overhear
>>> that is interested in rolling the maintenance release :-).
>>>
>>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Hyukjin Kwon
Oh BTW, I realised that the holiday season, including Christmas and New
Year, starts soon this month.
Shall we start rolling the release in January instead? I would leave
it to @huaxin gao  :-).

On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun  wrote:

> +1 for new releases.
>
> Dongjoon.
>
> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:
>
>> +1 to make new maintenance releases for all 3.x branches.
>>
>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>
>>> Always fine by me if someone wants to roll a release.
>>>
>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new
>>> release of those wouldn't hurt either, if any of our release managers have
>>> the time or inclination. 3.0.x is reaching unofficial end-of-life around
>>> now anyway.
>>>
>>>
>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon  wrote:
>>>
>>>> Hi all,
>>>>
>>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>>> many bug fixes and regressions. What do you guys think about rolling Spark
>>>> 3.2.1 release?
>>>>
>>>> cc @huaxin gao  FYI who I happened to overhear
>>>> that is interested in rolling the maintenance release :-).
>>>>
>>>


Re: Time for Spark 3.2.1?

2021-12-07 Thread huaxin gao
I prefer to start rolling the release in January if there is no need to
publish it sooner :)

On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:

> Oh BTW, I realised that it's a holiday season soon this month including
> Christmas and new year.
> Shall we maybe start rolling the release around next January? I would
> leave it to @huaxin gao  :-).
>
> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
> wrote:
>
>> +1 for new releases.
>>
>> Dongjoon.
>>
>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:
>>
>>> +1 to make new maintenance releases for all 3.x branches.
>>>
>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>
>>>> Always fine by me if someone wants to roll a release.
>>>>
>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new
>>>> release of those wouldn't hurt either, if any of our release managers have
>>>> the time or inclination. 3.0.x is reaching unofficial end-of-life around
>>>> now anyway.
>>>>
>>>>
>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>>>> many bug fixes and regressions. What do you guys think about rolling Spark
>>>>> 3.2.1 release?
>>>>>
>>>>> cc @huaxin gao  FYI who I happened to
>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>
>>>>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Hyukjin Kwon
SGTM!

On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:

> I prefer to start rolling the release in January if there is no need to
> publish it sooner :)
>
> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:
>
>> Oh BTW, I realised that it's a holiday season soon this month including
>> Christmas and new year.
>> Shall we maybe start rolling the release around next January? I would
>> leave it to @huaxin gao  :-).
>>
>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
>> wrote:
>>
>>> +1 for new releases.
>>>
>>> Dongjoon.
>>>
>>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:
>>>
>>>> +1 to make new maintenance releases for all 3.x branches.
>>>>
>>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>>
>>>>> Always fine by me if someone wants to roll a release.
>>>>>
>>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>>>>> new release of those wouldn't hurt either, if any of our release managers
>>>>> have the time or inclination. 3.0.x is reaching unofficial end-of-life
>>>>> around now anyway.
>>>>>
>>>>>
>>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>>>>> many bug fixes and regressions. What do you guys think about rolling
>>>>>> Spark 3.2.1 release?
>>>>>>
>>>>>> cc @huaxin gao  FYI who I happened to
>>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>>
>>>>>


Re: [Apache Spark Jenkins] build system shutting down Dec 23rd, 2021

2021-12-07 Thread shane knapp ☠
created an issue to track stuff:

https://issues.apache.org/jira/browse/SPARK-37571

On Tue, Dec 7, 2021 at 8:25 AM shane knapp ☠  wrote:

>> Will you be nuking all the Jenkins-related code in the repo after the 23rd?
>
> probably not right away...  but soon after jenkins is shut down.  bits of
> the docs and spark website will need to be updated as well.
>
> shane
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Time for Spark 3.2.1?

2021-12-07 Thread Gengliang Wang
+1 for new maintenance releases for all 3.x branches as well.

On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:

> SGTM!
>
> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>
>> I prefer to start rolling the release in January if there is no need to
>> publish it sooner :)
>>
>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:
>>
>>> Oh BTW, I realised that it's a holiday season soon this month including
>>> Christmas and new year.
>>> Shall we maybe start rolling the release around next January? I would
>>> leave it to @huaxin gao  :-).
>>>
>>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun 
>>> wrote:
>>>
>>>> +1 for new releases.
>>>>
>>>> Dongjoon.
>>>>
>>>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan  wrote:
>>>>
>>>>> +1 to make new maintenance releases for all 3.x branches.
>>>>>
>>>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>>>
>>>>>> Always fine by me if someone wants to roll a release.
>>>>>>
>>>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>>>>>> new release of those wouldn't hurt either, if any of our release managers
>>>>>> have the time or inclination. 3.0.x is reaching unofficial end-of-life
>>>>>> around now anyway.
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> It's been two months since Spark 3.2.0 release, and we have resolved
>>>>>>> many bug fixes and regressions. What do you guys think about rolling
>>>>>>> Spark 3.2.1 release?
>>>>>>>
>>>>>>> cc @huaxin gao  FYI who I happened to
>>>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>>>
>>>>>>


Re: [Apache Spark Jenkins] build system shutting down Dec 23rd, 2021

2021-12-07 Thread Gengliang Wang
Thanks for the work, Shane!

On Wed, Dec 8, 2021 at 9:19 AM shane knapp ☠  wrote:

> created an issue to track stuff:
>
> https://issues.apache.org/jira/browse/SPARK-37571
>
> On Tue, Dec 7, 2021 at 8:25 AM shane knapp ☠  wrote:
>
>>> Will you be nuking all the Jenkins-related code in the repo after the
>>> 23rd?
>>
>> probably not right away...  but soon after jenkins is shut down.  bits
>> of the docs and spark website will need to be updated as well.
>>
>> shane
>> --
>> Shane Knapp
>> Computer Guy / Voice of Reason
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Mridul Muralidharan
+1 for maintenance release, and also +1 for doing this in Jan!

Thanks,
Mridul

On Tue, Dec 7, 2021 at 11:41 PM Gengliang Wang  wrote:

> +1 for new maintenance releases for all 3.x branches as well.
>
> On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:
>
>> SGTM!
>>
>> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>>
>>> I prefer to start rolling the release in January if there is no need to
>>> publish it sooner :)
>>>
>>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon  wrote:
>>>
>>>> Oh BTW, I realised that it's a holiday season soon this month including
>>>> Christmas and new year.
>>>> Shall we maybe start rolling the release around next January? I would
>>>> leave it to @huaxin gao  :-).
>>>>
>>>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun wrote:
>>>>
>>>>> +1 for new releases.
>>>>>
>>>>> Dongjoon.
>>>>>
>>>>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan wrote:
>>>>>
>>>>>> +1 to make new maintenance releases for all 3.x branches.
>>>>>>
>>>>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>>>>
>>>>>>> Always fine by me if someone wants to roll a release.
>>>>>>>
>>>>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>>>>>>> new release of those wouldn't hurt either, if any of our release
>>>>>>> managers have the time or inclination. 3.0.x is reaching unofficial
>>>>>>> end-of-life around now anyway.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> It's been two months since Spark 3.2.0 release, and we have
>>>>>>>> resolved many bug fixes and regressions. What do you guys think about
>>>>>>>> rolling Spark 3.2.1 release?
>>>>>>>>
>>>>>>>> cc @huaxin gao  FYI who I happened to
>>>>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>>>>
>>>>>>>


Re: Time for Spark 3.2.1?

2021-12-07 Thread Jungtaek Lim
+1 for both releases and the time!

On Wed, Dec 8, 2021 at 3:46 PM Mridul Muralidharan  wrote:

>
> +1 for maintenance release, and also +1 for doing this in Jan !
>
> Thanks,
> Mridul
>
> On Tue, Dec 7, 2021 at 11:41 PM Gengliang Wang  wrote:
>
>> +1 for new maintenance releases for all 3.x branches as well.
>>
>> On Wed, Dec 8, 2021 at 8:19 AM Hyukjin Kwon  wrote:
>>
>>> SGTM!
>>>
>>> On Wed, 8 Dec 2021 at 09:07, huaxin gao  wrote:
>>>
>>>> I prefer to start rolling the release in January if there is no need to
>>>> publish it sooner :)
>>>>
>>>> On Tue, Dec 7, 2021 at 3:59 PM Hyukjin Kwon wrote:
>>>>
>>>>> Oh BTW, I realised that it's a holiday season soon this month
>>>>> including Christmas and new year.
>>>>> Shall we maybe start rolling the release around next January? I would
>>>>> leave it to @huaxin gao  :-).
>>>>>
>>>>> On Wed, 8 Dec 2021 at 06:19, Dongjoon Hyun wrote:
>>>>>
>>>>>> +1 for new releases.
>>>>>>
>>>>>> Dongjoon.
>>>>>>
>>>>>> On Mon, Dec 6, 2021 at 8:51 PM Wenchen Fan wrote:
>>>>>>
>>>>>>> +1 to make new maintenance releases for all 3.x branches.
>>>>>>>
>>>>>>> On Tue, Dec 7, 2021 at 8:57 AM Sean Owen  wrote:
>>>>>>>
>>>>>>>> Always fine by me if someone wants to roll a release.
>>>>>>>>
>>>>>>>> It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a
>>>>>>>> new release of those wouldn't hurt either, if any of our release
>>>>>>>> managers have the time or inclination. 3.0.x is reaching unofficial
>>>>>>>> end-of-life around now anyway.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 6, 2021 at 6:55 PM Hyukjin Kwon wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> It's been two months since Spark 3.2.0 release, and we have
>>>>>>>>> resolved many bug fixes and regressions. What do you guys think about
>>>>>>>>> rolling Spark 3.2.1 release?
>>>>>>>>>
>>>>>>>>> cc @huaxin gao  FYI who I happened to
>>>>>>>>> overhear that is interested in rolling the maintenance release :-).
>>>>>>>>>
>>>>>>>>


[Proposal] Deprecate Trigger.Once and replace with Trigger.AvailableNow

2021-12-07 Thread Jungtaek Lim
Hi dev,

I would like to hear your opinions on deprecating Trigger.Once and replacing
it with Trigger.AvailableNow [1] in Structured Streaming.

Rationale:

Trigger.Once is expected to read all data that has become available since
the last trigger and process it. This holds when the last run terminated
gracefully, but streaming queries are not always terminated gracefully. The
last run may have written the offset (WAL) for a new batch before
terminating; in that case, a new run of Trigger.Once only processes the data
already planned in that latest unfinished batch and does not process new
data.

The behavior is not deterministic from the users' point of view, as end
users cannot know whether the last run wrote the offset unless they inspect
the query's checkpoint themselves.

Although Trigger.AvailableNow was introduced to solve the scalability issue
of Trigger.Once, it also tries to process all data available at the time it
is triggered, which consistently matches the expected behavior of
Trigger.Once.
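
To make the scenario above concrete, here is a toy model in plain Python. It
is only a sketch under stated assumptions: it does not use Spark, and
Checkpoint, run_once, run_available_now, crashed_scenario and max_per_batch
are hypothetical names invented for illustration. It assumes a run first
writes a planned end offset, then processes the batch, then commits it.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy stand-in for a query checkpoint (not Spark's real layout)."""
    offsets: list = field(default_factory=list)  # planned end offset per batch ("WAL")
    commits: int = 0                             # number of fully processed batches

    def committed_end(self):
        # End offset of the last committed batch, or 0 if nothing is committed.
        return self.offsets[self.commits - 1] if self.commits else 0


def run_batch(source, cp, sink):
    """Process the oldest planned-but-uncommitted batch, then commit it."""
    start, end = cp.committed_end(), cp.offsets[cp.commits]
    sink.extend(source[start:end])
    cp.commits += 1


def run_once(source, cp, sink):
    """Trigger.Once-like run (assumed model): exactly one batch per run."""
    if len(cp.offsets) == cp.commits:
        # Clean checkpoint: plan a new batch up to the latest data.
        cp.offsets.append(len(source))
    # Otherwise only the batch planned by the previous run is replayed.
    run_batch(source, cp, sink)


def run_available_now(source, cp, sink, max_per_batch=2):
    """AvailableNow-like run (assumed model): drain everything visible at start."""
    target = len(source)  # data available when the run is triggered
    while len(cp.offsets) > cp.commits:
        run_batch(source, cp, sink)  # finish any unfinished batch first
    while cp.committed_end() < target:
        cp.offsets.append(min(cp.committed_end() + max_per_batch, target))
        run_batch(source, cp, sink)


def crashed_scenario():
    """Previous run wrote the offset for a, b, c and died before committing."""
    source = ["a", "b", "c"]
    cp = Checkpoint(offsets=[len(source)])
    source += ["d", "e"]  # new data arrives before the next run
    return source, cp
```

In this model, a Trigger.Once-like run started from crashed_scenario() emits
only a, b, c and leaves d, e for a later run, while the AvailableNow-like run
emits all five records.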

Proposed Plan:

- Deprecate Trigger.Once in Apache Spark 3.3
- Add guidance on migrating to Trigger.AvailableNow to the migration guide
- Replace all usages of Trigger.Once with Trigger.AvailableNow, except the
test cases of Trigger.Once itself
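
For PySpark users, the migration step could look roughly like the helper
below. This is a sketch, not part of the proposal: one_shot_trigger_kwargs is
a hypothetical name, and it assumes that DataStreamWriter.trigger accepts
availableNow=True starting with Spark 3.3 and once=True in earlier versions.

```python
def one_shot_trigger_kwargs(spark_version: str) -> dict:
    """Pick keyword arguments for a one-shot run of a streaming query.

    Assumption: availableNow is supported from Spark 3.3 onward; older
    versions fall back to the once trigger.
    """
    # Only major and minor are needed, so "3.3.0-SNAPSHOT" also parses.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if (major, minor) >= (3, 3):
        return {"availableNow": True}
    return {"once": True}
```

A query would then be started with something like
df.writeStream.trigger(**one_shot_trigger_kwargs(spark.version)), where df
and spark are the caller's streaming DataFrame and SparkSession.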

Please review the proposal and share your thoughts.

Thanks!
Jungtaek Lim

1. https://issues.apache.org/jira/browse/SPARK-36533