Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Mich Talebzadeh
A good point, agreed.

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
PhD Imperial College London
London, United Kingdom


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun).


On Thu, 4 Jul 2024 at 06:14, Martin Grund wrote:

> Absolutely, we should do that. I thought the default rule was already
> inclusive, so that once folks have made their first contribution, their
> workflows would be kicked off automatically.
>
> On Thu, Jul 4, 2024 at 04:20 Matthew Powers wrote:
>
>> Yeah, this would be great.
>>
>> spark-connect-go is still experimental, and anything we can do to get it
>> production-grade would be a great step IMO. The Go community is excited to
>> write Spark... with Go!
>>
>> On Wed, Jul 3, 2024 at 8:49 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> The Spark Connect Go client repository (
>>> https://github.com/apache/spark-connect-go) requires approval for GitHub
>>> Actions runs on individual commits within contributors' PRs.
>>>
>>> This policy was intentionally applied (
>>> https://issues.apache.org/jira/browse/INFRA-24387), but we can change
>>> this default once we reach a consensus on it.
>>>
>>> I would like to allow GitHub Actions runs for contributors' PRs by default
>>> to speed up development. Currently, I approve the runs for individual
>>> commits in their PRs, which has become an overhead.
>>>
>>> If you have any feedback on this, please let me know.
>>>
>>


Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Hyukjin Kwon
Alright! Let me start the vote!

On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh wrote:

> A good point agreed.

[VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Hyukjin Kwon
Hi all,

I’d like to start a vote for allowing GitHub Actions runs for contributors'
PRs without approvals in apache/spark-connect-go.

Please also refer to:

   - Discussion thread: https://lists.apache.org/thread/tsqm0dv01f7jgkv5l4kyvtpw4tc6f420
   - JIRA ticket: https://issues.apache.org/jira/browse/INFRA-25936

Please vote on the proposal for the next 72 hours:

[ ] +1: Accept the proposal
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thank you!


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Denny Lee
+1 (non-binding)


Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-04 Thread Peter Toth
+1

On Thu, Jul 4, 2024 at 5:38, John Zhuge wrote:

> +1
>
>
> John Zhuge
>
>
> On Wed, Jul 3, 2024 at 7:41 PM Gengliang Wang  wrote:
>
>> +1
>>
>> On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin wrote:
>>
>>> +1
>>>
>>> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh  wrote:
>>>
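>>>> +1
>>>>
>>>> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun wrote:
>>>> >
>>>> > +1
>>>> >
>>>> > Dongjoon
>>>> >
>>>> > On Wed, Jul 3, 2024 at 10:58 Xinrong Meng wrote:
>>>> >>
>>>> >> +1
>>>> >>
>>>> >> Thank you @Hyukjin Kwon !
>>>> >>
>>>> >> On Wed, Jul 3, 2024 at 8:55 AM bo yang wrote:
>>>> >>>
>>>> >>> +1 (non-binding)
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Jul 2, 2024 at 11:22 PM Cheng Pan wrote:
>>>> >>>>
>>>> >>>> +1 (non-binding)
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Cheng Pan
>>>> >>>>
>>>> >>>>
>>>> >>>> On Jul 3, 2024, at 08:59, Hyukjin Kwon wrote:
>>>> >>>>
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> I’d like to start a vote for moving Spark Connect server to
>>>> >>>> builtin package (Client API layer stays external).
>>>> >>>>
>>>> >>>> Please also refer to:
>>>> >>>>
>>>> >>>>    - Discussion thread:
>>>> >>>>    https://lists.apache.org/thread/odlx9b552dp8yllhrdlp24pf9m9s4tmx
>>>> >>>>    - JIRA ticket:
>>>> >>>>    https://issues.apache.org/jira/browse/SPARK-48763
>>>> >>>>
>>>> >>>> Please vote on the SPIP for the next 72 hours:
>>>> >>>>
>>>> >>>> [ ] +1: Accept the proposal
>>>> >>>> [ ] +0
>>>> >>>> [ ] -1: I don’t think this is a good idea because …
>>>> >>>>
>>>> >>>> Thank you!
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org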




Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Mich Talebzadeh
+1 non-binding


[DISCUSS] Auto scaling support for structured streaming

2024-07-04 Thread Nimrod Ofek
Hi,

I remember there was a discussion about better supporting auto scaling for
Structured Streaming.
Is anything happening on that front for the upcoming Spark 4.0 release?
Will there be support for auto scaling Spark Structured Streaming apps (at
least on K8s)?

Thanks,
Nimrod


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Holden Karau
+1

Although, given it's a US holiday, maybe keep the vote open for an extra day?

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Thu, Jul 4, 2024 at 7:33 AM Denny Lee  wrote:

> +1 (non-binding)


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Martin Grund
+1 (non-binding)

On Thu, Jul 4, 2024 at 7:15 PM Holden Karau  wrote:

> +1
>
> Although, given it's a US holiday, maybe keep the vote open for an extra day?


Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-04 Thread Hyukjin Kwon
(I will leave this vote open until 10th July, considering that it's the
holiday season in the US.)

On Thu, 4 Jul 2024 at 23:39, Peter Toth  wrote:

> +1


Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-04 Thread Hyukjin Kwon
(I will leave this vote open until 10th July, considering that it's the
holiday season in the US.)

On Fri, 5 Jul 2024 at 06:12, Martin Grund  wrote:

> +1 (non-binding)


Re: Spark decommission

2024-07-04 Thread Arun Ravi
Hi Rajesh,

We use it in production at scale. We run Spark on Kubernetes on AWS, and
here are the key things we do:
1) We run the driver on an on-demand node.
2) We have configured decommissioning with the fallback storage option
pointing to S3; try the latest single-zone S3 storage class for this.
3) We use PVC-aware scheduling, i.e. Spark ensures that executors try to
reuse available storage volumes created by the driver before requesting
a new one.
4) We have enabled the Kubernetes shuffle IO wrapper plugin, which allows
new executors to re-register the shuffle blocks they find on a reused
PVC. This ensures that shuffle data from lost executors is served by the
new executor that reuses the disk.
5) We also configure Spark to retain details of decommissioned executors,
so that it can ignore intermittent shuffle fetch failures.

Some of these are best-effort. You can also tune settings such as the
number of threads used for decommissioning, based on your workload and
runtime environment.
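As a rough illustration, the five practices above might map onto Spark configuration keys as in the following Python sketch. It assumes Spark 3.4 or later on Kubernetes (several of the keys, such as the PVC-reuse wait, the local-disk shuffle IO plugin and the decommission fetch-failure flag, are not available in older releases). The helper functions, the S3 bucket, the PVC size and mount path, the retention count, the thread count and the Karpenter node label used to identify on-demand nodes are all hypothetical placeholders, not values taken from this thread; check each key against the configuration docs for your Spark version.

```python
from typing import Dict, List


def decommission_confs(
    fallback_path: str,
    shuffle_threads: int = 8,
    pvc_size: str = "100Gi",
    driver_node_label: str = "karpenter.sh/capacity-type=on-demand",
) -> Dict[str, str]:
    """Collect Spark-on-Kubernetes settings for graceful executor decommissioning.

    Hypothetical helper: all default argument values are illustrative
    placeholders, not tuned recommendations.
    """
    # The fallback storage location is a directory; keep it slash-terminated.
    if not fallback_path.endswith("/"):
        fallback_path += "/"
    label_key, label_value = driver_node_label.split("=", 1)
    # Volume names starting with "spark-local-dir-" are used as Spark local dirs.
    pvc = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"
    return {
        # 1) Schedule the driver onto on-demand (non-spot) nodes.
        f"spark.kubernetes.driver.node.selector.{label_key}": label_value,
        # 2) Migrate shuffle and RDD blocks off decommissioning executors,
        #    falling back to remote (e.g. S3) storage when no peer can take them.
        "spark.decommission.enabled": "true",
        "spark.storage.decommission.enabled": "true",
        "spark.storage.decommission.shuffleBlocks.enabled": "true",
        "spark.storage.decommission.rddBlocks.enabled": "true",
        "spark.storage.decommission.fallbackStorage.path": fallback_path,
        "spark.storage.decommission.shuffleBlocks.maxThreads": str(shuffle_threads),
        # 3) Driver-owned, on-demand PVCs that replacement executors reuse
        #    before new claims are requested.
        f"{pvc}.options.claimName": "OnDemand",
        f"{pvc}.options.sizeLimit": pvc_size,
        f"{pvc}.mount.path": "/data",
        f"{pvc}.mount.readOnly": "false",
        "spark.kubernetes.driver.ownPersistentVolumeClaim": "true",
        "spark.kubernetes.driver.reusePersistentVolumeClaim": "true",
        "spark.kubernetes.driver.waitToReusePersistentVolumeClaim": "true",
        # 4) Re-register shuffle files found on a reused PVC.
        "spark.shuffle.sort.io.plugin.class":
            "org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO",
        # 5) Remember removed decommissioned executors so that fetch failures
        #    attributed to them do not count toward stage failure.
        "spark.scheduler.maxRetainedRemovedDecommissionExecutors": "100",
        "spark.stage.ignoreDecommissionFetchFailure": "true",
    }


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Render settings as `--conf key=value` pairs for spark-submit."""
    args: List[str] = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical bucket name; replace with your own.
    confs = decommission_confs("s3a://my-bucket/spark-fallback")
    args = to_submit_args(confs)
    for flag, setting in zip(args[::2], args[1::2]):
        print(flag, setting)
```

The rendered `--conf` pairs could be appended to a `spark-submit` command line. As the reply above notes, several of these mechanisms are best-effort, so the values are starting points to tune for a given workload rather than recommendations.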

On Thu, 27 Jun 2024, 09:03 Rajesh Mahindra,  wrote:

> Hi folks,
>
> I am planning to leverage the "Spark Decommission" feature in production,
> since our company uses spot instances on Kubernetes. I wanted to get a
> sense of how stable the feature is for production usage, and whether anyone
> has thoughts on trying it out in production, especially in a Kubernetes
> environment.
>
> Thanks,
> Rajesh
>
>