Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-06 Thread Matei Zaharia
+1 on multiple packages here. > On Feb 6, 2025, at 10:54 AM, Rozov, Vlad wrote: > > +1 (non-binding) > > Thank you, > > Vlad > >> On Feb 4, 2025, at 11:05 PM, Wenchen Fan wrote: >> >> Hi all, >> >> Given the positive feedback in the previous DISCUSS email >>

Re: Docs look weird; can't build locally

2025-02-06 Thread Wenchen Fan
Hi Nicholas, I don't know how to set JVM memory for this bundle command, but this PR might help reduce memory usage and avoid the OOM issue. On Fri, Feb 7, 2025 at 12:07 AM Nicholas Chammas wrote: > Gengliang has the CSS/display issue fixed >

Re: Extending Spark with a custom ExternalClusterManager

2025-02-06 Thread Mich Talebzadeh
Hi, Is this the correct link to this open source product? Armada - how to run millions of batch jobs over thousands of compute nodes using Kubernetes | G-Research I am fami

Re: Spark Website Styling Issues Partially Resolved

2025-02-06 Thread Reynold Xin
Thanks for fixing these! On Thu, Feb 6, 2025 at 4:40 PM Gengliang Wang wrote: > Hi all, > > The Spark website styling was recently broken due to a violation of > Content Security Policy (CSP). I have fixed the main website ( > spark.apache.org) and the latest documentation site (Spark 3.5.4 docs

Spark Website Styling Issues Partially Resolved

2025-02-06 Thread Gengliang Wang
Hi all, The Spark website styling was recently broken due to a violation of Content Security Policy (CSP). I have fixed the main website (spark.apache.org) and the latest documentation site (Spark 3.5.4 docs). Most pages should now be functioning. However, s

Extending Spark with a custom ExternalClusterManager

2025-02-06 Thread Dejan Pejchev
Hello Spark community! My name is Dejan Pejchev, and I am a Software Engineer working at G-Research, and I am a maintainer of our Kubernetes multi-cluster batch scheduler called Armada. We are trying to build an integration with Spark, where we would like to use the spark-submit with a master arm

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-06 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad On Feb 4, 2025, at 11:05 PM, Wenchen Fan wrote: Hi all, Given the positive feedback in the previous DISCUSS email, I'd like to start the vote for the proposal "Publish additional Spark distrib

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-06 Thread John Zhuge
+1 (non-binding) On Thu, Feb 6, 2025 at 7:56 AM Dongjoon Hyun wrote: > +1 for the additional package. > > Dongjoon. > > On Wed, Feb 5, 2025 at 6:30 PM Wenchen Fan wrote: > >> Hi Adam, >> >> Thanks for raising your concerns! This is also why we are not making >> Spark Connect the default but pro

Re: Docs look weird; can't build locally

2025-02-06 Thread Nicholas Chammas
Gengliang has the CSS/display issue fixed. However, the problem that I described below with building the PySpark API docs persists, and I’m not the only one experiencing it. > On F

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-06 Thread Dongjoon Hyun
+1 for the additional package. Dongjoon. On Wed, Feb 5, 2025 at 6:30 PM Wenchen Fan wrote: > Hi Adam, > > Thanks for raising your concerns! This is also why we are not making Spark > Connect the default but providing an additional Spark distribution so that > users can opt in easily. There is a

Re: [DISCUSS] Spark - How to improve our release processes

2025-02-06 Thread Nicholas Chammas
Your first several points align with what I explained for Python regarding abstract vs. concrete dependencies. As I noted, the blocker for progress on reorganizing and cleaning up our Python dependencies in this way is committer alignment. > On Feb 6, 2025, at 9:30 AM, Nimrod Ofek wrote: >

Re: [DISCUSS] Spark - How to improve our release processes

2025-02-06 Thread Nimrod Ofek
Hi, I'll start with a disclaimer: I am mostly a Java / Scala developer so I am not that well oriented with Python best practices. Having said that, here are some thoughts I have about the subject, hope they make sense :) 1. I think that we need to differentiate between code and dependencies

Re: ASF board report draft for February 2025

2025-02-06 Thread Ángel
I guess you're right, sorry. I was only commenting on the statement: 'Apache Spark is a fast and general-purpose engine for large-scale data processing.' I never said this was a stopper. On Thu, Feb 6, 2025 at 12:34, Mich Talebzadeh wrote: > I don't see its relevance to ASF board report?

Re: ASF board report draft for February 2025

2025-02-06 Thread Mich Talebzadeh
I don't see its relevance to ASF board report? It is a minor technicality and probably tangential. It is not a show stopper and the Board does not need to worry about it. Best to take this discussion on its own thread Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analys

Re: ASF board report draft for February 2025

2025-02-06 Thread Ángel
Btw, while analyzing this issue, I've also noticed that exactly the same plan got stringified several times. Not only that, but even within a plan, the same nodes got stringified dozens and dozens of times. I haven't reported it because I added the memoization pattern to fix both things and, despit
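
For readers unfamiliar with the memoization pattern mentioned in the message above, here is a minimal sketch in Python. It is not the actual Spark change: `PlanNode`, `stringify`, and the call counter are hypothetical names invented for illustration, and Spark's real plan classes are written in Scala and look different. The sketch only shows the general idea of caching the string form of a node so that a node shared by several parents is rendered once.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical stand-in for a query plan node (not a Spark class)."""
    name: str
    children: Tuple["PlanNode", ...] = ()


def stringify(node: PlanNode, cache: Dict[int, str], calls: List[int]) -> str:
    """Render a plan node as text, reusing cached results for repeated nodes.

    The cache is keyed by object identity, which is safe only while the
    nodes stay alive for the whole traversal (an assumption of this sketch).
    `calls[0]` counts how many nodes were actually rendered.
    """
    key = id(node)
    if key in cache:
        return cache[key]
    calls[0] += 1
    parts = [stringify(child, cache, calls) for child in node.children]
    text = node.name if not parts else f"{node.name}({', '.join(parts)})"
    cache[key] = text
    return text


# The same Scan node is referenced twice by Join, and Join twice by Union.
scan = PlanNode("Scan")
join = PlanNode("Join", (scan, scan))
root = PlanNode("Union", (join, join))

cache: Dict[int, str] = {}
calls = [0]
rendered = stringify(root, cache, calls)
print(rendered)
print("nodes rendered:", calls[0])
```

Without the cache, this small tree would be rendered seven times (one Union, two Joins, four Scans); with it, each distinct node is rendered once.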