Re: ASF board report draft for February 2025

2025-02-05 Thread Ángel
If I'm not wrong, the events were still being generated and stored, and they contained the plans (but without the description). Maybe we could simply generate the strings "on demand" in a lazy fashion, when the user requests them in the Spark UI. I don't know if that's even possible, just thought about

Re: ASF board report draft for February 2025

2025-02-05 Thread Wenchen Fan
Hi Angel, AFAIK many people rely on the Spark UI to debug/inspect their queries with the query plan tree and metrics, but you are right that plan string generation is expensive, and we shouldn't do it for every AQE plan change. Maybe we should do it only once to report the final plan for AQE? Let's

Re: ASF board report draft for February 2025

2025-02-05 Thread Ángel
I'd like to add that Spark is not as fast as it should be, primarily due to its internal verbosity, as reported in ticket SPARK-50992. After submitting this PR, I received some comments, which I quic

RE: ASF board report draft for February 2025

2025-02-05 Thread Adam Hobbs
I'd like to add something around the failure to get any traction on shepherding of the structured streaming DRA PR. Multiple times now there have been calls for help to get this initiative over the line and the response has been disappointing. The GitHub PR has been closed due to inaction (ht

ASF board report draft for February 2025

2025-02-05 Thread Matei Zaharia
It’s time to send our next ASF board report again on February 12th. Here’s an initial draft — feel free to suggest changes: = Description: Apache Spark is a fast and general purpose engine for large-scale data processing. It offers high-level APIs in Java, Scala, Python, R

Re: Docs look weird; can't build locally

2025-02-05 Thread Gengliang Wang
FYI, I’ve created a fix for this issue: [SPARK-51104][DOC] Self-host JavaScript and CSS in Spark website On Wed, Feb 5, 2025 at 4:27 PM Hyukjin Kwon wrote: > I updated Jekyll version at > https://github.com/apache/spark/commit/3675568a34d42054686eb213

Re: Quick question: can you guys navigate 3.5.4 Java API documentation?

2025-02-05 Thread Gengliang Wang
FYI, I’ve created a fix for this issue: [SPARK-51104][DOC] Self-host JavaScript and CSS in Spark website On Tue, Feb 4, 2025 at 11:10 PM wrote: > 1. It seems to be a `CSP` issue, as follows: > [image: d14447a60a387863b505f6b66b8e012a.png] > > 2. What is

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
Hi Adam, Thanks for raising your concerns! This is also why we are not making Spark Connect the default but providing an additional Spark distribution so that users can opt in easily. There is a simple fix for this security issue as @Hyukjin Kwon mentioned and we are working on it: https://github

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
> ... but are there examples of how you would actually do any of that with the new API mode? The API mode allows you to switch easily between the Spark Connect architecture and the Classic architecture for your application during the migration phase. Your application still compiles with the full Spark de

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Hyukjin Kwon
This is exactly the same case with the Py4J gateway server. We can easily implement that - I am one of the maintainers of Py4J fwiw and running a local Spark Connect server is already there apart from the PR https://github.com/apache/spark/pull/49107. On Thu, 6 Feb 2025 at 10:40, Adam Binford wr

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Adam Binford
-1 (non-binding) for me. I've commented on the PR for this ( https://github.com/apache/spark/pull/49107), but in its current state this seems like it would introduce a massive security vulnerability. If a user launches a "Spark Connect enabled" cluster deploy mode job in a multi-tenant YARN cluster

Re: [Spark SQL]: Are SQL User-Defined Functions on the Roadmap?

2025-02-05 Thread Reynold Xin
There's already one here https://issues.apache.org/jira/browse/SPARK-46057 On Wed, Feb 5, 2025 at 5:16 PM Soumasish wrote: > Here I create one, https://issues.apache.org/jira/browse/SPARK-51102 > > Best Regards > Soumasish Goswami > in: www.linkedin.com/in/soumasish > # (415) 530-0405 > >-

Re: Docs look weird; can't build locally

2025-02-05 Thread Hyukjin Kwon
I updated the Jekyll version at https://github.com/apache/spark/commit/3675568a34d42054686eb213d9a56750a6c5c2ab If this started happening recently, this might be the culprit .. On Thu, 6 Feb 2025 at 04:00, Nicholas Chammas wrote: > The published docs look weird. Seems like something in the top navigation > bar

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Adam Binford
> > Your application is running as an individual process, which is fully > decoupled from the server (Spark driver). You can pick different Java/Scala > versions, Python versions, dependency versions, or even a different > language such as Go and Rust. I understand this part as a general benefit

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Chao Sun
+1 On Wed, Feb 5, 2025 at 8:42 AM Martin Grund wrote: > +1 > > On Wed, Feb 5, 2025 at 17:15 bo yang wrote: > >> +1 (non-binding) >> >> On Wed, Feb 5, 2025 at 7:51 AM Jules Damji wrote: >> >>> +1 (non-binding) >>> >>> Excuse the thumb typos >>> >>> >>> On Tue, 04 Feb 2025 at 11:06 PM, Wenchen F

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Martin Grund
+1 On Wed, Feb 5, 2025 at 17:15 bo yang wrote: > +1 (non-binding) > > On Wed, Feb 5, 2025 at 7:51 AM Jules Damji wrote: > >> +1 (non-binding) >> >> Excuse the thumb typos >> >> >> On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan wrote: >> >>> Hi all, >>> >>> Given the positive feedback in the prev

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread bo yang
+1 (non-binding) On Wed, Feb 5, 2025 at 7:51 AM Jules Damji wrote: > +1 (non-binding) > > Excuse the thumb typos > > > On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan wrote: > >> Hi all, >> >> Given the positive feedback in the previous DISCUSS email >>

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Jules Damji
+1 (non-binding) Excuse the thumb typos On Tue, 04 Feb 2025 at 11:06 PM, Wenchen Fan wrote: > Hi all, > > Given the positive feedback in the previous DISCUSS email > , I'd > like to start the vote for the proposal "Publish addit

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Denny Lee
+1 (non-binding) On Wed, Feb 5, 2025 at 07:11 huaxin gao wrote: > +1 > > On Wed, Feb 5, 2025 at 1:39 AM Mich Talebzadeh > wrote: > >> +1 >> >> Dr Mich Talebzadeh, >> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR >> >>view my Linkedin profile >>

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread huaxin gao
+1 On Wed, Feb 5, 2025 at 1:39 AM Mich Talebzadeh wrote: > +1 > > Dr Mich Talebzadeh, > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR > >view my Linkedin profile > > > > > > > On Wed, 5 Feb 2025 at 08:26, Yumi

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Wenchen Fan
Hi Adam, My bad, I should have provided more context. The official way to use Spark Connect is to deploy a long-lived server to serve many clients (like thriftserver), which requires users to set it up manually, and there is no default. The so-called Spark Connect on-by-default here refers to the ne

Re: [DISCUSS] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Adam Binford
Long-time Spark on YARN user with some maybe dumb questions, but I'm guessing other users might be wondering the same things. First, what does "Spark Connect enabled by default" actually even mean? I assume this is referring to the "spark.api.mode" discussion from before, but even in that discussio

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Mich Talebzadeh
+1 Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR view my Linkedin profile On Wed, 5 Feb 2025 at 08:26, Yuming Wang wrote: > +1 > > On Wed, Feb 5, 2025 at 4:15 PM Max Gekk wrote: > >>

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Yuming Wang
+1 On Wed, Feb 5, 2025 at 4:15 PM Max Gekk wrote: > +1 > > On Wed, Feb 5, 2025 at 9:00 AM Yang Jie wrote: > >> +1 >> >> Jie Yang >> >> On 2025/02/05 07:38:08 Sakthi wrote: >> > +1 (non-binding) >> > >> > On Tue, Feb 4, 2025 at 11:25 PM DB Tsai wrote: >> > >> > > +1 >> > > >> > > DB Tsai | ht

Re: [VOTE] Publish additional Spark distribution with Spark Connect enabled

2025-02-05 Thread Max Gekk
+1 On Wed, Feb 5, 2025 at 9:00 AM Yang Jie wrote: > +1 > > Jie Yang > > On 2025/02/05 07:38:08 Sakthi wrote: > > +1 (non-binding) > > > > On Tue, Feb 4, 2025 at 11:25 PM DB Tsai wrote: > > > > > +1 > > > > > > DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > > > > > > On Feb 4, 20