If I'm not wrong, the events were still being generated and stored, and they contained the plans (but without the description). Maybe we could simply generate the strings "on demand", lazily, when the user requests them in the Spark UI.
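To make the idea concrete, here is a rough sketch in Python (chosen only for brevity; Spark's real event and plan classes are Scala and are not reproduced here). `PlanNode`, `PlanUpdateEvent`, `render` and `render_calls` are hypothetical names invented for this illustration. The event keeps only the plan, and the description string is built the first time someone asks for it, then cached.

```python
from dataclasses import dataclass, field
from functools import cached_property
from typing import List


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def render(node: PlanNode, depth: int = 0) -> str:
    """Build the indented tree description; this is the costly step."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


class PlanUpdateEvent:
    """Hypothetical event that stores the plan but not its description."""

    render_calls = 0  # counts how many times the description was really built

    def __init__(self, plan: PlanNode) -> None:
        self.plan = plan

    @cached_property
    def description(self) -> str:
        # Runs only on first access; the result is cached on the instance.
        PlanUpdateEvent.render_calls += 1
        return render(self.plan)


plan = PlanNode("Project", [PlanNode("Filter", [PlanNode("Scan")])])

# Simulate many plan changes: no description string is built here.
events = [PlanUpdateEvent(plan) for _ in range(1000)]
print(PlanUpdateEvent.render_calls)  # 0

# Only the event that is actually inspected pays the rendering cost.
text = events[-1].description
print(PlanUpdateEvent.render_calls)  # 1
print(text)
```

The trade-off in this sketch is that every event has to keep a reference to its plan until the string is requested, so it moves the cost rather than removing it. Whether that is acceptable inside Spark is exactly the open question.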
I don't know if that's even possible; I just thought about it while walking my dog ...🐶

On Thu, Feb 6, 2025, 8:41, Wenchen Fan <cloud0...@gmail.com> wrote:

> Hi Angel,
>
> AFAIK many people rely on the Spark UI to debug/inspect their queries with
> the query plan tree and metrics, but you are right that plan string
> generation is expensive, and we shouldn't do it for every AQE plan change.
> Maybe we should do it only once to report the final plan for AQE? Let's
> continue the discussion on the PR.
>
> On Thu, Feb 6, 2025 at 1:48 PM Ángel <angel.alvarez.pas...@gmail.com>
> wrote:
>
>> I'd like to add that Spark is not as fast as it should be, primarily due
>> to its internal verbosity, as reported in ticket *SPARK-50992
>> <https://issues.apache.org/jira/browse/SPARK-50992>*. After submitting
>> this PR <https://github.com/apache/spark/pull/49724>, I received some
>> comments, which I quickly addressed, but the PR has since stalled.
>>
>> I strongly believe that Spark should prioritize performance over internal
>> logging, especially when it has such a significant impact on execution
>> speed and can lead to memory issues.
>>
>> In *GraphFrames*, the temporary workaround was to disable *AQE (Adaptive
>> Query Execution)*. Just last week, I gave the same advice to a colleague
>> experiencing performance issues with a *Databricks* notebook, and it
>> worked. Disabling *AQE* to improve performance because Spark
>> continuously generates string descriptions of physical plans internally
>> (descriptions that very likely no one is going to use) makes little sense
>> to me.
>>
>> PS: I wish I were wrong, but I really think I am not.
>> PS2: The first part of a series of articles I'm writing about this issue:
>> link
>> <https://medium.com/@angel.alvarez.pascua/apache-spark-wtf-i-like-it-when-a-plan-comes-together-part-i-48c52a667288>
>>
>> On Thu, Feb 6, 2025 at 6:30, Adam Hobbs
>> (<adam.ho...@bendigoadelaide.com.au.invalid>) wrote:
>>
>>> I'd like to add something about the failure to get any traction on
>>> shepherding the structured streaming DRA PR. Multiple times now there
>>> have been calls for help to get this initiative over the line, and the
>>> response has been disappointing. The GitHub PR has been closed due to
>>> inaction (https://github.com/apache/spark/pull/42352).
>>>
>>> This seems like a bit of a failure in the process.
>>>
>>> Regards,
>>>
>>> Adam Hobbs
>>>
>>> -----Original Message-----
>>> From: Matei Zaharia <matei.zaha...@gmail.com>
>>> Sent: Thursday, 6 February 2025 2:57 PM
>>> To: Spark dev list <dev@spark.apache.org>
>>> Cc: priv...@spark.apache.org
>>> Subject: ASF board report draft for February 2025
>>>
>>> It’s time to send our next ASF board report again on February 12th.
>>> Here’s an initial draft; feel free to suggest changes:
>>>
>>> =====================
>>>
>>> Description:
>>>
>>> Apache Spark is a fast and general purpose engine for large-scale data
>>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>>> well as a rich set of libraries including stream processing, machine
>>> learning, and graph analytics.
>>>
>>> Issues for the board:
>>>
>>> - None
>>>
>>> Project status:
>>>
>>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>>> encourage the community to test it out!
>>> - We released Spark 3.5.4 on December 20th, 2024.
>>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>>> member (Jie Yang) to the project.
>>> - The proposal to "Use plain text logs by default" passed.
>>>
>>> Trademarks:
>>>
>>> - No changes since last report.
>>>
>>> Latest releases:
>>>
>>> - Spark 3.5.4 was released on Dec 20, 2024
>>> - Spark 3.4.4 was released on Oct 27, 2024
>>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>>
>>> Committers and PMC:
>>>
>>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>>
>>> =====================
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org