Hi Angel,

AFAIK many people rely on the Spark UI to debug/inspect their queries with
the query plan tree and metrics, but you are right that plan string
generation is expensive, and we shouldn't do it for every AQE plan change.
Maybe we should do it only once to report the final plan for AQE? Let's
continue the discussion on the PR.
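
To make the "only once" idea concrete, here is a rough sketch in plain
Python. It is not Spark code: `PlanReporter` and `render_tree` are made-up
names, and the tuple-based plan is only a stand-in for a physical plan. The
point is that each AQE plan change only stores a reference, and the
expensive string is built a single time, when the final plan is reported.

```python
class PlanReporter:
    """Hypothetical helper: keeps only the latest plan and renders it once."""

    def __init__(self, render):
        self._render = render        # expensive plan -> string function
        self._latest_plan = None
        self.render_calls = 0        # counts how often we build the string

    def on_plan_change(self, plan):
        # Cheap: remember the newest plan, do not build its string yet.
        self._latest_plan = plan

    def final_plan_string(self):
        # The expensive step runs only when the final plan is reported.
        self.render_calls += 1
        return self._render(self._latest_plan)


def render_tree(plan, depth=0):
    """Toy renderer for a (name, children) tuple tree."""
    name, children = plan
    lines = ["  " * depth + name]
    for child in children:
        lines.append(render_tree(child, depth + 1))
    return "\n".join(lines)


reporter = PlanReporter(render_tree)
# Two simulated AQE re-optimizations; neither one renders a string.
reporter.on_plan_change(("SortMergeJoin", [("Scan a", []), ("Scan b", [])]))
reporter.on_plan_change(("BroadcastHashJoin", [("Scan a", []), ("Scan b", [])]))
final_text = reporter.final_plan_string()
print(final_text)
```

Whether intermediate plans still need to reach the UI in some cheaper form
is an open question for the PR; the sketch simply drops them.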

On Thu, Feb 6, 2025 at 1:48 PM Ángel <angel.alvarez.pas...@gmail.com> wrote:

> I'd like to add that Spark is not as fast as it should be, primarily due
> to its internal verbosity, as reported in ticket *SPARK-50992
> <https://issues.apache.org/jira/browse/SPARK-50992>*. After submitting
> this PR <https://github.com/apache/spark/pull/49724>, I received some
> comments, which I quickly addressed, but the PR has since stalled.
>
> I strongly believe that Spark should prioritize performance over internal
> logging, especially when it has such a significant impact on execution
> speed and can lead to memory issues.
>
> In *GraphFrames*, the temporary workaround was to disable *AQE (Adaptive
> Query Execution)*. Just last week, I gave the same advice to a colleague
> experiencing performance issues with a *Databricks* notebook—and it
> worked. Disabling *AQE* to improve performance because Spark continuously
> generates string descriptions of physical plans internally, which very
> likely no one is going to use, makes little sense to me.
>
> PS: I wish I were wrong, but I really think I am not.
> PS2: The first part of a series of articles I'm writing about this issue:
> link
> <https://medium.com/@angel.alvarez.pascua/apache-spark-wtf-i-like-it-when-a-plan-comes-together-part-i-48c52a667288>
>
> On Thu, Feb 6, 2025 at 6:30, Adam Hobbs
> (<adam.ho...@bendigoadelaide.com.au.invalid>) wrote:
>
>> I'd like to add something about the failure to get any traction on
>> shepherding the structured streaming DRA PR. Multiple times now there
>> have been calls for help to get this initiative over the line, and the
>> response has been disappointing. The GitHub PR has been closed due to
>> inaction (https://github.com/apache/spark/pull/42352).
>>
>> This seems like a bit of a failure in the process.
>>
>> Regards,
>>
>> Adam Hobbs
>>
>> -----Original Message-----
>> From: Matei Zaharia <matei.zaha...@gmail.com>
>> Sent: Thursday, 6 February 2025 2:57 PM
>> To: Spark dev list <dev@spark.apache.org>
>> Cc: priv...@spark.apache.org
>> Subject: ASF board report draft for February 2025
>>
>>
>> It’s time to send our next ASF board report again on February 12th.
>> Here’s an initial draft — feel free to suggest changes:
>>
>> =====================
>>
>>
>> Description:
>>
>> Apache Spark is a fast and general purpose engine for large-scale data
>> processing. It offers high-level APIs in Java, Scala, Python, R and SQL as
>> well as a rich set of libraries including stream processing, machine
>> learning, and graph analytics.
>>
>> Issues for the board:
>>
>> - None
>>
>> Project status:
>>
>> - The Spark 4.0 branch has been cut and has entered the QA stage. We
>> encourage the community to test it out!
>> - We released Spark 3.5.4 on December 20th, 2024.
>> - The PMC voted to add one new committer (Bingkun Pan) and one new PMC
>> member (Jie Yang) to the project.
>> - The proposal to "Use plain text logs by default" was successfully
>> passed.
>>
>> Trademarks:
>>
>> - No changes since last report.
>>
>> Latest releases:
>>
>> - Spark 3.5.4 was released on Dec 20, 2024
>> - Spark 3.4.4 was released on Oct 27, 2024
>> - Spark 4.0 Preview 2 was released on Sept 26, 2024
>>
>> Committers and PMC:
>>
>> - The latest committer was added on Nov 13, 2024 (Bingkun Pan).
>> - The latest PMC member was added on Jan 21st, 2025 (Jie Yang).
>>
>> =====================
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
