SauronShepherd commented on PR #49724:
URL: https://github.com/apache/spark/pull/49724#issuecomment-2623472398

   I was expecting your first point since it's a behavioral change. However, 
given that keeping the default value as-is has a major impact on memory I 
thought it was worth discussing. I still believe that converting plans to 
strings shouldn't be Spark's default behavior unless strictly necessary. 
Besides, I'm not sure many Spark users would even notice it -well, actually 
maybe they would do, because of the performance improvement in their 
applications-.
   
   I've been working with Spark for over eight years, and whenever I needed to 
inspect a plan, I explicitly ran the explain method rather than relying on the 
Spark UI. However, I’ve often seen teams resort to checkpointing their 
DataFrames simply because Spark took what felt like an eternity to begin 
execution— and this month I've observed the same looking at GraphFrames code. I 
still don’t see the need for Spark to be so verbose internally; in my view, 
performance should take precedence over verbosity in logging.
   
   Regarding the new "off" value being placed at the end, the changes are 
minimal. However, to me, it makes more sense for "off" to come before the value 
that produces less content (i.e., "simple"), following a logical progression 
from lower to higher verbosity.
   
   As for the change in cachedName, it doesn’t yield the exact same result, but 
since it’s merely a literal identifier -a description-  and not a key, this 
difference shouldn't matter. Like the previous changes, this modification is 
crucial to avoid traversing the entire tree of the plan each time a 
CachedRDDBuilder is created. Because even with the new explain mode off, 
despite avoiding the OOM, generating the string still has a significant impact 
in performance, even for not to large plans.
   
   That said, I have no problem in making these changes in my PR—I was just 
hoping to address an internal behavior that I’m not convinced Spark should have 
by default.
   
   Thanks for your feedback, @dongjoon-hyun 
   
   Ángel


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to