itholic opened a new pull request, #49982: URL: https://github.com/apache/spark/pull/49982
### What changes were proposed in this pull request?

This PR proposes to improve Column performance when DQC (DataFrameQueryContext) is disabled by deferring the call to `getActiveSession`, which is fairly expensive.

### Why are the changes needed?

To improve the performance of Column operations.

### Does this PR introduce _any_ user-facing change?

No API changes; this only improves performance.

### How was this patch tested?

Manually tested; the existing CI should also pass.

```python
>>> spark.conf.get("spark.python.sql.dataFrameDebugging.enabled")
'false'
```

**Before fix**

```python
>>> import time
>>> import pyspark.sql.functions as F
>>>
>>> c = F.col("name")
>>> start = time.time()
>>> for i in range(10000):
...     _ = c.alias("a")
...
>>> print(time.time() - start)
2.061354875564575
```

**After fix**

```python
>>> import time
>>> import pyspark.sql.functions as F
>>>
>>> c = F.col("name")
>>> start = time.time()
>>> for i in range(10000):
...     _ = c.alias("a")
...
>>> print(time.time() - start)
0.8050589561462402
```

There is no significant difference when the flag is on:

```python
>>> spark.conf.get("spark.python.sql.dataFrameDebugging.enabled")
'true'
```

**Before fix**

```python
>>> import time
>>> import pyspark.sql.functions as F
>>>
>>> c = F.col("name")
>>> start = time.time()
>>> for i in range(10000):
...     _ = c.alias("a")
...
>>> print(time.time() - start)
3.755108118057251
```

**After fix**

```python
>>> import time
>>> import pyspark.sql.functions as F
>>>
>>> c = F.col("name")
>>> start = time.time()
>>> for i in range(10000):
...     _ = c.alias("a")
...
>>> print(time.time() - start)
3.6577670574188232
```

### Was this patch authored or co-authored using generative AI tooling?

No.
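For readers unfamiliar with the pattern, the deferral described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: every name here (`config`, `stats`, `get_active_session`, `call_with_context`, `eager`, `deferred`) is a hypothetical stand-in, and it assumes the debugging flag can be read cheaply without first looking up the session.

```python
import functools

# Hypothetical stand-ins; none of these names are PySpark APIs.
config = {"debugging": False}   # plays the role of the debugging flag
stats = {"session_lookups": 0}  # counts runs of the "expensive" lookup


def get_active_session():
    """Stand-in for an expensive active-session lookup."""
    stats["session_lookups"] += 1
    return object()


def call_with_context(session, func, *args, **kwargs):
    """Stand-in for the slow path that would record query context."""
    return func(*args, **kwargs)


def eager(func):
    """Looks up the session on every call, even when the flag is off."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        session = get_active_session()
        if not config["debugging"]:
            return func(*args, **kwargs)
        return call_with_context(session, func, *args, **kwargs)
    return wrapper


def deferred(func):
    """Checks the cheap flag first and looks up the session only if needed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not config["debugging"]:
            return func(*args, **kwargs)  # fast path: no session lookup
        session = get_active_session()
        return call_with_context(session, func, *args, **kwargs)
    return wrapper


def alias(name):
    """Toy operation standing in for a Column method."""
    return f"alias:{name}"


def count_lookups(decorator, n=1000):
    """Return how many session lookups n wrapped calls trigger."""
    stats["session_lookups"] = 0
    wrapped = decorator(alias)
    for _ in range(n):
        wrapped("a")
    return stats["session_lookups"]


print(count_lookups(eager))     # 1000
print(count_lookups(deferred))  # 0
```

With the flag off, the deferred wrapper never performs the lookup, which is the kind of saving the "flag off" timings above suggest. With the flag on, both wrappers do the same work, consistent with the near-identical "flag on" timings.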
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org