itholic opened a new pull request, #49982:
URL: https://github.com/apache/spark/pull/49982

   ### What changes were proposed in this pull request?
   
   This PR proposes to improve Column performance when 
DQC (DataFrameQueryContext) is disabled by deferring the call to 
`getActiveSession`, which is relatively expensive, until it is actually needed.
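
The general idea of skipping an expensive lookup behind a cheap flag check can be sketched in plain Python as follows. This is only an illustration and not the code in this patch: `get_active_session`, `with_origin`, `FLAG`, and `alias` are hypothetical stand-ins, and the counter exists solely to show how often the expensive call runs.

```python
import functools

# Hypothetical stand-ins; none of these names are taken from the patch itself.
SESSION_LOOKUPS = 0


def get_active_session():
    """Stand-in for an expensive session lookup; counts how often it runs."""
    global SESSION_LOOKUPS
    SESSION_LOOKUPS += 1
    return {"name": "session"}


def with_origin(debugging_enabled):
    """Decorator factory: touch the session only when debugging is enabled."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not debugging_enabled():
                # Fast path: the expensive lookup is skipped entirely.
                return func(*args, **kwargs)
            _ = get_active_session()  # cost is paid only when the flag is on
            # A real implementation would record call-site context here.
            return func(*args, **kwargs)
        return wrapper
    return decorator


FLAG = {"enabled": False}  # assumed cheap-to-read flag


@with_origin(lambda: FLAG["enabled"])
def alias(name):
    return f"alias({name})"


# Flag off: no session lookups should happen.
for _ in range(1000):
    alias("a")
lookups_when_disabled = SESSION_LOOKUPS

# Flag on: one lookup per call, as before the change.
FLAG["enabled"] = True
for _ in range(1000):
    alias("a")
lookups_when_enabled = SESSION_LOOKUPS - lookups_when_disabled
```

With the flag off the wrapped function never reaches the lookup, which matches the shape of the timings reported in this description: a large gain when the flag is disabled and essentially no change when it is enabled.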
   
   ### Why are the changes needed?
   
   To improve the performance of Column operations.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No API changes; this only improves performance.
   
   ### How was this patch tested?
   
   Manually tested, and also the existing CI should pass.
   
   ```python
   >>> spark.conf.get("spark.python.sql.dataFrameDebugging.enabled")
   'false'
   ```
   
   
   **Before fix**
   ```python
   >>> import time
   >>> import pyspark.sql.functions as F
   >>>
   >>> c = F.col("name")
   >>> start = time.time()
   >>> for i in range(10000):
   ...   _ = c.alias("a")
   ...
   >>> print(time.time() - start)
   2.061354875564575
   ```
   
   **After fix**
   ```python
   >>> import time
   >>> import pyspark.sql.functions as F
   >>>
   >>> c = F.col("name")
   >>> start = time.time()
   >>> for i in range(10000):
   ...   _ = c.alias("a")
   ...
   >>> print(time.time() - start)
   0.8050589561462402
   ```
   
   
   There is no meaningful difference when the flag is on:
   
   
   ```python
   >>> spark.conf.get("spark.python.sql.dataFrameDebugging.enabled")
   'true'
   ```
   
   **Before fix**
   ```python
   >>> import time
   >>> import pyspark.sql.functions as F
   >>>
   >>> c = F.col("name")
   >>> start = time.time()
   >>> for i in range(10000):
   ...   _ = c.alias("a")
   ...
   >>> print(time.time() - start)
   3.755108118057251
   ```
   
   **After fix**
   ```python
   >>> import time
   >>> import pyspark.sql.functions as F
   >>>
   >>> c = F.col("name")
   >>> start = time.time()
   >>> for i in range(10000):
   ...   _ = c.alias("a")
   ...
   >>> print(time.time() - start)
   3.6577670574188232
   ```
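
For anyone who wants to repeat these measurements with less run-to-run noise, a small harness along the following lines may help. It is a sketch rather than part of this patch: `best_per_call` is a hypothetical helper, and the workload is a stand-in so the snippet runs without Spark.

```python
import timeit


def best_per_call(fn, number=10_000, repeat=5):
    """Return the best observed seconds-per-call for `fn` across `repeat` runs."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(totals) / number


# Stand-in workload so the sketch is self-contained; in a PySpark shell one
# would pass `lambda: c.alias("a")` with `c = F.col("name")` instead.
per_call = best_per_call(lambda: "name".upper(), number=1000, repeat=3)
print(f"{per_call * 1e6:.3f} us per call")
```

Taking the minimum over several repeats is a common choice for micro-benchmarks like this one, since a single `time.time()` delta can be skewed by unrelated load.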
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

