vrozov opened a new pull request, #49276:
URL: https://github.com/apache/spark/pull/49276

   ### What changes were proposed in this pull request?
   
   The change improves warning logging in the `CacheManager` by:
   1. Adding logical plan info to the existing warning messages.
   2. Logging a warning message when an attempt is made to remove data from the cache, but the data is not present.
   
   ### Why are the changes needed?
   
   The change helps identify incorrect calls to `Dataset.persist()` and `Dataset.unpersist()`, as in:
   ```
   Dataset<Row> dataset = ...
   Dataset<Row> dataset1 = dataset.withColumn(...);
   Dataset<Row> dataset2 = dataset1.withColumn(...);
   dataset.persist(); // OK
   dataset1.persist(); // OK
   dataset.persist(); // currently logs warning without logical plan details
   dataset.unpersist(); // OK
   dataset.unpersist(); // currently no warning, although the data is no longer cached
   dataset2.unpersist(); // currently no warning, although dataset2 was never cached; the call should be on dataset1
   ```
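
The call sequence above can be replayed against a toy model to show which calls would produce a warning once both changes are in place. This is a minimal sketch under stated assumptions, not Spark's `CacheManager`: the class `ToyCacheManager`, its methods, and the plan strings are hypothetical, plans are compared by plain string equality, and nothing is actually cached. Only the warning texts are taken from the sample messages in this description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model for illustration only. It is NOT Spark's CacheManager: plans are
 * plain strings compared by equality, and nothing is actually cached.
 */
public class ToyCacheManager {
    private final Set<String> cachedPlans = new HashSet<>();
    private final List<String> warnings = new ArrayList<>();

    /** Registers a plan; returns false and records a warning if it is already registered. */
    public boolean cache(String plan) {
        if (cachedPlans.add(plan)) {
            return true;
        }
        warn("An attempt was made to cache data even though the data had already been cached.", plan);
        return false;
    }

    /** Removes a plan; returns false and records a warning if it was not registered. */
    public boolean uncache(String plan) {
        if (cachedPlans.remove(plan)) {
            return true;
        }
        warn("Data has not been previously cached or it was removed from the cache already.", plan);
        return false;
    }

    /** Warnings recorded so far, each including the offending plan. */
    public List<String> warnings() {
        return warnings;
    }

    private void warn(String message, String plan) {
        warnings.add(message + "\nLogical plan:\n" + plan);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the logical plans of dataset, dataset1, dataset2.
        String dataset = "Relation";
        String dataset1 = "Project1(" + dataset + ")";
        String dataset2 = "Project2(" + dataset1 + ")";

        ToyCacheManager manager = new ToyCacheManager();
        manager.cache(dataset);     // OK
        manager.cache(dataset1);    // OK
        manager.cache(dataset);     // warning: already cached
        manager.uncache(dataset);   // OK
        manager.uncache(dataset);   // warning: already removed
        manager.uncache(dataset2);  // warning: never cached

        manager.warnings().forEach(System.out::println);
    }
}
```

In this model, each of the three flagged calls records a warning together with the plan that triggered it, so the offending call can be traced from the log alone.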
   
   ### Does this PR introduce _any_ user-facing change?
   
   Users may see warning messages like:
   
   ```
   23.12.2024 19:15:03.840 WARN  [pool-30-thread-1] org.apache.spark.sql.execution.CacheManager - An attempt was made to cache data even though the data had already been cached. Please un-cache data or clear cache first.
   Logical plan:
   Relation [i#0] JDBCRelation(test_table) [numPartitions=1]
   ```
   and
   ```
   23.12.2024 19:15:04.207 WARN  [pool-30-thread-1] org.apache.spark.sql.execution.CacheManager - Data has not been previously cached or it was removed from the cache already.
   Logical plan:
   Project [i#0, i#0 AS year#6]
   +- Relation [i#0] JDBCRelation(test_table) [numPartitions=1]
   ```
   
   ### How was this patch tested?
   
   The change modifies warning log messages.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

