yangguoaws commented on code in PR #49276:
URL: https://github.com/apache/spark/pull/49276#discussion_r1944708425


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:
##########
@@ -206,7 +209,10 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
       plan: LogicalPlan,
       cascade: Boolean,
       blocking: Boolean): Unit = {
-    uncacheByCondition(spark, _.sameResult(plan), cascade, blocking)
+    if (!uncacheByCondition(spark, _.sameResult(plan), cascade, blocking)) {
+      logWarning(log"Data has not been previously cached or it was removed from the " +
+        log"cache already.\nLogical plan:\n${MDC(QUERY_PLAN, plan)}")
+    }

Review Comment:
   @gengliangwang This log warns developers when they try to unpersist a query plan that has not been cached (or was already removed from the cache), and it shows the query plan in question.
   
   For example, the sample PySpark code below calls `unpersist()` on a DataFrame that was redefined after it was persisted. The query plan of the originally cached DataFrame therefore stays in CacheManager. If this happens in a for loop or in Spark Structured Streaming `foreachBatch`, driver memory keeps growing and can eventually lead to memory issues.
   ```
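       from pyspark.sql.functions import col, upper

       # `spark` is an existing SparkSession; the row below is a placeholder.
       data = [("Alice", 30, "NYC")]
       df = spark.createDataFrame(data, ["name", "age", "city"])
       df.persist()
       df.show()
       # Rebinding `df` produces a new query plan that was never cached.
       df = df.withColumn("NAME", upper(col("name")))
       df.show()
       # Targets the new plan, so the original cache entry is left behind.
       df.unpersist()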
   ```
   
   The proposed change helps developers notice when they are trying to unpersist a query plan that has not been cached, so they can review their code and check whether they are unpersisting the wrong DataFrame.
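
To make the failure mode concrete, here is a minimal pure-Python sketch of the behaviour described above. It is a toy model, not Spark's CacheManager: the `ToyCacheManager` class, its methods, and the tuple-based "plans" are all hypothetical stand-ins, and lookups use plain equality rather than `sameResult`. It only illustrates that unpersisting a redefined plan matches nothing, leaves the original entry behind, and is a natural place to emit a warning.

```python
import logging

logger = logging.getLogger("toy_cache")


class ToyCacheManager:
    """Hypothetical stand-in that maps a hashable 'plan' to cached data."""

    def __init__(self):
        self._entries = {}

    def cache(self, plan):
        self._entries[plan] = f"cached data for {plan}"

    def uncache(self, plan):
        """Remove the entry for `plan`; warn and return False if nothing matched."""
        if plan in self._entries:
            del self._entries[plan]
            return True
        logger.warning(
            "Data has not been previously cached or it was removed from the "
            "cache already. Plan: %s",
            plan,
        )
        return False

    def size(self):
        return len(self._entries)


manager = ToyCacheManager()

# Mirrors the sample: persist one plan, then unpersist a redefined one.
original_plan = ("createDataFrame",)
manager.cache(original_plan)
redefined_plan = original_plan + ("withColumn NAME",)

removed = manager.uncache(redefined_plan)  # no match, so a warning is logged
leaked_entries = manager.size()            # the original entry is still cached

# Unpersisting the plan that was actually cached clears the entry.
removed_original = manager.uncache(original_plan)
remaining_entries = manager.size()
```

In the PySpark sample, the equivalent fix would be to keep a separate reference to the persisted DataFrame and call `unpersist()` on that reference rather than on the redefined `df`.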



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

