yangguoaws commented on code in PR #49276:
URL: https://github.com/apache/spark/pull/49276#discussion_r1944708425

##########
sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala:
##########
@@ -206,7 +209,10 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper {
       plan: LogicalPlan,
       cascade: Boolean,
       blocking: Boolean): Unit = {
-    uncacheByCondition(spark, _.sameResult(plan), cascade, blocking)
+    if (!uncacheByCondition(spark, _.sameResult(plan), cascade, blocking)) {
+      logWarning(log"Data has not been previously cached or it was removed from the " +
+        log"cache already.\nLogical plan:\n${MDC(QUERY_PLAN, plan)}")
+    }

Review Comment:
   @gengliangwang This log warns developers that they are trying to unpersist a query plan that was never cached (or has already been removed from the cache), and it prints the details of that query plan.

   For example, the sample PySpark code below calls `unpersist()` on a redefined DataFrame. Because `df` has been rebound to a new DataFrame, the call does not match the cached entry, and the query plan of the originally cached DataFrame is left in CacheManager. If this happens inside a for loop or in Spark Structured Streaming `foreachBatch`, driver memory grows steadily and eventually leads to memory issues.

   ```
   from pyspark.sql.functions import col, upper

   # Assumes an active SparkSession `spark` and a list of row tuples `data`.
   df = spark.createDataFrame(data, ["name", "age", "city"])
   df.persist()
   df.show()
   df = df.withColumn("NAME", upper(col("name")))  # df now names a new, uncached plan
   df.show()
   df.unpersist()  # no-op: the originally cached plan stays in CacheManager
   ```

   The proposed change helps developers notice that they are trying to unpersist a query plan that was not previously cached. They can then review their code to check whether they are unpersisting the wrong DataFrame.
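
The failure mode described in the review comment can be sketched without Spark. The toy model below is only an illustration under stated assumptions: `ToyCacheManager` and `ToyFrame` are hypothetical stand-ins, not Spark classes, and a tuple of operation names stands in for a logical plan. It assumes cache entries are looked up by plan, so a frame derived after `persist()` has a different plan and its `unpersist()` call removes nothing. It also shows the safer pattern of keeping a separate reference to the frame that was actually cached.

```python
# Toy model only: ToyCacheManager and ToyFrame are hypothetical stand-ins,
# not Spark classes. They mimic the idea that cache entries are keyed by plan.

class ToyCacheManager:
    def __init__(self):
        self.cached_plans = set()

    def cache(self, plan):
        self.cached_plans.add(plan)

    def uncache(self, plan):
        """Remove the plan if present; return True only if something was removed."""
        if plan in self.cached_plans:
            self.cached_plans.remove(plan)
            return True
        return False


class ToyFrame:
    def __init__(self, manager, plan):
        self.manager = manager
        self.plan = plan  # a tuple of operation names stands in for a logical plan

    def persist(self):
        self.manager.cache(self.plan)
        return self

    def with_column(self, name):
        # A transformation returns a new frame with a new, longer plan.
        return ToyFrame(self.manager, self.plan + (f"withColumn({name})",))

    def unpersist(self):
        removed = self.manager.uncache(self.plan)
        if not removed:
            # Analogous in spirit to the warning proposed in the PR.
            print(f"WARN: plan was not previously cached: {self.plan}")
        return removed


manager = ToyCacheManager()

# Problem pattern: the name `df` is rebound, so unpersist targets the new plan.
df = ToyFrame(manager, ("createDataFrame",)).persist()
df = df.with_column("NAME")
leaked = not df.unpersist()       # nothing removed; the original entry stays cached

# Safer pattern: keep a separate reference to the frame that was cached.
cached_df = ToyFrame(manager, ("scan",)).persist()
derived_df = cached_df.with_column("NAME")
released = cached_df.unpersist()  # removes the entry that was actually cached
```

In real PySpark code the same idea would mean assigning the transformed DataFrame to a new variable and calling `unpersist()` on the variable that `persist()` was called on.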