vrozov commented on code in PR #49276: URL: https://github.com/apache/spark/pull/49276#discussion_r1939814486
########## sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala: ########## @@ -126,7 +126,9 @@ class CacheManager extends Logging with AdaptiveSparkPlanHelper { if (storageLevel == StorageLevel.NONE) { // Do nothing for StorageLevel.NONE since it will not actually cache any data. } else if (lookupCachedDataInternal(normalizedPlan).nonEmpty) { - logWarning("Asked to cache already cached data.") + logWarning(log"An attempt was made to cache data even though the data had already been " + Review Comment: @gengliangwang sounds that we are going in a circle here. The warning message is logged when there is a call to `persist()` when DataSet is already present in the cache, so it is an indication of a bug in the code that calls `persist()`. Either there is a missing call to `unpersist()` on the Dataset (there may be a call to `unpersist()` on another Dataset that was not cached as in the sample I provided in the PR comment) or the call is not necessary at all. The warning message is a way for the user to be notified that there is a problem (bug) in the code and it is up to the user to see how to fix it. If you have better suggestion on the warning message wording, please post it here. As far as I can see, the proposed warning message indicates that - the data had already being cached, so the call may not be necessary - there may be a missing call to `unpersist()` or clear cache -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org