Interesting notion at https://github.com/apache/spark/pull/23650:
.unpersist() takes an optional 'blocking' argument. If true, the call
waits until the resource is freed; otherwise it returns immediately.
The defaults look pretty inconsistent:

- RDD: true
- Broadcast: true
- Dataset / DataFrame: false
- Graph (in GraphX): false
- PySpark RDD: (no option)
- PySpark Broadcast: false
- PySpark DataFrame: false

I think false is the better default, as I'd expect it's much more likely
that the caller doesn't want to wait around while resources are freed,
especially as this happens on the driver. The possible downside is that
if the resources don't free up quickly, other operations might have less
memory available than they otherwise would.

What about making the default false everywhere for Spark 3? I raised it
on dev@ just because it seems like a nontrivial behavior change, but
maybe it isn't controversial.

Sean
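For anyone skimming, the semantic difference can be sketched outside Spark with plain Python threading. This is a toy illustration of blocking vs. non-blocking cleanup, not Spark's actual implementation; `FakeBlockManager` and its methods are hypothetical names:

```python
import threading
import time

class FakeBlockManager:
    """Toy stand-in for driver-side resource cleanup (hypothetical)."""
    def __init__(self):
        self.freed = threading.Event()

    def _release(self):
        time.sleep(0.2)  # simulate slow executor-side cleanup
        self.freed.set()

    def unpersist(self, blocking):
        t = threading.Thread(target=self._release)
        t.start()
        if blocking:
            t.join()  # blocking=True: wait until the resource is freed
        # blocking=False: return immediately; cleanup continues in background
        return self.freed.is_set()

# blocking=True returns only after the resource is freed:
assert FakeBlockManager().unpersist(blocking=True) is True
# blocking=False returns right away, with cleanup still pending:
assert FakeBlockManager().unpersist(blocking=False) is False
```

With blocking=False the caller gets control back immediately, which is why a non-blocking default seems friendlier, at the cost of resources possibly lingering for a moment afterward.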