Hi, in Spark 2.4.0, unpersisting a DataFrame only un-caches the given Dataset and
re-compiles the dependent cached queries after removing the cached query, as
described in https://issues.apache.org/jira/browse/SPARK-21478.

When all the jobs are done and we unpersist the cached data, it can take a
long time to rebuild the dependent cached data, even though it will never be
used again. Take the following code for example.

val x1 = Seq(1).toDF()
x1.persist()

val x2 = x1.select($"value" * 2)
x2.persist()

val x3 = x2.select($"value" * 2)
x3.persist()

x1.count()
x2.count()
x3.count()
...

x1.unpersist() // never used again, but re-compiles the dependent cached queries x2 and x3
x2.unpersist() // never used again, but re-compiles the dependent cached query x3
x3.unpersist() // never used again



So, can we expose the *cascade* parameter in the unpersist method and let the
user choose whether to rebuild the dependent cached queries or not? Currently,
Dataset.unpersist hard-codes cascade = false:

def unpersist(blocking: Boolean): this.type = {
  sparkSession.sharedState.cacheManager.uncacheQuery(this, cascade = false, blocking)
  this
}
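
For illustration, a minimal sketch of what exposing the flag could look like.
The extra cascade parameter on unpersist is an assumption of this proposal, not
an existing Spark 2.4.0 API; only CacheManager.uncacheQuery already takes it:

// Hypothetical overload: the cascade parameter here is proposed, not part of Spark 2.4.0.
// With cascade = true, the dependent cached queries are dropped as well instead of being
// re-compiled, so tearing down a chain of caches that will never be used again stays cheap.
def unpersist(blocking: Boolean, cascade: Boolean): this.type = {
  sparkSession.sharedState.cacheManager.uncacheQuery(this, cascade, blocking)
  this
}

With such an overload, the example above could simply end with
x1.unpersist(blocking = false, cascade = true), dropping x1, x2 and x3 in one
call without rebuilding any of them.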
