Slight update: broadcast doesn't block by default, but GraphX does:

- RDD: true
- Broadcast: false
- Dataset / DataFrame: false
- Graph (in GraphX): true
- Pyspark RDD: (no option)
- Pyspark Broadcast: false
- Pyspark DataFrame: false
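For concreteness, here's roughly what this looks like from user code -- a
sketch against the 2.4 Scala API (local mode just for illustration), with
the default each call resolves to in the comments:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unpersist-defaults")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    val rdd = sc.parallelize(1 to 100).cache()
    rdd.count()
    rdd.unpersist()                    // RDD: blocking = true by default; driver waits
    // rdd.unpersist(blocking = false) // ...which is what most callers pass explicitly

    val bc = sc.broadcast(Array(1, 2, 3))
    bc.unpersist()                     // Broadcast: blocking = false by default

    val ds = (1 to 100).toDS().cache()
    ds.count()
    ds.unpersist()                     // Dataset/DataFrame: blocking = false by default

    // GraphX's Graph.unpersist(blocking: Boolean = true) follows the RDD default

    spark.stop()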
(Scala) RDD is the odd one out, along with its subclasses in GraphX. I'll
make a PR. Most callers already specify non-blocking; some tests seem to
need the blocking behavior, though, so that can be retained.

Related: Broadcast's .destroy() method is also blocking by default, but
almost every invocation of it sets blocking=false. While I'm at it, I'll
probably adjust that too for consistency (sketch at the end of this mail).

On Mon, Jan 28, 2019 at 11:21 AM Reynold Xin <r...@databricks.com> wrote:
>
> Seems to make sense to have it false by default.
>
> (I agree this deserves a dev list mention, though, even if there is easy
> consensus.) We should make sure we mark the JIRA with "releasenotes" so
> we can add it to the upgrade guide.
>
> On Mon, Jan 28, 2019 at 8:47 AM Sean Owen <sro...@gmail.com> wrote:
>>
>> Interesting notion at https://github.com/apache/spark/pull/23650 :
>>
>> .unpersist() takes an optional 'blocking' argument. If true, the call
>> waits until the resource is freed; otherwise it doesn't.
>>
>> The defaults look pretty inconsistent:
>> - RDD: true
>> - Broadcast: true
>> - Dataset / DataFrame: false
>> - Graph (in GraphX): false
>> - Pyspark RDD: (no option)
>> - Pyspark Broadcast: false
>> - Pyspark DataFrame: false
>>
>> I think false is a better default, as I'd expect it's much more likely
>> that the caller doesn't want to wait around while resources are freed,
>> especially as this happens on the driver. The possible downside is
>> that if the resources don't free up quickly, other operations might
>> not have as much memory available as they otherwise would.
>>
>> What about making the default false everywhere for Spark 3?
>> I raised it on dev@ just because it seems like a nontrivial behavior
>> change, but maybe it isn't controversial.
>>
>> Sean
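PS: for anyone curious about the destroy() point above, a minimal sketch of
the asymmetry against the 2.4 Scala API (if I'm reading the code right, the
destroy(blocking) overload is private[spark], so the blocking=false call
sites are all internal to Spark):

    val bc = sc.broadcast(Array(1, 2, 3))

    bc.unpersist()   // drops cached copies on executors; blocking = false by default,
                     // and the broadcast can be re-sent if it's used again
    bc.destroy()     // drops all data and metadata for good; blocking = true by default

    // Spark's own callers mostly invoke the private[spark] overload instead:
    //   destroy(blocking = false)

Making the no-arg destroy() non-blocking would line it up with unpersist().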