Slight update: Broadcast's unpersist() doesn't block by default, but GraphX's does:

- RDD: true
- Broadcast: false
- Dataset / DataFrame: false
- Graph (in GraphX): true
- Pyspark RDD: (no option)
- Pyspark Broadcast: false
- Pyspark DataFrame: false

(Scala) RDD is the odd one out, along with its subclasses in GraphX. I'll
make a PR to change the default to non-blocking. Most callers already
specify non-blocking explicitly; some tests seem to need the blocking
behavior, though, so that can be retained there.

Related: Broadcast's destroy() method is also blocking by default, but
almost every invocation of it sets blocking=false. While I'm at it, I'll
probably adjust that too for consistency.
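
In the meantime, callers that care either way can be explicit, which is
robust to whatever the defaults end up being. A minimal sketch, assuming
a spark-shell session where 'spark' is in scope (names made up):

  val cached = spark.sparkContext.parallelize(1 to 100).cache()
  cached.count()                      // materialize the cached data
  cached.unpersist(blocking = false)  // fire-and-forget; don't wait on the driver

  val bc = spark.sparkContext.broadcast(Map("k" -> 1))
  bc.unpersist(blocking = false)      // explicitly non-blocking on any version
  // Tests that need deterministic cleanup can pass blocking = true instead.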


On Mon, Jan 28, 2019 at 11:21 AM Reynold Xin <r...@databricks.com> wrote:
>
> Seems to make sense to have it false by default.
>
> (I agree this deserves a dev list mention, though, even if there is easy
> consensus.) We should make sure we mark the Jira with releasenotes so we can
> add it to the upgrade guide.
>
> On Mon, Jan 28, 2019 at 8:47 AM Sean Owen <sro...@gmail.com> wrote:
>>
>> Interesting notion at https://github.com/apache/spark/pull/23650 :
>>
>> .unpersist() takes an optional 'blocking' argument. If true, the call
>> waits until the resource is freed. Otherwise it doesn't.
>>
>> The default looks pretty inconsistent:
>> - RDD: true
>> - Broadcast: true
>> - Dataset / DataFrame: false
>> - Graph (in GraphX): false
>> - Pyspark RDD: (no option)
>> - Pyspark Broadcast: false
>> - Pyspark DataFrame: false
>>
>> I think false is a better default, as I'd expect it's much more likely
>> that the caller doesn't want to wait around while resources are freed,
>> especially as this happens on the driver. The possible downside is
>> that if the resources don't free up quickly, other operations might
>> not have as much memory available as they otherwise might have.
>>
>> What about making the default false everywhere for Spark 3?
>> I raised it to dev@ just because that seems like a nontrivial behavior
>> change, but maybe it isn't controversial.
>>
>> Sean
>>
