Broadcast Memory Management

Matthias Boehm Wed, 20 Sep 2017 10:30:07 -0700

Hi all,

could someone please help me understand the broadcast life cycle in detail,
especially with regard to memory management?


After reading through the TorrentBroadcast implementation, it seems that
for every broadcast object, the driver holds a strong reference to a
shallow copy (in MEMORY_AND_DISK) as well as a deep copy of the data in
serialized, chunked form (in MEMORY_AND_DISK_SER). Now my questions:

1) Is this observation correct or does the driver also hold a strong
reference to the entire object in serialized form?

2) Are there scenarios, other than with local master or explicit reads in
the driver, where the shallow copy is actually used by Spark?

3) Is it a valid workaround to create a wrapper object around the data,
broadcast the wrapper, and immediately delete the data after it has been
blockified to remove the unnecessary memory requirements?
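
To make question 3 concrete, here is a minimal sketch of the wrapper idea.
It does not use Spark at all: plain Java serialization stands in for the
blockification step, and DataWrapper, serialize, and deserialize are
hypothetical names introduced only for illustration. Whether Spark's
broadcast life cycle actually tolerates clearing the data this way is
exactly what the question asks, so this shows the intent, not a confirmed
workaround.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BroadcastWrapperSketch {

    // Hypothetical wrapper (not a Spark class): holds the payload and
    // lets the driver drop its reference once a serialized copy exists.
    static final class DataWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private double[] data;

        DataWrapper(double[] data) { this.data = data; }

        double[] get() { return data; }

        // Drop the reference so the payload becomes eligible for GC.
        void clear() { data = null; }
    }

    // Stand-in for blockification: produces an independent serialized copy.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Stand-in for what a remote reader would do with the serialized bytes.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DataWrapper wrapper = new DataWrapper(new double[] {1.0, 2.0, 3.0});

        // In a real driver this is where the wrapper would be broadcast.
        byte[] blocks = serialize(wrapper);

        // Clear the local payload once the serialized copy exists.
        wrapper.clear();

        // A reader that deserializes the bytes still sees the full payload,
        // while a reader of the local wrapper now sees null.
        DataWrapper remote = (DataWrapper) deserialize(blocks);
        System.out.println("local payload cleared: " + (wrapper.get() == null));
        System.out.println("remote payload length: " + remote.get().length);
    }
}
```

The sketch also shows the risk behind question 2: any code path that reads
the local (shallow) copy after clear() gets null instead of the data, so
the idea only holds if nothing on the driver reads that copy afterwards.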


Regards,
Matthias
