Is the Manager a Python multiprocessing manager? Why are you using
parallelism in Python when, in theory, most of the heavy lifting is done
by Spark?
On Wed, Oct 28, 2015 at 4:27 PM agg212 wrote:
> I would just like to be able to put a Spark DataFrame in a manager.dict()
> and
> be able to ge
I would just like to be able to put a Spark DataFrame in a manager.dict() and
be able to get it out (manager.dict() calls pickle on the object being
stored). Ideally, I would just like to store a pointer to the DataFrame
object so that it remains distributed within Spark (i.e., not materialize
and
What are you trying to accomplish by pickling a Spark DataFrame? If your
dataset is large, it doesn't make much sense to pickle it. If your dataset
is small, it may be best to just pickle a pandas DataFrame.
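For the "store a pointer" idea raised in this thread, here is a rough sketch of one possible approach, not a confirmed solution: keep the live object in a driver-side registry and pickle only a string handle. Everything named here (FakeDataFrame, _REGISTRY, register, resolve) is hypothetical and made up for illustration. FakeDataFrame is a stand-in that simply refuses to pickle, and the pickle round trip stands in for what manager.dict() does when a value is stored. The handle only resolves inside the process that owns the registry, so this would not let another process touch the DataFrame.

```python
import pickle
import threading
import uuid

# Hypothetical driver-side registry: handle string -> live object.
_REGISTRY = {}


class FakeDataFrame:
    """Stand-in for a Spark DataFrame (assumption: not the real class).

    It holds a lock only so that pickling it fails in this demo.
    """

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()


def register(obj):
    """Store obj in the registry and return a small picklable handle."""
    handle = uuid.uuid4().hex
    _REGISTRY[handle] = obj
    return handle


def resolve(handle):
    """Look the live object up again; works only in the owning process."""
    return _REGISTRY[handle]


df = FakeDataFrame("events")
handle = register(df)

# Pickling the object itself fails for this stand-in.
try:
    pickle.dumps(df)
    pickled_directly = True
except (TypeError, pickle.PicklingError):
    pickled_directly = False

# The handle is a plain string, so it survives a pickle round trip.
restored_handle = pickle.loads(pickle.dumps(handle))
same_object = resolve(restored_handle) is df
```

In this sketch the data never moves: only the handle is serialized, and the object it names stays where it was created.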
On Tue, Oct 27, 2015 at 9:47 PM, agg212 wrote:
> Hi, I'd like to "pickle" a Spark DataFr