
Re: Pickle Spark DataFrame

2015-11-03 Thread Justin Uang

Is the Manager a Python multiprocessing manager? Why are you using parallelism in Python when, in theory, most of the heavy lifting is done by Spark?

On Wed, Oct 28, 2015 at 4:27 PM agg212 wrote:
> I would just like to be able to put a Spark DataFrame in a manager.dict() and
> be able to ge [...]

Re: Pickle Spark DataFrame

2015-10-28 Thread agg212

I would just like to be able to put a Spark DataFrame in a manager.dict() and be able to get it out (manager.dict() calls pickle on the object being stored). Ideally, I would just like to store a pointer to the DataFrame object so that it remains distributed within Spark (i.e., not materialize and
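
A minimal sketch of the "store a pointer, not the data" idea described above, using only the standard library. Everything here is an assumption for illustration: FakeDistributedFrame is a hypothetical stand-in for a Spark DataFrame (it holds a live handle that pickle rejects), and register/lookup are hypothetical helpers, not part of Spark or multiprocessing. Since manager.dict() pickles whatever is stored in it, the sketch calls pickle directly to show why the object itself fails while a plain string key survives the round trip.

```python
import pickle
import threading
import uuid


class FakeDistributedFrame:
    """Hypothetical stand-in for a Spark DataFrame.

    It owns a live handle (a lock here) that pickle cannot serialize,
    much as a real DataFrame is tied to a live driver-side context.
    """

    def __init__(self, name):
        self.name = name
        self._handle = threading.Lock()


# Process-local registry mapping an opaque string key to the live object.
# Assumption: only the process that owns the registry can resolve a key.
_REGISTRY = {}


def register(obj):
    """Store obj in the registry and return a picklable string key."""
    key = uuid.uuid4().hex
    _REGISTRY[key] = obj
    return key


def lookup(key):
    """Resolve a key back to the live object in this process."""
    return _REGISTRY[key]


frame = FakeDistributedFrame("events")

# Pickling the object itself fails because of the lock it holds.
try:
    pickle.dumps(frame)
    direct_pickle_ok = True
except (TypeError, pickle.PicklingError):
    direct_pickle_ok = False

# Pickling the key works, and the key still resolves to the same object.
key = register(frame)
restored_key = pickle.loads(pickle.dumps(key))
same_object = lookup(restored_key) is frame
```

The key is what would go into the shared dict; the data itself is never materialized or copied. The catch is that a key is only meaningful in the process that holds the registry, so a worker process that reads the key still has to ask that process to act on the object.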

Re: Pickle Spark DataFrame

2015-10-28 Thread Reynold Xin

What are you trying to accomplish by pickling a Spark DataFrame? If your dataset is large, it doesn't make much sense to pickle it. If your dataset is small, maybe it's best to just pickle a pandas DataFrame.

On Tue, Oct 27, 2015 at 9:47 PM, agg212 wrote:
> Hi, I'd like to "pickle" a Spark DataFr [...]