Re: Pickle Spark DataFrame

2015-10-28 Thread agg212
I would just like to be able to put a Spark DataFrame in a manager.dict() and be able to get it out (manager.dict() calls pickle on the object being stored). Ideally, I would just like to store a pointer to the DataFrame object so that it remains distributed within Spark (i.e., not materialize and
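One way to read the "store a pointer" idea above is to keep the DataFrame in a registry inside the driver process and put only a small string handle into the shared dict. The sketch below is an illustration under stated assumptions, not an established Spark API. `HandleRegistry` and `FakeDataFrame` are hypothetical names, `FakeDataFrame` stands in for a real Spark DataFrame so the example runs without Spark, and a plain `dict` stands in for `manager.dict()`. The explicit `pickle` round trip imitates what `manager.dict()` does to stored values.

```python
import pickle
import threading
import uuid


class HandleRegistry:
    """Hypothetical process-local registry: maps small string handles to live objects."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        # Hand out an opaque string handle; strings pickle without trouble.
        handle = uuid.uuid4().hex
        self._objects[handle] = obj
        return handle

    def resolve(self, handle):
        # Only works in the process that owns this registry.
        return self._objects[handle]


class FakeDataFrame:
    """Stand-in for a Spark DataFrame (assumption): holds a lock, so it cannot be pickled."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()


registry = HandleRegistry()
df = FakeDataFrame("events")

shared = {}  # stand-in for manager.dict() in this sketch
shared["events"] = registry.register(df)

# Imitate the pickling that manager.dict() applies to stored values.
handle_after_roundtrip = pickle.loads(pickle.dumps(shared["events"]))

# The handle resolves back to the very same object; nothing was copied.
same_df = registry.resolve(handle_after_roundtrip)
```

The handle is only meaningful in the process that holds the registry, so a worker that reads it from the shared dict would have to ask that process to act on the DataFrame. With Spark 1.x, a comparable effect may be achievable by calling `registerTempTable` on the DataFrame and sharing only the table name, then looking it up with `sqlContext.table(name)` in the driver.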

Pickle Spark DataFrame

2015-10-27 Thread agg212
Hi, I'd like to "pickle" a Spark DataFrame object and have tried the following:

    import pickle
    data = sparkContext.jsonFile(data_file)  # load file
    with open('out.pickle', 'wb') as handle:
        pickle.dump(data, handle)

If I convert "data" to a Pandas DataFrame (e.g.,