I would just like to be able to put a Spark DataFrame in a manager.dict() and be able to get it out again (manager.dict() calls pickle on the object being stored). Ideally, I would just like to store a pointer to the DataFrame object so that it remains distributed within Spark (i.e., not materialize and collect the underlying data).
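
For context, a minimal sketch of the pattern I'm after (assuming a SparkSession and a hypothetical data.json input; names here are for illustration only). The assignment into manager.dict() is where pickling happens, and where it breaks down:

import pickle
from multiprocessing import Manager

from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.json("data.json")  # hypothetical input path

    manager = Manager()
    shared = manager.dict()

    # manager.dict() pickles each value on assignment. A DataFrame is a
    # thin wrapper around a JVM object reference, which pickle cannot
    # serialize, so the assignment raises an error instead of storing a
    # distributed "pointer".
    try:
        shared["df"] = df
    except Exception as exc:
        print("storing the DataFrame failed:", exc)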
Hi, I'd like to "pickle" a Spark DataFrame object and have tried the
following:
import pickle

data = sqlContext.jsonFile(data_file)  # load the JSON file as a DataFrame (jsonFile is a SQLContext method, not SparkContext)
with open('out.pickle', 'wb') as handle:
    pickle.dump(data, handle)  # fails: the DataFrame wraps a non-picklable JVM reference
If I convert "data" to a Pandas DataFrame (e.g., via toPandas()), pickling succeeds, but that materializes everything on the driver, which is what I am trying to avoid.
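
A sketch of that conversion route, continuing from the "data" DataFrame above (toPandas() is the standard DataFrame-to-pandas call; the output filename is just for illustration):

import pickle

pdf = data.toPandas()  # collects every row onto the driver as a pandas DataFrame
with open('out_pandas.pickle', 'wb') as handle:
    pickle.dump(pdf, handle)  # plain pandas objects pickle without issue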