Re: How to load Python Pickle File in Spark Data frame

2019-08-26 Thread Sean Owen
Yes, this does not read raw pickle files. It reads files written in the standard Spark/Hadoop format for binary objects (SequenceFiles), using Python pickling only to serialize the records inside them. See the docs, which say it reads what saveAsPickleFile() writes.
On Mon, Aug 26, 2019 at 12:23 AM hxngillani wrot
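A quick way to tell the two kinds of file apart is to look at their first bytes. The sketch below assumes two facts: a Hadoop SequenceFile header starts with the bytes `SEQ` followed by a version byte, and a pickle written with protocol 2 or later starts with the `\x80` PROTO opcode. The names `sniff_format` and `sniff_file` are hypothetical helpers, not part of PySpark, and protocol 0/1 pickles will simply report `unknown`.

```python
SEQUENCEFILE_MAGIC = b"SEQ"    # Hadoop SequenceFile header: "SEQ" + one version byte
PICKLE_PROTO_OPCODE = b"\x80"  # pickle protocol >= 2 begins with the PROTO opcode


def sniff_format(header: bytes) -> str:
    """Guess the container format from the first few bytes of a file (hypothetical helper)."""
    if header.startswith(SEQUENCEFILE_MAGIC):
        return "sequencefile"   # the kind of part file saveAsPickleFile() produces
    if header[:1] == PICKLE_PROTO_OPCODE:
        return "raw-pickle"     # e.g. written directly with pickle.dump()
    return "unknown"            # protocol 0/1 pickles, text, anything else


def sniff_file(path: str) -> str:
    """Read only the first 4 bytes of a local file and classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(4))
```

Files that sniff as `sequencefile` (for example the `part-*` files written by `rdd.saveAsPickleFile(path)`) are what `sc.pickleFile(path)` expects; files that sniff as `raw-pickle` need a different loading path.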

Re: How to load Python Pickle File in Spark Data frame

2019-08-26 Thread Roland Johann
The error you provided suggests that PySpark is trying to read the pickle files as SequenceFiles, but they were written as plain pickle files, not in SequenceFile format. I'm no PySpark expert, but I suggest loading the pickle files as binary files and deserializing them in custom code. ht
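A minimal sketch of that approach, assuming each file holds one pickled object and that, if the object is a list, its elements are the rows you want. `SparkContext.binaryFiles()` returns an RDD of `(path, content)` pairs with the whole file as bytes, so every file must fit in an executor's memory. The helpers `unpickle_file` and `load_pickles_as_df` are hypothetical; `spark` is an existing `SparkSession` passed in by the caller, and the rows are assumed to be tuples or `Row` objects that `createDataFrame` can infer a schema from.

```python
import pickle
from typing import Any, Iterator, Tuple


def unpickle_file(record: Tuple[str, bytes]) -> Iterator[Any]:
    """Deserialize one (path, content) pair as produced by sc.binaryFiles().

    Assumption: each file contains a single pickled object. If that object is a
    list, its elements are yielded one by one so flatMap() flattens them into rows.
    """
    _path, content = record
    obj = pickle.loads(content)  # only unpickle data from a trusted source
    if isinstance(obj, list):
        yield from obj
    else:
        yield obj


def load_pickles_as_df(spark, path: str):
    """Read raw pickle files as binary and build a DataFrame (hypothetical helper)."""
    rdd = spark.sparkContext.binaryFiles(path).flatMap(unpickle_file)
    return spark.createDataFrame(rdd)
```

Any classes referenced by the pickles must be importable on the executors, since `pickle.loads()` runs there rather than on the driver.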