Yes, this does not read raw pickle files. It reads files written in
the standard Spark/Hadoop form for binary objects (SequenceFiles) but
uses Python pickling for the serialization. See the docs, which say
this reads what saveAsPickleFile() writes.
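
To make the pairing concrete, here is a minimal sketch, assuming the call
in question is SparkContext.pickleFile(). The helper name
roundtrip_pickle_file and its arguments are made up for illustration; sc
is an existing SparkContext and path is an output directory that does not
exist yet.

```python
def roundtrip_pickle_file(sc, path, records):
    """Write records with saveAsPickleFile(), then read them back.

    Hypothetical helper: `sc` is an existing SparkContext and `path`
    is an output directory that must not exist yet.
    """
    # saveAsPickleFile() writes a SequenceFile whose values are pickled
    # batches of records, not a bare pickle stream.
    sc.parallelize(records).saveAsPickleFile(path)
    # pickleFile() expects that same layout, so it can read the output
    # above but not a file produced by plain pickle.dump().
    return sc.pickleFile(path).collect()
```

The order of the collected records may differ from the input order when
the data spans several partitions.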
On Mon, Aug 26, 2019 at 12:23 AM hxngillani wrote:
The error you posted suggests that PySpark is trying to read the pickle
files as SequenceFiles, but they were written as plain pickle files,
without the SequenceFile format in mind.
I'm no PySpark expert, but I suggest loading the pickle files as binary
files and deserializing them in custom code.
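
A rough sketch of that suggestion, assuming each file holds exactly one
pickled object and fits in memory on a single executor. The function
names are made up for illustration; sc.binaryFiles() is the Spark call
that returns (path, bytes) pairs.

```python
import pickle


def load_pickle_bytes(path_and_bytes):
    """Turn one (path, raw bytes) pair into (path, unpickled object)."""
    path, raw = path_and_bytes
    # Only unpickle data you trust: pickle.loads can run arbitrary code.
    return path, pickle.loads(raw)


def read_raw_pickles(sc, path):
    """Hypothetical helper: read plain pickle files under `path`.

    `sc` is an existing SparkContext. binaryFiles() reads each file
    whole, so this suits many small files rather than one huge one.
    """
    return sc.binaryFiles(path).map(load_pickle_bytes)
```

Calling read_raw_pickles(sc, path).collect() would then give a list of
(file path, object) pairs.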
ht