The error you provided indicates that pySpark is trying to read the pickle files as SequenceFiles, but they were written as plain pickle files without the SequenceFile format in mind.

I'm no pySpark expert, but I suggest you look into loading the pickle files as binary files and deserializing them with custom code:
https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.binaryFiles
Then you should be able to deserialize the records and flatMap the results to get an RDD[YourType].
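A minimal sketch of what I mean, assuming each .pck file unpickles to a single 3D NumPy array (or nested list); the glob path is taken from your example and the column names "file"/"voxels" are just placeholders:

import pickle
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# binaryFiles yields (path, raw bytes) pairs; unpickle the bytes ourselves,
# instead of sc.pickleFile, which expects SequenceFiles written by saveAsPickleFile
path = "/home/student/BigDL-trainings/elephantscale/data/volumetric_data/*.pck"
volumes = sc.binaryFiles(path).mapValues(pickle.loads)

# flatten each 3D array into a plain list of floats so Spark can infer a schema,
# one row per file: (file path, flat voxel list)
rows = volumes.map(lambda kv: (kv[0], np.asarray(kv[1]).ravel().astype(float).tolist()))
df = spark.createDataFrame(rows, ["file", "voxels"])
df.printSchema()

Depending on what BigDL expects, you may prefer to keep the unpickled arrays as an RDD and skip the DataFrame step entirely.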
Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobil: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Handelsregister: Amtsgericht Köln (HRB 92595)
Geschäftsführer: Roland Johann, Uwe Reimann

> On 26.08.2019, at 07:23, hxngillani <f2017279...@umt.edu.pk> wrote:
>
> Hello dear members,
>
> I want to train a model using BigDL. My data set consists of medical images in the
> form of pickle object files (.pck); each pickle file is a 3D image (a 3D array).
>
> I have tried:
>
> pickleRdd = sc.pickleFile("/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck")
> sqlContext = SQLContext(sc)
> df = sqlContext.createDataFrame(pickleRdd)
>
> This code throws an error:
>
> Caused by: java.io.IOException:
> file:/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck
> not a SequenceFile
>
> What I have found out is that the function sc.pickleFile loads a pickle file that
> was created by rdd.saveAsPickleFile, whereas I am loading a pickle file created by
> Python's "pickle" module.
>
> My question is: is there any way to load such a file into a Spark DataFrame?