Hi Naresh,

Thank you for the quick response, appreciate it. Removing option("header","true") and using

df = spark.read.parquet("test.parquet")

reading the parquet file now works. However, I would still like to get the data into CSV (or some other readable form), and saving df as CSV still throws:

java.lang.UnsupportedOperationException: CSV data source does not support struct<type:tinyint,size:int,indices:array<int>,values:array<double>> data type.

Any idea?

Best regards,
Mina
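[One possible workaround, as a minimal untested sketch and not from the thread itself: that struct type is the Spark ML vector produced by OneHotEncoder, so rendering the vector column as a plain string makes the DataFrame CSV-writable. Here df is the transformed DataFrame from the thread, and "features" is a hypothetical column name to be replaced with the real vector column(s).]

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Render the ML vector as its string form, e.g. "(4,[1],[1.0])",
# since the CSV data source cannot serialize struct/vector columns.
vector_to_string = udf(lambda v: None if v is None else str(v), StringType())

# "features" is a hypothetical placeholder; substitute the actual vector column name.
df_csv = df.withColumn("features", vector_to_string("features"))
df_csv.coalesce(1).write.option("header", "true").mode("overwrite").csv("output.csv")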
On Tue, Mar 27, 2018 at 10:51 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote:

> In case of storing as a parquet file I don't think it requires the header
> option("header","true").
>
> Give it a try by removing the header option and then try to read it. I haven't
> tried it; just a thought.
>
> Thank you,
> Naresh
>
>
> On Tue, Mar 27, 2018 at 9:47 PM Mina Aslani <aslanim...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using pyspark. To transform my sample data and create the model, I
>> use StringIndexer and OneHotEncoder.
>>
>> However, when I try to write the data as CSV using the command below
>>
>> df.coalesce(1).write.option("header","true").mode("overwrite").csv("output.csv")
>>
>> I get an UnsupportedOperationException:
>>
>> java.lang.UnsupportedOperationException: CSV data source does not
>> support struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
>> data type.
>>
>> Therefore, to save the data and avoid getting the error I use
>>
>> df.coalesce(1).write.option("header","true").mode("overwrite").save("output")
>>
>> The above command saves the data, but in parquet format.
>> How can I read the parquet file and convert it to CSV to observe the data?
>>
>> When I use
>>
>> df = spark.read.parquet("1.parquet"), it throws:
>>
>> ERROR RetryingBlockFetcher: Exception while beginning fetch of 1
>> outstanding blocks
>>
>> Your input is appreciated.
>>
>> Best regards,
>>
>> Mina
>>
>
> --
> Thanks,
> Naresh
> www.linkedin.com/in/naresh-dulam
> http://hadoopandspark.blogspot.com/
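[For reference, a minimal sketch of inspecting the saved parquet data directly, without converting to CSV. Assumptions: the "output" directory from the quoted message, and an active SparkSession bound to the name spark.]

# Read the parquet directory written by the save("output") call above.
df = spark.read.parquet("output")
df.printSchema()               # reveals which columns are struct/vector types
df.show(20, truncate=False)    # print the first 20 rows in readable form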