Hi Gourav,

Please find the link below for a detailed explanation:
https://stackoverflow.com/questions/72389385/how-to-load-complex-data-using-pyspark/72391090#72391090

@Bjørn Jørgensen <bjornjorgen...@gmail.com>: I was able to read this kind of
data using the code below (a fuller sketch is at the end of this mail):

spark.read.option("header", True).option("multiline", "true").option("escape", "\"").csv("sample1.csv")

Also, I have a question about one of my columns. It contains data like the
following:

[image: image.png]

Have a look at the second record. Should I mark it as a corrupt record, or is
there any way to process such records?

Thanks,
Sid

On Thu, May 26, 2022 at 10:54 PM Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
>
> Could you please give us a simple mapping of what the input is and what
> the output should look like? From your description it is a bit difficult
> to figure out exactly how you want the records parsed.
>
> Regards,
> Gourav Sengupta
>
> On Wed, May 25, 2022 at 9:08 PM Sid <flinkbyhe...@gmail.com> wrote:
>
>> Hi Experts,
>>
>> I have the CSV data below, which is generated automatically, so I can't
>> change it manually.
>>
>> The data looks like this:
>>
>> 2020-12-12,abc,2000,,INR,
>> 2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
>> 2020-12-09,fgh,,software_developer,I only manage the development part.
>>
>> Since I don't have much experience with the other domains.
>>
>> It is handled by the other people.,INR
>> 2020-12-12,abc,2000,,USD,
>>
>> The third record is the problem: the user broke the value across new
>> lines while filling out the form. How do I handle this?
>>
>> There are 6 columns and 4 records in total; these are sample records.
>>
>> Should I load it as an RDD and then eliminate the newlines with a regex,
>> for example on ".\n"? Or is there a better way?
>>
>> Any suggestions?
>>
>> Thanks,
>> Sid
>>
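
P.S. For reference, here is a minimal, self-contained sketch of how I am
reading the multiline file while keeping any rows that still fail to parse,
instead of marking them corrupt by hand. The path sample1.csv, the column
names, and the "_corrupt_record" column name are placeholders I made up for
illustration; the schema is only assumed from the six-column sample in this
thread.

from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = SparkSession.builder.appName("multiline-csv-demo").getOrCreate()

# Assumed 6-column schema; field names and types are placeholders.
# The extra "_corrupt_record" field is where Spark puts rows it cannot parse.
schema = T.StructType([
    T.StructField("txn_date", T.StringType(), True),
    T.StructField("name", T.StringType(), True),
    T.StructField("amount", T.StringType(), True),
    T.StructField("description", T.StringType(), True),
    T.StructField("currency", T.StringType(), True),
    T.StructField("comment", T.StringType(), True),
    T.StructField("_corrupt_record", T.StringType(), True),
])

df = (
    spark.read
        .option("header", True)        # drop this if the real file has no header line
        .option("multiLine", True)     # let quoted values span multiple lines
        .option("escape", "\"")        # embedded quotes are escaped as ""
        .option("mode", "PERMISSIVE")  # keep unparseable rows instead of failing
        .option("columnNameOfCorruptRecord", "_corrupt_record")
        .schema(schema)
        .csv("sample1.csv")            # placeholder path
)

# Cache first: Spark does not allow queries that reference only the corrupt
# record column directly on a CSV source, so materialise the rows once.
df = df.cache()

# Rows that could not be parsed carry their raw text in _corrupt_record,
# so they can be inspected separately instead of being dropped.
bad_rows = df.filter(df["_corrupt_record"].isNotNull())
good_rows = df.filter(df["_corrupt_record"].isNull()).drop("_corrupt_record")

This way the records with embedded newlines are parsed normally, and anything
that is genuinely malformed can still be looked at later.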