Thank you so much for your time. I have data like below, which I tried to load by setting multiple options while reading the file, but I am still not able to consolidate the 9th column's data within itself.
[image: image.png]

I tried the below code:

    df = spark.read.option("header", "true") \
        .option("multiline", "true") \
        .option("inferSchema", "true") \
        .option("quote", '"') \
        .option("delimiter", ",") \
        .csv("path")

What else can I do? (A possible pre-processing sketch is appended after the
quoted thread below.)

Thanks,
Sid

On Thu, May 26, 2022 at 1:46 AM Apostolos N. Papadopoulos <
papad...@csd.auth.gr> wrote:

> Dear Sid,
>
> can you please give us more info? Is it true that every line may have a
> different number of columns? Is there any rule followed by every line of
> the file? From the information you have sent, I cannot fully understand
> the "schema" of your data.
>
> Regards,
>
> Apostolos
>
>
> On 25/5/22 23:06, Sid wrote:
> > Hi Experts,
> >
> > I have the below CSV data that is generated automatically; I can't
> > change the data manually.
> >
> > The data looks like this:
> >
> > 2020-12-12,abc,2000,,INR,
> > 2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
> > 2020-12-09,fgh,,software_developer,I only manage the development part.
> > Since I don't have much experience with the other domains. It is
> > handled by the other people.,INR
> > 2020-12-12,abc,2000,,USD,
> >
> > The third record is the problem: the user inserted newlines while
> > filling up the form, so one logical record is split across several
> > physical lines. How do I handle this?
> >
> > There are 6 columns and 4 records in total. These are sample records.
> >
> > Should I load it as an RDD and then eliminate the newlines ("\n") with
> > a regex? Or how should it be done?
> >
> > Any suggestions?
> >
> > Thanks,
> > Sid
>
> --
> Apostolos N. Papadopoulos, Associate Professor
> Department of Informatics
> Aristotle University of Thessaloniki
> Thessaloniki, GREECE
> tel: ++0030312310991918
> email: papad...@csd.auth.gr
> twitter: @papadopoulos_ap
> web: http://datalab.csd.auth.gr/~apostol
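A note on why the attempt at the top of the thread fails: Spark's multiLine
option only re-joins physical lines that fall inside a quoted field, and the
continuation lines in the third record are not quoted, so no combination of
quote/multiline options can recover them. Below is a minimal sketch of the
RDD-plus-regex idea Sid raises, under two assumptions: every genuine record
starts with a yyyy-MM-dd date (and no continuation line does), and "path" is
a placeholder for the actual input location. Note the sample has no header
row; a real header line would be silently dropped by this pattern and would
need separate handling.

    from pyspark.sql import SparkSession
    import re

    spark = SparkSession.builder.getOrCreate()

    # Assumption: every real record starts with a yyyy-MM-dd date;
    # any other non-empty line is a continuation of the previous record.
    RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2},")

    def consolidate(lines):
        # Glue continuation lines back onto the record they belong to.
        buf = None
        for line in lines:
            if RECORD_START.match(line):
                if buf is not None:
                    yield buf
                buf = line
            elif buf is not None and line.strip():
                buf = buf + " " + line.strip()
        if buf is not None:
            yield buf

    # wholeTextFiles keeps each file in one piece, so a record broken
    # across lines cannot be split across partitions (this assumes each
    # file fits in a single executor's memory).
    raw = spark.sparkContext.wholeTextFiles("path")
    rows = raw.flatMap(lambda kv: consolidate(kv[1].splitlines()))

    # DataFrameReader.csv also accepts an RDD of CSV-row strings.
    df = spark.read.csv(rows, inferSchema=True)
    df.show(truncate=False)

If the form that generates the file can be changed instead, having it wrap
free-text fields in double quotes would let the original multiLine/quote
options work as-is.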