Re: Spark CSV skip lines

2016-09-10 Thread Hyukjin Kwon
As you are reading each file as a single record via wholeTextFiles and flattening the files into records, I think you can just drop as many lines as you want. Can you just drop or skip the first few records from reader.readAll().map(...)? Also, are you sure this is an issue in Spark and not in the external CSV library? Do y
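A minimal sketch of that idea, assuming Python: the standard `csv` module stands in for opencsv, and a plain `(path, content)` tuple stands in for one element of the pair RDD that wholeTextFiles returns. The function name `parse_whole_file` and the `skip` parameter are hypothetical. It parses the whole file first, so quoted fields with embedded newlines stay intact, and then drops the leading report records.

```python
import csv
import io
from typing import List, Tuple


def parse_whole_file(pair: Tuple[str, str], skip: int = 2) -> List[List[str]]:
    # `pair` mimics one element of sc.wholeTextFiles(...): (path, file content).
    _path, content = pair
    # csv.reader keeps quoted fields with embedded newlines together,
    # standing in for opencsv's reader.readAll() from the thread.
    records = list(csv.reader(io.StringIO(content)))
    # Drop the leading report records; the header and data rows remain.
    return records[skip:]


# Hypothetical PySpark usage (not run here; the path is made up):
# rows = sc.wholeTextFiles("data/*.csv").flatMap(parse_whole_file)
```

This assumes each report line parses as exactly one CSV record (for example, it contains no unbalanced quote character); if that does not hold, splitting off the first raw lines of `content` before parsing would be the safer variant.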

Re: Spark CSV skip lines

2016-09-10 Thread Selvam Raman
Hi, I had already seen these two options, but thanks for the idea anyway. I am using wholeTextFiles to read my data (because there are \n characters in the middle of records) and opencsv to parse it. In my data, the first two lines are just a report. How can I eliminate them? *How to eliminate first two lines after reading fro

Re: Spark CSV skip lines

2016-09-10 Thread Hyukjin Kwon
Hi Selvam, if your report lines are commented out with some character (e.g. #), you can skip them via the comment option [1]. If you are using Spark 1.x, you might be able to do this by manually skipping the lines in the RDD and then converting it into a DataFrame, as below: I haven’t tested this but I think this
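One common way to skip leading lines at the RDD level is to drop them from the first partition only. The following is a minimal sketch in Python of a function with the `f(splitIndex, iterator)` shape that PySpark's `RDD.mapPartitionsWithIndex` expects; the name `drop_leading_lines` and the parameter `n` are hypothetical, and this is not necessarily the approach from the truncated message above. It assumes a single input file read with `sc.textFile`, so the file's first lines sit at the start of partition 0. With several files, only the first file would be cleaned, and because `textFile` splits on newlines it does not suit data with embedded \n (the wholeTextFiles approach fits that case better).

```python
from itertools import islice
from typing import Iterable, Iterator


def drop_leading_lines(split_index: int, lines: Iterable[str], n: int = 2) -> Iterator[str]:
    # Only the first partition holds the start of the (single) file,
    # so only it loses its first n lines; other partitions pass through.
    if split_index == 0:
        return islice(lines, n, None)
    return iter(lines)


# Hypothetical PySpark usage (Spark 1.x style, not run here; path is made up):
# lines = sc.textFile("data/report.csv").mapPartitionsWithIndex(drop_leading_lines)
# ...then parse the remaining lines as CSV and build a DataFrame from them.
```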

Spark CSV skip lines

2016-09-10 Thread Selvam Raman
Hi, I am using spark-csv to read a CSV file. The issue is that the first n lines of my files contain a report, followed by the actual data (the header and the rest of the data). How can I skip the first n lines in spark-csv? I don't have any specific comment character in the first byte of those lines. Please give me some ideas.