Since you are reading each file as a single record via wholeTextFiles and
flattening it into records, I think you can just drop the first few lines
yourself.
Can you drop or skip those lines from reader.readAll().map(...)?
Also, are you sure this is an issue in Spark rather than in the external CSV
library?
Do y
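
To illustrate the suggestion above, here is a minimal sketch in plain Python of
the per-file step: drop the leading report lines from the file content, then
parse what remains. The function name parse_skipping_report and the n_skip
parameter are made up for this example, and the csv module stands in for
opencsv, so treat it as an outline rather than the poster's actual code.

```python
import csv
import io

def parse_skipping_report(content, n_skip=2):
    """Parse one whole file's text, ignoring its first n_skip physical lines.

    Hypothetical helper: `content` is the full text of one file, as
    wholeTextFiles would hand it over in the value of each (path, content) pair.
    """
    # Split off at most n_skip lines; the last element is the untouched rest.
    parts = content.split("\n", n_skip)
    body = parts[n_skip] if len(parts) > n_skip else ""
    # csv.reader keeps newlines that sit inside quoted fields.
    return list(csv.reader(io.StringIO(body, newline="")))

sample = 'Report generated\nTotal rows: 2\nid,comment\n1,"line one\nline two"\n2,plain\n'
for row in parse_skipping_report(sample):
    print(row)
```

With PySpark this function could presumably be applied as
sc.wholeTextFiles(path).flatMap(lambda kv: parse_skipping_report(kv[1])),
though that wiring is untested here.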
Hi,
I had already seen these two options, but thanks for the idea anyway.
I am using wholeTextFiles to read my data (because there are \n characters in
the middle of it) and opencsv to parse it. In my data, the first two lines are
just a report. How can I eliminate them?
*How to eliminate first two lines after reading fro
Hi Selvam,
If your report lines start with a comment character (e.g. #), you can skip
them via the comment option [1].
If you are using Spark 1.x, you might be able to do this by manually skipping
the lines in the RDD and then converting it to a DataFrame, as below:
I haven’t tested this but I think this
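
The code from the original message is not preserved here. As a rough
illustration of the idea only, the sketch below shows index-based skipping in
plain Python: pair each line with its position, keep the lines at or past
n_skip, then treat the first remaining row as the header. The name
skip_leading_lines is hypothetical, and the sketch assumes one record per line
(no newlines inside fields).

```python
import csv

def skip_leading_lines(lines, n_skip):
    """Return (header, data_rows) after dropping the first n_skip lines.

    Hypothetical helper; `lines` is any iterable of text lines.
    """
    # Pair each line with its index and keep only lines at index >= n_skip.
    kept = [line for idx, line in enumerate(lines) if idx >= n_skip]
    rows = list(csv.reader(kept))
    # The first surviving row is assumed to be the header.
    header = rows[0] if rows else []
    return header, rows[1:]

header, data = skip_leading_lines(["Report", "Rows: 2", "id,name", "1,a", "2,b"], 2)
print(header, data)
```

On an RDD, the analogous steps would likely be zipWithIndex followed by a
filter on the index; that part is an assumption and is not tested here.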
Hi,
I am using spark-csv to read a CSV file. The issue is that the first n lines
of my files contain a report, followed by the actual data (header and rest of
the data).
So how can I skip the first n lines in spark-csv? I don't have any specific
comment character in the first byte.
Please give me some ideas.