Hi Selvam,

If your report lines are commented with some character (e.g. '#'), you can skip them via the comment option [1].
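For example, a minimal sketch (untested, file path is a placeholder) using Spark 2.0's built-in CSV reader would be:

val df = spark.read
  .option("comment", "#")    // skip lines that begin with '#'
  .option("header", "true")  // treat the first remaining line as the header
  .csv("path/to/file.csv")

I believe spark-csv for Spark 1.x accepts the same options through sqlContext.read.format("com.databricks.spark.csv").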
If you are using Spark 1.x, then you might be able to do this by manually skipping the lines in the RDD and then converting it to a DataFrame, as below. I haven't tested this, but I think it should work:

val rdd = sparkContext.textFile("...")
// Drop the first 10 lines (replace 10 with your n); they only appear in the first partition (index 0).
val filteredRdd = rdd.mapPartitionsWithIndex { (idx, iter) =>
  if (idx == 0) iter.drop(10) else iter
}
val df = new CsvParser().csvRdd(sqlContext, filteredRdd)

If you are using Spark 2.0, then it seems there is no way to manually modify the source data, because loading an existing RDD or Dataset[String] into a DataFrame is not yet supported. There is an open issue for this [2].

I hope this is helpful.

Thanks.

[1] https://github.com/apache/spark/blob/27209252f09ff73c58e60c6df8aaba73b308088c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L369
[2] https://issues.apache.org/jira/browse/SPARK-15463

On 10 Sep 2016 6:14 p.m., "Selvam Raman" <sel...@gmail.com> wrote:

> Hi,
>
> I am using spark csv to read a csv file. The issue is that my file's first n
> lines contain some report, followed by the actual data (header and the rest
> of the data).
>
> So how can I skip the first n lines in spark csv? I don't have any specific
> comment character in the first byte.
>
> Please give me some idea.
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"