You can use this function to remove the header from your dataset (applicable to RDDs):
import org.apache.spark.rdd.RDD

def dropHeader(data: RDD[String]): RDD[String] = {
  data.mapPartitionsWithIndex((idx, lines) => {
    // Drop the first line of the first partition only;
    // all other partitions are passed through unchanged.
    if (idx == 0) lines.drop(1) else lines
  })
}

Abhi

On Wed, Apr 27, 2016 at 12:55 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> If you are using the Scala API you can do
> Myrdd.zipWithIndex.filter(_._2 > 0).map(_._1)
>
> Maybe a little bit complicated, but it will do the trick.
> As for spark-csv, you get back a DataFrame, which you can convert back
> to an RDD.
> Hth
> Marco
> On 27 Apr 2016 6:59 am, "nihed mbarek" <nihe...@gmail.com> wrote:
>
>> You can add a filter on a string that you are sure appears only in the
>> header.
>>
>> On Wednesday, 27 April 2016, Divya Gehlot <divya.htco...@gmail.com>
>> wrote:
>>
>>> Yes, you can remove the header by removing the first row.
>>>
>>> You can use first() or head() to do that.
>>>
>>>
>>> Thanks,
>>> Divya
>>>
>>> On 27 April 2016 at 13:24, Ashutosh Kumar <kmr.ashutos...@gmail.com>
>>> wrote:
>>>
>>>> I see there is a library, spark-csv, which can be used for removing the
>>>> header and processing CSV files. But it seems it works with SQLContext
>>>> only. Is there a way to remove the header from CSV files without SQLContext?
>>>>
>>>> Thanks
>>>> Ashutosh
>>>
>>>
>>
>> --
>>
>> M'BAREK Med Nihed,
>> Fedora Ambassador, TUNISIA, Northern Africa
>> http://www.nihed.com
>>
>> <http://tn.linkedin.com/in/nihed>
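
For completeness, here is a minimal runnable sketch of the three approaches suggested in this thread. It assumes a local SparkContext and a hypothetical input path "data.csv"; the object name DropHeaderExample is also just for illustration.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object DropHeaderExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("drop-header").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input path; replace with your own CSV file.
    val lines: RDD[String] = sc.textFile("data.csv")

    // Option 1: zipWithIndex, then keep everything after the first record.
    // zipWithIndex needs an extra pass to compute per-partition counts
    // when the RDD has more than one partition.
    val noHeader1 = lines.zipWithIndex().filter { case (_, idx) => idx > 0 }.map(_._1)

    // Option 2: drop the first line of the first partition only.
    // Usually the cheapest, since it avoids the extra pass above.
    val noHeader2 = lines.mapPartitionsWithIndex { (idx, it) =>
      if (idx == 0) it.drop(1) else it
    }

    // Option 3: grab the first line and filter it out
    // (assumes the header text never appears as a data row).
    val header = lines.first()
    val noHeader3 = lines.filter(_ != header)

    noHeader2.take(5).foreach(println)
    sc.stop()
  }
}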