Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
Yep Michael Quinlan, it's working as suggested by Hao Ren. Thanks to you and Hao Ren.

Re: removing first record from RDD[String]

2014-12-23 Thread Michael Quinlan
Hafiz,

You can probably use the RDD.mapPartitionsWithIndex method.

Mike

On Tue, Dec 23, 2014 at 8:35 AM, Hafiz Mujadid [via Apache Spark User List] wrote:
> hi dears!
>
> Is there some efficient way to drop first line of an RDD[String]?
>
> any suggestion?
>
> Thanks
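A minimal sketch of the mapPartitionsWithIndex idea, written so that it runs without a Spark cluster. The helper name dropFirstRecord and the sample data are hypothetical and not from the thread; the nested Seq only stands in for an RDD's partitions. It assumes the record to drop is the first element of partition 0, which would not hold if that partition happened to be empty.

```scala
// Hypothetical helper (not from the thread): drops the first element of
// partition 0 and passes every other partition through untouched.
// Iterator.drop(1) is safe on an empty iterator, so no hasNext check is needed.
def dropFirstRecord(index: Int, records: Iterator[String]): Iterator[String] =
  if (index == 0) records.drop(1) else records

// Stand-in for the partitions of an RDD[String]; the data is made up.
val partitions: Seq[Seq[String]] = Seq(
  Seq("name,age", "alice,30"),
  Seq("bob,25", "carol,41")
)

// Apply the helper to each (partition, index) pair, as Spark would per partition.
val noHeader: Seq[String] = partitions.zipWithIndex.flatMap {
  case (part, i) => dropFirstRecord(i, part.iterator)
}

// With a real RDD[String] named src, the same function could be passed as:
//   val noHeader = src.mapPartitionsWithIndex(dropFirstRecord _)
```

Only the first partition is touched, so the rest of the data is streamed through unchanged and no extra pass over the RDD is needed.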

Re: removing first record from RDD[String]

2014-12-23 Thread Erik Erlandson
There is also a lazy implementation: http://erikerlandson.github.io/blog/2014/07/29/deferring-spark-actions-to-lazy-transforms-with-the-promise-rdd/ I generated a PR for it -- there was also an alternate proposal for having it be a library in the new Spark Packages site: http://databricks.com/bl

Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
that's nice if it works

Re: removing first record from RDD[String]

2014-12-23 Thread Jörg Schad
Hi, maybe the drop function is helpful for you (this is probably more than you need, but still an interesting read): http://erikerlandson.github.io/blog/2014/07/27/some-implications-of-supporting-the-scala-drop-method-for-spark-rdds/

Joerg

On Tue, Dec 23, 2014 at 5:45 PM, Hao Ren wrote:
> H

Re: removing first record from RDD[String]

2014-12-23 Thread Hao Ren
Hi, I guess you would like to remove the header of a CSV file. You can play with partitions. =)

    // src is your RDD
    val noHeader = src.mapPartitionsWithIndex(
      (i, iterator) =>
        if (i == 0 && iterator.hasNext) {
          iterator.next()
          iterator
        } else iterator)

Thus, you don't need to