That will work. Thanks! Note that zipWithUniqueId() doesn't guarantee consecutive IDs either.
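For the archives, a minimal sketch of that approach, assuming Spark 1.4+ with the spark-csv package on the classpath; the file path and the "id" column name are placeholders:

    // Load a CSV into a DataFrame via spark-csv, then tag each row.
    import org.apache.spark.sql.functions.monotonicallyIncreasingId

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("data.csv")  // placeholder path

    // Adds a unique Long per row. The IDs are increasing but not
    // consecutive: the partition ID sits in the upper 31 bits and the
    // per-partition record number in the lower 33 bits.
    val withId = df.withColumn("id", monotonicallyIncreasingId())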
Srikanth

On Tue, Jul 21, 2015 at 9:48 PM, Burak Yavuz <brk...@gmail.com> wrote:

> Would monotonicallyIncreasingId
> <https://github.com/apache/spark/blob/d4c7a7a3642a74ad40093c96c4bf45a62a470605/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L637>
> work for you?
>
> Best,
> Burak
>
> On Tue, Jul 21, 2015 at 4:55 PM, Srikanth <srikanth...@gmail.com> wrote:
>
>> Hello,
>>
>> I'm creating DataFrames from three CSV files using the spark-csv
>> package, and I want to add a unique ID to each row of a DataFrame.
>> I'm not sure how withColumn() can be used to achieve this. I need a
>> Long value, not a UUID.
>>
>> One option I found was to create an RDD and use zipWithUniqueId:
>>
>>   sc.textFile(file).
>>     zipWithUniqueId().
>>     map { case (d, i) => i.toString + delimiter + d }.
>>     map(_.split(delimiter)).
>>     map(s => caseclass(...)).
>>     toDF().select("field1", "field2")
>>
>> It's a bit hacky. Is there an easier way to do this on DataFrames
>> while still using spark-csv?
>>
>> Srikanth