RowId unique key for Dataframes

Srikanth Tue, 21 Jul 2015 16:56:22 -0700

Hello,

I'm creating DataFrames from three CSV files using the spark-csv package. I
want to add a unique ID to each row in the DataFrame.
I'm not sure how withColumn() can be used to achieve this. I need a Long value,
not a UUID.
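For reference, Spark's documentation for `monotonically_increasing_id()` describes the Long it generates as the partition ID in the upper 31 bits plus the record number within that partition in the lower 33 bits. The values are therefore unique across the DataFrame but not consecutive. Below is a minimal pure-Scala sketch of that layout with no Spark dependency. The object name `MonotonicIdLayout`, the helper `rowId`, and the partition sizes are hypothetical:

```scala
// Pure-Scala model of the documented Long layout used by Spark's
// monotonically_increasing_id(): partition ID in the upper 31 bits,
// record number within the partition in the lower 33 bits.
object MonotonicIdLayout {
  // Hypothetical helper; Spark computes the equivalent value internally per row.
  def rowId(partitionId: Int, recordInPartition: Long): Long =
    (partitionId.toLong << 33) + recordInPartition

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions holding 3 rows and 2 rows.
    val partitionSizes = Seq(3, 2)
    val ids = for {
      (size, pid) <- partitionSizes.zipWithIndex
      rec <- 0L until size.toLong
    } yield rowId(pid, rec)
    // IDs are unique, but jump by 2^33 between partitions.
    println(ids.mkString(", "))
  }
}
```

In Spark itself, the column would be added with something like `df.withColumn("id", monotonically_increasing_id())` after `import org.apache.spark.sql.functions.monotonically_increasing_id`. In Spark 1.4 and 1.5, the same function is named `monotonicallyIncreasingId()`. The documented limits are fewer than 1 billion partitions and fewer than 8 billion records per partition.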


One option I found was to create an RDD and use zipWithUniqueId:

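import sqlContext.implicits._  // brings .toDF() into scope

sc.textFile(file).  // textFile lives on SparkContext, not SQLContext
  zipWithUniqueId().  // RDD[(String, Long)]: (line, id)
  map { case (d, i) => i.toString + delimiter + d }.  // prepend the id
  map(_.split(delimiter)).
  map(s => caseclass(...)).
  toDF().select("field1", "field2")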
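The IDs produced by zipWithUniqueId() are unique but can have gaps. According to the RDD API docs, with n partitions the items in partition k receive k, n+k, 2n+k, and so on. Below is a minimal pure-Scala model of that rule, assuming plain Seqs stand in for partitions. The object and method names are hypothetical:

```scala
// Pure-Scala model of the ID rule documented for RDD.zipWithUniqueId():
// with n partitions, the i-th item of partition k gets id i * n + k.
object ZipWithUniqueIdModel {
  def assignIds[T](partitions: Seq[Seq[T]]): Seq[(T, Long)] = {
    val n = partitions.size.toLong
    for {
      (part, k) <- partitions.zipWithIndex
      (item, i) <- part.zipWithIndex
    } yield (item, i * n + k)
  }

  def main(args: Array[String]): Unit = {
    // Three hypothetical partitions of uneven size.
    val parts = Seq(Seq("a", "b", "c"), Seq("d"), Seq("e", "f"))
    assignIds(parts).foreach { case (item, id) => println(s"$id -> $item") }
  }
}
```

With the sample data, IDs 0, 3 and 6 go to partition 0, ID 1 goes to partition 1, and IDs 2 and 5 go to partition 2. ID 4 is never used because partition 1 holds a single item. If consecutive IDs from 0 to N-1 are required, `zipWithIndex()` provides them. The cost is an extra Spark job to count partition sizes when the RDD has more than one partition.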


It's a bit hacky. Is there an easier way to do this directly on DataFrames
while still using spark-csv?

Srikanth
