That will work. Thanks! Note that zipWithUniqueId() doesn't guarantee consecutive IDs either.
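For the archives, a minimal sketch of that approach, assuming Spark 1.4+ with the spark-csv package on the classpath; the file path and the "id" column name are placeholders:

    // Load a CSV into a DataFrame via spark-csv, then tag each row.
    import org.apache.spark.sql.functions.monotonicallyIncreasingId

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("data.csv")  // placeholder path

    // Adds a unique Long per row. The IDs are increasing but not
    // consecutive: the partition ID sits in the upper 31 bits and the
    // per-partition record number in the lower 33 bits.
    val withId = df.withColumn("id", monotonicallyIncreasingId())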
Srikanth

On Tue, Jul 21, 2015 at 9:48 PM, Burak Yavuz <brk...@gmail.com> wrote:

> Would monotonicallyIncreasingId
> <https://github.com/apache/spark/blob/d4c7a7a3642a74ad40093c96c4bf45a62a470605/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L637>
> work for you?
>
> Best,
> Burak
>
> On Tue, Jul 21, 2015 at 4:55 PM, Srikanth <srikanth...@gmail.com> wrote:
>
>> Hello,
>>
>> I'm creating DataFrames from three CSV files using the spark-csv
>> package, and I want to add a unique ID to each row of a DataFrame.
>> I'm not sure how withColumn() can be used to achieve this. I need a
>> Long value, not a UUID.
>>
>> One option I found was to create an RDD and use zipWithUniqueId:
>>
>>   sc.textFile(file).
>>     zipWithUniqueId().
>>     map { case (d, i) => i.toString + delimiter + d }.
>>     map(_.split(delimiter)).
>>     map(s => caseclass(...)).
>>     toDF().select("field1", "field2")
>>
>> It's a bit hacky. Is there an easier way to do this on DataFrames
>> while still using spark-csv?
>>
>> Srikanth