That's great. How would you create an ordered index within each partition
(per product, in this example)?

Assuming now a dataframe like:

flag | product | price
----------------------
1    |       a |47.808764653746
1    |       b |47.808764653746
1    |       a |31.9869279512204
1    |       b |47.7907893713564
1    |       a |16.7599200038239
1    |       b |16.7599200038239
1    |       b |20.3916014172137


I would like to get a new dataframe such as:

flag | product | price           | index
----------------------------------------
1    |       a |47.808764653746  | 0
1    |       a |31.9869279512204 | 1
1    |       a |16.7599200038239 | 2
1    |       b |47.808764653746  | 0
1    |       b |47.7907893713564 | 1
1    |       b |20.3916014172137 | 2
1    |       b |16.7599200038239 | 3
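
To make the intended semantics concrete, here is a minimal sketch in plain
Python (not Spark code): rows are grouped by product, sorted by price in
descending order within each product, and numbered from 0. The function name
index_within_product is only illustrative. In Spark itself, a window function
such as row_number over a window partitioned by product and ordered by price
may be the closest counterpart, but that is an assumption on my side.

```python
from itertools import groupby
from operator import itemgetter

# Rows from the example above: (flag, product, price)
rows = [
    (1, "a", 47.808764653746),
    (1, "b", 47.808764653746),
    (1, "a", 31.9869279512204),
    (1, "b", 47.7907893713564),
    (1, "a", 16.7599200038239),
    (1, "b", 16.7599200038239),
    (1, "b", 20.3916014172137),
]


def index_within_product(rows):
    """Append a 0-based index that restarts for every product.

    Rows are ordered by product, then by price descending (illustrative
    helper, not part of any Spark API).
    """
    # Sort so that each product's rows are contiguous and price-descending.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    indexed = []
    # groupby only merges adjacent rows, which the sort above guarantees.
    for _, group in groupby(ordered, key=itemgetter(1)):
        for i, row in enumerate(group):
            indexed.append(row + (i,))
    return indexed


for row in index_within_product(rows):
    print(row)
```

The index restarts at 0 for every product, which is the difference from a
single global index over the whole dataframe.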








On 29 May 2015 at 12:25, Wesley Miao <wesley.mi...@gmail.com> wrote:

> One way I can see is to -
>
> 1. get rdd from your df
> 2. call rdd.zipWithIndex to get a new rdd
> 3. turn your new rdd to a new df
>
> On Fri, May 29, 2015 at 5:43 AM, Cesar Flores <ces...@gmail.com> wrote:
>
>>
>> Assuming that I have the next data frame:
>>
>> flag | price
>> ----------------------
>> 1    |47.808764653746
>> 1    |47.808764653746
>> 1    |31.9869279512204
>> 1    |47.7907893713564
>> 1    |16.7599200038239
>> 1    |16.7599200038239
>> 1    |20.3916014172137
>>
>> How can I create a data frame with an extra indexed column as the next
>> one:
>>
>> flag | price          | index
>> ----------------------|-------
>> 1    |47.808764653746 | 0
>> 1    |47.808764653746 | 1
>> 1    |31.9869279512204| 2
>> 1    |47.7907893713564| 3
>> 1    |16.7599200038239| 4
>> 1    |16.7599200038239| 5
>> 1    |20.3916014172137| 6
>>
>> --
>> Cesar Flores
>>
>
>
