If you are on Spark 1.3, use repartitionAndSortWithinPartitions followed by mapPartitions.
In 1.4, window functions will be supported, it seems.
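
A rough sketch of the 1.3 route (the column positions, the descending
price sort, the partition count, and the ProductPartitioner helper are
my assumptions, not anything settled in this thread):

import org.apache.spark.Partitioner
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Partition on product only, so all rows of a product land in the same
// partition, while the (product, -price) key drives the within-partition
// sort (negated price = descending order).
class ProductPartitioner(n: Int) extends Partitioner {
  def numPartitions: Int = n
  def getPartition(key: Any): Int = {
    val h = key.asInstanceOf[(String, Double)]._1.hashCode % numPartitions
    if (h < 0) h + numPartitions else h
  }
}

val indexed = df.rdd
  .map(r => ((r.getString(1), -r.getDouble(2)), r))
  .repartitionAndSortWithinPartitions(new ProductPartitioner(8))
  .mapPartitions { it =>
    // Number rows within each product, resetting on product change.
    var current: String = null
    var i = -1L
    it.map { case ((product, _), row) =>
      if (product != current) { current = product; i = -1L }
      i += 1
      Row.fromSeq(row.toSeq :+ i)
    }
  }

val schema = StructType(df.schema.fields :+ StructField("index", LongType))
val withIndex = sqlContext.createDataFrame(indexed, schema)

On 1.4 the window version should collapse to something like this (again
a guess at the final API; note window functions there are expected to
need a HiveContext):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{desc, rowNumber}

val w = Window.partitionBy("product").orderBy(desc("price"))
val withIndex = df.withColumn("index", rowNumber().over(w) - 1)
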
On 1 Jun 2015 04:10, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:

> That's great, and how would you create an ordered index by partition (by
> product in this example)?
>
> Assuming now a DataFrame like:
>
> flag | product | price
> -----|---------|----------------
> 1    |       a |47.808764653746
> 1    |       b |47.808764653746
> 1    |       a |31.9869279512204
> 1    |       b |47.7907893713564
> 1    |       a |16.7599200038239
> 1    |       b |16.7599200038239
> 1    |       b |20.3916014172137
>
>
> and get a new DataFrame such as:
>
> flag | product | price | index
> -----|---------|-----------------|------
> 1    |       a |47.808764653746  | 0
> 1    |       a |31.9869279512204 | 1
> 1    |       a |16.7599200038239 | 2
> 1    |       b |47.808764653746  | 0
> 1    |       b |47.7907893713564 | 1
> 1    |       b |20.3916014172137 | 2
> 1    |       b |16.7599200038239 | 3
>
> On 29 May 2015 at 12:25, Wesley Miao <wesley.mi...@gmail.com> wrote:
>
>> One way I can see is to (rough sketch below) -
>>
>> 1. get the RDD from your DataFrame
>> 2. call rdd.zipWithIndex to get a new RDD
>> 3. turn your new RDD into a new DataFrame
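>>
>> For example, a minimal sketch (sqlContext and the "index" column name
>> are my choices here, not something from the original mail):
>>
>> import org.apache.spark.sql.Row
>> import org.apache.spark.sql.types.{LongType, StructField, StructType}
>>
>> // 1-2. drop to the RDD and zip each Row with its position
>> val withIdx = df.rdd.zipWithIndex.map { case (row, idx) =>
>>   Row.fromSeq(row.toSeq :+ idx)
>> }
>>
>> // 3. rebuild a DataFrame with the extra "index" column
>> val schema = StructType(df.schema.fields :+ StructField("index", LongType))
>> val indexedDf = sqlContext.createDataFrame(withIdx, schema)
>>
>> zipWithIndex preserves the RDD's current order, so the index reflects
>> whatever order the DataFrame already has.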
>>
>> On Fri, May 29, 2015 at 5:43 AM, Cesar Flores <ces...@gmail.com> wrote:
>>
>>>
>>> Assuming that I have the following DataFrame:
>>>
>>> flag | price
>>> -----|----------------
>>> 1    |47.808764653746
>>> 1    |47.808764653746
>>> 1    |31.9869279512204
>>> 1    |47.7907893713564
>>> 1    |16.7599200038239
>>> 1    |16.7599200038239
>>> 1    |20.3916014172137
>>>
>>> How can I create a DataFrame with an extra index column, like the
>>> following:
>>>
>>> flag | price          | index
>>> -----|----------------|------
>>> 1    |47.808764653746 | 0
>>> 1    |47.808764653746 | 1
>>> 1    |31.9869279512204| 2
>>> 1    |47.7907893713564| 3
>>> 1    |16.7599200038239| 4
>>> 1    |16.7599200038239| 5
>>> 1    |20.3916014172137| 6
>>>
>>> --
>>> Cesar Flores
>>>
>>
>>
>
