If you are on Spark 1.3, use repartitionAndSortWithinPartitions followed by mapPartitions. In 1.4, window functions will be supported, it seems.
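Roughly like this on 1.3 (an untested sketch, not working code: the partitioner class and names are mine, and I'm assuming a DataFrame df with the flag/product/price columns below, a SQLContext named sqlContext, and that the per-product index should follow descending price, as in your expected output):

import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Route rows by product alone, so every row for a given product lands in
// the same partition, while the full (product, -price) key drives the sort.
class ProductPartitioner(n: Int) extends Partitioner {
  private val hash = new HashPartitioner(n)
  override def numPartitions: Int = n
  override def getPartition(key: Any): Int = key match {
    case (product, _) => hash.getPartition(product)
  }
}

// Key by (product, negated price) so ascending key order means descending price.
val keyed = df.rdd.map { r =>
  ((r.getAs[String]("product"), -r.getAs[Double]("price")), r)
}

val indexed = keyed
  .repartitionAndSortWithinPartitions(new ProductPartitioner(keyed.partitions.length))
  .mapPartitions { rows =>
    // Rows now arrive contiguous per product and sorted by descending price,
    // so a running counter that resets on each product change is the index.
    var current: String = null
    var i = -1L
    rows.map { case ((product, _), row) =>
      i = if (product == current) i + 1 else 0L
      current = product
      Row.fromSeq(row.toSeq :+ i)
    }
  }

val schema = StructType(df.schema.fields :+
  StructField("index", LongType, nullable = false))
val result = sqlContext.createDataFrame(indexed, schema)

On 1.4 this should collapse to a window expression, something like rowNumber().over(Window.partitionBy("product").orderBy(df("price").desc)) minus 1 for a 0-based index, assuming rowNumber lands in org.apache.spark.sql.functions as advertised.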
On 1 Jun 2015 04:10, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:

> That's great, and how would you create an ordered index by partition (by
> product in this example)?
>
> Assuming now a dataframe like:
>
> flag | product | price
> ----------------------------------
> 1    | a       | 47.808764653746
> 1    | b       | 47.808764653746
> 1    | a       | 31.9869279512204
> 1    | b       | 47.7907893713564
> 1    | a       | 16.7599200038239
> 1    | b       | 16.7599200038239
> 1    | b       | 20.3916014172137
>
> get a new dataframe such as:
>
> flag | product | price            | index
> ------------------------------------------
> 1    | a       | 47.808764653746  | 0
> 1    | a       | 31.9869279512204 | 1
> 1    | a       | 16.7599200038239 | 2
> 1    | b       | 47.808764653746  | 0
> 1    | b       | 47.7907893713564 | 1
> 1    | b       | 20.3916014172137 | 2
> 1    | b       | 16.7599200038239 | 3
>
> On 29 May 2015 at 12:25, Wesley Miao <wesley.mi...@gmail.com> wrote:
>
>> One way I can see is to -
>>
>> 1. get the rdd from your df
>> 2. call rdd.zipWithIndex to get a new rdd
>> 3. turn your new rdd into a new df
>>
>> On Fri, May 29, 2015 at 5:43 AM, Cesar Flores <ces...@gmail.com> wrote:
>>
>>> Assuming that I have the next data frame:
>>>
>>> flag | price
>>> -----------------------
>>> 1    | 47.808764653746
>>> 1    | 47.808764653746
>>> 1    | 31.9869279512204
>>> 1    | 47.7907893713564
>>> 1    | 16.7599200038239
>>> 1    | 16.7599200038239
>>> 1    | 20.3916014172137
>>>
>>> How can I create a data frame with an extra index column, like the
>>> next one:
>>>
>>> flag | price            | index
>>> -------------------------------
>>> 1    | 47.808764653746  | 0
>>> 1    | 47.808764653746  | 1
>>> 1    | 31.9869279512204 | 2
>>> 1    | 47.7907893713564 | 3
>>> 1    | 16.7599200038239 | 4
>>> 1    | 16.7599200038239 | 5
>>> 1    | 20.3916014172137 | 6
>>>
>>> --
>>> Cesar Flores
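P.S. To spell out Wesley's zipWithIndex route above for the original frame (a rough, untested sketch; it assumes a SQLContext named sqlContext and that the current row order of df is the order you want indexed):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// 1. get the rdd from the df; 2. zipWithIndex; 3. back to a df with the
// index appended as an extra column.
val withIdx = df.rdd.zipWithIndex.map { case (row, idx) =>
  Row.fromSeq(row.toSeq :+ idx)
}
val schema = StructType(df.schema.fields :+
  StructField("index", LongType, nullable = false))
val indexedDf = sqlContext.createDataFrame(withIdx, schema)

Keep in mind that zipWithIndex numbers rows by partition order, so the index follows whatever order the rows happen to be in; sort the frame first if a particular order matters.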