That's great. How would you create an ordered index per partition (per product in this example)?
Assuming now a dataframe like:

flag | product | price
----------------------
1    | a       | 47.808764653746
1    | b       | 47.808764653746
1    | a       | 31.9869279512204
1    | b       | 47.7907893713564
1    | a       | 16.7599200038239
1    | b       | 16.7599200038239
1    | b       | 20.3916014172137

get a new dataframe such as:

flag | product | price            | index
-----------------------------------------
1    | a       | 47.808764653746  | 0
1    | a       | 31.9869279512204 | 1
1    | a       | 16.7599200038239 | 2
1    | b       | 47.808764653746  | 0
1    | b       | 47.7907893713564 | 1
1    | b       | 20.3916014172137 | 2
1    | b       | 16.7599200038239 | 3

On 29 May 2015 at 12:25, Wesley Miao <wesley.mi...@gmail.com> wrote:

> One way I can see is to -
>
> 1. get rdd from your df
> 2. call rdd.zipWithIndex to get a new rdd
> 3. turn your new rdd to a new df
>
> On Fri, May 29, 2015 at 5:43 AM, Cesar Flores <ces...@gmail.com> wrote:
>
>> Assuming that I have the next data frame:
>>
>> flag | price
>> ----------------------
>> 1    | 47.808764653746
>> 1    | 47.808764653746
>> 1    | 31.9869279512204
>> 1    | 47.7907893713564
>> 1    | 16.7599200038239
>> 1    | 16.7599200038239
>> 1    | 20.3916014172137
>>
>> How can I create a data frame with an extra indexed column as the next
>> one:
>>
>> flag | price            | index
>> ----------------------|-------
>> 1    | 47.808764653746  | 0
>> 1    | 47.808764653746  | 1
>> 1    | 31.9869279512204 | 2
>> 1    | 47.7907893713564 | 3
>> 1    | 16.7599200038239 | 4
>> 1    | 16.7599200038239 | 5
>> 1    | 20.3916014172137 | 6
>>
>> --
>> Cesar Flores
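
To pin down what the per-product index asked about at the top of this message should look like, here is a minimal plain-Python sketch of the logic. It is not Spark code. It assumes the rows are (flag, product, price) tuples and that the index follows price in descending order within each product, which is what the expected output suggests. The name index_by_partition is made up for illustration. In Spark itself this would likely map to a row-number window function partitioned by product and ordered by price, but the exact API depends on the Spark version and should be checked against its documentation.

```python
from collections import defaultdict

# Rows from the example in this message, as (flag, product, price) tuples.
rows = [
    (1, "a", 47.808764653746),
    (1, "b", 47.808764653746),
    (1, "a", 31.9869279512204),
    (1, "b", 47.7907893713564),
    (1, "a", 16.7599200038239),
    (1, "b", 16.7599200038239),
    (1, "b", 20.3916014172137),
]


def index_by_partition(rows):
    """Append a 0-based index per product, ordered by price descending.

    Hypothetical helper: it models the desired result only and does not
    use any Spark API.
    """
    # Partition the rows by product (second field of each tuple).
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)

    result = []
    for product in sorted(groups):
        # Assumption: highest price gets index 0 within its product.
        ordered = sorted(groups[product], key=lambda r: r[2], reverse=True)
        # enumerate() plays the role zipWithIndex plays on an RDD,
        # except that the counter restarts at 0 for every product.
        result.extend(row + (i,) for i, row in enumerate(ordered))
    return result


indexed = index_by_partition(rows)
for row in indexed:
    print(row)
```

The difference from the zipWithIndex suggestion quoted above is only where the counter restarts: zipWithIndex numbers the whole dataset once, while here the numbering is done separately inside each product group after sorting it.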