Hi Ted,

There is no row key per se, and I actually do not want to sort. I want to aggregate each group of x subsequent rows together as a mean value while maintaining the order of the row entries.
For example:

Input RDD:
[ 12, 45 ]
[ 14, 50 ]
[ 10, 35 ]
[ 11, 50 ]

Expected output RDD (an aggregation by mean over each group of 2 subsequent rows):
[ 13, 47.5 ]
[ 10.5, 42.5 ]

@Alexander: Yes, inducing a dummy key seems to be one of the ways. Can you please post a snippet, if possible, on how to achieve this? My own rough, untested attempt is below the quoted thread.

On Mon, Mar 28, 2016 at 10:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you describe your use case a bit more?
>
> Since the row keys are not sorted in your example, there is a chance that
> you get nondeterministic results when you aggregate on groups of two
> successive rows.
>
> Thanks
>
> On Mon, Mar 28, 2016 at 9:21 AM, sujeet jog <sujeet....@gmail.com> wrote:
>
>> Hi,
>>
>> I have an RDD like this:
>>
>> [ 12, 45 ]
>> [ 14, 50 ]
>> [ 10, 35 ]
>> [ 11, 50 ]
>>
>> I want to aggregate the values of the first two rows into one row, and
>> subsequently the next two rows into another single row...
>>
>> I don't have a key to aggregate on for using some of the PySpark
>> aggregate functions. How can I achieve this?
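Here is roughly what I had in mind with the dummy key, in case it is useful as a starting point. This is a minimal, untested sketch: it assumes the group size is fixed at 2 and uses zipWithIndex() to derive the key, so consecutive rows sharing the same index // 2 land in the same group.

from pyspark import SparkContext

sc = SparkContext(appName="pairwise-mean")

rdd = sc.parallelize([[12, 45], [14, 50], [10, 35], [11, 50]])

group_size = 2

means = (
    rdd.zipWithIndex()                             # ([12, 45], 0), ([14, 50], 1), ...
       .map(lambda x: (x[1] // group_size, x[0]))  # dummy key: the group number
       .groupByKey()                               # gather each pair of consecutive rows
       .mapValues(lambda rows: [sum(col) / float(len(col))  # column-wise mean
                                for col in zip(*rows)])
       .sortByKey()                                # restore the original row order
       .values()
)

print(means.collect())  # [[13.0, 47.5], [10.5, 42.5]]

groupByKey() should be fine here since each group holds only two rows; for larger group sizes, reduceByKey() or aggregateByKey() with a (sum, count) accumulator would avoid materializing whole groups in memory.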