Hi Ted,

There is no row key per se, and I actually do not want to sort; I want to
aggregate each group of x subsequent rows together into a mean value while
maintaining the order of the row entries.

For example:
Input RDD:
[ 12, 45 ]
[ 14, 50 ]
[ 10, 35 ]
[ 11, 50 ]

Expected output RDD; the below is an aggregation by mean over each group of
2 subsequent rows (e.g. (12+14)/2 = 13 and (45+50)/2 = 47.5 for the first
output row):

[13,   47.5]
[10.5, 42.5]


@Alexander: Yes, inducing a dummy key seems to be one of the ways; can you
please post a snippet on how to achieve this, if possible? My rough,
untested understanding of that approach is sketched below; please correct
me if it is off.
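
A minimal sketch of the dummy-key idea (untested; it assumes a
SparkContext named sc, and group_size is just a name I made up for the
group size of 2 from the example):

    from pyspark import SparkContext

    sc = SparkContext(appName="pairwise-mean")
    rdd = sc.parallelize([[12, 45], [14, 50], [10, 35], [11, 50]])

    group_size = 2  # how many subsequent rows to average together

    # zipWithIndex preserves the original row order, so index // group_size
    # gives every block of group_size consecutive rows the same dummy key.
    keyed = rdd.zipWithIndex().map(lambda ri: (ri[1] // group_size, (ri[0], 1)))

    # Sum the rows element-wise within each group, carrying a count so a
    # short trailing group would still be averaged correctly.
    sums = keyed.reduceByKey(
        lambda a, b: ([x + y for x, y in zip(a[0], b[0])], a[1] + b[1]))

    # reduceByKey shuffles, so sort by the dummy key to restore row order.
    means = sums.sortByKey().map(
        lambda kv: [v / float(kv[1][1]) for v in kv[1][0]])

    print(means.collect())
    # [[13.0, 47.5], [10.5, 42.5]]
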


On Mon, Mar 28, 2016 at 10:30 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you describe your use case a bit more?
>
> Since the row keys are not sorted in your example, there is a chance that
> you get nondeterministic results when you aggregate on groups of two
> successive rows.
>
> Thanks
>
> On Mon, Mar 28, 2016 at 9:21 AM, sujeet jog <sujeet....@gmail.com> wrote:
>
>> Hi,
>>
>> I have a RDD  like this .
>>
>> [ 12, 45 ]
>> [ 14, 50 ]
>> [ 10, 35 ]
>> [ 11, 50 ]
>>
>> I want to aggregate the values of the first two rows into one row, and
>> subsequently the next two rows into another single row...
>>
>> I don't have a key to aggregate on for using the PySpark aggregate
>> functions; how can I achieve this?
>>
>>
>>
>
