Hi Ted,
There is no row key per se, and I actually do not want to sort; I want to
aggregate each group of x subsequent rows together as a mean value while
maintaining the order of the row entries.
For example:
Input RDD:
[ 12, 45 ]
[ 14, 50 ]
[ 10, 35 ]
[ 11, 50 ]
expected output RDD, the below is actually an aggregate of every two rows
as their mean.
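Roughly, this is the computation I am after (a sketch, assuming an
existing SparkContext sc and a group size x = 2): the mean of every x
consecutive rows, keeping the row order.

    rdd = sc.parallelize([[12, 45], [14, 50], [10, 35], [11, 50]])
    x = 2  # number of consecutive rows to average together

    # zipWithIndex numbers the rows in their existing order, so index // x
    # groups every x consecutive rows under one key without reordering them.
    keyed = rdd.zipWithIndex().map(lambda ri: (ri[1] // x, (ri[0], 1)))

    # Sum the rows element-wise and count the rows per group (the last
    # group may be smaller if the row count is not a multiple of x).
    summed = keyed.reduceByKey(
        lambda a, b: ([u + v for u, v in zip(a[0], b[0])], a[1] + b[1]))

    # Divide by the per-group count; sortByKey restores the group order.
    means = summed.sortByKey().map(
        lambda kc: [v / float(kc[1][1]) for v in kc[1][0]])

    print(means.collect())  # [[13.0, 47.5], [10.5, 42.5]]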
Can you describe your use case a bit more?
Since the row keys are not sorted in your example, there is a chance that
you get nondeterministic results when you aggregate on groups of two
successive rows.
Thanks
On Mon, Mar 28, 2016 at 9:21 AM, sujeet jog wrote:
> Hi,
>
> I have an RDD like this
So, why not make a fake key and aggregate on it?
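For instance (a quick sketch, assuming a SparkContext sc): zipWithIndex
gives every row a stable index in the RDD's existing order, and index // 2
turns each pair of consecutive rows into one fake key, which reduceByKey
or aggregateByKey can then operate on.

    rdd = sc.parallelize([[12, 45], [14, 50], [10, 35], [11, 50]])
    # zipWithIndex yields (row, index) pairs, preserving row order
    keyed = rdd.zipWithIndex().map(lambda ri: (ri[1] // 2, ri[0]))
    # keyed: [(0, [12, 45]), (0, [14, 50]), (1, [10, 35]), (1, [11, 50])]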
On Mon, Mar 28, 2016 at 6:21 PM, sujeet jog wrote:
> Hi,
>
> I have an RDD like this.
>
> [ 12, 45 ]
> [ 14, 50 ]
> [ 10, 35 ]
> [ 11, 50 ]
>
> I want to aggregate the values of the first two rows into one row, and
> subsequently the next two rows into another single row...
Hi,
I have an RDD like this.
[ 12, 45 ]
[ 14, 50 ]
[ 10, 35 ]
[ 11, 50 ]
I want to aggregate the values of the first two rows into one row, and
subsequently the next two rows into another single row...
I don't have a key to aggregate on, which some of the PySpark aggregate
functions require; how can I achieve this?