Frankly speaking, I think reduceByKey with a Partitioner has the same problem
and should not be exposed to public users either, because it is hard to fully
understand how the partitioner behaves without looking at the actual code.
And if there exists a basic contract of a Partitio
reduceByKey(randomPartitioner, (a, b) => a + b) also gives an incorrect result.
Why does reduceByKey with a Partitioner exist then?
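To make the failure mode concrete, here is a minimal, Spark-free sketch of what goes wrong. Everything in it is an assumption for illustration: SimplePartitioner is a hypothetical stand-in for org.apache.spark.Partitioner, simulatedReduceByKey routes each record individually and ignores map-side combining, and the seed is arbitrary. It only shows that a partitioner which ignores the key can leave several partial results for the same key.

```scala
import scala.util.Random

object PartitionerContractSketch {
  // Hypothetical stand-in for org.apache.spark.Partitioner, so this runs without Spark.
  trait SimplePartitioner {
    def numPartitions: Int
    def getPartition(key: Any): Int
  }

  // Honors the contract: the same key always maps to the same partition.
  class HashLikePartitioner(val numPartitions: Int) extends SimplePartitioner {
    def getPartition(key: Any): Int = Math.floorMod(key.hashCode, numPartitions)
  }

  // Violates the contract: the partition does not depend on the key at all.
  class RandomPartitioner(val numPartitions: Int, seed: Long) extends SimplePartitioner {
    private val rng = new Random(seed)
    def getPartition(key: Any): Int = rng.nextInt(numPartitions)
  }

  // Simplified model of a shuffle followed by a per-partition reduce.
  // Each record is sent to the partition chosen by the partitioner, and
  // values are then merged by key within each partition only.
  def simulatedReduceByKey[K, V](data: Seq[(K, V)], p: SimplePartitioner)(
      f: (V, V) => V): Seq[(K, V)] = {
    val partitions = data.groupBy { case (k, _) => p.getPartition(k) }
    partitions.values.toSeq.flatMap { part =>
      part.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq.fill(100)("a" -> 1)
    // One entry for "a" holding the full sum.
    println(simulatedReduceByKey(data, new HashLikePartitioner(4))(_ + _))
    // Typically several entries for "a", each holding a partial sum.
    println(simulatedReduceByKey(data, new RandomPartitioner(4, seed = 42L))(_ + _))
  }
}
```

With the key-based partitioner every record for a key meets in one partition, so the reduce sees all of its values. With the random one the values are scattered, and no single partition can produce the complete sum.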
On Wed, Jun 8, 2016 at 9:22 PM, 汪洋 wrote:
> Hi Alexander,
>
> I think the result is not guaranteed to be correct if an arbitrary
> Partitioner is passed in.
>
> I have created a note
The example violates the basic contract of a Partitioner.
It does make sense to take a Partitioner as a parameter to distinct, though it
is fairly trivial to simulate that in user code as well ...
Regards
Mridul
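For reference, a user-code version of distinct with a Partitioner might look like the sketch below. The object and method names (DistinctWithPartitioner, distinctBy) are hypothetical and not part of Spark; the body follows the usual map / reduceByKey / map pattern and assumes the supplied Partitioner honors the contract that equal keys always map to the same partition.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object DistinctWithPartitioner {
  // Hypothetical helper, not a Spark API: deduplicate an RDD while
  // controlling the output partitioning with a caller-supplied Partitioner.
  def distinctBy[T: ClassTag](rdd: RDD[T], partitioner: Partitioner): RDD[T] =
    rdd
      .map(x => (x, null))                     // use each element as a key
      .reduceByKey(partitioner, (x, _) => x)   // keep one record per key
      .map(_._1)                               // drop the placeholder value
}
```

Called as DistinctWithPartitioner.distinctBy(rdd, new HashPartitioner(8)), this should return the distinct elements spread over 8 partitions. With a partitioner that sends equal keys to different partitions, duplicates can survive, which is the same problem discussed above.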
On Wednesday, June 8, 2016, 汪洋 wrote:
> Hi Alexander,
>
> I think it does not guarantee
Hi Alexander,
I think the result is not guaranteed to be correct if an arbitrary Partitioner
is passed in.
I have created a notebook and you can check it out.
(https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/7973071962862063/2110745399505739/58107563000366