Hi,

If all key fields are primitive types (long) or String, their hash values
should be deterministic.
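
To illustrate this, here is a minimal, runnable Java sketch; the CustomKey
class is hypothetical and only stands in for a key type that does not
override hashCode():

public class KeyHashDemo {
  // A class that does not override hashCode()/equals() falls back to
  // Object's identity hash, which differs from object to object and
  // from run to run.
  static class CustomKey {
    final String id;
    CustomKey(String id) { this.id = id; }
  }

  public static void main(String[] args) {
    // Deterministic: same value, same hash code on every run and JVM.
    System.out.println("abc".hashCode());    // always 96354
    System.out.println(Long.hashCode(42L));  // always 42

    // Non-deterministic: two keys with equal content get different
    // identity hashes, so parallel tasks may route them to different
    // groups.
    System.out.println(new CustomKey("abc").hashCode());
    System.out.println(new CustomKey("abc").hashCode());  // differs
  }
}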

There are two things that can go wrong:
1) Records are assigned to the wrong group.
2) The computation of a group is buggy.

I'd first check that 1) is correct.
Can you replace the sum function with a simple count and check if the
counts for each group are the same for p=1 and p=8?
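
A rough sketch of that check, assuming the input is a DataSet&lt;Row&gt; named
inputTable (as in your earlier mail) and keyFields is the String array you
group by (the variable names are assumed):

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

// Group by the same String[] key definition as in your job, but emit a
// record count per group instead of the sum.
DataSet<Tuple2<String, Long>> counts = inputTable
    .groupBy(keyFields)
    .reduceGroup(new GroupReduceFunction<Row, Tuple2<String, Long>>() {
      @Override
      public void reduce(Iterable<Row> rows, Collector<Tuple2<String, Long>> out) {
        long count = 0;
        Row first = null;
        for (Row row : rows) {
          if (first == null) {
            first = row;  // keep one record to identify the group
          }
          count++;
        }
        // Assumes field 0 is one of the key fields.
        out.collect(Tuple2.of(String.valueOf(first.getField(0)), count));
      }
    });

// Run once with parallelism 1 and once with 8, then compare the output.
counts.print();

If the counts already differ between p=1 and p=8, the grouping (case 1) is
broken; if they match, the reduce logic (case 2) is the more likely culprit.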



On Thu, Aug 22, 2019 at 11:45 AM anissa moussaoui <
anissa.moussa...@dcbrain.com> wrote:

> Hi Fabian,
>
> My GroupReduce function sums one column of the input rows of each group.
>
> My key fields are an array of multiple types, in this case String and long.
> The results that I'm posting just represent a sample of the output dataset.
>
> Thank you in advance!
>
> Anissa
>
> On Thu, Aug 22, 2019 at 11:24 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Anissa,
>>
>> This looks strange. If I understand your code correctly, your GroupReduce
>> function is summing up a field.
>> Looking at the results that you posted, it seems as if there is some data
>> missing (the total sum does not seem to match).
>>
>> For groupReduce it is important that the grouping keys are deterministic.
>> Since you provide a String array as the key definition, there is no
>> KeySelector function involved.
>> However, one thing that can cause random results is key attributes with
>> random hash values.
>> What is the type of your key fields?
>>
>> Another thing you might want to check is if the input (inputTable) to the
>> groupReduce function is the same with both parallelism settings.
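>>
>> A quick way to do that: inputTable.count() executes the plan and returns
>> the number of records, so you can compare the input size for p=1 and p=8.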
>>
>> Best, Fabian
>>
>
