Hi,

If all key fields are primitive types (long) or String, their hash values should be deterministic.

There are two things that can go wrong: 1) records are assigned to the wrong group, or 2) the computation within a group is buggy. I'd first check that 1) is correct. Can you replace the sum function with a simple count and check whether the counts for each group are the same for p=1 and p=8?
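Something like the following minimal sketch is what I have in mind. It assumes a Tuple2<String, Long> input grouped on field 0; your actual input type, key definition, and source will of course differ:

import org.apache.flink.api.common.functions.GroupReduceFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class GroupCountCheck {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(8); // run once with 1 and once with 8, then compare the outputs

        // Hypothetical input: (key, value) pairs; replace with your actual inputTable.
        DataSet<Tuple2<String, Long>> input = env.fromElements(
                Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));

        input.groupBy(0)
             .reduceGroup(new GroupReduceFunction<Tuple2<String, Long>, Tuple2<String, Long>>() {
                 @Override
                 public void reduce(Iterable<Tuple2<String, Long>> records,
                                    Collector<Tuple2<String, Long>> out) {
                     String key = null;
                     long count = 0; // count records instead of summing the value field
                     for (Tuple2<String, Long> r : records) {
                         key = r.f0;
                         count++;
                     }
                     out.collect(Tuple2.of(key, count));
                 }
             })
             .print();
    }
}

If the per-group counts differ between the two runs, records are being routed to the wrong groups (a key problem); if they match, the bug is more likely in the sum computation itself.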
On Thu, Aug 22, 2019 at 11:45 AM anissa moussaoui <anissa.moussa...@dcbrain.com> wrote:

> Hi Fabian,
>
> My GroupReduce function sums one column of the input rows of each group.
>
> My key fields are an array of multiple types, in this case String and long.
> The result that I posted is just a sample of the output dataset.
>
> Thank you in advance!
>
> Anissa
>
> On Thu, Aug 22, 2019 at 11:24 AM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Anissa,
>>
>> This looks strange. If I understand your code correctly, your GroupReduce
>> function is summing up a field.
>> Looking at the results that you posted, it seems as if some data is
>> missing (the total sum does not seem to match).
>>
>> For groupReduce it is important that the grouping keys are deterministic.
>> Since you provide a String array as the key definition, there is no
>> KeySelector function.
>> However, something that can cause random results is key attributes with
>> random hash values.
>> What is the type of your key fields?
>>
>> Another thing you might want to check is whether the input (inputTable) to
>> the groupReduce function is the same with both parallelism settings.
>>
>> Best, Fabian