> On 19 Jul 2019, at 20:57, Eric Barnhill <ericbarnh...@gmail.com> wrote:
>
> Hi Virendra,
>
> I think that's right in terms of initialization. If it is initialized to
> NaN then accumulation will require an additional step getting rid of the
> NaN. Just initialize to zero.
+1
Initialisation with zero will allow the accumulating function to be free of
checks.
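
As a rough illustration of why this matters (the class and method names
below are hypothetical and not taken from the project code), any sum that
starts at NaN stays NaN, so a NaN-initialised accumulator would need a branch
on every added value, whereas a zero-initialised one needs none:

```java
// Hypothetical illustration only; not the project's actual accumulator.
final class SumInitDemo {
    private SumInitDemo() {}

    /** Adds all values to an accumulator that starts from the given initial value. */
    static double accumulate(double initial, double... values) {
        double sum = initial;
        for (double v : values) {
            sum += v; // branch-free; only gives a usable result if initial is not NaN
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(accumulate(0.0, 1.0, 2.0, 3.0));        // prints 6.0
        System.out.println(accumulate(Double.NaN, 1.0, 2.0, 3.0)); // prints NaN
    }
}
```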
>
> I just looked around and it's pretty clear that it is best practice to
> return NaN in the edge case of an average of no values. That is what
> happens in Python when calling numpy.mean([]) and in R when calling
> mean(c()) , and that is also mathematically right.
+1
This is in line with other libraries. It is also in line with Java, which throws
an ArithmeticException for 0 / 0 and returns NaN for 0.0 / 0.0.
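
A small sketch of the Java behaviour, for reference. The class and method
names are made up for this example; only the division semantics come from the
language itself:

```java
// Hypothetical demo class; shows integer vs floating-point division by zero.
final class ZeroDivisionDemo {
    private ZeroDivisionDemo() {}

    /** Returns true if integer 0 / 0 throws ArithmeticException. */
    static boolean intDivisionThrows() {
        int zero = 0;
        try {
            int unused = zero / zero; // integer division by zero throws
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    /** Returns the result of floating-point 0.0 / 0.0. */
    static double doubleDivision() {
        double zero = 0.0;
        return zero / zero; // IEEE 754: no exception, result is NaN
    }

    public static void main(String[] args) {
        System.out.println(intDivisionThrows()); // prints true
        System.out.println(doubleDivision());    // prints NaN
    }
}
```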
>
> So, and I think this is a step that could be saved until after the
> milestone, a check for zero values and returning NaN in that case should
> probably be somehow implemented. But in terms of under the hood initialize
> to zero.
The code just needs to move the check for whether there are any values
(count > 0) into the getMean() method and return NaN when there are none. This
should be added to the contract of Mean by documenting it in the Javadoc and
adding a test to ensure it works.
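
Something along these lines, as a minimal sketch only. The class name
MeanSketch, the increment() method and the incremental update formula are my
assumptions for illustration and may not match the actual Mean class; only
getMean() and the count > 0 check come from the discussion above:

```java
// Hypothetical sketch; not the actual Mean implementation.
final class MeanSketch {
    /** Running mean; starts at zero so increment() needs no special case. */
    private double mean;
    /** Number of values added so far. */
    private long count;

    /** Adds a value using a standard incremental mean update (assumed here). */
    void increment(double value) {
        count++;
        mean += (value - mean) / count;
    }

    /**
     * Returns the mean of the values added so far.
     *
     * @return the mean, or NaN if no values have been added
     */
    double getMean() {
        // The only emptiness check lives here, not in increment().
        return count > 0 ? mean : Double.NaN;
    }

    public static void main(String[] args) {
        MeanSketch m = new MeanSketch();
        System.out.println(m.getMean()); // prints NaN
        m.increment(2.0);
        m.increment(4.0);
        System.out.println(m.getMean()); // prints 3.0
    }
}
```

The check runs once per getMean() call rather than once per added value, and
the NaN-for-empty behaviour is stated in the Javadoc so it becomes part of the
contract.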
>
>
>
>
> On Thu, Jul 18, 2019 at 7:26 PM Virendra singh Rajpurohit <
> virendrasing...@gmail.com> wrote:
>
>> Hi all,
>> I hope you are all doing well. I had a discussion on Slack with my GSoC
>> mentors regarding variable initialization. I'm posting it on the ML for more
>> opinions.
>>
>> *Should variables like mean be initialized with NaN or 0?*
>> The definitional formula of the mean is
>> mean = (sum of values)/n
>> Hence for n=0 it is 0/0, which is NaN.
>> However, Java's SummaryStatistics classes (Double, Long and Int) return
>> average=0 for n=0.
>> As discussed on Slack, "The initialization should not set the initial value
>> to NaN. This is a convenience to make getMean() faster. This is likely to
>> cause fewer problems than NaN when used in downstream computations".
>> Assigning 0 will make things faster because the if condition checking n
>> is removed from the calculation, whereas assigning NaN is more correct.
>> *Alex Herbert* suggested returning NaN from the getMean() method, with an if
>> condition checking n there, so that we don't check the condition every time
>> a value is added.
>> What are your opinions about it?
>>
>> --
>> *Virendra Singh Rajpurohit*
>>