RE: RDD.aggregate versus accumulables...

2014-11-17 Thread lordjoe
I have been playing with using accumulators (despite the possible error with multiple attempts). These provide a convenient way to get some numbers while still performing business logic. I posted some sample code at http://lordjoesoftware.blogspot.com/. Even if accumulators are not perfect today -

RE: RDD.aggregate versus accumulables...

2014-11-17 Thread Segerlind, Nathan L
You should never use accumulators for this purpose because you may get incorrect answers. Accumulators can count the same thing multiple times - you cannot rely upon the correctness of the values they compute. See SPARK-732<https://issues.apache.org/jira/browse/SP

Re: RDD.aggregate versus accumulables...

2014-11-17 Thread Surendranauth Hiraman
We use Algebird for calculating things like min/max, stddev, variance, etc.
https://github.com/twitter/algebird/wiki
-Suren

On Mon, Nov 17, 2014 at 11:32 AM, Daniel Siegmann wrote:
> You should *never* use accumulators for this purpose because you may get
> incorrect answers. Accumulators can
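
For readers unfamiliar with the idea behind the statistics mentioned in this reply (min/max, stddev, variance), here is a minimal plain-Python sketch of a mergeable summary. It is not Algebird's API: the names `Stats`, `of`, and `merge` are hypothetical, and the list of lists only stands in for partitions. The point is that each partition can be summarized independently and the summaries combined afterwards, which is the shape that an aggregate-style computation needs.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Hypothetical mergeable summary of a batch of numbers."""
    count: int
    mean: float
    m2: float        # sum of squared deviations from the mean
    minimum: float
    maximum: float

    @property
    def variance(self):
        # Population variance (divide by count, not count - 1).
        return self.m2 / self.count

    @property
    def stddev(self):
        return math.sqrt(self.variance)


def of(x):
    """Summary of a single value."""
    return Stats(1, float(x), 0.0, float(x), float(x))


def merge(a, b):
    """Combine two summaries using the pairwise update for mean and m2."""
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Stats(n, mean, m2, min(a.minimum, b.minimum), max(a.maximum, b.maximum))


# Stand-in for three partitions of a distributed dataset (made-up numbers).
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]

# Summarize each partition on its own, then merge the partial summaries.
per_partition = [reduce(merge, map(of, p)) for p in partitions]
total = reduce(merge, per_partition)

print(total.count, total.minimum, total.maximum, total.mean, total.stddev)
```

Because `merge` is associative, the partial summaries can be combined in any grouping, so the result does not depend on how the data happens to be partitioned.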

Re: RDD.aggregate versus accumulables...

2014-11-17 Thread Daniel Siegmann
You should *never* use accumulators for this purpose because you may get incorrect answers. Accumulators can count the same thing multiple times - you cannot rely upon the correctness of the values they compute. See SPARK-732 for more info. On Sun,
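
To make the warning above concrete, here is a small plain-Python simulation, not Spark code. It assumes a hypothetical scenario in which one partition's task is executed twice (for example, after a retry) and only the last result is kept. The one-element list `record_counter` stands in for an accumulator, and `seq_op` / `comb_op` mimic the two functions that an aggregate-style call takes. All names and numbers are illustrative.

```python
from functools import reduce

partitions = [[1, 2, 3], [4, 5], [6]]   # made-up data, three "partitions"
record_counter = [0]                     # stand-in for an accumulator


def seq_op(acc, x):
    """Fold one element into a (count, sum) pair."""
    return (acc[0] + 1, acc[1] + x)


def comb_op(a, b):
    """Merge two (count, sum) pairs from different partitions."""
    return (a[0] + b[0], a[1] + b[1])


def run_task(partition):
    """Process one partition; bumps the shared counter as a side effect."""
    acc = (0, 0)
    for x in partition:
        record_counter[0] += 1           # side effect outlives a discarded attempt
        acc = seq_op(acc, x)
    return acc


results = {}
for i, part in enumerate(partitions):
    results[i] = run_task(part)
    if i == 1:
        # Hypothetical re-execution of partition 1: the earlier result is
        # replaced, but the counter updates from the first attempt remain.
        results[i] = run_task(part)

aggregated = reduce(comb_op, results.values(), (0, 0))

print("count via side-effect counter:", record_counter[0])
print("count and sum via aggregate-style result:", aggregated)
```

In this simulation the side-effect counter reports 8 records for 6 inputs, while the value carried in each task's return value stays correct because exactly one result per partition is merged.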