subject:"Spark Bug\: Counting twice with different results"

Re: Spark Bug: Counting twice with different results

2015-05-22 Thread Sean Owen

This is expected, for example, if your RDD is the result of random sampling, or if the underlying source is not consistent. You haven't shown any code.

On Fri, May 22, 2015 at 3:34 PM, Niklas Wilcke <1wil...@informatik.uni-hamburg.de> wrote:
> Hi,
>
> I have recognized a strange behavior of spark co
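
To illustrate the point about random sampling: the sketch below is a toy model in plain Python, not Spark code and not the pipeline from this thread (which was never posted). The `LazyDataset` class, the `sampled_rows` helper, the 0.8 fraction, and the row count are all made up for illustration. It only assumes that a lazily evaluated dataset is recomputed on every action unless its result has been materialized, so a sampling step without a fixed seed can yield a different count each time.

```python
import random


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute  # zero-argument function that produces the rows
        self._cached = None      # materialized rows, once cache() has been called

    def cache(self):
        # Materialize the rows once so later actions reuse the same data.
        self._cached = list(self._compute())
        return self

    def count(self):
        # Without a cached result, every action re-runs the whole computation.
        rows = self._cached if self._cached is not None else self._compute()
        return sum(1 for _ in rows)


def sampled_rows(source, fraction, rng):
    """Keep each row with probability `fraction` (Bernoulli sampling)."""
    return [row for row in source if rng.random() < fraction]


source = range(200_000)
rng = random.Random()  # no fixed seed, so every evaluation draws a new sample

# Each count() re-runs the sampling, so the results usually differ.
unstable = LazyDataset(lambda: sampled_rows(source, 0.8, rng))
unstable_counts = [unstable.count() for _ in range(5)]

# After cache(), every count() sees the same materialized rows.
stable = LazyDataset(lambda: sampled_rows(source, 0.8, rng)).cache()
stable_counts = [stable.count() for _ in range(5)]

print("recomputed each time:", unstable_counts)
print("materialized once:   ", stable_counts)
```

In Spark itself, the analogous things to check would be whether any sampling or random step in the pipeline uses a fixed seed (for instance, `RDD.sample` accepts a `seed` argument) and whether the RDD is persisted before it is counted. Note that persisting is only a partial remedy there, since evicted partitions may be recomputed.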

Spark Bug: Counting twice with different results

2015-05-22 Thread Niklas Wilcke

Hi,

I have noticed a strange behavior of Spark core in combination with MLlib. Running my pipeline results in an RDD. Calling count() on this RDD returns 160055. Calling count() again directly afterwards returns 160044, and so on. The RDD seems to be unstable. How can that be? Do you maybe have