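This is expected if, for example, your RDD is the result of random
sampling, or if the underlying source is not consistent between reads.
An RDD is recomputed from its lineage on every action unless it is
cached, so a nondeterministic step can yield a different count each
time. You haven't shown any code.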
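
To illustrate the point, here is a minimal sketch in plain Python. It is
a toy model, not Spark code: LazyDataset, sampled_pipeline and the record
count of 200000 are hypothetical names and values made up for this
example. It only shows how a lazily re-evaluated pipeline with a random
filter can report a different count on each call, and how materializing
the result once makes the count stable. In Spark the analogous step
would be calling cache() or persist() on the RDD before counting.

```python
import random


class LazyDataset:
    """Toy stand-in for an RDD: every action re-runs the pipeline unless cached."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function returning an iterable
        self._cached = None

    def cache(self):
        # Materialize the records once and reuse them for later actions.
        # (Eager here for simplicity; Spark's cache() is lazy.)
        self._cached = list(self._compute())
        return self

    def count(self):
        if self._cached is not None:
            return len(self._cached)
        # No cache: the whole pipeline is evaluated again from scratch.
        return sum(1 for _ in self._compute())


def sampled_pipeline(rng, n=200_000, fraction=0.5):
    # Keeps each record with probability `fraction`, drawing fresh
    # random numbers on every evaluation.
    return lambda: (x for x in range(n) if rng.random() < fraction)


rng = random.Random()  # deliberately unseeded, like an unseeded sample

unstable = LazyDataset(sampled_pipeline(rng))
first, second = unstable.count(), unstable.count()

stable = LazyDataset(sampled_pipeline(rng)).cache()
cached_counts = [stable.count() for _ in range(3)]

print("uncached counts:", first, second)   # usually two different numbers
print("cached counts:  ", cached_counts)   # the same number three times
```

If the counts stop changing once the data is materialized, the
instability comes from a nondeterministic step in the pipeline or its
source rather than from count() itself.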
On Fri, May 22, 2015 at 3:34 PM, Niklas Wilcke
<1wil...@informatik.uni-hamburg.de> wrote:
> Hi,
>
> I have recognized a strange behavior of spark co
Hi,
I have noticed strange behavior in Spark core in combination with
MLlib. Running my pipeline produces an RDD.
Calling count() on this RDD returns 160055.
Calling count() again directly afterwards returns 160044, and so on.
The RDD seems to be unstable.
How can that be? Do you maybe have