Re: inconsistency in count and print

2015-05-18 Thread Stephan Ewen
Hi Michele! I cannot tell you what the problem is at first glance, but here are some pointers that may help you find it: Input split creation determinism - The number of input splits is not really deterministic. It depends on the parallelism of the source (this tells the system how m

Re: inconsistency in count and print

2015-05-16 Thread Michele Bertoni
I forgot to mention that I was importing Guava this way: import com.google.common.hash.{HashFunction, Hashing}, including it via Maven, but I could also have used import org.apache.flink.shaded.com.google.common.hash.{HashFunction, Hashing}. Neither of them works properly. On 16 May 2015, at

Re: inconsistency in count and print

2015-05-16 Thread Michele Bertoni
The first time I hash my data is in the reading phase: each line gets one extra field, the hash of its file name. I do this with a custom reader that extends DelimitedInputFormat and overrides the open, nextRecord and readRecord methods /* … */ private var id : Long = 0L override de

Re: inconsistency in count and print

2015-05-16 Thread Fabian Hueske
Invalid hash values can certainly cause non-deterministic results. Can you provide a code snippet that shows how and where you used the Guava Hasher? 2015-05-16 11:52 GMT+02:00 Michele Bertoni : > Is it possible that is due to the hasher? > > Inside my code i was using the google guava hasher (

Re: inconsistency in count and print

2015-05-16 Thread Michele Bertoni
Is it possible that this is due to the hasher? Inside my code I was using the Google Guava hasher (sha256 as a Long hash). Sometimes I got errors from it (ArrayIndexOutOfBoundsException), sometimes I just got a different hash for the same id, especially when running on a non-local execution environment. I rem
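
For reference, here is a minimal sketch of one way to derive a stable Long from a SHA-256 digest. It is not the code from this thread: `StableHash` and `sha256AsLong` are hypothetical names, and it uses the JDK's `java.security.MessageDigest` rather than Guava so that it stays self-contained. The resulting values are not meant to match what Guava would return. The one point it illustrates is that a digest object holds state while it is being fed, so creating a fresh one per call avoids sharing it between parallel tasks. Whether shared hasher state was the cause of the behaviour described above is an assumption, not something the thread confirms.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper, not taken from the thread.
object StableHash {
  // SHA-256 of the UTF-8 bytes of `s`, truncated to the first 8 digest bytes
  // and read as a big-endian Long.
  def sha256AsLong(s: String): Long = {
    // MessageDigest instances are stateful and not thread-safe,
    // so a new one is created for every call instead of being shared.
    val digest = MessageDigest.getInstance("SHA-256")
    val bytes = digest.digest(s.getBytes(StandardCharsets.UTF_8))
    ByteBuffer.wrap(bytes, 0, 8).getLong()
  }
}
```

Called twice with the same file name, for example `StableHash.sha256AsLong("part-0001.txt")`, this should return the same Long on every machine and in every task, because it depends only on the input bytes.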