Hi Michele!
I cannot tell you what the problem is at first glance, but here are some
pointers that may help you find it:
Input split creation determinism
- The number of input splits is not really deterministic. It depends on
the parallelism of the source (this tells the system how many splits to generate)
I forgot I was importing Guava in this way:
import com.google.common.hash.{HashFunction, Hashing}
including it in Maven,
but I also had the option to use
import org.apache.flink.shaded.com.google.common.hash.{HashFunction, Hashing}
Neither of them is working properly.
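For reference, pulling the unshaded Guava artifact into a Maven build looks like the fragment below (the version number is only an example of a release current at the time, not the one Michele actually used):

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <!-- example version; pick the one that matches your project -->
  <version>18.0</version>
</dependency>
```

Note that mixing this with the Flink-shaded copy (`org.apache.flink.shaded.com.google.common.*`) means two incompatible class hierarchies on the classpath, which is a common source of confusing runtime behavior.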
On 16 May 2015, at
The first time I hash my data is in the reading phase: each line is augmented
with one field that is the hash of its file name. I do this with a custom reader
that extends DelimitedInputFormat and overrides the open, nextRecord and
readRecord methods:
/* … */
private var id : Long = 0L
override de
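The reader above is truncated, so it cannot be reconstructed exactly, but the per-record step it describes (append a hash of the split's file name to each line) can be sketched standalone. Everything here is an assumption, not Michele's actual code: the helper name `appendFileHash`, the `;` field delimiter, and the use of the JDK's MessageDigest instead of the Guava Hasher (a Guava `Hasher` from `newHasher()` is single-use, and reusing one across records is one way to get exceptions or differing hashes):

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical stand-in for the body of an overridden readRecord():
// given one input line and the split's file name, emit the line with an
// extra field holding a deterministic Long hash of that name. A fresh
// digest is created per call, so there is no shared mutable hasher state
// between parallel tasks.
def appendFileHash(line: String, fileName: String): String = {
  val digest = MessageDigest.getInstance("SHA-256")
    .digest(fileName.getBytes(StandardCharsets.UTF_8))
  // Fold the first 8 bytes into a Long, little-endian, matching the
  // convention of Guava's HashCode.asLong.
  val id = (0 until 8).foldLeft(0L) { (acc, i) =>
    acc | ((digest(i) & 0xffL) << (8 * i))
  }
  s"$line;$id" // assumed ';' delimiter for the appended field
}
```

Because SHA-256 depends only on the input bytes, this gives the same id for the same file name on every task and every run, which is the property the job needs.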
Invalid hash values can certainly cause non-deterministic results.
Can you provide a code snippet that shows how and where you used the Guava
Hasher?
2015-05-16 11:52 GMT+02:00 Michele Bertoni:
> Is it possible that is due to the hasher?
>
> Inside my code i was using the google guava hasher (
Is it possible that it is due to the hasher?
Inside my code I was using the Google Guava hasher (sha256 as a Long hash).
Sometimes I got errors from it (ArrayIndexOutOfBoundsException), sometimes I just
got a different hash for the same id, especially when running in a non-local
execution environment.
I rem