[ https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156306#comment-15156306 ]
ASF GitHub Bot commented on FLINK-3422:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1685#issuecomment-186948795

    It is pretty crucial that different hash functions are used for partitioning across machines and for the internal partitioning of data structures. If the same hash function is used for both, many internal data structure partitions will be empty.

    So far we have divided it the following way (admittedly not documented):
      - Murmur hash across machines
      - Jenkins hash internally in data structures

    How about we stick with that division and use Murmur hash in the streaming partitioner as well?


> Scramble HashPartitioner hashes
> -------------------------------
>
>                 Key: FLINK-3422
>                 URL: https://issues.apache.org/jira/browse/FLINK-3422
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>            Assignee: Gabor Horvath
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash
> scrambling to guard against bad user hash functions.
> We should apply a Murmur or Jenkins hash on top of the hash code, similar to
> what is done in the {{DataSet}} API.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
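
The point in the comment about using different hash functions at the two levels can be illustrated with a small standalone sketch. This is not Flink code: the class name, the machine and bucket counts, and the two mixing functions (a Murmur3-style finalizer and a Jenkins-style integer mix, written out by hand here) are assumptions chosen only for illustration. It counts how many internal buckets on one machine receive at least one key, first when the routing hash is reused internally and then when a second, independent hash is used.

```java
import java.util.BitSet;

// Hypothetical illustration, not taken from Flink.
public class HashDivisionSketch {
    static final int MACHINES = 4; // assumed number of parallel machines
    static final int BUCKETS = 8;  // assumed number of internal partitions per machine

    // Murmur3-style 32-bit finalizer, used here as the cross-machine hash.
    static int murmurMix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Jenkins-style 32-bit integer mix, used here as the internal hash.
    static int jenkinsMix(int a) {
        a = (a + 0x7ed55d16) + (a << 12);
        a = (a ^ 0xc761c23c) ^ (a >>> 19);
        a = (a + 0x165667b1) + (a << 5);
        a = (a + 0xd3a2646c) ^ (a << 9);
        a = (a + 0xfd7046c5) + (a << 3);
        a = (a ^ 0xb55a4f09) ^ (a >>> 16);
        return a;
    }

    // Counts the internal buckets on machine 0 that receive at least one key.
    static int nonEmptyBuckets(boolean reuseRoutingHash) {
        BitSet used = new BitSet(BUCKETS);
        for (int key = 0; key < 10_000; key++) {
            int routing = murmurMix(key);
            if (Math.floorMod(routing, MACHINES) != 0) {
                continue; // key is routed to a different machine
            }
            int internal = reuseRoutingHash ? routing : jenkinsMix(key);
            used.set(Math.floorMod(internal, BUCKETS));
        }
        return used.cardinality();
    }

    public static void main(String[] args) {
        // Reusing the routing hash: every key on machine 0 has hash % 4 == 0,
        // so hash % 8 can only be 0 or 4, and at most 2 buckets are ever used.
        System.out.println("same hash at both levels: "
                + nonEmptyBuckets(true) + " of " + BUCKETS + " buckets used");
        System.out.println("different hash internally: "
                + nonEmptyBuckets(false) + " of " + BUCKETS + " buckets used");
    }
}
```

With the same hash at both levels, the modulo taken for routing already fixes the low bits that the internal bucket index depends on, which is why most buckets stay empty. An independent second hash removes that correlation.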