[ https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156257#comment-15156257 ]
ASF GitHub Bot commented on FLINK-3422:
---------------------------------------

GitHub user Xazax-hun opened a pull request:

    https://github.com/apache/flink/pull/1685

    [WIP][FLINK-3422][streaming][api-breaking] Scramble HashPartitioner hashes.

    This pull request contains a fix for FLINK-3422.

    Some of the tests are failing at the moment because they relied on
    prior knowledge of the user hash function. Fixing those tests requires
    knowledge of Flink internals that I do not possess yet, so Marton
    Balassi is helping me.

    The Jira ticket mentions both Murmur and Jenkins hash. Murmur hash is
    already used in the batch implementation:
    https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/shipping/OutputEmitter.java#L187

    My approach was to move the Jenkins hash from CompactingHashTable to
    MathUtils and use it in HashPartitioner. If you think it is better to
    use Murmur hash here, or that there is value in being consistent with
    the batch implementation in this regard, please let me know.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xazax-hun/flink HashPartitioner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1685

----
commit afaa069483423e0bbb448f773cdcb4992689745e
Author: Gabor Horvath <xazax....@gmail.com>
Date:   2016-02-21T13:54:44Z

    [FLINK-3422][streaming][api-breaking] Scramble HashPartitioner hashes.
commit 102053618e11e0de784d4d02152dc439a1e274ca
Author: Márton Balassi <mbala...@apache.org>
Date:   2016-02-21T22:01:00Z

    [WIP][FLINK-3422][streaming][api-breaking] Update tests reliant on hashing

----

> Scramble HashPartitioner hashes
> -------------------------------
>
>                 Key: FLINK-3422
>                 URL: https://issues.apache.org/jira/browse/FLINK-3422
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>   Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>            Assignee: Gabor Horvath
>            Priority: Critical
>             Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash
> scrambling to guard against bad user hash functions.
> We should apply a Murmur or Jenkins hash on top of the hash code, similar to
> what the {{DataSet}} API does.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
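
For readers unfamiliar with the problem in the ticket, the following is a minimal sketch of why scrambling helps. It is not the code from the pull request: the class ScrambleSketch and all of its method names are hypothetical, and the mixing step uses the MurmurHash3 32-bit finalizer as a stand-in for whichever function (Murmur or Jenkins) ends up in HashPartitioner. The "bad" user hash is assumed to be one whose values are all multiples of 4, so that with 4 channels a plain modulo sends every key to the same channel.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Flink's HashPartitioner or MathUtils.
public class ScrambleSketch {

    // MurmurHash3 32-bit finalizer (fmix32): spreads the influence of
    // every input bit across the output bits.
    public static int scramble(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Channel selection straight from the user hash code (the behavior
    // the ticket describes as problematic).
    public static int selectChannelRaw(Object key, int numChannels) {
        return Math.floorMod(key.hashCode(), numChannels);
    }

    // Channel selection with scrambling applied on top of the user hash.
    // floorMod keeps the result non-negative even for negative hashes.
    public static int selectChannelScrambled(Object key, int numChannels) {
        return Math.floorMod(scramble(key.hashCode()), numChannels);
    }

    // Counts how many of the keys 0, 4, 8, ... land in each channel.
    // Integer.hashCode() returns the value itself, so every hash here
    // is a multiple of 4: a deliberately poor user hash.
    public static int[] histogram(boolean scrambled, int numKeys, int numChannels) {
        int[] counts = new int[numChannels];
        for (int i = 0; i < numKeys; i++) {
            Integer key = i * 4;
            int channel = scrambled
                    ? selectChannelScrambled(key, numChannels)
                    : selectChannelRaw(key, numChannels);
            counts[channel]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("raw:       " + Arrays.toString(histogram(false, 1000, 4)));
        System.out.println("scrambled: " + Arrays.toString(histogram(true, 1000, 4)));
    }
}
```

Without scrambling, all 1000 keys fall into channel 0, because only the low bits of the hash decide the channel. With the mixing step the low bits depend on the whole hash value, so the keys spread over the channels. The trade-off noted in the pull request still applies: anything that relied on knowing which channel a given hash code maps to (such as the affected tests) has to be updated.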