Github user greghogan commented on the pull request:

    https://github.com/apache/flink/pull/1067#issuecomment-135885866

I am currently running release-0.10.0-milestone-1.

Debugging with Eclipse and looking at MutableHashTable.initTable, numBuckets is computed as 16086. There are 63 memory segments with 256 buckets each = 16128 total buckets. The last 16128 - 16086 = 42 buckets are not initialized by initTable, which terminates the inner loop when bucket == numBuckets.

Here is an example header dump from the last memory segment showing the crossover from initialized to uninitialized data.

    offset  partition  status  count  next-pointer
    26880   10           0       0    -72340172838076673
    27008   11           0       0    -72340172838076673
    27136   12           0       0    -72340172838076673
    27264   13           0       0    -72340172838076673
    27392    0         -56       9    844425030795264
    27520    0         -56       9    -9191846839379296256
    27648    0         -56       9    10133099245469696
    27776    0         -56       9    12103424082444288

Setting a breakpoint for MutableHashTable.buildBloomFilterForBucket for count < 0, the last memory segment looked as follows (this is from a different execution, operation, and thread).

    offset  partition  status  count   next-pointer
    26880   10         0            9  27584547767975936
    27008   11         0            9  -9208735337998712832
    27136   12         0            9  4503599694479360
    27264   13         0            9  -9219994337067139072
    27392    0         0       -32697  1161165883580435
    27520    0         3       -15328  18016855230957176
    27648    0         5         1388  -33740636012148672
    27776    0         6        25494  -17363350186618861

MutableHashTable.buildBloomFilterForBucketsInPartition processed offset 27392, which happened to match the partition number and bucket status even though it looks to be uninitialized.

After changing MutableHashTable.initTable to initialize all buckets in all segments I have not seen the bug reoccur.
{code}
for (int k = 0; k < bucketsPerSegment /* && bucket < numBuckets*/; k++, bucket++) {
}
{code}

I see at least three potential resolutions:
1) have MutableHashTable.initTable initialize all buckets,
2) have MutableHashTable.buildBloomFilterForBucket skip uninitialized buckets, or
3) I have not looked enough at MutableHashTable.getInitialTableSize, but is it possible to completely fill the last segment with usable buckets?
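
To make the arithmetic above easy to check, here is a rough, self-contained sketch. It is not Flink's code: the class BucketInitSketch, its method names, and the position of the count field (COUNT_OFFSET) are assumptions made up for illustration. Only the 128-byte bucket stride, the 256 buckets per segment, the 63 segments, and numBuckets = 16086 come from the dumps and numbers in this comment. Each segment is pre-filled with 0xFF bytes to stand in for stale data in a recycled memory segment.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical, simplified model of the bucket layout in this report; not Flink's code.
final class BucketInitSketch {
    static final int BUCKET_SIZE = 128;          // dump offsets advance by 128 bytes
    static final int BUCKETS_PER_SEGMENT = 256;  // from the report
    static final int COUNT_OFFSET = 2;           // ASSUMED position of the count field

    /** Bucket slots that exist in the segments but lie beyond numBuckets. */
    static int unusedSlots(int numSegments, int numBuckets) {
        return numSegments * BUCKETS_PER_SEGMENT - numBuckets;
    }

    /** Offset in the last segment of the first slot beyond numBuckets (meaningful only if unusedSlots > 0). */
    static int firstUnusedOffset(int numBuckets) {
        return (numBuckets % BUCKETS_PER_SEGMENT) * BUCKET_SIZE;
    }

    /** Zeroes the count field per bucket; with initAll == false the loop stops at numBuckets. */
    static ByteBuffer[] initTable(int numSegments, int numBuckets, boolean initAll) {
        ByteBuffer[] segments = new ByteBuffer[numSegments];
        for (int i = 0, bucket = 0; i < numSegments; i++) {
            ByteBuffer seg = ByteBuffer.allocate(BUCKETS_PER_SEGMENT * BUCKET_SIZE);
            // Simulate a recycled segment that still holds stale, non-zero bytes.
            Arrays.fill(seg.array(), (byte) 0xFF);
            for (int k = 0; k < BUCKETS_PER_SEGMENT && (initAll || bucket < numBuckets); k++, bucket++) {
                seg.putShort(k * BUCKET_SIZE + COUNT_OFFSET, (short) 0);
            }
            segments[i] = seg;
        }
        return segments;
    }

    /** Counts slots in the last segment whose count field was never zeroed. */
    static int staleSlotsInLastSegment(ByteBuffer[] segments) {
        ByteBuffer last = segments[segments.length - 1];
        int stale = 0;
        for (int k = 0; k < BUCKETS_PER_SEGMENT; k++) {
            if (last.getShort(k * BUCKET_SIZE + COUNT_OFFSET) != 0) {
                stale++;
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        System.out.println("unused slots: " + unusedSlots(63, 16086));
        System.out.println("first unused offset: " + firstUnusedOffset(16086));
        System.out.println("stale, bounded loop: " + staleSlotsInLastSegment(initTable(63, 16086, false)));
        System.out.println("stale, init all: " + staleSlotsInLastSegment(initTable(63, 16086, true)));
    }
}
```

With the numbers from this report the sketch gives 42 unused slots starting at offset 27392, which matches the crossover in the first dump. The bounded loop leaves those 42 slots stale, and dropping the bound leaves none. This corresponds to resolution 1; a check along the lines of staleSlotsInLastSegment would not work for resolution 2, since stale data can happen to look like a valid header.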