[ https://issues.apache.org/jira/browse/FLINK-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Louis Xu updated FLINK-13161:
-----------------------------
    Description: 
The original code is:
{code:java}
int minNumBuckets = (int) Math.ceil((estimatedRowCount / loadFactor / NUM_ENTRIES_PER_BUCKET));
int bucketNumSegs = Math.max(1, Math.min(maxSegs, (minNumBuckets >>> table.bucketsPerSegmentBits) +
		((minNumBuckets & table.bucketsPerSegmentMask) == 0 ? 0 : 1)));
int numBuckets = MathUtils.roundDownToPowerOf2(bucketNumSegs << table.bucketsPerSegmentBits);
{code}
Default values: loadFactor = 0.75, NUM_ENTRIES_PER_BUCKET = 15, maxSegs = 33 (an assumed value; it only needs to be larger than the segment count derived from minNumBuckets).
Suppose table.bucketsPerSegmentBits = 3 and table.bucketsPerSegmentMask = 0b111, i.e. each segment holds 8 buckets.

Looping estimatedRowCount from 1 to 1000 gives the results in the attached file.

For example:
{code:java}
estimatedRowCount: 200, minNumBuckets: 18, bucketNumSegs: 3, numBuckets: 16
{code}
Here numBuckets (16) is smaller than minNumBuckets (18). In addition, 3 segments are requested but only 2 are needed (16 / 8), so one segment is wasted.
Since segments are preallocated, this means some segments will never be used.

> numBuckets calculate wrong in BinaryHashBucketArea
> --------------------------------------------------
>
>                 Key: FLINK-13161
>                 URL: https://issues.apache.org/jira/browse/FLINK-13161
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.0
>            Reporter: Louis Xu
>            Assignee: Louis Xu
>            Priority: Major
>             Fix For: 1.9.0
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
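
The arithmetic in the description can be reproduced with a small standalone sketch. It is not the Flink source: the class name NumBucketsDemo and the helper wastedSegments are hypothetical, the constants are the values supposed in the description, estimatedRowCount is assumed to be a long, and Integer.highestOneBit is used as a stand-in for MathUtils.roundDownToPowerOf2 on the assumption that both round a positive int down to a power of two.

```java
final class NumBucketsDemo {
    // Values supposed in the issue description, not read from Flink.
    static final double LOAD_FACTOR = 0.75;
    static final int NUM_ENTRIES_PER_BUCKET = 15;
    static final int MAX_SEGS = 33;
    static final int BUCKETS_PER_SEGMENT_BITS = 3;    // 8 buckets per segment
    static final int BUCKETS_PER_SEGMENT_MASK = 0b111;

    /** Returns {minNumBuckets, bucketNumSegs, numBuckets} for the given row count. */
    static int[] compute(long estimatedRowCount) {
        int minNumBuckets =
                (int) Math.ceil(estimatedRowCount / LOAD_FACTOR / NUM_ENTRIES_PER_BUCKET);
        // Segments needed to hold minNumBuckets, rounded up, clamped to [1, MAX_SEGS].
        int bucketNumSegs = Math.max(1, Math.min(MAX_SEGS,
                (minNumBuckets >>> BUCKETS_PER_SEGMENT_BITS)
                        + ((minNumBuckets & BUCKETS_PER_SEGMENT_MASK) == 0 ? 0 : 1)));
        // Assumed equivalent of MathUtils.roundDownToPowerOf2 for positive values.
        int numBuckets = Integer.highestOneBit(bucketNumSegs << BUCKETS_PER_SEGMENT_BITS);
        return new int[] {minNumBuckets, bucketNumSegs, numBuckets};
    }

    /** Segments requested minus segments that numBuckets actually occupies. */
    static int wastedSegments(long estimatedRowCount) {
        int[] r = compute(estimatedRowCount);
        return r[1] - (r[2] >>> BUCKETS_PER_SEGMENT_BITS);
    }

    public static void main(String[] args) {
        int[] r = compute(200);
        System.out.printf(
                "estimatedRowCount: 200, minNumBuckets: %d, bucketNumSegs: %d, numBuckets: %d, wasted segments: %d%n",
                r[0], r[1], r[2], wastedSegments(200));
    }
}
```

With these assumed values, a row count of 200 gives minNumBuckets 18, bucketNumSegs 3 and numBuckets 16, matching the example in the description, with one of the three requested segments left unused.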