[ https://issues.apache.org/jira/browse/FLINK-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-13161:
-----------------------------------
    Labels: pull-request-available  (was: )

> numBuckets calculate wrong in BinaryHashBucketArea
> --------------------------------------------------
>
>                 Key: FLINK-13161
>                 URL: https://issues.apache.org/jira/browse/FLINK-13161
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.0
>            Reporter: Louis Xu
>            Assignee: Louis Xu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>
> The original code is:
>
> {code:java}
> int minNumBuckets = (int) Math.ceil((estimatedRowCount / loadFactor / NUM_ENTRIES_PER_BUCKET));
> int bucketNumSegs = Math.max(1, Math.min(maxSegs, (minNumBuckets >>> table.bucketsPerSegmentBits) +
>         ((minNumBuckets & table.bucketsPerSegmentMask) == 0 ? 0 : 1)));
> int numBuckets = MathUtils.roundDownToPowerOf2(bucketNumSegs << table.bucketsPerSegmentBits);
> {code}
> Default values: loadFactor = 0.75, NUM_ENTRIES_PER_BUCKET = 15, maxSegs = 33
> (an assumed value; it only needs to be larger than the segment count derived
> from minNumBuckets).
> Suppose table.bucketsPerSegmentBits = 3 and table.bucketsPerSegmentMask =
> 0b111, which means each segment holds 8 buckets.
> Looping estimatedRowCount from 1 to 1000 gives the results in the attached
> file.
> Take one example:
> {code:java}
> estimatedRowCount: 200, minNumBuckets: 18, bucketNumSegs: 3, numBuckets: 16
> {code}
> Here numBuckets (16) is smaller than minNumBuckets (18). The code requests 3
> segments, but 16 buckets need only 2 segments (16 / 8), so one segment is
> wasted.
> Because the segments are preallocated, some segments will never be used.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
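
The arithmetic in the report can be reproduced with a small standalone sketch. It is not the Flink source: the class name `NumBucketsDemo`, the `compute` helper, and the constants are hypothetical and simply hard-code the values assumed in the report (loadFactor = 0.75, NUM_ENTRIES_PER_BUCKET = 15, maxSegs = 33, bucketsPerSegmentBits = 3). The local `roundDownToPowerOf2` is a stand-in for `MathUtils.roundDownToPowerOf2`, assumed here to return the largest power of two not exceeding its argument.

```java
public class NumBucketsDemo {
    // Values assumed in the report, not read from a real table instance.
    static final double LOAD_FACTOR = 0.75;
    static final int NUM_ENTRIES_PER_BUCKET = 15;
    static final int MAX_SEGS = 33;
    static final int BUCKETS_PER_SEGMENT_BITS = 3;     // 8 buckets per segment
    static final int BUCKETS_PER_SEGMENT_MASK = 0b111;

    // Stand-in for MathUtils.roundDownToPowerOf2 (assumption: largest power of two <= value).
    static int roundDownToPowerOf2(int value) {
        return Integer.highestOneBit(value);
    }

    // Returns {minNumBuckets, bucketNumSegs, numBuckets} using the formula quoted in the report.
    static int[] compute(long estimatedRowCount) {
        int minNumBuckets =
                (int) Math.ceil(estimatedRowCount / LOAD_FACTOR / NUM_ENTRIES_PER_BUCKET);
        // Segments needed to hold minNumBuckets, rounded up, clamped to [1, MAX_SEGS].
        int bucketNumSegs = Math.max(1, Math.min(MAX_SEGS,
                (minNumBuckets >>> BUCKETS_PER_SEGMENT_BITS)
                        + ((minNumBuckets & BUCKETS_PER_SEGMENT_MASK) == 0 ? 0 : 1)));
        // Rounding down here is what can push numBuckets below minNumBuckets.
        int numBuckets = roundDownToPowerOf2(bucketNumSegs << BUCKETS_PER_SEGMENT_BITS);
        return new int[] {minNumBuckets, bucketNumSegs, numBuckets};
    }

    public static void main(String[] args) {
        long estimatedRowCount = 200;
        int[] r = compute(estimatedRowCount);
        // Segments actually covered by numBuckets versus segments requested.
        int segsUsed = r[2] >>> BUCKETS_PER_SEGMENT_BITS;
        System.out.printf(
                "estimatedRowCount: %d, minNumBuckets: %d, bucketNumSegs: %d, numBuckets: %d%n",
                estimatedRowCount, r[0], r[1], r[2]);
        System.out.printf("segments requested: %d, segments used: %d%n", r[1], segsUsed);
    }
}
```

Under these assumptions, `compute(200)` yields minNumBuckets = 18, bucketNumSegs = 3 and numBuckets = 16, matching the example in the report: three segments are requested while the 16 buckets cover only two.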