On Tue, Mar 28, 2017 at 10:43 AM, Mithun Cy <mithun...@enterprisedb.com> wrote: > On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > > > As you have said we can solve it if we allocate buckets always in > power-of-2 when we do hash index meta page init. But on other > occasions, when we try to double the existing buckets we can do the > allocation in 4 equal phases. > > But I think there are 2 more ways to solve same, > > A. Why not pass all 3 parameters high_mask, low_mask, max-buckets to > tuplesort and let them use _hash_hashkey2bucket to figure out which > key belong to which bucket. And then sort them. I think this way we > make both sorting and insertion to hash index both consistent with > each other. > > B. In tuple sort we can use hash function bucket = hash_key % > num_buckets instead of existing one which does bitwise "and" to > determine the bucket of hash key. This way we will not wrongly assign > buckets beyond max_buckets and sorted hash keys will be in sync with > actual insertion order of _hash_doinsert. > > > I am adding both the patches Patch_A and Patch_B. My preference is > Patch_A and I am open for suggestion. >
I think both way it can work. I feel there is no hard pressed need that we should make the computation in sorting same as what you do _hash_doinsert. In patch_B, I don't think new naming for variables is good. Assert(!a->isnull1); - hash1 = DatumGetUInt32(a->datum1) & state->hash_mask; + bucket1 = DatumGetUInt32(a->datum1) % state->num_buckets; Can we use hash_mod instead of num_buckets and retain hash1 in above code and similar other places? >>+#define SPLITPOINT_PHASES_PER_GRP 4 >>+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1) >>+#define Buckets_First_Split_Group 4 > Fixed. > >>In the above computation +2 and -2 still bothers me. I think you need >>to do this because you have defined split group zero to have four >>buckets, how about if you don't force that and rather define to have >>split point phases only from split point which has four or more >>buckets? > > Okay as suggested instead of group zero having 4 phases of 1 bucket > each, I have recalculated the spare mapping as below. > Few comments: 1. +#define BUCKETS_WITHIN_SP_GRP(sp_g, nphase) \ + ((sp_g < SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE) ? \ + (1 << (sp_g - 1)) : \ + ((nphase) * ((1 << (sp_g - 1)) >> 2))) This will go wrong for split point group zero. In general, I feel if you handle computation for split groups lesser than SPLITPOINT_GROUPS_WITH_ONLY_ONE_PHASE in the caller, then all your macros will become much simpler and less error prone. 2. +#define BUCKET_TO_SPLITPOINT_GRP(num_bucket) (_hash_log2(num_bucket)) What is the use of such a define, can't we directly use _hash_log2 in the caller? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers