Hi,

Thank you for the review.
cfbot found a few compiler warnings:

> https://cirrus-ci.com/task/6526903542087680
> [16:47:46.964] make -s -j${BUILD_JOBS} clean
> [16:47:47.452] time make -s -j${BUILD_JOBS} world-bin
> [16:49:10.496] lwlock.c: In function ‘CreateLWLocks’:
> [16:49:10.496] lwlock.c:467:22: error: unused variable ‘found’ [-Werror=unused-variable]
> [16:49:10.496]   467 |         bool found;
> [16:49:10.496]       |              ^~~~~
> [16:49:10.496] cc1: all warnings being treated as errors
> [16:49:10.496] make[4]: *** [<builtin>: lwlock.o] Error 1
> [16:49:10.496] make[3]: *** [../../../src/backend/common.mk:37: lmgr-recursive] Error 2
> [16:49:10.496] make[3]: *** Waiting for unfinished jobs....
> [16:49:11.881] make[2]: *** [common.mk:37: storage-recursive] Error 2
> [16:49:11.881] make[2]: *** Waiting for unfinished jobs....
> [16:49:20.195] dynahash.c: In function ‘hash_create’:
> [16:49:20.195] dynahash.c:643:37: error: ‘curr_offset’ may be used uninitialized [-Werror=maybe-uninitialized]
> [16:49:20.195]   643 |         curr_offset = (((char *)curr_offset) + (temp * elementSize));
> [16:49:20.195]       |         ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [16:49:20.195] dynahash.c:588:23: note: ‘curr_offset’ was declared here
> [16:49:20.195]   588 |         void *curr_offset;
> [16:49:20.195]       |               ^~~~~~~~~~~
> [16:49:20.195] cc1: all warnings being treated as errors
> [16:49:20.196] make[4]: *** [<builtin>: dynahash.o] Error 1

Fixed these.

> > diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
> > index cd5a00132f..5203f5b30b 100644
> > --- a/src/backend/utils/hash/dynahash.c
> > +++ b/src/backend/utils/hash/dynahash.c
> > @@ -120,7 +120,6 @@
> >   * a good idea of the maximum number of entries!). For non-shared hash
> >   * tables, the initial directory size can be left at the default.
> >   */
> > -#define DEF_SEGSIZE 256
> >  #define DEF_SEGSIZE_SHIFT 8 /* must be log2(DEF_SEGSIZE) */
> >  #define DEF_DIRSIZE 256
>
> Why did you move this to the header? Afaict it's just used in
> hash_get_shared_size(), which is also in dynahash.c?

Yes. This was accidentally left behind by the previous version of the patch,
so I undid the change.

> >  static void register_seq_scan(HTAB *hashp);
> >  static void deregister_seq_scan(HTAB *hashp);
> >  static bool has_seq_scans(HTAB *hashp);
> > -
> > +static int find_num_of_segs(long nelem, int *nbuckets, long num_partitions, long ssize);
> >
> >  /*
> >   * memory allocation support
>
> You removed a newline here that probably shouldn't be removed.

Fixed this.

> > @@ -468,7 +466,11 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> >  	else
> >  		hashp->keycopy = memcpy;
> >
> > -	/* And select the entry allocation function, too. */
> > +	/*
> > +	 * And select the entry allocation function, too. XXX should this also
> > +	 * Assert that flags & HASH_SHARED_MEM is true, since HASH_ALLOC is
> > +	 * currently only set with HASH_SHARED_MEM *
> > +	 */
> >  	if (flags & HASH_ALLOC)
> >  		hashp->alloc = info->alloc;
> >  	else
> > @@ -518,6 +520,7 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> >
> >  	hashp->frozen = false;
> >
> > +	/* Initializing the HASHHDR variables with default values */
> >  	hdefault(hashp);
> >
> >  	hctl = hashp->hctl;
>
> I assume these were just observations you made while looking into this? They
> seem unrelated to the change itself?

Yes. I removed the first one and left the second one as a code comment.

> > @@ -582,7 +585,8 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> >  		freelist_partitions,
> >  		nelem_alloc,
> >  		nelem_alloc_first;
> > -
> > +	void	*curr_offset;
> > +
> >  	/*
> >  	 * If hash table is partitioned, give each freelist an equal share of
> >  	 * the initial allocation. Otherwise only freeList[0] is used.
> > @@ -592,6 +596,20 @@ hash_create(const char *tabname, long nelem, const HASHCTL *info, int flags)
> >  	else
> >  		freelist_partitions = 1;
> >
> > +	/*
> > +	 * If table is shared, calculate the offset at which to find the
> > +	 * the first partition of elements
> > +	 */
> > +	if (hashp->isshared)
> > +	{
> > +		int nsegs;
> > +		int nbuckets;
> > +		nsegs = find_num_of_segs(nelem, &nbuckets, hctl->num_partitions, hctl->ssize);
> > +
> > +		curr_offset = (((char *) hashp->hctl) + sizeof(HASHHDR) + (info->dsize * sizeof(HASHSEGMENT)) +
> > +					   + (sizeof(HASHBUCKET) * hctl->ssize * nsegs));
> > +	}
> > +
>
> Why only do this for shared hashtables? Couldn't we allocate the elements
> together with the rest for non-shared hashtables too?

I think it is possible to consolidate the allocations for non-shared hash
tables too. However, the initial element allocation is much smaller for
non-shared hash tables, since they can be expanded easily, so there is
probably less benefit in doing the same there. In addition, the proposed
changes are targeted at improving the monitoring in pg_shmem_allocations,
which does not apply to non-shared hash tables. So while I believe it is
feasible, I am uncertain about the utility of such a change.
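To spell out the pointer arithmetic the hunk above relies on, here is a rough,
self-contained sketch of the assumed layout: header, then the segment
directory, then the segments' bucket arrays, then the pre-allocated elements.
The function and parameter names below are made up for illustration; the
actual code works with the HASHHDR, HASHSEGMENT and HASHBUCKET types private
to dynahash.c.

#include <stddef.h>

/*
 * Sketch only: where the pre-allocated element block starts inside a single
 * shared-memory chunk laid out as
 *
 *   [ header | directory: dsize segment pointers |
 *     nsegs segments of ssize bucket pointers each | elements ... ]
 *
 * hdr_size, seg_ptr_size and bucket_size stand in for sizeof(HASHHDR),
 * sizeof(HASHSEGMENT) and sizeof(HASHBUCKET).
 */
static char *
element_block_start(char *chunk,
					size_t hdr_size,
					size_t seg_ptr_size,
					size_t bucket_size,
					long dsize,
					long ssize,
					int nsegs)
{
	return chunk
		+ hdr_size
		+ (size_t) dsize * seg_ptr_size
		+ (size_t) bucket_size * ssize * nsegs;
}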
> Seems a bit ugly to go through element_alloc() when pre-allocating. Perhaps
> it's the best thing we can do to avoid duplicating code, but it seems worth
> checking if we can do better. Perhaps we could split element_alloc() into
> element_alloc() and element_add() or such? With the latter doing everything
> after hashp->alloc().

Makes sense. I split element_alloc() into element_alloc() and element_add().

> > -
> > +
> >  	/*
> >  	 * initialize mutexes if it's a partitioned table
> >  	 */
>
> Spurious change.

Fixed.

> A function called find_num_of_segs() that also sets nbuckets seems a bit
> confusing. I also don't like "find_*", as that sounds like it's searching
> some datastructure, rather than just doing a bit of math.

I renamed it to compute_buckets_and_segs(). I am open to better suggestions.
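For reference, a standalone sketch of the arithmetic such a helper performs,
mirroring the nbuckets/nsegs computation in init_htab(). The names and exact
signature below are placeholders and may differ from the patch.

/* Round up to the next power of 2 (stand-in for dynahash's next_pow2_int). */
static int
next_pow2(long num)
{
	int			i = 1;

	while (i < num)
		i <<= 1;
	return i;
}

/*
 * Sketch of a compute_buckets_and_segs()-style helper: sets *nbuckets to the
 * power-of-2 bucket count needed for nelem entries and returns the number of
 * directory segments (ssize buckets each) needed to cover them.
 */
static int
compute_buckets_and_segs_sketch(long nelem, long num_partitions, long ssize,
								int *nbuckets)
{
	int			nsegs;

	/* one bucket per expected entry, rounded up to a power of 2 ... */
	*nbuckets = next_pow2(nelem);

	/* ... but a partitioned table needs at least one bucket per partition */
	while (*nbuckets < num_partitions)
		*nbuckets <<= 1;

	/* number of ssize-bucket segments needed, rounded up to a power of 2 */
	nsegs = (*nbuckets - 1) / ssize + 1;
	return next_pow2(nsegs);
}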
> > 		segp = (HASHSEGMENT) hashp->alloc(sizeof(HASHBUCKET) * hashp->ssize);
> >
> > 		if (!segp)
>
> Spurious change.

Fixed.

> > -static int
> > +int
> >  next_pow2_int(long num)
> >  {
> >  	if (num > INT_MAX / 2)
> > @@ -1957,3 +1995,31 @@ AtEOSubXact_HashTables(bool isCommit, int nestDepth)
> >  	}
> >  }
> > }
>
> Why export this?

It was a stale change; I removed it now.

> > diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
> > index 932cc4f34d..5e16bd4183 100644
> > --- a/src/include/utils/hsearch.h
> > +++ b/src/include/utils/hsearch.h
> > @@ -151,7 +151,7 @@ extern void hash_seq_term(HASH_SEQ_STATUS *status);
> >  extern void hash_freeze(HTAB *hashp);
> >  extern Size hash_estimate_size(long num_entries, Size entrysize);
> >  extern long hash_select_dirsize(long num_entries);
> > -extern Size hash_get_shared_size(HASHCTL *info, int flags);
> > +extern Size hash_get_shared_size(HASHCTL *info, int flags, long init_size);
> >  extern void AtEOXact_HashTables(bool isCommit);
> >  extern void AtEOSubXact_HashTables(bool isCommit, int nestDepth);
>
> It's imo a bit weird that we have very related logic in hash_estimate_size()
> and hash_get_shared_size(). Why do we need to duplicate it?

hash_estimate_size() estimates the size using default values, whereas
hash_get_shared_size() calculates it from the specific values selected by the
flags associated with the hash table. For instance, the segment size used by
the former is DEF_SEGSIZE, while the latter uses info->ssize, which is set
when the HASH_SEGMENT flag is passed. Hence, they can return different shared
memory sizes.

> These I'd just combine with the ShmemInitStruct("PredXactList"), by
> allocating the additional space. The pointer math is a bit annoying, but it
> makes much more sense to have one entry in pg_shmem_allocations.

Fixed accordingly.

> > -		(TransactionId *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->xids));
> > +		(TransactionId *) ShmemInitStruct("Proc Transaction Ids", TotalProcs * sizeof(*ProcGlobal->xids), &found);
> >  	MemSet(ProcGlobal->xids, 0, TotalProcs * sizeof(*ProcGlobal->xids));
> > -	ProcGlobal->subxidStates = (XidCacheStatus *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->subxidStates));
> > +	ProcGlobal->subxidStates = (XidCacheStatus *) ShmemInitStruct("Proc Sub-transaction id states", TotalProcs * sizeof(*ProcGlobal->subxidStates), &found);
> >  	MemSet(ProcGlobal->subxidStates, 0, TotalProcs * sizeof(*ProcGlobal->subxidStates));
> > -	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
> > +	ProcGlobal->statusFlags = (uint8 *) ShmemInitStruct("Proc Status Flags", TotalProcs * sizeof(*ProcGlobal->statusFlags), &found);
> >  	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
> >
> >  	/*
>
> Same.
>
> Although here I'd say it's worth padding the size of each separate
> "allocation" by PG_CACHE_LINE_SIZE.

Made this change.
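Roughly, the idea looks like the sketch below: a single ShmemInitStruct()
call sized for all three arrays, each sub-array rounded up to a cache line
with CACHELINEALIGN(), and the pointers carved out with pointer arithmetic.
The "Proc Arrays" name and the local variable names are placeholders, not
necessarily what the patch uses.

	Size		xidsSize = CACHELINEALIGN(TotalProcs * sizeof(*ProcGlobal->xids));
	Size		subxidSize = CACHELINEALIGN(TotalProcs * sizeof(*ProcGlobal->subxidStates));
	Size		flagsSize = CACHELINEALIGN(TotalProcs * sizeof(*ProcGlobal->statusFlags));
	char	   *ptr;
	bool		found;

	/* one pg_shmem_allocations entry instead of three anonymous ShmemAlloc() chunks */
	ptr = (char *) ShmemInitStruct("Proc Arrays",
								   xidsSize + subxidSize + flagsSize,
								   &found);
	MemSet(ptr, 0, xidsSize + subxidSize + flagsSize);

	ProcGlobal->xids = (TransactionId *) ptr;
	ptr += xidsSize;
	ProcGlobal->subxidStates = (XidCacheStatus *) ptr;
	ptr += subxidSize;
	ProcGlobal->statusFlags = (uint8 *) ptr;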
> > -	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
> > +	fpPtr = ShmemInitStruct("Fast path lock arrays", TotalProcs * (fpLockBitsSize + fpRelIdSize), &found);
> >  	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
> >
> >  	/* For asserts checking we did not overflow. */
>
> This one might actually make sense to keep separate; depending on the
> configuration it can be reasonably big (max_connections = 1k,
> max_locks_per_transaction = 1k results in ~5MB).

OK.

PFA the rebased patches with the above changes. Kindly let me know your views.

Thank you,
Rahila Syed

v2-0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch
Description: Binary data
v2-0001-Account-for-initial-shared-memory-allocated-by-hash_.patch
Description: Binary data