[Re-directing to -hackers]

On Fri, Mar 11, 2022 at 2:27 PM Andres Freund <and...@anarazel.de> wrote:
> On 2022-03-10 20:09:56 -0500, Tom Lane wrote:
> > Andres Freund <and...@anarazel.de> writes:
> > > dshash: Add sequential scan support.
> > > Add ability to scan all entries sequentially to dshash. The interface is
> > > similar but a bit different both from that of dynahash and simple dshash
> > > search functions. The most significant difference is that dshash's
> > > interface always needs a call to dshash_seq_term when the scan ends.
> >
> > Umm ... what about error recovery?  Or have you just cemented the
> > proposition that long-lived dshashes are unsafe?
>
> I don't think this commit made it worse. dshash_seq_term() releases an lwlock
> (which will be released in case of an error) and unsets
> hash_table->find_[exclusively_]locked. The latter weren't introduced by this
> patch, and are also set by dshash_find().
>
> I agree that ->find_[exclusively_]locked are problematic from an error
> recovery perspective.
Right, as seen in the build farm at [1].  Also reproducible with something
like:

@@ -269,6 +269,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size request_size,
 		return false;
 	}
 
+	/* XXX random fault injection */
+	if (op == DSM_OP_ATTACH && random() < RAND_MAX / 8)
+	{
+		close(fd);
+		elog(ERROR, "chaos");
+		return false;
+	}
+

I must have thought that it was easy and practical to write no-throw
straight-line code and be sure to reach dshash_release_lock(), but I concede
that it was a bad idea: even dsa_get_address() can throw*, and you're often
likely to need to call that while accessing dshash elements.  For example, in
lookup_rowtype_tupdesc_internal(), there is a sequence dshash_find(), ...,
dsa_get_address(), ..., dshash_release_lock(), and I must have considered the
range of code between find and release to be no-throw, but now I know that it
is not.

> It's per-backend state at least and just used for assertions. We could remove
> it. Or stop checking it in places where it could be set wrongly: dshash_find()
> and dshash_detach() couldn't check anymore, but the rest of the assertions
> would still be valid afaics?

Yeah, it's all for assertions... let's just remove it.  Those assertions were
useful to me at some stage in development but won't hold as well as I thought,
at least without widespread PG_FINALLY(), which wouldn't be nice.

*dsa_get_address() might need to adjust the memory map with system calls,
which might fail.  If you think of DSA as not only an allocator but also a
poor man's user level virtual memory scheme to tide us over until we get
threads, then this is a pretty low level kind of should-not-happen failure
that is analogous on some level to SIGBUS or SIGSEGV or something like that,
and we should PANIC.  Then we could claim that dsa_get_address() is no-throw.
At least, that was one argument I had with myself while investigating that
strange Solaris shm_open() failure, but ... I lost the argument.
It's quite an extreme position to take just to support these assertions, which
are of pretty limited value.

[1] https://www.postgresql.org/message-id/20220701232009.jcwxpl45bptaxv5n%40alap3.anarazel.de
From e3e87a9ec39d3456a7bf29796102502d3724993e Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 4 Jul 2022 11:56:02 +1200
Subject: [PATCH] Remove spurious assertions from dshash.c.

dshash.c maintained flags to track locking, in order to make assertions
that the dshash API was being used correctly.  Unfortunately the
interaction with ereport's non-local exits was not thought through
carefully enough.  While lwlocks are released automatically, the flags
can get out of sync.  Since they're only used for assertions anyway, we
can just remove them.

This problem was noted by Tom and Andres while reviewing changes to
support the new shared memory stats system in 15, the second user of
dshash.c.  On closer inspection, even the earlier typcache.c code is not
guaranteed to meet the asserted conditions, now that it's been
highlighted that even dsa_get_address() can throw (albeit in unlikely
circumstances).

Back-patch to 11, where dshash.c arrived.

Reported-by: Tom Lane <t...@sss.pgh.pa.us>
Reported-by: Andres Freund <and...@anarazel.de>
Discussion: https://postgr.es/m/20220311012712.botrpsikaufzt...@alap3.anarazel.de

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index ec454b4d65..9706e7926b 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -110,8 +110,6 @@ struct dshash_table
 	dshash_table_control *control;	/* Control object in DSM. */
 	dsa_pointer *buckets;		/* Current bucket pointers in DSM. */
 	size_t		size_log2;		/* log2(number of buckets) */
-	bool		find_locked;	/* Is any partition lock held by 'find'? */
-	bool		find_exclusively_locked;	/* ... exclusively? */
 };
 
 /* Given a pointer to an item, find the entry (user data) it holds. */
@@ -234,9 +232,6 @@ dshash_create(dsa_area *area, const dshash_parameters *params, void *arg)
 		}
 	}
 
-	hash_table->find_locked = false;
-	hash_table->find_exclusively_locked = false;
-
 	/*
 	 * Set up the initial array of buckets.  Our initial size is the same as
 	 * the number of partitions.
@@ -285,8 +280,6 @@ dshash_attach(dsa_area *area, const dshash_parameters *params,
 	hash_table->params = *params;
 	hash_table->arg = arg;
 	hash_table->control = dsa_get_address(area, control);
-	hash_table->find_locked = false;
-	hash_table->find_exclusively_locked = false;
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
 
 	/*
@@ -309,8 +302,6 @@ dshash_attach(dsa_area *area, const dshash_parameters *params,
 void
 dshash_detach(dshash_table *hash_table)
 {
-	Assert(!hash_table->find_locked);
-
 	/* The hash table may have been destroyed.  Just free local memory. */
 	pfree(hash_table);
 }
@@ -400,7 +391,6 @@ dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
 	partition = PARTITION_FOR_HASH(hash);
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(!hash_table->find_locked);
 
 	LWLockAcquire(PARTITION_LOCK(hash_table, partition),
 				  exclusive ? LW_EXCLUSIVE : LW_SHARED);
@@ -418,8 +408,6 @@ dshash_find(dshash_table *hash_table, const void *key, bool exclusive)
 	else
 	{
 		/* The caller will free the lock by calling dshash_release_lock. */
-		hash_table->find_locked = true;
-		hash_table->find_exclusively_locked = exclusive;
 		return ENTRY_FROM_ITEM(item);
 	}
 }
@@ -449,7 +437,6 @@ dshash_find_or_insert(dshash_table *hash_table,
 	partition = &hash_table->control->partitions[partition_index];
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(!hash_table->find_locked);
 
 restart:
 	LWLockAcquire(PARTITION_LOCK(hash_table, partition_index),
@@ -494,8 +481,6 @@ restart:
 	}
 
 	/* The caller must release the lock with dshash_release_lock. */
-	hash_table->find_locked = true;
-	hash_table->find_exclusively_locked = true;
 
 	return ENTRY_FROM_ITEM(item);
 }
@@ -514,7 +499,6 @@ dshash_delete_key(dshash_table *hash_table, const void *key)
 	bool		found;
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(!hash_table->find_locked);
 
 	hash = hash_key(hash_table, key);
 	partition = PARTITION_FOR_HASH(hash);
@@ -551,14 +535,10 @@ dshash_delete_entry(dshash_table *hash_table, void *entry)
 	size_t		partition = PARTITION_FOR_HASH(item->hash);
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(hash_table->find_locked);
-	Assert(hash_table->find_exclusively_locked);
 	Assert(LWLockHeldByMeInMode(PARTITION_LOCK(hash_table, partition),
 								LW_EXCLUSIVE));
 
 	delete_item(hash_table, item);
-	hash_table->find_locked = false;
-	hash_table->find_exclusively_locked = false;
 	LWLockRelease(PARTITION_LOCK(hash_table, partition));
 }
 
@@ -572,13 +552,8 @@ dshash_release_lock(dshash_table *hash_table, void *entry)
 	size_t		partition_index = PARTITION_FOR_HASH(item->hash);
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(hash_table->find_locked);
-	Assert(LWLockHeldByMeInMode(PARTITION_LOCK(hash_table, partition_index),
-								hash_table->find_exclusively_locked
-								? LW_EXCLUSIVE : LW_SHARED));
+	Assert(LWLockHeldByMe(PARTITION_LOCK(hash_table, partition_index)));
 
-	hash_table->find_locked = false;
-	hash_table->find_exclusively_locked = false;
 	LWLockRelease(PARTITION_LOCK(hash_table, partition_index));
 }
 
@@ -644,7 +619,6 @@ dshash_seq_next(dshash_seq_status *status)
 	if (status->curpartition == -1)
 	{
 		Assert(status->curbucket == 0);
-		Assert(!status->hash_table->find_locked);
 
 		status->curpartition = 0;
 
@@ -702,8 +676,6 @@ dshash_seq_next(dshash_seq_status *status)
 
 	status->curitem =
 		dsa_get_address(status->hash_table->area, next_item_pointer);
-	status->hash_table->find_locked = true;
-	status->hash_table->find_exclusively_locked = status->exclusive;
 
 	/*
 	 * The caller may delete the item.  Store the next item in case of
@@ -722,9 +694,6 @@ dshash_seq_next(dshash_seq_status *status)
 void
 dshash_seq_term(dshash_seq_status *status)
 {
-	status->hash_table->find_locked = false;
-	status->hash_table->find_exclusively_locked = false;
-
 	if (status->curpartition >= 0)
 		LWLockRelease(PARTITION_LOCK(status->hash_table, status->curpartition));
 }
@@ -743,8 +712,6 @@ dshash_delete_current(dshash_seq_status *status)
 
 	Assert(status->exclusive);
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(hash_table->find_locked);
-	Assert(hash_table->find_exclusively_locked);
 	Assert(LWLockHeldByMeInMode(PARTITION_LOCK(hash_table, partition),
 								LW_EXCLUSIVE));
 
@@ -762,7 +729,6 @@ dshash_dump(dshash_table *hash_table)
 	size_t		j;
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
-	Assert(!hash_table->find_locked);
 
 	for (i = 0; i < DSHASH_NUM_PARTITIONS; ++i)
 	{
-- 
2.36.1