On Sun, Jul 3, 2022 at 7:56 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> [Re-directing to -hackers] > > On Fri, Mar 11, 2022 at 2:27 PM Andres Freund <and...@anarazel.de> wrote: > > On 2022-03-10 20:09:56 -0500, Tom Lane wrote: > > > Andres Freund <and...@anarazel.de> writes: > > > > dshash: Add sequential scan support. > > > > Add ability to scan all entries sequentially to dshash. The > interface is > > > > similar but a bit different both from that of dynahash and simple > dshash > > > > search functions. The most significant differences is that dshash's > interfac > > > > always needs a call to dshash_seq_term when scan ends. > > > > > > Umm ... what about error recovery? Or have you just cemented the > > > proposition that long-lived dshashes are unsafe? > > > > I don't think this commit made it worse. dshash_seq_term() releases an > lwlock > > (which will be released in case of an error) and unsets > > hash_table->find_[exclusively_]locked. The latter weren't introduced by > this > > patch, and are also set by dshash_find(). > > > > I agree that ->find_[exclusively_]locked are problematic from an error > > recovery perspective. > > Right, as seen in the build farm at [1]. Also reproducible with something > like: > > @@ -269,6 +269,14 @@ dsm_impl_posix(dsm_op op, dsm_handle handle, Size > request_size, > return false; > } > > + /* XXX random fault injection */ > + if (op == DSM_OP_ATTACH && random() < RAND_MAX / 8) > + { > + close(fd); > + elog(ERROR, "chaos"); > + return false; > + } > + > > I must have thought that it was easy and practical to write no-throw > straight-line code and be sure to reach dshash_release_lock(), but I > concede that it was a bad idea: even dsa_get_address() can throw*, and > you're often likely to need to call that while accessing dshash > elements. For example, in lookup_rowtype_tupdesc_internal(), there is > a sequence dshash_find(), ..., dsa_get_address(), ..., > dshash_release_lock(), and I must have considered the range of code > between find and release to be no-throw, but now I know that it is > not. > > > It's per-backend state at least and just used for assertions. We could > remove > > it. Or stop checking it in places where it could be set wrongly: > dshash_find() > > and dshash_detach() couldn't check anymore, but the rest of the > assertions > > would still be valid afaics? > > Yeah, it's all for assertions... let's just remove it. Those > assertions were useful to me at some stage in development but won't > hold as well as I thought, at least without widespread PG_FINALLY(), > which wouldn't be nice. > > *dsa_get_address() might need to adjust the memory map with system > calls, which might fail. If you think of DSA as not only an allocator > but also a poor man's user level virtual memory scheme to tide us over > until we get threads, then this is a pretty low level kind of > should-not-happen failure that is analogous on some level to SIGBUS or > SIGSEGV or something like that, and we should PANIC. Then we could > claim that dsa_get_address() is no-throw. At least, that was one > argument I had with myself while investigating that strange Solaris > shm_open() failure, but ... I lost the argument. It's quite an > extreme position to take just to support these assertions, which are > of pretty limited value. > > [1] > https://www.postgresql.org/message-id/20220701232009.jcwxpl45bptaxv5n%40alap3.anarazel.de Hi, In the description, `new shared memory stats system in 15` It would be clearer to add `release` before `15`. Cheers