On Wed, Apr 12, 2023 at 7:46 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> Unfortunately:
> (gdb) p area->control->handle
> $3 = 0
> (gdb) p segment_map->header->magic
> value has been optimized out
> (gdb) p index
> $4 = <value optimized out>

Hmm, well index I can find from parameters:

> #2 0x0000000000991470 in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>, fileName=<value optimized out>, lineNumber=1770) at assert.c:69
> #3 0x00000000009b9f97 in get_segment_by_index (area=0x22818c0, index=<value optimized out>) at dsa.c:1769
> #4 0x00000000009ba192 in dsa_get_address (area=0x22818c0, dp=1099511703168) at dsa.c:953

We have dp=1099511703168 == 0x10000012680, so index == 1 and the rest is the offset into that segment. It's not the initial segment in the main shared memory area created by the postmaster with dsa_create_in_place() (that'd be index 0), it's in an extra segment that was created with shm_open(). We managed to open and mmap() that segment, but it contains unexpected garbage.

Can you print *area->control? And then can you see that the DSM handle is in index 1 in "segment_handles" in there? Then can you see if your system has a file with that number in its name under /dev/shm/, and can you tell me what "od -c /dev/shm/..." shows as the first few lines of stuff at the top, so we can see what that unexpected garbage looks like?

Side rant: I don't think there's any particular indication that it's the issue here, but while it's on my mind: I really wish we didn't use random numbers for DSM handles. I understand where it came from: the need to manage SysV shmem keyspace (a DSM mode that almost nobody uses, but whose limitations apply to all modes). We've debugged issues relating to handle collisions before, causing unrelated DSM segments to be confused, back when the random seed was not different in each backend making collisions likely. For every other mode, we could instead use something like (slot, generation) to keep collisions as far apart as possible (generation wraparound), and avoid collisions between unrelated clusters by using the pgdata path as a shm_open() prefix.

Another idea is to add a new DSM mode that would use memfd and similar things and pass fds between backends, so that the segments are entirely anonymous and don't need to be cleaned up after a crash (I thought about that while studying the reasons why PostgreSQL can't run on Capsicum (a capabilities research project) or Android (a telephone), both of which banned SysV *and* POSIX shm because system-global namespaces are bad).
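
To make the memfd idea a bit more concrete, here is a rough, Linux-only sketch of the mechanism, not a proposal for actual DSM code. It creates an anonymous segment with memfd_create(), passes the descriptor over a Unix-domain socket with SCM_RIGHTS, and maps it through both descriptors. Everything named here (send_fd, recv_fd, memfd_roundtrip_demo, the "dsm-sketch" label) is made up for illustration, and for brevity both ends live in one process and error paths leak descriptors.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for memfd_create() */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned for struct cmsghdr. */
typedef union
{
    struct cmsghdr align;
    char        buf[CMSG_SPACE(sizeof(int))];
} fd_cmsg_buf;

/* Hypothetical helper: send one fd over a Unix socket via SCM_RIGHTS. */
static int
send_fd(int sock, int fd)
{
    char        dummy = 'x';    /* at least one byte of real data */
    struct iovec iov = {.iov_base = &dummy, .iov_len = 1};
    fd_cmsg_buf u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns it, or -1 on failure. */
static int
recv_fd(int sock)
{
    char        dummy;
    struct iovec iov = {.iov_base = &dummy, .iov_len = 1};
    fd_cmsg_buf u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int         fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 0 if data written via the original fd is seen via the passed fd. */
int
memfd_roundtrip_demo(void)
{
    const size_t len = 4096;
    const char  text[] = "hello from the creator";
    int         sv[2], fd, fd2, ok;
    char       *a, *b;

    assert(sizeof(text) <= len);
    fd = memfd_create("dsm-sketch", MFD_CLOEXEC);   /* no /dev/shm entry */
    if (fd < 0 || ftruncate(fd, (off_t) len) != 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send_fd(sv[0], fd) != 0 || (fd2 = recv_fd(sv[1])) < 0)
        return -1;
    a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd2, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        return -1;
    memcpy(a, text, sizeof(text));
    ok = (memcmp(b, text, sizeof(text)) == 0);
    munmap(a, len);
    munmap(b, len);
    close(fd);
    close(fd2);
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

The point of the sketch is that the segment never gets a name in any system-global namespace, so once the last descriptor and mapping go away the kernel reclaims it and nothing is left to clean up after a crash. In a real multi-process setup the two socket ends would of course belong to different backends.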
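
Coming back to the dp=1099511703168 arithmetic from the backtrace, here is a small sketch of how a dsa_pointer splits into a segment index and an offset. It assumes the 64-bit dsa_pointer layout with 40 offset bits, which is consistent with 0x10000012680 giving index 1 above. The SKETCH_* macros and sketch_* functions are illustrative names of mine, not the ones in dsa.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 40 low bits hold the offset, the bits above hold the index. */
#define SKETCH_OFFSET_WIDTH 40
#define SKETCH_OFFSET_MASK ((((uint64_t) 1) << SKETCH_OFFSET_WIDTH) - 1)

/* Segment index: the bits above the offset field. */
uint64_t
sketch_segment_index(uint64_t dp)
{
    return dp >> SKETCH_OFFSET_WIDTH;
}

/* Byte offset within that segment: the low 40 bits. */
uint64_t
sketch_segment_offset(uint64_t dp)
{
    return dp & SKETCH_OFFSET_MASK;
}

/* Inverse operation, handy for checking the decoding. */
uint64_t
sketch_make_pointer(uint64_t index, uint64_t offset)
{
    assert(offset <= SKETCH_OFFSET_MASK);
    return (index << SKETCH_OFFSET_WIDTH) | offset;
}

/* Print the decoded parts of a pointer value taken from a backtrace. */
void
sketch_print_pointer(uint64_t dp)
{
    printf("dp=0x%llx index=%llu offset=0x%llx\n",
           (unsigned long long) dp,
           (unsigned long long) sketch_segment_index(dp),
           (unsigned long long) sketch_segment_offset(dp));
}
```

Feeding it 1099511703168 gives index 1 and offset 0x12680, which is how I concluded that the bad segment is not the in-place one at index 0.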