On Tue, 24 Dec 2024 at 02:00, Alexander Lakhin <exclus...@gmail.com> wrote: > Please look at the following query, which triggers (sometimes not on a > first run) an assert added with 5983a4cff:
> regression=# SELECT COUNT(*) FROM > (SELECT (aclexplode(proacl)).* FROM pg_proc) a, > (SELECT oid FROM pg_proc UNION ALL SELECT oid FROM pg_proc) b; > WARNING: terminating connection because of crash of another server process > ... > TRAP: failed Assert("memcmp(&snapshot, cattr, sizeof(CompactAttribute)) == > 0"), File: "../../../../src/include/access/tupdesc.h", Line: 191, PID: 1302048 This seems to be caused by the Assert code itself rather than an actual bug. The reason it's not happening every time is because the query is expensive enough to be parallelised and the Assert failure is in the parallel worker. The TupleDesc in question is coming from the type cache, which is in shared memory, and because TupleDescCompactAttr() calls populate_compact_attribute() and overwrites the TupleDesc's CompactAttribute, there seems to be some sort of race condition between the leader and worker regarding the timing of populate_compact_attribute's memset to zero and the repopulation of the CompactAttribute. I added some debug code to compare the two CompactAttributes when the memcmp returns non-zero and I sometimes see all fields match, even with adding a 3-byte array at the end so there's no hole at the end of the struct. The attached adjusts that Assert code so that a fresh CompactAttribute is populated instead of modifying the TupleDesc's one. I'm not sure if populate_compact_attribute_internal() is exactly the nicest way to do this. I'll think a bit harder about that. Assume the attached is POC grade. (This makes me wonder if there's any race condition hazard in prior versions regarding the setting of FormData_pg_attribute.attcacheoff) David
fix_populate_compact_attribute.patch
Description: Binary data