On Wed, Apr 22, 2020 at 6:05 PM Peter Geoghegan <p...@bowt.ie> wrote: > (It would be nice if we could teach Valgrind to "poison" buffers when > we don't have a pin held...that would probably have caught this issue > almost immediately.)
I can get Valgrind to complain about it when the regression tests are run with the attached patch applied. I see this in the logs at several points when "make installcheck" runs: ==1082059== VALGRINDERROR-BEGIN ==1082059== Invalid read of size 4 ==1082059== at 0x21D8DE: btvacuumpage (nbtree.c:1370) ==1082059== by 0x21DA61: btvacuumscan (nbtree.c:1039) ==1082059== by 0x21DBD5: btbulkdelete (nbtree.c:879) ==1082059== by 0x215821: index_bulk_delete (indexam.c:698) ==1082059== by 0x20FDCE: lazy_vacuum_index (vacuumlazy.c:2427) ==1082059== by 0x2103EA: lazy_vacuum_all_indexes (vacuumlazy.c:1794) ==1082059== by 0x211EA1: lazy_scan_heap (vacuumlazy.c:1681) ==1082059== by 0x211EA1: heap_vacuum_rel (vacuumlazy.c:510) ==1082059== by 0x360414: table_relation_vacuum (tableam.h:1457) ==1082059== by 0x360414: vacuum_rel (vacuum.c:1880) ==1082059== by 0x361785: vacuum (vacuum.c:449) ==1082059== by 0x361F0E: ExecVacuum (vacuum.c:249) ==1082059== by 0x4D979C: standard_ProcessUtility (utility.c:823) ==1082059== by 0x4D9C7F: ProcessUtility (utility.c:522) ==1082059== by 0x4D6791: PortalRunUtility (pquery.c:1157) ==1082059== by 0x4D725F: PortalRunMulti (pquery.c:1303) ==1082059== by 0x4D7CEF: PortalRun (pquery.c:779) ==1082059== by 0x4D3BB7: exec_simple_query (postgres.c:1239) ==1082059== by 0x4D4ABD: PostgresMain (postgres.c:4315) ==1082059== by 0x45B0C9: BackendRun (postmaster.c:4510) ==1082059== by 0x45B0C9: BackendStartup (postmaster.c:4202) ==1082059== by 0x45B0C9: ServerLoop (postmaster.c:1727) ==1082059== by 0x45C754: PostmasterMain (postmaster.c:1400) ==1082059== by 0x3BDD68: main (main.c:210) ==1082059== Address 0x6cc7378 is in a rw- anonymous segment ==1082059== ==1082059== VALGRINDERROR-END (The line numbers might be slightly different to master here, but the line from btvacuumpage() is definitely the one that accesses the special area of the B-Tree page after we drop the pin.) This patch is very rough -- it was just the first thing that I tried. I don't know how Valgrind remembers the status of shared memory regions across backends when they're marked with VALGRIND_MAKE_MEM_NOACCESS(). Even still, this idea looks promising. I should try to come up with a committable patch before too long. The good news is that the error I showed is the only error that I see, at least with this rough patch + "make installcheck". It's possible that the patch isn't as effective as it could be, though. For one thing, it definitely won't detect incorrect buffer accesses where a pin is held but a buffer lock is not held. That seems possible, but a bit harder. -- Peter Geoghegan
0002-Valgrind-instrumentation.patch
Description: Binary data