On Thu, Dec 28, 2023 at 11:33:16AM +1300, Thomas Munro wrote:
> I guess the large object usage isn't directly relevant (that module's
> EOXact stuff seems to be finished before TRANS_COMMIT, but I don't
> know that code well).  Everything later is supposed to be about
> closing/releasing/cleaning up, and for example smgrDoPendingDeletes()
> reaches code with this relevant comment:
>
>  * Note: smgr_unlink must treat deletion failure as a WARNING, not an
>  * ERROR, because we've already decided to commit or abort the current
>  * xact.
>
> We don't really have a general ban on ereporting on system call
> failure, though.  We've just singled unlink() out.  Only a few lines
> above that we call DropRelationsAllBuffers(rels, nrels), and that
> calls smgrnblocks(), and that might need to re-open() the
> relation file to do lseek(SEEK_END), because PostgreSQL itself has no
> tracking of relation size.  Hard to say but my best guess is that's
> where you might have got your EIO, assuming you dropped the relation
> in this transaction?

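To make that asymmetry concrete, here is a rough sketch of the two
behaviours.  This is not PostgreSQL code: file_size_by_seek() and
unlink_warn_only() are made-up names, and plain open()/lseek()/unlink()
stand in for the smgr layer.  The first helper finds a file's size by
opening it and seeking to the end, and hands any failure (an EIO, say)
back to the caller; the second only warns when deletion fails.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: find a file's size by opening
 * it and seeking to the end.  Returns the size in bytes, or -1 with errno
 * set if open() or lseek() fails; the caller decides how loudly to fail.
 */
static off_t
file_size_by_seek(const char *path)
{
    int     fd = open(path, O_RDONLY);
    off_t   end;
    int     saved_errno;

    if (fd < 0)
        return -1;              /* errno set by open() */

    end = lseek(fd, 0, SEEK_END);
    saved_errno = errno;        /* close() must not clobber lseek()'s errno */
    close(fd);
    errno = saved_errno;
    return end;
}

/*
 * Hypothetical helper: delete a file, but only warn on failure, in the
 * spirit of the smgr_unlink comment quoted above.  Returns 0 on success,
 * -1 if a warning was printed.
 */
static int
unlink_warn_only(const char *path)
{
    if (unlink(path) != 0)
    {
        fprintf(stderr, "WARNING: could not remove file \"%s\": %s\n",
                path, strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller that treats a -1 from file_size_by_seek() as a hard error during
cleanup would get the behaviour guessed at above, while a failed
unlink_warn_only() is only logged.
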
Yeah.  In fact I was confused - this was not lo_unlink().
This uses normal tables, so would've done:

"begin;"
"DROP TABLE IF EXISTS %s", tablename
"DELETE FROM cached_objects WHERE cache_name=%s", tablename
"commit;"

> > This is pg16 compiled at efa8f6064, running under centos7.  ZFS is 2.2.2,
> > but the pool hasn't been upgraded to use the features new since 2.1.
>
> I've been following recent ZFS stuff from a safe distance as a user.
> AFAIK the extremely hard to hit bug fixed in that very recent release
> didn't technically require the interesting new feature (namely block
> cloning, though I think that helped people find the root cause after a
> phase of false blame?).  Anyway, it had for symptom some bogus zero
> bytes on read, not a spurious EIO.

The ZFS bug had to do with bogus bytes which may-or-may-not-be-zero, as
I understand it.  The understanding is that the bug was pre-existing but
became easier to hit in 2.2, and is fixed in 2.2.2 and 2.1.14.

-- 
Justin