I'm not saying there is no multixact bug here, but I wonder if this part of your crasher patch might be the cause:

--- 754,771 ----
                   errmsg("could not seek to block %u in file \"%s\": %m",
                          blocknum, FilePathName(v->mdfd_vfd))));

!     if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress()) {
!         nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);
!         ereport(FATAL,
!                 (errcode(ERRCODE_DISK_FULL),
!                  errmsg("could not write block %u of relation %s: wrote only %d of %d bytes",
!                         blocknum,
!                         relpath(reln->smgr_rnode, forknum),
!                         nbytes, BLCKSZ),
!                  errhint("JJ is screwing with the database.")));
!     } else {
!         nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ);
!     }

Wouldn't this BLCKSZ/3 business update the page's LSN but not the full
contents, meaning that on xlog replay the block wouldn't be rewritten
when the xlog replays next time around?  That could cause the block to
have the upper two thirds containing multixacts in xmax that had been
removed by a vacuuming round previous to the crash.

(Maybe I'm just too tired and I'm failing to fully understand the torn
page protection.  I thought I understood how it worked, but now I'm not
sure -- I mean I don't see how it can possibly have any value at all.
Surely if the disk writes the first 512-byte sector of the page and then
forgets the updates to the next 15 sectors, the page will appear as not
needing the full page image to be restored ...)

Is the page containing the borked multixact value the one that was
half-written by this code?

Is the problem reproducible if you cause this path to ereport(FATAL)
without writing 1/3rd of the page?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
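
For illustration, the torn-write scenario asked about above can be modeled in a few lines of standalone C. This is only a sketch and not PostgreSQL code: ToyPage, TOY_BLCKSZ and every toy_* function are hypothetical names, and the layout (an 8-byte LSN at the very start of the page) is an assumption chosen to mirror the concern. The first replay helper skips a record whenever the on-disk LSN is already at or past the record's LSN; the second overwrites the page from a saved full image without looking at the LSN. Which of the two the real replay path resembles for this block is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192            /* hypothetical stand-in for BLCKSZ */

/* Toy page: LSN first, then payload (assumed layout, not the real header). */
typedef struct ToyPage
{
    uint64_t      lsn;
    unsigned char payload[TOY_BLCKSZ - sizeof(uint64_t)];
} ToyPage;

/* Simulate a torn write: only the first nbytes of src reach "disk". */
static void
toy_partial_write(ToyPage *disk, const ToyPage *src, size_t nbytes)
{
    assert(nbytes <= sizeof(ToyPage));
    memcpy(disk, src, nbytes);
}

/* Replay rule 1: apply only if the page looks older than the record.
 * Returns 1 if the page was rewritten, 0 if the record was skipped. */
static int
toy_redo_if_older(ToyPage *disk, const ToyPage *after, uint64_t rec_lsn)
{
    if (disk->lsn >= rec_lsn)
        return 0;
    *disk = *after;
    return 1;
}

/* Replay rule 2: restore a saved full image without checking the LSN. */
static void
toy_restore_full_image(ToyPage *disk, const ToyPage *image)
{
    *disk = *image;
}

/* Build an old on-disk page (LSN 100, 'O') and a new version (LSN 200, 'N'),
 * then tear the write after the first third of the block. */
static void
toy_setup_torn(ToyPage *disk, ToyPage *newpage)
{
    disk->lsn = 100;
    memset(disk->payload, 'O', sizeof(disk->payload));
    newpage->lsn = 200;
    memset(newpage->payload, 'N', sizeof(newpage->payload));
    toy_partial_write(disk, newpage, TOY_BLCKSZ / 3);
}

/* Returns 1 if rule 1 skips the record and leaves the page tail stale. */
static int
toy_demo_torn_tail_is_stale(void)
{
    static ToyPage disk, newpage;
    int            applied;

    toy_setup_torn(&disk, &newpage);
    applied = toy_redo_if_older(&disk, &newpage, 200);
    return !applied && disk.payload[sizeof(disk.payload) - 1] == 'O';
}

/* Returns 1 if rule 2 repairs the tail of the same torn page. */
static int
toy_demo_full_image_repairs(void)
{
    static ToyPage disk, newpage;

    toy_setup_torn(&disk, &newpage);
    toy_restore_full_image(&disk, &newpage);
    return disk.payload[sizeof(disk.payload) - 1] == 'N';
}
```

Calling toy_demo_torn_tail_is_stale() and toy_demo_full_image_repairs() from a small main() exercises both rules on the same torn page. In this model the LSN-only rule leaves the upper two thirds untouched, while an unconditional image restore does not, so the answer for the real system hinges on whether a full-page image for that block is replayed regardless of its LSN.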