Stephen Frost wrote:
> Greetings,
>
> Looks like we might not be entirely out of the woods yet regarding
> MultiXactId's. After doing an upgrade from 9.2.6 to 9.3.4, we saw the
> following:
>
> ERROR: MultiXactId 6849409 has not been created yet -- apparent wraparound
>
> The table contents can be select'd out and match the pre-upgrade
> backup, but any attempt to VACUUM / VACUUM FULL / CLUSTER fails,
> unsurprisingly.

I finally figured out what is going on here, though I don't yet have a patch. This has been reported a number of times:

https://www.postgresql.org/message-id/20140330040029.GY4582%40tamriel.snowman.net
https://www.postgresql.org/message-id/538F3D70.6080902%40publicrelay.com
https://www.postgresql.org/message-id/556439CF.7070109%40pscs.co.uk
https://www.postgresql.org/message-id/20160614173150.GA443784@alvherre.pgsql
https://www.postgresql.org/message-id/20160615203829.5798.4...@wrigleys.postgresql.org

We theorised that we were missing some place that was failing to pass the "allow_old" flag to GetMultiXactIdMembers; since we couldn't find any, and the problem was easily worked around (by doing SELECT FOR UPDATE or equivalent on the affected tuples), there was no further research. (The allow_old flag is passed for tuples that match an infomask bit pattern that can only come from tuples locked in 9.2 and prior, i.e. one that is never set by 9.3ff.)

Yesterday I had to deal with it and quickly found what is going wrong: the problem is that in 9.2 and earlier it was acceptable (and common) to leave tuples with very old multixacts in xmax, even after multixact counter wraparound. When one such value was found in a live tuple, GetMultiXactIdMembers() would notice that it was out of range and simply return "no members", at which point heap_update and siblings would consider the tuple as not locked and move on.

When pg_upgrading a database containing tuples marked like that, the new code would error out, because during the 9.3 multixact work we considered that it was dangerous to silently allow tuples to be marked by values we didn't keep track of, so we made it an error instead, per
https://www.postgresql.org/message-id/20111204122027.GA10035%40tornado.leadboat.com
Some cases are allowed to be downgraded to DEBUG, when allow_old is true.

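To make the difference between the old and new behaviour concrete, here is a simplified standalone model of the range check described above. It is only a sketch, not the actual multixact.c code: the names check_multi, multi_precedes, MultiCheck and the pre93_behavior switch are invented for illustration, and the assumption that allow_old only downgrades the "too old" case (and not the "not created yet" case) is inferred from the error message in the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Hypothetical outcome codes for this sketch; not PostgreSQL's own. */
typedef enum
{
    MULTI_OK,           /* in range: go ahead and look up the members */
    MULTI_NO_MEMBERS,   /* treat the tuple as not locked (DEBUG at most) */
    MULTI_ERROR         /* raise ERROR: "apparent wraparound" */
} MultiCheck;

/* Modular comparison: is a logically before b on the 32-bit circle? */
static bool
multi_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Classify a multixact found in xmax against the valid window
 * [oldest, next).  pre93_behavior selects the 9.2-and-earlier rule,
 * where any out-of-range value silently means "no members".
 */
static MultiCheck
check_multi(MultiXactId multi, MultiXactId oldest, MultiXactId next,
            bool allow_old, bool pre93_behavior)
{
    bool too_old;
    bool in_future;

    /* Sanity check on the inputs: oldest must not be ahead of next. */
    assert(!multi_precedes(next, oldest));

    too_old = multi_precedes(multi, oldest);
    in_future = !multi_precedes(multi, next);

    if (!too_old && !in_future)
        return MULTI_OK;

    if (pre93_behavior)
        return MULTI_NO_MEMBERS;    /* 9.2: out of range, move on */

    if (too_old && allow_old)
        return MULTI_NO_MEMBERS;    /* 9.3: downgraded to DEBUG */

    return MULTI_ERROR;             /* 9.3: everything else errors out */
}
```

With a window of oldest = 100 and next = 200, the value 6849409 from the report is "in the future": the 9.2-style rule returns MULTI_NO_MEMBERS for it, while the 9.3-style rule returns MULTI_ERROR even when allow_old is true.
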
I think that was a good choice in general so that possibly-data-eating bugs could be reported, but there's a problem in the specific case of tuples carried over by pg_upgrade whose Multixact is "further in the future" compared to the nextMultiXactId counter. I think it's pretty clear that we should let that error be downgraded to DEBUG too, like the other checks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services