>> A possible theory is that pg_largeobject_metadata_oid_index has been
>> corrupt for a long time, allowing a lot of duplicate entries to be made.
>> However, unless pg_largeobject's pg_largeobject_loid_pn_index is *also*
>> corrupt, you'd think that creation of such duplicates would still be
>> stopped by that unique index. There's something mighty odd here.

> Let me roll back the test instance to before the first vacuumlo run and
> verify if the index was OK before - will only get to do that on Monday
> though.

To follow up on this and share my research: the index was already corrupt
in the version before the vacuumlo run (and possibly much longer), so it
was only the indicator, not the cause. pg_largeobject's
pg_largeobject_loid_pn_index was OK, and we most certainly did not have an
OID overflow, as max(OID) was around 90 million, far from the 4 billion
mark of the 32-bit wraparound. So that pointed to another source of the
problem.
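For the record, the checks described above can be reproduced with catalog
queries along these lines (a sketch, assuming a PostgreSQL version where
oid is an ordinary column of pg_largeobject_metadata):

```sql
-- Duplicate metadata rows that the unique index should have rejected:
SELECT oid, count(*)
FROM pg_largeobject_metadata
GROUP BY oid
HAVING count(*) > 1;

-- Rule out 32-bit OID wraparound (cast oid for a numeric comparison):
SELECT max(oid::bigint) FROM pg_largeobject_metadata;

-- Inserting transaction of each row (system column xmin), which is what
-- let me correlate the duplicates with dates stored elsewhere:
SELECT oid, xmin FROM pg_largeobject_metadata ORDER BY oid;
```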
I was able to correlate the transaction ids (xmin) of the duplicate rows
in pg_largeobject_metadata with times stored in another table, and found
that the duplicates were all added on one specific day, while the
originals were nicely spread out over time. I was further able to trace
that date back to a cluster upgrade (pg_upgradecluster) we performed on
that machine, so I guess we have a pretty good candidate. After the
upgrade, no more duplicates were produced.

As we have a way to recover, I'm not too concerned about this, but to add
to the lessons learned: should there have been messages in the upgrade
log, or is this a known issue when upgrading? I'm pretty sure I checked
the logs, but I might have missed something if I was looking for the wrong
things.

Thanks,
Tobias