>>>>> "Amit" == Amit Kapila <amit.kapil...@gmail.com> writes:
Amit> I haven't tried to reproduce it, but I can see the possibility of Amit> the problem described by you. What should we do next? I could see Amit> few possibilities: (a) Vacuum for main and toast table should Amit> always use same OldestXmin, I don't see how this could be arranged without either disabling the ability to vacuum the toast table independently, or storing the xmin somewhere. Amit> (b) Before removing the row from toast table, we should ensure Amit> that row in the main table is removed, We can't find the main table row version(s) without scanning the whole main table, so this amounts (again) to disabling vacuum on the toast table only. Amit> (c) Change the logic during rewrite such that it can detect this Amit> situation and skip copying the tuple in the main table, This seems more promising, but the problem is how to detect the error safely (since currently, it's just ereport()ed from deep inside the toast code). How about: 1) add a toast_datum_exists() call or similar that returns false if the (first block of) the specified external toast item is missing 2) when copying a RECENTLY_DEAD tuple, check all its external column values using this function beforehand, and if any are missing, treat the tuple as DEAD instead. Amit> on a quick look, this seems tricky, because the toast table TID Amit> might have been reused by that time, Toast pointers don't point to TIDs fortunately, they point to OIDs. The OID could have been reused (if there's been an OID wraparound since the toast item was created), but that should be harmless: it means that we'd incorrectly copy the new value with the old tuple, but the old tuple is never going to be visible to anybody ever again so nothing will see that. Amit> or (d) Ensure that GetOldestXmin doesn't move backwards or write Amit> a new API similar to it which doesn't allow OldestXmin to move Amit> backwards and use that for the purpose of vacuum. This would presumably require storing a reference OldestXmin somewhere (in shared memory? but there'd have to be a global one and one per db, and I don't think we have any shared memory structures at present that are per-db). We'd even have to preserve an oldestXmin for databases with no currently active connections, though we wouldn't have to preserve it across restarts. Amit> Any better ideas? It turns out this issue has come up before (5 years ago!): https://www.postgresql.org/message-id/20362.1359747327%40sss.pgh.pa.us -- Andrew (irc:RhodiumToad)