On Tue, Dec 26, 2017 at 9:21 AM, Nikhil Sontakke <nikh...@2ndquadrant.com> wrote:
> The main issue here is that HeapTupleSatisfiesVacuum *assumes* that
> rows belonging to an aborted transaction are not visible to anyone
> else.

One problem here is that if a transaction aborts, it might have done so
after inserting or updating a tuple in the heap and before inserting new
index entries for the tuple, or after inserting only some of the
necessary new index entries. Therefore, even if you prevent pruning, a
snapshot from the point of view of the aborted transaction may be
inconsistent. Similarly, if it aborts during a DDL operation, it may
have made some but not all of the catalog changes involved, so that for
example pg_class and pg_attribute could be inconsistent with each other
or various pg_attribute rows could even be inconsistent among
themselves. If you have a view of the catalog where these problems
exist, you can't rely on, for example, being able to build a relcache
entry without error. It is possible that you can avoid these problems
if your snapshot is always using a command ID value that was reached
prior to the error, although I'm not 100% sure that idea has no holes.

Another problem is that CTID chains may be broken. Suppose that a
transaction T1, using CID 1, does a HOT update of tuple A1 producing a
new version A2. Then, later on, when the CID counter is at least 2, it
aborts. A snapshot taken from the point of view of T1 at CID 1 should
see A2. That will work fine most of the time. However, if transaction
T2 comes along after T1 aborts and before logical decoding gets there
and does its own HOT update of tuple A1 producing a new version A3,
then tuple A2 is inaccessible through the indexes even if it still
exists in the heap page. I think this problem is basically unsolvable
and likely means that this whole approach needs to be abandoned.

One other issue to consider is that the tuple freezing code assumes
that any tuple that does not get removed when a page is pruned is OK to
freeze. Commit 9c2f0a6c3cc8bb85b78191579760dbe9fb7814ec was necessary
to repair a case where that assumption was violated.
You might want to consider carefully whether there's any chance that
this patch could introduce a similar problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company