On Fri, Jan 19, 2018 at 5:19 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> Regarding the HOT issue - I have to admit I don't quite see why A2
> wouldn't be reachable through the index, but that's likely due to my
> limited knowledge of the HOT internals.

The index entries only point to the root tuple in the HOT chain. Any subsequent tuples in the chain can only be reached by following the CTID pointers (that's why they are called "Heap Only Tuples": they have no index entries of their own). After T1 aborts, we're still OK because the CTID link isn't immediately cleared. But after T2 updates the tuple, it makes A1's CTID link point to A3, leaving no remaining link to A2.

Although in most respects PostgreSQL treats commits and aborts surprisingly symmetrically, CTID links are an exception. When T2 comes to A1, it sees that A1's xmax is T1 and checks the status of T1. If T1 is still in progress, it waits. If T1 has committed, it must either abort with a serialization error or update A2 instead under EvalPlanQual semantics, depending on the isolation level. If T1 has aborted, it assumes that the CTID field of A1 is garbage nobody cares about, adds A3 to the page, and makes A1 point to A3 instead of A2. No record of the A1->A2 link is kept anywhere *precisely because* A2 can no longer be visible to anyone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
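
P.S. In case it helps to see the sequence spelled out, here is a toy model of it in Python. It is only a sketch of the reasoning in this mail, not PostgreSQL code: the names `HeapTuple`, `Page`, `update`, and `reachable_from_index` are made up for illustration, and T0 is an assumed, already-committed transaction that created A1. It just tracks xmax and CTID per tuple and shows which tuples an index scan could reach at each step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy model only. All names here are hypothetical and do not correspond
# to PostgreSQL's real data structures or functions.
IN_PROGRESS, COMMITTED, ABORTED = "in progress", "committed", "aborted"


@dataclass
class HeapTuple:
    name: str
    xmin: str                    # transaction that created this version
    xmax: Optional[str] = None   # transaction that last updated it, if any
    ctid: Optional[str] = None   # link to the next version in the chain


class Page:
    def __init__(self) -> None:
        self.tuples: Dict[str, HeapTuple] = {}
        self.xact_status: Dict[str, str] = {}
        self.index_root: Optional[str] = None  # index points only at the root

    def insert_root(self, name: str, xid: str) -> None:
        self.tuples[name] = HeapTuple(name, xmin=xid)
        self.index_root = name

    def update(self, old_name: str, new_name: str, xid: str) -> None:
        old = self.tuples[old_name]
        if old.xmax is not None:
            status = self.xact_status[old.xmax]
            if status == IN_PROGRESS:
                raise RuntimeError("would wait for " + old.xmax)
            if status == COMMITTED:
                raise RuntimeError("serialization error or EvalPlanQual recheck")
            # ABORTED: the old CTID is treated as garbage and overwritten below.
        self.tuples[new_name] = HeapTuple(new_name, xmin=xid)
        old.xmax = xid
        old.ctid = new_name

    def reachable_from_index(self) -> List[str]:
        # Walk the chain from the root by following CTID links.
        seen: List[str] = []
        cur = self.index_root
        while cur is not None and cur not in seen:
            seen.append(cur)
            cur = self.tuples[cur].ctid
        return seen


def run_scenario() -> Dict[str, List[str]]:
    page = Page()
    page.xact_status.update({"T0": COMMITTED, "T1": IN_PROGRESS, "T2": IN_PROGRESS})
    page.insert_root("A1", "T0")

    page.update("A1", "A2", "T1")           # T1 creates A2
    result = {"t1_running": page.reachable_from_index()}

    page.xact_status["T1"] = ABORTED        # abort does not clear the CTID link
    result["t1_aborted"] = page.reachable_from_index()

    page.update("A1", "A3", "T2")           # T2 overwrites A1's CTID
    result["t2_updated"] = page.reachable_from_index()
    result["on_page"] = sorted(page.tuples)
    return result


if __name__ == "__main__":
    for step, value in run_scenario().items():
        print(step, value)
```

After the last step A2 is still physically on the page in this model, but nothing points to it any more: the walk from the index root goes A1, A3.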