The attached patch refactors heap_hot_search_buffer() so that index_getnext() can use it, and modifies index_getnext() to do so.
The idea is based on one of Heikki's index-only scan patches, from 2009: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00676.php I believe, however, that this portion of that patch stands alone, completely independently of index-only scans. At present, we have two copies of the logic to traverse HOT chains floating around, which means they can get out of sync, possibly resulting in bugs. This has already happened once: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00733.php As a nice bonus, the new logic is both simpler and, I believe, more correct than the current logic. In IndexScanDescData, xs_hot_dead, xs_next_hot, and xs_prev_xmax get replaced by a single boolean xs_continue_hot. There is a small behavioral difference: in the current code, when use a non-MVCC snapshot with index_getnext() and walk a HOT chain, each call remembers the *next* TID that should be examined. With this change, we instead remember the TID that we most recently returned, and compute the next TID when index_getnext() is called again. It seems to me that this is really what we should have been doing all along. Imagine, for example, that we have a HOT chain A -> B, where B has not yet committed. We return A and remember that we next intend to look at B. Before index_getnext() is called, B aborts and a new HOT update produces a HOT chain A -> C. The next call to index_getnext() will nonetheless look at B and conclude that it's reached the end of the HOT chain. This doesn't actually matter a bit for current uses of index_getnext(), because - at least according to Heikki's old notes - the only place we use a non-MVCC snapshot with this function is during CLUSTER. And at that point, we have the whole table locked down, so nothing is happening concurrently. I'm not sure there's any possibility of us ever using this function in a way that would break under the current implementation, but what I'm proposing here does seem more robust to me. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
heap-hot-search-refactoring.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers