Hello, hackers.

Currently, hint bits in index pages (marking dead tuples) are set and taken into account only on the primary server. The standby just ignores them. This is done for a good reason, of course (see RelationGetIndexScan and [1]):

     * We do this because the xmin on the primary node could easily be
     * later than the xmin on the standby node, so that what the primary
     * thinks is killed is supposed to be visible on standby. So for correct
     * MVCC for queries during recovery we must ignore these hints and check
     * all tuples.

At the same time, according to [2] and cases like [3], it seems to be a good idea to support "ignore_killed_tuples" on standby.

I think I know a way to support it correctly with a reasonable amount of changes.

The first thing to consider: checksums and wal_log_hints are widely used these days. So, at any moment the primary could send an FPW (full-page write) containing new "killed tuples" hints and overwrite hints set by the standby. Moreover, it is not possible to distinguish whether hints were set by the primary or by the standby.

And this is where hot_standby_feedback comes into play. The primary takes the xmin of hot_standby_feedback replicas into account (via RecentGlobalXmin) when setting "killed tuples" bits. So, if hot_standby_feedback has been enabled on the standby for a while, the standby could safely trust hint bits from the primary. Also, the standby could set its own hints using the xmin it sends to the primary during feedback (but without marking the page as dirty).

Of course, it is not that easy; there are a few things and corner cases to take care of:

* It looks like RecentGlobalXmin could move backwards when a new replica with a lower xmin connects (or when some replica is switched to hot_standby_feedback=on). We must ensure RecentGlobalXmin only ever moves forward.
* hot_standby_feedback could be enabled on the fly. In such a case we need to distinguish transactions that are safe to use hints from those that are not. The standby could receive a fresh RecentGlobalXmin in response to its feedback message. All standby transactions with xmin >= RecentGlobalXmin are safe to use hints.
* hot_standby_feedback could be disabled on the fly. In such a situation the standby needs to continue sending feedback while canceling all queries with ignore_killed_tuples=true.
  Once all such queries are canceled, feedback is no longer needed and should be disabled.

Could someone validate my thoughts, please? If the idea is mostly correct, I could try to implement and test it.

[1] - https://www.postgresql.org/message-id/flat/7067.1529246768%40sss.pgh.pa.us#d9e2e570ba34fc96c4300a362cbe8c38
[2] - https://www.postgresql.org/message-id/flat/12843.1529331619%40sss.pgh.pa.us#6df9694fdfd5d550fbb38e711d162be8
[3] - https://www.postgresql.org/message-id/flat/20170428133818.24368.33533%40wrigleys.postgresql.org
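
P.S. To make the "enabled on the fly" rule a bit more concrete (standby transactions with xmin >= the RecentGlobalXmin received from the primary may use hints), here is a minimal standalone sketch in C. It is not a patch and is not taken from the PostgreSQL tree: snapshot_may_use_index_hints, xid_precedes and FIRST_NORMAL_XID are hypothetical names used only for illustration, and the idea that the standby stores a "confirmed" horizon from the primary is itself an assumption of this proposal. The comparison is done modulo 2^32, since xids wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only: a 32-bit transaction id, as in PostgreSQL. */
typedef uint32_t TransactionId;

/* Assumption: ids below this value are special (invalid/bootstrap/frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Wraparound-aware "a logically precedes b", valid for normal xids only. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical helper: may a standby snapshot with the given xmin honor
 * "killed tuples" hints?
 *
 * confirmed_global_xmin is the horizon the standby received from the
 * primary in response to its feedback message; a special (non-normal)
 * value means no horizon has been received yet.
 */
bool
snapshot_may_use_index_hints(TransactionId snapshot_xmin,
                             TransactionId confirmed_global_xmin)
{
    if (confirmed_global_xmin < FIRST_NORMAL_XID)
        return false;           /* feedback not confirmed by primary yet */
    if (snapshot_xmin < FIRST_NORMAL_XID)
        return false;           /* no usable snapshot xmin */

    /* Safe only if snapshot_xmin >= confirmed_global_xmin (mod 2^32). */
    return !xid_precedes(snapshot_xmin, confirmed_global_xmin);
}
```

With something like this, a scan on the standby would set ignore_killed_tuples=true only when the check passes for its snapshot, and fall back to the current behavior (checking all tuples) otherwise.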