Melanie,

As I mentioned to you off-list, I feel like this needs some sort of recency bias. Certainly vacuum, and really almost any conceivable user of this facility, is going to care more about accurate answers for new data than for old data. If there's no recency bias, then I think that eventually answers for more recent LSNs will start to become less accurate, since they've got to share the data structure with more and more time from long ago. I don't think you've done anything about this in this version of the patch, but I might be wrong.
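To make that concrete, here is a very rough sketch (in C) of the kind of structure I have in mind: sample (time, LSN) pairs at some fixed interval, and when the buffer fills, thin only the older half. I come back to the trade-offs of this scheme further down. All of the names, types, and sizes here are invented for illustration and just stand in for things like TimestampTz and XLogRecPtr; this is not code from your patch.

#include <stdint.h>
#include <string.h>

typedef int64_t SampleTime;     /* stand-in for TimestampTz */
typedef uint64_t SampleLSN;     /* stand-in for XLogRecPtr */

#define LSN_TIME_BUFFER_SIZE 128    /* arbitrary, for illustration */

typedef struct LSNTimeBuffer
{
    SampleTime  times[LSN_TIME_BUFFER_SIZE];
    SampleLSN   lsns[LSN_TIME_BUFFER_SIZE];
    int         nentries;       /* caller zero-initializes the struct */
} LSNTimeBuffer;

/*
 * Throw away every other sample from the older half of the buffer.  The
 * newer half is slid down untouched, so accuracy for recent LSNs never
 * degrades; only the old history gets coarser each time this runs.
 */
static void
thin_older_half(LSNTimeBuffer *buf)
{
    int     half = buf->nentries / 2;
    int     keep = 0;

    /* Keep samples 0, 2, 4, ... from the older half. */
    for (int i = 0; i < half; i += 2)
    {
        buf->times[keep] = buf->times[i];
        buf->lsns[keep] = buf->lsns[i];
        keep++;
    }

    /* Slide the newer half down, untouched. */
    memmove(&buf->times[keep], &buf->times[half],
            (buf->nentries - half) * sizeof(SampleTime));
    memmove(&buf->lsns[keep], &buf->lsns[half],
            (buf->nentries - half) * sizeof(SampleLSN));

    buf->nentries = keep + (buf->nentries - half);
}

/* Called every N seconds with the current time and insert LSN. */
static void
add_sample(LSNTimeBuffer *buf, SampleTime now, SampleLSN lsn)
{
    if (buf->nentries == LSN_TIME_BUFFER_SIZE)
        thin_older_half(buf);

    buf->times[buf->nentries] = now;
    buf->lsns[buf->nentries] = lsn;
    buf->nentries++;
}

The point of the sketch is just the invariant: the thinning step never touches the newer half of the buffer.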
One way to make the standby more accurately mimic the primary would be to base entries on the timestamp-LSN data that is already present in the WAL, i.e. {COMMIT|ABORT} [PREPARED] records. If you only added or updated entries on the primary when logging those records, the standby could redo exactly what the primary did. A disadvantage of this approach is that if there are no commits for a while then your mapping gets out of date, but that might be something we could just tolerate. Another possible solution is to log the changes you make on the primary and have the standby replay those changes. Perhaps I'm wrong to advocate for such solutions, but it feels error-prone to have one algorithm for the primary and a different algorithm for the standby. You now basically have two things that can break, and you have to debug which one went wrong instead of just one.

In terms of testing this, I advocate not so much performance testing as accuracy testing. So, for example, if you intentionally change the LSN consumption rate during your test, e.g. high LSN consumption rate for a while, then low for a while, then high again for a while, and then graph the contents of the final data structure, how well does the data structure model what actually happened?

Honestly, my whole concern here is really around the lack of recency bias. If you simply took a sample every N seconds until the buffer was full and then repeatedly thinned the data by throwing away every other sample from the older half of the buffer (roughly the scheme sketched above), then it would be self-evident that accuracy for the older data was going to degrade over time, but also that accuracy for new data wasn't going to degrade no matter how long you ran the algorithm, simply because the newest half of the data never gets thinned. But because you've chosen to throw away the point that leads to the least additional error (on an imaginary request distribution that is just as likely to care about very old things as it is to care about new ones), there's nothing to keep the algorithm from getting into a state where it systematically throws away new data points and keeps old ones.

To be clear, I'm not saying the algorithm I just mooted is the right one or that it has no weaknesses; for example, it needlessly throws away precision that it doesn't have to lose when the rate of LSN consumption is constant for a long time. I don't think that necessarily matters, because the algorithm doesn't need to be as accurate as possible; it just needs to be accurate enough to get the job done.

--
Robert Haas
EDB: http://www.enterprisedb.com