On Mon, Aug 28, 2023 at 3:09 PM Peter Geoghegan <p...@bowt.ie> wrote:
> I've long emphasized the importance of designs that just try to avoid
> disaster. With that in mind, I wonder: have you thought about
> conditioning page freezing on whether or not there are already some
> frozen tuples on the page? You could perhaps give some weight to
> whether or not the page already has at least one or two preexisting
> frozen tuples when deciding on whether to freeze it once again now.
> You'd be more eager about freezing pages that have no frozen tuples
> whatsoever, compared to what you'd do with an otherwise equivalent
> page that has no unfrozen tuples.
I'm sure this could be implemented, but it's unclear to me why you
would expect it to perform well. Freezing a page that has no frozen
tuples yet isn't cheaper than freezing one that does, so for this
idea to be a win, the presence of frozen tuples on the page would
have to be a signal that the page is likely to be modified again in
the near future. In general, I don't see any reason to expect that
to be the case.

One could easily construct a workload where it is the case -- for
instance, set up one table T1 where 90% of the tuples are repeatedly
updated and the other 10% are never touched, and another table T2
that is insert-only. Once frozen, the never-updated tuples in T1
become sentinels telling us that the table isn't insert-only. But I
don't think that's very interesting: you can construct a test case
like this for any proposed criterion, just by structuring the
workload so that the criterion being tested is a perfect predictor
of whether the page will be modified soon.

What really matters here is finding a criterion that is likely to
perform well in general, on test cases not known to us beforehand.
That goal isn't entirely achievable, because just as you can
construct a test case where any given criterion performs well, you
can also construct one where it performs poorly. But I think a rule
that has a clear theory of operation must be preferable to one that
doesn't. The theory that Melanie and Andres are advancing is that a
page that has been modified recently (in insert-LSN time) is more
likely to be modified again soon than one that has not, i.e. the
near future will be like the recent past. I'm not sure what the
theory behind the rule you propose here might be; if you articulated
it somewhere in your email, I seem to have missed it.
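To make the insert-LSN theory concrete, the test might look
something like this -- a minimal sketch I'm making up to illustrate
the idea, not the actual patch; the function name, signature, and
threshold are all invented here:

#include "access/xlogdefs.h"    /* XLogRecPtr, a uint64 WAL position */

/*
 * Invented illustration of an insert-LSN recency criterion: freeze a
 * page opportunistically only if it hasn't been dirtied within some
 * recent window of WAL. The bet is that a page untouched for a while
 * will stay untouched, i.e. the near future will be like the recent
 * past.
 */
static bool
page_looks_cold(XLogRecPtr page_lsn, XLogRecPtr insert_lsn,
                uint64 recency_window)
{
    return (insert_lsn - page_lsn) > recency_window;
}

Whatever the details end up being, the point is that a rule like
this embodies an explicit prediction that can be checked against a
workload and found right or wrong.

--
Robert Haas
EDB: http://www.enterprisedb.com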