On Tue, Oct 24, 2023 at 12:08 PM Robert Haas <robertmh...@gmail.com> wrote:
> Note that whether to remove summaries is a separate question from
> whether to generate them in the first place. Right now, I have
> wal_summarize_mb controlling whether they get generated in the first
> place, but as I noted in another recent email, that isn't an entirely
> satisfying solution.
I did some more research on this. My conclusion is that I should remove wal_summarize_mb and just have a GUC summarize_wal = on|off that controls whether the summarizer runs at all. There will be one summary file per checkpoint, no matter how far apart checkpoints are or how large the summary gets. Below I'll explain the reasoning; let me know if you disagree.

What I describe above would be a bad plan if it were realistically possible for a summary file to get so large that it might run the machine out of memory, either when producing it or when trying to make use of it for an incremental backup. That turns out to be a difficult scenario to create. So far, I haven't been able to generate WAL summary files more than a few tens of megabytes in size, even when summarizing 50+ GB of WAL per summary file.

One reason why it's hard to produce large summary files is that, for a single relation fork, the WAL summary size converges to 1 bit per modified block when the number of modified blocks is large. This means that even if you have a terabyte-sized relation, you're looking at no more than perhaps 20MB of summary data, no matter how much of it gets modified. Now, somebody could have a 30TB relation, and if they modify the whole thing they could end up with the better part of a gigabyte of summary data for that relation, but if you've got a 30TB table you probably have enough memory that that's no big deal. (Rough arithmetic for this case, and for the pgbench case below, is sketched at the end of this email.)

But what if you have multiple relations? I initialized pgbench with a scale factor of 30000 and also with 30000 partitions and did a 1-hour run. I got 4 checkpoints during that time, and each one produced an approximately 16MB summary file. The efficiency here drops considerably. For example, one of the files is 16495398 bytes and records information on 7498403 modified blocks, which works out to about 2.2 bytes per modified block. That's more than an order of magnitude worse than what I got in the single-relation case, where the summary file didn't even use two *bits* per modified block. But here again, the file just isn't that big in absolute terms. To get a 1GB+ WAL summary file, you'd need to modify millions of relation forks, maybe tens of millions, and most installations aren't even going to have that many relation forks, let alone be modifying them all frequently.

My conclusion here is that it's pretty hard to have a database where WAL summarization is going to use too much memory. I wouldn't be terribly surprised if there are some extreme cases where it happens, but those databases probably aren't great candidates for incremental backup anyway. They're probably databases with millions of relations and frequent, widely-scattered modifications to those relations. And if you have that kind of high turnover rate, then incremental backup isn't going to be as helpful anyway, so there's probably no reason to enable WAL summarization in the first place. Maybe if you have that plus, in the same database cluster, 100TB of completely static data that is never modified, and you also do all of this on a pretty small machine, then you can find a case where incremental backup would have worked well but for the memory consumed by WAL summarization. But I think that's sufficiently niche that the current patch shouldn't concern itself with such cases. If we find that they're common enough to worry about, we might eventually want to do something to mitigate them, but whether that thing would look anything like wal_summarize_mb seems pretty unclear.
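To make the single-relation numbers above easier to check, here's a minimal back-of-the-envelope sketch (Python, just for the arithmetic). It assumes the default 8kB block size and takes the 1-bit-per-modified-block convergence as a given; real summary files will have some additional overhead, so treat the results as rough lower bounds.

    # Back-of-the-envelope estimate of WAL summary size for a single
    # relation fork, assuming the summary converges to about 1 bit per
    # modified block and the default 8kB block size.
    BLOCK_SIZE = 8192  # bytes per block (default BLCKSZ)

    def summary_size_mb(relation_size_bytes):
        modified_blocks = relation_size_bytes // BLOCK_SIZE
        summary_bytes = modified_blocks / 8  # 1 bit per modified block
        return summary_bytes / (1024 * 1024)

    TB = 1024 ** 4
    print(summary_size_mb(1 * TB))   # ~16 MB for a fully-modified 1TB relation
    print(summary_size_mb(30 * TB))  # ~480 MB for a fully-modified 30TB relation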
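And the same sort of arithmetic for the multi-relation pgbench case, using the byte and block counts quoted above. Nothing here beyond simple division, but it shows where the 2.2-bytes-per-block figure comes from.

    # Observed per-block overhead in the 30000-partition pgbench test,
    # using the byte and block counts quoted above.
    summary_bytes = 16495398    # size of one WAL summary file
    modified_blocks = 7498403   # modified blocks recorded in that file

    bytes_per_block = summary_bytes / modified_blocks
    print(round(bytes_per_block, 2))      # ~2.2 bytes per modified block
    print(round(bytes_per_block * 8, 1))  # ~17.6 bits, vs. not even 2 bits
                                          # in the single-relation case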
So I conclude that it's a mistake to include that GUC as currently designed and propose to replace it with a Boolean as described above. Comments?

-- 
Robert Haas
EDB: http://www.enterprisedb.com