On Tue, Mar 15, 2022 at 5:48 PM Nathan Bossart <nathandboss...@gmail.com> wrote:
> I apologize for thinking out loud a bit here, but I hope this gives you
> some insight into my perspective on this. In general, I am skeptical that
> we can choose one threshold that will work for all PostgreSQL
> installations in the known universe.
I would like to take a slightly contrary position. I think that a system here that involves multiple knobs is going to be too complicated to be of any real-world use, because almost nobody will understand it or tune it properly for their installation. And who is to say that a single setting would even be appropriate for a whole installation, as opposed to something that needs to be tuned for each individual table? A single tunable might be OK, but what I would really like here is a heuristic that, while perhaps not optimal in every environment, is good enough that a very high percentage of users will never need to worry about it. In a case like this, a small gain that happens automatically feels better than a large gain that requires manual tweaking in every install.

Now that doesn't answer the question of what that heuristic should be. I initially thought that if compression didn't end up reducing the number of TOAST chunks then there was no point, but that's not true, because having the last chunk be smaller than the rest can allow us to fit more than 4 into a page rather than exactly 4. However, this effect isn't likely to be important if most chunks are full-size chunks. If we insert 100 full-size chunks and 1 partial chunk at the end, we can't save much even if that chunk ends up being 1 byte instead of a full-size chunk. 25 TOAST table pages out of 26 are going to be full of full-size chunks either way, so we can't save more than ~4% and we might easily save nothing at all. As you say, the potential savings are larger as the values get smaller, because a higher proportion of the TOAST table pages are not quite full.

So I think the only cases where it's even interesting to think about suppressing this optimization are those where the value is quite small -- and maybe the thing to do is just pick a conservatively large threshold and call it good. For example, say that instead of applying this technique when there are at least 2 TOAST chunks, we apply it when there are at least 500 (i.e. normally 1MB). It's hard for me to imagine that we can ever lose, and in fact I think we could come down quite a bit from there and still end up winning pretty much all the time. Setting the threshold to, say, 50 or 100 or 150 TOAST chunks instead of 2 may leave some money on the table, but if it means that we don't have meaningful downsides and we don't have tunables for users to fiddle with, it might be worth it.

--
Robert Haas
EDB: http://www.enterprisedb.com
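
P.S. To put rough numbers on the packing argument above: with TOAST chunks of roughly 2kB, four full-size chunks fill an 8K page, so if compression manages only to shorten the final chunk, the best case is getting back one page out of however many the value occupies. A toy program along these lines shows the shape of the curve; the chunk-size and chunks-per-page constants are illustrative approximations of a stock 8K-page build, not values lifted from the tree:

    #include <stdio.h>

    #define CHUNK_SIZE      1996    /* roughly TOAST_MAX_CHUNK_SIZE on 8K pages */
    #define CHUNKS_PER_PAGE 4       /* full-size chunks that fit on one page */

    int
    main(void)
    {
        long        sizes[] = {10 * 1024, 100 * 1024, 1024 * 1024,
                               10L * 1024 * 1024};
        int         n = sizeof(sizes) / sizeof(sizes[0]);

        for (int i = 0; i < n; i++)
        {
            long        nchunks = (sizes[i] + CHUNK_SIZE - 1) / CHUNK_SIZE;
            long        npages = (nchunks + CHUNKS_PER_PAGE - 1) / CHUNKS_PER_PAGE;

            /* Best case: the page holding the short final chunk goes away. */
            printf("%7ld kB: %5ld chunks, %5ld pages, last-chunk savings <= %.1f%%\n",
                   sizes[i] / 1024, nchunks, npages, 100.0 / npages);
        }
        return 0;
    }

Under those assumptions a 10kB value could in principle save up to half of its TOAST pages, a 100kB value under 8%, and a 1MB value (about 526 chunks, i.e. already past the 500-chunk mark, since 500 chunks of ~2kB is about 1MB) less than 1%, which is why a conservatively large chunk-count threshold gives up very little.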