Am 06.04.22 um 18:55 schrieb Andres Freund: > Hi, > > On 2022-04-06 12:24:20 -0400, Robert Haas wrote: >> On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth >> <gunnar.bl...@pro-open.de> wrote: >>> Fair enough. At that point, a lot of things become unexpectedly painful. >>> How many % of the installed base may that be though? >> >> I don't have statistics on that, but it's large enough that the >> expense associated with the statistics collector is a reasonably >> well-known pain point, and for some users, a really severe one. > > Yea. I've seen well over 100MB/s of write IO solely due to stats files writes > on production systems, years ago.
Wow. Yeah, I tend to forget there's systems like ads' out there ;-) >> I'm fairly sure it's not going to make things so cheap that we can afford to >> add all the statistics anybody wants, but it's so painful that even modest >> relief would be more than welcome. > > It definitely doesn't make stats free. But I'm hopefull that avoiding the > regular writing out / readin back in, and the ability to only optionally store > some stats (by varying allocation size or just having different kinds of > stats), will reduce the cost sufficiently that we can start keeping more > stats. Knock on wood! > Which is not to say that these stats are the right ones (nor that they're the > wrong ones). ;-) > I think if I were to tackle providing more information about toasting, I'd > start not by adding a new stats view, but by adding a function to pgstattuple > that scans the relation and collects stats for each toasted column. An SRF > returning one row for each toastable column. With information like > > - column name > - #inline datums > - #compressed inline datums > - sum(uncompressed inline datum size) > - sum(compressed inline datum size) > - #external datums > - #compressed external datums > - sum(uncompressed external datum size) > - sum(compressed external datum size) > > IIRC this shouldn't require visiting the toast table itself. But it would still require a seqscan and quite some cycles. However, sure, something like that is an option. > Perhaps also an SRF that returns information about each compression method > separately (i.e. collect above information, but split by compression method)? > Perhaps even with the ability to measure how big the gains of recompressing > into another method would be? Even more of the above, but yeah, sound nifty. >>>> However, experience has taught me that a lot of skepticism is >>>> warranted when it comes to claims about how cheap extensions to the >>>> statistics system will be. >>> >>> Again, fair enough! >>> Maybe we first need statistics about statistics collection and handling? ;-) >> >> Heh. > > I've wondered about adding pg_stat_stats the other day, actually :) > https://postgr.es/m/20220404193435.hf3vybaajlpfmbmt%40alap3.anarazel.de OMG LOL! -- Gunnar "Nick" Bluth Eimermacherweg 106 D-48159 Münster Mobil +49 172 8853339 Email: gunnar.bl...@pro-open.de __________________________________________________________________________ "Ceterum censeo SystemD esse delendam" - Cato