On Sun, Oct 4, 2015 at 3:10 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> That seems perfectly reasonable, yes. Should I leave that to you?
>
> After a closer look I decided that wasn't reasonable at all. Discounting
> sticky texts would then mean that after a GC cycle, we might still think
> the query texts file is bloated and issue another GC request, which of
> course would not shrink the file, so that we'd be doing GC cycles every
> time we added a new query.
That seems unlikely to me. Firstly, there is a generic check of 512 bytes
per entry within need_gc_qtexts() -- we never GC if that generic limit is
not exceeded, and that's far from tiny. Secondly, how long do you think
those sticky entries will stay around in the hashtable to continue to
cause trouble, and dominate over regular entries? There'd have to be
constant evictions/cache pressure for this situation to occur, because
the threshold within need_gc_qtexts() is based on pg_stat_statements.max,
and yet under eviction pressure sticky entries are themselves removed
very aggressively. Seems like a very unusual workload to me.

I think that you're overestimating the cost of discounting sticky
entries, which are usually very much in the minority (at least when it
matters, with cache pressure), and underestimating the cost of continuing
to let sticky entries weigh on the mean query text length. As I said, my
proposal to not have sticky entries contribute to overall mean query text
length is based on a problem report involving a continually failing data
integration process. That in particular is what I hope to stop having a
negative impact on mean query length -- a unique queryid makes for an
entry that is bound to remain useless forever (as opposed to just failing
the first few times). With the larger limit of MaxAllocHugeSize, I worry
about swapping with the exclusive lock held.

The fact that there is a big disparity between mean query length for
sticky and non-sticky entries is weird. It was seen to happen in the wild
only because the sticky entries that clogged things up were not really
distinct to a human -- only to the fingerprinting/jumbling code, more or
less by accident, which in a practical sense caused a distortion. That's
what I'm targeting. I have a hard time imagining any harm from
discounting sticky entries in a realistic case.

> The mean query len recorded by gc_qtexts()
> *has to* match the mean length of what it actually put in the file, not
> the mean length of what we might wish it would put in the file.

Why? Why not simply care about whether or not the file is unreasonably
large relative to the available, useful query statistics? I think that
mean_query_len isn't something that swings all that much with realistic
workloads and 5,000 representative entries -- it certainly should not
swing wildly. The problem, to an extent, was that that accidentally
stopped being true.

> By the same token, I'm back to believing that it's fundamentally bogus for
> entry_dealloc() to compute mean_query_len that way. The most likely
> result of that is useless GC cycles. The only thing that will actually
> free memory when you've got a lot of dead sticky entries is getting rid of
> the sticky hashtable entries, and the only way to do that is to wait for
> entry_dealloc() to get rid of 'em. You were unenthused about making that
> removal more aggressive, which is fine, but you can't have it both ways.

Meanwhile, mean_query_len grows as more "distinct" entries are created,
pushing garbage collection out further and further, especially when there
is a big flood of odd queries that stay sticky.

Once again, I'm having a really hard time imagining a minority of
current, non-sticky hashtable entries dominating a majority of current,
sticky hashtable entries come garbage collection time. Garbage collection
is linked to the creation of entries, which is also linked to eviction,
which aggressively evicts sticky entries in the thrashing scenario that
you describe.
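To make the proposal concrete, here is a rough standalone sketch of what
I have in mind. This is a simplified model, not the actual
pg_stat_statements.c code or data structures -- the struct and function
names are invented for illustration, a sticky entry is modeled simply as
one with zero calls, and the bloat factor in the second threshold is
approximate:

    /*
     * Standalone sketch -- a simplified model, not the real
     * pg_stat_statements structures.  A "sticky" entry is modeled here
     * as one whose call count is still zero.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef struct SketchEntry
    {
        size_t      query_len;      /* length of stored query text */
        long        calls;          /* zero means the entry is sticky */
    } SketchEntry;

    /* Mean text length over non-sticky entries only (the proposal) */
    static size_t
    mean_query_len_discounting_sticky(const SketchEntry *entries, int n)
    {
        size_t      totlen = 0;
        int         nvalid = 0;

        for (int i = 0; i < n; i++)
        {
            if (entries[i].calls == 0)
                continue;           /* skip sticky entries */
            totlen += entries[i].query_len;
            nvalid++;
        }
        return (nvalid > 0) ? totlen / nvalid : 0;
    }

    /*
     * Thresholds in the style of need_gc_qtexts(): a generic floor of
     * 512 bytes per possible entry, then a bloat test against the mean.
     */
    static bool
    need_gc(size_t extent, size_t mean_query_len, int pgss_max)
    {
        /* never GC while the file is under 512 bytes per possible entry */
        if (extent < (size_t) 512 * pgss_max)
            return false;
        /* only GC once the file is well past what the mean would justify */
        if (extent < mean_query_len * pgss_max * 2)
            return false;
        return true;
    }

    int
    main(void)
    {
        SketchEntry entries[] = {
            {120, 10}, {80, 3}, {5000, 0}   /* last one is sticky */
        };
        size_t      mean = mean_query_len_discounting_sticky(entries, 3);

        printf("mean=%zu need_gc=%d\n", mean,
               need_gc(4 * 1024 * 1024, mean, 5000));
        return 0;
    }

The point is only that the mean feeding the need_gc_qtexts()-style
thresholds should reflect texts that are actually useful, so that a flood
of "distinct" sticky-only texts cannot keep pushing garbage collection
out indefinitely.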
> It does strike me that when we do get rid of the sticky entries, cleanup
> of the texts file might lag a bit longer than it needs to because
> mean_query_len is computed before not after deleting whatever entries
> we're going to delete. On average, that shouldn't matter ... but if we
> are tossing a bunch of dead sticky entries, maybe they would have a higher
> mean length than the rest?

Yes, that is something that was observed to happen in the problem case I
looked into. You know my solution already.

-- 
Peter Geoghegan