On Mon, Jan 29, 2018 at 1:33 AM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
>> Since the above idea would be a bit complex, as an alternative idea it
>> might be better to specify the number of workers to launch per database
>> by a new GUC parameter or something. If the number of workers running
>> on the database exceeds that limit, the launcher doesn't select the
>> database even if the database is about to wrap around.
>
> I'm afraid the GUC would be difficult for the user to understand and tune.
I agree. It's autovacuum's job to do the correct thing, and if it's not
doing that, then we need to figure out how to make it do the right thing.
Adding a GUC amounts to saying that we don't know what the right thing to
do is but hope the user does. That's not a good answer.

Unfortunately, I think a full solution to the problem of allocating AV
workers to avoid wraparound is quite complex. Suppose that database A will
force a shutdown due to impending wraparound in 4 hours and database B will
force a shutdown in 12 hours. At first blush, it seems that we should favor
adding workers to A. But it might be that database A needs only 2 hours of
vacuuming to avoid a shutdown whereas B needs 12 hours. In that case, it
seems that we ought to instead favor adding workers to B. However, it might
also be the case that A has more tables coming due for wraparound 6 hours
from now, and that we need another 15 hours of vacuuming to avoid that
shutdown. That would favor adding workers to A.

Then again, it might be that A and B both already have workers, and that
adding yet another worker to A won't speed anything up (because only large
tables remain to be processed and each already has a worker), whereas
adding a worker to B would (because it still has a big table that we could
immediately start to vacuum for wraparound). In that case, perhaps we ought
to instead add a worker to B. But, thinking some more, adding a worker
should cause autovacuum_vacuum_cost_limit to be reallocated among the
workers, making the existing vacuum processes take longer, which might
actually make a bad situation worse. It seems possible that the right
answer could be to start no new autovacuum worker at all.

Given all of the foregoing, this seems like a very hard problem. I can't
even articulate a clear set of rules for what our priorities should be;
such rules would have to depend on the rate at which we're consuming XIDs,
how close we are in each database to a wraparound shutdown, what tables
exist in each database, how big the not-all-frozen part of each one is, how
big their indexes are, how much they're holding back relfrozenxid, and
which ones already have workers, among other things. I think it's quite
possible that we can come up with something that's better than what we have
now without embarking on a huge project, but it's not going to be anywhere
near perfect, because this is really complicated, and there's a real risk
that we'll just make some cases better and others worse rather than
actually coming out ahead overall.
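Just to make the arithmetic above concrete, here's a toy sketch of one
possible "slack-based" ranking, where slack is hours until a wraparound
shutdown minus hours of vacuuming still needed. The DbState struct, the
numbers, and the heuristic itself are all made up for illustration; this is
not actual autovacuum code, and it deliberately ignores the per-table
parallelism and cost-limit-sharing problems described above:

/*
 * Toy illustration of one possible prioritization heuristic: rank
 * databases by "slack", i.e. hours until a wraparound shutdown minus
 * hours of vacuuming still needed.  Not actual autovacuum code; it just
 * makes the arithmetic from the examples above concrete.
 */
#include <stdio.h>

typedef struct
{
	const char *name;
	double		hours_until_shutdown;	/* time before wraparound shutdown */
	double		hours_of_vacuum_needed; /* estimated vacuum work remaining */
} DbState;

int
main(void)
{
	/* Numbers taken from the examples above. */
	DbState		dbs[] = {
		{"A", 4.0, 2.0},		/* shuts down in 4 hours, needs 2 hours */
		{"B", 12.0, 12.0},		/* shuts down in 12 hours, needs 12 hours */
	};
	int			ndbs = sizeof(dbs) / sizeof(dbs[0]);
	int			best = -1;
	double		best_slack = 0.0;

	for (int i = 0; i < ndbs; i++)
	{
		/* Negative slack means we already can't finish in time. */
		double		slack = dbs[i].hours_until_shutdown -
			dbs[i].hours_of_vacuum_needed;

		printf("db %s: slack = %.1f hours\n", dbs[i].name, slack);
		if (best < 0 || slack < best_slack)
		{
			best = i;
			best_slack = slack;
		}
	}

	/*
	 * With these numbers, B has zero slack and A has two hours, so the
	 * heuristic favors B even though A shuts down first -- matching the
	 * reasoning above.  It still knows nothing about which tables remain,
	 * whether they already have workers, or how the cost limit gets
	 * divided, which is exactly why it's no more than a sketch.
	 */
	printf("next worker goes to db %s\n", dbs[best].name);
	return 0;
}

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company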