On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > Thank you for suggestion. It sounds more smarter. So it would be more > better if we vacuums database for anti-wraparound in ascending order > of relfrozenxid?
Currently, we're doing it based on datfrozenxid. I was looking for a small, relatively safe change to improve things, which led to my proposal: continue using datfrozenxid when the database isn't being vacuumed, but when it is, substitute the datfrozenxid that we expect the database *will have* after the tables currently being vacuumed are finished for the actual datfrozenxid. On further reflection, though, I see problems. Suppose db1 has age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m and 400m. db2 has age(datfrozenxid) = 500m. Then suppose #1 starts vacuuming the table with age 600m. The AV launcher sees that, with that table out of the way, we'll be able to bring datfrozenxid forward to 400m and so sends worker #2 to db2. But actually, when the worker in db1 finishes vacuuming the table with age 600m, it's going to have to vacuum the table with age 400m before performing any relfrozenxid update. So actually, the table with age 400m is holding the relfrozenxid at 600m just as much as if it too had a relfrozenxid of 600m. In fact, any tables in the same database that need routine vacuuming are a problem, too: we're going to vacuum all of those before updating relfrozenxid, too. Maybe we should start by making the scheduling algorithm used by the individual workers smarter: 1. Let's teach do_autovacuum() that, when there are any tables needing vacuum for wraparound, either for relfrozenxid or relminmxid, it should vacuum only those and forget about the rest. This seems like an obvious optimization to prevent us from delaying datfrozenxid/datminmxid updates for the sake of vacuuming tables that are "merely" bloated. 2. Let's have do_autovacuum() sort the tables to be vacuumed in ascending order by relfrozenxid, so older tables are vacuumed first. A zero-order idea is to put tables needing relfrozenxid vacuuming before tables needing relminmxid vacuuming, and sort the latter by ascending relminmxid, but maybe there's something smarter possible there. The idea here is to vacuum tables in order of priority rather than in whatever order they happen to be physically mentioned in pg_class. 3. Let's have do_autovacuum() make an extra call to vac_update_datfrozenxid() whenever the next table to be vacuumed is at least 10 million XIDs (or MXIDs) newer than the first one it vacuumed either since the last call to vac_update_datfrozenxid() or, if none, since it started. That way, we don't have to wait for every single table to get vacuumed before we can consider advancing relfrozenxid/relminmxid. 4. When it's performing vacuuming for wraparound, let's have AV workers advertise in shared memory the oldest relfrozenxid and relminmxid that it might exist in the database. Given #1 and #2, this is pretty easy, since we start by moving through tables in increasing relfrozenxid order and then shift to moving through them in increasing relminmxid order. When we're working through the relfrozenxid tables, the oldest relminmxid doesn't move, and the oldest relfrozenxid is that of the next table in the list. When we're working through the relminmxid tables, it's the reverse. We need a little cleverness to figure out what value to advertise when we're on the last table in each list -- it should be the next-higher value, even though that will be above the relevant threshold, not a sentinel value. 5. With those steps in place, I think we can now adopt my previous idea to have the AV launcher use any advertised relfrozenxid (and, as I now realize, relminmxid) instead of the ones found in pg_database, because now we know that individual workers are definitely focused on getting relfrozenxid (and/or relminmxid) as soon as possible, and vacuuming unrelated tables won't help them do it any faster. This gets us fairly close to vacuuming tables in decreasing order of wraparound danger across the entire cluster. It's not perfect. It prefers to keep vacuuming tables in the same database rather than having a worker exit and maybe launching a new one in a different database -- but the alternative is not very appealing. If we didn't do it that way, and if we had a million tables with XIDs that were closely spaced spread across different databases, we'd have to terminate and relaunching workers at a very high rate to get everything sorted out, which would be inefficient and annoying to program. Also, it keeps the existing hard prioritization of relfrozenxid over relminmxid, which could theoretically be wrong for some installation. But I think that might not be a big problem in practice, and it seems like that could be separately improved at another time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company