On Sat, Feb 3, 2018 at 1:48 AM, Robert Haas <robertmh...@gmail.com> wrote: > On Fri, Feb 2, 2018 at 1:27 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote: >> Thank you for suggestion. It sounds more smarter. So it would be more >> better if we vacuums database for anti-wraparound in ascending order >> of relfrozenxid? > > Currently, we're doing it based on datfrozenxid. I was looking for a > small, relatively safe change to improve things, which led to my > proposal: continue using datfrozenxid when the database isn't being > vacuumed, but when it is, substitute the datfrozenxid that we expect > the database *will have* after the tables currently being vacuumed are > finished for the actual datfrozenxid. > > On further reflection, though, I see problems. Suppose db1 has > age(datfrozenxid) = 600m and two tables with age(relfrozenxid) of 600m > and 400m. db2 has age(datfrozenxid) = 500m. Then suppose #1 starts > vacuuming the table with age 600m. The AV launcher sees that, with > that table out of the way, we'll be able to bring datfrozenxid forward > to 400m and so sends worker #2 to db2. But actually, when the worker > in db1 finishes vacuuming the table with age 600m, it's going to have > to vacuum the table with age 400m before performing any relfrozenxid > update. So actually, the table with age 400m is holding the > relfrozenxid at 600m just as much as if it too had a relfrozenxid of > 600m. In fact, any tables in the same database that need routine > vacuuming are a problem, too: we're going to vacuum all of those > before updating relfrozenxid, too. > > Maybe we should start by making the scheduling algorithm used by the > individual workers smarter: > > 1. Let's teach do_autovacuum() that, when there are any tables needing > vacuum for wraparound, either for relfrozenxid or relminmxid, it > should vacuum only those and forget about the rest. This seems like > an obvious optimization to prevent us from delaying > datfrozenxid/datminmxid updates for the sake of vacuuming tables that > are "merely" bloated. > > 2. Let's have do_autovacuum() sort the tables to be vacuumed in > ascending order by relfrozenxid, so older tables are vacuumed first. > A zero-order idea is to put tables needing relfrozenxid vacuuming > before tables needing relminmxid vacuuming, and sort the latter by > ascending relminmxid, but maybe there's something smarter possible > there. The idea here is to vacuum tables in order of priority rather > than in whatever order they happen to be physically mentioned in > pg_class. > > 3. Let's have do_autovacuum() make an extra call to > vac_update_datfrozenxid() whenever the next table to be vacuumed is at > least 10 million XIDs (or MXIDs) newer than the first one it vacuumed > either since the last call to vac_update_datfrozenxid() or, if none, > since it started. That way, we don't have to wait for every single > table to get vacuumed before we can consider advancing > relfrozenxid/relminmxid. > > 4. When it's performing vacuuming for wraparound, let's have AV > workers advertise in shared memory the oldest relfrozenxid and > relminmxid that it might exist in the database. Given #1 and #2, this > is pretty easy, since we start by moving through tables in increasing > relfrozenxid order and then shift to moving through them in increasing > relminmxid order. When we're working through the relfrozenxid tables, > the oldest relminmxid doesn't move, and the oldest relfrozenxid is > that of the next table in the list. When we're working through the > relminmxid tables, it's the reverse. We need a little cleverness to > figure out what value to advertise when we're on the last table in > each list -- it should be the next-higher value, even though that will > be above the relevant threshold, not a sentinel value.
So I think we should includes tables as well that are not at risk of wraparound in order to get the next-higher value (that is, the oldest table of the non-risked tables) instead of forgetting them. And then we skip vacuuming them. > > 5. With those steps in place, I think we can now adopt my previous > idea to have the AV launcher use any advertised relfrozenxid (and, as > I now realize, relminmxid) instead of the ones found in pg_database, > because now we know that individual workers are definitely focused on > getting relfrozenxid (and/or relminmxid) as soon as possible, and > vacuuming unrelated tables won't help them do it any faster. > > This gets us fairly close to vacuuming tables in decreasing order of > wraparound danger across the entire cluster. It's not perfect. It > prefers to keep vacuuming tables in the same database rather than > having a worker exit and maybe launching a new one in a different > database -- but the alternative is not very appealing. If we didn't > do it that way, and if we had a million tables with XIDs that were > closely spaced spread across different databases, we'd have to > terminate and relaunching workers at a very high rate to get > everything sorted out, which would be inefficient and annoying to > program. Also, it keeps the existing hard prioritization of > relfrozenxid over relminmxid, which could theoretically be wrong for > some installation. But I think that might not be a big problem in > practice, and it seems like that could be separately improved at > another time. > I think this algorithm works fine and improves the current behavior. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center