Matthew T. O'Connor wrote:
> Alvaro Herrera wrote:
> > The second mode is the "hot table worker" mode, enabled when the worker
> > detects that there's already a worker in the database.  In this mode,
> > the worker is limited to those tables that can be vacuumed in less than
> > autovacuum_naptime, so large tables are not considered.  Because of
> > this, it'll generally not compete with the first mode above -- the
> > tables in plain worker were sorted by size, so the small tables were
> > among the first vacuumed by the plain worker.  The estimated time to
> > vacuum may be calculated according to autovacuum_vacuum_delay settings,
> > assuming that all pages constitute cache misses.
>
> How can you determine what tables can be vacuumed within
> autovacuum_naptime?
My assumption is that

    time to vacuum = pg_class.relpages * vacuum_cost_page_miss * vacuum_cost_delay

This is of course not the reality, because the cost delay is not how long
it takes to fetch a page, but it gives us a value we can work with.  With
the default values of vacuum_cost_delay=10, vacuum_cost_page_miss=10 and
autovacuum_naptime=60s, we'd consider tables of under 600 pages, i.e.
4800 kB (should we include indexes here in the relpages count?  My guess
is no).  A table over 600 pages does not sound like a good candidate for
"hot", so this seems more or less reasonable to me.  On the other hand,
maybe we shouldn't tie this to the vacuum cost delay stuff at all.

> So at:
> t=0*autovacuum_naptime: worker1 gets started on DBX
> t=1*autovacuum_naptime: worker2 gets started on DBX
> worker2 determines all tables that need to be vacuumed,
> worker2 excludes tables that are too big from its to-do list,
> worker2 gets started working,
> worker2 exits when it either:
> a) Finishes its entire to-do list.
> b) Catches up to worker1
>
> I think the questions are 1) What is the exact math you are planning on
> using to determine which tables are too big?  2) Do we want worker2 to
> exit when it catches worker1, or does the fact that we have excluded
> tables that are "too big" mean that we don't have to worry about this?

Right, I think the fact that we excluded big tables means that this won't
be a problem most of the time, but we'll need some sort of protection
anyway.  I think this is easy to achieve: store the table each worker is
currently processing in shared memory, and have all workers check all
other workers.  If a plain worker finds that another worker is already
processing a table, it skips that table and continues with the next one.
A hot table worker instead exits right away (it has caught up).
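To make the arithmetic concrete, here is a minimal sketch of that cutoff as a standalone C function.  The function name and the flat parameter list are my illustration, not actual autovacuum code; it just encodes the assumption above that every page is a cache miss costing vacuum_cost_page_miss units, each unit translating into vacuum_cost_delay milliseconds:

```c
#include <assert.h>

/*
 * Sketch of the proposed "hot table" size cutoff: the largest relpages
 * value whose estimated vacuum time still fits in autovacuum_naptime,
 * assuming every page is a cache miss.  Estimated time per page is
 * vacuum_cost_page_miss * vacuum_cost_delay milliseconds.
 */
long
max_hot_table_pages(int naptime_secs, int cost_delay_ms, int cost_page_miss)
{
    long est_ms_per_page = (long) cost_delay_ms * cost_page_miss;

    return (naptime_secs * 1000L) / est_ms_per_page;
}
```

With the defaults (60s naptime, delay 10, page miss cost 10) this gives 60000 / 100 = 600 pages, i.e. 4800 kB at the standard 8 kB page size, matching the numbers above.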
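The skip-vs-exit protocol could look roughly like this.  Again a sketch under my own assumptions: the slot array, type names and function names are made up for illustration, not PostgreSQL's actual shared-memory structures:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;       /* simplified stand-in for PostgreSQL's Oid */
#define InvalidOid 0
#define MAX_WORKERS 8

/*
 * One entry per autovacuum worker in shared memory: the worker's pid
 * and the table it is currently vacuuming (InvalidOid if idle).
 */
typedef struct
{
    int pid;
    Oid relid;
} WorkerSlot;

/* True if some *other* worker already claims relid. */
bool
table_claimed_by_other(const WorkerSlot *slots, int nslots,
                       int my_pid, Oid relid)
{
    for (int i = 0; i < nslots; i++)
    {
        if (slots[i].pid != my_pid && slots[i].relid == relid)
            return true;
    }
    return false;
}

typedef enum { VACUUM_IT, SKIP_IT, EXIT_WORKER } TableAction;

/*
 * Per-table decision: if nobody else holds the table, vacuum it.  On a
 * collision, a plain worker skips to the next table, while a hot table
 * worker treats the collision as "caught up" and exits.
 */
TableAction
decide_table_action(const WorkerSlot *slots, int nslots, int my_pid,
                    Oid relid, bool is_hot_table_worker)
{
    if (!table_claimed_by_other(slots, nslots, my_pid, relid))
        return VACUUM_IT;
    return is_hot_table_worker ? EXIT_WORKER : SKIP_IT;
}
```

Each worker would update its slot before starting a table and clear it afterwards; the check-then-act sequence would of course need to happen under a lock in a real implementation.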
-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support