Ok, scratch that :-) Another round of braindumping below.
Launcher starts one worker in each database. This worker does no vacuum work; it just reports how much vacuum effort is needed in the database. "Vacuum effort" is measured as the total number of pages in need of vacuuming: the sum of relpages of all tables and indexes needing vacuum. (Note: we weight heap pages the same as index pages. Is this OK?)

Create a plan for vacuuming all those databases within the constraints of max_workers. Databases needing the most work are vacuumed first, one worker per database; thus max_workers databases are being vacuumed in parallel at any given time. When one database is finished, the launcher starts a worker in the next database in the list. When the plan is complete (i.e. the list is empty) we can do the whole thing again, excluding the databases that are still being vacuumed. Perhaps we should wait autovacuum_naptime seconds between finishing one vacuum round in all databases and starting the next. How do we measure this: do we start sleeping when the last worker finishes, or when the list is empty?

Perhaps we should reserve a worker for vacuuming hot tables. The launcher would then use max_workers-1 workers for the above plan, while the spare worker continuously connects to one database, vacuums its hot tables, goes away, and is restarted by the launcher to connect to the next database. Definitional problem: how do we decide what's a hot table? One idea (the simplest) is to let the DBA define it. Thus, at most two workers are on any database: one working on normal tables, the other on hot tables. (This idea could be complemented by another GUC var, autovacuum_hot_workers, which would allow the DBA to have more than one worker on hot tables, just for the case where there are too many of them. This may be overkill.)

Ron Mayer expressed the thought that we're needlessly complicating the UI for vacuum_delay, naptime, etc. He proposes that instead of having cost_delay etc, we have an mbytes_per_second parameter of some sort.
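To make the above concrete, here's a rough sketch of the planning step in Python pseudocode. All names (plan_round, efforts, reserve_hot_worker) are made up for illustration; this is not actual PostgreSQL code.

```python
def plan_round(efforts, max_workers, in_progress=(), reserve_hot_worker=False):
    """Given per-database vacuum effort (total pages needing vacuum, heap
    and index pages weighted equally), return the databases to start
    vacuuming now and the queue to serve as workers become free."""
    # Optionally hold one worker back for the hot-table duty cycle.
    slots = max_workers - (1 if reserve_hot_worker else 0)
    # Skip databases still being vacuumed from the previous round.
    candidates = [(db, pages) for db, pages in efforts.items()
                  if db not in in_progress and pages > 0]
    # Databases needing the most work are vacuumed first.
    candidates.sort(key=lambda item: item[1], reverse=True)
    order = [db for db, _ in candidates]
    return order[:slots], order[slots:]
```

For example, with max_workers=2 and efforts {'a': 100, 'b': 500, 'c': 50}, workers start on 'b' and 'a', and 'c' waits until one of them finishes; with the hot-table worker reserved, only 'b' starts immediately.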
This strikes me as a good idea, but I think we could do that after this proposal is implemented. So this "take 2" could be implemented first, and then we could switch the cost_delay stuff to using a MB/s kind of measurement somehow (he says, waving his hands wildly).

Greg Stark and Matthew O'Connor say that we're misdirected in having more than one worker per tablespace. I say we're not :-) If we consider Ron Mayer's idea of measuring MB/s, but do it per tablespace, then we would inflict the correct amount of vacuum pain on each tablespace, sleeping as appropriate. I think this would require workers of different databases to communicate which tablespaces they are using, so that all of them together utilize the correct amount of bandwidth.

I'd like to know if this responds to the mentioned people's objections.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
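P.S. The per-tablespace bandwidth-sharing idea could be sketched roughly as below. The names (TablespaceThrottle, delay_for) are hypothetical, and in reality the active-worker count would have to live in shared memory so workers from different databases can see it; this is only an illustration of the arithmetic, not an implementation.

```python
class TablespaceThrottle:
    """Share a per-tablespace MB/s budget among concurrent workers."""

    def __init__(self, mbytes_per_second):
        self.budget = mbytes_per_second  # whole-tablespace I/O limit
        self.active_workers = 0          # shared-memory counter in reality

    def register(self):
        self.active_workers += 1

    def unregister(self):
        self.active_workers -= 1

    def delay_for(self, mbytes_done):
        """Seconds a worker should sleep after doing mbytes_done of I/O,
        so the combined rate of all workers stays near the budget."""
        share = self.budget / max(self.active_workers, 1)
        return mbytes_done / share
```

A worker would call register() on connect, then time.sleep(throttle.delay_for(chunk)) after each chunk of pages; with two workers sharing a 10 MB/s tablespace, each effectively gets 5 MB/s, and a worker that finishes frees its share for the others.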