Josh Berkus wrote:
Matthew,
True, but I think it would be one hour once, rather than 30 minutes 4
times.
Well, generally it would be about 6-8 times at 2-4 minutes each.
Are you saying that you can vacuum a 1 million row table in 2-4
minutes? While a vacuum of the same table with an additional 1 million
dead tuples would take an hour?
This is one of the things I had hoped to add to pg_autovacuum, but never
got to. In addition to just the information from the stats collector on
inserts updates and deletes, pg_autovacuum should also look at the FSM,
and make decisions based on it. Anyone looking for a project?
Hmmm ... I think that's the wrong approach. Once your database is populated,
it's very easy to determine how to set the FSM for a given pg_avd level. If
you're vacuuming after 20% updates, for example, just set fsm_pages to 20% of
the total database pages plus growth & safety margins.
Ok.
I'd be really reluctant to base pv-avd frequency on the fsm settings instead.
What if the user loads 8GB of data but leaves fsm_pages at the default of
10,000? You can't do much with that; you'd have to vacuum if even 1% of the
data changed.
Ok, but as you said above it's very easy to set the FSM once you know
your db size.
The other problem is that calculating data pages from a count of
updates+deletes would require pg_avd to keep more statistics and do more math
for every table. Do we want to do this?
I would think the math is simple enough to not be a big problem. Also,
I did not recommend looking blindly at the FSM as our guide, rather
consulting it as another source of information as to when it would be
useful to vacuum. I don't have a good plan as to how to incorporate
this data, but to a large extent the FSM already tracks table activity
and gives us the most accurate answer about storage growth (short of
using something like contrib/pgstattuple which takes nearly the same
amount of time as an actual vacuum)
But I can't imagine that 2% makes any difference on a large table. In
fact I would think that 10-15% would hardly be noticable, beyond that
I'm not sure.
I've seen performance lag at 10% of records, especially in tables where both
update and select activity focus on one subset of the table (calendar tables,
for example).
Ok.
Valid points, and again I think this points to the fact that
pg_autovacuum needs to be more configurable. Being able to set
different thresholds for different tables will help considerably. In
fact, you may find that some tables should have a vac threshold much
larger than the analyze thresold, while other tables might want the
opposite.
Sure. Though I think we can make the present configuration work with a little
adjustment of the numbers. I'll have a chance to test on production
databases soon.
I look forward to hearing results from your testing.
I would be surprized if you can notice the difference between a vacuum
analyze and a vacuum, especially on large tables.
It's substantial for tables with high statistics settings. A 1,000,000 row
table with 5 columns set to statistics=250 can take 3 minutes to analyze on a
medium-grade server.
In my testing, I never changed the default statistics settings.
I think you need two separate schedules. There are lots of times where
a vacuum doesn't help, and an analyze is all that is needed
Agreed. And I've just talked to a client who may want to use pg_avd's ANALYZE
scheduling but not use vacuum at all. BTW, I think we should have a setting
for this; for example, if -V is -1, don't vacuum.
That would be nice. Easy to add, and something I never thought of....
I'm open to discussion on changing the defaults. Perhaps what it would
be better to use some non-linear (perhaps logorithmic) scaling factor.
That would be cool, too. Though a count of data pages would be a better
scale than a count of rows, and equally obtainable from pg_class.
But we track tuples because we can compare against the count given by
the stats system. I don't know of a way (other than looking at the FSM,
or contrib/pgstattuple ) to see how many dead pages exist.
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster