Vacuum statistics

Alena Rybakina Thu, 30 May 2024 10:34:20 -0700

Hello, everyone!

I think we don't have enough information to analyze vacuum functionality.

Needless to say, vacuum is the most important process for a database system. It prevents problems like table and index bloat and emergency freezing when a wraparound problem arises. Furthermore, it keeps the visibility map up to date. On the other hand, incorrectly adjusted, overly aggressive autovacuum settings can make it consume a lot of computing resources, which makes all queries in the system run longer.

Nowadays vacuum gathers statistical information about tables, but this information is important not only for the optimizer.

Because vacuum is an automated process, there are a lot of settings that determine how aggressively it acts on other objects of the database. Besides, sometimes it is important to set the right parameter for a specific table because of how dynamically that table changes.

A database administrator needs to tune the autovacuum settings to strike a balance between vacuum's useful work in the database system on the one hand and the overhead of its workload on the other. However, the statistics on the number of vacuum passes over tables and the operational data from progress_vacuum are not enough to judge how vacuum behaves, because the latter is available only while vacuum is running and does not provide a strategic overview of the period under consideration.

To sum up, automatic vacuum has a strategic behavior because its frequency and resource consumption depend on the workload of the database. Its load on the database is minimal for an append-only table and maximal for a table with high-frequency updates. Furthermore, the vacuum load depends heavily on the number and size of indexes. Because there is no visibility map for indexes, vacuum scans each index completely, and the worst case is when it has to do so for a bloated index on a small table.

I suggest gathering information about vacuum resource consumption for processing indexes and tables and storing it in the table and index relation statistics (for example, in the PgStat_StatTabEntry structure, as is done for the usual statistics). It will allow us to determine how well vacuum is configured and to evaluate the effect of its overhead on the system at the strategic level. Vacuum already gathers this information, but this valuable information is not stored anywhere.
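
To make the idea a bit more concrete, here is a minimal C sketch of how per-pass vacuum numbers might be folded into cumulative per-relation counters. Everything in it is an assumption made for illustration: the names VacuumPassReport, VacuumCumulativeStats, vacuum_stats_accumulate and vacuum_stats_avg_ms, as well as every field, are hypothetical and are not PostgreSQL's actual structures or the layout of PgStat_StatTabEntry.

```c
#include <stdint.h>

/*
 * Hypothetical report produced by one vacuum pass over one relation
 * (a table or an index). Field names are illustrative only.
 */
typedef struct VacuumPassReport
{
    int64_t pages_scanned;   /* pages visited during this pass */
    int64_t pages_removed;   /* pages freed during this pass */
    int64_t tuples_deleted;  /* dead tuples removed during this pass */
    int64_t blocks_read;     /* buffer misses caused by this pass */
    int64_t blocks_hit;      /* buffer hits during this pass */
    double  elapsed_ms;      /* wall-clock time spent in this pass */
} VacuumPassReport;

/*
 * Hypothetical cumulative counters kept per relation, so that the
 * numbers survive after the vacuum pass itself has finished.
 */
typedef struct VacuumCumulativeStats
{
    int64_t passes;          /* number of passes folded in so far */
    int64_t pages_scanned;
    int64_t pages_removed;
    int64_t tuples_deleted;
    int64_t blocks_read;
    int64_t blocks_hit;
    double  elapsed_ms;
} VacuumCumulativeStats;

/* Add the numbers of one finished pass to the running totals. */
static void
vacuum_stats_accumulate(VacuumCumulativeStats *total,
                        const VacuumPassReport *pass)
{
    total->passes++;
    total->pages_scanned += pass->pages_scanned;
    total->pages_removed += pass->pages_removed;
    total->tuples_deleted += pass->tuples_deleted;
    total->blocks_read += pass->blocks_read;
    total->blocks_hit += pass->blocks_hit;
    total->elapsed_ms += pass->elapsed_ms;
}

/* Average time per pass; returns 0 if no pass has been recorded yet. */
static double
vacuum_stats_avg_ms(const VacuumCumulativeStats *total)
{
    return total->passes > 0 ? total->elapsed_ms / total->passes : 0.0;
}
```

With totals like these kept per table and per index, one could compare, over any chosen period, how much work vacuum spent on a relation against how much it actually cleaned up there.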


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
