vacuum visibility relevance
Hi All,

Simplified Scenario:
40+ gig table with 20+ indexes receiving log-event-type records in an append-only fashion.
One of the columns is a sequence ID.
PG version 11.4

If I start a vacuum on this table when the sequence ID is 1 million, and the table continues to accumulate new events, when vacuum completes, should I be thinking about the state of the table and indexes as being vacuumed up to 1 million? Or is vacuum also operating on records inserted after vacuum started?

Is there any reason to think about this differently when the vacuum is manual as opposed to auto?

I attempted to deal with the append-only issue corrected in PG13 by setting autovacuum_freeze_max_age low enough to trigger vacuum, but that has been such a wild card that I abandoned it. I'm now trying to find a formula to trigger a manual vacuum. There are stats-gathering processes pulling from the most recently added data all the time, so my priority is to keep performance high for those. Secondly, there's a ton of memory allotted, so running vacuum freeze before table pages age out of memory seems like a good idea.

I am having trouble with autovacuum keeping up, and I suspect there's a communication problem with the stats collector, but I have put off looking into that because updating to PG15 was "right around the corner". Meanwhile, I have had to run multi-threaded scripts to keep up with vacuuming. Manual vacuums always work and update stats correctly, but autovacuums hang and occupy workers doing nothing.

I realize these are 2 completely different issues. I would love to have a better understanding of the initial question, but it feels like I may not have to be very aware of how that works if I knew what was causing the latter issue. Any hints, pointers, and references are always appreciated.

- Senor
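P.S. For context, the abandoned attempt was along these lines (a sketch only; the table name "log_events" and the value shown are illustrative, not what I actually used):

    -- lower the freeze trigger for this table so autovacuum visits it sooner;
    -- a per-table value is honored only if it is below the global
    -- autovacuum_freeze_max_age
    ALTER TABLE log_events SET (autovacuum_freeze_max_age = 20000000);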
Re: vacuum visibility relevance
On Sun, Dec 3, 2023 at 9:08 PM senor wrote:

> Hi All,
>
> Simplified Scenario:
> 40+ gig table with 20+ indexes receiving log-event-type records in an
> append-only fashion.
> One of the columns is a sequence ID.
> PG version 11.4
>
> If I start a vacuum on this table when the sequence ID is 1 million, and
> the table continues to accumulate new events, when vacuum completes,
> should I be thinking about the state of the table and indexes as being
> vacuumed up to 1 million? Or is vacuum also operating on records inserted
> after vacuum started?
>
> Is there any reason to think about this differently when the vacuum is
> manual as opposed to auto?
>
> I attempted to deal with the append-only issue corrected in PG13 by
> setting autovacuum_freeze_max_age low enough to trigger vacuum, but that
> has been such a wild card that I abandoned it. I'm now trying to find a
> formula to trigger a manual vacuum. There are stats-gathering processes
> pulling from the most recently added data all the time, so my priority is
> to keep performance high for those. Secondly, there's a ton of memory
> allotted, so running vacuum freeze before table pages age out of memory
> seems like a good idea.
>
> I am having trouble with autovacuum keeping up, and I suspect there's a
> communication problem with the stats collector, but I have put off looking
> into that because updating to PG15 was "right around the corner".
> Meanwhile, I have had to run multi-threaded scripts to keep up with
> vacuuming. Manual vacuums always work and update stats correctly, but
> autovacuums hang and occupy workers doing nothing.
>
> I realize these are 2 completely different issues. I would love to have a
> better understanding of the initial question, but it feels like I may not
> have to be very aware of how that works if I knew what was causing the
> latter issue. Any hints, pointers, and references are always appreciated.

What do you have autovacuum_vacuum_scale_factor set to? The default 0.1 is almost certainly way too high.
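For example, something like this (a sketch; "log_events" and the 0.01 value are placeholders, not recommendations):

    -- check the global default and any per-table override
    SHOW autovacuum_vacuum_scale_factor;
    SELECT relname, reloptions FROM pg_class WHERE relname = 'log_events';

    -- make autovacuum trigger after far fewer row changes on this table
    ALTER TABLE log_events SET (autovacuum_vacuum_scale_factor = 0.01);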
Re: vacuum visibility relevance
On Mon, 2023-12-04 at 02:08 +0000, senor wrote:
> Simplified Scenario:
> 40+ gig table with 20+ indexes receiving log-event-type records in an
> append-only fashion.

Log tables typically receive tons of INSERTs. Over twenty indexes are too many.

> One of the columns is a sequence ID.
> PG version 11.4
>
> If I start a vacuum on this table when the sequence ID is 1 million, and
> the table continues to accumulate new events, when vacuum completes,
> should I be thinking about the state of the table and indexes as being
> vacuumed up to 1 million? Or is vacuum also operating on records inserted
> after vacuum started?

VACUUM processes the whole table, but it determines an "xmin horizon" when it starts, that is, it determines which tuples are safe to remove. Rows added after VACUUM started won't qualify, so they won't be removed.

> Is there any reason to think about this differently when the vacuum is
> manual as opposed to auto?

No, the behavior is the same.

> I attempted to deal with the append-only issue corrected in PG13 by
> setting autovacuum_freeze_max_age low enough to trigger vacuum, but that
> has been such a wild card that I abandoned it. I'm now trying to find a
> formula to trigger a manual vacuum. There are stats-gathering processes
> pulling from the most recently added data all the time, so my priority is
> to keep performance high for those. Secondly, there's a ton of memory
> allotted, so running vacuum freeze before table pages age out of memory
> seems like a good idea.

You can monitor the "n_tup_ins" column in "pg_stat_user_tables" and run VACUUM whenever it has increased enough.

But the simple solution is to upgrade. Any version older than v13 is due for an upgrade soon.

> I am having trouble with autovacuum keeping up, and I suspect there's a
> communication problem with the stats collector, but I have put off looking
> into that because updating to PG15 was "right around the corner".
> Meanwhile, I have had to run multi-threaded scripts to keep up with
> vacuuming. Manual vacuums always work and update stats correctly, but
> autovacuums hang and occupy workers doing nothing.

Autovacuum deliberately runs slower than a manual VACUUM. If you are running a version older than v12, the default value for "autovacuum_vacuum_cost_delay" is 20ms, which makes autovacuum really slow. You should change that parameter to 2ms, and if that is still too slow, use ALTER TABLE to set it to 0 on the affected table. Then autovacuum will run as fast as a manual VACUUM.

In addition, see that "maintenance_work_mem" is set high to make autovacuum fast.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
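P.S. A sketch of the two suggestions above; "log_events" is a stand-in for the real table name:

    -- watch insert counts to decide when a manual VACUUM is due
    SELECT relname, n_tup_ins, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'log_events';

    -- reduce autovacuum throttling cluster-wide (the v11 default is 20ms)
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 2;
    SELECT pg_reload_conf();

    -- or remove the throttling for this one table only
    ALTER TABLE log_events SET (autovacuum_vacuum_cost_delay = 0);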
Re: vacuum visibility relevance
On Mon, 4 Dec 2023 at 15:08, senor wrote:
> PG version 11.4

You're missing out on over 3 years of bug fixes running that version. I see you're planning an upgrade to v15. You should update to 11.22 in the meantime. That's the latest and final version of v11.

> If I start a vacuum on this table when the sequence ID is 1 million, and
> the table continues to accumulate new events, when vacuum completes,
> should I be thinking about the state of the table and indexes as being
> vacuumed up to 1 million? Or is vacuum also operating on records inserted
> after vacuum started?

The transaction ID thresholds are decided at the start of vacuum, so tuples that become dead after vacuum starts won't be touched.

> Is there any reason to think about this differently when the vacuum is
> manual as opposed to auto?

Not in terms of what will be removed, but there are performance considerations. vacuum_cost_delay is 0 by default, but autovacuum_vacuum_cost_delay isn't. I believe it's 20ms for v11. That means the default settings allow manual VACUUMs to run quickly, but autovacuums are throttled to be quite slow.

> I attempted to deal with the append-only issue corrected in PG13 by
> setting autovacuum_freeze_max_age low enough to trigger vacuum, but that
> has been such a wild card that I abandoned it. I'm now trying to find a
> formula to trigger a manual vacuum. There are stats-gathering processes
> pulling from the most recently added data all the time, so my priority is
> to keep performance high for those. Secondly, there's a ton of memory
> allotted, so running vacuum freeze before table pages age out of memory
> seems like a good idea.

Perhaps you could set autovacuum_analyze_scale_factor to something higher for the tables in question, watch pg_stat_all_tables.n_mod_since_analyze, and do a manual VACUUM ANALYZE when it has grown enough. If autovacuum were ever to ANALYZE the tables in question, it would reset n_mod_since_analyze and throw that scheme off, which is why the higher scale factor matters.

> I am having trouble with autovacuum keeping up, and I suspect there's a
> communication problem with the stats collector, but I have put off looking
> into that because updating to PG15 was "right around the corner".
> Meanwhile, I have had to run multi-threaded scripts to keep up with
> vacuuming. Manual vacuums always work and update stats correctly, but
> autovacuums hang and occupy workers doing nothing.

If you find all the autovacuum workers busy most of the time, then you'll likely want to reduce autovacuum_vacuum_cost_delay to increase the speed of the auto-vacuum workers.

David
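P.S. A sketch of that trigger idea; "log_events" and the 500000-row threshold are placeholders, not tuned values:

    -- raise the analyze trigger so autoanalyze doesn't reset the counter first
    ALTER TABLE log_events SET (autovacuum_analyze_scale_factor = 0.5);

    -- poll periodically (e.g. from cron)
    SELECT n_mod_since_analyze
    FROM pg_stat_all_tables
    WHERE relname = 'log_events';

    -- when the count exceeds your threshold (say 500000), run:
    VACUUM ANALYZE log_events;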