vacuum visibility relevance

2023-12-03 Thread senor
Hi All,

Simplified Scenario:
40+ gig table with 20+ indexes receiving log event type records in an append 
only fashion.
One of the columns is a sequence ID.
PG version 11.4

If I start a vacuum on this table when sequence ID is 1 million, and the table 
continues to accumulate new events, when vacuum completes, should I be thinking 
about the state of the table and indexes as being vacuumed up to 1 million? Or 
is vacuum also operating on records inserted after vacuum started?

Is there any reason to think about this differently when the vacuum is manual 
as opposed to auto?

I attempted to deal with the append only issue corrected in PG13 by setting 
autovacuum_freeze_max_age low enough to trigger vacuum but that has been such a 
wild card I abandoned it. I'm now trying to find a formula to trigger a manual 
vacuum. There are stats gathering processes pulling from the most recently 
added data all the time so my priority is to keep performance high for those. 
Secondly, there's a ton of memory allotted so running vacuum freeze before 
table pages age out of memory seems like a good idea.

I am having trouble with autovacuum keeping up and I suspect there's a 
communication problem with the stats collector but have put off looking into 
that because updating to PG15 was "right around the corner". Meanwhile, I have
had to run multi-threaded scripts to keep up with vacuuming. Manual vacuums
always work and update stats correctly, but autovacuums hang and occupy
workers doing nothing.

I realize these are 2 completely different issues. I would love to have better 
understanding of the initial question but it feels like I may not have to be 
very aware of how that works if I knew what was causing the latter issue. Any 
hints, pointers and references are always appreciated.

- Senor


Re: vacuum visibility relevance

2023-12-03 Thread Ron Johnson
On Sun, Dec 3, 2023 at 9:08 PM senor wrote:

> Hi All,
>
> Simplified Scenario:
> 40+ gig table with 20+ indexes receiving log event type records in an
> append only fashion.
> One of the columns is a sequence ID.
> PG version 11.4
>
> If I start a vacuum on this table when sequence ID is 1 million, and the
> table continues to accumulate new events, when vacuum completes, should I
> be thinking about the state of the table and indexes as being vacuumed up
> to 1 million? Or is vacuum also operating on records inserted after vacuum
> started?
>
> Is there any reason to think about this differently when the vacuum is
> manual as opposed to auto?
>
> I attempted to deal with the append only issue corrected in PG13 by
> setting autovacuum_freeze_max_age low enough to trigger vacuum but that has
> been such a wild card I abandoned it. I'm now trying to find a formula to
> trigger a manual vacuum. There are stats gathering processes pulling from
> the most recently added data all the time so my priority is to keep
> performance high for those. Secondly, there's a ton of memory allotted so
> running vacuum freeze before table pages age out of memory seems like a
> good idea.
>
> I am having trouble with autovacuum keeping up and I suspect there's a
> communication problem with the stats collector but have put off looking
> into that because updating to PG15 was "right around the corner".
> Meanwhile, I have had to run multi-threaded scripts to keep up with
> vacuuming. Manual vacuums always work and update stats correctly, but
> autovacuums hang and occupy workers doing nothing.
>
> I realize these are 2 completely different issues. I would love to have
> better understanding of the initial question but it feels like I may not
> have to be very aware of how that works if I knew what was causing the
> latter issue. Any hints, pointers and references are always appreciated.
>

What do you have autovacuum_vacuum_scale_factor set to?  The default 0.1 is
almost certainly way too high.
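
For what it's worth, checking the current value and lowering it for just the
big table would look something like this ("log_events" is a placeholder for
your table name):

    -- Show the current global setting (default is 0.1, i.e. 10% dead rows).
    SHOW autovacuum_vacuum_scale_factor;

    -- Lower the trigger threshold for this one table only.
    ALTER TABLE log_events SET (autovacuum_vacuum_scale_factor = 0.01);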


Re: vacuum visibility relevance

2023-12-03 Thread Laurenz Albe
On Mon, 2023-12-04 at 02:08 +, senor wrote:
> Simplified Scenario:
> 40+ gig table with 20+ indexes receiving log event type records in an append 
> only fashion.

Log tables typically receive tons of INSERTs.  Over twenty indexes are too many.

> One of the columns is a sequence ID.
> PG version 11.4
> 
> 
> If I start a vacuum on this table when sequence ID is 1 million, and the
> table continues to accumulate new events, when vacuum completes, should I
> be thinking about the state of the table and indexes as being vacuumed up
> to 1 million? Or is vacuum also operating on records inserted after vacuum
> started?

VACUUM processes the whole table, but it determines an "xmin horizon" when it
starts, that is, it determines which tuples are safe to remove.  Rows added
after VACUUM started won't qualify, so they won't be removed.
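
One way to watch that horizon is the "backend_xmin" column of
"pg_stat_activity"; any session holding an old xmin keeps VACUUM from
removing newer dead tuples.  A rough monitoring query, assuming you have
permission to read pg_stat_activity:

    -- Sessions ordered by how far back their snapshot horizon reaches;
    -- VACUUM cannot remove tuples still visible to these snapshots.
    SELECT pid, state, backend_xmin, age(backend_xmin) AS xmin_age
      FROM pg_stat_activity
     WHERE backend_xmin IS NOT NULL
     ORDER BY age(backend_xmin) DESC;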

> Is there any reason to think about this differently when the vacuum is manual 
> as opposed to auto?

No, the behavior is the same.

> I attempted to deal with the append only issue corrected in PG13 by setting
> autovacuum_freeze_max_age low enough to trigger vacuum but that has been
> such a wild card I abandoned it. I'm now trying to find a formula to
> trigger a manual vacuum. There are stats gathering processes pulling from
> the most recently added data all the time so my priority is to keep
> performance high for those. Secondly, there's a ton of memory allotted so
> running vacuum freeze before table pages age out of memory seems like a
> good idea.

You can monitor the "n_tup_ins" column in "pg_stat_user_tables" and run VACUUM
whenever it has increased enough.
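
Something along these lines, with an arbitrary threshold of a million inserts
(note that "n_tup_ins" is cumulative since the last statistics reset, so a
scheduled job would remember the previous value and compare):

    -- Append-heavy tables with many inserts recorded; candidates for a
    -- manual VACUUM.
    SELECT schemaname, relname, n_tup_ins, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE n_tup_ins > 1000000
     ORDER BY n_tup_ins DESC;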

But the simple solution is to upgrade.  Any version older than v13 is due for 
an upgrade soon.

> I am having trouble with autovacuum keeping up and I suspect there's a
> communication problem with the stats collector but have put off looking
> into that because updating to PG15 was "right around the corner".
> Meanwhile, I have had to run multi-threaded scripts to keep up with
> vacuuming. Manual vacuums always work and update stats correctly, but
> autovacuums hang and occupy workers doing nothing.

Autovacuum deliberately runs slower than a manual VACUUM.
If you are running a version older than v12, the default value for
"autovacuum_vacuum_cost_delay" is 20ms, which makes autovacuum really slow.  
You should
change that parameter to 2ms, and if that is still too slow, use ALTER TABLE to 
set it
to 0 on the affected table.  Then autovacuum will run as fast as manual VACUUM.

In addition, see that "maintenance_work_mem" is set high to make autovacuum 
fast.
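
Concretely, something like this (the values are illustrative, not tuned
recommendations, and "log_events" stands in for your table):

    -- Speed up autovacuum globally (the v11 default is 20ms).
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';

    -- More memory for dead-tuple tracking; autovacuum uses
    -- "autovacuum_work_mem" and falls back to "maintenance_work_mem".
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    SELECT pg_reload_conf();

    -- Remove the throttling entirely for the one hot table.
    ALTER TABLE log_events SET (autovacuum_vacuum_cost_delay = 0);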

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: vacuum visibility relevance

2023-12-03 Thread David Rowley
On Mon, 4 Dec 2023 at 15:08, senor wrote:
> PG version 11.4

You're missing out on over 3 years of bug fixes running that version.
I see you're planning an upgrade to v15. You should update to 11.22 in
the meantime. That's the latest and final version of v11.

> If I start a vacuum on this table when sequence ID is 1 million, and the 
> table continues to accumulate new events, when vacuum completes, should I be 
> thinking about the state of the table and indexes as being vacuumed up to 1 
> million? Or is vacuum also operating on records inserted after vacuum started?

The transaction ID thresholds are decided at the start of vacuum, so
tuples that become dead after vacuum starts won't be touched.

> Is there any reason to think about this differently when the vacuum is manual 
> as opposed to auto?

Not in terms of what will be removed, but there are performance
considerations. vacuum_cost_delay is 0 by default but
autovacuum_vacuum_cost_delay isn't. I believe it's 20ms for v11.  That
means the default settings allow manual VACUUMs to run quickly, but
autovacuums are throttled to be quite slow.

> I attempted to deal with the append only issue corrected in PG13 by setting 
> autovacuum_freeze_max_age low enough to trigger vacuum but that has been such 
> a wild card I abandoned it. I'm now trying to find a formula to trigger a 
> manual vacuum. There are stats gathering processes pulling from the most 
> recently added data all the time so my priority is to keep performance high 
> for those. Secondly, there's a ton of memory allotted so running vacuum 
> freeze before table pages age out of memory seems like a good idea.

Perhaps you could set autovacuum_analyze_scale_factor to something
higher for the tables in question, then use
pg_stat_all_tables.n_mod_since_analyze to decide when to do a manual
VACUUM ANALYZE. If autovacuum were ever to ANALYZE the tables in
question, it would reset that counter and throw your formula off.
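
A sketch of that arrangement ("log_events" is a placeholder table name and
the threshold is up to you):

    -- Effectively stop autovacuum from ANALYZEing this table, so it does
    -- not reset the counter the manual schedule depends on.
    ALTER TABLE log_events SET (autovacuum_analyze_scale_factor = 100.0);

    -- Poll the modification counter ...
    SELECT relname, n_mod_since_analyze
      FROM pg_stat_all_tables
     WHERE relname = 'log_events';

    -- ... and once it crosses your threshold, run:
    VACUUM (ANALYZE, VERBOSE) log_events;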

> I am having trouble with autovacuum keeping up and I suspect there's a 
> communication problem with the stats collector but have put off looking into 
> that because updating to PG15 was "right around the corner". Meanwhile, I 
> have had to run multi-threaded scripts to keep up with vacuuming. Manual
> vacuums always work and update stats correctly, but autovacuums hang and
> occupy workers doing nothing.

If you find all the autovacuum workers busy most of the time, then
you'll likely want to reduce autovacuum_vacuum_cost_delay to increase
the speed of the auto-vacuum workers.

David