To clear up any confusion, I tried the reductions as hinted in the
error message and after a few iterations still faced the same error.
Also, re. the sample configuration I provided - this is the result of
said iterations after initially using the stock values from the PGDG
RPM.

Jim

On Fri, 9 Aug 2024 at 11:26, Jim Vanns <jva...@ilm.com> wrote:
>
> Hi pggen community!
>
> I am struggling with this error almost daily now and despite various efforts, 
> not succeeding in avoiding or dealing with it;
>
> ERROR:  multixact "members" limit exceeded
> DETAIL:  This command would create a multixact with 2 members, but the 
> remaining space is only enough for 0 members.
> HINT:  Execute a database-wide VACUUM in database with OID 16467 with reduced 
> vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age 
> settings.
>
> Runtime details follow below, but before that; I am now seeing the above 
> error almost daily after approximately 12 hours of normal or expected 
> behaviour and throughput. Then it hits and all writes are blocked etc. and 
> the service is largely unusable/unable to recover. Restarting PG does allow 
> autovacuum processes to kick in with aggressive vacuuming to handle the 
> multixact freezing, but that isn't a suitable solution! Although having read 
> sources that now explain why multixact XIDs exist and when they're used, I am 
> not able to properly figure out how to plan for it or configure postgresql 
> appropriately to handle it given our workload.
>
> My questions are;
>
> 1) How should I be managing this? Although not new to PG, I am new to this 
> particular problem.
> 2) How can I confirm what processes/tables are contributing to this multixact 
> "members" limit?
> 3) What are the units for 
> vacuum_multixact_freeze_min_age/vacuum_multixact_freeze_table_age and how 
> should I be
>     setting them appropriately for my rates etc.? I can't really find 
> anything that explains this clearly.
> 4) How can I check that autovacuum workers are specifically able to freeze 
> multixact XIDs and thus avoid this?
> 5) Can I check if autovacuum is actually succeeding in its work?
>
> Schema (brief):
> 10 tables
> 1 table (the 'hottest) partitioned by 2h ranges; so an extra 12x 
> partitions/relations a day, though only the most recent one might be 
> considered 'active'
>
> System (brief):
> PG: 15.5 w/ TimescaleDB 2.14 extension
> Write heavy workload;
> Mean Txn/s (low):   8k
> Mean Txn/s (high): 10k
> Mean rows/s: 100k
> Concurrency: 32 threads (local socket sessions) for 'upserts' via primary 
> service plus auxiliary processes (background workers/autovacuum workers etc.)
>
> Pattern (brief):
> COPY (binary) to session-local temporary tables (ON COMMIT DELETE)
> INSERT FROM tt TO main_table(s)
> UPDATE FROM tt TO main_table(s)
> VACUUM tt (every 100k txns)
>
> Config (excerpt):
> # - Transactions - (based on a period mean of ~8k txn/s)
> # See/ref;
> # www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
> # 
> www.percona.com/blog/tuning-autovacuum-in-postgresql-and-autovacuum-internals
> # blog.sentry.io/transaction-id-wraparound-in-postgres
> # 
> https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres
> # 
> https://pganalyze.com/blog/5mins-postgres-multiXact-ids-foreign-keys-performance
> # What unit is 'age' here? No. of. transactions?
> vacuum_freeze_min_age = 28800000 # 1h @ 8k/s
> vacuum_multixact_freeze_min_age = 28800000 # 1h @ 8k/s
> autovacuum_freeze_max_age = 157600000 # 2h @ 8k/s
> autovacuum_multixact_freeze_max_age = 57600000 # 2h @ 8k/s
> vacuum_multixact_freeze_table_age = 115200000 # 4h @ 8k/s
>
> # We don't UPDATE or DELETE often; we never DELETE and only perform
> # UPDATE operations every 6h due to internal cache(s). So we set this to
> # zero so the formula doesn't consider it a major factor
> autovacuum_vacuum_threshold = 0 # In tuples/rows UPDATEd or DELETEd
> autovacuum_vacuum_insert_threshold = 5000 # In tuples/rows INSERTed
>
> autovacuum_vacuum_scale_factor = 0.1 # 10%
> autovacuum_analyze_scale_factor = 0.1 # 10%
> autovacuum_vacuum_insert_scale_factor = 0.1 # 10%
>
> autovacuum_naptime = 60
> autovacuum_max_workers = 8
>
> # Give autovacuum more credits to ensure a better chance at scanning
> autovacuum_vacuum_cost_limit = 2000 # 10x the default
>
> Thanks for your help and any guidance/knowledge you can share!
>
> Jim
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Reply via email to