Re: Doc fix of aggressive vacuum threshold for multixact members storage
Hi Sami,

Thanks for the feedback.

> 1/ Remove this as
> "(50% of the maximum, which is about 20GB),"
>
> [1] tried to avoid explaining this level of detail, and I
> agree with that.

I feel it is critical for users to know what the hard limit of multixact members is. As PG doesn't (yet) expose how many multixact members are in use, the only way for users to know the distance to members wraparound is by monitoring the members directory space usage. So it seems to me that the 20 GB number is very important to have in the docs.

> 2/ c/"about 10GB"/"10GB" the "about" does not seem necessary here.

The threshold is actually ~10.015 GiB (due to the 12 bytes wasted per 8 KB page), or ~10.75 GB, so to avoid confusing users when aggressive autovacuum doesn't trigger at exactly 10 GB, I believe we should either be exact or say that we are not being exact. Being exact is difficult, as it depends on the block size. And as I looked through the doc page in question, I noticed there are already several cases using the "about" wording, e.g. "about 50MB of pg_xact storage" and "about 2GB of pg_commit_ts storage", so here I went for consistency with the rest of the doc.

Thanks,

Alex
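(For anyone who wants to reproduce the "~10.015 GiB / ~10.75 GB" figure mentioned above, here is a back-of-the-envelope sketch -- my own illustration, not PostgreSQL code -- assuming the default 8 kB block size and the 409-groups-of-4-members-per-page layout described in multixact.c:)

#include <stdio.h>

int main(void)
{
	/*
	 * With BLCKSZ = 8192, each members page stores 409 groups of 4 members
	 * (409 * 20 = 8180 bytes), wasting 12 bytes per page.
	 */
	const double blcksz = 8192.0;
	const double members_per_page = 409.0 * 4.0;	/* 1636 members */
	const double threshold_members = 2147483648.0;	/* 2^31 members */

	double bytes = threshold_members / members_per_page * blcksz;

	printf("aggressive vacuum threshold: ~%.3f GiB (~%.2f GB)\n",
		   bytes / (1024.0 * 1024.0 * 1024.0), bytes / 1e9);
	return 0;
}

(This prints roughly 10.015 GiB / 10.75 GB, which is where the numbers above come from.)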
Re: Doc fix of aggressive vacuum threshold for multixact members storage
> A few paragraphs up the docs, there is this mention:
>
> ". There is a separate storage area which holds the list of members in each
> multixact, which also uses a 32-bit counter and which must also be managed."
>
> Maybe we can add more to this paragraph, such as:
>
> "also be managed. This member can grow to 20GB"
>
> And then in the proposed correction:
>
> " Also, if the storage occupied by multixacts members area exceeds 10GB (50% of
> the maximum the members area can grow), aggressive vacuum scans will occur more
> often for all tables "
>
> What do you think?

Looks good to me, attached a v2 patch with small adjustments.

Thanks,

Alex

From 3deda711bb4219089b32204c567e735b3d7a152b Mon Sep 17 00:00:00 2001
From: Alex Friedman
Date: Thu, 30 Jan 2025 17:19:07 +0200
Subject: [PATCH v2] Doc fix of aggressive vacuum threshold for multixact
 members storage.

---
 doc/src/sgml/maintenance.sgml | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 0be90bdc7ef..f4f560bccc1 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -761,7 +761,8 @@ HINT: Execute a database-wide VACUUM in that database.
     careful aging management, storage cleanup, and wraparound handling.
     There is a separate storage area which holds the list of members in
     each multixact, which also uses a 32-bit counter and which must also
-    be managed.
+    be managed. This members storage area can grow up to about 20GB before
+    reaching wraparound.
@@ -792,9 +793,9 @@ HINT: Execute a database-wide VACUUM in that database.
     As a safety device, an aggressive vacuum scan will occur for any table
     whose multixact-age is greater than <xref
-    linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
-    storage occupied by multixacts members exceeds 2GB, aggressive vacuum
-    scans will occur more often for all tables, starting with those that
+    linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the storage occupied
+    by multixacts members exceeds about 10GB (50% of the maximum the members area can grow),
+    aggressive vacuum scans will occur more often for all tables, starting with those that
     have the oldest multixact-age.  Both of these kinds of aggressive
     scans will occur even if autovacuum is nominally disabled.
--
2.41.0
Doc fix of aggressive vacuum threshold for multixact members storage
Hi,

This patch suggests a correction to the doc page dealing with multixact vacuuming, which, starting with PG 14, says that the multixact members storage threshold for aggressive vacuum is 2 GB. However, I believe the threshold is actually about 10 GB.

MultiXactMemberFreezeThreshold() defines the threshold as 2^32 / 2, or 2^31, multixact members. However, as discussed in multixact.c, multixact members are stored in groups of 4, each group taking up 20 bytes, meaning 5 bytes per member. (This is not quite exact, as 12 bytes per 8 KB page are wasted, but I believe it is close enough for the docs.) This makes the threshold in bytes 2^31 multixact members * 5 bytes per member = 10 GiB. This was also confirmed by observing a live system (with an admittedly unfortunate workload pattern).

Also, the maximum storage size for multixact members is 20 GiB (2^32 * 5), and it should be useful to call this out in the doc as well.

For reference, the original commit which introduced the current wording is c552e17, and the discussion was here:

https://www.postgresql.org/message-id/flat/162395467510.686.11947486273299446208%40wrigleys.postgresql.org

The attached patch is against master, but it should probably be backpatched all the way through 14.

Best regards,

Alex Friedman

v1-0001-Doc-fix-of-aggressive-vacuum-threshold-for-multix.patch
Description: Binary data
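(As a quick sanity check of the arithmetic in the message above, here is a tiny illustrative sketch -- not part of the patch -- using the 5-bytes-per-member approximation:)

#include <stdio.h>

int main(void)
{
	const double bytes_per_member = 20.0 / 4.0;	/* groups of 4 members, 20 bytes each */
	const double gib = 1024.0 * 1024.0 * 1024.0;

	/* 2^31 members triggers aggressive vacuum; 2^32 members is the wraparound limit. */
	printf("aggressive vacuum threshold: ~%.0f GiB\n",
		   2147483648.0 * bytes_per_member / gib);
	printf("members wraparound limit:    ~%.0f GiB\n",
		   4294967296.0 * bytes_per_member / gib);
	return 0;
}

(This prints 10 GiB and 20 GiB, matching the figures quoted above.)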
Re: Doc fix of aggressive vacuum threshold for multixact members storage
Hi John,

Thanks for reviewing.

> It seems at a minimum this one-line patch is sufficient for the correction:
>
> -    storage occupied by multixacts members exceeds 2GB, aggressive vacuum
> +    storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
>
> Commit c552e171d16e removed the percentage as part of a judgment call on
> clarity, and I'm not sure that was wrong. We could add the proposed language
> on "can grow up to about 20GB" at the end of this paragraph, which seems more
> natural -- first mention the amount that triggers aggressive vacuum, then the
> maximum size.

Yes, I believe this can work.

> I'm on the fence about putting a hint in the C file, but the computation has
> changed in the past, see commit b4d4ce1d50bbdf, so it's a reasonable idea.

That's a good find about the change. Taken together with Bertrand's comments, I've added two reminders to multixact.c to update the docs, one for the threshold and another for the multixact storage scheme. Please see if it makes sense.

v4 patch attached.

Best regards,

Alex Friedman

From 0965413dbb0b85e4dd78f87a6ca3847dccdc78c7 Mon Sep 17 00:00:00 2001
From: Alex Friedman
Date: Thu, 30 Jan 2025 17:19:07 +0200
Subject: [PATCH v4] Doc fix of aggressive vacuum threshold for multixact
 members storage.

---
 doc/src/sgml/maintenance.sgml          | 5 +++--
 src/backend/access/transam/multixact.c | 7 +++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 0be90bdc7ef..89040942be2 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -793,10 +793,11 @@ HINT: Execute a database-wide VACUUM in that database.
     As a safety device, an aggressive vacuum scan will occur for any table
     whose multixact-age is greater than <xref
     linkend="guc-autovacuum-multixact-freeze-max-age"/>.  Also, if the
-    storage occupied by multixacts members exceeds 2GB, aggressive vacuum
+    storage occupied by multixacts members exceeds about 10GB, aggressive vacuum
     scans will occur more often for all tables, starting with those that
     have the oldest multixact-age.  Both of these kinds of aggressive
-    scans will occur even if autovacuum is nominally disabled.
+    scans will occur even if autovacuum is nominally disabled. The members storage
+    area can grow up to about 20GB before reaching wraparound.
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 27ccdf9500f..66adb9995d0 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -134,7 +134,8 @@ MultiXactIdToOffsetSegment(MultiXactId multi)
  * corresponding 4 Xids. Each such 5-word (20-byte) set we call a "group", and
  * are stored as a whole in pages. Thus, with 8kB BLCKSZ, we keep 409 groups
  * per page. This wastes 12 bytes per page, but that's OK -- simplicity (and
- * performance) trumps space efficiency here.
+ * performance) trumps space efficiency here. If this computation changes, make
+ * sure to update the documentation.
  *
  * Note that the "offset" macros work with byte offset, not array indexes, so
  * arithmetic must be done using "char *" pointers.
@@ -212,7 +213,9 @@ MXOffsetToMemberOffset(MultiXactOffset offset)
 		member_in_group * sizeof(TransactionId);
 }

-/* Multixact members wraparound thresholds. */
+/* Multixact members wraparound thresholds.
+ * When changing the thresholds, make sure to update the documentation.
+ */
 #define MULTIXACT_MEMBER_SAFE_THRESHOLD		(MaxMultiXactOffset / 2)
 #define MULTIXACT_MEMBER_DANGER_THRESHOLD \
 	(MaxMultiXactOffset - MaxMultiXactOffset / 4)
--
2.41.0
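(As an aside for readers following the thread: my rough understanding of how these two thresholds feed back into autovacuum is sketched below. This is a simplified, standalone illustration of the idea behind MultiXactMemberFreezeThreshold() as I read it, not the actual PostgreSQL code -- the names, types, and clamping details are my own.)

#include <stdio.h>

#define MAX_MEMBERS              4294967295.0               /* 2^32 - 1 member offsets */
#define MEMBER_SAFE_THRESHOLD    (MAX_MEMBERS / 2)          /* roughly the "about 10GB" point */
#define MEMBER_DANGER_THRESHOLD  (MAX_MEMBERS - MAX_MEMBERS / 4)

/*
 * Simplified sketch: below the safe threshold the configured freeze age is
 * used as-is; past it, the effective freeze age shrinks towards zero as
 * member usage approaches the danger threshold, so autovacuum freezes
 * multixacts more and more aggressively.
 */
static int
effective_freeze_age(double members_in_use, int live_multixacts, int freeze_max_age)
{
	double		fraction;
	int			victims;

	if (members_in_use <= MEMBER_SAFE_THRESHOLD)
		return freeze_max_age;

	fraction = (members_in_use - MEMBER_SAFE_THRESHOLD) /
		(MEMBER_DANGER_THRESHOLD - MEMBER_SAFE_THRESHOLD);
	victims = (int) (live_multixacts * fraction);

	if (victims > live_multixacts)
		return 0;				/* past the danger threshold: freeze everything */
	return (live_multixacts - victims < freeze_max_age) ?
		live_multixacts - victims : freeze_max_age;
}

int
main(void)
{
	/* Hypothetical example: 3 billion members in use, 1M live multixacts. */
	printf("effective freeze age: %d\n",
		   effective_freeze_age(3e9, 1000000, 400000000));
	return 0;
}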
Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity
Hi,

This small doc change patch is following up on a past discussion about discrepancies between state and wait_event in pg_stat_activity:

https://www.postgresql.org/message-id/flat/ab1c0a7d-e789-5ef5-1180-42708ac6fe2d%40postgrespro.ru

As this kind of question is raised by PG users from time to time, the goal is to clarify that such discrepancies are to be expected. The attached patch reuses Robert Haas's eloquent wording from his response in the above thread. I've tried to keep it short and to the point, but it can be made more verbose if needed.

Best regards,

Alex Friedman

From 3cab620d67d200ff4ccb1870f63cbf75a50d0df6 Mon Sep 17 00:00:00 2001
From: Alex Friedman
Date: Wed, 26 Feb 2025 19:59:59 +0200
Subject: [PATCH v1] Clarify possibility of ephemeral discrepancies between
 state and wait_event in pg_stat_activity.

---
 doc/src/sgml/monitoring.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..57fcd8ab52b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1016,7 +1016,9 @@ postgres  27093  0.0  0.0  30096  2752 ?  Ss  11:34  0:00 postgres: ser
    it may or may not be waiting on some event.  If the state is
    active and wait_event is non-null, it means that a query is
    being executed, but is being blocked somewhere
-   in the system.
+   in the system. To keep the reporting low-overhead, the system uses very lightweight
+   synchronization. As a result, ephemeral discrepancies between wait_event
+   and state are possible by nature.
--
2.41.0
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity
On 26/02/2025 22:00, Sami Imseih wrote:

> If we do need to document anything, which I am not convinced we should, it
> should be more generic.

Thanks for the feedback, I've attached a v2 patch which has wording that's a bit more generic.

It's also worth noting that pg_locks already has a full paragraph explaining inconsistencies, so in my opinion it's worth at least mentioning something similar here for pg_stat_activity.

Best regards,

Alex Friedman

From fbbfc623e16ed97176c0ccf0ebc534d118e9f252 Mon Sep 17 00:00:00 2001
From: Alex Friedman
Date: Wed, 26 Feb 2025 19:59:59 +0200
Subject: [PATCH v2] Clarify possibility of ephemeral discrepancies between
 state and wait_event in pg_stat_activity.

---
 doc/src/sgml/monitoring.sgml | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..de49769d407 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1016,7 +1016,11 @@ postgres  27093  0.0  0.0  30096  2752 ?  Ss  11:34  0:00 postgres: ser
    it may or may not be waiting on some event.  If the state is
    active and wait_event is non-null, it means that a query is
    being executed, but is being blocked somewhere
-   in the system.
+   in the system. To keep the reporting low-overhead, the system uses very lightweight
+   synchronization. As a result, ephemeral discrepancies between the view's columns,
+   for example between wait_event and
+   state, or between state and
+   query_id, are possible by nature.
--
2.41.0
A small correction to doc and comment of FSM for indexes
Hi,

This patch fixes a couple of small inaccuracies in the doc and the comment for FSM about index handling.

1. In the doc for pg_freespacemap, it currently says:

    For indexes, what is tracked is entirely-unused pages, rather than free
    space within pages. Therefore, the values are not meaningful, just
    whether a page is full or empty.

However, as what is tracked is entirely-unused pages, the values mean whether a page is "in-use or empty", rather than "full or empty".

2. In indexfsm.c the header comment says:

 * This is similar to the FSM used for heap, in freespace.c, but instead
 * of tracking the amount of free space on pages, we only track whether
 * pages are completely free or in-use. We use the same FSM implementation
 * as for heaps, using BLCKSZ - 1 to denote used pages, and 0 for unused.

However, in the code we see that used pages are marked with 0:

/*
 * RecordUsedIndexPage - mark a page as used in the FSM
 */
void
RecordUsedIndexPage(Relation rel, BlockNumber usedBlock)
{
	RecordPageWithFreeSpace(rel, usedBlock, 0);
}

And free pages are marked with BLCKSZ - 1:

/*
 * RecordFreeIndexPage - mark a page as free in the FSM
 */
void
RecordFreeIndexPage(Relation rel, BlockNumber freeBlock)
{
	RecordPageWithFreeSpace(rel, freeBlock, BLCKSZ - 1);
}

And so, this patch also fixes the comment's "using BLCKSZ - 1 to denote used pages, and 0 for unused" to be "using 0 to denote used pages, and BLCKSZ - 1 for unused".

While these changes are minor, I've seen how this can cause a bit of confusion, and it would be good to clarify it.

Best regards,

Alex Friedman

From a1b78438343fca053aa0014687eaba34d5e160e0 Mon Sep 17 00:00:00 2001
From: Alex Friedman
Date: Tue, 25 Feb 2025 19:12:53 +0200
Subject: [PATCH v1] A small correction to doc and comment of FSM for indexes.

---
 doc/src/sgml/pgfreespacemap.sgml         | 2 +-
 src/backend/storage/freespace/indexfsm.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/pgfreespacemap.sgml b/doc/src/sgml/pgfreespacemap.sgml
index 829ad60f32f..3774a9f8c6b 100644
--- a/doc/src/sgml/pgfreespacemap.sgml
+++ b/doc/src/sgml/pgfreespacemap.sgml
@@ -67,7 +67,7 @@
   For indexes, what is tracked is entirely-unused pages, rather than free
   space within pages.  Therefore, the values are not meaningful, just
-  whether a page is full or empty.
+  whether a page is in-use or empty.

diff --git a/src/backend/storage/freespace/indexfsm.c b/src/backend/storage/freespace/indexfsm.c
index 1fc263892a7..3cd2437599d 100644
--- a/src/backend/storage/freespace/indexfsm.c
+++ b/src/backend/storage/freespace/indexfsm.c
@@ -16,7 +16,7 @@
  * This is similar to the FSM used for heap, in freespace.c, but instead
  * of tracking the amount of free space on pages, we only track whether
  * pages are completely free or in-use. We use the same FSM implementation
- * as for heaps, using BLCKSZ - 1 to denote used pages, and 0 for unused.
+ * as for heaps, using 0 to denote used pages, and BLCKSZ - 1 for unused.
  *
  *-
  */
--
2.41.0
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity
> discrepancy will look like. What about we do something much more simplified,
> such as the below:
>
> """
> To keep the reporting overhead low, the system does not attempt to
> synchronize activity data for a backend. As a result, ephemeral
> discrepancies may exist between the view’s columns.
> """

Yes, I believe it makes sense to make it more generic. Attached v3 with a slight tweak:

+in the system. To keep the reporting overhead low, the system does not attempt to
+synchronize different aspects of activity data for a backend. As a result, ephemeral
+discrepancies may exist between the view's columns.

Best regards,

Alex Friedman

From 58de88469f6201ae698ee34debcdec028526a72a Mon Sep 17 00:00:00 2001
From: Alex Friedman
Date: Wed, 26 Feb 2025 19:59:59 +0200
Subject: [PATCH v3] Clarify possibility of ephemeral discrepancies between
 state and wait_event in pg_stat_activity.

---
 doc/src/sgml/monitoring.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..0e34b3509b8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1016,7 +1016,9 @@ postgres  27093  0.0  0.0  30096  2752 ?  Ss  11:34  0:00 postgres: ser
    it may or may not be waiting on some event.  If the state is
    active and wait_event is non-null, it means that a query is
    being executed, but is being blocked somewhere
-   in the system.
+   in the system. To keep the reporting overhead low, the system does not attempt to
+   synchronize different aspects of activity data for a backend. As a result, ephemeral
+   discrepancies may exist between the view's columns.
--
2.41.0
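(For anyone skimming the thread and wondering why such discrepancies are inherent, here is a toy, generic sketch -- my own illustration, not PostgreSQL code -- of two status fields that are each published atomically but never updated under a common lock, so a reader can sample them mid-update:)

#include <stdatomic.h>
#include <stdio.h>

/*
 * Toy "backend status": each field is written atomically on its own,
 * but no lock covers the pair, so readers can see a mixed snapshot.
 */
static _Atomic int state;		/* 0 = idle, 1 = active */
static _Atomic int wait_event;	/* 0 = none, 42 = waiting on some lock */

static void
become_active_and_wait(void)
{
	atomic_store(&state, 1);		/* state is published first... */
	atomic_store(&wait_event, 42);	/* ...the wait event a moment later */
}

int
main(void)
{
	/*
	 * A monitoring reader sampling between the two stores could observe
	 * state = 1 with wait_event = 0 (or, with the opposite write order, a
	 * wait_event on an apparently idle backend) -- an ephemeral discrepancy.
	 */
	become_active_and_wait();
	printf("state=%d wait_event=%d\n",
		   atomic_load(&state), atomic_load(&wait_event));
	return 0;
}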
Re: Doc fix of aggressive vacuum threshold for multixact members storage
> I decided to leave this out, since I just remembered that the most
> likely change is actually to move to 64-bit offsets, as was proposed
> here and has some enthusiastic support:
>
> https://www.postgresql.org/message-id/CACG=ezawg7_nt-8ey4akv2w9lculthhknwcawmbgeetnjrj...@mail.gmail.com

Thanks for the review and the draft, looks good to me, and I'm okay with doing this without the code comments.

However, it seems like that thread is just the beginning of wider changes (if they indeed happen), which may impact these calculations as well, so perhaps a doc update reminder may still come in useful?

Best regards,

Alex Friedman
Re: Doc fix of aggressive vacuum threshold for multixact members storage
Good points, thank you. I'm good with going ahead as you've suggested.

Best regards,

Alex Friedman