Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On 8/21/25 08:13, hubert depesz lubaczewski wrote: On Thu, Aug 21, 2025 at 08:04:25AM -0700, Adrian Klaver wrote: By bouncer I assume you mean something like pgBouncer, a connection pooler. Is it possible to determine what bouncer the queries in question are coming from? From the POV of db, all queries are coming from one of N localhost bouncers. N is usually 2…6. From the POV of the local bouncers, the queries come from range of remote bouncers. Generally we haven't seen any correlation between queries coming from specific ranges of ips. Logged queries, the ones that we see with runtime of 1s, have comments that indicate source, and they some from "all-around". Specifically, "DISCARD ALL" queries are generated by bouncers themselves (both layers). Well so much for that theory:) Just so that it will be clear, I don't expect anyone to be able to diagnose the problem based on description. I'm looking more into idea what to look for. The issue is that with the situation being pretty short, and happening on servers with non-trivial query load, I can't do stuff, like, for example, strace, stuff. Getting to the bottom of the bag of ideas: Have you looked at the OS system log for the time period involved? You mentioned this seemed to involve PREPARE and DISCARD ALL. Is this the same set of statements or is it all over the place? Also it would be helpful to know what bouncer you are actually using and what mode you are running in? Best regards, depesz -- Adrian Klaver adrian.kla...@aklaver.com
Re: Q: GRANT ... WITH ADMIN on PG 17
Am Thu, Aug 21, 2025 at 08:46:00AM -0700 schrieb Adrian Klaver: > >PG 17 documentation says that using "WITH ADMIN" allows the > >role being added to another group role to grant/revoke > >membership in said group to other roles. > > I would start by reading this: > > https://rhaas.blogspot.com/2023/01/surviving-without-superuser-coming-to.html Thanks, I did, but did not find the answer to: Is there a way for a role that can manage membership in a group role to not itself be a member of that group role ? Best regards, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
Re: Q: GRANT ... WITH ADMIN on PG 17
On Thu, Aug 21, 2025 at 6:00 PM Karsten Hilbert wrote: > Am Thu, Aug 21, 2025 at 08:46:00AM -0700 schrieb Adrian Klaver: > > https://rhaas.blogspot.com/2023/01/surviving-without-superuser-coming-to.html > > Thanks, I did, but did not find the answer to: Is there a > way for a role that can manage membership in a group role to > not itself be a member of that group role ? Yes and no. Depends what you mean by MEMBER... You can be ADMIN of a ROLE, yet not have SET or INHERIT on that ROLE. As ADMIN you can grant yourself those SET and INHERIT, sure. But still, with only ADMIN, you can't SET ROLE. So are you a MEMBER? In the pg_has_role(), you are. In reality, not so much. In fact, I argued (after the fact it broke my architecture, introducing cycles) that have just ADMIN shouldn't have considered it a MEMBER of the role. But that ship had sailed already. 1 year later, we're finally catching up. I truly hope v18 won't reserve us similar backward-compatibility breaks. Read the docs for pg_auth_members. pg_has_role(). create role. If you have CREATEROLE, and do a CREATE ROLE foo, you'll have ADMIN on foo, but not SET or INHERIT (but you can grant them to yourself). Also look at pg_auth_members.grantor::regrole::text and you'll see that the postgres SUPERUSER itself gave you that ADMIN grant. But if you grant yourself the role, it's a separate pg_auth_members row, and you're now the grantor. So I didn't spend time studying your specific use case. That's your job :). But given my painful experience of the past year, I'd answer yes to your question, on logical grounds. If you see what I mean. --DD
Re: Q: GRANT ... WITH ADMIN on PG 17
Am Thu, Aug 21, 2025 at 06:29:36PM +0200 schrieb Dominique Devienne: > > Thanks, I did, but did not find the answer to: Is there a > > way for a role that can manage membership in a group role to > > not itself be a member of that group role ? > > Yes and no. Depends what you mean by MEMBER... ... > So I didn't spend time studying your specific use case. That's your job :). > But given my painful experience of the past year, I'd answer yes to your > question, on logical grounds. If you see what I mean. --DD I followed your posts back then when you worked out your use case so I did have _some_ idea where to look ;-) I just wanted to confirm my understanding in relation to my current usage. David kindly provided the needed affirmation. Regards, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
Re: Domains vs data types
On Thu, Aug 21, 2025 at 2:11 AM Dominique Devienne wrote: > On Wed, Aug 20, 2025 at 7:37 PM Florents Tselai > wrote: > > > On 20 Aug 2025, at 7:47 AM, Ertan Küçükoglu > > > wrote: > > > I would like to learn if there is any benefit of using domains over data > > > types for table column definitions in terms of performance gain/loss. > > > I know that this doesn’t answer your question, but before exploring custom > > types / domains, > > and based on experience, I’d strongly recommend exploring jsonb instead as > > an alternative. > > I stayed out of that thread, but this makes me step out and react. > domains are typically out constraining the value space of a type. > I.e. a type with a CHECK constraint. W/o one, it's pretty useless IMHO. +1 this. The main use for domains is to allow for standard constraints. If you find yourself writing the same constraint over and over, that's when you might consider using them. For example, in the USA, vehicle identification numbers must be exactly 17 characters long. postgres@postgres=# create domain vin as text check (length(VALUE) = 17); CREATE DOMAIN postgres@postgres=# select 'abc'::TEXT::VIN; ERROR: value for domain vin violates check constraint "vin_check" The other use for domains I see is global sequences where you have many tables pulling from the same sequence. postgres@postgres=# create sequence global_id_seq; CREATE SEQUENCE postgres@postgres=# create domain global_id as bigint default nextval('global_id_seq'); CREATE DOMAIN Using domains is not really a performance thing, it's more regarding establishing rigor around type rules. In general, I tend not to use length constraints for tex columns anymore for most cases (just use text!), but I could get behind setting some standard length constraints, say, capped at 1k characters as a safety precaution. merlin merlin
Re: Q: GRANT ... WITH ADMIN on PG 17
On Thu, Aug 21, 2025 at 6:37 PM Karsten Hilbert wrote: > Am Thu, Aug 21, 2025 at 06:29:36PM +0200 schrieb Dominique Devienne: > > > Thanks, I did, but did not find the answer to: Is there a > > > way for a role that can manage membership in a group role to > > > not itself be a member of that group role ? > > > > Yes and no. Depends what you mean by MEMBER... > ... > > So I didn't spend time studying your specific use case. That's your job :). > > But given my painful experience of the past year, I'd answer yes to your > > question, on logical grounds. If you see what I mean. --DD > > I followed your posts back then when you worked out your use > case so I did have _some_ idea where to look ;-) Glad it helped someone, maybe. > I just wanted to confirm my understanding in relation to my > current usage. David kindly provided the needed affirmation. Well, I disagree with David that you're a MEMBER with just ADMIN. pg_has_role(..., 'MEMBER') says yes. But pg_has_role(..., 'SET') says no. If you can't endorse the role's privileges, you're not a member in my book. That was kinda my point. --DD
Re: Q: GRANT ... WITH ADMIN on PG 17
On Thursday, August 21, 2025, Karsten Hilbert wrote: > Am Thu, Aug 21, 2025 at 08:46:00AM -0700 schrieb Adrian Klaver: > > > >PG 17 documentation says that using "WITH ADMIN" allows the > > >role being added to another group role to grant/revoke > > >membership in said group to other roles. > > > > I would start by reading this: > > > > https://rhaas.blogspot.com/2023/01/surviving-without- > superuser-coming-to.html > > Thanks, I did, but did not find the answer to: Is there a > way for a role that can manage membership in a group role to > not itself be a member of that group role ? > A superuser can do this. Otherwise, no. In order to be made admin of a role on must be a member of said role - i.e., “with admin” is only part of the “grant” command. You won’t find docs talking about alternatives because they don’t exist. David J.
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Thu, Aug 21, 2025 at 08:59:03AM -0700, Adrian Klaver wrote: > Getting to the bottom of the bag of ideas: > Have you looked at the OS system log for the time period involved? Yes. Mostly dmesg. Nothing interesting logged around the time. > You mentioned this seemed to involve PREPARE and DISCARD ALL. > Is this the same set of statements or is it all over the place? No. From what I can tell it's random sample. > Also it would be helpful to know what bouncer you are actually using and > what mode you are running in? pgBouncer, version 1.23.1. As for more... mostly transaction pooling. Applications go using transaction pooling, but people (dbas, ops) have session pooling. Best regards, depesz
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Thu, Aug 21, 2025 at 12:41:44PM +0100, Thom Brown wrote: > Ah, yeah I meant transparent hugepage: > cat /sys/kernel/mm/transparent_hugepage/enabled > This should show it being set as "never". Ah. Sorry, couldn't decipher. Yes, it's "never". > > # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' > > postgresql-2025-08-19_22.csv | uniq -c | grep -C3 -P '^\s*\d\d' > > 2 2025-08-19 22:09:29.084 UTC > > 1 2025-08-19 22:09:29.094 UTC > > 2 2025-08-19 22:09:29.097 UTC > > 70 2025-08-19 22:09:29.109 UTC > > 90 2025-08-19 22:09:29.110 UTC > > 6 2025-08-19 22:09:29.111 UTC > > 1 2025-08-19 22:09:29.153 UTC > > 1 2025-08-19 22:09:29.555 UTC … > > 22:10:54 all2.410.000.280.220.000.100.00 > > 0.000.00 96.99 > > 22:10:59 all2.830.000.290.190.000.120.00 > > 0.000.00 96.57 > > This output looks fine, so it doesn't show anything concerning, so > suggests the issue is somehow on the Postgres side. > > Did you happen to poll pg_stat_activity at the time to see whether you > had lots of IPC waits? I'm wondering whether the storage layer is > freezing up for a moment. So, we get select * from pg_stat_activity, for client backends that are not idle, every 29 seconds. So, 1 second "freeze" is impossible to cathc. Plus - I suspect that if I ran select * from pg_stat_activity while "in freeze", it would also get frozen. Anyway, I have data from 22:09:22 and 22:09:51. In both cases only 4 non-idle backend. 6 of them had NULL in wait_event* one was Client/ClientRead and one was IPC/BgWorkerShutdown. State_change for the IPC/BgWorkerShutdown backend was 2025-08-19 22:09:51.79504+00 so it was well past the moment when the problem struck. Best regards, depesz
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On 8/21/25 03:07, hubert depesz lubaczewski wrote: On Wed, Aug 20, 2025 at 10:45:13AM -0700, Adrian Klaver wrote: On 8/20/25 09:08, hubert depesz lubaczewski wrote: On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote: Hmm. From initial post: "For ~ 1 second there are no logs going to log (we usually have at 5-20 messages logged per second), no connection, nothing. And then we get bunch (30+) messages with the same milisecond time." Are the 30+ messages all coming in on one connection or multiple connections? Multiple connections. Also to be clear these are statements that are being run on the replica locally, correct? What do you mean locally? I should have been clearer. Are the queries being run against the replica or the primary? All to replica. Primary has its own work, of course, but the problem we're experiencing is on replicas. If I am following there is more then one primary --> replica pair and the problem exists across all the pairs. How many applications servers are hitting the database? To be honest, I'm not sure. I have visibility into dbs, and bouncers, not really into Apps. I know that these are automatically dynamically scaled, so number of app server is very varying. I'd say anything from 40 to 200 app servers hit first layer of bouncers, which we usually have 6-9 (2-3 per az). These go to 2nd layer of bouncers, on the db server itself. By bouncer I assume you mean something like pgBouncer, a connection pooler. Is it possible to determine what bouncer the queries in question are coming from? Best regards, depesz -- Adrian Klaver adrian.kla...@aklaver.com
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Thu, 21 Aug 2025 at 11:03, hubert depesz lubaczewski wrote: > > On Wed, Aug 20, 2025 at 06:30:00PM +0100, Thom Brown wrote: > > Do you have THP enabled? Can you use mpstat and see what %steal shows as? > > Sorry, what is "THP"? I tried searching for "what is thp", and most > common search results are related to some chemical compound. Ah, yeah I meant transparent hugepage: cat /sys/kernel/mm/transparent_hugepage/enabled This should show it being set as "never". > > mpstat, yes, we have it. Let me quickly show what I get: > > Two days ago, at ~ 10:10pm UTC I saw this in Pg logs: > > # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_22.csv > | uniq -c | grep -C3 -P '^\s*\d\d' > 2 2025-08-19 22:09:29.084 UTC > 1 2025-08-19 22:09:29.094 UTC > 2 2025-08-19 22:09:29.097 UTC > 70 2025-08-19 22:09:29.109 UTC > 90 2025-08-19 22:09:29.110 UTC > 6 2025-08-19 22:09:29.111 UTC > 1 2025-08-19 22:09:29.153 UTC > 1 2025-08-19 22:09:29.555 UTC > > As you can see we have 70, and then 90 messages all logged with the same > timestamp. All of them (160) were "duration:x" lines. > > At the time, mpstat (called using `mpstat 5`) looked like this: > > CPU%usr %nice%sys %iowait%irq %soft %steal > %guest %gnice %idle > 22:09:04 all2.250.000.270.240.000.100.00 > 0.000.00 97.14 > 22:09:09 all2.700.000.280.270.000.120.00 > 0.000.00 96.63 > 22:09:14 all2.830.000.630.230.000.090.00 > 0.000.00 96.22 > 22:09:19 all2.660.000.280.230.000.100.00 > 0.000.00 96.74 > 22:09:24 all2.360.000.270.210.000.100.00 > 0.000.00 97.06 > 22:09:29 all2.490.000.400.400.000.110.00 > 0.000.00 96.60 > 22:09:34 all2.650.000.280.230.000.110.00 > 0.000.00 96.73 > 22:09:39 all2.260.000.260.210.000.090.00 > 0.000.00 97.19 > 22:09:44 all2.310.000.250.220.000.090.00 > 0.000.00 97.13 > 22:09:49 all2.440.000.220.240.000.080.00 > 0.000.00 97.02 > 22:09:54 all2.230.000.230.280.000.100.00 > 0.000.00 97.16 > 22:09:59 all2.480.000.260.300.000.110.00 > 0.000.00 96.85 > 22:10:04 all2.290.000.280.260.000.120.00 > 0.000.00 97.06 > 22:10:09 all2.310.000.220.210.000.110.00 > 0.000.00 97.14 > 22:10:14 all2.380.000.250.210.000.100.00 > 0.000.00 97.05 > 22:10:19 all2.540.000.240.200.000.110.00 > 0.000.00 96.91 > 22:10:24 all2.260.000.250.220.000.100.00 > 0.000.00 97.17 > 22:10:29 all2.120.000.270.240.000.090.00 > 0.000.00 97.27 > 22:10:34 all2.560.000.260.260.000.120.00 > 0.000.00 96.80 > 22:10:39 all2.160.000.230.240.000.100.00 > 0.000.00 97.27 > 22:10:44 all2.370.000.260.220.000.100.00 > 0.000.00 97.05 > 22:10:49 all2.250.000.250.610.000.090.00 > 0.000.00 96.80 > 22:10:54 all2.410.000.280.220.000.100.00 > 0.000.00 96.99 > 22:10:59 all2.830.000.290.190.000.120.00 > 0.000.00 96.57 This output looks fine, so it doesn't show anything concerning, so suggests the issue is somehow on the Postgres side. Did you happen to poll pg_stat_activity at the time to see whether you had lots of IPC waits? I'm wondering whether the storage layer is freezing up for a moment. Thom
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Thu, Aug 21, 2025 at 08:04:25AM -0700, Adrian Klaver wrote: > > > > > "For ~ 1 second there are no logs going to log (we usually have at > > > > > 5-20 > > > > > messages logged per second), no connection, nothing. And then we get > > > > > bunch (30+) messages with the same milisecond time." > > > > > Are the 30+ messages all coming in on one connection or multiple > > > > > connections? > > > > Multiple connections. > > > > > Also to be clear these are statements that are being run on the > > > > > replica > > > > > locally, correct? > > > > What do you mean locally? > > > I should have been clearer. Are the queries being run against the replica > > > or > > > the primary? > > All to replica. Primary has its own work, of course, but the problem > > we're experiencing is on replicas. > > If I am following there is more then one primary --> replica pair and the > problem exists across all the pairs. Not all. We have ~ 300 such clusters. The thing doesn't cause any customer-visible issues (after all it's just 1 second delay every so often), so it's generally overlooked when it happens. But we were paying closer attention to one such cluster, and then couple of other, and we've seen this behavior. > > > How many applications servers are hitting the database? > > > > To be honest, I'm not sure. I have visibility into dbs, and bouncers, > > not really into Apps. I know that these are automatically dynamically > > scaled, so number of app server is very varying. > > > > I'd say anything from 40 to 200 app servers hit first layer of bouncers, > > which we usually have 6-9 (2-3 per az). > > > > These go to 2nd layer of bouncers, on the db server itself. > > By bouncer I assume you mean something like pgBouncer, a connection pooler. > Is it possible to determine what bouncer the queries in question are coming > from? >From the POV of db, all queries are coming from one of N localhost bouncers. N is usually 2…6. >From the POV of the local bouncers, the queries come from range of remote bouncers. Generally we haven't seen any correlation between queries coming from specific ranges of ips. Logged queries, the ones that we see with runtime of 1s, have comments that indicate source, and they some from "all-around". Specifically, "DISCARD ALL" queries are generated by bouncers themselves (both layers). Just so that it will be clear, I don't expect anyone to be able to diagnose the problem based on description. I'm looking more into idea what to look for. The issue is that with the situation being pretty short, and happening on servers with non-trivial query load, I can't do stuff, like, for example, strace, stuff. Best regards, depesz
Re: Q: GRANT ... WITH ADMIN on PG 17
On 8/21/25 08:36, Karsten Hilbert wrote: Dear all, PG 17 documentation says that using "WITH ADMIN" allows the role being added to another group role to grant/revoke membership in said group to other roles. I would start by reading this: https://rhaas.blogspot.com/2023/01/surviving-without-superuser-coming-to.html Thanks, Karsten -- Adrian Klaver adrian.kla...@aklaver.com
Q: GRANT ... WITH ADMIN on PG 17
Dear all, PG 17 documentation says that using "WITH ADMIN" allows the role being added to another group role to grant/revoke membership in said group to other roles. Does this imply that an ADMIN role _must_ itself be a member of the group role it is to maintain membership of ? The question arises from a scenario where a DBA role would not need to be a member of a clinical group role but would be intended to maintain membership of clinical user roles within that group role. From a security point of view the question might be moot because an ADMIN role could always grant itself membership in the group role -- but it feels wrong for reasons of theoretical "correctness". IOW: - gm-dbo: user role for a DBA admin (not! superuser) - gm-bones: user role for a LLAP doctor - gm-doctors: group role for doctors, upon which are resting access permissions for clinical data - gm-bones is to be a member of gm-doctors in order to access clinical data - gm-dbo is intended to manage membership of gm-bones in gm-doctors - however, gm-dbo need not itself be a member of gm-doctors Is that possible within the current (as of PG 17) framework ? Thanks, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
Re: Q: GRANT ... WITH ADMIN on PG 17
Am Thu, Aug 21, 2025 at 09:11:57AM -0700 schrieb David G. Johnston: > > Thanks, I did, but did not find the answer to: Is there a > > way for a role that can manage membership in a group role to > > not itself be a member of that group role ? > > A superuser can do this. Otherwise, no. In order to be made admin of a > role on must be a member of said role Thanks, fine, that makes it clear to me. Regards, Karsten -- GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
If all your queries are coming through pgBouncer, and only those hang (the server itself responds if you connect directly to it), then it might be this pgBouncer issue: https://github.com/pgbouncer/pgbouncer/issues/1054 Although that issue is now "closed", because the invisible "debug" log message was upgraded to a warning (and I don't think that change is in any released version), the underlying problem still exists: pgbouncer hangs completely (stops forwarding packets) for a while if the PAM authentication queue becomes full. If you have a relatively slow PAM service (such as pam_ldap) then you can trigger it by opening ~100 connections to pgBouncer simultaneously (without waiting for previous ones to authenticate), something like this: for i in `seq 1 100`; do psql -h pgbouncer -p 6432 -U user db_name -c "SELECT 1" & done Thanks, Chris. On Thu, 21 Aug 2025 at 19:17, Adrian Klaver wrote: > On 8/21/25 09:51, hubert depesz lubaczewski wrote: > > On Thu, Aug 21, 2025 at 08:59:03AM -0700, Adrian Klaver wrote: > >> Getting to the bottom of the bag of ideas: > >> Have you looked at the OS system log for the time period involved? > > > > Yes. Mostly dmesg. Nothing interesting logged around the time. > > > >> You mentioned this seemed to involve PREPARE and DISCARD ALL. > >> Is this the same set of statements or is it all over the place? > > > > No. From what I can tell it's random sample. > > > >> Also it would be helpful to know what bouncer you are actually using and > >> what mode you are running in? > > > > pgBouncer, version 1.23.1. As for more... mostly transaction pooling. > > Applications go using transaction pooling, but people (dbas, ops) have > > session pooling. > > Have you looked at?: > > https://www.pgbouncer.org/changelog.html#pgbouncer-124x > > To see if anything stands out. > > Then there is: > > https://www.pgbouncer.org/config.html#max_prepared_statements > > The below may also be worth looking at: > > https://github.com/pgbouncer/pgbouncer/pull/1144 > > I can't help thinking that there is a caching issue at stake, though > that is just a guess. > > > > > > Best regards, > > > > depesz > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > > >
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On 8/21/25 09:51, hubert depesz lubaczewski wrote: On Thu, Aug 21, 2025 at 08:59:03AM -0700, Adrian Klaver wrote: Getting to the bottom of the bag of ideas: Have you looked at the OS system log for the time period involved? Yes. Mostly dmesg. Nothing interesting logged around the time. You mentioned this seemed to involve PREPARE and DISCARD ALL. Is this the same set of statements or is it all over the place? No. From what I can tell it's random sample. Also it would be helpful to know what bouncer you are actually using and what mode you are running in? pgBouncer, version 1.23.1. As for more... mostly transaction pooling. Applications go using transaction pooling, but people (dbas, ops) have session pooling. Have you looked at?: https://www.pgbouncer.org/changelog.html#pgbouncer-124x To see if anything stands out. Then there is: https://www.pgbouncer.org/config.html#max_prepared_statements The below may also be worth looking at: https://github.com/pgbouncer/pgbouncer/pull/1144 I can't help thinking that there is a caching issue at stake, though that is just a guess. Best regards, depesz -- Adrian Klaver adrian.kla...@aklaver.com
Re: Q: GRANT ... WITH ADMIN on PG 17
On 8/21/25 09:29, Dominique Devienne wrote: On Thu, Aug 21, 2025 at 6:00 PM Karsten Hilbert wrote: Am Thu, Aug 21, 2025 at 08:46:00AM -0700 schrieb Adrian Klaver: https://rhaas.blogspot.com/2023/01/surviving-without-superuser-coming-to.html Thanks, I did, but did not find the answer to: Is there a way for a role that can manage membership in a group role to not itself be a member of that group role ? Yes and no. Depends what you mean by MEMBER... Read the docs for pg_auth_members. pg_has_role(). create role. If you have CREATEROLE, and do a CREATE ROLE foo, you'll have ADMIN on foo, but not SET or INHERIT (but you can grant them to yourself). That is a matter of choice as described here: https://www.postgresql.org/docs/current/runtime-config-client.html createrole_self_grant (string) If a user who has CREATEROLE but not SUPERUSER creates a role, and if this is set to a non-empty value, the newly-created role will be granted to the creating user with the options specified. The value must be set, inherit, or a comma-separated list of these. The default value is an empty string, which disables the feature. The purpose of this option is to allow a CREATEROLE user who is not a superuser to automatically inherit, or automatically gain the ability to SET ROLE to, any created users. Since a CREATEROLE user is always implicitly granted ADMIN OPTION on created roles, that user could always execute a GRANT statement that would achieve the same effect as this setting. However, it can be convenient for usability reasons if the grant happens automatically. A superuser automatically inherits the privileges of every role and can always SET ROLE to any role, and this setting can be used to produce a similar behavior for CREATEROLE users for users which they create. Also look at pg_auth_members.grantor::regrole::text and you'll see that the postgres SUPERUSER itself gave you that ADMIN grant. But if you grant yourself the role, it's a separate pg_auth_members row, and you're now the grantor. So I didn't spend time studying your specific use case. That's your job :). But given my painful experience of the past year, I'd answer yes to your question, on logical grounds. If you see what I mean. --DD -- Adrian Klaver adrian.kla...@aklaver.com
Re: Domains vs data types
On Wed, Aug 20, 2025 at 7:37 PM Florents Tselai wrote: > > On 20 Aug 2025, at 7:47 AM, Ertan Küçükoglu > > wrote: > > I would like to learn if there is any benefit of using domains over data > > types for table column definitions in terms of performance gain/loss. > I know that this doesn’t answer your question, but before exploring custom > types / domains, > and based on experience, I’d strongly recommend exploring jsonb instead as an > alternative. I stayed out of that thread, but this makes me step out and react. domains are typically out constraining the value space of a type. I.e. a type with a CHECK constraint. W/o one, it's pretty useless IMHO. While json/jsonb is about denormalizing and stuffing unconstrained data, of arbitrary (and often evolving) content. No need for complex relational modeling and its associated constraints and "rigidity". I.e. easy evolution of the data tier, at the cost of applications having to deal with the complexity themselves of constrained data (must expect anything). Your SQL also becomes less expressive or more complex, although PostgreSQL has excellent JSON support. > Also note that using custom types can lead to some confusion initially for > basic stuff > you can’t do things like SELECT mytype.f1 and instead you have to do > (mytype).f1 I guess you're talking about composite types? Not sure OP had that in mind. But that's still a good point. Thanks for sharing. Yes, using a custom type/domain is more metadata, since a new row in pg_type. But that's mostly negligeable. There's always an (integral) OID associated to columns, so only its value changes if you start using a custom type, be it i memory or on disk. So no, using a custom type is unlikely to make things bigger or slower. That said, in my case, because I use the BINARY mode of LIBPQ and COPY, it does matter, as my code knows about built-in OIDs, but not of custom OIDs for custom types. But few people care about such things. FWIW, --DD
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Wed, Aug 20, 2025 at 11:24:29AM -0600, Rob Sargent wrote: > > On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote: > >> Hmm. > >> > >> From initial post: > >> > >> "For ~ 1 second there are no logs going to log (we usually have at 5-20 > >> messages logged per second), no connection, nothing. And then we get > >> bunch (30+) messages with the same milisecond time." > > Is that logger ms or db server ms? The latter seems unlikely to me. Sorry, I don't understand your question. I know it's unlikely, so it seems that something "froze" pg, for ~ 1s. The question is how to diagnose/debug what it is. Best regards, depesz
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Wed, Aug 20, 2025 at 06:30:00PM +0100, Thom Brown wrote: > Do you have THP enabled? Can you use mpstat and see what %steal shows as? Sorry, what is "THP"? I tried searching for "what is thp", and most common search results are related to some chemical compound. mpstat, yes, we have it. Let me quickly show what I get: Two days ago, at ~ 10:10pm UTC I saw this in Pg logs: # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_22.csv | uniq -c | grep -C3 -P '^\s*\d\d' 2 2025-08-19 22:09:29.084 UTC 1 2025-08-19 22:09:29.094 UTC 2 2025-08-19 22:09:29.097 UTC 70 2025-08-19 22:09:29.109 UTC 90 2025-08-19 22:09:29.110 UTC 6 2025-08-19 22:09:29.111 UTC 1 2025-08-19 22:09:29.153 UTC 1 2025-08-19 22:09:29.555 UTC As you can see we have 70, and then 90 messages all logged with the same timestamp. All of them (160) were "duration:x" lines. At the time, mpstat (called using `mpstat 5`) looked like this: CPU%usr %nice%sys %iowait%irq %soft %steal %guest %gnice %idle 22:09:04 all2.250.000.270.240.000.100.00 0.000.00 97.14 22:09:09 all2.700.000.280.270.000.120.00 0.000.00 96.63 22:09:14 all2.830.000.630.230.000.090.00 0.000.00 96.22 22:09:19 all2.660.000.280.230.000.100.00 0.000.00 96.74 22:09:24 all2.360.000.270.210.000.100.00 0.000.00 97.06 22:09:29 all2.490.000.400.400.000.110.00 0.000.00 96.60 22:09:34 all2.650.000.280.230.000.110.00 0.000.00 96.73 22:09:39 all2.260.000.260.210.000.090.00 0.000.00 97.19 22:09:44 all2.310.000.250.220.000.090.00 0.000.00 97.13 22:09:49 all2.440.000.220.240.000.080.00 0.000.00 97.02 22:09:54 all2.230.000.230.280.000.100.00 0.000.00 97.16 22:09:59 all2.480.000.260.300.000.110.00 0.000.00 96.85 22:10:04 all2.290.000.280.260.000.120.00 0.000.00 97.06 22:10:09 all2.310.000.220.210.000.110.00 0.000.00 97.14 22:10:14 all2.380.000.250.210.000.100.00 0.000.00 97.05 22:10:19 all2.540.000.240.200.000.110.00 0.000.00 96.91 22:10:24 all2.260.000.250.220.000.100.00 0.000.00 97.17 22:10:29 all2.120.000.270.240.000.090.00 0.000.00 97.27 22:10:34 all2.560.000.260.260.000.120.00 0.000.00 96.80 22:10:39 all2.160.000.230.240.000.100.00 0.000.00 97.27 22:10:44 all2.370.000.260.220.000.100.00 0.000.00 97.05 22:10:49 all2.250.000.250.610.000.090.00 0.000.00 96.80 22:10:54 all2.410.000.280.220.000.100.00 0.000.00 96.99 22:10:59 all2.830.000.290.190.000.120.00 0.000.00 96.57 Best regards, depesz
Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
On Wed, Aug 20, 2025 at 10:45:13AM -0700, Adrian Klaver wrote: > On 8/20/25 09:08, hubert depesz lubaczewski wrote: > > On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote: > > > Hmm. > > > > > > From initial post: > > > > > > "For ~ 1 second there are no logs going to log (we usually have at 5-20 > > > messages logged per second), no connection, nothing. And then we get > > > bunch (30+) messages with the same milisecond time." > > > Are the 30+ messages all coming in on one connection or multiple > > > connections? > > Multiple connections. > > > Also to be clear these are statements that are being run on the replica > > > locally, correct? > > What do you mean locally? > I should have been clearer. Are the queries being run against the replica or > the primary? All to replica. Primary has its own work, of course, but the problem we're experiencing is on replicas. > > Application servers are *not* on the same server as the db. So no, they > > are not local. > Where are the application servers relative to the replica server? Well, there is a lot of them. All are in AWS EC2, same region, various availability zones. Generally we use 3 AZs, with more or less equal split, so I'd say ~ 33% of app servers is within the same AZ. > How many applications servers are hitting the database? To be honest, I'm not sure. I have visibility into dbs, and bouncers, not really into Apps. I know that these are automatically dynamically scaled, so number of app server is very varying. I'd say anything from 40 to 200 app servers hit first layer of bouncers, which we usually have 6-9 (2-3 per az). These go to 2nd layer of bouncers, on the db server itself. Best regards, depesz