Proposal: Shared Work Mem Area
Hello there!

I would like to propose a new feature to the developers: replace the private, per-process management of work_mem with management through shared memory. Allocated at PostgreSQL startup, this area would be shared by all worker processes, much like the PGA in Oracle Database.

This would allow optimal use of this memory area by enforcing only a global maximum limit, rather than a configuration/allocation at the process level.

What do you think about my proposal?

Cheers,
Marco
Re: Proposal: Shared Work Mem Area
Greetings,

* Marco Fortina (marco_fort...@hotmail.it) wrote:
> I would like to propose a new feature to the developers: replace the private,
> per-process management of work_mem with management through shared memory.
> Allocated at PostgreSQL startup, this area would be shared by all worker
> processes, much like the PGA in Oracle Database.
>
> This would allow optimal use of this memory area by enforcing only a global
> maximum limit, rather than a configuration/allocation at the process level.
>
> What do you think about my proposal?

There's ongoing work to provide a way to have a global maximum limit which doesn't involve entirely reworking how work_mem works today. The commitfest entry for that work is here:

https://commitfest.postgresql.org/42/3867/

If you're interested in that, getting additional reviews and comments on the work would be helpful in moving it forward.

Thanks,

Stephen
R: Proposal: Shared Work Mem Area
Thanks Stephen.

https://commitfest.postgresql.org/42/3867/ is not exactly what I proposed as a new feature to the developers.

If I'm not wrong, almost all main memory areas have a fixed size:

shared_buffers
effective_cache_size
wal_buffers

Instead, work_mem is allocated dynamically per process, up to the defined size limit.

What I suggested is to change work_mem from a per-process allocation to a global, fixed-size allocation (see pga_aggregate_target on Oracle), shared by all worker processes.

Let's assume the new parameter is named worker_mem_area and is set to 8GB: with my proposed method, each worker process does not use its own dedicated work_mem but the shared area. In this way each worker is also able to take free pages from the worker_mem_area if needed.

Regards,
Marco

From: Stephen Frost
Sent: Thursday, April 6, 2023 14:43
To: Marco Fortina
Cc: pgsql-gene...@postgresql.org
Subject: Re: Proposal: Shared Work Mem Area

Greetings,

* Marco Fortina (marco_fort...@hotmail.it) wrote:
> I would like to propose a new feature to the developers: replace the private,
> per-process management of work_mem with management through shared memory.
> Allocated at PostgreSQL startup, this area would be shared by all worker
> processes, much like the PGA in Oracle Database.
>
> This would allow optimal use of this memory area by enforcing only a global
> maximum limit, rather than a configuration/allocation at the process level.
>
> What do you think about my proposal?

There's ongoing work to provide a way to have a global maximum limit which doesn't involve entirely reworking how work_mem works today. The commitfest entry for that work is here:

https://commitfest.postgresql.org/42/3867/

If you're interested in that, getting additional reviews and comments on the work would be helpful in moving it forward.

Thanks,

Stephen
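[For clarity, a minimal sketch of the difference being proposed. worker_mem_area is the hypothetical parameter from this proposal, not an existing PostgreSQL setting:]

```
-- Today: work_mem is a per-process (in fact per-plan-node) budget,
-- set per session or per system.
SET work_mem = '64MB';

-- Proposed (hypothetical GUC, does not exist): one fixed shared pool that
-- all worker processes draw from, capped globally rather than per process.
-- ALTER SYSTEM SET worker_mem_area = '8GB';
```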
Re: R: Proposal: Shared Work Mem Area
Greetings,

Please don't top-post on the PG mailing lists, it makes it harder to follow the discussion.

* Marco Fortina (marco_fort...@hotmail.it) wrote:
> https://commitfest.postgresql.org/42/3867/ is not exactly what I proposed as
> a new feature to the developers.

I understood what you were proposing.

> If I'm not wrong, almost all main memory areas have a fixed size:
>
> shared_buffers
> effective_cache_size
> wal_buffers
>
> Instead, work_mem is allocated dynamically per process, up to the defined
> size limit.

That's not how work_mem works, actually. It's a per-node amount, and it's not a per-process overall limit, nor is it really a hard limit, though some nodes will do their best to respect the amount configured.

> What I suggested is to change work_mem from a per-process allocation to a
> global, fixed-size allocation (see pga_aggregate_target on Oracle), shared
> by all worker processes.

I understood the suggestion and it's a lot of work for an unclear gain. You noted that having it be pulled from a single area would allow administrators to configure an overall memory usage limit - but that's not the only way to do that, and there's an existing effort to do exactly that already underway that's a lot simpler than what you're proposing. While there might be other advantages to having a shared memory segment be used for work_mem, you've not outlined any.

> Let's assume the new parameter is named worker_mem_area and is set to 8GB:
> with my proposed method, each worker process does not use its own dedicated
> work_mem but the shared area.

I understand your suggestion, but making such a large change just to make it isn't sensible; there should be reasoning behind why that's better than what we're doing already or proposing to do.

> In this way each worker is also able to take free pages from the
> worker_mem_area if needed.

This can be done with the existing approach and doesn't require a shared memory segment for work_mem. We are pretty far from having an actual acceptance system for queries, though I do agree that would be a useful thing to work towards. I don't know that it requires work_mem being in shared memory though.

Thanks,

Stephen
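[To illustrate the "per-node" point, a hedged sketch with hypothetical table and column names: a single query whose plan contains a hash join, a hash aggregate, and a sort can consume roughly three work_mem budgets at once, and concurrent backends multiply that again.]

```
-- work_mem is budgeted per plan node, not per process or per query.
SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.customer_id, count(*) AS n_orders
FROM orders o                        -- hypothetical tables
JOIN payments p USING (order_id)     -- Hash Join: ~work_mem (times hash_mem_multiplier)
GROUP BY o.customer_id               -- HashAggregate: another such budget
ORDER BY n_orders DESC;              -- Sort: up to another work_mem
```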
Re: Patroni vs pgpool II
On Wed, 05 Apr 2023 16:50:15 +0900 (JST) Tatsuo Ishii wrote:
>> But, I heard PgPool is still affected by Split brain syndrome.
>
> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
> number of nodes must be odd) are configured, a split brain can be
> avoided.

Split brain is a hard situation to avoid. I suppose the OP is talking about a PostgreSQL split-brain situation. I'm not sure how Pgpool's watchdog would avoid that.

To avoid split brain, you need to implement a combination of quorum and (self-)fencing.

Patroni's quorum is in the DCS's hands. Patroni's self-fencing can be achieved with the (hardware) watchdog. You can also implement node fencing through the "pre_promote" script to fence the old primary node before promoting the new one.

If you need HA with a high level of anti-split-brain security, you'll not be able to avoid some sort of fencing, no matter what.

Good luck.
"PANIC: could not open critical system index 2662" - twice
Our PostgreSQL 15.2 instance running on Ubuntu 18.04 has crashed with this error:

2023-04-05 09:24:03.448 UTC [15227] ERROR:  index "pg_class_oid_index" contains unexpected zero page at block 0
2023-04-05 09:24:03.448 UTC [15227] HINT:  Please REINDEX it.
...
2023-04-05 13:05:25.018 UTC [15437] root@test_behavior_638162834106895162 FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0
2023-04-05 13:05:25.018 UTC [15437] root@test_behavior_638162834106895162 HINT:  Please REINDEX it.
... (same error for a few more DBs)
2023-04-05 13:05:25.144 UTC [16965] root@test_behavior_638162855458823077 FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0
2023-04-05 13:05:25.144 UTC [16965] root@test_behavior_638162855458823077 HINT:  Please REINDEX it.
...
2023-04-05 13:05:25.404 UTC [17309] root@test_behavior_638162881641031612 PANIC:  could not open critical system index 2662
2023-04-05 13:05:25.405 UTC [9372] LOG:  server process (PID 17309) was terminated by signal 6: Aborted
2023-04-05 13:05:25.405 UTC [9372] LOG:  terminating any other active server processes

We had the same thing happen about a month ago on a different database in the same cluster. For a while PG actually ran OK as long as you didn't access that specific DB, but when trying to back up that DB with pg_dump it would crash every time. At that time one of the disks hosting the ZFS dataset with the PG data directory on it was reporting errors, so we thought it was likely due to that. Unfortunately, before we could replace the disks, PG crashed completely and would not start again at all, so I had to rebuild the cluster from scratch and restore from pg_dump backups (still onto the old, bad disks). Once the disks were replaced (all of them) I just copied the data to them using zfs send | zfs receive and didn't bother restoring pg_dump backups again - which was perhaps foolish in hindsight.

Well, yesterday it happened again. The server still restarted OK, so I took fresh pg_dump backups of the databases we care about (which ran fine), rebuilt the cluster and restored the pg_dump backups again - now onto the new disks, which are not reporting any problems.

So while everything is up and running now, this error has me rather concerned. Could the error we're seeing now have been caused by some corruption in the PG data that's been there for a month (so it could still be attributed to the bad disk), which should now be fixed by having restored from backups onto good disks? Could this be a PG bug? What can I do to figure out why this is happening and prevent it from happening again?

Advice appreciated!
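[One possible way to look for lingering index corruption after a restore is the contrib amcheck extension; a minimal sketch, assuming amcheck is installed and the check is run as superuser. The system catalog indexes, including pg_class_oid_index, are B-tree indexes, which is what bt_index_check verifies.]

```
-- Verify the structure of the system-catalog B-tree indexes in the current
-- database; bt_index_check raises an error on the first corruption it finds.
CREATE EXTENSION IF NOT EXISTS amcheck;

SELECT c.relname AS index_name,
       bt_index_check(c.oid::regclass, true)   -- true = also check heap/index agreement
FROM pg_index i
JOIN pg_class c     ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_am am       ON am.oid = c.relam
WHERE n.nspname = 'pg_catalog'
  AND am.amname = 'btree'
  AND i.indisvalid;
```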
PostgreSQL Mailing list public archives : search not working ...
Hi,

Theory: Maybe the search doesn't always work ... Is this a known problem?

My use case: in https://www.postgresql.org/list/ -> "Search the Archives", search for "large database backup"
-> https://www.postgresql.org/search/?m=1&q=large+database+backup&l=&d=365&s=r
-> "Your search for large database backup returned no hits."

But at least one big thread exists:

To: pgsql-general
Subject: postgres large database backup
Date: 2022-11-30 15:40:23
https://www.postgresql.org/message-id/flat/CA%2BONtZ7CdiJ2X8x_8sPHR%2Btz-vv7mR1uoDzLia3fzWVZA9k0JQ%40mail.gmail.com

Changing the "Sort by":
- sort by rank: no hits
  https://www.postgresql.org/search/?m=1&q=large+database+backup&l=&d=365&s=r
- sort by date: OK; 1 result
  https://www.postgresql.org/search/?m=1&q=large+database+backup&l=&d=365&s=d
- sort by reverse date: no hits
  https://www.postgresql.org/search/?m=1&q=large+database+backup&l=&d=365&s=i

Regards,
Imre
UPSERT in Postgres
This is a question/feature request.

Virtually all references on the internet today suggest one can "upsert" in PostgreSQL using the "insert ... on conflict do update ..." statement. But this is not completely true. PostgreSQL's own wiki page (https://wiki.postgresql.org/wiki/UPSERT#.22UPSERT.22_definition) defines UPSERT as:

"UPSERT" is a DBMS feature that allows a DML statement's author to atomically either insert a row, or on the basis of the row already existing, UPDATE that existing row instead.

I believe this definition is correct and consistent with the definition elsewhere (https://en.wiktionary.org/wiki/upsert). An implicit assumption behind this definition is that the table must have a primary key for the upsert operation to make sense, since it's the primary key that uniquely identifies a row.

Let's say we have a very simple table:

```
create table person (
    id int primary key,
    name text not null,
    is_active boolean
)
```

Given the definition of upsert, I'd expect an upsert command to do the following:
- `upsert into person (id, name) values (0, 'foo')` to insert a new row
- `upsert into person (id, is_active) values (0, true)` to update the is_active column for the row inserted above

Naturally, since there isn't a real upsert command in PostgreSQL, this won't work today. But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE", like a lot of references on the internet seem to suggest?

insert into person (id, name) values (0, 'foo') on conflict ("id") do update set id=excluded.id, name=excluded.name
insert into person (id, is_active) values (0, true) on conflict ("id") do update set id=excluded.id, is_active=excluded.is_active

Unfortunately, the second statement will fail due to violation of the not null constraint on the "name" column. PostgreSQL will always try to insert the row into the table first, and only fall back to update when the uniqueness constraint is violated. Is this behavior wrong? Maybe not; I think it is doing what it reads, quite literally. That being said, I have never had a need for the ON CONFLICT DO UPDATE statement other than where I need upsert. But using it as "upsert" is only valid when the table is free of any NOT NULL constraints on its non-primary-key columns. So, if my experience/use case is typical (meaning the main purpose / use case for ON CONFLICT DO UPDATE is to support upsert), then it can be argued the current behavior is incorrect?

This has been a source of confusion, to say the least:
https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f

The MERGE command introduced in PG15 can in theory be used to do UPSERT properly, free of the aforementioned limitation. The downside is that it is rather verbose.

*Question*
Is there a way to do a proper upsert prior to PG15?

*Feature Request*
Given that UPSERT is an *idempotent* operator, it is extremely useful. I would love to see an UPSERT command in PostgreSQL so one can 'upsert' properly and easily.

Regards,
Louis Tian
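[For reference, a minimal sketch of what the PG15 MERGE version of the second "upsert" might look like, assuming the person table above. On the MATCHED path only is_active is touched, so the NOT NULL constraint on name is not an issue for existing rows; the NOT MATCHED path would still fail on name, which is arguably the correct outcome. Note (as pointed out downthread) that MERGE does not have the same race-free guarantees as ON CONFLICT.]

```
merge into person as p
using (values (0, true)) as v(id, is_active)
on p.id = v.id
when matched then
    update set is_active = v.is_active          -- existing row: only touch is_active
when not matched then
    insert (id, is_active) values (v.id, v.is_active);  -- new row: name still required
```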
Re: UPSERT in Postgres
On Thu, Apr 6, 2023 at 1:21 PM Louis Tian wrote:
> An implicit assumption behind this definition is that the table must have a
> primary key for the upsert operation to make sense, since it's the primary
> key that uniquely identifies a row.

It could just be a unique index or a unique constraint. So you can upsert on any individual unique constraint/index, or the primary key. Of course there might be several on a given table, but you can only use one as the "conflict arbiter" per statement.

> But can we achieve the same effect with "INSERT ... ON CONFLICT DO UPDATE",
> like a lot of references on the internet seem to suggest?
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update
> set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do
> update set id=excluded.id, is_active=excluded.is_active
>
> Unfortunately, the second statement will fail due to violation of the not
> null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first, and only
> fall back to update when the uniqueness constraint is violated.
> Is this behavior wrong? Maybe not; I think it is doing what it reads, quite
> literally.

It sort of has to work that way, though. In general your example might *not* fail, due to a row-level before trigger in the insert path.

Why doesn't your proposed upsert syntax have the same problem? I mean, how could it not? I guess it doesn't if you assume that it'll never take the insert path with your not NULL constraint example? But if you know that for sure, why not just use a regular update statement?

On the other hand, if you're not sure if the insert path can be taken, then why is it actually helpful to not just throw an error at the earliest opportunity? Surely upsert means "update or insert", so why wouldn't the user expect to see an error like this, independent of the specifics of the row in question? Isn't the user tacitly saying "I don't specifically know if the update or insert path will be taken in respect of any given row" by using ON CONFLICT DO UPDATE in the first place?

> That being said, I have never had a need for the ON CONFLICT DO UPDATE
> statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is free of any NOT
> NULL constraints on its non-primary-key columns.

I don't know what you mean by that. "Valid"?

> The MERGE command introduced in PG15 can in theory be used to do UPSERT
> properly, free of the aforementioned limitation.
> The downside is that it is rather verbose.

The MERGE command has various race conditions that are particularly relevant to UPSERT type use cases. See the wiki page you referenced for a huge amount of information on this.

> *Feature Request*
> Given that UPSERT is an *idempotent* operator, it is extremely useful.

In general UPSERT (or any definition of it that I can think of) does not imply idempotency.

--
Peter Geoghegan
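[As an aside, a small sketch of the first point above: the conflict arbiter can be any unique constraint or index, not just the primary key. Table and column names here are made up for illustration.]

```
-- "email" has its own unique constraint, distinct from the primary key.
create table account (
    id        int primary key,
    email     text not null unique,
    last_seen timestamptz
);

-- The arbiter here is the unique constraint on email, not the primary key.
insert into account (id, email, last_seen) values (1, 'a@example.com', now())
on conflict (email) do update
    set last_seen = excluded.last_seen;
```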
Re: UPSERT in Postgres
On 4/5/23 23:21, Louis Tian wrote:
> This is a question/feature request.
>
> Given the definition of upsert, I'd expect an upsert command to do the following:
> - `upsert into person (id, name) values (0, 'foo')` to insert a new row
> - `upsert into person (id, is_active) values (0, true)` to update the
>   is_active column for the row inserted above
>
> Naturally, since there isn't a real upsert command in PostgreSQL, this won't
> work today. But can we achieve the same effect with "INSERT ... ON CONFLICT
> DO UPDATE", like a lot of references on the internet seem to suggest?
>
> insert into person (id, name) values (0, 'foo') on conflict ("id") do update
> set id=excluded.id, name=excluded.name
> insert into person (id, is_active) values (0, true) on conflict ("id") do
> update set id=excluded.id, is_active=excluded.is_active

insert into person (id, name, is_active) values (0, '', true)
on conflict ("id") do update set id=excluded.id, name=person.name, is_active=excluded.is_active ;
INSERT 0 1

select * from person;
 id | name | is_active
----+------+-----------
  0 | foo  | t

> Unfortunately, the second statement will fail due to violation of the not
> null constraint on the "name" column.
> PostgreSQL will always try to insert the row into the table first, and only
> fall back to update when the uniqueness constraint is violated.
> Is this behavior wrong? Maybe not; I think it is doing what it reads, quite
> literally.
> That being said, I have never had a need for the ON CONFLICT DO UPDATE
> statement other than where I need upsert.
> But using it as "upsert" is only valid when the table is free of any NOT
> NULL constraints on its non-primary-key columns.
> So, if my experience/use case is typical (meaning the main purpose / use case
> for ON CONFLICT DO UPDATE is to support upsert), then it can be argued the
> current behavior is incorrect?
>
> This has been a source of confusion, to say the least:
> https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint
> https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f
>
> The MERGE command introduced in PG15 can in theory be used to do UPSERT
> properly, free of the aforementioned limitation.
> The downside is that it is rather verbose.
>
> *Question*
> Is there a way to do a proper upsert prior to PG15?
>
> *Feature Request*
> Given that UPSERT is an *idempotent* operator, it is extremely useful. I
> would love to see an UPSERT command in PostgreSQL so one can 'upsert'
> properly and easily.
>
> Regards,
> Louis Tian

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: UPSERT in Postgres
That is the answer. Postgresql can upsert easily via triggers and on conflict. Thanks, Ben On Thu, Apr 6, 2023, 5:01 PM Adrian Klaver wrote: > On 4/5/23 23:21, Louis Tian wrote: > > This is a question/feature request. > > > > > Given the definition of upsert, I'd expect an upsert command to do the > following. > > - `upsert into person (id, name) values (0, 'foo')` to insert a new row > > - `upsert into person (id, is_active) values (0, true)` updates the > is_active column for the row inserted above > > > > Naturally, since there isn't a real upsert command in PostgreSQL this > won't work today. > > But can we achieve the same effect with "INSERT ... ON CONFLICT DO > UPDATE" like a lot of references on the internet seems to suggest. > > > > insert into person (id, name) values (0, 'foo') on conflict ("id") do > update set id=excluded.id, name=excluded.name > > insert into person (id, is_active) values (0, true) on conflict > ("id") do update set id=excluded.id, is_active=excluded.is_active > > insert into person (id, name, is_active) values (0, '', true) on > conflict ("id") do update set id=excluded.id, name=person.name, > is_active=excluded.is_active ; > INSERT 0 1 > > select * from person; > id | name | is_active > +--+--- >0 | foo | t > > > > > Unfortunately. the second statement will fail due to violation of the > not null constraint on the "name" column. > > PostgreSQL will always try to insert the row into the table first. and > only fallback to update when the uniqueness constraint is violated. > > Is this behavior wrong? maybe not, I think it is doing what it reads > quite literally. > > That being said, I have never had a need for the ON CONFLICT DO UPDATE > statement other than where I need upsert. > > But using it as "upsert" is only valid when the table is absent of any > NOT NULL constraint on it's non primary key columns. > > So, if my experience/use case is typical (meaning the main purpose / use > case for ON CONFLICT DO UPDATE is to support upsert) then it can be argue > the current behavior is incorrect? > > > > This has been a source confusion to say at least. > > > https://stackoverflow.com/questions/48816629/on-conflict-do-nothing-in-postgres-with-a-not-null-constraint > > > https://www.postgresql.org/message-id/flat/1996438.1604952901%40sss.pgh.pa.us#ace8adc1354265aca5672028faea0c0f > > > > The MERGE command introduced in PG15 in theory can be used to do UPSERT > properly that is void of the aforementioned limitation. > > The downside is it is rather verbose. > > > > *Question* > > This there a way to do an upsert proper prior to PG15? > > > > *Feature Request* > > Given that UPSERT is an *idempotent* operator it is extremely useful. > > Would love to see an UPSERT command in PostgreSQL so one can 'upsert' > properly and easily. > > > > > > Regards, > > Louis Tian > > > > > > > > > > > > > > > > > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > > > >
Re: PostgreSQL Mailing list public archives : search not working ...
Greetings,

* Imre Samu (pella.s...@gmail.com) wrote:
> Theory: Maybe the search doesn't always work ...

It does ... but perhaps not as expected (or as ideal).

> Is this a known problem?

Yes and no.

> My use case: in https://www.postgresql.org/list/ -> "Search the Archives",
> search for "large database backup"
> -> https://www.postgresql.org/search/?m=1&q=large+database+backup&l=&d=365&s=r
> -> "Your search for large database backup returned no hits."

Note that the default is "within the last year". That may not be ideal though and perhaps we should change it.

> But at least one big thread exists:

"big" is perhaps over-stating it, and most of it is actually about ZFS anyway, though a few of the initial emails on the thread are informative.

> To: pgsql-general
> Subject: postgres large database backup
> Date: 2022-11-30 15:40:23
> https://www.postgresql.org/message-id/flat/CA%2BONtZ7CdiJ2X8x_8sPHR%2Btz-vv7mR1uoDzLia3fzWVZA9k0JQ%40mail.gmail.com
>
> Changing the "Sort by":

Changing date to 'anytime', sorting by 'date' and using 'database backup' instead turns this thread up pretty quickly (number 7):

https://www.postgresql.org/search/?m=1&q=database+backup&l=&d=-1&s=d

That said, we should really be including the Subject line in the search and I don't think we are today (looks like we just search the body). The code is available for review here:

https://git.postgresql.org/gitweb/?p=pgarchives.git;a=summary

Patches to improve certainly welcome.

Thanks,

Stephen
Backup schema without data
Hi,

Could you please tell me how I can take a backup of a single schema, without its data, using the pg_dump utility?

So far I could not find anything suitable for doing so.

Regards,
Atul
Re: Backup schema without data
From the man page....

"
-s
--schema-only
    Dump only the object definitions (schema), not data.
....."

On Thu, Apr 6, 2023 at 18:40 Atul Kumar wrote:
> Hi,
>
> Could you please tell me how I can take a backup of a single schema,
> without its data, using the pg_dump utility?
>
> So far I could not find anything suitable for doing so.
>
> Regards,
> Atul

--
"Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds."
-- Samuel Butler
Re: Backup schema without data
Atul Kumar writes: > Please help me in telling that how I can take the backup of one single > schema without its data using pg_dump utility ? Doesn't "pg_dump -s -n ..." do what you want? regards, tom lane
Re: Backup schema without data
Looking at the result of:

pg_dump --help

Usage:
  pg_dump [OPTION]... [DBNAME]
...
  -n, --schema=PATTERN     dump the specified schema(s) only
...
  -s, --schema-only        dump only the schema, no data
...

So: pg_dump -s -n

It works for me...

Cheers

Brent Wood

On Friday, April 7, 2023 at 12:40:58 PM GMT+12, Atul Kumar wrote:

Hi,

Could you please tell me how I can take a backup of a single schema, without its data, using the pg_dump utility?

So far I could not find anything suitable for doing so.

Regards,
Atul
Re: Patroni vs pgpool II
>>> But, I heard PgPool is still affected by Split brain syndrome.
>>
>> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
>> number of nodes must be odd) are configured, a split brain can be
>> avoided.
>
> Split brain is a hard situation to avoid. I suppose the OP is talking about
> a PostgreSQL split-brain situation. I'm not sure how Pgpool's watchdog would
> avoid that.

Ok, "split brain" here means that two or more PostgreSQL primary servers exist. Pgpool-II's watchdog has a feature called "quorum failover" to avoid the situation.

To make this work, you need to configure 3 or more Pgpool-II nodes. Suppose they are w0, w1 and w2. Also suppose there are two PostgreSQL servers pg0 (primary) and pg1 (standby). The goal is to avoid having both pg0 and pg1 become primary servers.

Pgpool-II periodically monitors PostgreSQL healthiness by checking whether it can reach the PostgreSQL servers. Suppose w0 and w1 detect that pg0 is healthy but pg1 is not, while w2 thinks the opposite, i.e. pg0 is unhealthy but pg1 is healthy (this could happen if w0, w1 and pg0 are in a network A, but w2 and pg1 are in a different network B, and A and B cannot reach each other). In this situation, if w2 promotes pg1 because pg0 (and the other watchdog nodes) seem to be down, then the system ends up with two primary servers: split brain.

With quorum failover enabled, w0, w1, and w2 communicate with each other to vote on who is correct (if one cannot communicate, it regards the other watchdogs as down). In the case above w0 and w1 are the majority and will win. Thus w0 and w1 just detach pg1 and keep on using pg0 as the primary. On the other hand, since w2 loses, it gives up promoting pg1, and thus the split brain is avoided.

Note that in the configuration above, clients access the cluster via a VIP. Since the VIP is always controlled by the majority watchdog, clients will not access pg1 because it is set to down status by w0 and w1.

> To avoid split brain, you need to implement a combination of quorum and
> (self-)fencing.
>
> Patroni's quorum is in the DCS's hands. Patroni's self-fencing can be
> achieved with the (hardware) watchdog. You can also implement node fencing
> through the "pre_promote" script to fence the old primary node before
> promoting the new one.
>
> If you need HA with a high level of anti-split-brain security, you'll not be
> able to avoid some sort of fencing, no matter what.
>
> Good luck.

Well, if you define fencing as STONITH (Shoot The Other Node in the Head), Pgpool-II does not have that feature. However, I am not sure STONITH is always mandatory. I think that depends on what you want to avoid by using fencing. If the purpose is to avoid having two primary servers at the same time, Pgpool-II achieves that as described above.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: Patroni vs pgpool II
On 4/6/23 23:16, Tatsuo Ishii wrote:
>>>> But, I heard PgPool is still affected by Split brain syndrome.
>>>
>>> Can you elaborate more? If more than 3 pgpool watchdog nodes (the
>>> number of nodes must be odd) are configured, a split brain can be
>>> avoided.
>>
>> Split brain is a hard situation to avoid. I suppose the OP is talking about
>> a PostgreSQL split-brain situation. I'm not sure how Pgpool's watchdog
>> would avoid that.
>
> Ok, "split brain" here means that two or more PostgreSQL primary servers
> exist. Pgpool-II's watchdog has a feature called "quorum failover" to avoid
> the situation.
>
> To make this work, you need to configure 3 or more Pgpool-II nodes. Suppose
> they are w0, w1 and w2. Also suppose there are two PostgreSQL servers pg0
> (primary) and pg1 (standby). The goal is to avoid having both pg0 and pg1
> become primary servers.
>
> Pgpool-II periodically monitors PostgreSQL healthiness by checking whether
> it can reach the PostgreSQL servers. Suppose w0 and w1 detect that pg0 is
> healthy but pg1 is not, while w2 thinks the opposite, i.e. pg0 is unhealthy
> but pg1 is healthy (this could happen if w0, w1 and pg0 are in a network A,
> but w2 and pg1 are in a different network B, and A and B cannot reach each
> other). In this situation, if w2 promotes pg1 because pg0 (and the other
> watchdog nodes) seem to be down, then the system ends up with two primary
> servers: split brain.
>
> With quorum failover enabled, w0, w1, and w2 communicate with each other to
> vote on who is correct (if one cannot communicate, it regards the other
> watchdogs as down). In the case above w0 and w1 are the majority and will
> win. Thus w0 and w1 just detach pg1 and keep on using pg0 as the primary.
> On the other hand, since w2 loses, it gives up promoting pg1, and thus the
> split brain is avoided.
>
> Note that in the configuration above, clients access the cluster via a VIP.
> Since the VIP is always controlled by the majority watchdog, clients will
> not access pg1 because it is set to down status by w0 and w1.

And this concept is quite old. (It's also what Windows clustering uses.)

--
Born in Arizona, moved to Babylonia.
Re: Patroni vs pgpool II
On Thu, Apr 6, 2023 at 9:17 PM Tatsuo Ishii wrote:
> With quorum failover enabled, w0, w1, and w2 communicate with each other to
> vote on who is correct (if one cannot communicate, it regards the other
> watchdogs as down). In the case above w0 and w1 are the majority and will
> win.

Communication takes time – network latencies. What if during this communication, the situation becomes different? What if some of them cannot communicate with each other due to network issues?

What if pg1 is currently the primary, pg0 is a standby, both are healthy, but due to network issues both pg1 and w2 are not reachable from the other nodes? Will pg1 remain primary, and will w0 and w1 decide to promote pg0?
Re: Patroni vs pgpool II
> Communication takes time – network latencies. What if during this
> communication, the situation becomes different?

We have to accept it (and do our best to mitigate any consequence of the problem). I think there's no system which presupposes zero communication latency.

> What if some of them cannot communicate with each other due to network
> issues?

Can you elaborate more? There are many scenarios for communication breakdown. I hesitate to discuss all of them on this forum since this is for discussions on PostgreSQL, not Pgpool-II. You are welcome to join and continue the discussion on the pgpool mailing list.

> What if pg1 is currently the primary, pg0 is a standby, both are healthy,
> but due to network issues both pg1 and w2 are not reachable from the other
> nodes? Will pg1 remain primary, and will w0 and w1 decide to promote pg0?

pg1 will remain primary but it is set to "quarantine" state from pgpool's point of view, which means clients cannot access pg1 via pgpool. w0 and w1 will decide to promote pg0.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
Re: Patroni vs pgpool II
On Thu, Apr 6, 2023 at 11:13 PM Tatsuo Ishii wrote:
> You are welcome to join and continue the discussion on the pgpool mailing
> list.

I truly believe that this problem – HA – is PostgreSQL's, not a 3rd party's. And it's a shame that Postgres itself doesn't solve this. So we're discussing it here.

>> What if pg1 is currently the primary, pg0 is a standby, both are healthy,
>> but due to network issues both pg1 and w2 are not reachable from the other
>> nodes? Will pg1 remain primary, and will w0 and w1 decide to promote pg0?
>
> pg1 will remain primary but it is set to "quarantine" state from
> pgpool's point of view, which means clients cannot access pg1 via
> pgpool.

So we have a split brain here – two primaries. Especially if some clients communicate with PG directly. And even if there are no such clients, archive_command is going to work on both nodes; monitoring will show two primaries, confusing humans (e.g., SREs) and various systems; if we have many standby nodes, some of them might continue replicating from the old primary if they happen to be in the same network partition; and so on. I don't see how all these things can be solved with this approach.