Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
I've updated the patch due to recent changes by Daniel Gustafsson (549ec201d6132b7). -- Best regards, Maxim Orlov.
Attachment: v10-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-u.patch
Re: Problem with moderation of messages with patches attached.
On Thu, Mar 3, 2022 at 3:31 PM Pavel Borisov wrote:
> Hi, hackers!
>
> Around 2 months ago I noticed a problem: messages containing patches in the thread [1] were always processed with manual moderation. They appear in the hackers' thread hours after posting. None of them are from new CF members, and personally, I don't see a reason for such inconvenience. The problem still exists as of today.
>
> Can someone make changes in the moderation engine to make it more liberal and convenient for authors?
>
> [1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
>
> -- Best regards, Pavel Borisov
> Postgres Professional: http://postgrespro.com

Confirmed. -- Best regards, Maxim Orlov.
Re: [PATCH] Automatic HASH and LIST partition creation
On 2020-12-18 21:54, Pavel Borisov wrote: I've realized one strange effect in the current grammar parsing: if I do

CREATE TABLE foo (a int) PARTITION BY LIST (a) CONFIGURATION (a 1);
ERROR: unrecognized auto partition bound specification "a"

I consulted the patch code and realized that, in fact, the patch considers these (invalid) HASH bounds (it doesn't find the word 'modulus') unless partitioning is specified to be (still invalid) LIST. This is due to the fact that the grammar parser is not context-aware, and in the patch we tried to avoid the new parser keyword MODULUS. The effect is that inside the CONFIGURATION parentheses, in the case of HASH bounds, we don't have a single keyword for the parser to determine that it is really a HASH case. It doesn't make the patch work incorrectly, but it checks the validity of all types of bounds as in the HASH case even when the partitioning is not HASH. I find this slightly bogus. Because the parser cannot determine the type of partitioning inside the configuration clause, adding new syntax (e.g. adding RANGE partitioning configuration inside the CONFIGURATION parentheses) becomes complicated. So I have one more syntax proposal: to have separate keywords inside the CONFIGURATION parentheses for each partitioning type. E.g.:

CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN (1,2),(3,4) DEFAULT PARTITION foo_def);
CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES WITH MODULUS 3);
CREATE TABLE foo(a int) PARTITION BY RANGE(a) CONFIGURATION (FOR VALUES FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def);

This proposal is in accordance with the current syntax of declarative partitioning: CREATE TABLE foo_1 PARTITION OF foo FOR VALUES ... Some more optional proposals, incremental to the above: 1. Omit CONFIGURATION with/without parentheses. This makes the syntax closer to the (non-automatic) declarative partitioning syntax, but the clause seems less legible (in my opinion). 2. Omit just FOR VALUES.
This makes the clause short, but adds a difference from the (non-automatic) declarative partitioning syntax. I'm also planning to add RANGE partitioning syntax to this in the future, and I will be happy if all three types of the syntax could come along easily. I very much appreciate your views on this, especially since changes can still be made easily while the patch is not committed yet. -- Best regards, Pavel Borisov Postgres Professional: http://postgrespro.com

In my view, the following expressions are the golden mean here. On the one hand, they are not far from the original non-automatic declarative partitioning syntax; on the other hand, omitting the CONFIGURATION keyword (which is redundant here in terms of grammar parsing) would make these expressions less comprehensible for a human.

CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN (1,2),(3,4) DEFAULT PARTITION foo_def);
CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES WITH MODULUS 3);
CREATE TABLE foo(a int) PARTITION BY RANGE(a) CONFIGURATION (FOR VALUES FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def);

In addition to that, adding RANGE partitioning would be much simpler, since we would have specific "branches" in the grammar instead of using a context-sensitive grammar and dealing with it in C code. --- Best regards, Maxim Orlov.
SSL negotiation error on massive connect/disconnect
Hi! I have a primary server on port 55942 and two standbys on 55943 and 55944. I use a connection string like "postgresql://127.0.0.1:55942,127.0.0.1:55943,127.0.0.1:55944/postgres" to connect to the servers via psql. Everything works perfectly no matter how many connection attempts I make. But when I stop the primary server, very rarely I get an error "received invalid response to SSL negotiation" from psql. I got this error when I tried to run a massive connect/disconnect test, and it's unlikely to reproduce manually without thousands of sequential connections with no intentional delay in between. The problem is present only on Linux; macOS works fine. As far as I understand, this particular problem occurs because libpq gets a "zero" (i.e. 0) byte in SSLok in PQconnectPoll@src/interfaces/libpq/fe-connect.c. This leads to the "else" branch with the described error message. This may be fixed by handling a zero byte the same way as an 'E' byte. But I'm not sure if it's a good solution, since I don't know why fe-connect gets a zero byte at all. I consider it worth correcting this. The error is rare, but it's really odd to see this unexpected and wrong behavior.
--- Best regards, Maxim Orlov.From 860f14c99ec70fcea58897c97e218b8a54286a39 Mon Sep 17 00:00:00 2001 From: Maxim Orlov Date: Mon, 1 Mar 2021 16:47:41 +0300 Subject: [PATCH] WIP --- src/interfaces/libpq/Makefile | 3 + src/interfaces/libpq/fe-connect.c | 2 +- src/interfaces/libpq/t/004-multihost.pl | 143 3 files changed, 147 insertions(+), 1 deletion(-) create mode 100644 src/interfaces/libpq/t/004-multihost.pl diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index f74677eaf9b..98fd1d694ef 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -120,6 +120,9 @@ install: all installdirs install-lib installcheck: $(MAKE) -C test $@ +check: + $(prove_check) + installdirs: installdirs-lib $(MKDIR_P) '$(DESTDIR)$(includedir)' '$(DESTDIR)$(includedir_internal)' '$(DESTDIR)$(datadir)' diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index db71fea169c..43c14a4433a 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -3023,7 +3023,7 @@ keep_going: /* We will come back to here until there is conn->status = CONNECTION_MADE; return PGRES_POLLING_WRITING; } - else if (SSLok == 'E') + else if (SSLok == 'E' || SSLok == '\0') { /* * Server failure of some sort, such as failure to diff --git a/src/interfaces/libpq/t/004-multihost.pl b/src/interfaces/libpq/t/004-multihost.pl new file mode 100644 index 000..116f4a82635 --- /dev/null +++ b/src/interfaces/libpq/t/004-multihost.pl @@ -0,0 +1,143 @@ +# target_server_type +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 1; +use Scalar::Util qw(blessed); + +# Initialize master node +my $node_master = get_new_node('master'); +$node_master->init(allows_streaming => 1); +$node_master->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_master->start; + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); + +# Create 
streaming standby 1 linking to master +my $node_standby_1 = get_new_node('standby_1'); +$node_standby_1->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_1->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_1->start; + +# Create streaming standby 2 linking to master +my $node_standby_2 = get_new_node('standby_2'); +$node_standby_2->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_2->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_2->start; + +sub get_host_port +{ + my $node = shift; + return "$PostgresNode::test_localhost:" . $node->port; +} + +sub multiconnstring +{ + my $nodes= shift; + my $database = shift || "postgres"; + my $params = shift; + my $extra= ""; + if ($params) + { + my @cs; + while (my ($key, $val) = each %$params) + { + push @cs, $key . "=" . $val; + } + $extra = "?" . join("&", @cs); + } + my $str = + "postgresql://" + . join(",", map({ get_host_port($_) } @$nodes)) + . "/$database$extra"; + + return $str; +} + +# +# Copied from PosgresNode.pm passing explicit connect-string instead of +# constructed from object +# + +sub psql2 +{ + # We expect dbname t
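A minimal sketch of the zero-byte handling discussed above, detached from libpq (the enum and function names here are ours, not libpq's — libpq handles the reply byte inline in PQconnectPoll()): the one-byte reply to an SSLRequest is classified, and a zero byte, apparently seen when the freshly killed server closes the socket mid-handshake, is treated the same way as the 'E' error byte, so the client can fail over to the next host instead of reporting an invalid response.

```c
#include <assert.h>

/* Possible outcomes of reading the server's one-byte SSLRequest reply.
 * Illustrative names only. */
typedef enum
{
	SSL_REPLY_START_SSL,	/* 'S': server agrees to start SSL */
	SSL_REPLY_PLAINTEXT,	/* 'N': server refuses SSL, continue unencrypted */
	SSL_REPLY_SERVER_ERROR	/* 'E' (or, per the proposal, '\0'): failure */
} SslReplyAction;

/* Classify the reply byte; mapping '\0' to a server error is the
 * proposed change.  (Real libpq reports other garbage bytes as a
 * distinct "invalid response" error; we fold them together here.) */
static SslReplyAction
classify_ssl_reply(char SSLok)
{
	if (SSLok == 'S')
		return SSL_REPLY_START_SSL;
	if (SSLok == 'N')
		return SSL_REPLY_PLAINTEXT;
	return SSL_REPLY_SERVER_ERROR;
}
```

With this classification, a dead primary that yields a '\0' read falls into the same failure path as an explicit error, and the multi-host loop simply advances to the next connhost entry.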
Re: Pre-allocating WAL files
I checked the patch too and found it to be ok. Check and check-world pass. The overall idea seems good in my opinion, but I'm not sure where the optimal place to put the pre-allocation is. On Thu, Dec 30, 2021 at 2:46 PM Pavel Borisov wrote:
>> pre-allocating during checkpoints. I've done a few pgbench runs, and it seems to work pretty well. Initialization is around 15% faster, and I'm seeing about a 5% increase in TPS with a simple-update workload with wal_recycle turned off. Of course, these improvements go away once segments can be recycled.
>
> I've checked the patch v7. It applies cleanly, the code is good, and the check-world tests passed without problems. I think it's ok to use the checkpointer for this, and the overall patch can be committed. But the observed performance gain makes me think again before adding this feature. I did tests myself a couple of months ago and got similar results. I really don't know whether it is worth the effort.
>
> Wish you and all hackers a happy New Year!
> -- Best regards, Pavel Borisov
> Postgres Professional: http://postgrespro.com

--- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
> Perhaps we can merge some of the code cleanup that it contained, such as using XID_FMT everywhere and creating a type for the kind of page returned by TransactionIdToPage() to make the code cleaner.

Agree, I think this is a good idea.

> Is your patch functionally the same as the PostgresPro implementation?

Yes, it is. It basically is the PostgresPro implementation, not a concept or something like that. -- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
> Is there any documentation or README explaining this whole 64-bit XID mechanism?

There is none, unfortunately. I will come back to this later.

> Could you tell me what happens if a new tuple with an XID larger than xid_base + 0x is inserted into the page? Is such a new tuple not allowed to be inserted into that page? Or are xid_base and the xids of all existing tuples in the page increased? Also, what happens if one of those xids (of existing tuples) cannot be changed because the tuple can still be seen by a very-long-running transaction?

This whole mechanism works around heap_insert/heap_update by calling heap_page_prepare_for_xid(), and if it fails (due to a tuple still being visible) an error is raised. Also, if an xid_base shift is not viable, it will try to remove old tuples. -- Best regards, Maxim Orlov.
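To illustrate the per-page xid_base scheme discussed above, here is a rough sketch under our own names and simplifying assumptions (the real patch stores a 64-bit base in a special area of each heap page while tuple headers keep ordinary 32-bit xids, and it additionally reserves a few low values for special xids, which we ignore here): a tuple's full xid is the page base plus its stored short xid, and a given 64-bit xid can be represented on a page only if its offset from the base fits in 32 bits — otherwise the base must be shifted first, which is what heap_page_prepare_for_xid() arranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reconstruct a tuple's full 64-bit xid from the page's base and the
 * 32-bit value stored in the tuple header. */
static uint64_t
full_xid(uint64_t pd_xid_base, uint32_t short_xid)
{
	return pd_xid_base + short_xid;
}

/* Can a given 64-bit xid be represented on a page with this base?
 * If not, the base has to be shifted (possibly pruning dead tuples
 * first), or an error is raised when that is impossible because some
 * tuple is still visible to a long-running transaction. */
static bool
xid_fits_page(uint64_t pd_xid_base, uint64_t xid)
{
	return xid >= pd_xid_base && xid - pd_xid_base <= UINT32_MAX;
}
```

The second predicate captures exactly the failure mode asked about: a new xid outside the representable window forces a base shift, and a visible old tuple pinning the low end of the window makes that shift impossible.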
Re: UNIQUE null treatment option
I find this patch useful. It includes changes in documentation and tests. The code itself looks reasonable to me. Since the unique constraint check is done by the corresponding btree index, the implementation of this feature is elegant and lightweight. In my view, it is worth noting that a heap relation can have different NULL treatment in unique constraints for different unique columns. For example:

CREATE TABLE t (i INT UNIQUE NULLS DISTINCT, a INT UNIQUE NULLS NOT DISTINCT);

All the tests run ok on Linux and macOS. Although, the patch doesn't apply with default git apply options, only with the "three-way merge" option (-3). Please consider rebasing it. Then, in my view, it can be "Ready for Committer". -- Best regards, Maxim Orlov.
Re: O(n) tasks cause lengthy startups and checkpoints
The code seems to be in good condition. All the tests run ok with no errors. I like the whole idea of shifting additional checkpointer jobs as much as possible to another worker. In my view, it would be more appropriate to call this worker "bg cleaner" or "bg file cleaner" or something similar. It could be useful for systems with high load, which may deal with deleting many files at once, but I'm not sure about "small" installations: an extra bg worker needs more resources to do occasional deletion of small numbers of files. I really do not know how to do it better; maybe have two different code paths switched by a GUC? Should we also think about moving WAL pre-allocation into the custodian worker from the patch "Pre-allocating WAL files" [1]?

[1] https://www.postgresql.org/message-id/flat/20201225200953.jjkrytlrzojbn...@alap3.anarazel.de

-- Best regards, Maxim Orlov.
Failed assertion on standby while shutdown
Hi, haсkers! Recently, I was doing some experiments with primary/standby instances interaction. In certain conditions I’ve got and was able to reproduce crash on failed assertion. The scenario is the following: 1. start primary server 2. start standby server by pg_basebackup -P -R -X stream -c fast -p5432 -D data 3. apply some load to the primary server by pgbench -p5432 -i -s 150 postgres 4. kill primary server (with kill -9) and keep it down 5. stop standby server by pg_ctl 6. run standby server Then any standby server termination will result in a failed assertion. The log with a backtrace is following: 2021-03-19 18:54:25.352 MSK [3508443] LOG: received fast shutdown request 2021-03-19 18:54:25.379 MSK [3508443] LOG: aborting any active transactions TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452) postgres: walreceiver (ExceptionalCondition+0xd0)[0x55d0526f] postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x55b43e31] postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x5574ae32] postgres: walreceiver (+0x530bff)[0x55a84bff] postgres: walreceiver (+0x531044)[0x55a85044] postgres: walreceiver (+0x530959)[0x55a84959] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x77a303c0] /lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x772a40da] postgres: walreceiver (+0x52bea4)[0x55a7fea4] postgres: walreceiver (PostmasterMain+0x129f)[0x55a7f7c1] postgres: walreceiver (+0x41ff1f)[0x55973f1f] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x771b30b3] postgres: walreceiver (_start+0x2e)[0x5561abfe] After a brief investigation I found out that I can get this assert with 100% probability if I insert a sleep for about 5 sec into InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c: diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 897045ee272..b5f365f426d 100644 --- 
a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -525,7 +525,7 @@ InitAuxiliaryProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
-
+	pg_usleep(500L);
 	/*
 	 * We use the ProcStructLock to protect assignment and releasing of
 	 * AuxiliaryProcs entries.

Maybe this kind of behaviour would appear if the computer hosting the instances is under significant side load, which causes a delay in starting db instances under heavy load. The configuration for the primary server is the default with "wal_level = logical". The configuration for the standby server is the default with "wal_level = logical" and "primary_conninfo = 'port=5432'". I'm puzzled by this behavior. I'm pretty sure it is not what should happen. Any ideas how this can be fixed? --- Best regards, Maxim Orlov.
Re: Failed assertion on standby while shutdown
On 2021-03-22 16:40, Fujii Masao wrote: On 2021/03/20 2:25, Maxim Orlov wrote: Hi, haсkers! Recently, I was doing some experiments with primary/standby instances interaction. In certain conditions I’ve got and was able to reproduce crash on failed assertion. The scenario is the following: 1. start primary server 2. start standby server by pg_basebackup -P -R -X stream -c fast -p5432 -D data 3. apply some load to the primary server by pgbench -p5432 -i -s 150 postgres 4. kill primary server (with kill -9) and keep it down 5. stop standby server by pg_ctl 6. run standby server Then any standby server termination will result in a failed assertion. The log with a backtrace is following: 2021-03-19 18:54:25.352 MSK [3508443] LOG: received fast shutdown request 2021-03-19 18:54:25.379 MSK [3508443] LOG: aborting any active transactions TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452) postgres: walreceiver (ExceptionalCondition+0xd0)[0x55d0526f] postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x55b43e31] postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x5574ae32] postgres: walreceiver (+0x530bff)[0x55a84bff] postgres: walreceiver (+0x531044)[0x55a85044] postgres: walreceiver (+0x530959)[0x55a84959] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x77a303c0] /lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x772a40da] postgres: walreceiver (+0x52bea4)[0x55a7fea4] postgres: walreceiver (PostmasterMain+0x129f)[0x55a7f7c1] postgres: walreceiver (+0x41ff1f)[0x55973f1f] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x771b30b3] postgres: walreceiver (_start+0x2e)[0x5561abfe] After a brief investigation I found out that I can get this assert with 100% probability if I insert a sleep for about 5 sec into InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c: diff --git a/src/backend/storage/lmgr/proc.c 
b/src/backend/storage/lmgr/proc.c index 897045ee272..b5f365f426d 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -525,7 +525,7 @@ InitAuxiliaryProcess(void) if (MyProc != NULL) elog(ERROR, "you already exist"); - + pg_usleep(500L); /* * We use the ProcStructLock to protect assignment and releasing of * AuxiliaryProcs entries. Thanks for the report! I could reproduce this issue by adding that sleep into InitAuxiliaryProcess(). Maybe, this kinda behaviour would appear if a computer hosting instances is under significant side load, which cause delay to start db-instances under a heavy load. Configuration for a primary server is default with "wal_level = logical" Configuration for a standby server is default with "wal_level = logical" and "primary_conninfo = 'port=5432'" I'm puzzled with this behavor. I'm pretty sure it is not what should be. Any ideas how this can be fixed? ISTM that the cause of this issue is that the startup process exits without releasing the locks that it was holding when shutdown is requested. To address this issue, IMO the startup process should call ShutdownRecoveryTransactionEnvironment() at its exit. Attached is the POC patch that changes the startup process that way. I've not tested the patch enough yet.. Regards, Thank you for reply! As far as I understand, this is really the case. I've test your patch a bit. This annoying failed assertion is gone now. I think I should test more and report later about results. Should we put this patch to CF? --- Best regards, Maxim Orlov.
Re: Failed assertion on standby while shutdown
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

All the tests passed successfully.

The new status of this patch is: Ready for Committer
Re: Failed assertion on standby while shutdown
On 2021-03-30 20:44, Maxim Orlov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: not tested
> All the tests passed successfully.
> The new status of this patch is: Ready for Committer

The patch is good. One note: should we put a comment about ShutdownRecoveryTransactionEnvironment's non-reentrant behaviour? Or maybe rename it to ShutdownRecoveryTransactionEnvironmentOnce? --- Best regards, Maxim Orlov.
Re: Failed assertion on standby while shutdown
On 2021-04-01 15:02, Fujii Masao wrote:
> On 2021/03/31 19:51, Maxim Orlov wrote:
>> On 2021-03-30 20:44, Maxim Orlov wrote:
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world: tested, passed
>>> Implements feature: tested, passed
>>> Spec compliant: not tested
>>> Documentation: not tested
>>> All the tests passed successfully.
>>> The new status of this patch is: Ready for Committer
>>
>> The patch is good. One note, should we put a comment about ShutdownRecoveryTransactionEnvironment not reentrant behaviour? Or maybe rename it to ShutdownRecoveryTransactionEnvironmentOnce?
>
> +1 to add more comments into ShutdownRecoveryTransactionEnvironment(). I did that. What about the attached patch?
>
> Regards,

Well done! In my view, it is just what is needed. --- Best regards, Maxim Orlov.
Re: SSL negotiation error on massive connect/disconnect
On 2021-03-01 17:22, Maxim Orlov wrote: Hi! I have primary server on port 55942 and two standbys on 55943 and 55944. Then use connection string like "postgresql://127.0.0.1:55942,127.0.0.1:55943,127.0.0.1:55944/postgres" to connect to the servers via psql. Everything works perfectly no matter how many attempts to connect I do. But when I stop primary server, very rarely I do get an error "received invalid response to SSL negotiation" from psql. I got this error when I tried to make massive connects/disconnects test and it's unlikely to reproduce manually without thousands of connections sequentially with no intentional delay in between. The problem present only on Linux, MacOS works fine. As far as I understand this particular problem is because of postgresql gets "zero" (i.e. 0) byte in SSLok in PQconnectPoll@src/interfaces/libpq/fe-connect.c. This lead to select "else" branch with described error message. This may be fixed by handling zero byte as 'E' byte. But I'm not sure if it's good solution, since I don't know why fe-connect gets an zero byte at all. I consider it's worth to correct this. This error is rare but it's really odd to notice this unexpected and wrong behavior. --- Best regards, Maxim Orlov. Correction. Previous patch was wrong. The proper one is here. 
--- Best regards, Maxim Orlov.From fcc2392b1942d58e1de6749b65589757fb35322f Mon Sep 17 00:00:00 2001 From: Maxim Orlov Date: Thu, 8 Apr 2021 18:12:46 +0300 Subject: [PATCH] WIP-2 --- src/interfaces/libpq/Makefile | 3 + src/interfaces/libpq/fe-connect.c | 5 + src/interfaces/libpq/t/004-multihost.pl | 143 3 files changed, 151 insertions(+) create mode 100644 src/interfaces/libpq/t/004-multihost.pl diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index 0c4e55b6ad3..aa9328f98b0 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -120,6 +120,9 @@ install: all installdirs install-lib installcheck: $(MAKE) -C test $@ +check: + $(prove_check) + installdirs: installdirs-lib $(MKDIR_P) '$(DESTDIR)$(includedir)' '$(DESTDIR)$(includedir_internal)' '$(DESTDIR)$(datadir)' diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index aa654dd6a8e..8aa017d858e 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -2319,6 +2319,7 @@ keep_going: /* We will come back to here until there is conn->try_next_addr = false; } +keep_going2: /* Time to advance to next connhost[] entry? 
*/ if (conn->try_next_host) { @@ -3081,6 +3082,10 @@ keep_going: /* We will come back to here until there is conn->status = CONNECTION_AWAITING_RESPONSE; goto keep_going; } + else if (SSLok == '\0') + { + goto keep_going2; + } else { appendPQExpBuffer(&conn->errorMessage, diff --git a/src/interfaces/libpq/t/004-multihost.pl b/src/interfaces/libpq/t/004-multihost.pl new file mode 100644 index 000..116f4a82635 --- /dev/null +++ b/src/interfaces/libpq/t/004-multihost.pl @@ -0,0 +1,143 @@ +# target_server_type +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 1; +use Scalar::Util qw(blessed); + +# Initialize master node +my $node_master = get_new_node('master'); +$node_master->init(allows_streaming => 1); +$node_master->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_master->start; + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); + +# Create streaming standby 1 linking to master +my $node_standby_1 = get_new_node('standby_1'); +$node_standby_1->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_1->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_1->start; + +# Create streaming standby 2 linking to master +my $node_standby_2 = get_new_node('standby_2'); +$node_standby_2->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_2->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_2->start; + +sub get_host_port +{ + my $node = shift; + return "$PostgresNode::test_localhost:" . $node->port; +} + +sub multiconnstring +{ + my $nodes= shift; + my $database = shift || "postgres"; + my $params = shift; + my $extra= ""; + if ($params) + { + my @cs; + while (my ($key, $val) = each %$params) + { + push @cs, $key . "=" . $val; + } + $extra = "?" . join("&", @cs); +
[PATCH] lazy relations delete
Hi! Here is the case. Assume we have a master-to-slave replication setup with shared_buffers set to 2 GB at the master and 4 GB at the slave. All of the data is written to the master, while reading occurs from the slave. Now we decide to drop many tables, let's say 1000 or 1, not in a single transaction but each table in a separate one. So, due to the "plain" shared_buffers memory, we have to do a for-loop over all buffers for every relation, which leads to a lag between master and slave. In a real-world scenario such an issue leads not to minutes of lag, but hours. At the same time, PostgreSQL has a great routine to delete many relations in a single transaction. So, to get rid of this kind of issue, here is an idea: what if we do not invalidate buffers for each dropped relation right away, but instead store the dropped relations in an array and prevent the shared buffers corresponding to the deleted relations from being flushed. Then, when the array reaches its maximum size, we walk all buffers only once to "free" the shared buffers corresponding to the deleted relations. Here are some values from the tests I made.

Without patch:
1. (master 2 GB) - drop 1000 tables took 6 sec; (slave 4 GB) - drop 1000 tables took 8 sec
2. (master 4 GB) - drop 1000 tables took 10 sec; (slave 8 GB) - drop 1000 tables took 16 sec
3. (master 10 GB) - drop 1000 tables took 22 sec; (slave 20 GB) - drop 1000 tables took 38 sec

With patch:
1. (master 2 GB) - drop 1000 tables took 2 sec; (slave 4 GB) - drop 1000 tables took 2 sec
2. (master 4 GB) - drop 1000 tables took 3 sec; (slave 8 GB) - drop 1000 tables took 3 sec
3.
(master 10 GB) - drop 1000 tables took 4 sec (slave 20 GB) - drop 1000 tables took 4 sec -- Max Orlov E-mail: m.or...@postgrespro.ru init.sh Description: application/shellscript run-001.sh Description: application/shellscript run-001-core.sh Description: application/shellscript diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 1f10a97dc78..e0d1fb6a824 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -66,8 +66,6 @@ #define BUF_WRITTEN0x01 #define BUF_REUSABLE 0x02 -#define DROP_RELS_BSEARCH_THRESHOLD 20 - typedef struct PrivateRefCountEntry { Buffer buffer; @@ -410,6 +408,72 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref) } } +/* + * Lazy relations delete mechanism. + * + * On relation drop no need to walk shared buffers every time, just put a deleted + * RelFileNode into an array. Thus we store RelFileNodes in RelNodeDeleted + * struct and delete them later when number of deleted relations will + * exceed LAZY_DELETE_ARRAY_SIZE. + */ + +#define LAZY_DELETE_ARRAY_SIZE 128 + +/* Wrapper for array of lazy deleted RelFileNodes. */ +typedef struct RelNodeDeletedBuffer +{ + RelFileNode rnodes[LAZY_DELETE_ARRAY_SIZE]; /* Array of deleted */ + int idx; /* Current position */ +} RelNodeDeletedBuffer; + +static RelNodeDeletedBuffer *RelNodeDeleted = NULL; + +/* + * Initialize shared array of lazy deleted relations. + * + * This is called once during shared-memory initialization (either in the + * postmaster, or in a standalone backend). + */ +void InitRelNodeDeletedBuffer(void) +{ + bool found; + + RelNodeDeleted = ShmemInitStruct("Lazy Deleted Relations", +sizeof(RelNodeDeletedBuffer), +&found); + + if (!found) + MemSet(RelNodeDeleted, 0, sizeof(RelNodeDeletedBuffer)); +} + +/* + * Compute the size of shared memory for the buffer of lazy deleted relations. 
+ */ +Size +RelNodeDeletedBufferSize(void) +{ + Size size = 0; + + size = add_size(size, sizeof(RelNodeDeletedBuffer)); + + return size; +} + +/* + * Check for relation is in lazy deleted buffer (up to n-th position). + */ +static inline bool +IsRelFileNodeDeleted(RelFileNode rnode, int n) +{ + int i; + + for (i = 0; i < n; i++) + if (RelFileNodeEquals(rnode, RelNodeDeleted->rnodes[i])) + return true; + + return false; +} + /* * BufferIsPinned * True iff the buffer is pinned (also checks for valid buffer number). @@ -452,6 +516,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr, BlockNumber blockNum, BufferAccessStrategy strategy, bool *foundPtr); +static void InvalidateRelFileNodesBuffers(RelFileNode *nodes, int nnodes); static void FlushBuffer(BufferDesc *buf, SMgrRelation reln); static void AtProcExit_Buffers(int code, Datum arg); static void CheckForBufferLeaks(void); @@ -2690,6 +2755,22 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln) char *bufToWrite; uint32 buf_state; + LWLockAcquire(LazyRelDeleteLock, LW_SHARED); + + /* + * If rnode is in lazy deleted buffer clear dirty and checkpoint flags. + */ + if (IsRelFileNodeDeleted(buf->tag.rnode, RelNodeDeleted->idx)) + { + buf_state = LockBufHdr(buf); + buf_state &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED); + UnlockBufHdr(buf, buf_state); + LWLockRelease(LazyRelDeleteLock); + return; + } + + LWLockRelease(LazyRelDeleteLock); + /* * Acquire the buf
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 21 Feb 2023 at 16:59, Aleksander Alekseev wrote:
> Hi,
>
> One thing that still bothers me is that during the upgrade we only migrate the CLOG segments (i.e. pg_xact / pg_clog) and completely ignore all the rest of the SLRUs:
>
> * pg_commit_ts
> * pg_multixact/offsets
> * pg_multixact/members
> * pg_subtrans
> * pg_notify
> * pg_serial

Hi! We do ignore these, since in order to run pg_upgrade the server must be properly stopped, and no transactions can outlast that moment. -- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
On Mon, 30 Jan 2023 at 23:23, Corey Huinker wrote:
> I rebased, but there are no code differences.

The patch set seems to be in good shape and has been pretty stable for quite a while. Could you add one more minor improvement, a blank line after the variable declarations?

+ int exit_code = wait_result_to_exit_code(result);
+ char buf[32];
...here
+ snprintf(buf, sizeof(buf), "%d", exit_code);
+ SetVariable(pset.vars, "SHELL_EXIT_CODE", buf);
+ SetVariable(pset.vars, "SHELL_ERROR", "true");

+ char exit_code_buf[32];
... and here
+ snprintf(exit_code_buf, sizeof(exit_code_buf), "%d",
+ wait_result_to_exit_code(exit_code));
+ SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+ SetVariable(pset.vars, "SHELL_ERROR", "true");

After these changes, I think we can make this patch RfC, shall we? -- Best regards, Maxim Orlov.
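The wait_result_to_exit_code() helper referenced above converts the raw status returned by system()/wait() into a shell-style exit code for the SHELL_EXIT_CODE variable. As a rough illustration of what such a helper does (this is our own sketch, not the patch's exact code; in particular the 128+signal convention for signalled children is an assumption borrowed from shells):

```c
#include <assert.h>
#include <sys/wait.h>

/* Decode a wait(2)-style status: the child's own exit status if it
 * exited normally, 128 plus the signal number if it was killed by a
 * signal, and -1 for anything else (stopped/continued/unrecognized). */
static int
decode_wait_status(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	return -1;
}
```

With such a decoder, psql can expose a single small integer to scripts regardless of whether the child exited or was signalled.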
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Mon, 27 Feb 2023 at 12:10, Heikki Linnakangas wrote:
> (CC list trimmed, gmail wouldn't let me send otherwise)
>
> That sounds right for pg_serial, pg_notify, and pg_subtrans. But not for
> pg_commit_ts and the pg_multixacts.
>
> This needs tests for pg_upgrading those SLRUs, after 0, 1 and N
> wraparounds.
>

Yep, that's my fault. I've forgotten about pg_multixacts. But for pg_commit_ts it's a more complicated story. The thing is, if we do an upgrade, the old files from pg_commit_ts are not copied into the new server. For example, I've checked one more time on the current master branch:

1). initdb
2). add "track_commit_timestamp = on" into postgresql.conf
3). pgbench
4). $ ls pg_commit_ts/
    0000 0005 000A 000F 0014 0019 001E 0023 ... 009A 009F 00A4 00A9 00AE 00B3 00B8 00BD 00C2
5). do pg_upgrade
6). $ ls pg_commit_ts/
    00C2

Either I do not understand something, or the files from the pg_commit_ts directory are not copied.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 17 Jan 2023 at 16:33, Aleksander Alekseev wrote:
> Hi hackers,
>
> Maxim, perhaps you could share with us what your reasoning was here?
>

I'm really sorry for the late response, but better late than never. Yes, we cannot access shared memory without a lock. In this particular case, we use XidGenLock. That is why we use the lock argument: to take the lock if it was not taken previously. Actually, we may place an assertion here to insist on that. As for the xid comparison: we do not compare xids here, we are checking for wraparound, so, AFAICS, this code is correct.

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas wrote:
> 1. Use 64 bit page numbers in SLRUs (this patch)
>
> 2. Use the larger segment file names in async.c, to lift the current 8
> GB limit on the max number of pending notifications.
>
> 3. Extend pg_multixact so that pg_multixact/members is addressed by
> 64-bit offsets.
>
> 4. Extend pg_subtrans to 64-bits.
>
> 5. Extend pg_xact to 64-bits.
>
> 6. (a bonus thing that I noticed while thinking of pg_xact.) Extend
> pg_twophase.c, to use FullTransactionIds.
>
> Currently, the twophase state files in pg_twophase are named according
> to the 32 bit Xid of the transaction. Let's switch to FullTransactionId
> there.
>
> ...
>
> I propose that we try to finish 1 and 2 for v16. And maybe 6. I think
> that's doable. It doesn't have any great user-visible benefits yet, but
> we need to start somewhere.
>
> - Heikki
>

Yes, this is a great idea! My only concern is that we're going in circles here. You see, patch 1 is what was proposed at the very beginning of this thread. Anyway, I will be happy if we are able to push this topic forward.

As for making pg_multixact 64 bit, I spent the last couple of days making a proper pg_upgrade for pg_multixacts and for pg_xacts with wraparound, and I've come to understand that it is not a simple task compared to pg_xacts. The problem is that we do not have an epoch for multixacts, so we do not have the ability to "overcome" the wraparound.
The solution may be adding some kind of epoch for multixacts, or making them 64 bit in the "main" 64-xid patch, but from the perspective of this thread, in my view, this should be last in line here. In pg_xact we do not have such a problem: we do have an epoch for transactions, so the conversion should be pretty obvious:

0000 -> 00000000
0001 -> 00000001
...
0FFE -> 00000FFE
0FFF -> 00000FFF
0000 -> 00010000
0001 -> 00010001

So, in my view, the plan should be:
1. Use internal 64 bit page numbers in SLRUs without changing segment naming.
2. Use the larger segment file names in async.c, to lift the current 8 GB limit on the max number of pending notifications.
3. Extend pg_xact to 64-bits.
4. Extend pg_subtrans to 64-bits.
5. Extend pg_multixact so that pg_multixact/members is addressed by 64-bit offsets.
6. Extend pg_twophase.c, to use FullTransactionIds. (a bonus thing)

Thoughts?

-- Best regards, Maxim Orlov.

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas wrote:
> On 01/03/2023 12:21, Aleksander Alekseev wrote:
> > Hi,
> >
> >> I'm surprised that these patches extend the page numbering to 64 bits,
> >> but never actually uses the high bits. The XID "epoch" is not used, and
> >> pg_xact still wraps around and the segment names are still reused. I
> >> thought we could stop doing that.
> >
> > To clarify, the idea is to let CLOG grow indefinitely and simply store
> > FullTransactionId -> TransactionStatus (two bits). Correct?
>
> Correct.
>
> > I didn't investigate this in much detail but it may affect quite some
> > amount of code since TransactionIdDidCommit() and
> > TransactionIdDidAbort() currently both deal with TransactionId, not
> > FullTransactionId. IMO, this would be a nice change however, assuming
> > we are ready for it.
>
> Yep, it's a lot of code churn..
>
> > In the previous version of the patch there was an attempt to derive
> > FullTransactionId from TransactionId but it was wrong for the reasons
> > named above in the thread. Thus it was removed and the patch
> > simplified.
> Yeah, it's tricky to get it right. Clearly we need to do it at some
> point though.
>
> All in all, this is a big effort. I spent some more time reviewing this
> in the last few days, and thought a lot about what the path forward here
> could be. And I haven't looked at the actual 64-bit XIDs patch set yet,
> just this patch to use 64-bit addressing in SLRUs.
>
> This patch is the first step, but we have a bit of a chicken and egg
> problem, because this patch on its own isn't very interesting, but on
> the other hand, we need it to work on the foll
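The pg_xact segment-name conversion mapping described earlier in this message can be sketched as a small helper. Note the assumptions: the 16-bit epoch shift and the 8-hex-digit output width are taken from the example mapping above, not from the actual patch, and the function name is mine.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of widening an old 4-hex-digit pg_xact segment name by
 * prepending the xid epoch.  Illustrative only: the real patch may use
 * different widths and helpers.
 */
static void
convert_segment_name(const char *old_name, uint32_t epoch,
					 char *out, size_t outsz)
{
	uint64_t	segno = strtoull(old_name, NULL, 16);
	uint64_t	full_segno = ((uint64_t) epoch << 16) | segno;

	snprintf(out, outsz, "%08llX", (unsigned long long) full_segno);
}
```

With epoch 0 this reproduces the "0001 -> 00000001" rows of the mapping, and with epoch 1 the post-wraparound "0001 -> 00010001" row.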
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 7 Mar 2023 at 15:38, Heikki Linnakangas wrote:
>
> That is true for pg_multixact/offsets. We will indeed need to add an
> epoch and introduce the concept of FullMultiXactIds for that. However,
> we can change pg_multixact/members independently of that. We can extend
> MultiXactOffset from 32 to 64 bits, and eliminate pg_multixact/members
> wraparound, while keeping multi-xids 32 bits wide.
>

Yes, you're totally correct. If it can be committed that way, I'm all for it.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! As suggested before by Heikki Linnakangas, I've added a patch making the 2PC transaction state 64-bit. At first, my intention was to rebuild the whole twophase interface to use FullTransactionId. But doing this in a proper manner would lead to switching from TransactionId to FullTransactionId in PGPROC, and the patch would become too big to handle here. So I decided to keep it simple for now: use the wrap-logic trick and compute the FullTransactionId from the current epoch, since the span of active xids cannot exceed one epoch at any given time.

Patches 1 and 2 are the same as above.

-- Best regards, Maxim Orlov.

v57-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
v57-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data
v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
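The "wrap logic trick" mentioned above can be sketched roughly as follows. The types and names here are simplified stand-ins for the PostgreSQL ones, not the patch's actual code; the only assumption it relies on is the one stated above, that live xids span at most one epoch.

```c
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t FullTransactionId;	/* high 32 bits: epoch, low 32: xid */

/*
 * Sketch: recover a FullTransactionId from a 32-bit xid using the
 * current "next" full xid.  If the xid is numerically ahead of the
 * next xid, it must belong to the previous epoch.
 */
static FullTransactionId
widen_xid(TransactionId xid, FullTransactionId next_full_xid)
{
	uint32_t	epoch = (uint32_t) (next_full_xid >> 32);

	if (xid > (TransactionId) next_full_xid)
		epoch--;				/* xid is from the previous epoch */

	return ((FullTransactionId) epoch << 32) | xid;
}
```

This is the same shape of computation PostgreSQL uses elsewhere to widen 32-bit xids against a known next-xid reference point.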
Re: Add SHELL_EXIT_CODE to psql
I took a look at the patch, and it seems to be in good condition. It implements the declared functionality with no visible defects. As for tests, I think it is possible to implement the "signals" case in TAP tests. But let the actual committer decide whether it is worth it or not.

-- Best regards, Maxim Orlov.
Re: [PATCH] Add initial xid/mxid/mxoff to initdb
On Mon, 20 Mar 2023 at 22:31, Gregory Stark (as CFM) wrote: > > So is that other thread tracked in a different commitfest entry and > this one completely redundant? I'll mark it Rejected then? > Yep, it appears so. -- Best regards, Maxim Orlov.
Refactoring of pg_resetwal/t/001_basic.pl
Hi! In commit 7b5275eec more tests and test coverage were added to pg_resetwal/t/001_basic.pl. All the added stuff is pretty useful in my view. Unfortunately, some magic constants were used. Overall, this is not a problem. But while working on 64 bit XIDs I noticed these changes and spent some time figuring out what these magic values stand for. And it turns out that I'm not the only one. So, at Svetlana Derevyanko's suggestion, I made this patch. I added constants, just like we did in the verify_heapam tests.

A side note here: the defines in multixact.c use the TransactionId type, but I'm sure this is not correct, since we're dealing here with MultiXactId and MultiXactOffset. For now, this is obviously not a problem, since the sizes of these particular types are equal. But it will manifest itself when we switch to 64 bit types for MultiXactOffset or MultiXactId.

As always, reviews and opinions are very welcome!

-- Best regards, Maxim Orlov.

v1-0001-Refactor-pg_resetwal-t-001_basic.pl.patch Description: Binary data
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
I've noticed this patch and had a quick look at it. As far as I understand, this bug does not lead to incorrect matching; it results only in a degradation in speed. Anyway, I consider this patch useful and hope it will be committed soon.

-- Best regards, Maxim Orlov.
Re: Refactoring of pg_resetwal/t/001_basic.pl
On Fri, 22 Mar 2024 at 01:08, Peter Eisentraut wrote:
> Please send a separate patch for this if you want to propose any changes.
>

Thank you for your reply. Here is the second one. I've changed the types and argument names for the macro functions, so that they better reflect reality.

-- Best regards, Maxim Orlov.

v1-0002-Use-proper-types-in-defines-in-multixact.c.patch Description: Binary data
Re: POC: GROUP BY optimization
Hi! Another issue with the test introduced in 0452b461bc405. I think it may be unstable in some circumstances. For example, if we try to use a different BLCKSZ. See, I've made a little change in the number of tuples to be inserted:

$ git diff
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index d6ed5d0eff..414078d4ec 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;

And bulk extension kicks in, so we get zeroed pages in the relation. The plan also changes, switching from an index scan to a seq scan:

@@ -2734,7 +2734,7 @@
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 -- GROUP BY optimization by reorder columns by frequency
@@ -2760,62 +2760,57 @@
 -- Engage incremental sort
 explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
-                   QUERY PLAN
-------------------------------------------------
+       QUERY PLAN
+------------------------------
  Group
    Group Key: x, y, z, w
-   ->  Incremental Sort
+   ->  Sort
          Sort Key: x, y, z, w
-         Presorted Key: x, y
-         ->  Index Scan using btg_x_y_idx on btg
-(6 rows)
+         ->  Seq Scan on btg
+(5 rows)

... and so on.

So, my proposal is simple. I think we need not just "ANALYZE btg", but "VACUUM ANALYZE btg", to get rid of the zeroed pages in this particular case. PFA the corresponding patch.

-- Best regards, Maxim Orlov.

0001-Force-VACUUM-to-avoid-zeroed-pages-from-bulk-extensi.patch Description: Binary data
Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
Quite an interesting patch, in my opinion. I've decided to work on it a bit, did some refactoring (sorry) and added basic tests. Also, I tried to take into account as many as possible of the notes on the patch mentioned by Cédric Villemain.

> and maybe better to go with FlushOneBuffer() ?

It's a good idea, but I'm not sure at the moment. I'll try to dig deeper into it. At least, FlushOneBuffer does not work for local buffers. So, we have to decide whether pg_buffercache_invalidate should or should not work for local buffers. For now, I don't see why it should not. Maybe I'm missing something?

-- Best regards, Maxim Orlov.

v4-0001-Invalidate-Buffer-By-Bufnum.patch Description: Binary data
Re: Add Index-level REINDEX with multiple jobs
On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote:
>
> The problem may be actually trickier than that, no? Could there be
> other factors to take into account for their classification, like
> their sizes (typically, we'd want to process the biggest one first, I
> guess)?

Sorry for the late reply, and thanks for the explanation. This sounds reasonable to me. Svetlana has addressed this in the patch v2.

-- Best regards, Maxim Orlov.
Re: CI speed improvements for FreeBSD
Hi! I looked at the changes and I liked them. Here are my thoughts:

0001:
1. I think this is a good idea to use RAM. Since it's still a UFS, we lose nothing in terms of testing, but win significantly in speed.
2. The change from "swapoff -a || true" to "swapoff -a" is legit in my view, since it's better to fail explicitly than to silence any possible problem.
3. The man page says that lowercase suffixes should be used for mdconfig. But in fact, you can use either lowercase or uppercase. Yep, it's in mdconfig.c: "else if (*p == 'g' || *p == 'G')".

0002:
1. The resource usage should be a bit higher, this is for sure. But, if I'm not missing something, not drastically. Anyway, I do not know how to measure this increase to get concrete values.
2. And think of the potential benefits of increasing the number of test jobs: more concurrent processes, more interactions, better test coverage.

Here are my runs:
FreeBSD @master https://cirrus-ci.com/task/4934701194936320 Run test_world 05:56
FreeBSD @master + 0001 https://cirrus-ci.com/task/5921385306914816 Run test_world 05:06
FreeBSD @master + 0001, + 0002 https://cirrus-ci.com/task/5635288945393664 Run test_world 02:20
For comparison, Debian @master https://cirrus-ci.com/task/5143705577848832 Run test_world 02:38

Overall, I consider these changes useful. CI runs faster, with better test coverage, in exchange for a presumably slight increase in resource usage, but I don't think this increase should be significant.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! As a result of the discussion in the thread [0], Robert Haas proposed to focus on making SLRU 64 bit, as a first step towards 64 bit XIDs. Here is the patch set.

Overall, the code of this patch set is based on the existing code from [0] and may be simplified, due to the fact that SLRU_PAGES_PER_SEGMENT is not meant to be changed now. But I decided to leave it that way, at least for now.

As always, reviews and opinions are very welcome. Should we change the status of this thread to "needs review"?

[0] https://www.postgresql.org/message-id/flat/CA%2BTgmoZFmTGjgkmjgkcm2-vQq3_TzcoMKmVimvQLx9oJLbye0Q%40mail.gmail.com#03a4ab82569bb7b112db4a2f352d96b9

-- Best regards, Maxim Orlov.

v51-0003-Make-pg_upgrade-from-32-bit-to-64-bit-SLRU.patch Description: Binary data
v51-0001-Use-internal-64-bit-numbering-of-SLRU-pages.patch Description: Binary data
v51-0002-Use-64-bit-pages-representation-in-SLRU-callers.patch Description: Binary data
False positive warning in verify_heapam.c with GCC -O3
Hi! While playing with some stuff unrelated to the topic, I've noticed a strange warning: verify_heapam.c:730:25: warning: ‘xmax_status’ may be used uninitialized in this function. This happens only when get_xid_status is inlined, and only with GCC at -O3. I use GCC version 11.3.0. For the purpose of investigation, I've created a patch (PFA) to force get_xid_status inline.

$ CFLAGS="-O3" ./configure -q && make -s -j12 >/dev/null && make -s -j12 -C contrib
verify_heapam.c: In function ‘check_tuple_visibility’:
verify_heapam.c:730:25: warning: ‘xmax_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  730 |                         XidCommitStatus xmax_status;
      |                         ^~~
verify_heapam.c:909:25: warning: ‘xmin_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  909 |                 else if (xmin_status != XID_COMMITTED)
      |

I believe this warning is a false positive, since the mentioned row is unreachable. If the xid is invalid, get_xid_status returns XID_INVALID, and we cannot get past

  770     if (HeapTupleHeaderXminInvalid(tuphdr))
  771         return false;    /* inserter aborted, don't check */

On the other hand, we could simply initialize the status variable in get_xid_status and make this code more errors/warnings safe. Thoughts?

-- Best regards, Maxim Orlov.

0002-Init-status-var-in-get_xid_status.patch Description: Binary data
0001-Force-inline-of-get_xid_status.patch Description: Binary data
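The second option, initializing the status variable inside get_xid_status, boils down to the usual defensive pattern for out parameters. The sketch below is not the verify_heapam code; the enum values and the lookup logic are hypothetical stand-ins, only the initialization idea is the point.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
	XID_INVALID,
	XID_COMMITTED,
	XID_ABORTED
} XidCommitStatus;

/*
 * Sketch of the proposed fix: always set the out parameter first, so
 * the compiler can prove it is initialized on every path, even after
 * inlining.  The status lookup below is a hypothetical stand-in.
 */
static bool
get_xid_status(uint32_t xid, XidCommitStatus *status)
{
	*status = XID_INVALID;		/* silences -Wmaybe-uninitialized */

	if (xid == 0)
		return false;			/* invalid xid: caller must not use *status */

	*status = (xid % 2) ? XID_COMMITTED : XID_ABORTED;	/* stand-in */
	return true;
}
```

The cost is one extra store on the invalid-xid path, in exchange for code that stays warning-clean regardless of how aggressively the compiler inlines.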
Re: Add 64-bit XIDs into PostgreSQL 15
Hi! I want to make a quick summary here.

1. An overall consensus has been reached: we shall focus on committing the SLRU changes first.
2. I've created an appropriate patch set here [0].
3. Now [0] is waiting for a review. As always, all opinions are welcome.
4. While discussing error/warning messages and some other stuff, this thread was marked as "Waiting on Author".
5. I rebase this patch set once a week, but do not post it here, since there is no need for that. See (1).
6. For now, I don't understand what changes I have to make here. So, is "Waiting on Author" an appropriate status here?

Anyway. Let's discuss the on-disk page format, shall we? AFAICS, we have the following options:
1. Making "true" 64-bit XIDs, i.e. making every tuple have 64-bit xmin and xmax fields.
2. Putting a special area in every page where the base for XIDs is stored. This is what we have done in the current patch set.
3. Putting the base for XIDs in a fork.
4. Making explicit 64-bit XIDs for concrete relations, i.e. CREATE TABLE foo WITH (xid8) or something like that.

There were opinions that the proposed solution (2) is not optimal. It would be great to hear your concerns and thoughts.

[0] https://www.postgresql.org/message-id/CACG%3Dezav34TL%2BfGXD5vJ48%3DQbQBL9BiwkOTWduu9yRqie-h%2BDg%40mail.gmail.com

-- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
Hi! The patch implements what it is declared to do: the shell return code is now accessible in a psql variable. Overall the code is in good condition, and it applies with no errors on master. Unfortunately, regression tests are failing on macOS due to different shell output:

@@ -1308,13 +1308,13 @@ deallocate q;
 -- test SHELL_EXIT_CODE
 \! nosuchcommand
-sh: line 1: nosuchcommand: command not found
+sh: nosuchcommand: command not found
 \echo :SHELL_EXIT_CODE
 127
 \set nosuchvar `nosuchcommand`
-sh: line 1: nosuchcommand: command not found
+sh: nosuchcommand: command not found
 \! nosuchcommand
-sh: line 1: nosuchcommand: command not found
+sh: nosuchcommand: command not found
 \echo :SHELL_EXIT_CODE
 127

Since we do not want to test the shell output in these cases, but only the return code, what about using commands of this kind?

postgres=# \! true > /dev/null 2>&1
postgres=# \echo :SHELL_EXIT_CODE
0
postgres=# \! false > /dev/null 2>&1
postgres=# \echo :SHELL_EXIT_CODE
1
postgres=# \! nosuchcommand > /dev/null 2>&1
postgres=# \echo :SHELL_EXIT_CODE
127

It is better to use spaces around "=":

+	if (WIFEXITED(exit_code))
+		exit_code = WEXITSTATUS(exit_code);
+	else if (WIFSIGNALED(exit_code))
+		exit_code = WTERMSIG(exit_code);
+	else if (WIFSTOPPED(exit_code))
+		exit_code = WSTOPSIG(exit_code);

-- Best regards, Maxim Orlov.
Re: add \dpS to psql
> Here it is: https://commitfest.postgresql.org/41/4043/
>

Hi! The patch applies with no problem, implements what it declares, and the CF bot is happy. Without the patch, \dpS shows 0 rows; after applying it, system objects are shown. I consider this patch useful and hope it will be committed soon.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
>
> Do I get it right that the proposed v51 patchset only changes the SLRU
> filenames and the type of pageno representation? Is SLRU wraparound still
> exactly there after the 0x00000000 byte?
>

After applying the whole patch set, SLRU will become 64-bit without a wraparound. Thus, no wraparound should be there.
0001 - should make SLRU internally 64-bit, with no effects from "outside"
0002 - should make SLRU callers 64-bit; SLRU segment file naming is changed
0003 - makes upgrade from previous versions feasible

> Also, I do not understand what is the reason for splitting 1st and 2nd
> steps. Certainly, there must be some logic behind it, but I just can't
> grasp it...
>

As we discussed somewhere at the beginning of the discussion, we try to make every commit as independent as possible, so it is much easier to review and commit. I see no problem with melding these commits into one, if consensus is reached.

> And the purpose of the 3rd step with pg_upgrade changes is a complete
> mystery for me. Please excuse my incompetence in the topic, but maybe
> some commit message or comments would help. What kind of xact segments
> conversion we do? Why is it only necessary for xacts, but not other
> SLRUs?
>

The purpose of the third patch is to make the upgrade feasible. Since we've changed the pg_xact file naming, Postgres cannot read the status of "old" transactions from "old" pg_xact files. So, we have to convert those files. The major problem here is that we must handle a possible segment wraparound (in the "old" cluster). The whole idea of the upgrade is to read the SLRU pages for pg_xact one by one and write them out under the "new" filenames. Maybe it's just a little bit complicated, since the algorithm is intended to deal with different numbers of SLRU pages per segment in the "new" and "old" clusters. But, on the other hand, it was already created in the original 64-bit XIDs patch set and will be useful in the future. AFAICS, arguably, any variant of 64-bit XIDs should lead to an increase in the number of SLRU pages per segment.
And as for the other SLRUs, they need not survive pg_upgrade, mostly due to the fact that the cluster must be stopped for the upgrade. Thus, no conversion is needed.

-- Best regards, Maxim Orlov.
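The per-page renumbering behind that conversion can be sketched as follows. The pages-per-segment values below are made-up placeholders (the real values differ between the old and new clusters and are not stated here); the point is only how a page identified by (old segment, index within segment) finds its place under the new layout.

```c
#include <stdint.h>

/* Made-up pages-per-segment values, just for illustration. */
#define OLD_SLRU_PAGES_PER_SEGMENT	32
#define NEW_SLRU_PAGES_PER_SEGMENT	2048

/*
 * Sketch of the renumbering done while converting old pg_xact segments:
 * compute the global page number, then split it again under the new
 * (larger) pages-per-segment value.
 */
static void
convert_page_location(uint64_t old_segno, int page_in_seg,
					  uint64_t *new_segno, int *new_page_in_seg)
{
	uint64_t	pageno = old_segno * OLD_SLRU_PAGES_PER_SEGMENT + page_in_seg;

	*new_segno = pageno / NEW_SLRU_PAGES_PER_SEGMENT;
	*new_page_in_seg = (int) (pageno % NEW_SLRU_PAGES_PER_SEGMENT);
}
```

Handling the old cluster's segment wraparound would sit on top of this: the old segment number has to be widened by the epoch first, as discussed elsewhere in the thread.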
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Fri, 6 Jan 2023 at 09:51, Japin Li wrote:
>
> For v51-0003. We can use GetClogDirName instead of GET_MAJOR_VERSION in
> copy_subdir_files().
>

Of course! Thanks! I'll address this in the next iteration, v52.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! Here is a new patch set. I've added comments and made use of the GetClogDirName call in copy_subdir_files.

-- Best regards, Maxim Orlov.

v52-0003-Make-pg_upgrade-from-32-bit-to-64-bit-SLRU.patch Description: Binary data
v52-0002-Use-64-bit-pages-representation-in-SLRU-callers.patch Description: Binary data
v52-0001-Use-internal-64-bit-numbering-of-SLRU-pages.patch Description: Binary data
Re: Add SHELL_EXIT_CODE to psql
Hi! Overall, I think we are moving in the right direction. But we could make the code better, couldn't we?

+			/* Capture exit code for SHELL_EXIT_CODE */
+			close_exit_code = pclose(fd);
+			if (close_exit_code == -1)
+			{
+				pg_log_error("%s: %m", cmd);
+				error = true;
+			}
+			if (WIFEXITED(close_exit_code))
+				exit_code=WEXITSTATUS(close_exit_code);
+			else if(WIFSIGNALED(close_exit_code))
+				exit_code=WTERMSIG(close_exit_code);
+			else if(WIFSTOPPED(close_exit_code))
+				exit_code=WSTOPSIG(close_exit_code);
+			if (exit_code)
+				error = true;

I think it's better to add empty lines around the middle if block; it will be easier to read. Also, consider adding spaces around the assignments in this block.

+	/*
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(exit_code));
+	*/

Probably, this is not needed.

> 1. pg_regress now creates an environment variable called PG_OS_TARGET

Maybe we can use the env var "OS"? I do not know much about Windows, but I think this is kind of a standard environment variable there.

-- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
On Mon, 9 Jan 2023 at 21:36, Corey Huinker wrote:
>
> I chose a name that would avoid collisions with anything a user might
> potentially throw into their environment, so if the var "OS" is fairly
> standard is a reason to avoid using it. Also, going with our own env var
> allows us to stay in perfect synchronization with the build's #ifdef WIN32
> ... and whatever #ifdefs may come in the future for new OSes. If there is
> already an environment variable that does that for us, I would rather use
> that, but I haven't found it.
>

Perhaps I didn't make myself clear. Your solution is perfectly adapted to our needs. But all Windows versions since 2000 already have the environment variable OS=Windows_NT. So, if the env var OS is defined and equals Windows_NT, this has to be Windows. May we use it in our case? I don't insist, just asking.

-- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
Unfortunately, cirrus-ci is still not happy: https://cirrus-ci.com/task/6502730475241472

[23:14:34.206] time make -s -j${BUILD_JOBS} world-bin
[23:14:43.174] psqlscanslash.l: In function ‘evaluate_backtick’:
[23:14:43.174] psqlscanslash.l:827:11: error: implicit declaration of function ‘WIFSTOPPED’ [-Werror=implicit-function-declaration]
[23:14:43.174]   827 |   exit_code = WSTOPSIG(close_exit_code);
[23:14:43.174]       |           ^~
[23:14:43.174] psqlscanslash.l:828:16: error: implicit declaration of function ‘WSTOPSIG’ [-Werror=implicit-function-declaration]
[23:14:43.174]   828 |
[23:14:43.174]       |                ^
[23:14:43.174] cc1: all warnings being treated as errors

...and on FreeBSD:

[23:13:03.914] cc -o ...
[23:13:03.914] ld: error: undefined symbol: WEXITSTATUS
[23:13:03.914] >>> referenced by command.c:5036 (../src/bin/psql/command.c:5036)
[23:13:03.914] >>> src/bin/psql/psql.p/command.c.o:(exec_command_shell_escape)
[23:13:03.914] cc: error: linker command failed with exit code 1 (use -v to see invocation)

...and on Windows:

[23:19:51.870] meson-generated_.._psqlscanslash.c.obj : error LNK2019: unresolved external symbol WIFSTOPPED referenced in function evaluate_backtick
[23:19:52.197] meson-generated_.._psqlscanslash.c.obj : error LNK2019: unresolved external symbol WSTOPSIG referenced in function evaluate_backtick
[23:19:52.197] src\bin\psql\psql.exe : fatal error LNK1120: 2 unresolved externals

I believe we need the proper includes.

-- Best regards, Maxim Orlov.
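For reference, here is a minimal self-contained sketch of the decoding with the POSIX sys/wait.h header in place, which is the include the failing builds appear to be missing. The function name mirrors the one the patch uses, but this is a simplified illustration: on Windows, where these macros do not exist at all, psql would still need a separate #ifdef'ed code path.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Sketch: turn a wait(2)-style status, as returned by system() or
 * pclose(), into a small integer exit code.  WIFEXITED/WEXITSTATUS and
 * friends come from <sys/wait.h> on POSIX systems.
 */
static int
wait_result_to_exit_code(int waitstatus)
{
	if (WIFEXITED(waitstatus))
		return WEXITSTATUS(waitstatus);
	if (WIFSIGNALED(waitstatus))
		return WTERMSIG(waitstatus);
#ifdef WIFSTOPPED
	if (WIFSTOPPED(waitstatus))
		return WSTOPSIG(waitstatus);
#endif
	return -1;					/* unrecognized status */
}
```

A quick way to see it in action is to feed it the raw status from system(), which returns the same wait-status encoding as pclose().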
Re: Assert in pageinspect with NULL pages
I've suddenly found that the test in this patch relies on the fact that heap pages don't have a PageSpecial area, or that it is of a different size than the btree page special area:

CREATE TABLE test1 (a int8, b int4range);
SELECT bt_page_items(get_raw_page('test1', 0));

In the current state it is so, but it is not guaranteed. For example, in the proposed 64xid patch [1] we introduce PageSpecial into heap pages, and its size can happen to equal the size of the btree page special area, so the check above is false even if we pass a heap page into pageinspect pretending it is a btree page. So the test will fail.

Generally, this seems to be not only a wrong test, but a consequence of a bigger-scale fact: we cannot always distinguish pages of different AMs. So I'd propose at least that we don't conclude that a page is valid based on the PageSpecial size alone.

[1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com

-- Best regards, Maxim Orlov.
Re: Is monotonous xid8 is a right way to do?
Hi! In my view, the FullTransactionId type was implemented without considering 64 bit wraparound, which, admittedly, seems unlikely to ever happen. Then, on that basis, the xid8 type was created. Details of that particular implementation infiltrated the documentation and became sort of normal.

In my opinion, semantically, both of these types should be treated as similar types, although of different sizes. Thus, again, the xid and xid8 types should form a ring and have no min and max functions. At least, not in a "conventional" way, where the minimal value is minimal in the mathematical sense, and likewise for the maximum. For example, max may be implemented such that max(0, 42, 18446744073709551615) = 42, which is a bit weird.

-- Best regards, Maxim Orlov.
Re: Unit tests for SLRU
>
> Again, I don't have a strong opinion here. If you insist, I will place the
> tests to regress.c.
>

It is up to the committer to decide, but I think it would be good to place the tests in the regression suite. In my opinion, many things from core may be used by extensions, and then it is up to extension authors to make the relevant tests. For me, it's enough to make reasonable test coverage for SLRU without investing too much effort into generalization.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Mon, 6 Nov 2023 at 16:07, Alexander Korotkov wrote:
> Hi!
>
> On Wed, Jul 5, 2023 at 4:46 PM Aleksander Alekseev <aleksan...@timescale.com> wrote:
> > PFE the corrected patchset v58.
>
> I'd like to revive this thread.

Hi! Great news!

> BTW, there is a typo in a word "exceeed".

Fixed.

> +static int inline
> +SlruFileName(SlruCtl ctl, char *path, int64 segno)
> +{
> ...
> +}
>
> I think it worth adding asserts here to verify there is no overflow making
> us mapping different segments into the same files.

Agree, an assertion has been added.

> + return occupied == max_notify_queue_pages;
>
> I'm not sure if the current code could actually allow to occupy more than
> max_notify_queue_pages. Probably not even in extreme cases. But I still
> think it will more safe and easier to read to write "occupied >=
> max_notify_queue_pages" here.

Fixed.

> diff --git a/src/test/modules/test_slru/test_slru.c
> b/src/test/modules/test_slru/test_slru.c
>
> The actual 64-bitness of SLRU pages isn't much exercised in our automated
> tests. It would be too exhausting to make pg_notify actually use higher
> than 2**32 page numbers. Thus, I think test/modules/test_slru is a good
> place to give high page numbers a good test.

PFA, I've added a test for 64-bit SLRU pages.

By the way, there is one more useful thing we could do here. For now, the pg_commit_ts functionality is rather strange: if it was enabled, then disabled, and then enabled again, all the data from before will be discarded. Meanwhile, users expect to have their commit timestamps for all transactions which were "logged" while this feature was enabled. It's weird. AFAICS, the only reason for this behaviour is transaction wraparound: it may occur while the feature is disabled, and then it is safe to simply remove all the data from the previous period. If we switch to FullTransactionId in commit_ts, we can overcome this limitation.
But I'm not sure it is worth trying to fix this in the current patch set, since it is already non-trivial.

-- Best regards, Maxim Orlov.

v59-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data
v59-0004-Add-SLRU-tests-for-64-bit-page-case.patch Description: Binary data
v59-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
v59-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On Thu, 9 Nov 2023 at 15:01, Amul Sul wrote:
>
> Here is the updated version patch. Did minor changes to documents and
> tests.
>

The overall patch looks good to me. Since Peter withdrew his comment on triggers and no open problems remain, we can mark this patch RfC, can't we? It would be nice to have this corrected in the next release.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Aleksander Alekseev wrote:
> Maxim,
> I see both of us accounted for Alexander's feedback and submitted v59.
> Your newer version seems to have issues on cfbot, so resubmitting the
> previous patchset that passes the tests. Please feel free to add
> changes.

For unknown reasons, I do not receive any of your emails dated after 2023-11-07 11:57:12 (Message-ID: CAJ7c6TN1hKqNPGrNcq96SUyD=z61rakgxf8iqq36qr90oud...@mail.gmail.com), even after a resend. Anyway, PFA the patch set of version 61. I've made some minor changes in 0001 and added 0004 in order to test actual 64-bit SLRU pages.

As for why the CF bot failed on my v59 patch set: this is because of a bug. It manifested because of the added 64-bit page tests. The problem was in the segno calculation, since we convert it from the file name using a strtol call. But strtol returns long, which is 4 bytes long on x86:

-		segno = (int) strtol(clde->d_name, NULL, 16);
+		segno = strtoi64(clde->d_name, NULL, 16);

-- Best regards, Maxim Orlov.

v61-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data
v61-0004-Add-SLRU-tests-for-64-bit-page-case.patch Description: Binary data
v61-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
v61-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
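The failure mode can be demonstrated in isolation. On a platform where long is 4 bytes, the old "(int) strtol(...)" silently truncates segment numbers above 2^32; the explicit (int32_t) cast below emulates that truncation so the effect is visible even on 64-bit platforms (strtoll is used as a portable stand-in for PostgreSQL's strtoi64).

```c
#include <stdint.h>
#include <stdlib.h>

/* Fixed version: parse the segment name into a 64-bit integer. */
static int64_t
parse_segno_64(const char *name)
{
	return (int64_t) strtoll(name, NULL, 16);
}

/*
 * Buggy version: "(int) strtol(...)" on a platform with a 4-byte long.
 * The (int32_t) cast emulates that truncation portably.
 */
static int64_t
parse_segno_old(const char *name)
{
	return (int64_t) (int32_t) strtoll(name, NULL, 16);
}
```

Any segment name that does not fit in 32 bits comes back mangled from the old version, which is exactly why the bug only showed up once tests started using genuinely 64-bit page numbers.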
How to stop autovacuum silently
Hi! Recently, one of our customers reported that autovacuum was not working. After a brief investigation, I found that the autovacuum launcher did, actually, run vacuum as expected, but with no results. At the same time, no warnings or other anomalies were present in the logs. At first, I thought the statistics might be broken and that was why vacuum was not working as expected. But in fact, something more interesting had happened. pg_class.relfrozenxid was set to some rubbish value from the future, and thus broken, in the template1 DB, so any new database will have it broken too. Then, we create a "blocker" DB, and in vac_update_datfrozenxid() we get the "bogus" (from the future) value of relfrozenxid and *silently* return. Any other newly created DB will not be autovacuumed. Funny, but from the perspective of a DBA, this looks like autovacuum simply stopped working for no reason, although all the criteria for its launch are clearly met. AFAICS, there are several ways out of this state:
- run vacuumdb for all DBs;
- manually update the broken pg_class.relfrozenxid;
- lower autovacuum_freeze_max_age to trigger transaction ID wraparound prevention.
I do understand that this behaviour can hardly be described as a bug, but could we at least emit a useful message to help clarify what is going on here?

=== REPRODUCE ===
$ cat << EOF >> pgsql/data/postgresql.conf
autovacuum_naptime = 1s
autovacuum_freeze_max_age = 10
EOF
$ ./pgsql/bin/pg_ctl -D pgsql/data -l pgsql/logfile start
waiting for server to start done
server started
$ ./pgsql/bin/psql postgres
psql (17devel)
Type "help" for help.

postgres=# \c template1
You are now connected to database "template1" as user "orlov".
template1=# update pg_class set relfrozenxid='20' where oid = 1262;
UPDATE 1
template1=# do $$ begin while 12 - txid_current()::text::int8 > 0 loop commit; end loop; end $$;
DO
template1=# create database blocker;
CREATE DATABASE
template1=# create database foo;
CREATE DATABASE
template1=# \c foo
You are now connected to database "foo" as user "orlov".
foo=# create table bar(baz int);
CREATE TABLE
foo=# insert into bar select bar from generate_series(1, 8192) bar;
INSERT 0 8192
foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count from pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
 bar     |      8192 |      8192 |         0 |       8192 |       8192 |             |                 |                0
(1 row)

foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count from pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
 bar     |      8192 |     16384 |         0 |       8192 |      16384 |             |                 |                0
(1 row)

... and so on -- Best regards, Maxim Orlov. 0001-Add-warning-if-datfrozenxid-or-datminmxid-is-not-set.patch Description: Binary data
Re: How to stop autovacuum silently
On Wed, 22 Nov 2023 at 21:13, Peter Geoghegan wrote: > Are you aware of commit e83ebfe6d7, which added a similar WARNING at > the point when VACUUM overwrites a relfrozenxid/relminmxid "from the > future"? It's a recent one. > Thank you for the reply! I hadn't noticed it. But in the case described above, it doesn't produce any warnings. My concern here is that with a couple of updates, we can stop autovacuum implicitly, without any warnings. > Was pg_upgrade even run against this database? My guess is that the > underlying problem was caused by the bug fixed by commit 74cf7d46. > I'm pretty sure it was, but, unfortunately, there is no way to confirm this 100%. All I know is that they're using PG13 now. -- Best regards, Maxim Orlov.
Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid
Hi! As was discussed in [0], our overall goal is to make Postgres use 64-bit XIDs. It's obvious that such a big patch set can't possibly be committed "at once". The SLRU patch set [1] was committed a short while ago as a first significant step in this direction. This thread is the next step in this enterprise. My objective here is to commit some changes which are mandatory, as far as I understand, for any type of 64-bit XID implementation. And I'm sure there will be points for discussion here. My original intention was to make PGPROC->xmin, PGPROC->xid and PROC_HDR->xids 64-bit. But in reality, it turned out to be much more difficult than I expected. On the one hand, the patch became too big, and on the other hand, it heavily relies on the epoch and XID "adjustment" to FXID. Therefore, for now, I decided to limit myself to more atomic and independent changes. However, as I said above, these changes are required for any implementation of 64-bit XIDs. So, PFA patches to switch PGPROC->xid and XLogRecord->xl_xid to FullTransactionId. As always, any opinions are very welcome! [0] https://www.postgresql.org/message-id/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyj...@mail.gmail.com [1] https://www.postgresql.org/message-id/flat/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com -- Best regards, Maxim Orlov. v1-0002-Switch-to-FullTransactionId-for-XLogRecord-xl_xid.patch Description: Binary data v1-0001-Switch-to-FullTransactionId-for-PGPROC-xid.patch Description: Binary data
Add Index-level REINDEX with multiple jobs
Hi! Recently, one of our customers came to us with the question: why does the reindexdb utility not support multiple jobs for indexes (the -i option)? And, of course, it is because we cannot control the concurrent processing of multiple indexes on the same relation. This was discussed somewhere in [0], I believe. So, the customer had to write a shell script to do the job, and so on. But splitting the indexes by parent table and running reindex in multiple jobs does not seem that complicated, does it? Or am I missing something? PFA a patch implementing this. As always, any opinions are very welcome! [0] https://www.postgresql.org/message-id/flat/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg%40mail.gmail.com -- Best regards, Maxim Orlov. v1-0001-Add-Index-level-REINDEX-with-multiple-jobs.patch Description: Binary data
Re: Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid
On Fri, 29 Dec 2023 at 16:36, Matthias van de Meent < boekewurm+postg...@gmail.com> wrote: > > I don't think this is an actionable change, as this wastes 4 more bytes > (or 8 with alignment) in nearly all WAL records that don't use the > HEAP/HEAP2/XLOG rmgrs, which would then be up to 10 (if not 14, when > 64but-aligned) bytes per record. Unless something like [0] gets committed > this will add a significant write overhead to all operations, even if they > are not doing anything that needs an XID. > > Also, I don't think we need to support transactions that stay alive and > change things for longer than 2^31 concurrently created transactions, so we > could well add a fxid to each WAL segment header (and checkpoint record?) > and calculate the fxid of each record as a relative fxid off of that. > > Thank you for your reply. Yes, this is a good idea. Actually, this is exactly what I was trying to do at first. But in this case, we have to add more locks on TransamVariables->nextXid, so I abandoned the idea. Maybe I should look in this direction again. On Sun, 31 Dec 2023 at 03:44, Michael Paquier wrote: > And FWIW, echoing with Matthias, making these generic structures > arbitrary larger is a non-starter. We should try to make them > shorter. > Yeah, obviously, this patch makes WAL bigger. I'll try to look into the idea of fxid calculation mentioned above. It might in part be a "chicken and egg" situation: it is very hard to split the overall 64-bit XID patch into smaller pieces with each part being meaningful and beneficial for the current 32-bit XID Postgres. Anyway, thanks for the reply, I appreciate it. -- Best regards, Maxim Orlov.
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
Hi! I pretty much like the idea of the patch. It looks like an oversight in the SQL standard to me. Anyway, the patch applies with no conflicts and implements the described functionality. On Fri, 25 Aug 2023 at 03:06, Vik Fearing wrote: > > I don't like this part of the patch at all. Not only is the > documentation only half baked, but the entire concept of the two > commands is different. Especially since I believe the command should > also create a generated column from a non-generated one. But I have to agree with Vik Fearing: we can make this patch better, shall we? I totally understand your intention to keep the code flow simple and reuse existing code as much as possible. But in terms of semantics, these commands are quite different from each other, and when reading the code, this makes it even harder to understand what is going on. So, in my view, consider splitting these commands. Hope that helps. Again, I'm +1 for this patch. -- Best regards, Maxim Orlov.
Re: Add 'worker_type' to pg_stat_subscription
Hi! I took a look at the patch and like the idea. The overall code is in good condition and implements the described feature. A side note: this is not a problem of this particular patch, but in pg_stat_get_subscription and many other places there is a switch over worker types. Can we use a default section there to get an explicit error, instead of relying on compiler warnings, in case we somehow decide to add another worker type? So, should we mark this thread as RfC? -- Best regards, Maxim Orlov.
Re: should frontend tools use syncfs() ?
On Thu, 7 Sept 2023 at 03:34, Nathan Bossart wrote: > Committed. > Hi! Great job! But here is one problem I've encountered while working on some unrelated stuff. Now we have two different things called by the same name – sync_method. One in xlog:

int sync_method = DEFAULT_SYNC_METHOD;

...and another one in "bins":

static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;

In the current include order, this is not a problem, but imagine you add a couple of new includes, for example:

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -18,6 +18,8 @@
 #include "storage/block.h"
 #include "storage/item.h"
 #include "storage/off.h"
+#include "postgres.h"
+#include "utils/rel.h"

And the build will be broken, because we now have two different things called "sync_method" with different types:

In file included from .../src/bin/pg_rewind/pg_rewind.c:33:
In file included from .../src/include/storage/bufpage.h:22:
In file included from .../src/include/utils/rel.h:18:
.../src/include/access/xlog.h:27:24: error: redeclaration of 'sync_method' with a different type: 'int' vs 'DataDirSyncMethod' (aka 'enum DataDirSyncMethod')
extern PGDLLIMPORT int sync_method;
...

As a solution, I suggest renaming sync_method in the xlog module to wal_sync_method. In fact, the corresponding GUC for this variable is called "wal_sync_method", and I see no reason not to use the exact same name for the variable in the xlog module. -- Best regards, Maxim Orlov.

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 424ecba028f..3aa1fc60cb4 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -18,6 +18,8 @@
 #include "storage/block.h"
 #include "storage/item.h"
 #include "storage/off.h"
+#include "postgres.h"
+#include "utils/rel.h"

 /*
  * A postgres disk page is an abstraction layered on top of a postgres

v10-0001-Fix-conflicting-types-for-sync_method.patch Description: Binary data
Re: should frontend tools use syncfs() ?
On Wed, 20 Sept 2023 at 22:08, Nathan Bossart wrote: > I think we should also consider renaming things like SYNC_METHOD_FSYNC to > WAL_SYNC_METHOD_FSYNC, and sync_method_options to wal_sync_method_options. > I've already renamed sync_method_options in the previous patch:

@@ -171,7 +171,7 @@ static bool check_wal_consistency_checking_deferred = false;
 /*
  * GUC support
  */
-const struct config_enum_entry sync_method_options[] = {
+const struct config_enum_entry wal_sync_method_options[] = {

As for the SYNC_METHOD_FSYNC rename, PFA a patch. It also makes an enum for the WAL sync methods instead of defines. -- Best regards, Maxim Orlov. v11-0001-Fix-conflicting-types-for-sync_method.patch Description: Binary data
Re: should frontend tools use syncfs() ?
Back to patch v11. I'm a bit unsure what we should do next: start a separate thread or put this one in the commitfest? -- Best regards, Maxim Orlov.
Re: should frontend tools use syncfs() ?
On Fri, 6 Oct 2023 at 22:35, Nathan Bossart wrote: > From a quick skim, this one looks pretty good to me. Would you mind adding > it to the commitfest so that it doesn't get lost? I will aim to take a > closer look at it next week. > Sounds good, thanks a lot! https://commitfest.postgresql.org/45/4609/ -- Best regards, Maxim Orlov.
Re: UNIQUE null treatment option
Since cfbot failed with an error that is probably unrelated to the patch itself (see https://cirrus-ci.com/task/5330150500859904) and a repeated check did not start automatically, I'm reattaching patch v3 to restart cfbot on this patch. -- Best regards, Maxim Orlov. v3-0001-Add-UNIQUE-null-treatment-option.patch Description: Binary data
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
Hi! I've looked over the patch and consider it useful. It applies without conflicts, and its functionality matches what is declared. But, in my view, some improvements may be proposed. We should be more, let's say, cautious (or paranoid, if you wish) in pg_walfile_offset_lsn while dealing with user input values. What I mean by that is:
- initialize the output variables of sscanf, i.e. tli, log and seg;
- check the return value of the sscanf call;
- check the maximum file name length.
Another question arises for me: can this function be called during recovery? If not, we should ereport about this, shouldn't we? And one last note here: pg_walfile_offset_lsn accepts NULL values and returns NULL in this case. From a theoretical point of view, this is perfectly fine. Actually, I think this is exactly how it is supposed to be, but I'm not sure if there are other opinions here. -- Best regards, Maxim Orlov.
[PoC] configurable out of disk space elog level
Hi!

PROBLEM

Our customer stumbled onto the following behaviour of a Postgres cluster: if disk space is exhausted, Postgres continues to work as long as WAL can be successfully written. Thus, upon disk space exhaustion, clients will get "ERROR: could not extend file "base/X/X": No space left on device" messages and transactions will be aborted. But the cluster continues to work for quite some time. This behaviour of PostgreSQL is, of course, perfectly legit: the cluster just relays the OS error to the user and can do nothing about it, expecting that space may become available later. On the other hand, users continue to send more data and get more and more transactions aborted. There are several possible ways to diagnose the described situation: an external monitoring system; log analysis; creating/dropping a table and analysing the result. Each has advantages and disadvantages; I'm not going to dive deeper here, if you don't mind. The customer mentioned above, in this particular case, would be glad to have a mechanism to stop the cluster. Again, in this concrete case.

PROPOSAL

My proposal is to add a tablespace option in order to be able to configure which behaviour is appropriate for a particular user. I've decided to call this option "on_no_space" for now; if anyone has a better name for this feature, please suggest it. So, the idea is to add both a GUC and a tablespace option "on_no_space". The tablespace option defines the behaviour of the cluster for a particular tablespace in the "on_no_space" situation; the GUC defines the default value of the tablespace option. A PoC patch is attached.
Here's what it looks like:

=== Create a 100Mb disk ===
$ dd if=/dev/zero of=/tmp/foo.img bs=100M count=1
$ mkfs.ext4 /tmp/foo.img
$ mkdir /tmp/foo
$ sudo mount -t ext4 -o loop /tmp/foo.img /tmp/foo
$ sudo chown -R orlov:orlov /tmp/foo

=== In psql ===
postgres=# CREATE TABLESPACE foo LOCATION '/tmp/foo' WITH (on_no_space=fatal);
CREATE TABLESPACE
postgres=# \db+
                                List of tablespaces
 Name | Owner | Location | Access privileges |       Options       |  Size   | Description
------+-------+----------+-------------------+---------------------+---------+-------------
 foo  | orlov | /tmp/foo |                   | {on_no_space=fatal} | 0 bytes |
...
postgres=# CREATE TABLE bar(qux int, quux text) WITH (autovacuum_enabled = false) TABLESPACE foo;
CREATE TABLE
postgres=# INSERT INTO bar(qux, quux) SELECT id, md5(id::text) FROM generate_series(1, 1000) AS id;
FATAL: could not extend file "pg_tblspc/16384/PG_16_202211121/5/16385": No space left on device
HINT: Check free disk space.
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

CAVEATS

Again, I've posted this patch as a PoC; this is not a complete implementation of the described functionality. AFAICS, there are the following problems:
- I have to put the get_tablespace_elevel call in RelationGetBufferForTuple in order to get the tablespace into the cache; otherwise, a cache miss in get_tablespace triggers an assertion failure at lock.c:887 (Assert("!IsRelationExtensionLockHeld")). This assertion was added by commit 15ef6ff4 (see [0] for details).
- What should the error level be when mdextend is called not to insert a tuple into a heap (WAL replay, for example)?
Maybe adding just the GUC, without the ability to define the "out of disk space" behaviour for particular tablespaces, is enough? I would appreciate your opinions on the subject. -- Best regards, Maxim Orlov.
[0] https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B%2BSs%3DGn1LA%40mail.gmail.com v1-0001-Add-out-of-disk-space-elog-level.patch Description: Binary data
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Tue, 15 Nov 2022 at 13:02, Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Fri, Nov 11, 2022 at 5:52 PM Maxim Orlov wrote: > > > But, in my view, some improvements may be proposed. We should be more, > let's say, cautious (or a paranoid if you wish), > > in pg_walfile_offset_lsn while dealing with user input values. What I > mean by that is: > > - to init input vars of sscanf, i.e. tli, log andseg; > > - check the return value of sscanf call; > > - check filename max length. > > IsXLogFileName() will take care of this. Also, I've added a new inline > function XLogIdFromFileName() that parses file name and returns log > and seg along with XLogSegNo and timeline id. This new function avoids > an extra sscanf() call as well. > > > Another question arises for me: is this function can be called during > recovery? If not, we should ereport about this, should we? > > It's just a utility function and doesn't depend on any of the server's > current state (unlike pg_walfile_name()), hence it doesn't matter if > this function is called during recovery. > > > And one last note here: pg_walfile_offset_lsn is accepting NULL values > and return NULL in this case. From a theoretical > > point of view, this is perfectly fine. Actually, I think this is exactly > how it supposed to be, but I'm not sure if there are no other opinions here. > > These functions are marked as 'STRICT', meaning a null is returned, > without even calling the function, if any of the input parameters is > null. I think we can keep the behaviour the same as its friends. > Thanks for the explanations. I think you are right. > >errmsg("offset must not be negative or greater than or equal to the > WAL segment size"))); > > Changed. > Confirm. And a timeline_id is added. > While on this, I noticed that the pg_walfile_name_offset() isn't > covered in tests. I took an opportunity and added a simple test case > along with pg_walfile_offset_lsn(). 
> > I'm attaching the v3 patch set for further review. > Great job! Shall we mark this patch as RfC? -- Best regards, Maxim Orlov.
Re: [PoC] configurable out of disk space elog level
On Wed, 16 Nov 2022 at 20:41, Andres Freund wrote: > Hi, > You can't do catalog access below the bufmgr.c layer. It could lead to all > kinds of nastiness, including potentially recursing back to md.c. Even > leaving > Yep, this is my biggest concern. It turns out that the way to implement such a feature is to use just a GUC for all tablespaces, or to forward the elevel "from above". > that aside, we can't do catalog accesses in all kinds of environments that > this currently is active in - most importantly it's affecting the startup > process. We don't do catalog accesses in the startup process, and even if > we > were to do so, we couldn't unconditionally because the catalog might not > even > be consistent at this point (nor is it guaranteed that the wal_level even > allows to access catalogs during recovery). > Yep, that is why I do this in get_tablespace_elevel:

+	/*
+	 * Use GUC level only in normal mode.
+	 */
+	if (!IsNormalProcessingMode())
+		return ERROR;

Anyway, I appreciate the opinion, thank you! -- Best regards, Maxim Orlov.
Re: [PoC] configurable out of disk space elog level
> I don't think this is a good feature to add to PostgreSQL. First, it's > unclear why stopping the cluster is a desirable behavior. It doesn't > stop the user transactions from failing; it just makes them fail in > some other way. Now it is of course perfectly legitimate for a > particular user to want that particular behavior anyway, but there are > a bunch of other things that a user could equally legitimately want to > do, like page the DBA or trigger a failover or kill off sessions that > are using large temporary relations or whatever. And, equally, there > are many other operating system errors to which a user could want the > database system to respond in similar ways. For example, someone might > want any given one of those treatments when an I/O error occurs > writing to the data directory, or a read-only filesystem error, or a > permission denied error. > > Having a switch for one particular kind of error (out of many that > could possibly occur) that triggers one particular coping strategy > (out of many that could possibly be used) seems far too specific a > thing to add as a core feature. And even if we had something much more > general, I'm not sure why that should go into the database rather than > being implemented outside it. After all, nothing at all prevents the > user from scanning the database logs for "out of space" errors and > shutting down the database if any are found. Yes, you are right. Actually, this is one of the possible ways to deal with the described situation that I mentioned above. And if I had to deal with such a task, I would do it via log monitoring. The question was: "could we be more general here?". Probably not. > While you're at it, you > could make your monitoring script also check the free space on the > relevant partition using statfs() and page somebody if the utilization > goes above 95% or whatever threshold you like, which would probably > avoid service outages much more effectively than $SUBJECT.
> > I just can't see much real benefit in putting this logic inside the > database. > OK, I got it. Thanks for your thoughts! -- Best regards, Maxim Orlov.
[BUG] FailedAssertion in SnapBuildPurgeOlderTxn
Hi!

PREAMBLE

For the last couple of months, I have stumbled into a problem while running tests on ARM (Debian, aarch64) and some more exotic platforms for my 64-bit XIDs patch set. The contrib/test_decoding test (catalog_change_snapshot) occasionally failed with the following message:

TRAP: FailedAssertion("TransactionIdIsNormal(InitialRunningXacts[0]) && TransactionIdIsNormal(builder->xmin)", File: "snapbuild.c"

I have plenty of failures on ARM, a couple on x86 and none (if memory serves) on x86-64. At first, my thought was to blame my 64-bit XID patch, but this is not the case. The error persists from PG10 to PG15 without any patch applied, though it is hard to reproduce.

PROBLEM

After some investigation, I think the problem is in snapbuild.c (commit 272248a0c1b1, see [0]). We allocate the InitialRunningXacts array in the builder->context memory context, but by the time we call SnapBuildPurgeOlderTxn, this context may already be free'd. Thus, the InitialRunningXacts array becomes an array of 2139062143 (i.e. 0x7F7F7F7F) values. This has not caused the buildfarm to fail, thanks to this code:

if (!NormalTransactionIdPrecedes(InitialRunningXacts[0], builder->xmin))
	return;

Since the cluster is initialised with an XID way less than 0x7F7F7F7F, we return here, but the problem still exists. I've attached a patch based on branch REL_15_STABLE to reproduce the problem on x86-64. With my 64-bit XID patch set, the problem manifested because I init the cluster with an XID far beyond the 32-bit bound. Alternatively, I tried to use my patch [1] to init a cluster with the first transaction at 2139062143 (i.e. 0x7F7F7F7F).
Then I put a pg_sleep call just like in the attached patch:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -968,6 +968,8 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
 	if (NInitialRunningXacts == 0)
 		return;
 
+	pg_usleep(100L * 2L);
+
 	/* bound check if there is at least one transaction to remove */
 	if (!NormalTransactionIdPrecedes(InitialRunningXacts[0], builder->xmin))

...and ran installcheck-force many times for test_decoding's catalog_change_snapshot, and got a segfault.

CONCLUSION

In snapbuild.c, the context-allocated array InitialRunningXacts may be free'd; this causes an assertion failure (at best) or may lead to more serious problems.

P.S. A simple fix like:

@@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 * changes. See SnapBuildXidSetCatalogChanges.
 	 */
 	NInitialRunningXacts = nxacts;
-	InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
+	InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, sz);
 	memcpy(InitialRunningXacts, running->xids, sz);
 	qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);

seems to solve the described problem, but I'm not in the context of [0] and don't know why the array is allocated in builder->context.

[0] https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
[1] https://www.postgresql.org/message-id/flat/CACG=ezaa4vqYjJ16yoxgrpa-=gxnf0vv3ey9bjgrrrfn2yy...@mail.gmail.com

-- Best regards, Maxim Orlov.
From d09a031f1f807cdfe1e02000b2bf4fd3eaaedd8f Mon Sep 17 00:00:00 2001
From: Maxim Orlov
Date: Mon, 21 Nov 2022 14:50:02 +0300
Subject: [PATCH] catalog_change_snapshot-fail

---
 contrib/test_decoding/Makefile              | 1007 ++-
 src/backend/replication/logical/snapbuild.c |   12 +
 2 files changed, 1012 insertions(+), 7 deletions(-)

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c7ce603706..aaf7a63411 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -3,12 +3,1006 @@
 MODULES = test_decoding
 PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
 
-REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
-	decoding_into_rel binary prepared replorigin time messages \
-	spill slot truncate stream stats twophase twophase_stream
-ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot slot_creation_error catalog_change_snapshot
+ISOLATION = catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
> Pushed. > > -- > With Regards, > Amit Kapila. > Hi! While working on the 64-bit XIDs patch set, I stumbled into problems with the contrib/test_decoding catalog_change_snapshot test [0]. AFAICS, the problem is not related to the 64-bit XIDs patch set; it is in the InitialRunningXacts array, allocated in builder->context. Do we really need it to be allocated that way? [0] https://www.postgresql.org/message-id/CACG%3DezZoz_KG%2BRyh9MrU_g5e0HiVoHocEvqFF%3DNRrhrwKmEQJQ%40mail.gmail.com -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> > Thank you for the report and initial analysis. I have added Sawada-San > to know his views as he was the primary author of this work. > > -- > With Regards, > Amit Kapila. > OK, thanks a lot. I hope, we'll fix this soon. -- Best regards, Maxim Orlov.
Re: [PATCH] Add initial xid/mxid/mxoff to initdb
Hi! CF bot says patch does not apply. Rebased. Your reviews are very much welcome! -- Best regards, Maxim Orlov. v2-0001-Add-initdb-option-to-initialize-cluster-with-non-.patch Description: Binary data
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> I've attached a draft patch. To fix it, I think we can reset > InitialRunningXacts and NInitialRunningXacts at FreeSnapshotBuilder() > and add an assertion in AllocateSnapshotBuilder() to make sure both > are reset. > Thanks for the patch. It works fine. I've tested this patch for versions 15 and 11 on x86-64 and ARM and see no failures. But note that the pg_current_xact_id function added by 4c04be9b05ad doesn't exist in PG11. > Regarding the tests, the patch includes a new scenario to > reproduce this issue. However, since the issue can be reproduced also > by the existing scenario (with low probability, though), I'm not sure > it's worth adding the new scenario. > AFAICS, the added test doesn't reproduce this issue 100% of the time, so maybe it is not worth it. But I do not have a strong opinion here. Let's add the tests in a separate commit and let the actual committer decide what to do, shall we? -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> > Agreed not to have a test case for this. > > I've attached an updated patch. I've confirmed this patch works for > all supported branches. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com > It works for me as well. Thanks! I've created a commitfest entry for this patch, see https://commitfest.postgresql.org/41/4024/ Hope, it will be committed soon. -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
On Fri, 25 Nov 2022 at 09:40, Amit Kapila wrote: > On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila > wrote: > > > > On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada > wrote: > > > > > > On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila > wrote: > > > > > > Agreed not to have a test case for this. > > > > > > I've attached an updated patch. I've confirmed this patch works for > > > all supported branches. > > > > > > > I have slightly changed the checks used in the patch, otherwise looks > > good to me. I am planning to push (v11-v15) the attached tomorrow > > unless there are more comments. > > > > Pushed. > A big thanks to you! Could you also close the commitfest entry https://commitfest.postgresql.org/41/4024/, please? -- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
> So I'd vote for an evolutionary approach and give my +1 for > undertaking efforts to first committing [1] to 16. > > [1]: > https://www.postgresql.org/message-id/CAFiTN-uudj2PY8GsUzFtLYFpBoq_rKegW3On_8ZHdxB1mVv3-A%40mail.gmail.com > > Kind regards, > Pavel Borisov, > Supabase. > +1 Totally support the idea. Let's focus on committing SLRU changes. -- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
On Fri, 9 Dec 2022 at 16:54, adherent postgres < adherent_postg...@hotmail.com> wrote: > Hi Aleksander Alekseev > I think the xids 32bit transformation project has been dragged on for too > long. Huawei's openGauss referenced this patch to implement xids 64bit, and > Postgrespro also implemented xids 64bit, which is enough to prove that > their worries are redundant.I think postgresql has no reason not to > implement xid 64 bit. What about your opinion? > Yeah, I totally agree, the time has come. With a high transaction load, Postgres becomes more and more difficult to maintain. The problem is the overall complexity of the patch set; we need more reviewers. Since committing such a big patch all at once is not viable, from the start of the work we split it into several logical parts. The evolutionary approach is preferable in this case. As far as I can see, there is overall consensus to commit the SLRU-related changes first. -- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
On Fri, 13 Jan 2023 at 07:50, Corey Huinker wrote: > > I named it wait_result_to_exit_code(), but I welcome suggestions of a > better name. > Thanks! But the CF bot is still not happy. I think we should address the issues from here: https://cirrus-ci.com/task/5391002618298368 -- Best regards, Maxim Orlov.
old_snapshot_threshold bottleneck on replica
Hi! One of our customers stumbled onto a significant performance degradation while running multiple OLAP-like queries on a replica. After some investigation, it became clear that the problem is in accessing the old_snapshot_threshold parameter. Access to the old_snapshot_threshold parameter is guarded by mutex_threshold. This is not a problem on the primary server, since we rarely call GetOldSnapshotThresholdTimestamp: 5028 void 5029 TestForOldSnapshot_impl(Snapshot snapshot, Relation relation) 5030 { 5031 if (RelationAllowsEarlyPruning(relation) 5032 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) 5033 ereport(ERROR, 5034 (errcode(ERRCODE_SNAPSHOT_TOO_OLD), 5035 errmsg("snapshot too old"))); But in the case of a replica, we have to call GetOldSnapshotThresholdTimestamp much more often. So, this becomes a bottleneck. The customer solved this issue by setting old_snapshot_threshold to 0, but I think we can do something about it. Some more investigation: -- On primary -- $ ./bin/psql postgres -c "create database benchmark" CREATE DATABASE $ ./bin/pgbench -i -Uorlov -s300 benchmark dropping old tables... NOTICE: table "pgbench_accounts" does not exist, skipping ... creating tables... generating data (client-side)... 3000 of 3000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s) vacuuming... creating primary keys... done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s). -- On secondary -- $ touch 1.sql $ vim 1.sql $ cat 1.sql \set bid random(1, 300) BEGIN; SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid; END; $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark pgbench (16devel) progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed ... 
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed $ perf record -F 99 -a -g --call-graph=dwarf sleep 5 $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file $ grep s_lock file | wc -l 3486 My proposal is to use atomics for threshold_timestamp and threshold_xid. PFA the 0001 patch. With patch 0001 we got: $ grep s_lock file2 | wc -l 8 Maybe we should go further and remove mutex_threshold here? This will lead to inconsistency of threshold_timestamp and threshold_xid, but is this really a problem? Thoughts? -- Best regards, Maxim Orlov. 0001-PGPRO-7624-use-atomic-old_snapshot_threshold.patch Description: Binary data
Re: old_snapshot_threshold bottleneck on replica
On Tue, 24 Jan 2023 at 18:46, Robert Haas wrote: > > (1) that mutex also protects something else and the existing comment > is wrong, or > > (2) the mutex should have been removed but the patch neglected to do so, or > > (3) the mutex is still needed for some reason, in which case either > (3a) the patch isn't actually safe or (3b) the patch needs comments to > explain what the new synchronization model is. > > Yes, you're absolutely right. And my first intention was to remove this mutex completely. But in TransactionIdLimitedForOldSnapshots these variables are used conjointly. So, I'm not sure whether it is completely safe to remove the mutex. Actually, removing the mutex and switching to atomics was my first choice. I've run all the tests and no problems were found. But at that time I chose to be more conservative. Anyway, here is the new variant. -- Best regards, Maxim Orlov. v2-0001-Use-atomic-old_snapshot_threshold.patch Description: Binary data
Re: old_snapshot_threshold bottleneck on replica
On Wed, 25 Jan 2023 at 16:52, Robert Haas wrote: > On Wed, Jan 25, 2023 at 3:52 AM Maxim Orlov wrote: > > Well, that's something we - and ideally you, as the patch author - > need to analyze and figure out. We can't just take a shot and hope for > the best. > Thank you for your advice. I've dived deeper into the problem and I think the v2 patch is wrong. Accessing threshold_timestamp and threshold_xid in TransactionIdLimitedForOldSnapshots without a lock would lead to an improper xlimit calculation. So, my choice would be (3b). My goal is to optimize access to threshold_timestamp to avoid multiple spinlock acquisitions on read. At the same time, simultaneous access to these variables (threshold_timestamp and threshold_xid) should be protected with the spinlock. I removed the atomic for threshold_xid and added comments on mutex_threshold. PFA v3. -- Best regards, Maxim Orlov. v3-0001-Use-atomic-old_snapshot_threshold.patch Description: Binary data
Re: old_snapshot_threshold bottleneck on replica
On Fri, 27 Jan 2023 at 18:18, Robert Haas wrote: > > Interesting, but it's still not entirely clear to me from reading the > comments why we should think that this is safe. > Overall, I think this is safe, because we do not change the algorithm here. More specifically, threshold_timestamp is only used in a few cases: 1). To get the latest value by calling GetOldSnapshotThresholdTimestamp. This will work, since we only change the sync type here from the spinlock to an atomic. 2). In TransactionIdLimitedForOldSnapshots, but no behaviour changes are made here. The sync model will be the same as before the patch. 3). In SnapshotTooOldMagicForTest, which is a stub to make old_snapshot_threshold tests appear "working". But there is no coherence with threshold_xid here. So, we have two use cases for threshold_timestamp: a). When threshold_timestamp is used in conjunction with threshold_xid. We must use the spinlock to sync. b). When threshold_timestamp is used without threshold_xid. In this case, we use atomic values. -- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
Unfortunately, there is a failure on FreeBSD: https://cirrus-ci.com/task/6466749487382528 Maybe this patch needs to be rebased? -- Best regards, Maxim Orlov.
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Thu, 22 Sept 2022 at 18:13, Andres Freund wrote: > Due to the merge of the meson based build this patch needs to be > adjusted: https://cirrus-ci.com/build/6350479973154816 > > Looks like you need to add amcheck--1.3--1.4.sql to the list of files to be > installed and t/004_verify_nbtree_unique.pl to the tests. > > Greetings, > > Andres Freund > Thanks! Fixed. -- Best regards, Maxim Orlov. v17-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-u.patch Description: Binary data
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi! I think this patch was marked as "Waiting on Author", probably by mistake, since the recent changes were done without any significant code changes and the CF bot is happy again. I'm going to move it to RfC, may I? If not, please tell me why. -- Best regards, Maxim Orlov.
Re: making relfilenodes 56 bits
Hi! I'm not in the context of this thread, but I've noticed something strange while attempting to rebase my patch set from the 64XID thread. As far as I'm aware, this patch set is adding "relfilenumber". So, in pg_control_checkpoint, we have the following changes: diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 781f8b8758..d441cd97e2 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -79,8 +79,8 @@ pg_control_system(PG_FUNCTION_ARGS) Datum pg_control_checkpoint(PG_FUNCTION_ARGS) { - Datum values[18]; - bool nulls[18]; + Datum values[19]; + bool nulls[19]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -129,6 +129,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) XIDOID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 18, "checkpoint_time", TIMESTAMPTZOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 19, "next_relfilenumber", + INT8OID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); /* Read the control file. */ In other words, we have 19 attributes. But tupdesc here is constructed for 18 elements: tupdesc = CreateTemplateTupleDesc(18); Is that normal or not? Again, I'm not following this thread, so if that is completely OK, I'm sorry about the noise. -- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
> > This is the wrong thread / CF entry. Please see > Yep, my fault. Sorry about that. -- Best regards, Maxim Orlov.
Turn TransactionIdRetreat/Advance into inline functions
Hi! This patch is inspired by [0] and many others. I've noticed recent activity converting macros into inline functions. We should make TransactionIdRetreat/Advance inline functions instead of macros, shouldn't we? I'm also thinking about NormalTransactionIdPrecedes and NormalTransactionIdFollows, but maybe they should be addressed separately: the comment says that "this is a macro for speed". Any thoughts? [0]: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com -- Best regards, Maxim Orlov. v1-0001-Convert-macros-to-static-inline-functions-transam.patch Description: Binary data
Re: Turn TransactionIdRetreat/Advance into inline functions
> -1. Having to touch all the call sites like this outweighs > any claimed advantage: it makes them uglier and it will greatly > complicate any back-patching we might have to do in those areas. > > regards, tom lane > Ok, got it. But what if we change the semantics of these calls to xid = TransactionIdAdvance(xid) ? -- Best regards, Maxim Orlov.
Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment
Hi! Part of the work that Thomas mentions in [1], regarding Direct I/O, > has certain requirements that pointers must be page-aligned. > > I've attached a patch which implements palloc_aligned() and > MemoryContextAllocAligned() ... > I've taken a quick look and the patch looks good to me. Let's add tests for these functions, shall we? If you think this is overkill, feel free to trim the tests to your taste. -- Best regards, Maxim Orlov. v1-0002-Add-tests-for-palloc_aligned.patch Description: Binary data v1-0001-palloc_aligned.patch Description: Binary data
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Just a note here. After examining the core dump, I noticed something. In the XidInMVCCSnapshot call, snapshot->suboverflowed is set to true although subxip == NULL and subxcnt == 0. As far as I understand, snapshot->suboverflowed is set to true in the GetRunningTransactionData call. So I decided to put an elog around the assignment of CurrentRunningXacts->subxcnt. diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 42a89fc5dc9..3d2db02f580 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2781,6 +2781,9 @@ GetRunningTransactionData(void) * increases if slots do. */ + if (suboverflowed) + elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true"); + CurrentRunningXacts->xcnt = count - subcount; CurrentRunningXacts->subxcnt = subcount; CurrentRunningXacts->subxid_overflow = suboverflowed; ... and got a bunch of messages, i.e. subxid_overflow is set to true very often. I've increased the value of PGPROC_MAX_CACHED_SUBXIDS. Once it becomes more than 120, there are no more messages and no assertion failures occur any more. --- Best regards, Maxim Orlov.diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 42a89fc5dc9..3d2db02f580 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2781,6 +2781,9 @@ GetRunningTransactionData(void) * increases if slots do. */ + if (suboverflowed) + elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true"); + CurrentRunningXacts->xcnt = count - subcount; CurrentRunningXacts->subxcnt = subcount; CurrentRunningXacts->subxid_overflow = suboverflowed; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index be67d8a8616..8e1a3d1c6bb 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -33,7 +33,7 @@ * listed anywhere in the PGPROC array is not a running transaction. Else we * have to look at pg_subtrans. 
*/ -#define PGPROC_MAX_CACHED_SUBXIDS 64 /* XXX guessed-at value */ +#define PGPROC_MAX_CACHED_SUBXIDS 120 /* XXX guessed-at value */ typedef struct XidCacheStatus {
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
The analysis at the beginning of the discussion seems to be right, but the v2 fix looks too invasive to me. Personally, I'd prefer not to remove the snapshot even if the transaction is read-only. I propose to consider "xid < TransactionXmin" a legit case and just promote xid to TransactionXmin. It's annoying that this old bug is still not fixed. What do you think? --- Best regards, Maxim Orlov.diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c index 6a8e521f894..ce7c90aa38f 100644 --- a/src/backend/access/transam/subtrans.c +++ b/src/backend/access/transam/subtrans.c @@ -29,6 +29,7 @@ #include "postgres.h" #include "access/slru.h" +#include "access/parallel.h" #include "access/subtrans.h" #include "access/transam.h" #include "pg_trace.h" @@ -152,6 +153,21 @@ SubTransGetTopmostTransaction(TransactionId xid) TransactionId parentXid = xid, previousXid = xid; + /* + * We may have different xmins in active and transaction snapshots in + * parallel workers, and TransactionXmin can be promoted to the max of them. + * So, we do this to avoid the assert below when restoring an active + * snapshot with a lesser xmin value. + * + * XXX: the same may be fixed by prohibiting read-only transactions (like + * parallel scans) from incrementing TransactionXmin. + */ + if (IsParallelWorker()) + { + xid = xid > TransactionXmin ? xid : TransactionXmin; + parentXid = xid; + previousXid = xid; + } /* Can't ask about stuff that might not be around anymore */ Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
[PATCH] Add initial xid/mxid/mxoff to initdb
Hi! During work on the 64-bit XID patch [1] we found it handy to have initdb options to set the initial xid/mxid/mxoff values to arbitrary non-default values. It helps to test different scenarios related to wraparound, pg_upgrade from 32-bit XID to 64-bit XID, etc. We realized this patch can be singled out from the whole patchset in [1] as an independent patch and be useful irrespective of 64-bit XIDs for testing wraparound and so on. In particular, we employed this patch to test recent changes in logical replication of subxacts [2] and found no problems in it near the point of publisher wraparound. Please share your opinions; reviews are always welcome. [1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com [2] https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df...@enterprisedb.com -- Best regards, Maxim Orlov. From c752471dab910b3ddfed5b0f3b59e7314165a193 Mon Sep 17 00:00:00 2001 From: Maxim Orlov Date: Wed, 4 May 2022 15:53:36 +0300 Subject: [PATCH v1] Add initdb option to initialize cluster with non-standard xid/mxid/mxoff. To date, testing database cluster wraparound was not easy, as initdb has always initialized it with the default xid/mxid/mxoff. The option to specify any valid xid/mxid/mxoff at cluster startup will make these things easier. 
Author: Maxim Orlov Author: Pavel Borisov --- src/backend/access/transam/clog.c | 9 +++ src/backend/access/transam/multixact.c | 35 src/backend/access/transam/xlog.c | 15 ++-- src/backend/bootstrap/bootstrap.c | 56 - src/backend/main/main.c| 6 ++ src/backend/postmaster/postmaster.c| 15 +++- src/backend/tcop/postgres.c| 59 +- src/bin/initdb/initdb.c| 107 - src/bin/initdb/t/001_initdb.pl | 86 src/include/access/xlog.h | 3 + src/include/catalog/pg_class.h | 2 +- 11 files changed, 380 insertions(+), 13 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 3d9088a704..d973a2159d 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -713,6 +713,7 @@ void BootStrapCLOG(void) { int slotno; + int pageno; LWLockAcquire(XactSLRULock, LW_EXCLUSIVE); @@ -723,6 +724,14 @@ BootStrapCLOG(void) SimpleLruWritePage(XactCtl, slotno); Assert(!XactCtl->shared->page_dirty[slotno]); + pageno = TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextXid)); + if (pageno != 0) + { + slotno = ZeroCLOGPage(pageno, false); + SimpleLruWritePage(XactCtl, slotno); + Assert(!XactCtl->shared->page_dirty[slotno]); + } + LWLockRelease(XactSLRULock); } diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 8f7d12950e..00acb955e0 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1894,6 +1894,7 @@ void BootStrapMultiXact(void) { int slotno; + int pageno; LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE); @@ -1904,6 +1905,17 @@ BootStrapMultiXact(void) SimpleLruWritePage(MultiXactOffsetCtl, slotno); Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]); + pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact); + if (pageno != 0) + { + /* Create and zero the first page of the offsets log */ + slotno = ZeroMultiXactOffsetPage(pageno, false); + + /* Make sure it's written out */ + 
SimpleLruWritePage(MultiXactOffsetCtl, slotno); + Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]); + } + LWLockRelease(MultiXactOffsetSLRULock); LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE); @@ -1915,7 +1927,30 @@ BootStrapMultiXact(void) SimpleLruWritePage(MultiXactMemberCtl, slotno); Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]); + pageno = MXOffsetToMemberPage(MultiXactState->nextOffset); + if (pageno != 0) + { + /* Create and zero the first page of the members log */ + slotno = ZeroMultiXactMemberPage(pageno, false); + + /* Make sure it's written out */ + SimpleLruWritePage(MultiXactMemberCtl, slotno); + Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]); + } + LWLockRelease(MultiXactMemberSLRULock); + + /* + * If we're starting not from zero offset, initilize dummy multixact to + * evade too long loop in PerformMembersTruncation(). + */ + if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0) + { + RecordNewMultiXact(FirstMultiXactId, + MultiXactState->nextOffset, 0, NULL); + RecordNewMultiXact(MultiXactState->nextMXact, + MultiXactState->nextOffset, 0, NULL); + } } /* diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 61cda56c6f..7768a3835d 100644 --- a/src/backend/access/transam/xlog.c +++ b/sr
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
> here is the rebased v32 version of the patch. > > The patchset rotted a bit. Here is a rebased version. > We have posted an updated version v34 of the whole patchset in [1]. Changes of patches 0001-0003 there are identical to v33. So, no update is needed in this thread. [1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com -- Best regards, Maxim Orlov.
Re: Custom tuplesorts for extensions
Hi! Overall the patch looks good. Let's mark it as ready for committer, shall we? -- Best regards, Maxim Orlov.
Re: Fix unnecessary includes and comments in 019_replslot_limit.pl, 007_wal.pl and 004_timeline_switch.pl
Hi! This is an obvious change; I'm totally for it. Hope it will be committed soon. -- Best regards, Maxim Orlov.
Re: Fix unnecessary includes and comments in 019_replslot_limit.pl, 007_wal.pl and 004_timeline_switch.pl
> > I'm sorry for some nitpicking about changes in the comments: > - The number of WAL segments advanced hasn't changed from 5 to 1, it just > advances as 1+4 as previously. So the original comment is right. I reverted > this in v2. > Yeah, it looks even better now. -- Best regards, Maxim Orlov.
Re: remove reset_shared()
Hi! In general, I'm for this patch. Some time ago I was working on a patch related to shared memory and saw no reason to have the reset_shared() function. -- Best regards, Maxim Orlov.
Re: Pre-allocating WAL files
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, failed Spec compliant: not tested Documentation: not tested Hi! We've looked through the code and everything looks good except a few minor things: 1). Using a dedicated bgworker seems suboptimal; it introduces a lot of redundant code for a single small action. We'd support Andres' initial proposal to implement it as an extra function of the checkpointer. 2). In our view, it is better to move #define PREALLOCSEGDIR outside the function body. 3). It is better to have at least small comments on the functions GetNumPreallocatedWalSegs and SetNumPreallocatedWalSegs. We've also tested the performance difference between the master branch and this patch and noticed no significant difference in performance. We used pgbench with some sort of "standard" settings: $ pgbench -c50 -s50 -T200 -P1 -r postgres ...and with... $ pgbench -c100 -s50 -T200 -P1 -r postgres When looking at the per-second output of pgbench, we saw regular latency spikes (around a 5-10x increase), and this pattern was similar with and without the patch. The overall average latency for 200 sec of pgbench also looks pretty much the same with and without the patch. Could you provide your testing setup so we can see the effect, please? The check-world was successful. Overall the patch looks good, but in our view it's better to have experimental confirmation of the performance improvements before it is committed. --- Best regards, Maxim Orlov, Pavel Borisov. The new status of this patch is: Waiting on Author
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On 2021-07-09 20:36, Tomas Vondra wrote: Hi, I took a quick look on this - I'm no expert in the details of snapshots, so take my comments with a grain of salt. AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think Greg is right the v3 patch does not seem like the right (or correct) solution, for a couple reasons: 1) It fixes the symptoms, not the cause. If we set TransactionXmin to a bogus value, this only fixes it locally in SubTransGetTopmostTransaction but I'd be surprised if other places did not have the same issue. 2) The xid/TransactionXmin comparison in the v2 fix: xid = xid > TransactionXmin ? xid : TransactionXmin; seems broken, because it compares the XIDs directly, but that's not going to work after generating enough transactions. 3) But even if this uses TransactionIdFollowsOrEquals, it seems very strange because the "xid" comes from XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot)) i.e. it's the raw xmin from the tuple, so why should we be setting it to TransactionXmin? That seems pretty strange, TBH. So yeah, I think this is due to confusion with two snapshots and failing to consider both of them when calculating TransactionXmin. But I think removing one of the snapshots (as the v2 does it) is rather strange too. I very much doubt having both the transaction and active snapshots in the parallel worker is not intentional, and Pavel may very well be right this breaks isolation levels that use the xact snapshot (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though. So I think we need to keep both snapshots, but fix TransactionXmin to consider both of them - I suppose ParallelWorkerMain could simply look at the two snapshots, and use the minimum. Which is what [1] (already linked by Pavel) talks about, although that's related to concerns about one of the processes dying, which is not what's happening here. I'm wondering what consequences this may have on production systems, though. 
We've only seen this failing because of the assert, so what happens when the build has asserts disabled? Looking at SubTransGetTopmostTransaction, it seems the while loop ends immediately (because it's pretty much the opposite of the assert), so we just return the "xid" as topmost XID. But will that produce incorrect query results, or what? regards [1] https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de PFA the v4 patch based on the ideas above. In principle, I see little difference from v3, but I agree it is more correct. I tested this patch on Linux and macOS using the testing algorithm above and got no errors. On the master branch, before the patch, I still see this error. --- Best regards, Maxim Orlov.diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 3550ef13baa..6443d492501 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -34,6 +34,7 @@ #include "pgstat.h" #include "storage/ipc.h" #include "storage/predicate.h" +#include "storage/procarray.h" #include "storage/sinval.h" #include "storage/spin.h" #include "tcop/tcopprot.h" @@ -45,6 +46,7 @@ #include "utils/snapmgr.h" #include "utils/typcache.h" + /* * We don't want to waste a lot of memory on an error queue which, most of * the time, will process only a handful of small messages. However, it is @@ -1261,6 +1263,7 @@ ParallelWorkerMain(Datum main_arg) char *uncommittedenumsspace; StringInfoData msgbuf; char *session_dsm_handle_space; + Snapshot asnap; /* Set flag to indicate that we're initializing a parallel worker. */ InitializingParallelWorker = true; @@ -1415,7 +1418,15 @@ ParallelWorkerMain(Datum main_arg) /* Restore active snapshot. 
*/ asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false); - PushActiveSnapshot(RestoreSnapshot(asnapspace)); + asnap = RestoreSnapshot(asnapspace); + PushActiveSnapshot(asnap); + + /* + * We may have different xmins in active and transaction snapshots in + * parallel workers. So, use the minimum for TransactionXmin. + */ + if (TransactionIdFollows(TransactionXmin, asnap->xmin)) + ProcArrayInstallRestoredXmin(asnap->xmin, fps->parallel_leader_pgproc); /* * We've changed which tuples we can see, and must therefore invalidate
Bugfix and improvements in multixact.c
Hi! While working on making multixact offsets 64-bit [0], I've discovered a minor issue. The thing is that the type 'xid' is used in all the macros, but this isn't correct. The appropriate MultiXactId or MultiXactOffset should be used instead. And the second thing: as Heikki Linnakangas points out, the argument naming is also misleading. Since these problems are not the point of thread [0], I decided to create this discussion. And here is the patch set addressing the mentioned issues (0001 and 0002). Additionally, I made an optional patch 0003 to switch from macros to inline functions. For me, personally, the use of macro functions is justified if we are dealing with different argument types, to make a polymorphic call, which is not the case here. So, we can have more control over types and still generate the same code in terms of speed. See https://godbolt.org/z/KM8voadhs Starting from -O1, the function is inlined, so no overhead is noticeable. Anyway, it's up to the actual committer to decide whether it is worth it or not. Again, this particular patch 0003 is completely optional. As always, any opinions and reviews are very welcome! [0] https://www.postgresql.org/message-id/flat/ff143b24-a093-40da-9833-d36b83726bdf%40iki.fi#61d5a0e1cf6ab94b0e8aae8559bc4cf7 -- Best regards, Maxim Orlov. v1-0003-Switch-from-macro-functions-to-inline-in-multixac.patch Description: Binary data v1-0002-Fix-using-of-MultiXactOffset-type-insted-of-Trans.patch Description: Binary data v1-0001-Get-rid-of-misleading-xid-arg-name-in-multixact.patch Description: Binary data
Re: Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid?
Hi! Maybe I'm too bold, but this looks like a bug to me. At least, I don't understand why we do not check the HEAP_XMAX_INVALID flag. My guess is that nobody noticed that the MultiXactIdIsValid call does not check the mentioned flag in the "first" condition, but that's all speculation. Does anyone know if there are reasons to deliberately ignore the HEAP_XMAX_INVALID flag? Or is this just an unfortunate oversight? PFA my approach to this issue. -- Best regards, Maxim Orlov. v2-0001-Invalidate-xmax-if-HEAP_XMAX_INVALID-is-set.patch Description: Binary data
Vectorization of some functions and improving pg_list interface
Hi! Recently, I've been playing around with pg_lists and realized how annoying (maybe I was a bit tired) some stuff related to lists is. For example, see this code: List *l1 = list_make4(1, 2, 3, 4), *l2 = list_make4(5, 6, 7, 8), *l3 = list_make4(9, 0, 1, 2); ListCell *lc1, *lc2, *lc3; forthree(lc1, l1, lc2, l2, lc3, l3) { ... } list_free(l1); list_free(l2); list_free(l3); There are several questions: 1) Why do I need to specify the number of elements in the list in the function name? The compiler already knows how many arguments I use. 2) Why do I have to call free for every list? I don't know what to call this properly; for now I call it vectorization. Why not use a simple wrapper to "vectorize" function args? So, my proposal is: 1) Add a simple macro to "vectorize" functions. 2) Use this macro to "vectorize" the list_free and list_free_deep functions. 3) Use this macro to "vectorize" the bms_free function. 4) "Vectorize" the list_makeN functions. For this v1 version, I do not remove all list_makeN calls in order to reduce the diff, but I'll address this in the future if needed. In my view, one thing still waiting to be improved is the foreach loop. It is not very handy to have a bunch of similar calls: foreach, forboth, forthree, etc. It would be ideal to have a single foreach interface, but I don't know how to do that without changing the overall interface of the loop. Any opinions are very welcome! -- Best regards, Maxim Orlov. v1-0002-Make-use-of-vectorized-lists_free-and-lists_free_.patch Description: Binary data v1-0004-vectorize-list_make-functions.patch Description: Binary data v1-0001-Add-lists_free-and-lists_free_deep.patch Description: Binary data v1-0003-Make-use-of-vectorized-bmss_free-function.patch Description: Binary data