Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
I've updated the patch due to recent changes by Daniel Gustafsson (549ec201d6132b7). -- Best regards, Maxim Orlov.
Attachment: v10-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-u.patch
Re: Problem with moderation of messages with patches attached.
On Thu, Mar 3, 2022 at 3:31 PM Pavel Borisov wrote:
> Hi, hackers!
>
> Around 2 months ago I noticed a problem: messages containing patches in the thread [1] were always processed with manual moderation. They appear in the hackers' thread hours after posting. None of them are from new CF members, and personally, I don't see a reason for such inconvenience. The problem still exists as of today.
>
> Can someone make changes in the moderation engine to make it more liberal and convenient for authors?
>
> [1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com
>
> -- Best regards, Pavel Borisov
> Postgres Professional: http://postgrespro.com

Confirmed. -- Best regards, Maxim Orlov.
Re: [PATCH] Automatic HASH and LIST partition creation
On 2020-12-18 21:54, Pavel Borisov wrote: I've realized one strange effect in the current grammar parsing: if I do

CREATE TABLE foo (a int) PARTITION BY LIST (a) CONFIGURATION (a 1);
ERROR: unrecognized auto partition bound specification "a"

I consulted the patch code and realized that, in fact, the patch considers these (invalid) HASH bounds (it doesn't find the word 'modulus') unless partitioning is specified to be (still invalid) LIST. This is due to the fact that the grammar parser is not context-aware, and in the patch we tried to avoid the new parser keyword MODULUS. The effect is that inside the CONFIGURATION parentheses, in the case of HASH bounds, we don't have a single keyword for the parser to determine that it is really a HASH case. It doesn't make the patch work incorrectly, but it checks the validity of all types of bounds as in the HASH case even when the partitioning is not HASH. I find this slightly bogus. Because the parser cannot determine the type of partitioning inside the configuration clause, adding new syntax (e.g. adding RANGE partitioning configuration inside the CONFIGURATION parentheses) becomes complicated. So I have one more syntax proposal: to have separate keywords inside the CONFIGURATION parentheses for each partitioning type. E.g.:

CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN (1,2),(3,4) DEFAULT PARTITION foo_def);
CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES WITH MODULUS 3);
CREATE TABLE foo(a int) PARTITION BY RANGE(a) CONFIGURATION (FOR VALUES FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def);

This proposal is in accordance with the current syntax of declarative partitioning: CREATE TABLE foo_1 PARTITION OF foo FOR VALUES ... Some more optional proposals, incremental to the above: 1. Omit CONFIGURATION with/without parentheses. This makes the syntax closer to the (non-automatic) declarative partitioning syntax, but the clause seems less legible (in my opinion). 2. Omit just FOR VALUES.
This makes the clause short, but adds a difference from the (non-automatic) declarative partitioning syntax. I'm also planning to add RANGE partitioning syntax to this in the future, and I will be happy if all three types of the syntax could come along easily. I very much appreciate your views on this, especially since changes can still be made easily while the patch is not committed yet. -- Best regards, Pavel Borisov Postgres Professional: http://postgrespro.com

In my view, the following expressions are the golden mean here. On the one hand, they are not far from the original non-automatic declarative partitioning syntax; on the other hand, omitting the CONFIGURATION keyword (which is redundant here in terms of grammar parsing) would make these expressions less comprehensible for a human.

CREATE TABLE foo(a int) PARTITION BY LIST(a) CONFIGURATION (FOR VALUES IN (1,2),(3,4) DEFAULT PARTITION foo_def);
CREATE TABLE foo(a int) PARTITION BY HASH(a) CONFIGURATION (FOR VALUES WITH MODULUS 3);
CREATE TABLE foo(a int) PARTITION BY RANGE(a) CONFIGURATION (FOR VALUES FROM 1 TO 1000 INTERVAL 10 DEFAULT PARTITION foo_def);

In addition to that, adding RANGE partitioning would be much simpler, since we would have specific "branches" in the grammar instead of using a context-sensitive grammar and dealing with it in C code. --- Best regards, Maxim Orlov.
SSL negotiation error on massive connect/disconnect
Hi! I have a primary server on port 55942 and two standbys on 55943 and 55944. I use a connection string like "postgresql://127.0.0.1:55942,127.0.0.1:55943,127.0.0.1:55944/postgres" to connect to the servers via psql. Everything works perfectly no matter how many connection attempts I make. But when I stop the primary server, very rarely I get an error "received invalid response to SSL negotiation" from psql. I got this error when I tried to run a massive connect/disconnect test, and it's unlikely to reproduce manually without thousands of sequential connections with no intentional delay in between. The problem is present only on Linux; macOS works fine. As far as I understand, this particular problem occurs because libpq gets a "zero" (i.e. 0) byte in SSLok in PQconnectPoll@src/interfaces/libpq/fe-connect.c. This leads to the "else" branch with the described error message. This may be fixed by handling a zero byte the same way as an 'E' byte. But I'm not sure if it's a good solution, since I don't know why fe-connect gets a zero byte at all. I consider it worth correcting this. The error is rare, but it's really odd to see this unexpected and wrong behavior.
--- Best regards, Maxim Orlov.From 860f14c99ec70fcea58897c97e218b8a54286a39 Mon Sep 17 00:00:00 2001 From: Maxim Orlov Date: Mon, 1 Mar 2021 16:47:41 +0300 Subject: [PATCH] WIP --- src/interfaces/libpq/Makefile | 3 + src/interfaces/libpq/fe-connect.c | 2 +- src/interfaces/libpq/t/004-multihost.pl | 143 3 files changed, 147 insertions(+), 1 deletion(-) create mode 100644 src/interfaces/libpq/t/004-multihost.pl diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index f74677eaf9b..98fd1d694ef 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -120,6 +120,9 @@ install: all installdirs install-lib installcheck: $(MAKE) -C test $@ +check: + $(prove_check) + installdirs: installdirs-lib $(MKDIR_P) '$(DESTDIR)$(includedir)' '$(DESTDIR)$(includedir_internal)' '$(DESTDIR)$(datadir)' diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index db71fea169c..43c14a4433a 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -3023,7 +3023,7 @@ keep_going: /* We will come back to here until there is conn->status = CONNECTION_MADE; return PGRES_POLLING_WRITING; } - else if (SSLok == 'E') + else if (SSLok == 'E' || SSLok == '\0') { /* * Server failure of some sort, such as failure to diff --git a/src/interfaces/libpq/t/004-multihost.pl b/src/interfaces/libpq/t/004-multihost.pl new file mode 100644 index 000..116f4a82635 --- /dev/null +++ b/src/interfaces/libpq/t/004-multihost.pl @@ -0,0 +1,143 @@ +# target_server_type +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 1; +use Scalar::Util qw(blessed); + +# Initialize master node +my $node_master = get_new_node('master'); +$node_master->init(allows_streaming => 1); +$node_master->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_master->start; + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); + +# Create 
streaming standby 1 linking to master +my $node_standby_1 = get_new_node('standby_1'); +$node_standby_1->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_1->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_1->start; + +# Create streaming standby 2 linking to master +my $node_standby_2 = get_new_node('standby_2'); +$node_standby_2->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_2->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_2->start; + +sub get_host_port +{ + my $node = shift; + return "$PostgresNode::test_localhost:" . $node->port; +} + +sub multiconnstring +{ + my $nodes= shift; + my $database = shift || "postgres"; + my $params = shift; + my $extra= ""; + if ($params) + { + my @cs; + while (my ($key, $val) = each %$params) + { + push @cs, $key . "=" . $val; + } + $extra = "?" . join("&", @cs); + } + my $str = + "postgresql://" + . join(",", map({ get_host_port($_) } @$nodes)) + . "/$database$extra"; + + return $str; +} + +# +# Copied from PosgresNode.pm passing explicit connect-string instead of +# constructed from object +# + +sub psql2 +{ + # We expect dbname t
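A minimal sketch of the zero-byte handling discussed above, detached from libpq (the enum and function names here are ours, not libpq's — libpq handles the reply byte inline in PQconnectPoll()): the one-byte reply to an SSLRequest is classified, and a zero byte, apparently seen when the freshly killed server closes the socket mid-handshake, is treated the same way as the 'E' error byte, so the client can fail over to the next host instead of reporting an invalid response.

```c
#include <assert.h>

/* Possible outcomes of reading the server's one-byte SSLRequest reply.
 * Illustrative names only. */
typedef enum
{
	SSL_REPLY_START_SSL,	/* 'S': server agrees to start SSL */
	SSL_REPLY_PLAINTEXT,	/* 'N': server refuses SSL, continue unencrypted */
	SSL_REPLY_SERVER_ERROR	/* 'E' (or, per the proposal, '\0'): failure */
} SslReplyAction;

/* Classify the reply byte; mapping '\0' to a server error is the
 * proposed change.  (Real libpq reports other garbage bytes as a
 * distinct "invalid response" error; we fold them together here.) */
static SslReplyAction
classify_ssl_reply(char SSLok)
{
	if (SSLok == 'S')
		return SSL_REPLY_START_SSL;
	if (SSLok == 'N')
		return SSL_REPLY_PLAINTEXT;
	return SSL_REPLY_SERVER_ERROR;
}
```

With this classification, a dead primary that yields a '\0' read falls into the same failure path as an explicit error, and the multi-host loop simply advances to the next connhost entry.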
Re: Pre-allocating WAL files
I checked the patch too and found it to be ok. Check and check-world pass. The overall idea seems good in my opinion, but I'm not sure where the optimal place to put the pre-allocation is. On Thu, Dec 30, 2021 at 2:46 PM Pavel Borisov wrote:
>> pre-allocating during checkpoints. I've done a few pgbench runs, and it seems to work pretty well. Initialization is around 15% faster, and I'm seeing about a 5% increase in TPS with a simple-update workload with wal_recycle turned off. Of course, these improvements go away once segments can be recycled.
>
> I've checked the patch v7. It applies cleanly, the code is good, and the check-world tests passed without problems. I think it's ok to use the checkpointer for this, and the overall patch can be committed. But the observed performance gain makes me think again before adding this feature. I did tests myself a couple of months ago and got similar results. I really don't know whether it is worth the effort.
>
> Wish you and all hackers a happy New Year!
> -- Best regards, Pavel Borisov
> Postgres Professional: http://postgrespro.com

--- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
> Perhaps we can merge some of the code cleanup that it contained, such as using XID_FMT everywhere and creating a type for the kind of page returned by TransactionIdToPage() to make the code cleaner.

Agree, I think this is a good idea.

> Is your patch functionally the same as the PostgresPro implementation?

Yes, it is. It basically is the PostgresPro implementation, not a concept or something like that. -- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
> Is there any documentation or README explaining this whole 64-bit XID mechanism?

There is none, unfortunately. I will come back to this later.

> Could you tell me what happens if a new tuple with an XID larger than xid_base + 0x is inserted into the page? Is such a new tuple not allowed to be inserted into that page? Or are xid_base and the xids of all existing tuples in the page increased? Also, what happens if one of those xids (of existing tuples) cannot be changed because the tuple can still be seen by a very-long-running transaction?

This whole mechanism works around heap_insert/heap_update by calling heap_page_prepare_for_xid(), and if it fails (due to a tuple still being visible) an error is raised. Also, if an xid_base shift is not viable, it will try to remove old tuples. -- Best regards, Maxim Orlov.
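To illustrate the per-page xid_base scheme discussed above, here is a rough sketch under our own names and simplifying assumptions (the real patch stores a 64-bit base in a special area of each heap page while tuple headers keep ordinary 32-bit xids, and it additionally reserves a few low values for special xids, which we ignore here): a tuple's full xid is the page base plus its stored short xid, and a given 64-bit xid can be represented on a page only if its offset from the base fits in 32 bits — otherwise the base must be shifted first, which is what heap_page_prepare_for_xid() arranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reconstruct a tuple's full 64-bit xid from the page's base and the
 * 32-bit value stored in the tuple header. */
static uint64_t
full_xid(uint64_t pd_xid_base, uint32_t short_xid)
{
	return pd_xid_base + short_xid;
}

/* Can a given 64-bit xid be represented on a page with this base?
 * If not, the base has to be shifted (possibly pruning dead tuples
 * first), or an error is raised when that is impossible because some
 * tuple is still visible to a long-running transaction. */
static bool
xid_fits_page(uint64_t pd_xid_base, uint64_t xid)
{
	return xid >= pd_xid_base && xid - pd_xid_base <= UINT32_MAX;
}
```

The second predicate captures exactly the failure mode asked about: a new xid outside the representable window forces a base shift, and a visible old tuple pinning the low end of the window makes that shift impossible.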
Re: UNIQUE null treatment option
I find this patch useful. It includes changes in documentation and tests. The code itself looks reasonable to me. Since the unique constraint check is done by the corresponding btree index, the implementation of this feature is elegant and lightweight. In my view, it is worth noting that a heap relation can have different NULL treatment in unique constraints for different unique columns. For example:

CREATE TABLE t (i INT UNIQUE NULLS DISTINCT, a INT UNIQUE NULLS NOT DISTINCT);

All the tests run ok on Linux and macOS. Although, the patch doesn't apply with default git apply options, only with the "three-way merge" option (-3). Please consider rebasing it. Then, in my view, it can be "Ready for Committer". -- Best regards, Maxim Orlov.
Re: O(n) tasks cause lengthy startups and checkpoints
The code seems to be in good condition. All the tests run ok with no errors. I like the whole idea of shifting additional checkpointer jobs as much as possible to another worker. In my view, it would be more appropriate to call this worker "bg cleaner" or "bg file cleaner" or something similar. It could be useful for systems with high load, which may deal with deleting many files at once, but I'm not sure about "small" installations: an extra bg worker needs more resources to do occasional deletion of small numbers of files. I really do not know how to do it better; maybe have two different code paths switched by a GUC? Should we also think about moving WAL pre-allocation into the custodian worker from the patch "Pre-allocating WAL files" [1]?

[1] https://www.postgresql.org/message-id/flat/20201225200953.jjkrytlrzojbn...@alap3.anarazel.de

-- Best regards, Maxim Orlov.
Failed assertion on standby while shutdown
Hi, haсkers! Recently, I was doing some experiments with primary/standby instances interaction. In certain conditions I’ve got and was able to reproduce crash on failed assertion. The scenario is the following: 1. start primary server 2. start standby server by pg_basebackup -P -R -X stream -c fast -p5432 -D data 3. apply some load to the primary server by pgbench -p5432 -i -s 150 postgres 4. kill primary server (with kill -9) and keep it down 5. stop standby server by pg_ctl 6. run standby server Then any standby server termination will result in a failed assertion. The log with a backtrace is following: 2021-03-19 18:54:25.352 MSK [3508443] LOG: received fast shutdown request 2021-03-19 18:54:25.379 MSK [3508443] LOG: aborting any active transactions TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452) postgres: walreceiver (ExceptionalCondition+0xd0)[0x55d0526f] postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x55b43e31] postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x5574ae32] postgres: walreceiver (+0x530bff)[0x55a84bff] postgres: walreceiver (+0x531044)[0x55a85044] postgres: walreceiver (+0x530959)[0x55a84959] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x77a303c0] /lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x772a40da] postgres: walreceiver (+0x52bea4)[0x55a7fea4] postgres: walreceiver (PostmasterMain+0x129f)[0x55a7f7c1] postgres: walreceiver (+0x41ff1f)[0x55973f1f] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x771b30b3] postgres: walreceiver (_start+0x2e)[0x5561abfe] After a brief investigation I found out that I can get this assert with 100% probability if I insert a sleep for about 5 sec into InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c: diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index 897045ee272..b5f365f426d 100644 --- 
a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -525,7 +525,7 @@ InitAuxiliaryProcess(void)
 	if (MyProc != NULL)
 		elog(ERROR, "you already exist");
-
+	pg_usleep(500L);
 	/*
 	 * We use the ProcStructLock to protect assignment and releasing of
 	 * AuxiliaryProcs entries.

Maybe this kind of behaviour would appear if the computer hosting the instances is under significant side load, which causes a delay in starting db instances under heavy load. The configuration for the primary server is the default with "wal_level = logical". The configuration for the standby server is the default with "wal_level = logical" and "primary_conninfo = 'port=5432'". I'm puzzled by this behavior. I'm pretty sure it is not what should happen. Any ideas how this can be fixed? --- Best regards, Maxim Orlov.
Re: Failed assertion on standby while shutdown
On 2021-03-22 16:40, Fujii Masao wrote: On 2021/03/20 2:25, Maxim Orlov wrote: Hi, haсkers! Recently, I was doing some experiments with primary/standby instances interaction. In certain conditions I’ve got and was able to reproduce crash on failed assertion. The scenario is the following: 1. start primary server 2. start standby server by pg_basebackup -P -R -X stream -c fast -p5432 -D data 3. apply some load to the primary server by pgbench -p5432 -i -s 150 postgres 4. kill primary server (with kill -9) and keep it down 5. stop standby server by pg_ctl 6. run standby server Then any standby server termination will result in a failed assertion. The log with a backtrace is following: 2021-03-19 18:54:25.352 MSK [3508443] LOG: received fast shutdown request 2021-03-19 18:54:25.379 MSK [3508443] LOG: aborting any active transactions TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452) postgres: walreceiver (ExceptionalCondition+0xd0)[0x55d0526f] postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x55b43e31] postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x5574ae32] postgres: walreceiver (+0x530bff)[0x55a84bff] postgres: walreceiver (+0x531044)[0x55a85044] postgres: walreceiver (+0x530959)[0x55a84959] /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x77a303c0] /lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x772a40da] postgres: walreceiver (+0x52bea4)[0x55a7fea4] postgres: walreceiver (PostmasterMain+0x129f)[0x55a7f7c1] postgres: walreceiver (+0x41ff1f)[0x55973f1f] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x771b30b3] postgres: walreceiver (_start+0x2e)[0x5561abfe] After a brief investigation I found out that I can get this assert with 100% probability if I insert a sleep for about 5 sec into InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c: diff --git a/src/backend/storage/lmgr/proc.c 
b/src/backend/storage/lmgr/proc.c index 897045ee272..b5f365f426d 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -525,7 +525,7 @@ InitAuxiliaryProcess(void) if (MyProc != NULL) elog(ERROR, "you already exist"); - + pg_usleep(500L); /* * We use the ProcStructLock to protect assignment and releasing of * AuxiliaryProcs entries. Thanks for the report! I could reproduce this issue by adding that sleep into InitAuxiliaryProcess(). Maybe, this kinda behaviour would appear if a computer hosting instances is under significant side load, which cause delay to start db-instances under a heavy load. Configuration for a primary server is default with "wal_level = logical" Configuration for a standby server is default with "wal_level = logical" and "primary_conninfo = 'port=5432'" I'm puzzled with this behavor. I'm pretty sure it is not what should be. Any ideas how this can be fixed? ISTM that the cause of this issue is that the startup process exits without releasing the locks that it was holding when shutdown is requested. To address this issue, IMO the startup process should call ShutdownRecoveryTransactionEnvironment() at its exit. Attached is the POC patch that changes the startup process that way. I've not tested the patch enough yet.. Regards, Thank you for reply! As far as I understand, this is really the case. I've test your patch a bit. This annoying failed assertion is gone now. I think I should test more and report later about results. Should we put this patch to CF? --- Best regards, Maxim Orlov.
Re: Failed assertion on standby while shutdown
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

All the tests passed successfully.

The new status of this patch is: Ready for Committer
Re: Failed assertion on standby while shutdown
On 2021-03-30 20:44, Maxim Orlov wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: not tested
> Documentation: not tested
> All the tests passed successfully.
> The new status of this patch is: Ready for Committer

The patch is good. One note: should we put a comment about ShutdownRecoveryTransactionEnvironment's non-reentrant behaviour? Or maybe rename it to ShutdownRecoveryTransactionEnvironmentOnce? --- Best regards, Maxim Orlov.
Re: Failed assertion on standby while shutdown
On 2021-04-01 15:02, Fujii Masao wrote:
> On 2021/03/31 19:51, Maxim Orlov wrote:
>> On 2021-03-30 20:44, Maxim Orlov wrote:
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world: tested, passed
>>> Implements feature: tested, passed
>>> Spec compliant: not tested
>>> Documentation: not tested
>>> All the tests passed successfully.
>>> The new status of this patch is: Ready for Committer
>>
>> The patch is good. One note, should we put a comment about ShutdownRecoveryTransactionEnvironment not reentrant behaviour? Or maybe rename it to ShutdownRecoveryTransactionEnvironmentOnce?
>
> +1 to add more comments into ShutdownRecoveryTransactionEnvironment(). I did that. What about the attached patch?
>
> Regards,

Well done! In my view, it is just what is needed. --- Best regards, Maxim Orlov.
Re: SSL negotiation error on massive connect/disconnect
On 2021-03-01 17:22, Maxim Orlov wrote: Hi! I have primary server on port 55942 and two standbys on 55943 and 55944. Then use connection string like "postgresql://127.0.0.1:55942,127.0.0.1:55943,127.0.0.1:55944/postgres" to connect to the servers via psql. Everything works perfectly no matter how many attempts to connect I do. But when I stop primary server, very rarely I do get an error "received invalid response to SSL negotiation" from psql. I got this error when I tried to make massive connects/disconnects test and it's unlikely to reproduce manually without thousands of connections sequentially with no intentional delay in between. The problem present only on Linux, MacOS works fine. As far as I understand this particular problem is because of postgresql gets "zero" (i.e. 0) byte in SSLok in PQconnectPoll@src/interfaces/libpq/fe-connect.c. This lead to select "else" branch with described error message. This may be fixed by handling zero byte as 'E' byte. But I'm not sure if it's good solution, since I don't know why fe-connect gets an zero byte at all. I consider it's worth to correct this. This error is rare but it's really odd to notice this unexpected and wrong behavior. --- Best regards, Maxim Orlov. Correction. Previous patch was wrong. The proper one is here. 
--- Best regards, Maxim Orlov.From fcc2392b1942d58e1de6749b65589757fb35322f Mon Sep 17 00:00:00 2001 From: Maxim Orlov Date: Thu, 8 Apr 2021 18:12:46 +0300 Subject: [PATCH] WIP-2 --- src/interfaces/libpq/Makefile | 3 + src/interfaces/libpq/fe-connect.c | 5 + src/interfaces/libpq/t/004-multihost.pl | 143 3 files changed, 151 insertions(+) create mode 100644 src/interfaces/libpq/t/004-multihost.pl diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile index 0c4e55b6ad3..aa9328f98b0 100644 --- a/src/interfaces/libpq/Makefile +++ b/src/interfaces/libpq/Makefile @@ -120,6 +120,9 @@ install: all installdirs install-lib installcheck: $(MAKE) -C test $@ +check: + $(prove_check) + installdirs: installdirs-lib $(MKDIR_P) '$(DESTDIR)$(includedir)' '$(DESTDIR)$(includedir_internal)' '$(DESTDIR)$(datadir)' diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c index aa654dd6a8e..8aa017d858e 100644 --- a/src/interfaces/libpq/fe-connect.c +++ b/src/interfaces/libpq/fe-connect.c @@ -2319,6 +2319,7 @@ keep_going: /* We will come back to here until there is conn->try_next_addr = false; } +keep_going2: /* Time to advance to next connhost[] entry? 
*/ if (conn->try_next_host) { @@ -3081,6 +3082,10 @@ keep_going: /* We will come back to here until there is conn->status = CONNECTION_AWAITING_RESPONSE; goto keep_going; } + else if (SSLok == '\0') + { + goto keep_going2; + } else { appendPQExpBuffer(&conn->errorMessage, diff --git a/src/interfaces/libpq/t/004-multihost.pl b/src/interfaces/libpq/t/004-multihost.pl new file mode 100644 index 000..116f4a82635 --- /dev/null +++ b/src/interfaces/libpq/t/004-multihost.pl @@ -0,0 +1,143 @@ +# target_server_type +use strict; +use warnings; +use PostgresNode; +use TestLib; +use Test::More tests => 1; +use Scalar::Util qw(blessed); + +# Initialize master node +my $node_master = get_new_node('master'); +$node_master->init(allows_streaming => 1); +$node_master->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_master->start; + +# Take backup +my $backup_name = 'my_backup'; +$node_master->backup($backup_name); + +# Create streaming standby 1 linking to master +my $node_standby_1 = get_new_node('standby_1'); +$node_standby_1->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_1->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_1->start; + +# Create streaming standby 2 linking to master +my $node_standby_2 = get_new_node('standby_2'); +$node_standby_2->init_from_backup($node_master, $backup_name, has_streaming => 1); +$node_standby_2->append_conf('postgresql.conf', "listen_addresses = '$PostgresNode::test_localhost'"); +$node_standby_2->start; + +sub get_host_port +{ + my $node = shift; + return "$PostgresNode::test_localhost:" . $node->port; +} + +sub multiconnstring +{ + my $nodes= shift; + my $database = shift || "postgres"; + my $params = shift; + my $extra= ""; + if ($params) + { + my @cs; + while (my ($key, $val) = each %$params) + { + push @cs, $key . "=" . $val; + } + $extra = "?" . join("&", @cs); +
[PATCH] lazy relations delete
Hi! Here is the case. Assume we have a master-to-slave replication setup with shared_buffers set to 2 GB at the master and 4 GB at the slave. All of the data is written to the master, while reading occurs from the slave. Now we decide to drop many tables, let's say 1000 or 1, not in a single transaction but each table in a separate one. So, due to the "plain" shared_buffers memory, we have to do a for-loop over all buffers for every relation, which leads to a lag between master and slave. In a real-world scenario such an issue leads not to minutes of lag, but hours. At the same time, PostgreSQL has a great routine to delete many relations in a single transaction. So, to get rid of this kind of issue, here is an idea: what if we do not invalidate buffers for each dropped relation right away, but instead store the dropped relations in an array and prevent the shared buffers corresponding to the deleted relations from being flushed. Then, when the array reaches its maximum size, we walk all buffers only once to "free" the shared buffers corresponding to the deleted relations. Here are some values from the tests I made.

Without patch:
1. (master 2 GB) - drop 1000 tables took 6 sec; (slave 4 GB) - drop 1000 tables took 8 sec
2. (master 4 GB) - drop 1000 tables took 10 sec; (slave 8 GB) - drop 1000 tables took 16 sec
3. (master 10 GB) - drop 1000 tables took 22 sec; (slave 20 GB) - drop 1000 tables took 38 sec

With patch:
1. (master 2 GB) - drop 1000 tables took 2 sec; (slave 4 GB) - drop 1000 tables took 2 sec
2. (master 4 GB) - drop 1000 tables took 3 sec; (slave 8 GB) - drop 1000 tables took 3 sec
3.
(master 10 GB) - drop 1000 tables took 4 sec (slave 20 GB) - drop 1000 tables took 4 sec -- Max Orlov E-mail: m.or...@postgrespro.ru init.sh Description: application/shellscript run-001.sh Description: application/shellscript run-001-core.sh Description: application/shellscript diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 1f10a97dc78..e0d1fb6a824 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -66,8 +66,6 @@ #define BUF_WRITTEN0x01 #define BUF_REUSABLE 0x02 -#define DROP_RELS_BSEARCH_THRESHOLD 20 - typedef struct PrivateRefCountEntry { Buffer buffer; @@ -410,6 +408,72 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref) } } +/* + * Lazy relations delete mechanism. + * + * On relation drop no need to walk shared buffers every time, just put a deleted + * RelFileNode into an array. Thus we store RelFileNodes in RelNodeDeleted + * struct and delete them later when number of deleted relations will + * exceed LAZY_DELETE_ARRAY_SIZE. + */ + +#define LAZY_DELETE_ARRAY_SIZE 128 + +/* Wrapper for array of lazy deleted RelFileNodes. */ +typedef struct RelNodeDeletedBuffer +{ + RelFileNode rnodes[LAZY_DELETE_ARRAY_SIZE]; /* Array of deleted */ + int idx; /* Current position */ +} RelNodeDeletedBuffer; + +static RelNodeDeletedBuffer *RelNodeDeleted = NULL; + +/* + * Initialize shared array of lazy deleted relations. + * + * This is called once during shared-memory initialization (either in the + * postmaster, or in a standalone backend). + */ +void InitRelNodeDeletedBuffer(void) +{ + bool found; + + RelNodeDeleted = ShmemInitStruct("Lazy Deleted Relations", +sizeof(RelNodeDeletedBuffer), +&found); + + if (!found) + MemSet(RelNodeDeleted, 0, sizeof(RelNodeDeletedBuffer)); +} + +/* + * Compute the size of shared memory for the buffer of lazy deleted relations. 
+ */ +Size +RelNodeDeletedBufferSize(void) +{ + Size size = 0; + + size = add_size(size, sizeof(RelNodeDeletedBuffer)); + + return size; +} + +/* + * Check for relation is in lazy deleted buffer (up to n-th position). + */ +static inline bool +IsRelFileNodeDeleted(RelFileNode rnode, int n) +{ + int i; + + for (i = 0; i < n; i++) + if (RelFileNodeEquals(rnode, RelNodeDeleted->rnodes[i])) + return true; + + return false; +} + /* * BufferIsPinned * True iff the buffer is pinned (also checks for valid buffer number). @@ -452,6 +516,7 @@ static BufferDesc *BufferAlloc(SMgrRelation smgr, BlockNumber blockNum, BufferAccessStrategy strategy, bool *foundPtr); +static void InvalidateRelFileNodesBuffers(RelFileNode *nodes, int nnodes); static void FlushBuffer(BufferDesc *buf, SMgrRelation reln); static void AtProcExit_Buffers(int code, Datum arg); static void CheckForBufferLeaks(void); @@ -2690,6 +2755,22 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln) char *bufToWrite; uint32 buf_state; + LWLockAcquire(LazyRelDeleteLock, LW_SHARED); + + /* + * If rnode is in lazy deleted buffer clear dirty and checkpoint flags. + */ + if (IsRelFileNodeDeleted(buf->tag.rnode, RelNodeDeleted->idx)) + { + buf_state = LockBufHdr(buf); + buf_state &= ~(BM_DIRTY | BM_CHECKPOINT_NEEDED); + UnlockBufHdr(buf, buf_state); + LWLockRelease(LazyRelDeleteLock); + return; + } + + LWLockRelease(LazyRelDeleteLock); + /* * Acquire the buf
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 21 Feb 2023 at 16:59, Aleksander Alekseev wrote:
> Hi,
>
> One thing that still bothers me is that during the upgrade we only migrate the CLOG segments (i.e. pg_xact / pg_clog) and completely ignore all the rest of the SLRUs:
>
> * pg_commit_ts
> * pg_multixact/offsets
> * pg_multixact/members
> * pg_subtrans
> * pg_notify
> * pg_serial

Hi! We do ignore these, since in order to run pg_upgrade the server must be properly stopped, and no transactions can outlast that moment. -- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
On Mon, 30 Jan 2023 at 23:23, Corey Huinker wrote:
> I rebased, but there are no code differences.

The patch set seems to be in good shape and has been pretty stable for quite a while. Could you add one more minor improvement, a blank line after the variable declarations?

+ int exit_code = wait_result_to_exit_code(result);
+ char buf[32];
...here
+ snprintf(buf, sizeof(buf), "%d", exit_code);
+ SetVariable(pset.vars, "SHELL_EXIT_CODE", buf);
+ SetVariable(pset.vars, "SHELL_ERROR", "true");

+ char exit_code_buf[32];
... and here
+ snprintf(exit_code_buf, sizeof(exit_code_buf), "%d",
+ wait_result_to_exit_code(exit_code));
+ SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+ SetVariable(pset.vars, "SHELL_ERROR", "true");

After these changes, I think we can make this patch RfC, shall we? -- Best regards, Maxim Orlov.
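The wait_result_to_exit_code() helper referenced above converts the raw status returned by system()/wait() into a shell-style exit code for the SHELL_EXIT_CODE variable. As a rough illustration of what such a helper does (this is our own sketch, not the patch's exact code; in particular the 128+signal convention for signalled children is an assumption borrowed from shells):

```c
#include <assert.h>
#include <sys/wait.h>

/* Decode a wait(2)-style status: the child's own exit status if it
 * exited normally, 128 plus the signal number if it was killed by a
 * signal, and -1 for anything else (stopped/continued/unrecognized). */
static int
decode_wait_status(int status)
{
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	return -1;
}
```

With such a decoder, psql can expose a single small integer to scripts regardless of whether the child exited or was signalled.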
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Mon, 27 Feb 2023 at 12:10, Heikki Linnakangas wrote:
> (CC list trimmed, gmail wouldn't let me send otherwise)
>
> That sounds right for pg_serial, pg_notify, and pg_subtrans. But not for
> pg_commit_ts and the pg_multixacts.
>
> This needs tests for pg_upgrading those SLRUs, after 0, 1 and N
> wraparounds.
>

Yep, that's my fault. I've forgotten about pg_multixacts. But for pg_commit_ts it's a more complicated story. The thing is, if we do an upgrade, the old files from pg_commit_ts are not copied into the new server. For example, I've checked one more time on the current master branch:

1). initdb
2). add "track_commit_timestamp = on" into postgresql.conf
3). pgbench
4). $ ls pg_commit_ts/
    0000 0005 000A 000F 0014 0019 001E 0023 ... 009A 009F 00A4 00A9 00AE 00B3 00B8 00BD 00C2
5). do pg_upgrade
6). $ ls pg_commit_ts/
    00C2

Either I do not understand something, or the files from the pg_commit_ts directory are not copied.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 17 Jan 2023 at 16:33, Aleksander Alekseev wrote:
> Hi hackers,
>
> Maxim, perhaps you could share with us what your reasoning was here?
>

I'm really sorry for the late response, but better late than never. Yes, we cannot access shared memory without a lock. In this particular case, we use XidGenLock. That is why we use the lock argument: to take the lock if it was not taken previously. Actually, we may place an assertion here to insist on that. As for the xid comparison: we do not compare xids here, we are checking for wraparound, so, AFAICS, this code is correct.

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas wrote:
> 1. Use 64 bit page numbers in SLRUs (this patch)
>
> 2. Use the larger segment file names in async.c, to lift the current 8
> GB limit on the max number of pending notifications.
>
> 3. Extend pg_multixact so that pg_multixact/members is addressed by
> 64-bit offsets.
>
> 4. Extend pg_subtrans to 64-bits.
>
> 5. Extend pg_xact to 64-bits.
>
> 6. (a bonus thing that I noticed while thinking of pg_xact.) Extend
> pg_twophase.c, to use FullTransactionIds.
>
> Currently, the twophase state files in pg_twophase are named according
> to the 32 bit Xid of the transaction. Let's switch to FullTransactionId
> there.
>
> ...
>
> I propose that we try to finish 1 and 2 for v16. And maybe 6. I think
> that's doable. It doesn't have any great user-visible benefits yet, but
> we need to start somewhere.
>
> - Heikki
>

Yes, this is a great idea! My only concern is that we're going in circles here. You see, patch 1 is what was proposed at the very beginning of this thread. Anyway, I will be happy if we are able to push this topic forward.

As for making pg_multixact 64 bit, I spent the last couple of days making a proper pg_upgrade for pg_multixacts and for pg_xacts with wraparound, and I've come to understand that it is not a simple task compared to pg_xacts. The problem is that we do not have an epoch for multixacts, so we do not have the ability to "overcome" the wraparound.
The solution may be adding some kind of epoch for multixacts, or making them 64 bit in the "main" 64-xid patch, but from the perspective of this thread, in my view, this should be last in line here. In pg_xact we do not have such a problem: we do have an epoch for transactions, so the conversion should be pretty obvious:

0000 -> 00000000
0001 -> 00000001
...
0FFE -> 00000FFE
0FFF -> 00000FFF
0000 -> 00010000
0001 -> 00010001

So, in my view, the plan should be:
1. Use internal 64 bit page numbers in SLRUs without changing segment naming.
2. Use the larger segment file names in async.c, to lift the current 8 GB limit on the max number of pending notifications.
3. Extend pg_xact to 64-bits.
4. Extend pg_subtrans to 64-bits.
5. Extend pg_multixact so that pg_multixact/members is addressed by 64-bit offsets.
6. Extend pg_twophase.c, to use FullTransactionIds. (a bonus thing)

Thoughts?

-- Best regards, Maxim Orlov.

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas wrote:
> On 01/03/2023 12:21, Aleksander Alekseev wrote:
> > Hi,
> >
> >> I'm surprised that these patches extend the page numbering to 64 bits,
> >> but never actually uses the high bits. The XID "epoch" is not used, and
> >> pg_xact still wraps around and the segment names are still reused. I
> >> thought we could stop doing that.
> >
> > To clarify, the idea is to let CLOG grow indefinitely and simply store
> > FullTransactionId -> TransactionStatus (two bits). Correct?
>
> Correct.
>
> > I didn't investigate this in much detail but it may affect quite some
> > amount of code since TransactionIdDidCommit() and
> > TransactionIdDidAbort() currently both deal with TransactionId, not
> > FullTransactionId. IMO, this would be a nice change however, assuming
> > we are ready for it.
>
> Yep, it's a lot of code churn..
>
> > In the previous version of the patch there was an attempt to derive
> > FullTransactionId from TransactionId but it was wrong for the reasons
> > named above in the thread. Thus it was removed and the patch
> > simplified.
> Yeah, it's tricky to get it right. Clearly we need to do it at some
> point though.
>
> All in all, this is a big effort. I spent some more time reviewing this
> in the last few days, and thought a lot about what the path forward here
> could be. And I haven't looked at the actual 64-bit XIDs patch set yet,
> just this patch to use 64-bit addressing in SLRUs.
>
> This patch is the first step, but we have a bit of a chicken and egg
> problem, because this patch on its own isn't very interesting, but on
> the other hand, we need it to work on the foll
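The pg_xact segment-name conversion mapping described earlier in this message can be sketched as a small helper. Note the assumptions: the 16-bit epoch shift and the 8-hex-digit output width are taken from the example mapping above, not from the actual patch, and the function name is mine.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of widening an old 4-hex-digit pg_xact segment name by
 * prepending the xid epoch.  Illustrative only: the real patch may use
 * different widths and helpers.
 */
static void
convert_segment_name(const char *old_name, uint32_t epoch,
					 char *out, size_t outsz)
{
	uint64_t	segno = strtoull(old_name, NULL, 16);
	uint64_t	full_segno = ((uint64_t) epoch << 16) | segno;

	snprintf(out, outsz, "%08llX", (unsigned long long) full_segno);
}
```

With epoch 0 this reproduces the "0001 -> 00000001" rows of the mapping, and with epoch 1 the post-wraparound "0001 -> 00010001" row.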
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Tue, 7 Mar 2023 at 15:38, Heikki Linnakangas wrote:
>
> That is true for pg_multixact/offsets. We will indeed need to add an
> epoch and introduce the concept of FullMultiXactIds for that. However,
> we can change pg_multixact/members independently of that. We can extend
> MultiXactOffset from 32 to 64 bits, and eliminate pg_multixact/members
> wraparound, while keeping multi-xids 32 bits wide.
>

Yes, you're totally correct. If it can be committed that way, I'm all for it.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! As suggested before by Heikki Linnakangas, I've added a patch making the 2PC transaction state 64-bit. At first, my intention was to rebuild the whole twophase interface to use FullTransactionId. But doing this in a proper manner would lead to switching from TransactionId to FullTransactionId in PGPROC, and the patch would become too big to handle here. So I decided to keep it simple for now: use the wrap-logic trick and compute the FullTransactionId from the current epoch, since the span of active xids cannot exceed one epoch at any given time.

Patches 1 and 2 are the same as above.

-- Best regards, Maxim Orlov.

v57-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
v57-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data
v57-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
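The "wrap logic trick" mentioned above can be sketched roughly as follows. The types and names here are simplified stand-ins for the PostgreSQL ones, not the patch's actual code; the only assumption it relies on is the one stated above, that live xids span at most one epoch.

```c
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint64_t FullTransactionId;	/* high 32 bits: epoch, low 32: xid */

/*
 * Sketch: recover a FullTransactionId from a 32-bit xid using the
 * current "next" full xid.  If the xid is numerically ahead of the
 * next xid, it must belong to the previous epoch.
 */
static FullTransactionId
widen_xid(TransactionId xid, FullTransactionId next_full_xid)
{
	uint32_t	epoch = (uint32_t) (next_full_xid >> 32);

	if (xid > (TransactionId) next_full_xid)
		epoch--;				/* xid is from the previous epoch */

	return ((FullTransactionId) epoch << 32) | xid;
}
```

This is the same shape of computation PostgreSQL uses elsewhere to widen 32-bit xids against a known next-xid reference point.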
Re: Add SHELL_EXIT_CODE to psql
I took a look at the patch, and it seems to be in good condition. It implements the declared functionality with no visible defects. As for tests, I think it is possible to implement the "signals" case in TAP tests. But let the actual committer decide whether it is worth it or not.

-- Best regards, Maxim Orlov.
Re: [PATCH] Add initial xid/mxid/mxoff to initdb
On Mon, 20 Mar 2023 at 22:31, Gregory Stark (as CFM) wrote: > > So is that other thread tracked in a different commitfest entry and > this one completely redundant? I'll mark it Rejected then? > Yep, it appears so. -- Best regards, Maxim Orlov.
Refactoring of pg_resetwal/t/001_basic.pl
Hi! In commit 7b5275eec more tests and test coverage were added to pg_resetwal/t/001_basic.pl. All the added stuff is pretty useful in my view. Unfortunately, some magic constants were used. Overall, this is not a problem. But while working on 64 bit XIDs I noticed these changes and spent some time figuring out what these magic values stand for. And it turns out that I'm not the only one. So, at Svetlana Derevyanko's suggestion, I made this patch. I added constants, just like we did in the verify_heapam tests.

A side note here: the defines in multixact.c use the TransactionId type, but I'm sure this is not correct, since we're dealing here with MultiXactId and MultiXactOffset. For now, this is obviously not a problem, since the sizes of these particular types are equal. But it will manifest itself when we switch to 64 bit types for MultiXactOffset or MultiXactId.

As always, reviews and opinions are very welcome!

-- Best regards, Maxim Orlov.

v1-0001-Refactor-pg_resetwal-t-001_basic.pl.patch Description: Binary data
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
I've noticed this patch and had a quick look at it. As far as I understand, this bug does not lead to incorrect matching; it results only in a degradation in speed. Anyway, I consider this patch useful and hope it will be committed soon.

-- Best regards, Maxim Orlov.
Re: Refactoring of pg_resetwal/t/001_basic.pl
On Fri, 22 Mar 2024 at 01:08, Peter Eisentraut wrote:
> Please send a separate patch for this if you want to propose any changes.
>

Thank you for your reply. Here is the second one. I've changed the types and argument names for the macro functions, so that they better reflect reality.

-- Best regards, Maxim Orlov.

v1-0002-Use-proper-types-in-defines-in-multixact.c.patch Description: Binary data
Re: POC: GROUP BY optimization
Hi! Another issue with the test introduced in 0452b461bc405. I think it may be unstable in some circumstances. For example, if we try to use a different BLCKSZ. See, I've made a little change in the number of tuples to be inserted:

$ git diff
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index d6ed5d0eff..414078d4ec 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;

And bulk extension kicks in, so we get zeroed pages in the relation. The plan also changes, switching from an index scan to a seq scan:

@@ -2734,7 +2734,7 @@
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,11900) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 -- GROUP BY optimization by reorder columns by frequency
@@ -2760,62 +2760,57 @@
 -- Engage incremental sort
 explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
-                   QUERY PLAN
-------------------------------------------------
+       QUERY PLAN
+------------------------------
  Group
    Group Key: x, y, z, w
-   ->  Incremental Sort
+   ->  Sort
          Sort Key: x, y, z, w
-         Presorted Key: x, y
-         ->  Index Scan using btg_x_y_idx on btg
-(6 rows)
+         ->  Seq Scan on btg
+(5 rows)

... and so on.

So, my proposal is simple. I think we need not just "ANALYZE btg", but "VACUUM ANALYZE btg", to get rid of the zeroed pages in this particular case. PFA the corresponding patch.

-- Best regards, Maxim Orlov.

0001-Force-VACUUM-to-avoid-zeroed-pages-from-bulk-extensi.patch Description: Binary data
Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
Quite an interesting patch, in my opinion. I've decided to work on it a bit, did some refactoring (sorry) and added basic tests. Also, I tried to take into account as many as possible of the notes on the patch mentioned by Cédric Villemain.

> and maybe better to go with FlushOneBuffer() ?

It's a good idea, but I'm not sure at the moment. I'll try to dig deeper into it. At least, FlushOneBuffer does not work for local buffers. So, we have to decide whether pg_buffercache_invalidate should or should not work for local buffers. For now, I don't see why it should not. Maybe I'm missing something?

-- Best regards, Maxim Orlov.

v4-0001-Invalidate-Buffer-By-Bufnum.patch Description: Binary data
Re: Add Index-level REINDEX with multiple jobs
On Tue, 6 Feb 2024 at 09:22, Michael Paquier wrote:
>
> The problem may be actually trickier than that, no? Could there be
> other factors to take into account for their classification, like
> their sizes (typically, we'd want to process the biggest one first, I
> guess)?

Sorry for the late reply, and thanks for the explanation. This sounds reasonable to me. Svetlana has addressed this in the patch v2.

-- Best regards, Maxim Orlov.
Re: CI speed improvements for FreeBSD
Hi! I looked at the changes and I liked them. Here are my thoughts:

0001:
1. I think this is a good idea to use RAM. Since it's still a UFS, we lose nothing in terms of testing, but win significantly in speed.
2. The change from "swapoff -a || true" to "swapoff -a" is legit in my view, since it's better to fail explicitly than to silence any possible problem.
3. The man page says that lowercase suffixes should be used for mdconfig. But in fact, you can use either lowercase or uppercase. Yep, it's in mdconfig.c: "else if (*p == 'g' || *p == 'G')".

0002:
1. The resource usage should be a bit higher, this is for sure. But, if I'm not missing something, not drastically. Anyway, I do not know how to measure this increase to get concrete values.
2. And think of the potential benefits of increasing the number of test jobs: more concurrent processes, more interactions, better test coverage.

Here are my runs:
FreeBSD @master https://cirrus-ci.com/task/4934701194936320 Run test_world 05:56
FreeBSD @master + 0001 https://cirrus-ci.com/task/5921385306914816 Run test_world 05:06
FreeBSD @master + 0001, + 0002 https://cirrus-ci.com/task/5635288945393664 Run test_world 02:20
For comparison, Debian @master https://cirrus-ci.com/task/5143705577848832 Run test_world 02:38

Overall, I consider these changes useful. CI runs faster, with better test coverage, in exchange for a presumably slight increase in resource usage, but I don't think this increase should be significant.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! As a result of the discussion in the thread [0], Robert Haas proposed to focus on making SLRU 64 bit, as a first step towards 64 bit XIDs. Here is the patch set.

Overall, the code of this patch set is based on the existing code from [0] and may be simplified, due to the fact that SLRU_PAGES_PER_SEGMENT is not meant to be changed now. But I decided to leave it that way, at least for now.

As always, reviews and opinions are very welcome. Should we change the status of this thread to "needs review"?

[0] https://www.postgresql.org/message-id/flat/CA%2BTgmoZFmTGjgkmjgkcm2-vQq3_TzcoMKmVimvQLx9oJLbye0Q%40mail.gmail.com#03a4ab82569bb7b112db4a2f352d96b9

-- Best regards, Maxim Orlov.

v51-0003-Make-pg_upgrade-from-32-bit-to-64-bit-SLRU.patch Description: Binary data
v51-0001-Use-internal-64-bit-numbering-of-SLRU-pages.patch Description: Binary data
v51-0002-Use-64-bit-pages-representation-in-SLRU-callers.patch Description: Binary data
False positive warning in verify_heapam.c with GCC -O3
Hi! While playing with some stuff unrelated to the topic, I've noticed a strange warning: verify_heapam.c:730:25: warning: ‘xmax_status’ may be used uninitialized in this function. This happens only when get_xid_status is inlined, and only with GCC at -O3. I use GCC version 11.3.0. For the purpose of investigation, I've created a patch (PFA) to force get_xid_status inline.

$ CFLAGS="-O3" ./configure -q && make -s -j12 >/dev/null && make -s -j12 -C contrib
verify_heapam.c: In function ‘check_tuple_visibility’:
verify_heapam.c:730:25: warning: ‘xmax_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  730 |                         XidCommitStatus xmax_status;
      |                         ^~~
verify_heapam.c:909:25: warning: ‘xmin_status’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  909 |                 else if (xmin_status != XID_COMMITTED)
      |

I believe this warning is a false positive, since the mentioned row is unreachable. If the xid is invalid, get_xid_status returns XID_INVALID, and we cannot get past

  770     if (HeapTupleHeaderXminInvalid(tuphdr))
  771         return false;    /* inserter aborted, don't check */

On the other hand, we could simply initialize the status variable in get_xid_status and make this code more errors/warnings safe. Thoughts?

-- Best regards, Maxim Orlov.

0002-Init-status-var-in-get_xid_status.patch Description: Binary data
0001-Force-inline-of-get_xid_status.patch Description: Binary data
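The second option, initializing the status variable inside get_xid_status, boils down to the usual defensive pattern for out parameters. The sketch below is not the verify_heapam code; the enum values and the lookup logic are hypothetical stand-ins, only the initialization idea is the point.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
	XID_INVALID,
	XID_COMMITTED,
	XID_ABORTED
} XidCommitStatus;

/*
 * Sketch of the proposed fix: always set the out parameter first, so
 * the compiler can prove it is initialized on every path, even after
 * inlining.  The status lookup below is a hypothetical stand-in.
 */
static bool
get_xid_status(uint32_t xid, XidCommitStatus *status)
{
	*status = XID_INVALID;		/* silences -Wmaybe-uninitialized */

	if (xid == 0)
		return false;			/* invalid xid: caller must not use *status */

	*status = (xid % 2) ? XID_COMMITTED : XID_ABORTED;	/* stand-in */
	return true;
}
```

The cost is one extra store on the invalid-xid path, in exchange for code that stays warning-clean regardless of how aggressively the compiler inlines.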
Re: Add 64-bit XIDs into PostgreSQL 15
Hi! I want to make a quick summary here.

1. An overall consensus has been reached: we shall focus on committing the SLRU changes first.
2. I've created an appropriate patch set here [0].
3. Now [0] is waiting for a review. As always, all opinions are welcome.
4. While discussing error/warning messages and some other stuff, this thread was marked as "Waiting on Author".
5. I rebase this patch set once a week, but do not post it here, since there is no need for that. See (1).
6. For now, I don't understand what changes I have to make here. So, is "Waiting on Author" an appropriate status here?

Anyway. Let's discuss the on-disk page format, shall we? AFAICS, we have the following options:
1. Making "true" 64-bit XIDs, i.e. making every tuple have 64-bit xmin and xmax fields.
2. Putting a special area in every page where the base for XIDs is stored. This is what we have done in the current patch set.
3. Putting the base for XIDs in a fork.
4. Making explicit 64-bit XIDs for concrete relations, i.e. CREATE TABLE foo WITH (xid8) or something like that.

There were opinions that the proposed solution (2) is not optimal. It would be great to hear your concerns and thoughts.

[0] https://www.postgresql.org/message-id/CACG%3Dezav34TL%2BfGXD5vJ48%3DQbQBL9BiwkOTWduu9yRqie-h%2BDg%40mail.gmail.com

-- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
Hi! The patch implements what it is declared to do: the shell return code is now accessible in a psql variable. Overall the code is in good condition, and it applies with no errors on master. Unfortunately, regression tests are failing on macOS due to different shell output:

@@ -1308,13 +1308,13 @@ deallocate q;
 -- test SHELL_EXIT_CODE
 \! nosuchcommand
-sh: line 1: nosuchcommand: command not found
+sh: nosuchcommand: command not found
 \echo :SHELL_EXIT_CODE
 127
 \set nosuchvar `nosuchcommand`
-sh: line 1: nosuchcommand: command not found
+sh: nosuchcommand: command not found
 \! nosuchcommand
-sh: line 1: nosuchcommand: command not found
+sh: nosuchcommand: command not found
 \echo :SHELL_EXIT_CODE
 127

Since we do not want to test the shell output in these cases, but only the return code, what about using commands of this kind?

postgres=# \! true > /dev/null 2>&1
postgres=# \echo :SHELL_EXIT_CODE
0
postgres=# \! false > /dev/null 2>&1
postgres=# \echo :SHELL_EXIT_CODE
1
postgres=# \! nosuchcommand > /dev/null 2>&1
postgres=# \echo :SHELL_EXIT_CODE
127

It is better to use spaces around "=":

+	if (WIFEXITED(exit_code))
+		exit_code = WEXITSTATUS(exit_code);
+	else if (WIFSIGNALED(exit_code))
+		exit_code = WTERMSIG(exit_code);
+	else if (WIFSTOPPED(exit_code))
+		exit_code = WSTOPSIG(exit_code);

-- Best regards, Maxim Orlov.
Re: add \dpS to psql
> Here it is: https://commitfest.postgresql.org/41/4043/
>

Hi! The patch applies with no problem, implements what it declares, and the CF bot is happy. Without the patch, \dpS shows 0 rows; after applying it, system objects are shown. I consider this patch useful and hope it will be committed soon.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
>
> Do I get it right that the proposed v51 patchset only changes the SLRU
> filenames and the type of pageno representation? Is SLRU wraparound still
> exactly there after the 0x00000000 byte?
>

After applying the whole patch set, SLRU will become 64-bit without a wraparound. Thus, no wraparound should be there.
0001 - should make SLRU internally 64-bit, with no effects from "outside"
0002 - should make SLRU callers 64-bit; SLRU segment file naming is changed
0003 - makes upgrade from previous versions feasible

> Also, I do not understand what is the reason for splitting 1st and 2nd
> steps. Certainly, there must be some logic behind it, but I just can't
> grasp it...
>

As we discussed somewhere at the beginning of the discussion, we try to make every commit as independent as possible, so it is much easier to review and commit. I see no problem with melding these commits into one, if consensus is reached.

> And the purpose of the 3rd step with pg_upgrade changes is a complete
> mystery for me. Please excuse my incompetence in the topic, but maybe
> some commit message or comments would help. What kind of xact segments
> conversion we do? Why is it only necessary for xacts, but not other
> SLRUs?
>

The purpose of the third patch is to make the upgrade feasible. Since we've changed the pg_xact file naming, Postgres cannot read the status of "old" transactions from "old" pg_xact files. So, we have to convert those files. The major problem here is that we must handle a possible segment wraparound (in the "old" cluster). The whole idea of the upgrade is to read the SLRU pages for pg_xact one by one and write them out under the "new" filenames. Maybe it's just a little bit complicated, since the algorithm is intended to deal with different numbers of SLRU pages per segment in the "new" and "old" clusters. But, on the other hand, it was already created in the original 64-bit XIDs patch set and will be useful in the future. AFAICS, arguably, any variant of 64-bit XIDs should lead to an increase in the number of SLRU pages per segment.
And as for the other SLRUs, they need not survive pg_upgrade, mostly due to the fact that the cluster must be stopped for the upgrade. Thus, no conversion is needed.

-- Best regards, Maxim Orlov.
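The per-page renumbering behind that conversion can be sketched as follows. The pages-per-segment values below are made-up placeholders (the real values differ between the old and new clusters and are not stated here); the point is only how a page identified by (old segment, index within segment) finds its place under the new layout.

```c
#include <stdint.h>

/* Made-up pages-per-segment values, just for illustration. */
#define OLD_SLRU_PAGES_PER_SEGMENT	32
#define NEW_SLRU_PAGES_PER_SEGMENT	2048

/*
 * Sketch of the renumbering done while converting old pg_xact segments:
 * compute the global page number, then split it again under the new
 * (larger) pages-per-segment value.
 */
static void
convert_page_location(uint64_t old_segno, int page_in_seg,
					  uint64_t *new_segno, int *new_page_in_seg)
{
	uint64_t	pageno = old_segno * OLD_SLRU_PAGES_PER_SEGMENT + page_in_seg;

	*new_segno = pageno / NEW_SLRU_PAGES_PER_SEGMENT;
	*new_page_in_seg = (int) (pageno % NEW_SLRU_PAGES_PER_SEGMENT);
}
```

Handling the old cluster's segment wraparound would sit on top of this: the old segment number has to be widened by the epoch first, as discussed elsewhere in the thread.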
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Fri, 6 Jan 2023 at 09:51, Japin Li wrote:
>
> For v51-0003. We can use GetClogDirName instead of GET_MAJOR_VERSION in
> copy_subdir_files().
>

Of course! Thanks! I'll address this in the next iteration, v52.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Hi! Here is a new patch set. I've added comments and made use of the GetClogDirName call in copy_subdir_files.

-- Best regards, Maxim Orlov.

v52-0003-Make-pg_upgrade-from-32-bit-to-64-bit-SLRU.patch Description: Binary data
v52-0002-Use-64-bit-pages-representation-in-SLRU-callers.patch Description: Binary data
v52-0001-Use-internal-64-bit-numbering-of-SLRU-pages.patch Description: Binary data
Re: Add SHELL_EXIT_CODE to psql
Hi! Overall, I think we are moving in the right direction. But we could make the code better, couldn't we?

+			/* Capture exit code for SHELL_EXIT_CODE */
+			close_exit_code = pclose(fd);
+			if (close_exit_code == -1)
+			{
+				pg_log_error("%s: %m", cmd);
+				error = true;
+			}
+			if (WIFEXITED(close_exit_code))
+				exit_code=WEXITSTATUS(close_exit_code);
+			else if(WIFSIGNALED(close_exit_code))
+				exit_code=WTERMSIG(close_exit_code);
+			else if(WIFSTOPPED(close_exit_code))
+				exit_code=WSTOPSIG(close_exit_code);
+			if (exit_code)
+				error = true;

I think it's better to add empty lines around the middle if block; it will be easier to read. Also, consider adding spaces around the assignments in this block.

+	/*
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(exit_code));
+	*/

Probably, this is not needed.

> 1. pg_regress now creates an environment variable called PG_OS_TARGET

Maybe we can use the env var "OS"? I do not know much about Windows, but I think this is kind of a standard environment variable there.

-- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
On Mon, 9 Jan 2023 at 21:36, Corey Huinker wrote:
>
> I chose a name that would avoid collisions with anything a user might
> potentially throw into their environment, so if the var "OS" is fairly
> standard is a reason to avoid using it. Also, going with our own env var
> allows us to stay in perfect synchronization with the build's #ifdef WIN32
> ... and whatever #ifdefs may come in the future for new OSes. If there is
> already an environment variable that does that for us, I would rather use
> that, but I haven't found it.
>

Perhaps I didn't make myself clear. Your solution is perfectly adapted to our needs. But all Windows versions since 2000 already have the environment variable OS=Windows_NT. So, if the env var OS is defined and equals Windows_NT, this has to be Windows. May we use it in our case? I don't insist, just asking.

-- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
Unfortunately, cirrus-ci is still not happy: https://cirrus-ci.com/task/6502730475241472

[23:14:34.206] time make -s -j${BUILD_JOBS} world-bin
[23:14:43.174] psqlscanslash.l: In function ‘evaluate_backtick’:
[23:14:43.174] psqlscanslash.l:827:11: error: implicit declaration of function ‘WIFSTOPPED’ [-Werror=implicit-function-declaration]
[23:14:43.174]   827 |   exit_code = WSTOPSIG(close_exit_code);
[23:14:43.174]       |           ^~
[23:14:43.174] psqlscanslash.l:828:16: error: implicit declaration of function ‘WSTOPSIG’ [-Werror=implicit-function-declaration]
[23:14:43.174]   828 |
[23:14:43.174]       |                ^
[23:14:43.174] cc1: all warnings being treated as errors

...and on FreeBSD:

[23:13:03.914] cc -o ...
[23:13:03.914] ld: error: undefined symbol: WEXITSTATUS
[23:13:03.914] >>> referenced by command.c:5036 (../src/bin/psql/command.c:5036)
[23:13:03.914] >>> src/bin/psql/psql.p/command.c.o:(exec_command_shell_escape)
[23:13:03.914] cc: error: linker command failed with exit code 1 (use -v to see invocation)

...and on Windows:

[23:19:51.870] meson-generated_.._psqlscanslash.c.obj : error LNK2019: unresolved external symbol WIFSTOPPED referenced in function evaluate_backtick
[23:19:52.197] meson-generated_.._psqlscanslash.c.obj : error LNK2019: unresolved external symbol WSTOPSIG referenced in function evaluate_backtick
[23:19:52.197] src\bin\psql\psql.exe : fatal error LNK1120: 2 unresolved externals

I believe we need the proper includes.

-- Best regards, Maxim Orlov.
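For reference, here is a minimal self-contained sketch of the decoding with the POSIX sys/wait.h header in place, which is the include the failing builds appear to be missing. The function name mirrors the one the patch uses, but this is a simplified illustration: on Windows, where these macros do not exist at all, psql would still need a separate #ifdef'ed code path.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Sketch: turn a wait(2)-style status, as returned by system() or
 * pclose(), into a small integer exit code.  WIFEXITED/WEXITSTATUS and
 * friends come from <sys/wait.h> on POSIX systems.
 */
static int
wait_result_to_exit_code(int waitstatus)
{
	if (WIFEXITED(waitstatus))
		return WEXITSTATUS(waitstatus);
	if (WIFSIGNALED(waitstatus))
		return WTERMSIG(waitstatus);
#ifdef WIFSTOPPED
	if (WIFSTOPPED(waitstatus))
		return WSTOPSIG(waitstatus);
#endif
	return -1;					/* unrecognized status */
}
```

A quick way to see it in action is to feed it the raw status from system(), which returns the same wait-status encoding as pclose().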
Re: Assert in pageinspect with NULL pages
I've suddenly found that the test in this patch relies on the fact that heap pages don't have a PageSpecial area, or that it is of a different size than the btree page special area:

CREATE TABLE test1 (a int8, b int4range);
SELECT bt_page_items(get_raw_page('test1', 0));

In the current state it is so, but it is not guaranteed. For example, in the proposed 64xid patch [1] we introduce PageSpecial into heap pages, and its size can happen to equal the size of the btree page special area, so the check above is false even if we pass a heap page into pageinspect pretending it is a btree page. So the test will fail.

Generally, this seems to be not only a wrong test, but a consequence of a bigger-scale fact: we cannot always distinguish pages of different AMs. So I'd propose at least that we don't conclude that a page is valid based on the PageSpecial size alone.

[1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com

-- Best regards, Maxim Orlov.
Re: Is monotonous xid8 is a right way to do?
Hi! In my view, the FullTransactionId type was implemented without considering 64 bit wraparound, which, admittedly, seems unlikely to ever happen. Then, on that basis, the xid8 type was created. Details of that particular implementation infiltrated the documentation and became sort of normal.

In my opinion, semantically, both of these types should be treated as similar types, although of different sizes. Thus, again, the xid and xid8 types should form a ring and have no min and max functions. At least, not in a "conventional" way, where the minimal value is minimal in the mathematical sense, and likewise for the maximum. For example, max may be implemented such that max(0, 42, 18446744073709551615) = 42, which is a bit weird.

-- Best regards, Maxim Orlov.
Re: Unit tests for SLRU
>
> Again, I don't have a strong opinion here. If you insist, I will place the
> tests to regress.c.
>

It is up to the committer to decide, but I think it would be good to place the tests in the regression suite. In my opinion, many things from core may be used by extensions, and then it is up to extension authors to make the relevant tests. For me, it's enough to make reasonable test coverage for SLRU without investing too much effort into generalization.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
On Mon, 6 Nov 2023 at 16:07, Alexander Korotkov wrote:
> Hi!
>
> On Wed, Jul 5, 2023 at 4:46 PM Aleksander Alekseev <aleksan...@timescale.com> wrote:
> > PFE the corrected patchset v58.
>
> I'd like to revive this thread.

Hi! Great news!

> BTW, there is a typo in a word "exceeed".

Fixed.

> +static int inline
> +SlruFileName(SlruCtl ctl, char *path, int64 segno)
> +{
> ...
> +}
>
> I think it worth adding asserts here to verify there is no overflow making
> us mapping different segments into the same files.

Agree, an assertion has been added.

> + return occupied == max_notify_queue_pages;
>
> I'm not sure if the current code could actually allow to occupy more than
> max_notify_queue_pages. Probably not even in extreme cases. But I still
> think it will more safe and easier to read to write "occupied >=
> max_notify_queue_pages" here.

Fixed.

> diff --git a/src/test/modules/test_slru/test_slru.c
> b/src/test/modules/test_slru/test_slru.c
>
> The actual 64-bitness of SLRU pages isn't much exercised in our automated
> tests. It would be too exhausting to make pg_notify actually use higher
> than 2**32 page numbers. Thus, I think test/modules/test_slru is a good
> place to give high page numbers a good test.

PFA, I've added a test for 64-bit SLRU pages.

By the way, there is one more useful thing we could do here. For now, the pg_commit_ts functionality is rather strange: if it was enabled, then disabled, and then enabled again, all the data from before will be discarded. Meanwhile, users expect to have their commit timestamps for all transactions which were "logged" while this feature was enabled. It's weird. AFAICS, the only reason for this behaviour is transaction wraparound: it may occur while the feature is disabled, and then it is safe to simply remove all the data from the previous period. If we switch to FullTransactionId in commit_ts, we can overcome this limitation.
But I'm not sure it is worth trying to fix this in the current patch set, since it is already non-trivial.

-- Best regards, Maxim Orlov.

v59-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data
v59-0004-Add-SLRU-tests-for-64-bit-page-case.patch Description: Binary data
v59-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
v59-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
On Thu, 9 Nov 2023 at 15:01, Amul Sul wrote:
>
> Here is the updated version patch. Did minor changes to documents and
> tests.
>

The overall patch looks good to me. Since Peter withdrew his comment on triggers and no open problems remain, we can mark this patch RfC, can't we? It would be nice to have this corrected in the next release.

-- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Aleksander Alekseev wrote:
> Maxim,
> I see both of us accounted for Alexander's feedback and submitted v59.
> Your newer version seems to have issues on cfbot, so resubmitting the
> previous patchset that passes the tests. Please feel free to add
> changes.

For unknown reasons, I do not receive any of your emails dated after 2023-11-07 11:57:12 (Message-ID: CAJ7c6TN1hKqNPGrNcq96SUyD=z61rakgxf8iqq36qr90oud...@mail.gmail.com), even after a resend. Anyway, PFA the patch set of version 61. I've made some minor changes in 0001 and added 0004 in order to test actual 64-bit SLRU pages.

As for why the CF bot failed on my v59 patch set: this is because of a bug. It manifested because of the added 64-bit page tests. The problem was in the segno calculation, since we convert it from the file name using a strtol call. But strtol returns long, which is 4 bytes long on x86:

-		segno = (int) strtol(clde->d_name, NULL, 16);
+		segno = strtoi64(clde->d_name, NULL, 16);

-- Best regards, Maxim Orlov.

v61-0003-Make-use-FullTransactionId-in-2PC-filenames.patch Description: Binary data
v61-0004-Add-SLRU-tests-for-64-bit-page-case.patch Description: Binary data
v61-0002-Use-larger-segment-file-names-for-pg_notify.patch Description: Binary data
v61-0001-Index-SLRUs-by-64-bit-integers-rather-than-by-32.patch Description: Binary data
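The failure mode can be demonstrated in isolation. On a platform where long is 4 bytes, the old "(int) strtol(...)" silently truncates segment numbers above 2^32; the explicit (int32_t) cast below emulates that truncation so the effect is visible even on 64-bit platforms (strtoll is used as a portable stand-in for PostgreSQL's strtoi64).

```c
#include <stdint.h>
#include <stdlib.h>

/* Fixed version: parse the segment name into a 64-bit integer. */
static int64_t
parse_segno_64(const char *name)
{
	return (int64_t) strtoll(name, NULL, 16);
}

/*
 * Buggy version: "(int) strtol(...)" on a platform with a 4-byte long.
 * The (int32_t) cast emulates that truncation portably.
 */
static int64_t
parse_segno_old(const char *name)
{
	return (int64_t) (int32_t) strtoll(name, NULL, 16);
}
```

Any segment name that does not fit in 32 bits comes back mangled from the old version, which is exactly why the bug only showed up once tests started using genuinely 64-bit page numbers.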
How to stop autovacuum silently
Hi! Recently, one of our customers reported that autovacuum was not working. After a brief investigation, I found that the autovacuum launcher did, actually, run vacuum as expected, but with no results. At the same time, no warnings or other anomalies were present in the logs. At first, I thought the statistics might be broken and that was why vacuum was not working as expected. But in fact, something more interesting had happened. pg_class.relfrozenxid was set to some rubbish value from the future, and thus broken, in the template1 DB, so any new database will have it broken too. Then, we create a "blocker" DB, and in vac_update_datfrozenxid() we get the "bogus" (from the future) value of relfrozenxid and *silently* return. Any other newly created DB will not be autovacuumed. Funny, but from the perspective of a DBA, this looks like autovacuum simply stopped working for no reason, although all the criteria for its launch are clearly met. AFAICS, there are several ways out of this state:
- run vacuumdb for all DBs;
- manually update the broken pg_class.relfrozenxid;
- lower autovacuum_freeze_max_age to trigger transaction ID wraparound prevention.
I do understand that this behaviour can hardly be described as a bug, but could we at least emit a useful message to help clarify what is going on here?

=== REPRODUCE ===
$ cat << EOF >> pgsql/data/postgresql.conf
autovacuum_naptime = 1s
autovacuum_freeze_max_age = 10
EOF
$ ./pgsql/bin/pg_ctl -D pgsql/data -l pgsql/logfile start
waiting for server to start done
server started
$ ./pgsql/bin/psql postgres
psql (17devel)
Type "help" for help.

postgres=# \c template1
You are now connected to database "template1" as user "orlov".
template1=# update pg_class set relfrozenxid='20' where oid = 1262;
UPDATE 1
template1=# do $$ begin while 12 - txid_current()::text::int8 > 0 loop commit; end loop; end $$;
DO
template1=# create database blocker;
CREATE DATABASE
template1=# create database foo;
CREATE DATABASE
template1=# \c foo
You are now connected to database "foo" as user "orlov".
foo=# create table bar(baz int);
CREATE TABLE
foo=# insert into bar select bar from generate_series(1, 8192) bar;
INSERT 0 8192
foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count from pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
 bar     |      8192 |      8192 |         0 |       8192 |       8192 |             |                 |                0
(1 row)

foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count from pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
 bar     |      8192 |     16384 |         0 |       8192 |      16384 |             |                 |                0
(1 row)

... and so on -- Best regards, Maxim Orlov. 0001-Add-warning-if-datfrozenxid-or-datminmxid-is-not-set.patch Description: Binary data
Re: How to stop autovacuum silently
On Wed, 22 Nov 2023 at 21:13, Peter Geoghegan wrote: > Are you aware of commit e83ebfe6d7, which added a similar WARNING at > the point when VACUUM overwrites a relfrozenxid/relminmxid "from the > future"? It's a recent one. > Thank you for the reply! I hadn't noticed it. But in the case described above, it doesn't produce any warnings. My concern here is that with a couple of updates, we can stop autovacuum implicitly, without any warnings. > Was pg_upgrade even run against this database? My guess is that the > underlying problem was caused by the bug fixed by commit 74cf7d46. > I'm pretty sure it was, but, unfortunately, there is no way to confirm this 100%. All I know is that they're using PG13 now. -- Best regards, Maxim Orlov.
Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid
Hi! As was discussed in [0], our overall goal is to make Postgres use 64-bit XIDs. It's obvious that such a big patch set can't possibly be committed "at once". The SLRU patch set [1] was committed a short while ago as a first significant step in this direction. This thread is the next step in this enterprise. My objective here is to commit some changes which are mandatory, as far as I understand, for any type of 64-bit XID implementation. And I'm sure there will be points for discussion here. My original intention was to make PGPROC->xmin, PGPROC->xid and PROC_HDR->xids 64-bit. But in reality, it turned out to be much more difficult than I expected. On the one hand, the patch became too big, and on the other hand, it heavily relies on the epoch and XID "adjustment" to FXID. Therefore, for now, I decided to limit myself to more atomic and independent changes. However, as I said above, these changes are required for any implementation of 64-bit XIDs. So, PFA patches to switch PGPROC->xid and XLogRecord->xl_xid to FullTransactionId. As always, any opinions are very welcome! [0] https://www.postgresql.org/message-id/CACG=ezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe=pyyj...@mail.gmail.com [1] https://www.postgresql.org/message-id/flat/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com -- Best regards, Maxim Orlov. v1-0002-Switch-to-FullTransactionId-for-XLogRecord-xl_xid.patch Description: Binary data v1-0001-Switch-to-FullTransactionId-for-PGPROC-xid.patch Description: Binary data
Add Index-level REINDEX with multiple jobs
Hi! Recently, one of our customers came to us with the question: why does the reindexdb utility not support multiple jobs for indexes (the -i option)? And, of course, it is because we cannot control the concurrent processing of multiple indexes on the same relation. This was discussed somewhere in [0], I believe. So, the customer had to write a shell script to do the job, and so on. But splitting the indexes by parent table and running reindex in multiple jobs does not seem that complicated, does it? Or am I missing something? PFA a patch implementing this. As always, any opinions are very welcome! [0] https://www.postgresql.org/message-id/flat/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg%40mail.gmail.com -- Best regards, Maxim Orlov. v1-0001-Add-Index-level-REINDEX-with-multiple-jobs.patch Description: Binary data
Re: Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid
On Fri, 29 Dec 2023 at 16:36, Matthias van de Meent < boekewurm+postg...@gmail.com> wrote: > > I don't think this is an actionable change, as this wastes 4 more bytes > (or 8 with alignment) in nearly all WAL records that don't use the > HEAP/HEAP2/XLOG rmgrs, which would then be up to 10 (if not 14, when > 64but-aligned) bytes per record. Unless something like [0] gets committed > this will add a significant write overhead to all operations, even if they > are not doing anything that needs an XID. > > Also, I don't think we need to support transactions that stay alive and > change things for longer than 2^31 concurrently created transactions, so we > could well add a fxid to each WAL segment header (and checkpoint record?) > and calculate the fxid of each record as a relative fxid off of that. > > Thank you for your reply. Yes, this is a good idea. Actually, this is exactly what I was trying to do at first. But in this case, we have to add more locks on TransamVariables->nextXid, so I abandoned the idea. Maybe I should look in this direction again. On Sun, 31 Dec 2023 at 03:44, Michael Paquier wrote: > And FWIW, echoing with Matthias, making these generic structures > arbitrary larger is a non-starter. We should try to make them > shorter. > Yeah, obviously, this patch makes WAL bigger. I'll try to look into the idea of fxid calculation mentioned above. It might in part be a "chicken and egg" situation: it is very hard to split the overall 64-bit XID patch into smaller pieces with each part being meaningful and beneficial for the current 32-bit XID Postgres. Anyway, thanks for the reply, I appreciate it. -- Best regards, Maxim Orlov.
Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression
Hi! I pretty much like the idea of the patch. It looks like an oversight in the SQL standard to me. Anyway, the patch applies with no conflicts and implements the described functionality. On Fri, 25 Aug 2023 at 03:06, Vik Fearing wrote: > > I don't like this part of the patch at all. Not only is the > documentation only half baked, but the entire concept of the two > commands is different. Especially since I believe the command should > also create a generated column from a non-generated one. But I have to agree with Vik Fearing: we can make this patch better, shall we? I totally understand your intention to keep the code flow simple and reuse existing code as much as possible. But in terms of semantics, these commands are quite different from each other, and when reading the code, this makes it even harder to understand what is going on. So, in my view, consider splitting these commands. Hope that helps. Again, I'm +1 for this patch. -- Best regards, Maxim Orlov.
Re: Add 'worker_type' to pg_stat_subscription
Hi! I took a look at the patch and like the idea. The overall code is in good condition and implements the described feature. A side note: this is not a problem of this particular patch, but in pg_stat_get_subscription and many other places there is a switch over worker types. Can we use a default section there to get an explicit error, instead of relying on compiler warnings, in case we somehow decide to add another worker type? So, should we mark this thread as RfC? -- Best regards, Maxim Orlov.
Re: should frontend tools use syncfs() ?
On Thu, 7 Sept 2023 at 03:34, Nathan Bossart wrote: > Committed. > Hi! Great job! But here is one problem I've encountered while working on some unrelated stuff. Now we have two different things called by the same name – sync_method. One in xlog:

int sync_method = DEFAULT_SYNC_METHOD;

...and another one in "bins":

static DataDirSyncMethod sync_method = DATA_DIR_SYNC_METHOD_FSYNC;

In the current include order, this is not a problem, but imagine you add a couple of new includes, for example:

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -18,6 +18,8 @@
 #include "storage/block.h"
 #include "storage/item.h"
 #include "storage/off.h"
+#include "postgres.h"
+#include "utils/rel.h"

And the build will be broken, because we now have two different things called "sync_method" with different types:

In file included from .../src/bin/pg_rewind/pg_rewind.c:33:
In file included from .../src/include/storage/bufpage.h:22:
In file included from .../src/include/utils/rel.h:18:
.../src/include/access/xlog.h:27:24: error: redeclaration of 'sync_method' with a different type: 'int' vs 'DataDirSyncMethod' (aka 'enum DataDirSyncMethod')
extern PGDLLIMPORT int sync_method;
...

As a solution, I suggest renaming sync_method in the xlog module to wal_sync_method. In fact, the corresponding GUC for this variable is called "wal_sync_method", and I see no reason not to use the exact same name for the variable in the xlog module. -- Best regards, Maxim Orlov.

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 424ecba028f..3aa1fc60cb4 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -18,6 +18,8 @@
 #include "storage/block.h"
 #include "storage/item.h"
 #include "storage/off.h"
+#include "postgres.h"
+#include "utils/rel.h"

 /*
  * A postgres disk page is an abstraction layered on top of a postgres

v10-0001-Fix-conflicting-types-for-sync_method.patch Description: Binary data
Re: should frontend tools use syncfs() ?
On Wed, 20 Sept 2023 at 22:08, Nathan Bossart wrote: > I think we should also consider renaming things like SYNC_METHOD_FSYNC to > WAL_SYNC_METHOD_FSYNC, and sync_method_options to wal_sync_method_options. > I've already renamed sync_method_options in the previous patch:

@@ -171,7 +171,7 @@ static bool check_wal_consistency_checking_deferred = false;
 /*
  * GUC support
  */
-const struct config_enum_entry sync_method_options[] = {
+const struct config_enum_entry wal_sync_method_options[] = {

As for the SYNC_METHOD_FSYNC rename, PFA a patch. It also makes an enum for the WAL sync methods instead of defines. -- Best regards, Maxim Orlov. v11-0001-Fix-conflicting-types-for-sync_method.patch Description: Binary data
Re: should frontend tools use syncfs() ?
Back to patch v11. I'm a bit unsure what we should do next: start a separate thread or put this one in the commitfest? -- Best regards, Maxim Orlov.
Re: should frontend tools use syncfs() ?
On Fri, 6 Oct 2023 at 22:35, Nathan Bossart wrote: > From a quick skim, this one looks pretty good to me. Would you mind adding > it to the commitfest so that it doesn't get lost? I will aim to take a > closer look at it next week. > Sounds good, thanks a lot! https://commitfest.postgresql.org/45/4609/ -- Best regards, Maxim Orlov.
Re: UNIQUE null treatment option
Since cfbot failed with an error that is probably unrelated to the patch itself (see https://cirrus-ci.com/task/5330150500859904) and a repeated check did not start automatically, I'm reattaching patch v3 to restart cfbot on this patch. -- Best regards, Maxim Orlov. v3-0001-Add-UNIQUE-null-treatment-option.patch Description: Binary data
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
Hi! I've looked over the patch and consider it useful. It applies without conflicts, and its functionality matches what is declared. But, in my view, some improvements may be proposed. We should be more, let's say, cautious (or paranoid, if you wish) in pg_walfile_offset_lsn while dealing with user input values. What I mean by that is:
- initialize the output variables of sscanf, i.e. tli, log and seg;
- check the return value of the sscanf call;
- check the maximum file name length.
Another question arises for me: can this function be called during recovery? If not, we should ereport about this, shouldn't we? And one last note here: pg_walfile_offset_lsn accepts NULL values and returns NULL in this case. From a theoretical point of view, this is perfectly fine. Actually, I think this is exactly how it is supposed to be, but I'm not sure if there are other opinions here. -- Best regards, Maxim Orlov.
[PoC] configurable out of disk space elog level
Hi!

PROBLEM

Our customer stumbled onto the following behaviour of a Postgres cluster: if disk space is exhausted, Postgres continues to work as long as WAL can be successfully written. Thus, upon disk space exhaustion, clients will get "ERROR: could not extend file "base/X/X": No space left on device" messages and transactions will be aborted. But the cluster continues to work for quite some time. This behaviour of PostgreSQL is, of course, perfectly legit: the cluster just relays the OS error to the user and can do nothing about it, expecting that space may become available later. On the other hand, users continue to send more data and get more and more transactions aborted. There are several possible ways to diagnose the described situation: an external monitoring system; log analysis; creating/dropping a table and analysing the result. Each has advantages and disadvantages; I'm not going to dive deeper here, if you don't mind. The customer mentioned above, in this particular case, would be glad to have a mechanism to stop the cluster. Again, in this concrete case.

PROPOSAL

My proposal is to add a tablespace option in order to be able to configure which behaviour is appropriate for a particular user. I've decided to call this option "on_no_space" for now; if anyone has a better name for this feature, please suggest it. So, the idea is to add both a GUC and a tablespace option "on_no_space". The tablespace option defines the behaviour of the cluster for a particular tablespace in the "on_no_space" situation; the GUC defines the default value of the tablespace option. A PoC patch is attached.
Here's what it looks like:

=== Create a 100Mb disk ===
$ dd if=/dev/zero of=/tmp/foo.img bs=100M count=1
$ mkfs.ext4 /tmp/foo.img
$ mkdir /tmp/foo
$ sudo mount -t ext4 -o loop /tmp/foo.img /tmp/foo
$ sudo chown -R orlov:orlov /tmp/foo

=== In psql ===
postgres=# CREATE TABLESPACE foo LOCATION '/tmp/foo' WITH (on_no_space=fatal);
CREATE TABLESPACE
postgres=# \db+
                                List of tablespaces
 Name | Owner | Location | Access privileges |       Options       |  Size   | Description
------+-------+----------+-------------------+---------------------+---------+-------------
 foo  | orlov | /tmp/foo |                   | {on_no_space=fatal} | 0 bytes |
...
postgres=# CREATE TABLE bar(qux int, quux text) WITH (autovacuum_enabled = false) TABLESPACE foo;
CREATE TABLE
postgres=# INSERT INTO bar(qux, quux) SELECT id, md5(id::text) FROM generate_series(1, 1000) AS id;
FATAL: could not extend file "pg_tblspc/16384/PG_16_202211121/5/16385": No space left on device
HINT: Check free disk space.
server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

CAVEATS

Again, I've posted this patch as a PoC; this is not a complete implementation of the described functionality. AFAICS, there are the following problems:
- I have to put the get_tablespace_elevel call in RelationGetBufferForTuple in order to get the tablespace into the cache; otherwise, a cache miss in get_tablespace triggers an assertion failure at lock.c:887 (Assert("!IsRelationExtensionLockHeld")). This assertion was added by commit 15ef6ff4 (see [0] for details).
- What should the error level be when mdextend is called not to insert a tuple into a heap (WAL replay, for example)?
Maybe adding just the GUC, without the ability to define the "out of disk space" behaviour for particular tablespaces, is enough? I would appreciate your opinions on the subject. -- Best regards, Maxim Orlov.
[0] https://www.postgresql.org/message-id/flat/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B%2BSs%3DGn1LA%40mail.gmail.com v1-0001-Add-out-of-disk-space-elog-level.patch Description: Binary data
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Tue, 15 Nov 2022 at 13:02, Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Fri, Nov 11, 2022 at 5:52 PM Maxim Orlov wrote: > > > But, in my view, some improvements may be proposed. We should be more, > let's say, cautious (or a paranoid if you wish), > > in pg_walfile_offset_lsn while dealing with user input values. What I > mean by that is: > > - to init input vars of sscanf, i.e. tli, log andseg; > > - check the return value of sscanf call; > > - check filename max length. > > IsXLogFileName() will take care of this. Also, I've added a new inline > function XLogIdFromFileName() that parses file name and returns log > and seg along with XLogSegNo and timeline id. This new function avoids > an extra sscanf() call as well. > > > Another question arises for me: is this function can be called during > recovery? If not, we should ereport about this, should we? > > It's just a utility function and doesn't depend on any of the server's > current state (unlike pg_walfile_name()), hence it doesn't matter if > this function is called during recovery. > > > And one last note here: pg_walfile_offset_lsn is accepting NULL values > and return NULL in this case. From a theoretical > > point of view, this is perfectly fine. Actually, I think this is exactly > how it supposed to be, but I'm not sure if there are no other opinions here. > > These functions are marked as 'STRICT', meaning a null is returned, > without even calling the function, if any of the input parameters is > null. I think we can keep the behaviour the same as its friends. > Thanks for the explanations. I think you are right. > >errmsg("offset must not be negative or greater than or equal to the > WAL segment size"))); > > Changed. > Confirm. And a timeline_id is added. > While on this, I noticed that the pg_walfile_name_offset() isn't > covered in tests. I took an opportunity and added a simple test case > along with pg_walfile_offset_lsn(). 
> > I'm attaching the v3 patch set for further review. > Great job! Shall we mark this patch as RfC? -- Best regards, Maxim Orlov.
Re: [PoC] configurable out of disk space elog level
On Wed, 16 Nov 2022 at 20:41, Andres Freund wrote: > Hi, > You can't do catalog access below the bufmgr.c layer. It could lead to all > kinds of nastiness, including potentially recursing back to md.c. Even > leaving > Yep, this is my biggest concern. It turns out that the way to implement such a feature is to use just a GUC for all tablespaces, or to forward the elevel "from above". > that aside, we can't do catalog accesses in all kinds of environments that > this currently is active in - most importantly it's affecting the startup > process. We don't do catalog accesses in the startup process, and even if > we > were to do so, we couldn't unconditionally because the catalog might not > even > be consistent at this point (nor is it guaranteed that the wal_level even > allows to access catalogs during recovery). > Yep, that is why I do this in get_tablespace_elevel:

+	/*
+	 * Use GUC level only in normal mode.
+	 */
+	if (!IsNormalProcessingMode())
+		return ERROR;

Anyway, I appreciate the opinion, thank you! -- Best regards, Maxim Orlov.
Re: [PoC] configurable out of disk space elog level
> I don't think this is a good feature to add to PostgreSQL. First, it's > unclear why stopping the cluster is a desirable behavior. It doesn't > stop the user transactions from failing; it just makes them fail in > some other way. Now it is of course perfectly legitimate for a > particular user to want that particular behavior anyway, but there are > a bunch of other things that a user could equally legitimately want to > do, like page the DBA or trigger a failover or kill off sessions that > are using large temporary relations or whatever. And, equally, there > are many other operating system errors to which a user could want the > database system to respond in similar ways. For example, someone might > want any given one of those treatments when an I/O error occurs > writing to the data directory, or a read-only filesystem error, or a > permission denied error. > > Having a switch for one particular kind of error (out of many that > could possibly occur) that triggers one particular coping strategy > (out of many that could possibly be used) seems far too specific a > thing to add as a core feature. And even if we had something much more > general, I'm not sure why that should go into the database rather than > being implemented outside it. After all, nothing at all prevents the > user from scanning the database logs for "out of space" errors and > shutting down the database if any are found. Yes, you are right. Actually, this is one of the possible ways to deal with the described situation that I mentioned above. And if I had to deal with such a task, I would do it via log monitoring. The question was: "could we be more general here?". Probably not. > While you're at it, you > could make your monitoring script also check the free space on the > relevant partition using statfs() and page somebody if the utilization > goes above 95% or whatever threshold you like, which would probably > avoid service outages much more effectively than $SUBJECT.
> > I just can't see much real benefit in putting this logic inside the > database. > OK, I got it. Thanks for your thoughts! -- Best regards, Maxim Orlov.
[BUG] FailedAssertion in SnapBuildPurgeOlderTxn
Hi!

PREAMBLE

For the last couple of months, I have stumbled into a problem while running tests on ARM (Debian, aarch64) and some more exotic platforms for my 64-bit XIDs patch set. The contrib/test_decoding test (catalog_change_snapshot) occasionally failed with the following message:

TRAP: FailedAssertion("TransactionIdIsNormal(InitialRunningXacts[0]) && TransactionIdIsNormal(builder->xmin)", File: "snapbuild.c"

I have plenty of failures on ARM, a couple on x86 and none (if memory serves) on x86-64. At first, my thought was to blame my 64-bit XID patch, but this is not the case. The error persists from PG10 to PG15 without any patch applied, though it is hard to reproduce.

PROBLEM

After some investigation, I think the problem is in snapbuild.c (commit 272248a0c1b1, see [0]). We allocate the InitialRunningXacts array in the builder->context memory context, but by the time we call SnapBuildPurgeOlderTxn, this context may already be free'd. Thus, the InitialRunningXacts array becomes an array of 2139062143 (i.e. 0x7F7F7F7F) values. This has not caused the buildfarm to fail, thanks to this code:

if (!NormalTransactionIdPrecedes(InitialRunningXacts[0], builder->xmin))
	return;

Since the cluster is initialised with an XID way less than 0x7F7F7F7F, we return here, but the problem still exists. I've attached a patch based on branch REL_15_STABLE to reproduce the problem on x86-64. With my 64-bit XID patch set, the problem manifested because I init the cluster with an XID far beyond the 32-bit bound. Alternatively, I tried to use my patch [1] to init a cluster with the first transaction at 2139062143 (i.e. 0x7F7F7F7F).
Then I put a pg_sleep call just like in the attached patch:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -968,6 +968,8 @@ SnapBuildPurgeOlderTxn(SnapBuild *builder)
 	if (NInitialRunningXacts == 0)
 		return;
 
+	pg_usleep(100L * 2L);
+
 	/* bound check if there is at least one transaction to remove */
 	if (!NormalTransactionIdPrecedes(InitialRunningXacts[0], builder->xmin))

...and ran installcheck-force many times for test_decoding's catalog_change_snapshot, and got a segfault.

CONCLUSION

In snapbuild.c, the context-allocated array InitialRunningXacts may be free'd; this causes an assertion failure (at best) or may lead to more serious problems.

P.S. A simple fix like:

@@ -1377,7 +1379,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 * changes. See SnapBuildXidSetCatalogChanges.
 	 */
 	NInitialRunningXacts = nxacts;
-	InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
+	InitialRunningXacts = MemoryContextAlloc(TopMemoryContext, sz);
 	memcpy(InitialRunningXacts, running->xids, sz);
 	qsort(InitialRunningXacts, nxacts, sizeof(TransactionId), xidComparator);

seems to solve the described problem, but I'm not in the context of [0] and don't know why the array is allocated in builder->context.

[0] https://postgr.es/m/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
[1] https://www.postgresql.org/message-id/flat/CACG=ezaa4vqYjJ16yoxgrpa-=gxnf0vv3ey9bjgrrrfn2yy...@mail.gmail.com

-- Best regards, Maxim Orlov.
From d09a031f1f807cdfe1e02000b2bf4fd3eaaedd8f Mon Sep 17 00:00:00 2001
From: Maxim Orlov
Date: Mon, 21 Nov 2022 14:50:02 +0300
Subject: [PATCH] catalog_change_snapshot-fail

---
 contrib/test_decoding/Makefile              | 1007 ++-
 src/backend/replication/logical/snapbuild.c |   12 +
 2 files changed, 1012 insertions(+), 7 deletions(-)

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index c7ce603706..aaf7a63411 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -3,12 +3,1006 @@
 MODULES = test_decoding
 PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
 
-REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
-	decoding_into_rel binary prepared replorigin time messages \
-	spill slot truncate stream stats twophase twophase_stream
-ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \
-	twophase_snapshot slot_creation_error catalog_change_snapshot
+ISOLATION = catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+	catalog_change_snapshot \
+
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
> Pushed. > > -- > With Regards, > Amit Kapila. > Hi! While working on the 64-bit XIDs patch set, I stumbled into problems with the contrib/test_decoding catalog_change_snapshot test [0]. AFAICS, the problem is not related to the 64-bit XIDs patch set; it is in the InitialRunningXacts array, allocated in builder->context. Do we really need it to be allocated that way? [0] https://www.postgresql.org/message-id/CACG%3DezZoz_KG%2BRyh9MrU_g5e0HiVoHocEvqFF%3DNRrhrwKmEQJQ%40mail.gmail.com -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> > Thank you for the report and initial analysis. I have added Sawada-San > to know his views as he was the primary author of this work. > > -- > With Regards, > Amit Kapila. > OK, thanks a lot. I hope, we'll fix this soon. -- Best regards, Maxim Orlov.
Re: [PATCH] Add initial xid/mxid/mxoff to initdb
Hi! CF bot says patch does not apply. Rebased. Your reviews are very much welcome! -- Best regards, Maxim Orlov. v2-0001-Add-initdb-option-to-initialize-cluster-with-non-.patch Description: Binary data
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> I've attached a draft patch. To fix it, I think we can reset > InitialRunningXacts and NInitialRunningXacts at FreeSnapshotBuilder() > and add an assertion in AllocateSnapshotBuilder() to make sure both > are reset. > Thanks for the patch. It works fine. I've tested this patch for versions 15 and 11 on x86-64 and ARM and see no failures. But note that the pg_current_xact_id function added by 4c04be9b05ad doesn't exist in PG11. > Regarding the tests, the patch includes a new scenario to > reproduce this issue. However, since the issue can be reproduced also > by the existing scenario (with low probability, though), I'm not sure > it's worth adding the new scenario. > AFAICS, the added test doesn't reproduce this issue 100% of the time, so maybe it is not worth it. But I do not have a strong opinion here. Let's add the tests in a separate commit and let the actual committer decide what to do, shall we? -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> > Agreed not to have a test case for this. > > I've attached an updated patch. I've confirmed this patch works for > all supported branches. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com > It works for me as well. Thanks! I've created a commitfest entry for this patch, see https://commitfest.postgresql.org/41/4024/ Hope, it will be committed soon. -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
On Fri, 25 Nov 2022 at 09:40, Amit Kapila wrote: > On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila > wrote: > > > > On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada > wrote: > > > > > > On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila > wrote: > > > > > > Agreed not to have a test case for this. > > > > > > I've attached an updated patch. I've confirmed this patch works for > > > all supported branches. > > > > > > > I have slightly changed the checks used in the patch, otherwise looks > > good to me. I am planning to push (v11-v15) the attached tomorrow > > unless there are more comments. > > > > Pushed. > A big thanks to you! Could you also close the commitfest entry https://commitfest.postgresql.org/41/4024/, please? -- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
> So I'd vote for an evolutionary approach and give my +1 for > undertaking efforts to first committing [1] to 16. > > [1]: > https://www.postgresql.org/message-id/CAFiTN-uudj2PY8GsUzFtLYFpBoq_rKegW3On_8ZHdxB1mVv3-A%40mail.gmail.com > > Kind regards, > Pavel Borisov, > Supabase. > +1 Totally support the idea. Let's focus on committing SLRU changes. -- Best regards, Maxim Orlov.
Re: Add 64-bit XIDs into PostgreSQL 15
On Fri, 9 Dec 2022 at 16:54, adherent postgres < adherent_postg...@hotmail.com> wrote: > Hi Aleksander Alekseev > I think the xids 32bit transformation project has been dragged on for too > long. Huawei's openGauss referenced this patch to implement xids 64bit, and > Postgrespro also implemented xids 64bit, which is enough to prove that > their worries are redundant.I think postgresql has no reason not to > implement xid 64 bit. What about your opinion? > Yeah, I totally agree, the time has come. With a high transaction load, Postgres becomes more and more difficult to maintain. The problem is the overall complexity of the patch set; we need more reviewers. Since committing such a big patch all at once is not viable, from the start of the work we split it into several logical parts. The evolutionary approach is preferable in this case. As far as I can see, there is overall consensus to commit the SLRU-related changes first. -- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
On Fri, 13 Jan 2023 at 07:50, Corey Huinker wrote: > > I named it wait_result_to_exit_code(), but I welcome suggestions of a > better name. > Thanks! But the CF bot is still not happy. I think we should address the issues from here: https://cirrus-ci.com/task/5391002618298368 -- Best regards, Maxim Orlov.
old_snapshot_threshold bottleneck on replica
Hi! One of our customers stumbled onto a significant performance degradation while running multiple OLAP-like queries on a replica. After some investigation, it became clear that the problem is in accessing the old_snapshot_threshold parameter. Access to the old_snapshot_threshold parameter is guarded by mutex_threshold. This is not a problem on the primary server, since we rarely call GetOldSnapshotThresholdTimestamp: 5028 void 5029 TestForOldSnapshot_impl(Snapshot snapshot, Relation relation) 5030 { 5031 if (RelationAllowsEarlyPruning(relation) 5032 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp()) 5033 ereport(ERROR, 5034 (errcode(ERRCODE_SNAPSHOT_TOO_OLD), 5035 errmsg("snapshot too old"))); But in the case of a replica, we have to call GetOldSnapshotThresholdTimestamp much more often. So, this becomes a bottleneck. The customer solved this issue by setting old_snapshot_threshold to 0, but I think we can do something about it. Some more investigation: -- On primary -- $ ./bin/psql postgres -c "create database benchmark" CREATE DATABASE $ ./bin/pgbench -i -Uorlov -s300 benchmark dropping old tables... NOTICE: table "pgbench_accounts" does not exist, skipping ... creating tables... generating data (client-side)... 3000 of 3000 tuples (100%) done (elapsed 142.37 s, remaining 0.00 s) vacuuming... creating primary keys... done in 177.67 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 144.45 s, vacuum 0.59 s, primary keys 32.61 s). -- On secondary -- $ touch 1.sql $ vim 1.sql $ cat 1.sql \set bid random(1, 300) BEGIN; SELECT sum(aid) FROM pgbench_accounts where bid = :bid GROUP BY bid; END; $ ./bin/pgbench -f 1.sql -p5433 -Uorlov -j10 -c100 -T720 -P1 -n benchmark pgbench (16devel) progress: 1.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed ... 
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed $ perf record -F 99 -a -g --call-graph=dwarf sleep 5 $ perf script --header --fields comm,pid,tid,time,event,ip,sym,dso > file $ grep s_lock file | wc -l 3486 My proposal is to use atomics for threshold_timestamp and threshold_xid. PFA the 0001 patch. With patch 0001 we got: $ grep s_lock file2 | wc -l 8 Maybe we should go further and remove mutex_threshold here? This will lead to inconsistency of threshold_timestamp and threshold_xid, but is this really a problem? Thoughts? -- Best regards, Maxim Orlov. 0001-PGPRO-7624-use-atomic-old_snapshot_threshold.patch Description: Binary data
Re: old_snapshot_threshold bottleneck on replica
On Tue, 24 Jan 2023 at 18:46, Robert Haas wrote: > > (1) that mutex also protects something else and the existing comment > is wrong, or > > (2) the mutex should have been removed but the patch neglected to do so, or > > (3) the mutex is still needed for some reason, in which case either > (3a) the patch isn't actually safe or (3b) the patch needs comments to > explain what the new synchronization model is. > > Yes, you're absolutely right. And my first intention was to remove this mutex completely. But in TransactionIdLimitedForOldSnapshots these variables are used conjointly. So, I'm not sure whether it is completely safe to remove the mutex. Actually, removing the mutex and switching to atomics was my first choice. I've run all the tests and no problems were found. But at that time I chose to be more conservative. Anyway, here is the new variant. -- Best regards, Maxim Orlov. v2-0001-Use-atomic-old_snapshot_threshold.patch Description: Binary data
Re: old_snapshot_threshold bottleneck on replica
On Wed, 25 Jan 2023 at 16:52, Robert Haas wrote: > On Wed, Jan 25, 2023 at 3:52 AM Maxim Orlov wrote: > > Well, that's something we - and ideally you, as the patch author - > need to analyze and figure out. We can't just take a shot and hope for > the best. > Thank you for your advice. I've dived deeper into the problem and I think the v2 patch is wrong. Accessing threshold_timestamp and threshold_xid in TransactionIdLimitedForOldSnapshots without a lock would lead to an improper xlimit calculation. So, my choice would be (3b). My goal is to optimize access to threshold_timestamp to avoid multiple spinlock acquisitions on read. At the same time, simultaneous access to these variables (threshold_timestamp and threshold_xid) should be protected with the spinlock. I removed the atomic for threshold_xid and added comments on mutex_threshold. PFA v3. -- Best regards, Maxim Orlov. v3-0001-Use-atomic-old_snapshot_threshold.patch Description: Binary data
Re: old_snapshot_threshold bottleneck on replica
On Fri, 27 Jan 2023 at 18:18, Robert Haas wrote: > > Interesting, but it's still not entirely clear to me from reading the > comments why we should think that this is safe. > Overall, I think this is safe, because we do not change the algorithm here. More specifically, threshold_timestamp is only used in a few cases: 1). To get the latest value by calling GetOldSnapshotThresholdTimestamp. This will work, since we only change the sync type here from the spinlock to an atomic. 2). In TransactionIdLimitedForOldSnapshots, but no behaviour changes are made here. The sync model will be the same as before the patch. 3). In SnapshotTooOldMagicForTest, which is a stub to make old_snapshot_threshold tests appear "working". But there is no coherence with threshold_xid here. So, we have two use cases for threshold_timestamp: a). When threshold_timestamp is used in conjunction with threshold_xid. We must use the spinlock to sync. b). When threshold_timestamp is used without threshold_xid. In this case, we use atomic values. -- Best regards, Maxim Orlov.
Re: Add SHELL_EXIT_CODE to psql
Unfortunately, there is a failure on FreeBSD: https://cirrus-ci.com/task/6466749487382528 Maybe this patch needs to be rebased? -- Best regards, Maxim Orlov.
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Thu, 22 Sept 2022 at 18:13, Andres Freund wrote: > Due to the merge of the meson based build this patch needs to be > adjusted: https://cirrus-ci.com/build/6350479973154816 > > Looks like you need to add amcheck--1.3--1.4.sql to the list of files to be > installed and t/004_verify_nbtree_unique.pl to the tests. > > Greetings, > > Andres Freund > Thanks! Fixed. -- Best regards, Maxim Orlov. v17-0001-Add-option-for-amcheck-and-pg_amcheck-to-check-u.patch Description: Binary data
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi! I think this patch was marked as "Waiting on Author", probably by mistake, since the recent changes were done without any significant code changes and the CF bot is happy again. I'm going to move it to RfC, may I? If not, please tell me why. -- Best regards, Maxim Orlov.
Re: making relfilenodes 56 bits
Hi! I'm not in the context of this thread, but I've noticed something strange while attempting to rebase my patch set from the 64XID thread. As far as I'm aware, this patch set is adding "relfilenumber". So, in pg_control_checkpoint, we have the following changes: diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 781f8b8758..d441cd97e2 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -79,8 +79,8 @@ pg_control_system(PG_FUNCTION_ARGS) Datum pg_control_checkpoint(PG_FUNCTION_ARGS) { - Datum values[18]; - bool nulls[18]; + Datum values[19]; + bool nulls[19]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -129,6 +129,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) XIDOID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 18, "checkpoint_time", TIMESTAMPTZOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 19, "next_relfilenumber", + INT8OID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); /* Read the control file. */ In other words, we have 19 attributes. But tupdesc here is constructed for 18 elements: tupdesc = CreateTemplateTupleDesc(18); Is that normal or not? Again, I'm not following this thread, so if that is completely OK, I'm sorry about the noise. -- Best regards, Maxim Orlov.
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
> > This is the wrong thread / CF entry. Please see > Yep, my fault. Sorry about that. -- Best regards, Maxim Orlov.
Turn TransactionIdRetreat/Advance into inline functions
Hi! This patch is inspired by [0] and many others. I've noticed recent activity converting macros into inline functions. We should make TransactionIdRetreat/Advance inline functions instead of macros, shouldn't we? I'm also thinking about NormalTransactionIdPrecedes and NormalTransactionIdFollows, but maybe they should be addressed separately: the comment says that "this is a macro for speed". Any thoughts? [0]: https://www.postgresql.org/message-id/flat/5b558da8-99fb-0a99-83dd-f72f05388517%40enterprisedb.com -- Best regards, Maxim Orlov. v1-0001-Convert-macros-to-static-inline-functions-transam.patch Description: Binary data
Re: Turn TransactionIdRetreat/Advance into inline functions
> -1. Having to touch all the call sites like this outweighs > any claimed advantage: it makes them uglier and it will greatly > complicate any back-patching we might have to do in those areas. > > regards, tom lane > Ok, got it. But what if we change the semantics of these calls to xid = TransactionIdAdvance(xid) ? -- Best regards, Maxim Orlov.
Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment
Hi! Part of the work that Thomas mentions in [1], regarding Direct I/O, > has certain requirements that pointers must be page-aligned. > > I've attached a patch which implements palloc_aligned() and > MemoryContextAllocAligned() ... > I've taken a quick look and the patch looks good to me. Let's add tests for these functions, shall we? If you think this is overkill, feel free to trim the tests to your taste. -- Best regards, Maxim Orlov. v1-0002-Add-tests-for-palloc_aligned.patch Description: Binary data v1-0001-palloc_aligned.patch Description: Binary data
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Just a note here. After examining the core dump, I noticed something. In the XidInMVCCSnapshot call, snapshot->suboverflowed is set to true although subxip == NULL and subxcnt == 0. As far as I understand, snapshot->suboverflowed is set to true in the GetRunningTransactionData call. So I decided to put an elog around the assignment of CurrentRunningXacts->subxcnt. diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 42a89fc5dc9..3d2db02f580 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2781,6 +2781,9 @@ GetRunningTransactionData(void) * increases if slots do. */ + if (suboverflowed) + elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true"); + CurrentRunningXacts->xcnt = count - subcount; CurrentRunningXacts->subxcnt = subcount; CurrentRunningXacts->subxid_overflow = suboverflowed; ... and got a bunch of messages, i.e. subxid_overflow is set to true very often. I've increased the value of PGPROC_MAX_CACHED_SUBXIDS. Once it becomes more than 120, there are no more messages and no assertion failures occur any more. --- Best regards, Maxim Orlov.diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index 42a89fc5dc9..3d2db02f580 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -2781,6 +2781,9 @@ GetRunningTransactionData(void) * increases if slots do. */ + if (suboverflowed) + elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true"); + CurrentRunningXacts->xcnt = count - subcount; CurrentRunningXacts->subxcnt = subcount; CurrentRunningXacts->subxid_overflow = suboverflowed; diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index be67d8a8616..8e1a3d1c6bb 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -33,7 +33,7 @@ * listed anywhere in the PGPROC array is not a running transaction. Else we * have to look at pg_subtrans. 
*/ -#define PGPROC_MAX_CACHED_SUBXIDS 64 /* XXX guessed-at value */ +#define PGPROC_MAX_CACHED_SUBXIDS 120 /* XXX guessed-at value */ typedef struct XidCacheStatus {
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
The analysis at the beginning of the discussion seems to be right, but the v2 fix looks too invasive to me. Personally, I'd prefer not to remove the snapshot even if the transaction is read-only. I propose to consider "xid < TransactionXmin" a legit case and just promote xid to TransactionXmin. It's annoying that this old bug is still not fixed. What do you think? --- Best regards, Maxim Orlov.diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c index 6a8e521f894..ce7c90aa38f 100644 --- a/src/backend/access/transam/subtrans.c +++ b/src/backend/access/transam/subtrans.c @@ -29,6 +29,7 @@ #include "postgres.h" #include "access/slru.h" +#include "access/parallel.h" #include "access/subtrans.h" #include "access/transam.h" #include "pg_trace.h" @@ -152,6 +153,21 @@ SubTransGetTopmostTransaction(TransactionId xid) TransactionId parentXid = xid, previousXid = xid; + /* + * We may have different xmins in active and transaction snapshots in + * parallel workers, and TransactionXmin can be promoted to the max of them. + * So, we do this to avoid the assert below when restoring an active + * snapshot with a lesser xmin value. + * + * XXX: the same may be fixed by prohibiting read-only transactions (like + * parallel scans) from incrementing TransactionXmin. + */ + if (IsParallelWorker()) + { + xid = xid > TransactionXmin ? xid : TransactionXmin; + parentXid = xid; + previousXid = xid; + } /* Can't ask about stuff that might not be around anymore */ Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
[PATCH] Add initial xid/mxid/mxoff to initdb
Hi! During work on the 64-bit XID patch [1] we found it handy to have initdb options to set the initial xid/mxid/mxoff values to arbitrary non-default values. It helps to test different scenarios related to wraparound, pg_upgrade from 32-bit XID to 64-bit XID, etc. We realized this patch can be singled out from the whole patchset in [1] as an independent patch and be useful irrespective of 64-bit XIDs for testing wraparound and so on. In particular, we employed this patch to test recent changes in logical replication of subxacts [2] and found no problems in it near the point of publisher wraparound. Please share your opinions; reviews are always welcome. [1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com [2] https://postgr.es/m/d045f3c2-6cfb-06d3-5540-e63c320df...@enterprisedb.com -- Best regards, Maxim Orlov. From c752471dab910b3ddfed5b0f3b59e7314165a193 Mon Sep 17 00:00:00 2001 From: Maxim Orlov Date: Wed, 4 May 2022 15:53:36 +0300 Subject: [PATCH v1] Add initdb option to initialize cluster with non-standard xid/mxid/mxoff. To date, testing database cluster wraparound was not easy, as initdb has always initialized it with the default xid/mxid/mxoff. The option to specify any valid xid/mxid/mxoff at cluster startup will make these things easier. 
Author: Maxim Orlov Author: Pavel Borisov --- src/backend/access/transam/clog.c | 9 +++ src/backend/access/transam/multixact.c | 35 src/backend/access/transam/xlog.c | 15 ++-- src/backend/bootstrap/bootstrap.c | 56 - src/backend/main/main.c| 6 ++ src/backend/postmaster/postmaster.c| 15 +++- src/backend/tcop/postgres.c| 59 +- src/bin/initdb/initdb.c| 107 - src/bin/initdb/t/001_initdb.pl | 86 src/include/access/xlog.h | 3 + src/include/catalog/pg_class.h | 2 +- 11 files changed, 380 insertions(+), 13 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 3d9088a704..d973a2159d 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -713,6 +713,7 @@ void BootStrapCLOG(void) { int slotno; + int pageno; LWLockAcquire(XactSLRULock, LW_EXCLUSIVE); @@ -723,6 +724,14 @@ BootStrapCLOG(void) SimpleLruWritePage(XactCtl, slotno); Assert(!XactCtl->shared->page_dirty[slotno]); + pageno = TransactionIdToPage(XidFromFullTransactionId(ShmemVariableCache->nextXid)); + if (pageno != 0) + { + slotno = ZeroCLOGPage(pageno, false); + SimpleLruWritePage(XactCtl, slotno); + Assert(!XactCtl->shared->page_dirty[slotno]); + } + LWLockRelease(XactSLRULock); } diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 8f7d12950e..00acb955e0 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1894,6 +1894,7 @@ void BootStrapMultiXact(void) { int slotno; + int pageno; LWLockAcquire(MultiXactOffsetSLRULock, LW_EXCLUSIVE); @@ -1904,6 +1905,17 @@ BootStrapMultiXact(void) SimpleLruWritePage(MultiXactOffsetCtl, slotno); Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]); + pageno = MultiXactIdToOffsetPage(MultiXactState->nextMXact); + if (pageno != 0) + { + /* Create and zero the first page of the offsets log */ + slotno = ZeroMultiXactOffsetPage(pageno, false); + + /* Make sure it's written out */ + 
SimpleLruWritePage(MultiXactOffsetCtl, slotno); + Assert(!MultiXactOffsetCtl->shared->page_dirty[slotno]); + } + LWLockRelease(MultiXactOffsetSLRULock); LWLockAcquire(MultiXactMemberSLRULock, LW_EXCLUSIVE); @@ -1915,7 +1927,30 @@ BootStrapMultiXact(void) SimpleLruWritePage(MultiXactMemberCtl, slotno); Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]); + pageno = MXOffsetToMemberPage(MultiXactState->nextOffset); + if (pageno != 0) + { + /* Create and zero the first page of the members log */ + slotno = ZeroMultiXactMemberPage(pageno, false); + + /* Make sure it's written out */ + SimpleLruWritePage(MultiXactMemberCtl, slotno); + Assert(!MultiXactMemberCtl->shared->page_dirty[slotno]); + } + LWLockRelease(MultiXactMemberSLRULock); + + /* + * If we're starting not from zero offset, initilize dummy multixact to + * evade too long loop in PerformMembersTruncation(). + */ + if (MultiXactState->nextOffset > 0 && MultiXactState->nextMXact > 0) + { + RecordNewMultiXact(FirstMultiXactId, + MultiXactState->nextOffset, 0, NULL); + RecordNewMultiXact(MultiXactState->nextMXact, + MultiXactState->nextOffset, 0, NULL); + } } /* diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 61cda56c6f..7768a3835d 100644 --- a/src/backend/access/transam/xlog.c +++ b/sr
Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
> here is the rebased v32 version of the patch. > > The patchset rotted a bit. Here is a rebased version. > We have posted an updated version v34 of the whole patchset in [1]. Changes of patches 0001-0003 there are identical to v33. So, no update is needed in this thread. [1] https://www.postgresql.org/message-id/flat/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com -- Best regards, Maxim Orlov.
Re: Custom tuplesorts for extensions
Hi! Overall the patch looks good. Let's mark it as ready for committer, shall we? -- Best regards, Maxim Orlov.
Re: Fix unnecessary includes and comments in 019_replslot_limit.pl, 007_wal.pl and 004_timeline_switch.pl
Hi! This is an obvious change; I'm totally for it. Hope it will be committed soon. -- Best regards, Maxim Orlov.
Re: Fix unnecessary includes and comments in 019_replslot_limit.pl, 007_wal.pl and 004_timeline_switch.pl
> > I'm sorry for some nitpicking about changes in the comments: > - The number of WAL segments advanced hasn't changed from 5 to 1, it just > advances as 1+4 as previously. So the original comment is right. I reverted > this in v2. > Yeah, it looks even better now. -- Best regards, Maxim Orlov.
Re: remove reset_shared()
Hi! In general, I'm for this patch. Some time ago I was working on a patch related to shared memory and saw no reason to have the reset_shared() function. -- Best regards, Maxim Orlov.
Re: Pre-allocating WAL files
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, failed Spec compliant: not tested Documentation: not tested Hi! We've looked through the code and everything looks good except a few minor things: 1). Using a dedicated bgworker seems suboptimal; it introduces a lot of redundant code for a single small action. We'd support Andres' initial proposal to implement it as an extra function of the checkpointer. 2). In our view, it is better to move #define PREALLOCSEGDIR outside the function body. 3). It is better to have at least small comments on the functions GetNumPreallocatedWalSegs and SetNumPreallocatedWalSegs. We've also tested the performance difference between the master branch and this patch and noticed no significant difference in performance. We used pgbench with some sort of "standard" settings: $ pgbench -c50 -s50 -T200 -P1 -r postgres ...and with... $ pgbench -c100 -s50 -T200 -P1 -r postgres When looking at the per-second output of pgbench, we saw regular latency spikes (around a 5-10x increase), and this pattern was similar with and without the patch. The overall average latency for 200 sec of pgbench also looks pretty much the same with and without the patch. Could you provide your testing setup so we can see the effect, please? The check-world was successful. Overall the patch looks good, but in our view it's better to have experimental confirmation of the performance improvements before it is committed. --- Best regards, Maxim Orlov, Pavel Borisov. The new status of this patch is: Waiting on Author
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On 2021-07-09 20:36, Tomas Vondra wrote: Hi, I took a quick look on this - I'm no expert in the details of snapshots, so take my comments with a grain of salt. AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think Greg is right the v3 patch does not seem like the right (or correct) solution, for a couple reasons: 1) It fixes the symptoms, not the cause. If we set TransactionXmin to a bogus value, this only fixes it locally in SubTransGetTopmostTransaction but I'd be surprised if other places did not have the same issue. 2) The xid/TransactionXmin comparison in the v2 fix: xid = xid > TransactionXmin ? xid : TransactionXmin; seems broken, because it compares the XIDs directly, but that's not going to work after generating enough transactions. 3) But even if this uses TransactionIdFollowsOrEquals, it seems very strange because the "xid" comes from XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot)) i.e. it's the raw xmin from the tuple, so why should we be setting it to TransactionXmin? That seems pretty strange, TBH. So yeah, I think this is due to confusion with two snapshots and failing to consider both of them when calculating TransactionXmin. But I think removing one of the snapshots (as the v2 does it) is rather strange too. I very much doubt having both the transaction and active snapshots in the parallel worker is not intentional, and Pavel may very well be right this breaks isolation levels that use the xact snapshot (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though. So I think we need to keep both snapshots, but fix TransactionXmin to consider both of them - I suppose ParallelWorkerMain could simply look at the two snapshots, and use the minimum. Which is what [1] (already linked by Pavel) talks about, although that's related to concerns about one of the processes dying, which is not what's happening here. I'm wondering what consequences this may have on production systems, though. 
We've only seen this failing because of the assert, so what happens when the build has asserts disabled? Looking at SubTransGetTopmostTransaction, it seems the while loop ends immediately (because it's pretty much the opposite of the assert), so we just return the "xid" as topmost XID. But will that produce incorrect query results, or what? regards [1] https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de PFA the v4 patch based on the ideas above. In principle, I see little difference from v3, but I agree it is more correct. I tested this patch on Linux and macOS using the testing algorithm above and got no errors. On the master branch, before the patch, I still see this error. --- Best regards, Maxim Orlov.diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 3550ef13baa..6443d492501 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -34,6 +34,7 @@ #include "pgstat.h" #include "storage/ipc.h" #include "storage/predicate.h" +#include "storage/procarray.h" #include "storage/sinval.h" #include "storage/spin.h" #include "tcop/tcopprot.h" @@ -45,6 +46,7 @@ #include "utils/snapmgr.h" #include "utils/typcache.h" + /* * We don't want to waste a lot of memory on an error queue which, most of * the time, will process only a handful of small messages. However, it is @@ -1261,6 +1263,7 @@ ParallelWorkerMain(Datum main_arg) char *uncommittedenumsspace; StringInfoData msgbuf; char *session_dsm_handle_space; + Snapshot asnap; /* Set flag to indicate that we're initializing a parallel worker. */ InitializingParallelWorker = true; @@ -1415,7 +1418,15 @@ ParallelWorkerMain(Datum main_arg) /* Restore active snapshot. 
*/ asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false); - PushActiveSnapshot(RestoreSnapshot(asnapspace)); + asnap = RestoreSnapshot(asnapspace); + PushActiveSnapshot(asnap); + + /* + * We may have different xmins in active and transaction snapshots in + * parallel workers. So, use the minimum for TransactionXmin. + */ + if (TransactionIdFollows(TransactionXmin, asnap->xmin)) + ProcArrayInstallRestoredXmin(asnap->xmin, fps->parallel_leader_pgproc); /* * We've changed which tuples we can see, and must therefore invalidate
Bugfix and improvements in multixact.c
Hi! While working on making multixact offsets 64-bit [0], I've discovered a minor issue. The thing is that the type 'xid' is used in all the macros, but this isn't correct. The appropriate MultiXactId or MultiXactOffset should be used instead. And the second thing: as Heikki Linnakangas points out, the argument naming is also misleading. Since these problems are not the point of thread [0], I decided to create this discussion. And here is the patch set addressing the mentioned issues (0001 and 0002). Additionally, I made an optional patch 0003 to switch from macros to inline functions. For me, personally, the use of macro functions is justified if we are dealing with different argument types, to make a polymorphic call, which is not the case here. So, we can have more control over types and still generate the same code in terms of speed. See https://godbolt.org/z/KM8voadhs Starting from -O1, the function is inlined, so no overhead is noticeable. Anyway, it's up to the actual committer to decide whether it is worth it or not. Again, this particular patch 0003 is completely optional. As always, any opinions and reviews are very welcome! [0] https://www.postgresql.org/message-id/flat/ff143b24-a093-40da-9833-d36b83726bdf%40iki.fi#61d5a0e1cf6ab94b0e8aae8559bc4cf7 -- Best regards, Maxim Orlov. v1-0003-Switch-from-macro-functions-to-inline-in-multixac.patch Description: Binary data v1-0002-Fix-using-of-MultiXactOffset-type-insted-of-Trans.patch Description: Binary data v1-0001-Get-rid-of-misleading-xid-arg-name-in-multixact.patch Description: Binary data
Re: Maybe don't process multi xmax in FreezeMultiXactId() if it is already marked as invalid?
Hi! Maybe I'm too bold, but this looks like a bug to me. At least, I don't understand why we do not check the HEAP_XMAX_INVALID flag. My guess is that nobody noticed that the MultiXactIdIsValid call does not check the mentioned flag in the "first" condition, but that's all speculation. Does anyone know if there are reasons to deliberately ignore the HEAP_XMAX_INVALID flag? Or is this just an unfortunate oversight? PFA my approach to this issue. -- Best regards, Maxim Orlov. v2-0001-Invalidate-xmax-if-HEAP_XMAX_INVALID-is-set.patch Description: Binary data
Vectorization of some functions and improving pg_list interface
Hi! Recently, I've been playing around with pg_lists and realized how annoying (maybe I was a bit tired) some stuff related to lists is. For example, see this code: List *l1 = list_make4(1, 2, 3, 4), *l2 = list_make4(5, 6, 7, 8), *l3 = list_make4(9, 0, 1, 2); ListCell *lc1, *lc2, *lc3; forthree(lc1, l1, lc2, l2, lc3, l3) { ... } list_free(l1); list_free(l2); list_free(l3); There are several questions: 1) Why do I need to specify the number of elements in the list in the function name? The compiler already knows how many arguments I use. 2) Why do I have to call free for every list? I don't know what to call this properly; for now I call it vectorization. Why not use a simple wrapper to "vectorize" function args? So, my proposal is: 1) Add a simple macro to "vectorize" functions. 2) Use this macro to "vectorize" the list_free and list_free_deep functions. 3) Use this macro to "vectorize" the bms_free function. 4) "Vectorize" the list_makeN functions. For this v1 version, I do not remove all list_makeN calls in order to reduce the diff, but I'll address this in the future if needed. In my view, one thing still waiting to be improved is the foreach loop. It is not very handy to have a bunch of similar calls: foreach, forboth, forthree, etc. It would be ideal to have a single foreach interface, but I don't know how to do that without changing the overall interface of the loop. Any opinions are very welcome! -- Best regards, Maxim Orlov. v1-0002-Make-use-of-vectorized-lists_free-and-lists_free_.patch Description: Binary data v1-0004-vectorize-list_make-functions.patch Description: Binary data v1-0001-Add-lists_free-and-lists_free_deep.patch Description: Binary data v1-0003-Make-use-of-vectorized-bmss_free-function.patch Description: Binary data