Re: Misplaced superuser check in pg_log_backend_memory_contexts()

2021-06-06 Thread Julien Rouhaud
On Sun, Jun 06, 2021 at 03:53:10PM +0900, Michael Paquier wrote:
> 
> While reading the code of pg_log_backend_memory_contexts(), I have
> been surprised to see that the code would attempt to look at a PROC
> entry based on the given input PID *before* checking if the function
> has been called by a superuser.  This does not strike me as a good
> idea as this allows any users to call this function and to take
> ProcArrayLock in shared mode, freely.

It doesn't seem like a huge problem as at least GetSnapshotData also acquires
ProcArrayLock in shared mode.  Knowing if a specific pid is a postgres backend
or not isn't privileged information either, and anyone can check that using
pg_stat_activity as an unprivileged user (which will also acquire ProcArrayLock
in shared mode).
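
For instance, an unprivileged role can already run something like the
following (the pid is just an arbitrary example), which takes ProcArrayLock
in shared mode internally:

SELECT pid, backend_type FROM pg_stat_activity WHERE pid = 12345;
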
> 
> It seems to me that we had better check for a superuser at the
> beginning of the function, like in the attached.

However +1 for the patch, as it seems more consistent to always get a
permission failure if you're not a superuser.




Re: SQL-standard function body

2021-06-06 Thread Julien Rouhaud
On Sat, Jun 05, 2021 at 09:44:18PM -0700, Noah Misch wrote:
> On Wed, Apr 07, 2021 at 09:55:40PM +0200, Peter Eisentraut wrote:
> > Committed.  Thanks!
> 
> I get a NULL pointer dereference if the function body has a doubled semicolon:
> 
>   create function f() returns int language sql begin atomic select 1;; end;

You don't even need a statement to reproduce the problem; a body containing
only semicolons behaves the same way.
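
For instance, something like this (hypothetical function name) should be
enough to hit the crash:

CREATE FUNCTION f_empty() RETURNS int LANGUAGE sql
BEGIN ATOMIC
  ;;
END;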

Attached patch should fix the problem.
From d03f6f0aee61fb2f2246d72c266671fe975d6a65 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Sun, 6 Jun 2021 15:28:59 +0800
Subject: [PATCH v1] Fix SQL-standard body empty statements handling.

---
 src/backend/commands/functioncmds.c | 3 +++
 src/test/regress/expected/create_function_3.out | 2 +-
 src/test/regress/sql/create_function_3.sql  | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/functioncmds.c b/src/backend/commands/functioncmds.c
index 4c12aa33df..61de9bf11a 100644
--- a/src/backend/commands/functioncmds.c
+++ b/src/backend/commands/functioncmds.c
@@ -932,6 +932,9 @@ interpret_AS_clause(Oid languageOid, const char *languageName,
 Query	   *q;
 ParseState *pstate = make_parsestate(NULL);
 
+if (!stmt)
+	continue;
+
 pstate->p_sourcetext = queryString;
 sql_fn_parser_setup(pstate, pinfo);
 q = transformStmt(pstate, stmt);
diff --git a/src/test/regress/expected/create_function_3.out b/src/test/regress/expected/create_function_3.out
index 5b6bc5eddb..5955859bb5 100644
--- a/src/test/regress/expected/create_function_3.out
+++ b/src/test/regress/expected/create_function_3.out
@@ -267,7 +267,7 @@ CREATE FUNCTION functest_S_3() RETURNS boolean
 RETURN false;
 CREATE FUNCTION functest_S_3a() RETURNS boolean
 BEGIN ATOMIC
-RETURN false;
+;;RETURN false;;
 END;
 CREATE FUNCTION functest_S_10(a text, b date) RETURNS boolean
 LANGUAGE SQL
diff --git a/src/test/regress/sql/create_function_3.sql b/src/test/regress/sql/create_function_3.sql
index 4b778999ed..6e8b838ff2 100644
--- a/src/test/regress/sql/create_function_3.sql
+++ b/src/test/regress/sql/create_function_3.sql
@@ -165,7 +165,7 @@ CREATE FUNCTION functest_S_3() RETURNS boolean
 RETURN false;
 CREATE FUNCTION functest_S_3a() RETURNS boolean
 BEGIN ATOMIC
-RETURN false;
+;;RETURN false;;
 END;
 
 CREATE FUNCTION functest_S_10(a text, b date) RETURNS boolean
-- 
2.31.1



Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
Hi,

There seems to be a weird bug in Postgres (last tested 11.12) where it
allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
on a TEXT/VARCHAR when there's already a value present in that index,
but only for UTF-8 input.

I just had this happen on our user table and it somehow made it so
that Postgres returned no results for *any* SELECT ... FROM x WHERE
unique_col = 'x', which unfortunately meant no one could login to our
service.

I had to:

SET enable_indexscan = off;
SET enable_bitmapscan = off;

And then the data was returned properly. I thought maybe the index was
corrupt somehow, so I tried to reindex the unique index, which failed
because "nur" was present twice.

I modified the value in that column for the offending row, identified by its
primary key (which is an integer); that allowed me to reindex, after which
queries against the column started working properly again.

My collation settings:

 postgres  | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |

I've had this happen before on a different table with Cyrillic UTF-8
input, but didn't really have much to go on debugging-wise.

What I sort of don't get is... before we insert anything into these
tables, we always check to see if a value already exists. And Postgres
must be returning no results for some reason. So it goes to insert a
duplicate value which somehow succeeds despite the unique index, but
then a reindex says it's a duplicate. Pretty weird.

Regards,
Omar




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Laurenz Albe
On Sun, 2021-06-06 at 03:54 -0700, Omar Kilani wrote:
> There seems to be a weird bug in Postgres (last tested 11.12) where it
> allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
> on a TEXT/VARCHAR when there's already a value present in that index,
> but only for UTF-8 input.
> 
> I just had this happen on our user table and it somehow made it so
> that Postgres returned no results for *any* SELECT ... FROM x WHERE
> unique_col = 'x', which unfortunately meant no one could login to our
> service.
> 
> I had to:
> 
> SET enable_indexscan = off;
> SET enable_bitmapscan = off;
> 
> And then the data was returned properly.

Sounds like data corruption.
REINDEX the index and see if that fixes the problem.
Try to figure out the cause (bad hardware?).
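
Something along these lines is the usual way to do that (the index and table
names here are placeholders for the affected unique index):

REINDEX INDEX users_unique_col_idx;
-- or, to rebuild all indexes on the table:
REINDEX TABLE users;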

Yours,
Laurenz Albe





Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread David Rowley
On Sun, 6 Jun 2021 at 22:55, Omar Kilani  wrote:
> There seems to be a weird bug in Postgres (last tested 11.12) where it
> allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
> on a TEXT/VARCHAR when there's already a value present in that index,
> but only for UTF-8 input.

It would be good to know a bit about the history of this instance.
Was the initdb done on 11.12? Or some other 11.x version?  Or was this
instance pg_upgraded from some previous major version?

There was a bug fixed in 11.11 that caused CREATE INDEX CONCURRENTLY
possibly to miss rows that were inserted by a prepared transaction.
Was this index created with CREATE INDEX CONCURRENTLY?

> What I sort of don't get is... before we insert anything into these
> tables, we always check to see if a value already exists. And Postgres
> must be returning no results for some reason. So it goes to insert a
> duplicate value which somehow succeeds despite the unique index, but
> then a reindex says it's a duplicate. Pretty weird.

That does not seem that weird to me.  If the index is corrupt and
fails to find the record you're searching for using a scan of that
index, then it seems pretty likely that the record would also not be
found in the index when doing the INSERT.

The reindex will catch the problem because it uses the heap as the
source of truth to build the new index.  It simply sounds like there
are two records in the heap because a subsequent one was added and a
corrupt index didn't find the original record either because it was
either missing from the index or because the index was corrupt in some
way that the record was just not found.
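
As a sketch of how to see those heap duplicates directly (placeholder table
and column names, forcing a heap scan as in the earlier workaround):

BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
SELECT unique_col, count(*)
FROM users
GROUP BY unique_col
HAVING count(*) > 1;
COMMIT;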

David




Re: Misplaced superuser check in pg_log_backend_memory_contexts()

2021-06-06 Thread Bharath Rupireddy
On Sun, Jun 6, 2021 at 12:23 PM Michael Paquier  wrote:
>
> Hi all,
>
> While reading the code of pg_log_backend_memory_contexts(), I have
> been surprised to see that the code would attempt to look at a PROC
> entry based on the given input PID *before* checking if the function
> has been called by a superuser.  This does not strike me as a good
> idea as this allows any users to call this function and to take
> ProcArrayLock in shared mode, freely.
>
> It seems to me that we had better check for a superuser at the
> beginning of the function, like in the attached.

pg_signal_backend still takes ProcArrayLock in shared mode first and then
checks for superuser permissions. Of course, it does that for the roleId,
i.e. superuser_arg(proc->roleId), but there's also a superuser() check.

With Regards,
Bharath Rupireddy.


Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
Hey David,

Hmmm… it wasn’t init on 11.x.

This is a very old database (2004) that has moved forward via pg_upgrade. I
think we did a pg_dump and pg_restore every time we hit some major
incompatibility like float vs integer date times.

The current DB started as a pg_restore into 10.x. Then was pg_upgrade’d to
11.2. Has been minor upgraded a bunch of times since and we upgraded to
11.12… just before this happened.

As in, we just restarted our cluster on 11.12. Everything was working fine
(and the index was working) and then the INSERT happened.

I have checksums on and I did a VACUUM on the table just before the REINDEX.

I’m 99.9% confident the hardware isn’t bad.

The only time we’ve seen this is with Unicode input.

Regards,
Omar

On Sun, Jun 6, 2021 at 4:59 AM David Rowley  wrote:

> On Sun, 6 Jun 2021 at 22:55, Omar Kilani  wrote:
> > There seems to be a weird bug in Postgres (last tested 11.12) where it
> > allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
> > on a TEXT/VARCHAR when there's already a value present in that index,
> > but only for UTF-8 input.
>
> It would be good to know a bit about the history of this instance.
> Was the initdb done on 11.12? Or some other 11.x version?  Or was this
> instance pg_upgraded from some previous major version?
>
> There was a bug fixed in 11.11 that caused CREATE INDEX CONCURRENTLY
> possibly to miss rows that were inserted by a prepared transaction.
> Was this index created with CREATE INDEX CONCURRENTLY?
>
> > What I sort of don't get is... before we insert anything into these
> > tables, we always check to see if a value already exists. And Postgres
> > must be returning no results for some reason. So it goes to insert a
> > duplicate value which somehow succeeds despite the unique index, but
> > then a reindex says it's a duplicate. Pretty weird.
>
> That does not seem that weird to me.  If the index is corrupt and
> fails to find the record you're searching for using a scan of that
> index, then it seems pretty likely that the record would also not be
> found in the index when doing the INSERT.
>
> The reindex will catch the problem because it uses the heap as the
> source of truth to build the new index.  It simply sounds like there
> are two records in the heap because a subsequent one was added and a
> corrupt index didn't find the original record either because it was
> either missing from the index or because the index was corrupt in some
> way that the record was just not found.
>
> David
>


Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
I just remembered, I have… many… snapshots of the on disk data prior to
starting 11.12.

It should be possible to start at a specific point in time with the index
in the state it was in prior to the insert.

How do I prove or disprove… hardware issues?

Also… I ran the select on 3 of our standby servers and they all had the
same issue. Presumably the index would be “corrupt” in the same way across
multiple very different machines from WAL apply?

On Sun, Jun 6, 2021 at 6:41 AM Omar Kilani  wrote:

> Hey David,
>
> Hmmm… it wasn’t init on 11.x.
>
> This is a very old database (2004) that has moved forward via pg_upgrade.
> I think we did a pg_dump and pg_restore every time we hit some major
> incompatibility like float vs integer date times.
>
> The current DB started as a pg_restore into 10.x. Then was pg_upgrade’d to
> 11.2. Has been minor upgraded a bunch of times since and we upgraded to
> 11.12… just before this happened.
>
> As in, we just restarted our cluster on 11.12. Everything was working fine
> (and the index was working) and then the INSERT happened.
>
> I have checksums on and I did a VACUUM on the table just before the
> REINDEX.
>
> I’m 99.9% confident the hardware isn’t bad.
>
> The only time we’ve seen this is with Unicode input.
>
> Regards,
> Omar
>
> On Sun, Jun 6, 2021 at 4:59 AM David Rowley  wrote:
>
>> On Sun, 6 Jun 2021 at 22:55, Omar Kilani  wrote:
>> > There seems to be a weird bug in Postgres (last tested 11.12) where it
>> > allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
>> > on a TEXT/VARCHAR when there's already a value present in that index,
>> > but only for UTF-8 input.
>>
>> It would be good to know a bit about the history of this instance.
>> Was the initdb done on 11.12? Or some other 11.x version?  Or was this
>> instance pg_upgraded from some previous major version?
>>
>> There was a bug fixed in 11.11 that caused CREATE INDEX CONCURRENTLY
>> possibly to miss rows that were inserted by a prepared transaction.
>> Was this index created with CREATE INDEX CONCURRENTLY?
>>
>> > What I sort of don't get is... before we insert anything into these
>> > tables, we always check to see if a value already exists. And Postgres
>> > must be returning no results for some reason. So it goes to insert a
>> > duplicate value which somehow succeeds despite the unique index, but
>> > then a reindex says it's a duplicate. Pretty weird.
>>
>> That does not seem that weird to me.  If the index is corrupt and
>> fails to find the record you're searching for using a scan of that
>> index, then it seems pretty likely that the record would also not be
>> found in the index when doing the INSERT.
>>
>> The reindex will catch the problem because it uses the heap as the
>> source of truth to build the new index.  It simply sounds like there
>> are two records in the heap because a subsequent one was added and a
>> corrupt index didn't find the original record either because it was
>> either missing from the index or because the index was corrupt in some
>> way that the record was just not found.
>>
>> David
>>
>


Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-06-06 Thread Tom Lane
Noah Misch  writes:
> equalfuncs.c has been using COMPARE_COERCIONFORM_FIELD() to ignore differences
> in fields of this type.  Does this spot have cause to depart from the pattern?

Oversight, I think.  Will fix.

regards, tom lane




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Tom Lane
Omar Kilani  writes:
> This is a very old database (2004) that has moved forward via pg_upgrade. I
> think we did a pg_dump and pg_restore every time we hit some major
> incompatibility like float vs integer date times.

If it's that old, it's likely also survived multiple OS upgrades.
It seems clear that this index has been corrupt for awhile, and
I'm wondering whether the corruption was brought on by an OS
locale change.  There's useful info at

https://wiki.postgresql.org/wiki/Locale_data_changes

regards, tom lane




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
I was able to restore a snapshot where the database was fully consistent.

2021-06-06 14:52:34.748 UTC [0/48529] LOG:  database system was
interrupted while in recovery at log time 2021-06-06 06:57:27 UTC
2021-06-06 14:52:34.748 UTC [0/48529] HINT:  If this has occurred more
than once some data might be corrupted and you might need to choose an
earlier recovery target.
2021-06-06 14:52:40.847 UTC [0/48529] LOG:  database system was not
properly shut down; automatic recovery in progress
2021-06-06 14:52:40.856 UTC [0/48529] LOG:  invalid record length at
8849/3298: wanted 24, got 0
2021-06-06 14:52:40.856 UTC [0/48529] LOG:  redo is not required
2021-06-06 14:52:40.865 UTC [0/48529] LOG:  checkpoint starting:
end-of-recovery immediate
2021-06-06 14:52:40.909 UTC [0/48529] LOG:  checkpoint complete: wrote
0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.006 s, sync=0.001 s, total=0.050 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
2021-06-06 14:52:40.964 UTC [0/48527] LOG:  database system is ready
to accept connections

I'm running pg_verify_checksums on the cluster, but the database is
many TB so it'll be a bit.

I missed one of your questions before -- no, it wasn't created with
CREATE INDEX CONCURRENTLY. That index was created by 11.2's pg_restore
roughly 2 years ago.

On Sun, Jun 6, 2021 at 6:53 AM Omar Kilani  wrote:
>
> I just remembered, I have… many… snapshots of the on disk data prior to 
> starting 11.12.
>
> It should be possible to start at a specific point in time with the index in 
> the state it was in prior to the insert.
>
> How do I prove or disprove… hardware issues?
>
> Also… I ran the select on 3 of our standby servers and they all had the same 
> issue. Presumably the index would be “corrupt” in the same way across 
> multiple very different machines from WAL apply?
>
> On Sun, Jun 6, 2021 at 6:41 AM Omar Kilani  wrote:
>>
>> Hey David,
>>
>> Hmmm… it wasn’t init on 11.x.
>>
>> This is a very old database (2004) that has moved forward via pg_upgrade. I 
>> think we did a pg_dump and pg_restore every time we hit some major 
>> incompatibility like float vs integer date times.
>>
>> The current DB started as a pg_restore into 10.x. Then was pg_upgrade’d to 
>> 11.2. Has been minor upgraded a bunch of times since and we upgraded to 
>> 11.12… just before this happened.
>>
>> As in, we just restarted our cluster on 11.12. Everything was working fine 
>> (and the index was working) and then the INSERT happened.
>>
>> I have checksums on and I did a VACUUM on the table just before the REINDEX.
>>
>> I’m 99.9% confident the hardware isn’t bad.
>>
>> The only time we’ve seen this is with Unicode input.
>>
>> Regards,
>> Omar
>>
>> On Sun, Jun 6, 2021 at 4:59 AM David Rowley  wrote:
>>>
>>> On Sun, 6 Jun 2021 at 22:55, Omar Kilani  wrote:
>>> > There seems to be a weird bug in Postgres (last tested 11.12) where it
>>> > allows an INSERT into a table with a UNIQUE / UNIQUE CONSTRAINT index
>>> > on a TEXT/VARCHAR when there's already a value present in that index,
>>> > but only for UTF-8 input.
>>>
>>> It would be good to know a bit about the history of this instance.
>>> Was the initdb done on 11.12? Or some other 11.x version?  Or was this
>>> instance pg_upgraded from some previous major version?
>>>
>>> There was a bug fixed in 11.11 that caused CREATE INDEX CONCURRENTLY
>>> possibly to miss rows that were inserted by a prepared transaction.
>>> Was this index created with CREATE INDEX CONCURRENTLY?
>>>
>>> > What I sort of don't get is... before we insert anything into these
>>> > tables, we always check to see if a value already exists. And Postgres
>>> > must be returning no results for some reason. So it goes to insert a
>>> > duplicate value which somehow succeeds despite the unique index, but
>>> > then a reindex says it's a duplicate. Pretty weird.
>>>
>>> That does not seem that weird to me.  If the index is corrupt and
>>> fails to find the record you're searching for using a scan of that
>>> index, then it seems pretty likely that the record would also not be
>>> found in the index when doing the INSERT.
>>>
>>> The reindex will catch the problem because it uses the heap as the
>>> source of truth to build the new index.  It simply sounds like there
>>> are two records in the heap because a subsequent one was added and a
>>> corrupt index didn't find the original record either because it was
>>> either missing from the index or because the index was corrupt in some
>>> way that the record was just not found.
>>>
>>> David




Re: Misplaced superuser check in pg_log_backend_memory_contexts()

2021-06-06 Thread Tom Lane
Julien Rouhaud  writes:
> On Sun, Jun 06, 2021 at 03:53:10PM +0900, Michael Paquier wrote:
>> It seems to me that we had better check for a superuser at the
>> beginning of the function, like in the attached.

> However +1 for the patch, as it seems more consistent to always get a
> permission failure if you're not a superuser.

Yeah, it's just weird if such a check is not the first thing
in the function.  Even if you can convince yourself that the
actions taken before that don't create any security issue today,
it's not hard to imagine that innocent future code rearrangements
could break that argument.  What's the value of postponing the
check anyway?

regards, tom lane




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
Hey Tom,

The database was pg_dump'ed out of 10.4 and pg_restore'd into 11.2 on
a RHEL 7.x machine.

The only other upgrade has been to RHEL 8.x. So the locale data change
might have changed something -- thanks for that information.

We've seen this issue on a different table before upgrading to RHEL
8.x, though. And only on that table, because it's user-generated and
gets a bunch of Unicode data input into a UNIQUE index.

I'm not saying the index isn't corrupt as in something's not wrong
with it. I'm saying that during normal Postgres operation the index
has somehow got itself into this state, and I'm fairly sure it's not
the hardware.

Thanks again.

Regards,
Omar

On Sun, Jun 6, 2021 at 8:08 AM Tom Lane  wrote:
>
> Omar Kilani  writes:
> > This is a very old database (2004) that has moved forward via pg_upgrade. I
> > think we did a pg_dump and pg_restore every time we hit some major
> > incompatibility like float vs integer date times.
>
> If it's that old, it's likely also survived multiple OS upgrades.
> It seems clear that this index has been corrupt for awhile, and
> I'm wondering whether the corruption was brought on by an OS
> locale change.  There's useful info at
>
> https://wiki.postgresql.org/wiki/Locale_data_changes
>
> regards, tom lane




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
Hmmm.

Is it possible that in some version of 11.x, the corrupt index stopped
"working"? As in, yes, it may have been corrupt but still returned
data on version 11.y, whereas on version 11.z it's no longer working
and returns nothing?

David mentions that change in 11.11...?

I guess I can try some older versions of 11.x on this cluster for
completeness' sake.

Regards,
Omar

On Sun, Jun 6, 2021 at 8:18 AM Omar Kilani  wrote:
>
> Hey Tom,
>
> The database was pg_dump'ed out of 10.4 and pg_restore'd into 11.2 on
> a RHEL 7.x machine.
>
> The only other upgrade has been to RHEL 8.x. So the locale data change
> might have changed something -- thanks for that information.
>
> We've seen this issue on a different table before upgrading to RHEL
> 8.x, though. And only on that table, because it's user-generated and
> gets a bunch of Unicode data input into a UNIQUE index.
>
> I'm not saying the index isn't corrupt as in something's not wrong
> with it. I'm saying that during normal Postgres operation the index
> has somehow got itself into this state, and I'm fairly sure it's not
> the hardware.
>
> Thanks again.
>
> Regards,
> Omar
>
> On Sun, Jun 6, 2021 at 8:08 AM Tom Lane  wrote:
> >
> > Omar Kilani  writes:
> > > This is a very old database (2004) that has moved forward via pg_upgrade. 
> > > I
> > > think we did a pg_dump and pg_restore every time we hit some major
> > > incompatibility like float vs integer date times.
> >
> > If it's that old, it's likely also survived multiple OS upgrades.
> > It seems clear that this index has been corrupt for awhile, and
> > I'm wondering whether the corruption was brought on by an OS
> > locale change.  There's useful info at
> >
> > https://wiki.postgresql.org/wiki/Locale_data_changes
> >
> > regards, tom lane




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Magnus Hagander
On Sun, Jun 6, 2021 at 5:19 PM Omar Kilani  wrote:
>
> Hey Tom,
>
> The database was pg_dump'ed out of 10.4 and pg_restore'd into 11.2 on
> a RHEL 7.x machine.
>
> The only other upgrade has been to RHEL 8.x. So the locale data change
> might have changed something -- thanks for that information.

There is no might -- if you upgraded from RHEL 7 to RHEL 8 without
doing a reindex or a dump/reload there, you are pretty much guaranteed
to have corrupt text indexes from that. Regardless of PostgreSQL
versions, this was about the RHEL upgrade not the Postgres one.
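
On a PostgreSQL 11 cluster the blunt but straightforward fix after such an OS
upgrade is to rebuild every index, database by database (a sketch only; "mydb"
is a placeholder, and plain REINDEX takes heavy locks, so schedule it
accordingly):

REINDEX DATABASE mydb;  -- must be run while connected to that database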



> We've seen this issue on a different table before upgrading to RHEL
> 8.x, though. And only on that table, because it's user-generated and
> gets a bunch of Unicode data input into a UNIQUE index.

This indicates you may have more than one problem.

But that doesn't mean it's not both, sadly.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
Hey Magnus,

Hmmm, okay -- that's unfortunate. :)

I apparently wrote a script in 2017 to find duplicates from this issue
on the other table and fix them up. Maybe a similar locale thing
happened back then?

Anyway, for what it's worth:

Checksum scan completed
Data checksum version: 1
Files scanned:  7068
Blocks scanned: 524565247
Bad checksums:  0

Regards,
Omar

On Sun, Jun 6, 2021 at 9:15 AM Magnus Hagander  wrote:
>
> On Sun, Jun 6, 2021 at 5:19 PM Omar Kilani  wrote:
> >
> > Hey Tom,
> >
> > The database was pg_dump'ed out of 10.4 and pg_restore'd into 11.2 on
> > a RHEL 7.x machine.
> >
> > The only other upgrade has been to RHEL 8.x. So the locale data change
> > might have changed something -- thanks for that information.
>
> There is no might -- if you upgraded from RHEL 7 to RHEL 8 without
> doing a reindex or a dump/reload there, you are pretty much guaranteed
> to have corrupt text indexes from that. Regardless of PostgreSQL
> versions, this was about the RHEL upgrade not the Postgres one.
>
>
>
> > We've seen this issue on a different table before upgrading to RHEL
> > 8.x, though. And only on that table, because it's user-generated and
> > gets a bunch of Unicode data input into a UNIQUE index.
>
> This indicates you may have more than one problem.
>
> But that doesn't mean it's not both, sadly.
>
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/




pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Justin Pryzby
An internal instance was rejecting connections with "too many clients".
I found a bunch of processes waiting on a futex and I was going to upgrade the
kernel (3.10.0-514) and dismiss the issue.

However, I also found an autovacuum chewing 100% CPU, and it appears the
problem is actually that autovacuum has locked a page of pg_statistic, and
every other process then gets stuck waiting in the planner.  I checked a few
and found these:

#13 0x00961908 in SearchSysCache3 (cacheId=cacheId@entry=59, 
key1=key1@entry=2610, key2=key2@entry=2, key3=key3@entry=0) at syscache.c:1156

As for the autovacuum:

$ ps -wwf 18950
UID        PID  PPID  C STIME TTY  STAT   TIME CMD
postgres 18950  7179 93 Jun04 ?    ts   2049:20 postgres: autovacuum worker ts

(gdb)
#0  0x004f995c in heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7ffe7a0cd0c0, tup=tup@entry=0x7ffe7a0cce50, buffer=buffer@entry=14138) at pruneheap.c:423
#1  0x004fa154 in heap_prune_chain (prstate=0x7ffe7a0cd0c0, rootoffnum=11, buffer=14138) at pruneheap.c:644
#2  heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x1d1b3fc) at pruneheap.c:278
#3  0x004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 "G\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7ffe7a0ced80) at vacuumlazy.c:1712
#4  0x00500263 in lazy_scan_heap (aggressive=<optimized out>, params=0x1c77b7c, vacrel=<optimized out>) at vacuumlazy.c:1347
#5  heap_vacuum_rel (rel=0x7f0349466d28, params=0x1c77b7c, bstrategy=<optimized out>) at vacuumlazy.c:612
#6  0x0067418a in table_relation_vacuum (bstrategy=<optimized out>, params=0x1c77b7c, rel=0x7f0349466d28) at ../../../src/include/access/tableam.h:1678
#7  vacuum_rel (relid=2619, relation=<optimized out>, params=params@entry=0x1c77b7c) at vacuum.c:2001
#8  0x0067556e in vacuum (relations=0x1cc5008, params=params@entry=0x1c77b7c, bstrategy=<optimized out>, bstrategy@entry=0x1c77400, isTopLevel=isTopLevel@entry=true) at vacuum.c:461
#9  0x00783c13 in autovacuum_do_vac_analyze (bstrategy=0x1c77400, tab=0x1c77b78) at autovacuum.c:3284
#10 do_autovacuum () at autovacuum.c:2537
#11 0x00784073 in AutoVacWorkerMain (argv=0x0, argc=0) at autovacuum.c:1715
#12 0x007841c9 in StartAutoVacWorker () at autovacuum.c:1500
#13 0x00792b9c in StartAutovacuumWorker () at postmaster.c:5547
#14 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5251
#15 <signal handler called>
#16 0x7f0346c56783 in __select_nocancel () from /lib64/libc.so.6
#17 0x0048ee7d in ServerLoop () at postmaster.c:1709
#18 0x00793e98 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1bed280) at postmaster.c:1417
#19 0x00491272 in main (argc=3, argv=0x1bed280) at main.c:209

heap_page_prune() is being called repeatedly, with (I think) the same arguments.

(gdb) c
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225 in pruneheap.c
(gdb) 
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225 in pruneheap.c
(gdb) 
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225 in pruneheap.c
(gdb) 
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225 in pruneheap.c

(gdb) p *vacrel
$3 = {rel = 0x7f0349466d28, indrels = 0x1d1b500, nindexes = 1, 
do_index_vacuuming = true, do_index_cleanup = true, do_failsafe = false, 
bstrategy = 0x1c77400, lps = 0x0, old_rel_pages = 80, 
  old_live_tuples = 1101, relfrozenxid = 909081649, relminmxid = 53341561, 
OldestXmin = 913730329, FreezeLimit = 863730329, MultiXactCutoff = 48553302, 
relnamespace = 0x1d1b520 "pg_catalog", 
  relname = 0x1d1b548 "pg_statistic", indname = 0x0, blkno = 75, offnum = 15, 
phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples = 0x1ccef10, rel_pages = 85, 

Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Chapman Flack
On 06/06/21 11:08, Omar Kilani wrote:
> I'm running pg_verify_checksums on the cluster, but the database is
> many TB so it'll be a bit.

Index corruption because of a locale change would not be the sort of thing
checksums would detect. Entries would be put into the index in the correct
order according to the old collation. The same entries can be still there,
intact, just fine according to the checksums, only the new collation would
have put them in a different order. Index search algorithms that are fast,
because they assume the entries to be correctly ordered, will skip regions
of the index where the desired key "couldn't possibly be", and if that's
where the old ordering put it, it won't be found.
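
contrib/amcheck checks exactly those ordering invariants, so it can confirm
this kind of corruption without rebuilding anything; a sketch, with a
placeholder index name:

CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check('users_unique_col_idx'::regclass);
-- on 11 and later, this also verifies every heap tuple has an index entry:
SELECT bt_index_check('users_unique_col_idx'::regclass, heapallindexed => true);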

Regards,
-Chap




Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-06-06 Thread Tom Lane
I wrote:
> Noah Misch  writes:
>> equalfuncs.c has been using COMPARE_COERCIONFORM_FIELD() to ignore 
>> differences
>> in fields of this type.  Does this spot have cause to depart from the 
>> pattern?

> Oversight, I think.  Will fix.

After looking closer, I see that there are a couple of very very minor
ways in which parse analysis changes behavior based on the value of
FuncCall.funcformat:

* transformRangeFunction won't apply the appropriate transformation to
a multiple-argument unnest() unless the format is COERCE_EXPLICIT_CALL.
(This is likely a no-op, though, as no grammar production that emits
COERCE_SQL_SYNTAX could apply to the function name "unnest".)

* ParseFuncOrColumn will not believe that a FuncCall could_be_projection
unless the format is COERCE_EXPLICIT_CALL.  This is next door to a no-op,
since other restrictions such as nargs == 1 would usually suffice to
reject COERCE_SQL_SYNTAX calls, but maybe there are corner cases where
it'd matter.

So if you wanted to be picky you could claim that within FuncCall,
funcformat is semantically significant and thus that equalfuncs.c is
coded correctly.  Nonetheless I'm inclined to think that it'd be better
to use COMPARE_COERCIONFORM_FIELD here.  I'm quite sure I didn't make
the above analysis when I wrote the code; using COMPARE_SCALAR_FIELD
was just reflex.

We could make use of COMPARE_COERCIONFORM_FIELD 100% correct by removing
these two tests of the funcformat value, but on the whole I doubt that
would be better.

BTW, I'm not sure any of this matters anyway; do we ever use equal()
on raw parse trees, except for debug purposes?

Thoughts?

regards, tom lane




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
Hey Chap,

Yeah, I understand. Just ruling out the bad hardware scenario.

Plus the next person to Google this will hopefully stumble upon this
thread. :)

Regards,
Omar

On Sun, Jun 6, 2021 at 9:36 AM Chapman Flack  wrote:

> On 06/06/21 11:08, Omar Kilani wrote:
> > I'm running pg_verify_checksums on the cluster, but the database is
> > many TB so it'll be a bit.
>
> Index corruption because of a locale change would not be the sort of thing
> checksums would detect. Entries would be put into the index in the correct
> order according to the old collation. The same entries can be still there,
> intact, just fine according to the checksums, only the new collation would
> have put them in a different order. Index search algorithms that are fast,
> because they assume the entries to be correctly ordered, will skip regions
> of the index where the desired key "couldn't possibly be", and if that's
> where the old ordering put it, it won't be found.
>
> Regards,
> -Chap
>


Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Matthias van de Meent
On Sun, 6 Jun 2021 at 18:35, Justin Pryzby  wrote:
>
> An internal instance was rejecting connections with "too many clients".
> I found a bunch of processes waiting on a futex and I was going to upgrade the
> kernel (3.10.0-514) and dismiss the issue.
>
> However, I also found an autovacuum chewing 100% CPU, and it appears the
> problem is actually because autovacuum has locked a page of pg-statistic, and
> every other process then gets stuck waiting in the planner.  I checked a few
> and found these:

My suspicion is that for some tuple on that page
HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
thinks should have been cleaned up by heap_page_prune, but isn't. This
would result in an infinite loop in lazy_scan_prune where the
condition on vacuumlazy.c:1800 will always be true, but the retry will
not do the job it's expected to do.

Apart from reporting this suspicion, I sadly can't help you much
further, as my knowledge and experience on vacuum and snapshot
horizons is only limited and probably won't help you in this.

I think it would be helpful for further debugging if we had the state of
all the tuples on that page (well, the tuple headers with their
transaction ids and their line pointers), as that would help determine
whether my suspicion is correct.
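
For the record, something like this with pageinspect should show that state
(block 75 of pg_statistic being the block reported in the backtrace):

CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp, lp_flags, t_xmin, t_xmax, t_ctid, t_infomask, t_infomask2
FROM heap_page_items(get_raw_page('pg_statistic', 75));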


With regards,

Matthias van de Meent




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Tom Lane
Matthias van de Meent  writes:
> On Sun, 6 Jun 2021 at 18:35, Justin Pryzby  wrote:
>> However, I also found an autovacuum chewing 100% CPU, and it appears the
>> problem is actually because autovacuum has locked a page of pg-statistic, and
>> every other process then gets stuck waiting in the planner.  I checked a few
>> and found these:

> My suspicion is that for some tuple on that page
> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> thinks should have been cleaned up by heap_page_prune, but isn't. This
> would result in an infinite loop in lazy_scan_prune where the
> condition on vacuumlazy.c:1800 will always be true, but the retry will
> not do the job it's expected to do.

Since Justin's got a debugger on the process already, it probably
wouldn't be too hard to confirm or disprove that theory by stepping
through the code.

regards, tom lane




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Peter Geoghegan
On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby  wrote:
> I'll leave the instance running for a little bit before restarting (or kill-9)
> in case someone requests more info.

How about dumping the page image out, and sharing it with the list?
This procedure should work fine from gdb:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB

I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
imagine that you have the instance already stuck in an infinite loop,
so what we'll probably see from the page image is the page after the
first prune and another no-progress prune.

-- 
Peter Geoghegan




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Andres Freund
Hi, 

On Sun, Jun 6, 2021, at 10:59, Tom Lane wrote:
> Matthias van de Meent  writes:
> > On Sun, 6 Jun 2021 at 18:35, Justin Pryzby  wrote:
> >> However, I also found an autovacuum chewing 100% CPU, and it appears the
> >> problem is actually because autovacuum has locked a page of pg-statistic, 
> >> and
> >> every other process then gets stuck waiting in the planner.  I checked a 
> >> few
> >> and found these:
> 
> > My suspicion is that for some tuple on that page
> > HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> > thinks should have been cleaned up by heap_page_prune, but isn't. This
> > would result in an infinite loop in lazy_scan_prune where the
> > condition on vacuumlazy.c:1800 will always be true, but the retry will
> > not do the job it's expected to do.
> 
> Since Justin's got a debugger on the process already, it probably
> wouldn't be too hard to confirm or disprove that theory by stepping
> through the code.

If that turns out to be the issue, it'd be good to check what prevents the 
tuple from being considered fully dead, by stepping through the visibility 
test...

Andres




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Justin Pryzby
On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby  wrote:
> > I'll leave the instance running for a little bit before restarting (or 
> > kill-9)
> > in case someone requests more info.
> 
> How about dumping the page image out, and sharing it with the list?
> This procedure should work fine from gdb:

Sorry, but I already killed the process to try to follow Matthias' suggestion.
I have a core file from "gcore" but it looks like it's incomplete and the
address is now "out of bounds"...

#2  0x004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>,

I saved a copy of the datadir, but a manual "vacuum" doesn't trigger the
problem.  So if Matthias' theory is right, it seems like there may be a race
condition.  Maybe that goes without saying.

-- 
Justin




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Peter Geoghegan
On Sun, Jun 6, 2021 at 11:43 AM Justin Pryzby  wrote:
> Sorry, but I already killed the process to try to follow Matthias' suggestion.
> I have a core file from "gcore" but it looks like it's incomplete and the
> address is now "out of bounds"...

Based on what you said about ending up back in lazy_scan_prune()
alone, I think he's right. That is, I agree that it's very likely that
the stuck VACUUM would not have become stuck had the "goto retry on
HEAPTUPLE_DEAD inside lazy_scan_prune" thing not been added by commit
8523492d4e3. But that in itself doesn't necessarily implicate commit
8523492d4e3.

The interesting question is: Why doesn't heap_page_prune() ever agree
with HeapTupleSatisfiesVacuum() calls made from lazy_scan_prune(), no
matter how many times the call to heap_page_prune() is repeated? (It's
repeated to try to resolve the disagreement that aborted xacts can
sometimes cause.)

If I had to guess I'd say that the underlying problem has something to
do with heap_prune_satisfies_vacuum() not agreeing with
HeapTupleSatisfiesVacuum(), perhaps only when GlobalVisCatalogRels is
used. But that's a pretty wild guess at this point.

-- 
Peter Geoghegan




Re: Since '2001-09-09 01:46:40'::timestamp microseconds are lost when extracting epoch

2021-06-06 Thread Tom Lane
I wrote:
> We could make use of COMPARE_COERCIONFORM_FIELD 100% correct by removing
> these two tests of the funcformat value, but on the whole I doubt that
> would be better.

On still closer inspection, that seems like it'd be fine.  All of
the gram.y productions that emit COERCE_SQL_SYNTAX also produce
schema-qualified function names (via SystemFuncName); and it seems
hard to see a use-case where we'd not do that.  This makes the two
checks I cited 100% redundant, because the conditions they are in
also insist on an unqualified function name.  So let's just take them
out again, making it strictly OK to use COMPARE_COERCIONFORM_FIELD.

regards, tom lane




Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tomas Vondra


On 6/6/21 7:37 AM, Noah Misch wrote:
> On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote:
>> OK, pushed after a bit more polishing and testing.
> 
> This added a "transformed" field to CreateStatsStmt, but it didn't mention
> that field in src/backend/nodes.  Should those functions handle the field?
> 

Yup, that's a mistake - it should do whatever CREATE INDEX is doing. Not
sure if it can result in error/failure or just inefficiency (due to
transforming the expressions repeatedly), but it should do whatever
CREATE INDEX is doing.

Thanks for noticing! Fixed by d57ecebd12.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tom Lane
Tomas Vondra  writes:
> On 6/6/21 7:37 AM, Noah Misch wrote:
>> This added a "transformed" field to CreateStatsStmt, but it didn't mention
>> that field in src/backend/nodes.  Should those functions handle the field?

> Yup, that's a mistake - it should do whatever CREATE INDEX is doing. Not
> sure if it can result in error/failure or just inefficiency (due to
> transforming the expressions repeatedly), but it should do whatever
> CREATE INDEX is doing.

I'm curious about how come the buildfarm didn't notice this.  The
animals using COPY_PARSE_PLAN_TREES should have failed.  The fact
that they didn't implies that there's no test case that makes use
of a nonzero value for this field, which seems like a testing gap.

regards, tom lane




Re: speed up verifying UTF-8

2021-06-06 Thread John Naylor
On Thu, Jun 3, 2021 at 3:22 PM Heikki Linnakangas  wrote:
>
> On 03/06/2021 22:16, Heikki Linnakangas wrote:
> > On 03/06/2021 22:10, John Naylor wrote:
> >> On Thu, Jun 3, 2021 at 3:08 PM Heikki Linnakangas wrote:
> >>   > x1 = half1 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
> >>   > x2 = half2 + UINT64CONST(0x7f7f7f7f7f7f7f7f);
> >>   >
> >>   > /* then check that the high bit is set in each byte. */
> >>   > x = (x1 | x2);
> >>   > x &= UINT64CONST(0x8080808080808080);
> >>   > if (x != UINT64CONST(0x8080808080808080))
> >>   > return 0;

> If you replace (x1 | x2) with (x1 & x2) above, I think it's correct.

After looking at it again with fresh eyes, I agree this is correct. I
modified the regression tests to pad the input bytes with ascii so that the
code path that works on 16-bytes at a time is tested. I use both UTF-8
input tables for some of the additional tests. There is a de facto
requirement that the descriptions are unique across both of the input
tables. That could be done more elegantly, but I wanted to keep things
simple for now.

v11-0001 is an improvement over v10:

clang 12.0.5 / MacOS:

master:

 chinese | mixed | ascii
---------+-------+-------
     975 |   686 |   369

v10-0001:

 chinese | mixed | ascii
---------+-------+-------
     930 |   549 |   109

v11-0001:

 chinese | mixed | ascii
---------+-------+-------
     687 |   440 |    64


gcc 4.8.5 / Linux (older machine)

master:

 chinese | mixed | ascii
---------+-------+-------
    2559 |  1495 |   825

v10-0001:

 chinese | mixed | ascii
---------+-------+-------
    2966 |  1034 |   156

v11-0001:

 chinese | mixed | ascii
---------+-------+-------
    2242 |   824 |   140

Previous testing on POWER8 and Arm64 leads me to expect similar results
there as well.

I also looked again at 0002 and decided I wasn't quite happy with the test
coverage. Previously, the code padded out a short input with ascii so that
the 16-bytes-at-a-time code path was always exercised. However, that
required some finicky complexity and still wasn't adequate. For v11, I
ripped that out and put the responsibility on the regression tests to make
sure the various code paths are exercised.

--
John Naylor
EDB: http://www.enterprisedb.com


v11-0001-Rewrite-pg_utf8_verifystr-for-speed.patch
Description: Binary data


v11-0002-Use-SSE-instructions-for-pg_utf8_verifystr-where.patch
Description: Binary data


Re: list of extended statistics on psql (\dX)

2021-06-06 Thread Tomas Vondra
Hi,

Here's a slightly more complete patch, tweaking the regression tests a
bit to detect this.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 2abf255798..eba659705e 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4774,7 +4774,7 @@ listExtendedStats(const char *pattern)
 	processSQLNamePattern(pset.db, &buf, pattern,
 		  false, false,
 		  "es.stxnamespace::pg_catalog.regnamespace::text", "es.stxname",
-		  NULL, NULL);
+		  NULL, "pg_catalog.pg_statistics_obj_is_visible(es.oid)");
 
 	appendPQExpBufferStr(&buf, "ORDER BY 1, 2;");
 
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 8c214d8dfc..fa9fa9c8f0 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -2987,6 +2987,7 @@ create statistics stts_s1.stts_foo on col1, col2 from stts_t3;
 create statistics stts_s2.stts_yama (dependencies, mcv) on col1, col3 from stts_t3;
 insert into stts_t1 select i,i from generate_series(1,100) i;
 analyze stts_t1;
+set search_path to public, stts_s1, stts_s2, tststats;
 \dX
List of extended statistics
   Schema  |  Name  |   Definition   | Ndistinct | Dependencies |   MCV   
@@ -3002,7 +3003,7 @@ analyze stts_t1;
  public   | stts_hoge  | col1, col2, col3 FROM stts_t3  | defined   | defined  | defined
  stts_s1  | stts_foo   | col1, col2 FROM stts_t3| defined   | defined  | defined
  stts_s2  | stts_yama  | col1, col3 FROM stts_t3|   | defined  | defined
- tststats | priv_test_stats| a, b FROM tststats.priv_test_tbl   |   |  | defined
+ tststats | priv_test_stats| a, b FROM priv_test_tbl|   |  | defined
 (12 rows)
 
 \dX stts_?
@@ -3037,7 +3038,7 @@ analyze stts_t1;
  public   | stts_hoge  | col1, col2, col3 FROM stts_t3  | defined   | defined  | defined
  stts_s1  | stts_foo   | col1, col2 FROM stts_t3| defined   | defined  | defined
  stts_s2  | stts_yama  | col1, col3 FROM stts_t3|   | defined  | defined
- tststats | priv_test_stats| a, b FROM tststats.priv_test_tbl   |   |  | defined
+ tststats | priv_test_stats| a, b FROM priv_test_tbl|   |  | defined
 (12 rows)
 
 \dX+ stts_?
@@ -3064,30 +3065,45 @@ analyze stts_t1;
  stts_s2 | stts_yama | col1, col3 FROM stts_t3 |   | defined  | defined
 (1 row)
 
+set search_path to public, stts_s1;
+\dX
+  List of extended statistics
+ Schema  |  Name  |   Definition   | Ndistinct | Dependencies |   MCV   
+-+++---+--+-
+ public  | func_deps_stat | ((a * 2)), upper(b), ((c + (1)::numeric)) FROM functional_dependencies |   | defined  | 
+ public  | mcv_lists_arrays_stats | a, b, c FROM mcv_lists_arrays  |   |  | defined
+ public  | mcv_lists_bool_stats   | a, b, c FROM mcv_lists_bool|   |  | defined
+ public  | mcv_lists_stats| a, b, d FROM mcv_lists |   |  | defined
+ public  | stts_1 | a, b FROM stts_t1  | defined   |  | 
+ public  | stts_2 | a, b FROM stts_t1  | defined   | defined  | 
+ public  | stts_3 | a, b FROM stts_t1  | defined   | defined  | defined
+ public  | stts_4 | b, c FROM stts_t2  | defined   | defined  | defined
+ public  | stts_hoge  | col1, col2, col3 FROM stts_t3  | defined   | defined  | defined
+ stts_s1 | stts_foo   | col1, col2 FROM stts_t3 

Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tomas Vondra



On 6/6/21 9:17 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 6/6/21 7:37 AM, Noah Misch wrote:
>>> This added a "transformed" field to CreateStatsStmt, but it didn't mention
>>> that field in src/backend/nodes.  Should those functions handle the field?
> 
>> Yup, that's a mistake - it should do whatever CREATE INDEX is doing. Not
>> sure if it can result in error/failure or just inefficiency (due to
>> transforming the expressions repeatedly), but it should do whatever
>> CREATE INDEX is doing.
> 
> I'm curious about how come the buildfarm didn't notice this.  The
> animals using COPY_PARSE_PLAN_TREES should have failed.  The fact
> that they didn't implies that there's no test case that makes use
> of a nonzero value for this field, which seems like a testing gap.
> 

AFAICS the reason is pretty simple - the COPY_PARSE_PLAN_TREES checks
look like this:

	List	   *new_list = copyObject(raw_parsetree_list);

	/* This checks both copyObject() and the equal() routines... */
	if (!equal(new_list, raw_parsetree_list))
		elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
	else
		raw_parsetree_list = new_list;
}

But if the field is missing from all the functions, equal() can't detect
that copyObject() did not actually copy it. It'd detect a case when the
field was added just to one place, but not this. The CREATE INDEX (which
served as an example for CREATE STATISTICS) has exactly the same issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

2021-06-06 Thread Justin Pryzby
On Sun, Jun 06, 2021 at 07:26:22PM +0200, Matthias van de Meent wrote:
> I think it would be helpful for further debugging if we would have the
> state of the all tuples on that page (well, the tuple headers with
> their transactionids and their line pointers), as that would help with
> determining if my suspicion could be correct.

This is the state of the page after I killed the cluster and started a
postmaster on a copy of its datadir, with autovacuum=off:

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid,
       t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(get_raw_page('pg_statistic', 75));

 lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff |  t_bits  | t_oid
----+--------+----------+--------+-----------+-----------+----------+---------+-------------+------------+--------+----------+-------
  1 |      0 |        3 |      0 |           |           |          |         |             |            |        |          |
  2 |      0 |        3 |      0 |           |           |          |         |             |            |        |          |
  3 |      0 |        3 |      0 |           |           |          |         |             |            |        |          |
  4 |   7624 |        1 |    564 | 913726913 | 913730328 |        0 | (83,19) |          31 |       8451 |     32 | 10100010 |
  5 |   6432 |        1 |   1188 | 913726913 | 913730328 |        0 | (83,20) |          31 |       8451 |     32 | 11010011 |
  6 |   6232 |        1 |    195 | 913726913 | 913730328 |        0 | (83,21) |          31 |       8451 |     32 | 11100010 |
  7 |   6032 |        1 |    195 | 913726913 | 913730328 |        0 | (83,22) |          31 |       8451 |     32 | 11100010 |
  8 |   5848 |        1 |    181 | 913726913 | 913730328 |        0 | (83,23) |          31 |       8451 |     32 | 11100010 |
  9 |   5664 |        1 |    181 | 913726913 | 913730328 |        0 | (81,13) |          31 |       8451 |     32 | 11100010 |
 10 |   5488 |        1 |    176 | 913726913 | 913730328 |        0 | (81,14) |          31 |       8451 |     32 | 11100010 |
 11 |   5312 |        1 |    176 | 913726913 | 913730328 |        0 | (82,13) |          31 |       8451 |     32 | 11100010 |
 12 |   5128 |        1 |    181 | 913726913 | 913730328 |        0 | (79,57) |          31 |       8451 |     32 | 11100010 |
 13 |   4952 |        1 |    176 | 913726913 | 913730328 |        0 | (80,25) |          31 |       8451 |     32 | 11100010 |
 14 |   4776 |        1 |    176 | 913726913 | 913730328 |        0 | (80,26) |          31 |       8451 |     32 | 11100010 |
 15 |   4600 |        1 |    176 | 913726913 | 913730328 |        0 | (84,1)  |          31 |       8451 |     32 | 11100010 |
 16 |   4424 |        1 |    176 | 913726913 | 913730328 |        0 | (84,2)  |          31 |       8451 |     32 | 11100010 |
 17 |   4248 |        1 |    176 | 913726913 | 913730328 |        0 | (84,3)  |          31 |       8451 |     32 | 11100010 |
 18 |   2880 |        1 |   1364 | 913726913 | 913730328 |        0 | (84,4)  |          31 |       8451 |     32 | 11010011 |
 19 |   2696 |        1 |    179 | 913726914 |         0 |        0 | (75,19) |          31 |      10499 |     32 | 11100010 |
 20 |   2520 |        1 |    176 | 913726914 |         0 |        0 | (75,20) |          31 |      10499 |     32 | 11100010 |
 21 |   2336 |        1 |    179 | 913726914 |         0 |        0 | (75,21) |          31 |      10499 |     32 | 11100010 |
(21 rows)

(In the interest of full disclosure, this was a dumb cp -a of the live datadir
where the processes had been stuck for a day, and where I was unable to open a
client session).

8451 = HEAP_KEYS_UPDATED + 259 atts?
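
As an aside, on 13 and later pageinspect can decode those infomask bits
directly instead of working them out by hand:

SELECT lp, t_infomask, t_infomask2,
       (heap_tuple_infomask_flags(t_infomask, t_infomask2)).raw_flags
FROM heap_page_items(get_raw_page('pg_statistic', 75))
WHERE t_infomask IS NOT NULL;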

Note that after I vacuum pg_statistic, it looks like this:

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, 
t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM 
heap_page_items(get_raw_page('pg_statistic', 75));
 lp | lp_off | lp_flags | lp_len |  t_xmin   | t_xmax | t_field3 | t_ctid  | 
t_infomask2 | t_infomask | t_hoff |  t_bits  | t_oid 
++--++---++--+-+-+--

Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tom Lane
Tomas Vondra  writes:
> On 6/6/21 9:17 PM, Tom Lane wrote:
>> I'm curious about how come the buildfarm didn't notice this.  The
>> animals using COPY_PARSE_PLAN_TREES should have failed.  The fact
>> that they didn't implies that there's no test case that makes use
>> of a nonzero value for this field, which seems like a testing gap.

> AFAICS the reason is pretty simple - the COPY_PARSE_PLAN_TREES checks
> look like this:
> ...
> But if the field is missing from all the functions, equal() can't detect
> that copyObject() did not actually copy it.

Right, that code would only detect a missing copyfuncs.c line if
equalfuncs.c did have the line, which isn't all that likely.  However,
we then pass the copied node on to further processing, which in principle
should result in visible failures when copyfuncs.c is missing a line.

I think the reason it didn't is that the transformed field would always
be zero (false) in grammar output.  We could only detect a problem if
we copied already-transformed nodes and then used them further.  Even
then it *might* not fail, because the consequence would likely be an
extra round of parse analysis on the expressions, which is likely to
be a no-op.

Not sure if there's a good way to improve that.  I hope sometime soon
we'll be able to auto-generate these functions, and then the risk of
this sort of mistake will go away (he says optimistically).

regards, tom lane




Re: installcheck failure in indirect_toast with default_toast_compression = lz4

2021-06-06 Thread Justin Pryzby
On Sat, Jun 05, 2021 at 09:20:43AM +0900, Michael Paquier wrote:
> As said in $subject, installcheck fails once I set up a server with
> default_toast_compression = lz4 in the test indirect_toast.  Please
> see the attached for the diffs.
> 
> The issue is that the ordering of the tuples returned by UPDATE
> RETURNING is not completely stable.  Perhaps we should just enforce
> the order of those tuples by wrapping the DMLs into a CTE and use an
> ORDER BY in the outer query.

See also a prior discussion:
https://www.postgresql.org/message-id/CAFiTN-sm8Dpx3q92g5ohTdZu1_wKsw96-KiEMf3SoK8DhRPfWw%40mail.gmail.com
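
For illustration, a minimal sketch of the kind of rewrite being suggested (the
table and column names here are made up, not necessarily the ones used by the
indirect_toast test):

-- unstable: the row order of UPDATE ... RETURNING is not guaranteed
UPDATE toasttest SET f1 = f1 || f1 RETURNING substr(f2, 1, 10);

-- stable: run the DML inside a CTE and sort in the outer query
WITH upd AS (
    UPDATE toasttest SET f1 = f1 || f1
    RETURNING substr(f2, 1, 10) AS f2_prefix
)
SELECT f2_prefix FROM upd ORDER BY 1;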

-- 
Justin




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Justin Pryzby
On Sun, Jun 06, 2021 at 03:54:48AM -0700, Omar Kilani wrote:
> What I sort of don't get is... before we insert anything into these
> tables, we always check to see if a value already exists. And Postgres
> must be returning no results for some reason. So it goes to insert a
> duplicate value which somehow succeeds despite the unique index, but
> then a reindex says it's a duplicate. Pretty weird.

In addition to the other issues, this is racy.

You 1) check if a key exists, and if not then 2) INSERT (or maybe you UPDATE if
it did exist).

https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use

Maybe you'll say that "this process only runs once", but it's not hard to
imagine that might be violated.  For example, if you restart a multi-threaded
process, does the parent make sure that the child processes die before itself
dying?  Do you create a pidfile, and do you make sure the children are dead
before removing the pidfile ?

The right way to do this since v9.5 is INSERT ON CONFLICT, which is also more
efficient in a couple of ways.
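
For example, a minimal sketch of the race-free pattern (the table, columns and
unique constraint here are hypothetical):

-- assumes a unique index/constraint on accounts(email)
INSERT INTO accounts (email, name)
VALUES ('user@example.com', 'Some User')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- or, if an existing row should simply be left alone:
INSERT INTO accounts (email, name)
VALUES ('user@example.com', 'Some User')
ON CONFLICT (email) DO NOTHING;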

-- 
Justin




Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
We do use ON CONFLICT… it doesn’t work because the index is both “good” and
“bad” at the same time.

On Sun, Jun 6, 2021 at 2:03 PM Justin Pryzby  wrote:

> On Sun, Jun 06, 2021 at 03:54:48AM -0700, Omar Kilani wrote:
> > What I sort of don't get is... before we insert anything into these
> > tables, we always check to see if a value already exists. And Postgres
> > must be returning no results for some reason. So it goes to insert a
> > duplicate value which somehow succeeds despite the unique index, but
> > then a reindex says it's a duplicate. Pretty weird.
>
> In addition to the other issues, this is racy.
>
> You 1) check if a key exists, and if not then 2) INSERT (or maybe you
> UPDATE if
> it did exist).
>
> https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
>
> Maybe you'll say that "this process only runs once", but it's not hard to
> imagine that might be violated.  For example, if you restart a
> multi-threaded
> process, does the parent make sure that the child processes die before
> itself
> dying?  Do you create a pidfile, and do you make sure the children are dead
> before removing the pidfile ?
>
> The right way to do this since v9.5 is INSERT ON CONFLICT, which is also
> more
> efficient in a couple ways.
>
> --
> Justin
>


Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Omar Kilani
I mean, maybe it's because I've been awake since... 7am yesterday, but
it seems to me that if Postgres fails catastrophically silently (and I
would say "it looks like all your data in this table disappeared
because of some arcane locale / btree issue that no one except Tom
Lane even knows exists" -- see the replies about hardware issues and
ON CONFLICT as an example) -- then maybe that is... not good, and
Postgres shouldn't do that?

Not only that, it's only indices which have non-ASCII or whatever in
them that silently fail, so it's like 95% of your indices work just
fine, but the ones that don't... look fine. They're not corrupt on
disk, they have their full size, etc.

How is anyone supposed to know about this issue? I've been using
Postgres since 1999, built the Postgres website, worked with Neil and
Gavin on Postgres, submitted patches to Postgres and various
Postgres-related projects, and this is the first time I've become
aware of it. I mean, maybe I'm dumb, and... fine. But your average
user is going to have no idea about this.

Why can't some "locale signature" or something be encoded into the
index so Postgres can at least warn you? Or not use the messed up
index altogether instead of silently returning no data?

On Sun, Jun 6, 2021 at 2:06 PM Omar Kilani  wrote:
>
> We do use ON CONFLICT… it doesn’t work because the index is both “good” and 
> “bad” at the same time.
>
> On Sun, Jun 6, 2021 at 2:03 PM Justin Pryzby  wrote:
>>
>> On Sun, Jun 06, 2021 at 03:54:48AM -0700, Omar Kilani wrote:
>> > What I sort of don't get is... before we insert anything into these
>> > tables, we always check to see if a value already exists. And Postgres
>> > must be returning no results for some reason. So it goes to insert a
>> > duplicate value which somehow succeeds despite the unique index, but
>> > then a reindex says it's a duplicate. Pretty weird.
>>
>> In addition to the other issues, this is racy.
>>
>> You 1) check if a key exists, and if not then 2) INSERT (or maybe you UPDATE 
>> if
>> it did exist).
>>
>> https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
>>
>> Maybe you'll say that "this process only runs once", but it's not hard to
>> imagine that might be violated.  For example, if you restart a multi-threaded
>> process, does the parent make sure that the child processes die before itself
>> dying?  Do you create a pidfile, and do you make sure the children are dead
>> before removing the pidfile ?
>>
>> The right way to do this since v9.5 is INSERT ON CONFLICT, which is also more
>> efficient in a couple ways.
>>
>> --
>> Justin




Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-06-06 Thread Tom Lane
It seems like nobody's terribly interested in figuring out why
pg_GSS_have_cred_cache() is misbehaving on Windows.  So I took
a look at disabling GSSENC in these test cases to try to silence
hamerkop's test failure that way.  Here's a proposed patch.
It relies on setenv() being available, but I think that's fine
because we link the ECPG test programs with libpgport.

regards, tom lane

diff --git a/src/interfaces/ecpg/test/connect/test5.pgc b/src/interfaces/ecpg/test/connect/test5.pgc
index e712fa8778..f7263935ce 100644
--- a/src/interfaces/ecpg/test/connect/test5.pgc
+++ b/src/interfaces/ecpg/test/connect/test5.pgc
@@ -18,6 +18,9 @@ exec sql begin declare section;
 	char *user="regress_ecpg_user1";
 exec sql end declare section;
 
+	/* disable GSSENC to ensure stability of connection failure reports */
+	setenv("PGGSSENCMODE", "disable", 1);
+
 	ECPGdebug(1, stderr);
 
 	exec sql connect to ecpg2_regression as main;
diff --git a/src/interfaces/ecpg/test/expected/connect-test5.c b/src/interfaces/ecpg/test/expected/connect-test5.c
index 6ae5b589de..a86a5e4331 100644
--- a/src/interfaces/ecpg/test/expected/connect-test5.c
+++ b/src/interfaces/ecpg/test/expected/connect-test5.c
@@ -38,37 +38,33 @@ main(void)
 #line 19 "test5.pgc"
 
 
+	/* disable GSSENC to ensure stability of connection failure reports */
+	setenv("PGGSSENCMODE", "disable", 1);
+
 	ECPGdebug(1, stderr);
 
 	{ ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); }
-#line 23 "test5.pgc"
+#line 26 "test5.pgc"
 
 	{ ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user2 encrypted password 'insecure'", ECPGt_EOIT, ECPGt_EORT);}
-#line 24 "test5.pgc"
+#line 27 "test5.pgc"
 
 	{ ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "alter user regress_ecpg_user1 encrypted password 'connectpw'", ECPGt_EOIT, ECPGt_EORT);}
-#line 25 "test5.pgc"
+#line 28 "test5.pgc"
 
 	{ ECPGtrans(__LINE__, NULL, "commit");}
-#line 26 "test5.pgc"
+#line 29 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "CURRENT");}
-#line 27 "test5.pgc"
+#line 30 "test5.pgc"
   /* <-- "main" not specified */
 
 	strcpy(db, "ecpg2_regression");
 	strcpy(id, "main");
 	{ ECPGconnect(__LINE__, 0, db , NULL, NULL , id, 0); }
-#line 31 "test5.pgc"
-
-	{ ECPGdisconnect(__LINE__, id);}
-#line 32 "test5.pgc"
-
-
-	{ ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); }
 #line 34 "test5.pgc"
 
-	{ ECPGdisconnect(__LINE__, "main");}
+	{ ECPGdisconnect(__LINE__, id);}
 #line 35 "test5.pgc"
 
 
@@ -86,21 +82,21 @@ main(void)
 #line 41 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "" , "regress_ecpg_user2" , "insecure" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "ecpg2_regression" , NULL, NULL , "main", 0); }
 #line 43 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
 #line 44 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "" , "regress_ecpg_user2" , "insecure" , "main", 0); }
 #line 46 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
 #line 47 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
 #line 49 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
@@ -114,48 +110,55 @@ main(void)
 #line 53 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , user , "connectpw" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
 #line 55 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
 #line 56 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression?connect_timeout=180 & client_encoding=latin1" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression" , user , "connectpw" , "main", 0); }
 #line 58 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
 #line 59 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "unix:postgresql://200.46.204.71/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/ecpg2_regression?connect_timeout=180 & client_encoding=latin1" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
 #line 61 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
 #line 62 "test5.pgc"
 
 
-	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/" , "regress_ecpg_user2" , "insecure" , "main", 0); }
+	{ ECPGconnect(__LINE__, 0, "unix:postgresql://200.46.204.71/ecpg2_regression" , "regress_ecpg_user1" , "connectpw" , "main", 0); }
 #line 64 "test5.pgc"
 
 	{ ECPGdisconnect(__LINE__, "main");}
 #line 65 "test5.pgc"
 
 
+	{ ECPGconnect(__LINE__, 0, "unix:postgresql://localhost/" , "regress_ecpg_user2" , "insecure" , "main", 0)

Re: Strangeness with UNIQUE indexes and UTF-8

2021-06-06 Thread Tom Lane
Omar Kilani  writes:
> How is anyone supposed to know about this issue?

We're working on infrastructure to help detect OS locale changes,
but it's not shipped yet.

regards, tom lane




Re: Move pg_attribute.attcompression to earlier in struct for reduced size?

2021-06-06 Thread Ranier Vilela
On Sat, Jun 5, 2021 at 11:16 AM Ranier Vilela wrote:

> > Alvaro Herrera  writes:
>
> > > Now, while this patch does seem to work correctly, it raises a number of
> > > weird cpluspluscheck warnings, which I think are attributable to the
> > > new macro definitions. I didn't look into it closely, but I suppose it
> > > should be fixable given sufficient effort:
> >
> > Didn't test, but the first one is certainly fixable by adding a cast,
> > and I guess the others might be as well.
>
> >I get no warnings with this one. I'm a bit wary of leaving
> >VARDATA_COMPRESSED_GET_EXTSIZE unchanged, but at least nothing in this
> >patch requires a cast there.
>
> Hi Alvaro.
>
> Please, would you mind testing with these changes.
> I'm curious to see if anything improves or not.
> 1. Add a const to the attr parameter.
> 2. Remove the cmid variable (and store it).
> 3. Add tail cut.
>
I think that
https://github.com/postgres/postgres/commit/e6241d8e030fbd2746b3ea3f44e728224298f35b#diff-640a50de37a0dc027d9d1c7239e34aed53b184a8ec6e1f653694e458376b19fa
still has space for improvements.

If attr->attcompression is invalid, it doesn't matter; it's better to
decompress.
Besides, if it is invalid, default_toast_compression is currently not stored
in attr->attcompression.
I believe this version should be a little faster, since it won't double-scan
the attributes unless it's really necessary.

regards,
Ranier Vilela


reduce_overhead-on-detoasteds_values.patch
Description: Binary data


Re: when the startup process doesn't

2021-06-06 Thread Justin Pryzby
On Fri, Jun 04, 2021 at 07:49:21PM +0530, Nitin Jadhav wrote:
> I have added the proper logs in all of the above scenarios.
> 
> Following is the sample log displayed during server startup when the
> time period is set to 10ms.
> 
> 2021-06-04 19:40:06.390 IST [51116] LOG:  Syncing data directory, elapsed 
> time: 14.165 ms,  current path: ./base/13892/16384_fsm
> 2021-06-04 19:40:06.399 IST [51116] LOG:  Syncing data directory completed 
> after 22.661 ms

|2021-06-04 19:40:07.222 IST [51116] LOG:  Performing crash recovery, elapsed 
time: 820.805 ms,  current LSN: 0/76F6F10
|2021-06-04 19:40:07.227 IST [51116] LOG:  invalid record length at 0/774E758: 
wanted 24, got 0
|2021-06-04 19:40:07.227 IST [51116] LOG:  redo done at 0/774E730 system usage: 
CPU: user: 0.77 s, system: 0.03 s, elapsed: 0.82 s

Should it show the rusage ?  It's shown at startup completion since 10a5b35a0,
so it seems strange not to show it here.

+   log_startup_process_progress("Syncing data directory", 
path, false);

I think the fsync vs syncfs paths should be distinguished: "Syncing data
directory (fsync)" vs "Syncing data directory (syncfs)".

+   {"log_min_duration_startup_process", PGC_SUSET, LOGGING_WHEN,

I think it should be PGC_SIGHUP, to allow changing it during runtime.
Obviously it has no effect except during startup, but the change will be
effective if the current process crashes.
See also: 
https://www.postgresql.org/message-id/20210526001359.ge3...@telsasoft.com

+extern void log_startup_process_progress(char *operation, void *data,
+ bool is_complete);

I think this should take an enum operation, rather than using strcmp() on it
later.  The enum values might be RECOVERY_START, RECOVERY_END, rather than
having a bool is_complete.

-- 
Justin




RE: locking [user] catalog tables vs 2pc vs logical rep

2021-06-06 Thread osumi.takami...@fujitsu.com
On Thursday, June 3, 2021 7:07 PM Amit Kapila  wrote:
> On Thu, Jun 3, 2021 at 9:18 AM osumi.takami...@fujitsu.com
>  wrote:
> > Thank you for providing the patch.
> > I have updated your patch to include some other viewpoints.
> >
> 
> I suggest creating a synchronous replication part of the patch for
> back-branches as well.
You are right. Please have a look at the attached patch-set.
Needless to say, the patch for HEAD has descriptions that depend on
the 2pc patch-set.


Best Regards,
Takamichi Osumi



HEAD_deadlock_documentation_of_logical_decoding_v03.patch
Description: HEAD_deadlock_documentation_of_logical_decoding_v03.patch


PG13_deadlock_documentation_of_logical_decoding_v03.patch
Description: PG13_deadlock_documentation_of_logical_decoding_v03.patch


PG12_deadlock_documentation_of_logical_decoding_v03.patch
Description: PG12_deadlock_documentation_of_logical_decoding_v03.patch


PG11_deadlock_documentation_of_logical_decoding_v03.patch
Description: PG11_deadlock_documentation_of_logical_decoding_v03.patch


PG10_deadlock_documentation_of_logical_decoding_v03.patch
Description: PG10_deadlock_documentation_of_logical_decoding_v03.patch


RE: locking [user] catalog tables vs 2pc vs logical rep

2021-06-06 Thread osumi.takami...@fujitsu.com
On Thursday, June 3, 2021 1:09 PM vignesh C  wrote:
> On Thu, Jun 3, 2021 at 9:18 AM osumi.takami...@fujitsu.com
>  wrote:
> > Thank you for providing the patch.
> > I have updated your patch to include some other viewpoints.
> >
> > I also included the description about TRUNCATE on user_catalog_table
> > in the patch. Please have a look at this patch.
> 
> 1) I was not able to generate html docs with the attached patch:
> logicaldecoding.sgml:1128: element sect1: validity error : Element...
Thank you for your review.
I fixed the patch so that the html output can now be generated successfully.
Kindly have a look at v03.

> 2) You could change hang to deadlock:
> +   logical decoding of published table within the same
> transaction leads to a hang.
Yes. I included your point. Thanks.


Best Regards,
Takamichi Osumi



Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-06 Thread ikeda...@oss.nttdata.com



> On 2021/06/04 17:16, Masahiko Sawada wrote:
> 
> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>  wrote:
>> 
>> 
>> 
>> On 2021/06/04 12:28, Masahiko Sawada wrote:
>> 
>> 
>> Thank you for pointing it out. This idea has been proposed several
>> times and there were discussions. I'd like to summarize the proposed
>> ideas and those pros and cons before replying to your other comments.
>> 
>> There are 3 ideas. After backend both prepares all foreign transaction
>> and commit the local transaction,
>> 
>> 1. the backend continues attempting to commit all prepared foreign
>> transactions until all of them are committed.
>> 2. the backend attempts to commit all prepared foreign transactions
>> once. If an error happens, leave them for the resolver.
>> 3. the backend asks the resolver that launched per foreign server to
>> commit the prepared foreign transactions (and backend waits or doesn't
>> wait for the commit completion depending on the setting).
>> 
>> With ideas 1 and 2, since the backend itself commits all foreign
>> transactions the resolver process cannot be a bottleneck, and probably
>> the code can get more simple as backends don't need to communicate
>> with resolver processes.
>> 
>> However, those have two problems we need to deal with:
>> 
>> 
>> Thanks for sharing the summary. I understood there are problems related to
>> the FDW implementation.
>> 
>> First, users could get an error if an error happens during the backend
>> committing prepared foreign transaction but the local transaction is
>> already committed and some foreign transactions could also be
>> committed, confusing users. There were two opinions to this problem:
>> FDW developers should be responsible for writing FDW code such that
>> any error doesn't happen during committing foreign transactions, and
>> users can accept that confusion since an error could happen after
>> writing the commit WAL even today without this 2PC feature. For the
>> former point, I'm not sure it's always doable since even palloc()
>> could raise an error and it seems hard to require all FDW developers
>> to understand all possible paths of raising an error. And for the
>> latter point, that's true but I think those cases are
>> should-not-happen cases (i.g., rare cases) whereas the likelihood of
>> an error during committing prepared transactions is not low (e.g., by
>> network connectivity problem). I think we need to assume that that is
>> not a rare case.
>> 
>> 
>> Hmm… Sorry, I don’t have any good ideas now.
>> 
>> If anything, I’m on the second side, where users accept the confusion, though
>> letting users know whether the error happened before or after the local commit
>> is necessary, because in the former case users will execute the same query again.
> 
> Yeah, users will need to remember the XID of the last executed
> transaction and check if it has been committed by pg_xact_status().
> 
>> 
>> 
>> The second problem is whether we can cancel committing foreign
>> transactions by pg_cancel_backend() (or pressing Ctl-c). If the
>> backend process commits prepared foreign transactions, it's FDW
>> developers' responsibility to write code that is interruptible. I’m
>> not sure it’s feasible for drivers for other databases.
>> 
>> 
>> Sorry, my understanding is not clear.
>> 
>> After all prepares are done, the foreign transactions will be committed.
>> So, does this mean that FDW must leave the unresolved transaction to the 
>> transaction
>> resolver and show some messages like “Since the transaction is already 
>> committed,
>> the transaction will be resolved in background" ?
> 
> I think this would happen after the backend cancels COMMIT PREPARED.
> To be able to cancel an in-progress query the backend needs to accept
> the interruption and send the cancel request. postgres_fdw can do that
> since libpq supports sending a query and waiting for the result but
> I’m not sure about other drivers.

Thanks, I understood that handling this issue is not in the scope of the 2PC
feature, as Tsunakawa-san and you said.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION





Re: Transactions involving multiple postgres foreign servers, take 2

2021-06-06 Thread ikeda...@oss.nttdata.com



> On 2021/06/04 21:38, Masahiko Sawada wrote:
> 
> On Fri, Jun 4, 2021 at 5:16 PM Masahiko Sawada  wrote:
>> 
>> On Fri, Jun 4, 2021 at 3:58 PM ikeda...@oss.nttdata.com
>>  wrote:
>>> 
>>> 
>>> 
>>> On 2021/06/04 12:28, Masahiko Sawada wrote:
>>> 
>>> 
>>> Thank you for pointing it out. This idea has been proposed several
>>> times and there were discussions. I'd like to summarize the proposed
>>> ideas and those pros and cons before replying to your other comments.
>>> 
>>> There are 3 ideas. After backend both prepares all foreign transaction
>>> and commit the local transaction,
>>> 
>>> 1. the backend continues attempting to commit all prepared foreign
>>> transactions until all of them are committed.
>>> 2. the backend attempts to commit all prepared foreign transactions
>>> once. If an error happens, leave them for the resolver.
>>> 3. the backend asks the resolver that launched per foreign server to
>>> commit the prepared foreign transactions (and backend waits or doesn't
>>> wait for the commit completion depending on the setting).
>>> 
>>> With ideas 1 and 2, since the backend itself commits all foreign
>>> transactions the resolver process cannot be a bottleneck, and probably
>>> the code can get more simple as backends don't need to communicate
>>> with resolver processes.
>>> 
>>> However, those have two problems we need to deal with:
>>> 
>>> 
>>> Thanks for sharing the summary. I understood there are problems related to
>>> the FDW implementation.
>>> 
>>> First, users could get an error if an error happens during the backend
>>> committing prepared foreign transaction but the local transaction is
>>> already committed and some foreign transactions could also be
>>> committed, confusing users. There were two opinions to this problem:
>>> FDW developers should be responsible for writing FDW code such that
>>> any error doesn't happen during committing foreign transactions, and
>>> users can accept that confusion since an error could happen after
>>> writing the commit WAL even today without this 2PC feature. For the
>>> former point, I'm not sure it's always doable since even palloc()
>>> could raise an error and it seems hard to require all FDW developers
>>> to understand all possible paths of raising an error. And for the
>>> latter point, that's true but I think those cases are
>>> should-not-happen cases (i.g., rare cases) whereas the likelihood of
>>> an error during committing prepared transactions is not low (e.g., by
>>> network connectivity problem). I think we need to assume that that is
>>> not a rare case.
>>> 
>>> 
>>> Hmm… Sorry, I don’t have any good ideas now.
>>> 
>>> If anything, I’m on the second side, where users accept the confusion, though
>>> letting users know whether the error happened before or after the local commit
>>> is necessary, because in the former case users will execute the same query again.
>> 
>> Yeah, users will need to remember the XID of the last executed
>> transaction and check if it has been committed by pg_xact_status().
> 
> As the second idea, can we send something like a hint along with the
> error (or send a new type of error) that indicates the error happened
> after the transaction commit so that the client can decide whether or
> not to ignore the error? That way, we can deal with the confusion led
> by an error raised after the local commit by the existing post-commit
> cleanup routines (and post-commit xact callbacks) as well as by FDW’s
> commit prepared routine.


I think your second idea is better because it’s easier for users to know what
error happened and there is nothing users should do. Since the focus of a "hint"
is how to fix the problem, would it be more appropriate to use a "context"?

FWIW, I took a quick look at elog.c and found that there is "error_context_stack".
So, why don’t you add a context which shows something like "the fate of the
transaction is already decided to COMMIT (or ROLLBACK), so even if an error
happens, the transaction will be resolved in the background" after the local
commit?


Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION





Re: ALTER SUBSCRIPTION REFRESH PUBLICATION has default copy_data = true?

2021-06-06 Thread Peter Smith
On Thu, Jun 3, 2021 at 12:52 AM Peter Eisentraut
 wrote:
> > Is that a deliberate functionality, or is it a quirk / bug?
>
> copy_data is an option of the action, not a property of the
> subscription.  The difference between those two things is admittedly not
>   clearly (at all?) documented.
>
> However, I'm not sure whether creating a subscription that always
> defaults to copy_data=false for tables added in the future is useful
> functionality, so I think the current behavior is okay.

...

On Thu, Jun 3, 2021 at 12:53 AM Bharath Rupireddy
 wrote:
> > Is that a deliberate functionality, or is it a quirk / bug?
>
> I don't think it's a bug

...

OK. Thanks to both of you for sharing your thoughts about it.
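
For anyone reading this later, the point boils down to copy_data being an
option of each command rather than a stored property of the subscription.
A minimal sketch (the subscription, publication and connection values here
are made up):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=pub.example.com dbname=postgres'
    PUBLICATION mypub
    WITH (copy_data = false);

-- tables added to the publication later are still copied by default:
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION;

-- to skip the initial copy, the option has to be given again:
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data = false);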

--
Kind Regards,
Peter Smith
Fujitsu Australia




Re: please update ps display for recovery checkpoint

2021-06-06 Thread Justin Pryzby
On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote:
> On Sat, Dec 12, 2020 at 12:41:25AM +, Bossart, Nathan wrote:
> > On 12/11/20, 4:00 PM, "Michael Paquier"  wrote:
> >> My counter-proposal is like the attached, with the set/reset part not
> >> reversed this time, and the code indented :p
> > 
> > Haha.  LGTM.
> 
> Thanks.  I have applied this one, then.

To refresh: commit df9274adf adds "checkpoint" info to "ps", which previously
continued to say "recovering N" even after finishing WAL replay, and
throughout the checkpoint.

Now, I wonder whether the startup process should also include some detail about
"syncing data dir".  It's possible to strace the process to see what it's
doing, but most DBA would probably not know that, and it's helpful to know the
status of recovery and what part of recovery is slow: sync, replay, or
checkpoint.  commit df9274adf improved the situation between replay and
ckpoint, but it's still not clear what "postgres: startup" means before replay
starts.

There's some interaction between Thomas' commit 61752afb2 and
recovery_init_sync_method - if we include a startup message, it should
distinguish between "syncing data dirs (fsync)" and (syncfs).

Putting this into fd.c seems to assume that we can clobber "ps", which is fine
when called by StartupXLOG(), but it's a public interface, so I'm not sure if
it's okay to assume that's the only caller.  Maybe it should check if
MyAuxProcType == B_STARTUP.

-- 
Justin
>From a143c40fb2bad96d45f5cc3e22f70dd0e2d6b5c6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 6 Jun 2021 16:28:15 -0500
Subject: [PATCH] Show "syncing data directories" in ps display..

See also: df9274adf, which shows checkpoint output in ps, and 61752afb2 which
adds syncfs.
---
 src/backend/storage/file/fd.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index e8cd7ef088..692d7f16b7 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -99,6 +99,7 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "utils/guc.h"
+#include "utils/ps_status.h"
 #include "utils/resowner_private.h"
 
 /* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */
@@ -3374,6 +3375,8 @@ SyncDataDirectory(void)
 		 * directories.
 		 */
 
+		set_ps_display("syncing data directory (syncfs)");
+
 		/* Sync the top level pgdata directory. */
 		do_syncfs(".");
 		/* If any tablespaces are configured, sync each of those. */
@@ -3392,10 +3395,13 @@ SyncDataDirectory(void)
 		/* If pg_wal is a symlink, process that too. */
 		if (xlog_is_symlink)
 			do_syncfs("pg_wal");
+
+		set_ps_display("");
 		return;
 	}
 #endif			/* !HAVE_SYNCFS */
 
+	set_ps_display("syncing data directory (fsync)");
 	/*
 	 * If possible, hint to the kernel that we're soon going to fsync the data
 	 * directory and its contents.  Errors in this step are even less
@@ -3421,6 +3427,8 @@ SyncDataDirectory(void)
 	if (xlog_is_symlink)
 		walkdir("pg_wal", datadir_fsync_fname, false, LOG);
 	walkdir("pg_tblspc", datadir_fsync_fname, true, LOG);
+
+	set_ps_display("");
 }
 
 /*
-- 
2.17.0



contrib/pg_visibility fails regression under CLOBBER_CACHE_ALWAYS

2021-06-06 Thread Tom Lane
husky just reported $SUBJECT:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=husky&dt=2021-06-05%2013%3A42%3A17

and I find I can reproduce that locally:

diff -U3 /home/postgres/pgsql/contrib/pg_visibility/expected/pg_visibility.out 
/home/postgres/pgsql/contrib/pg_visibility/results/pg_visibility.out
--- /home/postgres/pgsql/contrib/pg_visibility/expected/pg_visibility.out   
2021-01-20 11:12:24.854346717 -0500
+++ /home/postgres/pgsql/contrib/pg_visibility/results/pg_visibility.out
2021-06-06 22:12:07.948890104 -0400
@@ -215,7 +215,8 @@
  0 | f   | f
  1 | f   | f
  2 | t   | t
-(3 rows)
+ 3 | t   | t
+(4 rows)
 
 select * from pg_check_frozen('copyfreeze');
  t_ctid 
@@ -235,7 +236,8 @@
  0 | t   | t
  1 | f   | f
  2 | t   | t
-(3 rows)
+ 3 | t   | t
+(4 rows)
 
 select * from pg_check_frozen('copyfreeze');
  t_ctid 


The test cases that are failing date back to January (7db0cd2145f),
so I think this is some side-effect of a recent commit, but I have
no idea which one.

regards, tom lane




Re: Decoding speculative insert with toast leaks memory

2021-06-06 Thread Amit Kapila
On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila  wrote:
>
> On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar  wrote:
> >
> > On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila  wrote:
> > >
> > > I think the same relation case might not create a problem because it
> > > won't find the entry for it in the toast_hash, so it will return from
> > > there but the other two problems will be there.
> >
> > Right
> >
> > So, one idea could be
> > > to just avoid those two cases (by simply adding return for those
> > > cases) and still we can rely on toast clean up on the next
> > > insert/update/delete. However, your approach looks more natural to me
> > > as that will allow us to clean up memory in all cases instead of
> > > waiting till the transaction end. So, I still vote for going with your
> > > patch's idea of cleaning at spec_abort but I am fine if you and others
> > > decide not to process spec_abort message. What do you think? Tomas, do
> > > you have any opinion on this matter?
> >
> > I agree that processing with spec abort looks more natural and ideally
> > the current code expects it to be getting cleaned after the change,
> > that's the reason we have those assertions and errors.
> >

Okay, so, let's go with that approach. I have thought about whether it
creates any problem in back-branches but couldn't think of any
primarily because we are not going to send anything additional to
plugin/subscriber. Do you see any problems with back branches if we go
with this approach?

> >  OTOH I agree
> > that we can just return from those conditions because now we know that
> > with the current code those situations are possible.  My vote is with
> > handling the spec abort option (Option1) because that looks more
> > natural way of handling these issues and we also don't have to clean
> > up the hash in "ReorderBufferReturnTXN" if no followup change after
> > spec abort.
> >
>
> Even, if we decide to go with spec_abort approach, it might be better
> to still keep the toastreset call in ReorderBufferReturnTXN so that it
> can be freed in case of error.
>

Please take care of this as well.

-- 
With Regards,
Amit Kapila.




Re: Decoding speculative insert with toast leaks memory

2021-06-06 Thread Dilip Kumar
On Mon, 7 Jun 2021 at 8:30 AM, Amit Kapila  wrote:

> On Wed, Jun 2, 2021 at 11:52 AM Amit Kapila 
> wrote:
> >
> > On Wed, Jun 2, 2021 at 11:38 AM Dilip Kumar 
> wrote:
> > >
> > > On Wed, Jun 2, 2021 at 11:25 AM Amit Kapila 
> wrote:
> > > >
> > > > I think the same relation case might not create a problem because it
> > > > won't find the entry for it in the toast_hash, so it will return from
> > > > there but the other two problems will be there.
> > >
> > > Right
> > >
> > > So, one idea could be
> > > > to just avoid those two cases (by simply adding return for those
> > > > cases) and still we can rely on toast clean up on the next
> > > > insert/update/delete. However, your approach looks more natural to me
> > > > as that will allow us to clean up memory in all cases instead of
> > > > waiting till the transaction end. So, I still vote for going with
> your
> > > > patch's idea of cleaning at spec_abort but I am fine if you and
> others
> > > > decide not to process spec_abort message. What do you think? Tomas,
> do
> > > > you have any opinion on this matter?
> > >
> > > I agree that processing with spec abort looks more natural and ideally
> > > the current code expects it to be getting cleaned after the change,
> > > that's the reason we have those assertions and errors.
> > >
>
> Okay, so, let's go with that approach. I have thought about whether it
> creates any problem in back-branches but couldn't think of any
> primarily because we are not going to send anything additional to
> plugin/subscriber. Do you see any problems with back branches if we go
> with this approach?


I will check this and let you know.


> > >  OTOH I agree
> > > that we can just return from those conditions because now we know that
> > > with the current code those situations are possible.  My vote is with
> > > handling the spec abort option (Option1) because that looks more
> > > natural way of handling these issues and we also don't have to clean
> > > up the hash in "ReorderBufferReturnTXN" if no followup change after
> > > spec abort.
> > >
> >
> > Even, if we decide to go with spec_abort approach, it might be better
> > to still keep the toastreset call in ReorderBufferReturnTXN so that it
> > can be freed in case of error.
> >
>
> Please take care of this as well.


Ok

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-06 Thread vignesh C
On Mon, Jun 7, 2021 at 4:18 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, June 3, 2021 7:07 PM Amit Kapila  wrote:
> > On Thu, Jun 3, 2021 at 9:18 AM osumi.takami...@fujitsu.com
> >  wrote:
> > > Thank you for providing the patch.
> > > I have updated your patch to include some other viewpoints.
> > >
> >
> > I suggest creating a synchronous replication part of the patch for
> > back-branches as well.
> You are right. Please have a look at the attached patch-set.
> Needless to say, the patch for HEAD has descriptions that depend on
> the 2pc patch-set.
>

1)
+ 
+   The use of any command to take an ACCESS EXCLUSIVE lock on
[user] catalog tables
+   can cause the deadlock of logical decoding in synchronous
mode. This means that
+   at the transaction commit or prepared transaction, the command
hangs or the server
+   becomes to block any new connections. To avoid this, users
must refrain from such
+   operations.
+ 

Can we change it something like:
Logical decoding of transactions in synchronous replication mode
requires access to system tables and/or user catalog tables, hence
user should refrain from taking exclusive lock on system tables and/or
user catalog tables or refrain from executing commands like cluster
command which will take exclusive lock on system tables internally. If
not the transaction will get blocked at commit/prepare time because of
a deadlock.

2) I was not sure if we should include the examples below or whether the above
para is enough; we can hear from others and retain them if required:
+ 
+   When COMMIT is conducted for a transaction that has
+   issued explicit LOCK on
pg_class
+   with logical decoding, the deadlock occurs. Also, committing
one that runs
+   CLUSTER pg_class is another
+   deadlock scenario.
+ 
+
+ 
+   Similarly, executing PREPARE TRANSACTION
+   after LOCK command on
pg_class and
+   logical decoding of published table within the same
transaction leads to the deadlock.
+   Clustering pg_trigger by
CLUSTER command
+   brings about the deadlock as well, when published table has a
trigger and any operations
+   that will be decoded are conducted on the same table.
+ 
+
+ 
+   The deadlock can happen when users execute TRUNCATE
+   on user_catalog_table under the condition that output plugin
have reference to it.
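
For what it's worth, the first scenario can be sketched like this (the table
and publication names are made up; this assumes the publisher is running with
synchronous logical replication, i.e. the subscriber is listed in
synchronous_standby_names on the publisher):

-- published_tbl is part of a publication consumed by a synchronous
-- logical replication subscriber
BEGIN;
LOCK TABLE pg_class IN ACCESS EXCLUSIVE MODE;
INSERT INTO published_tbl VALUES (1);
COMMIT;
-- the COMMIT hangs: the backend waits for the synchronous ack while still
-- holding the exclusive lock on pg_class, and the walsender doing logical
-- decoding of this very transaction needs to read pg_class to decode it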
  

Regards,
Vignesh




Re: Asynchronous Append on postgres_fdw nodes.

2021-06-06 Thread Etsuro Fujita
On Fri, Jun 4, 2021 at 7:26 PM Etsuro Fujita  wrote:
> On Tue, Jun 1, 2021 at 6:30 PM Etsuro Fujita  wrote:
> > On Fri, May 28, 2021 at 10:53 PM Etsuro Fujita  
> > wrote:
> > > On Fri, May 28, 2021 at 5:29 PM Kyotaro Horiguchi
> > >  wrote:
> > > > The patch drops some "= NULL" (initial) initializations when
> > > > nasyncplans == 0. AFAICS makeNode() fills the returned memory with
> > > > zeroes but I'm not sure it is our convention to omit the
> > > > intializations.
> > >
> > > I’m not sure, but I think we omit it in some cases; for example, we
> > > don’t set as_valid_subplans to NULL explicitly in ExecInitAppend(), if
> > > do_exec_prune=true.
> >
> > Ok, I think it would be a good thing to initialize the
> > pointers/variables to NULL/zero explicitly, so I updated the patch as
> > such.  Barring objections, I'll get the patch committed in a few days.
>
> I'm replanning to push this early next week for some reason.

Pushed.  I will close this in the open items list for v14.

Best regards,
Etsuro Fujita




Re: Race condition in recovery?

2021-06-06 Thread Kyotaro Horiguchi
At Fri, 4 Jun 2021 10:56:12 -0400, Robert Haas  wrote in 
> On Fri, Jun 4, 2021 at 5:25 AM Kyotaro Horiguchi
>  wrote:
> > I think that's right. And the test script detects the issue for me
> > both on Linux but doesn't work for Windows.
> >
> > '"C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_files"' is 
> > not recognized as an internal command or external command ..
> 
> Hmm, that's a problem. Can you try the attached version?

Unfortunately no. The backslashes in the binary path need to be
escaped. (taken from PostgresNode.pm:1008)

> (my $perlbin = $^X) =~ s{\\}{}g if ($TestLib::windows_os);
> $node_primary->append_conf(
>   'postgresql.conf', qq(
> archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" 
> "$archivedir_primary/%f"'
> ));

This works for me.

> > + # clean up
> > + $node_primary->teardown_node;
> > + $node_standby->teardown_node;
> > + $node_cascade->teardown_node;
> >
> > I don't think explicit teardown is useless as the final cleanup.
> 
> I don't know what you mean by this. If it's not useless, good, because
> we're doing it. Or do you mean that you think it is useless, and we
> should remove it?

Ugh! Sorry. I meant "The explicit teardowns are useless". That's not
harmful but it is done by PostgresNode.pm automatically (implicitly)
and we don't do that in the existing scripts.

> > By the way the attached patch is named as "Fix-corner-case..." but
> > doesn't contain the fix. Is it intentional?
> 
> No, that was a goof.

As I said upthread the relationship between receiveTLI and
recoveryTargetTLI is not confirmed yet at the point.
findNewestTimeLine() simply searches for the history file with the
largest timeline id so the returned there's a case where the timeline
id that the function returns is not a future of the latest checkpoint
TLI.  I think that the fact that rescanLatestTimeLine() checks the
relationship is telling us that we need to do the same in the path as
well.

In my previous proposal, it is done just after the line the patch
touches but it can be in the if (fetching_ckpt) branch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Race condition in recovery?

2021-06-06 Thread Kyotaro Horiguchi
Sorry, some extra words are left alone.

At Mon, 07 Jun 2021 13:57:35 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> As I said upthread the relationship between receiveTLI and
> recoveryTargetTLI is not confirmed yet at the point.
- findNewestTimeLine() simply searches for the history file with the
- largest timeline id so the returned there's a case where the timeline
+ findNewestTimeLine() simply searches for the history file with the
+ largest timeline id so there's a case where the timeline
> id that the function returns is not a future of the latest checkpoint
> TLI.  I think that the fact that rescanLatestTimeLine() checks the
> relationship is telling us that we need to do the same in the path as
> well.
> 
> In my previous proposal, it is done just after the line the patch
> touches but it can be in the if (fetching_ckpt) branch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: locking [user] catalog tables vs 2pc vs logical rep

2021-06-06 Thread Amit Kapila
On Mon, Jun 7, 2021 at 9:26 AM vignesh C  wrote:
>
> On Mon, Jun 7, 2021 at 4:18 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Thursday, June 3, 2021 7:07 PM Amit Kapila  
> > wrote:
> > > On Thu, Jun 3, 2021 at 9:18 AM osumi.takami...@fujitsu.com
> > >  wrote:
> > > > Thank you for providing the patch.
> > > > I have updated your patch to include some other viewpoints.
> > > >
> > >
> > > I suggest creating a synchronous replication part of the patch for
> > > back-branches as well.
> > You are right. Please have a look at the attached patch-set.
> > Needless to say, the patch for HEAD has descriptions that depend on
> > the 2pc patch-set.
> >
>
> 1)
> + 
> +   The use of any command to take an ACCESS EXCLUSIVE lock on
> [user] catalog tables
> +   can cause the deadlock of logical decoding in synchronous
> mode. This means that
> +   at the transaction commit or prepared transaction, the command
> hangs or the server
> +   becomes to block any new connections. To avoid this, users
> must refrain from such
> +   operations.
> + 
>
> Can we change it something like:
> Logical decoding of transactions in synchronous replication mode
> requires access to system tables and/or user catalog tables, hence
> user should refrain from taking exclusive lock on system tables and/or
> user catalog tables or refrain from executing commands like cluster
> command which will take exclusive lock on system tables internally. If
> not the transaction will get blocked at commit/prepare time because of
> a deadlock.
>

I think this is better than what the patch has proposed. I suggest
minor modifications to your proposed changes. Let's write the above
para as: "In synchronous replication setup, a deadlock can happen, if
the transaction has locked [user] catalog tables exclusively. This is
because logical decoding of transactions can lock catalog tables to
access them. To avoid this users must refrain from taking an exclusive
lock on [user] catalog tables. This can happen in the following ways:"

+ 
+   When COMMIT is conducted for a transaction that has
+   issued explicit LOCK on
pg_class
+   with logical decoding, the deadlock occurs. Also, committing
one that runs
+   CLUSTER pg_class is another
+   deadlock scenario.
  

The above points need to be mentioned in the  fashion.
See  for an example. I think
the above point can be split as follows.


 
User has issued an explicit LOCK on
pg_class (or any other catalog table) in a
transaction. Now when we try to decode such a transaction, a deadlock
can happen.



Similarly, write separate points for Cluster and Truncate.

One more comment is that for HEAD, first just create a patch with
synchronous replication-related doc changes and then write a separate
patch for prepared transactions.

> 2) I was not sure if we should include the examples below or the above
> para is enough,
>

It is better to give examples but let's use the format as I suggested above.

-- 
With Regards,
Amit Kapila.




Re: Decoding of two-phase xacts missing from CREATE_REPLICATION_SLOT command

2021-06-06 Thread Amit Kapila
On Fri, Jun 4, 2021 at 2:29 PM Ajin Cherian  wrote:
>
> On Fri, Jun 4, 2021 at 1:06 PM Amit Kapila  wrote:
>
> > I think we can try but not sure if we can get it by then. So, here is
> > my suggestion:
> > a. remove the change in CreateReplicationSlotCmd
> > b. prepare the patches for protocol change and pg_recvlogical. This
> > will anyway include the change we removed as part of (a).
>
>
> Attaching two patches:
> 1. Removes two-phase from CreateReplicationSlotCmd
>

Pushed the above patch.

-- 
With Regards,
Amit Kapila.




Re: Fast COPY FROM based on batch insert

2021-06-06 Thread Andrey Lepikhov

On 4/6/21 13:45, tsunakawa.ta...@fujitsu.com wrote:

From: Andrey Lepikhov 

We still have slow 'COPY FROM' operation for foreign tables in current master.
Now we have a foreign batch insert operation, and I tried to rewrite the patch
[1]
with this machinery.


I haven't looked at the patch, but nice performance.

However, I see the following problems.  What do you think about them?

I agree with your fears.
Think of this patch as an intermediate step on the way to fast COPY
FROM. This patch contains all the logic of the previous patch, except for
the transport machinery (the bulk insertion API).
It may be easier to see the advantages of the proposed 'COPY' FDW API
once the 'COPY FROM ...' feature based on the bulk insert FDW API has been committed.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Multiple hosts in connection string failed to failover in non-hot standby mode

2021-06-06 Thread Michael Paquier
On Sun, Jun 06, 2021 at 05:27:49PM -0400, Tom Lane wrote:
> It seems like nobody's terribly interested in figuring out why
> pg_GSS_have_cred_cache() is misbehaving on Windows.

I have been investigating that for a couple of hours in total, but
nothing to report yet.

> So I took
> a look at disabling GSSENC in these test cases to try to silence
> hamerkop's test failure that way.  Here's a proposed patch.
> It relies on setenv() being available, but I think that's fine
> because we link the ECPG test programs with libpgport.

No, that's not it.  The compilation of the tests happens when
triggering the tests as of ecpgcheck() in vcregress.pl so I think that
this is going to fail.  This requires at least the addition of a
reference to libpgport in ecpg_regression.proj, perhaps more.
--
Michael


signature.asc
Description: PGP signature


Re: Duplicate history file?

2021-06-06 Thread Tatsuro Yamada

Hi Horiguchi-san,


Regarding "test ! -f",
I am wondering how many people are using the test command for
archive_command. If I remember correctly, the guide provided by
NTT OSS Center that we are using does not recommend using the test
command.


I think, as the PG-REX documentation says, the simple cp works well as
far as the assumption of PG-REX - that no double failure happens, and
that the instructions are followed - holds.



I believe that this assumption started to be wrong after
archive_mode=always was introduced. As far as I can tell, it doesn't
happen when it's archive_mode=on.



On the other hand, I found that the behavior happens more generally.

If a standby with archive_mode=always crashes, it starts recovery from
the last checkpoint. If the checkpoint was in an already-archived segment, the
restarted standby will fetch that segment from the archive and then fail
to archive it. (See the attached.)

So, your fear stated upthread applies to a wider range of situations. The
following suggestion is rather harmful for the archive_mode=always
setting.

https://www.postgresql.org/docs/14/continuous-archiving.html

The archive command should generally be designed to refuse to
overwrite any pre-existing archive file. This is an important safety
feature to preserve the integrity of your archive in case of
administrator error (such as sending the output of two different
servers to the same archive directory).


I'm not sure how we should treat this...  Since the archive must store
the files actually applied to the server's data, just being already archived
cannot be the reason for omitting archiving.  We need to make sure the
new file is byte-identical to the already-archived version. We could
compare just the *restored* file to the same file in pg_wal, but it might
be too much of a penalty for the benefit. (See the second attached file.)



Thanks for creating the patch!

 

Otherwise the documentation would need something like the following if
we assume the current behavior.


The archive command should generally be designed to refuse to
overwrite any pre-existing archive file. This is an important safety
feature to preserve the integrity of your archive in case of
administrator error (such as sending the output of two different
servers to the same archive directory).

+ For standby with the setting archive_mode=always, there's a case where
+ the same file is archived more than once.  For safety, it is
+ recommended that when the destination file exists, the archive_command
+ returns zero if it is byte-identical to the source file.



Agreed.
That is the same solution as I mentioned earlier.
If possible, it would also be better to mention it in postgresql.conf (that might
be overkill?!).


Regards,
Tatsuro Yamada