/src/include/access/htup_details.h: some comments are kind of confusing

2023-07-02 Thread jian he
Hi, there...

DROP TABLE IF EXISTS infomask_test;
CREATE TABLE infomask_test(acc_no integer PRIMARY KEY, amount numeric, misc text);
INSERT INTO infomask_test VALUES (1, 100.00, default), (2, 200.00, repeat('abc', 700));

BEGIN;
SELECT  acc_no,ctid,xmin,xmax FROM infomask_test WHERE acc_no = 1 FOR KEY SHARE;
SELECT  acc_no,ctid,xmin,xmax FROM infomask_test WHERE acc_no = 2 FOR SHARE;

SELECT t_ctid, raw_flags, combined_flags, t_xmin, t_xmax
FROM heap_page_items(get_raw_page('infomask_test', 0)),
     LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)
ORDER BY t_ctid;

 t_ctid | raw_flags | combined_flags | t_xmin | t_xmax
--------+-----------+----------------+--------+--------
 (0,1)  | {HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_KEYSHR_LOCK,HEAP_XMAX_LOCK_ONLY,HEAP_XMIN_COMMITTED} | {} | 25655 | 25656
 (0,2)  | {HEAP_HASVARWIDTH,HEAP_XMAX_KEYSHR_LOCK,HEAP_XMAX_EXCL_LOCK,HEAP_XMAX_LOCK_ONLY,HEAP_XMIN_COMMITTED} | {HEAP_XMAX_SHR_LOCK} | 25655 | 25656

select acc_no, ctid, xmin, xmax from infomask_test;
 acc_no | ctid  | xmin  | xmax
--------+-------+-------+-------
      1 | (0,1) | 25655 | 25656
      2 | (0,2) | 25655 | 25656
(2 rows)
rollback;
--
/main/postgres/src/include/access/htup_details.h:

#define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */

while the manual says:
FOR SHARE: Behaves similarly to FOR NO KEY UPDATE, except that it
acquires a shared lock rather than exclusive lock on each retrieved
row. A shared lock blocks other transactions from performing UPDATE,
DELETE, SELECT FOR UPDATE or SELECT FOR NO KEY UPDATE on these rows,
but it does not prevent them from performing SELECT FOR SHARE or
SELECT FOR KEY SHARE.

I fail to reconcile "exclusive locker" (in the source code comment)
with "shared lock" (in the manual).
---
Also, in /src/include/access/htup_details.h:

#define HEAP_UPDATED 0x2000 /* this is UPDATEd version of row */

Personally I find this comment kind of confusing. By comparison, the
trigger concept of an old table versus a new table is very intuitive.




pg_basebackup check vs Windows file path limits

2023-07-02 Thread Andrew Dunstan


The buildfarm animal fairywren has been failing the tests for 
pg_basebackup because it can't create a file with a path longer than 255 
chars. This has just been tripped because for release 16 it's running 
TAP tests, the branch name is part of the file path, and "REL_16_STABLE" 
is longer than "HEAD". I did think of chdir'ing into the directory to 
create the file, but experimentation shows that doesn't solve matters. I 
also adjusted the machine's settings related to long file names, but to 
no avail. So for now I propose to slightly shorten the name of the long 
file, so that it still exercises the check for file names longer than 
100 characters but doesn't trip this up on fairywren. But that's a 
bandaid; I don't have a good solution for now.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Fdw batch insert error out when set batch_size > 65535

2023-07-02 Thread Tom Lane
Andres Freund  writes:
> On 2021-06-11 18:44:28 -0400, Tom Lane wrote:
>> I suggest what we do is leave it in place for long enough to get
>> a round of reports from those slow animals, and then (assuming
>> those reports are positive) drop the test.

> I think two years later is long enough to have some confidence in this being
> fixed?

+1, time to drop it (in the back branches too).

regards, tom lane




Re: Fdw batch insert error out when set batch_size > 65535

2023-07-02 Thread Tomas Vondra
On 7/2/23 15:23, Tom Lane wrote:
> Andres Freund  writes:
>> On 2021-06-11 18:44:28 -0400, Tom Lane wrote:
>>> I suggest what we do is leave it in place for long enough to get
>>> a round of reports from those slow animals, and then (assuming
>>> those reports are positive) drop the test.
> 
>> I think two years later is long enough to have some confidence in this being
>> fixed?
> 
> +1, time to drop it (in the back branches too).
> 

OK, will do (unless someone else wants to handle this) on Monday.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Extend ALTER OPERATOR to support adding commutator, negator, hashes, and merges

2023-07-02 Thread Tommy Pavlicek
On Fri, Jun 23, 2023 at 12:21 PM Tom Lane  wrote:
>
> Tommy Pavlicek  writes:
> > I've added a single patch here: https://commitfest.postgresql.org/43/4389/
>
> > It wasn't obvious whether I should create a second commitfest entry
> > because I've included 2 patches so I've just done 1 to begin with. On
> > that note, is it preferred here to split patches of this size into
> > separate patches, and if so, additionally, separate threads?
>
> No, our commitfest infrastructure is unable to deal with patches that have
> interdependencies unless they're presented in a single email.  So just use
> one thread, and be sure to attach all the patches each time.
>
> (BTW, while you seem to have gotten away with it so far, it's usually
> advisable to name the patch files like 0001-foo.patch, 0002-bar.patch,
> etc, to make sure the cfbot understands what order to apply them in.)
>
> regards, tom lane

Thanks.

I've attached a new version of the ALTER OPERATOR patch that allows
no-ops. It should be ready to review now.


0001-create_op_fixes_v1.patch
Description: Binary data


0002-alter_op_v2.patch
Description: Binary data


Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE

2023-07-02 Thread vignesh C
Hi,

Improved tab completion for "ALTER DEFAULT PRIVILEGE" and "ALTER TABLE":
1) The GRANT, REVOKE and FOR USER keywords were not displayed in tab
completion of ALTER DEFAULT PRIVILEGES, as in the statements below:
ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM dba1;

2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA
public FOR ", as in the statement below:
ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT
ON TABLES TO PUBLIC;

3) "GRANT OPTION FOR" was not displayed for "ALTER DEFAULT PRIVILEGES
REVOKE ", as in the statement below:
alter default privileges revoke grant option for select ON tables FROM dba1;

4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
column-name SET", as in:
ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;

Attached patch has the changes for the same.

Regards,
Vignesh
From de80163a02d5d528402b3547bec09f5207ead9d2 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 2 Jul 2023 19:35:46 +0530
Subject: [PATCH 1/2] Fix missing tab completion in "ALTER DEFAULT PRIVILEGES"

The GRANT, REVOKE and FOR USER keywords were not displayed in tab completion of ALTER DEFAULT PRIVILEGES, as in the statements below:
ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM vignesh;

USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR " like in below statement:
ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER vignesh GRANT INSERT ON TABLES TO PUBLIC;

"GRANT OPTION FOR" was not displayed for "ALTER DEFAULT PRIVILEGES REVOKE ", as in the statement below:
alter default privileges revoke grant option for select ON tables FROM testdba;
---
 src/bin/psql/tab-complete.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 935bb9bd95..73502b0b95 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2137,10 +2137,15 @@ psql_completion(const char *text, int start, int end)
 
 	/* ALTER DEFAULT PRIVILEGES */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES"))
-		COMPLETE_WITH("FOR ROLE", "IN SCHEMA");
+		COMPLETE_WITH("FOR", "GRANT", "IN SCHEMA", "REVOKE");
 	/* ALTER DEFAULT PRIVILEGES FOR */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "FOR"))
-		COMPLETE_WITH("ROLE");
+		COMPLETE_WITH("ROLE", "USER");
+	/* ALTER DEFAULT PRIVILEGES REVOKE */
+	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "REVOKE"))
+		COMPLETE_WITH("SELECT", "INSERT", "UPDATE", "DELETE", "TRUNCATE",
+	  "REFERENCES", "TRIGGER", "CREATE", "EXECUTE", "USAGE",
+	  "MAINTAIN", "ALL", "GRANT OPTION FOR");
 	/* ALTER DEFAULT PRIVILEGES IN */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "IN"))
 		COMPLETE_WITH("SCHEMA");
@@ -2151,11 +2156,11 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER DEFAULT PRIVILEGES IN SCHEMA ... */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "IN", "SCHEMA",
 	 MatchAny))
-		COMPLETE_WITH("GRANT", "REVOKE", "FOR ROLE");
+		COMPLETE_WITH("GRANT", "REVOKE", "FOR");
 	/* ALTER DEFAULT PRIVILEGES IN SCHEMA ... FOR */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "IN", "SCHEMA",
 	 MatchAny, "FOR"))
-		COMPLETE_WITH("ROLE");
+		COMPLETE_WITH("ROLE", "USER");
 	/* ALTER DEFAULT PRIVILEGES FOR ROLE|USER ... IN SCHEMA ... */
 	/* ALTER DEFAULT PRIVILEGES IN SCHEMA ... FOR ROLE|USER ... */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "FOR", "ROLE|USER",
-- 
2.34.1

From 7ff5cd41e5c1a33bcb91e1d32537556cb9808923 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 2 Jul 2023 19:53:50 +0530
Subject: [PATCH 2/2] Fix missing tab completion in "ALTER TABLE table-name
 ALTER COLUMN column-name SET"

"DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
column-name SET", like in:
ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
---
 src/bin/psql/tab-complete.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 73502b0b95..c71235462a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2488,7 +2488,7 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER TABLE ALTER [COLUMN]  SET */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny, "SET") ||
 			 Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny, "SET"))
-		COMPLETE_WITH("(", "COMPRESSION", "DEFAULT", "GENERATED", "NOT NULL", "STATISTICS", "STORAGE",
+		COMPLETE_WITH("(", "COMPRESSION", "DATA TYPE", "DEFAULT", "GENERATED", "NOT NULL", "STATISTICS", "STORAGE",
 		/* a subset of ALTER SEQUENCE options */
 	  "INCREMENT", "MINVALUE", "MAXVALUE", "START", "NO", "CACHE", "CYCLE");
 	/* ALTER TABLE ALTER [COLUMN]  SE

010_database.pl fails on openbsd w/ LC_ALL=LANG=C

2023-07-02 Thread Andres Freund
Hi,

I was rebasing my meson tree, which has more OSs added to CI, and noticed that
010_database.pl started failing on openbsd recently-ish, without the CI
environment for that having changed.

The tests passed on openbsd when my tree was based on 47b7051bc82
(2023-06-01), but failed after rebasing onto a798660ebe3 (2023-06-29).

Example of a failing run:
https://cirrus-ci.com/task/6391476419035136?logs=test_world#L273
https://api.cirrus-ci.com/v1/artifact/task/6391476419035136/testrun/build/testrun/icu/010_database/log/regress_log_010_database
https://api.cirrus-ci.com/v1/artifact/task/6391476419035136/testrun/build/testrun/icu/010_database/log/010_database_node1.log

[07:25:06.421](0.161s) not ok 6 - ICU-specific locale must be specified with 
ICU_LOCALE: exit code not 0
[07:25:06.423](0.002s) #   Failed test 'ICU-specific locale must be specified 
with ICU_LOCALE: exit code not 0'
#   at /home/postgres/postgres/src/test/icu/t/010_database.pl line 78.
[07:25:06.423](0.000s) #  got: '0'
# expected: anything else
[07:25:06.424](0.001s) not ok 7 - ICU-specific locale must be specified with 
ICU_LOCALE: error message
[07:25:06.424](0.001s) #   Failed test 'ICU-specific locale must be specified 
with ICU_LOCALE: error message'
#   at /home/postgres/postgres/src/test/icu/t/010_database.pl line 80.
[07:25:06.424](0.000s) #   'psql::2: NOTICE:  using 
standard form "und-u-ks-level1" for ICU locale "@colStrength=primary"'
# doesn't match '(?^:ERROR:  invalid LC_COLLATE locale name)'
[07:25:06.425](0.000s) 1..7

The server log says:

2023-07-02 07:25:05.946 UTC [15605][client backend] [010_database.pl][3/14:0] 
LOG:  statement: CREATE DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE 
template0 ENCODING UTF8
2023-07-02 07:25:05.947 UTC [15605][client backend] [010_database.pl][3/14:0] 
WARNING:  could not convert locale name "C" to language tag: 
U_ILLEGAL_ARGUMENT_ERROR
2023-07-02 07:25:05.948 UTC [15605][client backend] [010_database.pl][3/14:0] 
WARNING:  ICU locale "C" has unknown language "c"
2023-07-02 07:25:05.948 UTC [15605][client backend] [010_database.pl][3/14:0] 
HINT:  To disable ICU locale validation, set parameter icu_validation_level to 
DISABLED.


Example of a succeeding run:
https://cirrus-ci.com/task/5893925412536320?logs=test_world#L261

I have not yet debugged this further.

Greetings,

Andres Freund




Replacing abort() with __builtin_trap()?

2023-07-02 Thread Andres Freund
Hi,

When looking at Assert() failures and at PANICs, the number of "pointless"
stack entries at the top seems to have grown over the years.  Here's an
example of a stacktrace (that I obviously intentionally triggered):

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x7f31920a815f in __pthread_kill_internal (signo=6, threadid=) at ./nptl/pthread_kill.c:78
#2  0x7f319205a472 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26
#3  0x7f31920444b2 in __GI_abort () at ./stdlib/abort.c:79
#4  0x55b3340c5140 in ExceptionalCondition (conditionName=0x55b3338c7ea0 
"\"I kid you not\" == NULL",
fileName=0x55b3338c6958 
"../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c", 
lineNumber=4126)
at 
../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
#5  0x55b333ef46c4 in PostgresMain (dbname=0x55b336271608 "postgres", 
username=0x55b3361fa888 "andres")
at ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
#6  0x55b333e1fadd in BackendRun (port=0x55b336267ec0) at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
#7  0x55b333e1f369 in BackendStartup (port=0x55b336267ec0) at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
#8  0x55b333e1b406 in ServerLoop () at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
#9  0x55b333e1ad17 in PostmasterMain (argc=73, argv=0x55b3361f83f0) at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1463
#10 0x55b333d052e2 in main (argc=73, argv=0x55b3361f83f0) at 
../../../../home/andres/src/postgresql/src/backend/main/main.c:198

That's due to glibc having a very complicated abort(), which might be nice as
a backstop, but for the default Assert it's imo just noise.

I'd like to propose that we do a configure test for __builtin_trap() and use
it, if available, before the abort() in ExceptionalCondition(). Perhaps also
for PANIC, but it's not as clear to me whether we should.

Here's a backtrace when using __builtin_trap():
#0  ExceptionalCondition (conditionName=0x55e7e7c90ea0 "\"I kid you not\" == 
NULL",
fileName=0x55e7e7c8f958 
"../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c", 
lineNumber=4126)
at 
../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
66  __builtin_trap();
(gdb) bt
#0  ExceptionalCondition (conditionName=0x55e7e7c90ea0 "\"I kid you not\" == 
NULL",
fileName=0x55e7e7c8f958 
"../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c", 
lineNumber=4126)
at 
../../../../home/andres/src/postgresql/src/backend/utils/error/assert.c:66
#1  0x55e7e82bd6c4 in PostgresMain (dbname=0x55e7e9ea8608 "postgres", 
username=0x55e7e9e31888 "andres")
at ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
#2  0x55e7e81e8add in BackendRun (port=0x55e7e9e9eec0) at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
#3  0x55e7e81e8369 in BackendStartup (port=0x55e7e9e9eec0) at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
#4  0x55e7e81e4406 in ServerLoop () at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
#5  0x55e7e81e3d17 in PostmasterMain (argc=73, argv=0x55e7e9e2f3f0) at 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1463
#6  0x55e7e80ce2e2 in main (argc=73, argv=0x55e7e9e2f3f0) at 
../../../../home/andres/src/postgresql/src/backend/main/main.c:198


Maybe I crash things too often, but I'd rather not have to deal with 4
pointless frames at the top...

Greetings,

Andres Freund




Re: Replacing abort() with __builtin_trap()?

2023-07-02 Thread Tom Lane
Andres Freund  writes:
> I'd like to propose that we do a configure test for __builtin_trap() and use
> it, if available, before the abort() in ExceptionalCondition(). Perhaps also
> for PANIC, but it's not as clear to me whether we should.

Does that still result in the same process exit signal being reported to
the postmaster?  The GCC manual makes it sound like the reported signal
could be platform-dependent, which'd be kind of confusing.

regards, tom lane




Re: Replacing abort() with __builtin_trap()?

2023-07-02 Thread Andres Freund
Hi,

On 2023-07-02 13:55:53 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'd like to propose that we do a configure test for __builtin_trap() and use
> > it, if available, before the abort() in ExceptionalCondition(). Perhaps also
> > for PANIC, but it's not as clear to me whether we should.
>
> Does that still result in the same process exit signal being reported to
> the postmaster?

It does not on linux / x86-64.

2023-07-02 10:52:55.103 PDT [1398197][postmaster][:0][] LOG:  server process 
(PID 1398207) was terminated by signal 4: Illegal instruction
vs today's
2023-07-02 11:08:22.674 PDT [1401801][postmaster][:0][] LOG:  server process 
(PID 1401809) was terminated by signal 6: Aborted

It wouldn't be bad for postmaster to be able to distinguish between PANIC and
Assert(), but I agree that the non-determinism is a bit annoying.

Greetings,

Andres Freund




Re: possible bug in handling of contrecords in dd38ff28ad (Fix recovery_prefetch with low maintenance_io_concurrency)

2023-07-02 Thread Tomas Vondra



On 7/2/23 04:09, Thomas Munro wrote:
> On Sun, Jul 2, 2023 at 1:40 AM Tomas Vondra
>  wrote:
>> I think there's some sort of bug in how dd38ff28ad deals with
>> contrecords. Consider something as simple as
>>
>>   pgbench -i -s 100
>>
>> and then doing pg_waldump on the WAL segments, I get this for every
>> single one:
>>
>>   pg_waldump: error: error in WAL record at 0/198: missing
>>  contrecord at 0/1E0
>>
>> This only happens since dd38ff28ad, and revert makes it disappear.
>>
>> It's possible we still have some issue with contrecords, but IIUC we
>> fixed those. So unless there's some unknown one (and considering this
>> seems to happen for *every* WAL segment that's hard to believe), this
>> seems more like an issue in the error detection.
> 
> Yeah.  That message is due to this part of dd38ff28ad's change:
> 
> Also add an explicit error message for missing contrecords.  It was a
> bit strange that we didn't report an error already, and became a latent
> bug with prefetching, since the internal state that tracks aborted
> contrecords would not survive retrying, as revealed by
> 026_overwrite_contrecord.pl with this adjustment.  Reporting an error
> prevents that.
> 
> We can change 'missing contrecord' back to silent end-of-decoding (as
> it was in 14) with the attached.  Here [1] is some analysis of this
> error that I wrote last year.  The reason for my hesitation in pushing
> a fix was something like this:
> 
> 1.  For pg_waldump, it's "you told me to decode only this WAL segment,
> so decoding failed here", which is useless noise
> 2.  For walsender, it's "you told me to shut down, so decoding failed
> here", which is useless noise
> 3.  For crash recovery, "I ran out of data, so decoding failed here",
> which seems like a report-worthy condition, I think?
> 
> When I added that new error I was thinking about that third case.  We
> generally expect to detect the end of WAL replay after a crash with an
> error ("invalid record length ...: wanted 24, got 0" + several other
> possibilities), but in this rare case it would be silent.  The effect
> on the first two cases is cosmetic, but certainly annoying.  Perhaps I
> should go ahead and back-patch the attached change, and then we can
> discuss whether/how we should do a better job of distinguishing "user
> requested artificial end of decoding" from "unexpectedly ran out of
> data" for v17?
> 

Yeah, I think that'd be sensible. IMHO we have a habit of scaring users
with stuff that might be dangerous/bad, but 99% of the time it's
actually fine and perhaps even expected. It's almost as if we're
conditioning people to ignore errors.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Memory leak in incremental sort re-scan

2023-07-02 Thread Tomas Vondra



On 6/29/23 13:49, Laurenz Albe wrote:
> On Fri, 2023-06-16 at 00:34 +0200, Tomas Vondra wrote:
>> On 6/15/23 22:36, Tom Lane wrote:
>>> Tomas Vondra  writes:
 On 6/15/23 22:11, Tom Lane wrote:
> I see zero leakage in that example after applying the attached quick
> hack.  (It might be better to make the check in the caller, or to just
> move the call to ExecInitIncrementalSort.)
>>>
 Thanks for looking. Are you planning to work on this and push the fix,
 or do you want me to finish this up?
>>>
>>> I'm happy to let you take it -- got lots of other stuff on my plate.
>>
>> OK, will do.
> 
> It would be cool if we could get that into the next minor release in August.
> 

FWIW I've pushed the fix prepared by James a couple days ago. Thanks for
the report!


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Optionally using a better backtrace library?

2023-07-02 Thread Andres Freund
Hi,

I like that we now have a builtin backtrace ability. Unfortunately I think the
backtraces are often not very useful, because only externally visible
functions are symbolized.

E.g.:

2023-07-02 10:54:01.756 PDT [1398494][client backend][:0][[unknown]] LOG:  will 
crash
2023-07-02 10:54:01.756 PDT [1398494][client backend][:0][[unknown]] BACKTRACE:
postgres: dev assert: andres postgres [local] 
initializing(errbacktrace+0xbb) [0x562a44c97ca9]
postgres: dev assert: andres postgres [local] 
initializing(PostgresMain+0xb6) [0x562a44ac56d4]
postgres: dev assert: andres postgres [local] initializing(+0x806add) 
[0x562a449f0add]
postgres: dev assert: andres postgres [local] initializing(+0x806369) 
[0x562a449f0369]
postgres: dev assert: andres postgres [local] initializing(+0x802406) 
[0x562a449ec406]
postgres: dev assert: andres postgres [local] 
initializing(PostmasterMain+0x1676) [0x562a449ebd17]
postgres: dev assert: andres postgres [local] initializing(+0x6ec2e2) 
[0x562a448d62e2]
/lib/x86_64-linux-gnu/libc.so.6(+0x276ca) [0x7f1e820456ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f1e82045785]
postgres: dev assert: andres postgres [local] initializing(_start+0x21) 
[0x562a445ede21]

which is far from as useful as it could be.


A lot of platforms have "libbacktrace" available, e.g. as part of gcc. I think
we should consider using it, when available, to produce more useful
backtraces.

I hacked it up for ereport() to debug something, and the backtraces are
considerably better:

2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] LOG:  will 
crash
2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] BACKTRACE:
[0x55fcd03e6143] PostgresMain: 
../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
[0x55fcd031154c] BackendRun: 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
[0x55fcd0310dd8] BackendStartup: 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
[0x55fcd030ce75] ServerLoop: 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
[0x55fcd030c786] PostmasterMain: 
../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1463
[0x55fcd01f6d51] main: 
../../../../home/andres/src/postgresql/src/backend/main/main.c:198
[0x7fdd914456c9] __libc_start_call_main: 
../sysdeps/nptl/libc_start_call_main.h:58
[0x7fdd91445784] __libc_start_main_impl: ../csu/libc-start.c:360
[0x55fccff0e890] [unknown]: [unknown]:0

The way each frame looks is my fault, not libbacktrace's...

Nice things about libbacktrace are that the generation of stack traces is
documented to be async signal safe on most platforms (with a #define to figure
that out, and a more minimal safe version always available) and that it
supports a wide range of platforms:

https://github.com/ianlancetaylor/libbacktrace
  As of October 2020, libbacktrace supports ELF, PE/COFF, Mach-O, and XCOFF
  executables with DWARF debugging information. In other words, it supports
  GNU/Linux, *BSD, macOS, Windows, and AIX. The library is written to make it
  straightforward to add support for other object file and debugging formats.


The state I currently have is very hacky, but if there's interest in
upstreaming something like this, I could clean it up.

Greetings,

Andres Freund




Re: Optionally using a better backtrace library?

2023-07-02 Thread Pavel Stehule
ne 2. 7. 2023 v 20:32 odesílatel Andres Freund  napsal:

> Hi,
>
> I like that we now have a builtin backtrace ability. Unfortunately I think
> the
> backtraces are often not very useful, because only externally visible
> functions are symbolized.
>
> E.g.:
>
> 2023-07-02 10:54:01.756 PDT [1398494][client backend][:0][[unknown]] LOG:
> will crash
> 2023-07-02 10:54:01.756 PDT [1398494][client backend][:0][[unknown]]
> BACKTRACE:
> postgres: dev assert: andres postgres [local]
> initializing(errbacktrace+0xbb) [0x562a44c97ca9]
> postgres: dev assert: andres postgres [local]
> initializing(PostgresMain+0xb6) [0x562a44ac56d4]
> postgres: dev assert: andres postgres [local]
> initializing(+0x806add) [0x562a449f0add]
> postgres: dev assert: andres postgres [local]
> initializing(+0x806369) [0x562a449f0369]
> postgres: dev assert: andres postgres [local]
> initializing(+0x802406) [0x562a449ec406]
> postgres: dev assert: andres postgres [local]
> initializing(PostmasterMain+0x1676) [0x562a449ebd17]
> postgres: dev assert: andres postgres [local]
> initializing(+0x6ec2e2) [0x562a448d62e2]
> /lib/x86_64-linux-gnu/libc.so.6(+0x276ca) [0x7f1e820456ca]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)
> [0x7f1e82045785]
> postgres: dev assert: andres postgres [local]
> initializing(_start+0x21) [0x562a445ede21]
>
> which is far from as useful as it could be.
>
>
> A lot of platforms have "libbacktrace" available, e.g. as part of gcc. I
> think
> we should consider using it, when available, to produce more useful
> backtraces.
>
> I hacked it up for ereport() to debug something, and the backtraces are
> considerably better:
>
> 2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]] LOG:
> will crash
> 2023-07-02 10:52:54.863 PDT [1398207][client backend][:0][[unknown]]
> BACKTRACE:
> [0x55fcd03e6143] PostgresMain:
> ../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4126
> [0x55fcd031154c] BackendRun:
> ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4461
> [0x55fcd0310dd8] BackendStartup:
> ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:4189
> [0x55fcd030ce75] ServerLoop:
> ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1779
> [0x55fcd030c786] PostmasterMain:
> ../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:1463
> [0x55fcd01f6d51] main:
> ../../../../home/andres/src/postgresql/src/backend/main/main.c:198
> [0x7fdd914456c9] __libc_start_call_main:
> ../sysdeps/nptl/libc_start_call_main.h:58
> [0x7fdd91445784] __libc_start_main_impl: ../csu/libc-start.c:360
> [0x55fccff0e890] [unknown]: [unknown]:0
>
> The way each frame looks is my fault, not libbacktrace's...
>
> Nice things about libbacktrace are that the generation of stack traces is
> documented to be async signal safe on most platforms (with a #define to
> figure
> that out, and a more minimal safe version always available) and that it
> supports a wide range of platforms:
>
> https://github.com/ianlancetaylor/libbacktrace
>   As of October 2020, libbacktrace supports ELF, PE/COFF, Mach-O, and XCOFF
>   executables with DWARF debugging information. In other words, it supports
>   GNU/Linux, *BSD, macOS, Windows, and AIX. The library is written to make
> it
>   straightforward to add support for other object file and debugging
> formats.
>
>
> The state I currently have is very hacky, but if there's interest in
> upstreaming something like this, I could clean it up.
>

Looks nice

+1

Pavel


> Greetings,
>
> Andres Freund
>
>
>


Re: Memory leak in incremental sort re-scan

2023-07-02 Thread Laurenz Albe
On Sun, 2023-07-02 at 20:13 +0200, Tomas Vondra wrote:
> FWIW I've pushed the fix prepared by James a couple days ago. Thanks for
> the report!

Thanks, and sorry for being pushy.

Yours,
Laurenz Albe




Re: memory leak in trigger handling (since PG12)

2023-07-02 Thread Tomas Vondra



On 6/23/23 08:03, Alexander Pyhalov wrote:
> Tomas Vondra wrote on 2023-06-22 17:16:
>> On 6/22/23 13:46, Tomas Vondra wrote:
>>> ...
>>>
>>> I haven't tried the reproducer, but I think I see the issue - we store
>>> the bitmap as part of the event to be executed later, but the bitmap is
>>> in per-tuple context and gets reset. So I guess we need to copy it into
>>> the proper long-lived context (e.g. AfterTriggerEvents).
>>>
>>> I'll get that fixed.
>>>
>>
>> Alexander, can you try if this fixes the issue for you?
>>
>>
>> regards
> 
> Hi.
> The patch fixes the problem and looks good to me.

Thanks, I've pushed the fix, including backpatch to 13+ (12 is not
affected by the oversight, the bitmap was added by 71d60e2aa0).

I think it'd be good to investigate if it's possible to compute the
bitmap only once - as already suggested by Andres, but that's a matter
for separate patch, not a bugfix.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Optionally using a better backtrace library?

2023-07-02 Thread Joe Conway

On 7/2/23 14:31, Andres Freund wrote:

Nice things about libbacktrace are that the generation of stack traces is
documented to be async signal safe on most platforms (with a #define to figure
that out, and a more minimal safe version always available) and that it
supports a wide range of platforms:

https://github.com/ianlancetaylor/libbacktrace
   As of October 2020, libbacktrace supports ELF, PE/COFF, Mach-O, and XCOFF
   executables with DWARF debugging information. In other words, it supports
   GNU/Linux, *BSD, macOS, Windows, and AIX. The library is written to make it
   straightforward to add support for other object file and debugging formats.


The state I currently have is very hacky, but if there's interest in
upstreaming something like this, I could clean it up.


+1
Seems useful!

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Replacing abort() with __builtin_trap()?

2023-07-02 Thread Tom Lane
Andres Freund  writes:
> On 2023-07-02 13:55:53 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> I'd like to propose that we do a configure test for __builtin_trap() and use
>>> it, if available, before the abort() in ExceptionalCondition(). Perhaps also
>>> for PANIC, but it's not as clear to me whether we should.

>> Does that still result in the same process exit signal being reported to
>> the postmaster?

> It does not on linux / x86-64.

> 2023-07-02 10:52:55.103 PDT [1398197][postmaster][:0][] LOG:  server process 
> (PID 1398207) was terminated by signal 4: Illegal instruction
> vs today's
> 2023-07-02 11:08:22.674 PDT [1401801][postmaster][:0][] LOG:  server process 
> (PID 1401809) was terminated by signal 6: Aborted

Hm, I do *not* like "Illegal instruction" in place of SIGABRT;
that looks too much like we vectored off into never-never land.
I'd rather live with the admittedly-ugly stack traces.

regards, tom lane




Re: possible bug in handling of contrecords in dd38ff28ad (Fix recovery_prefetch with low maintenance_io_concurrency)

2023-07-02 Thread Thomas Munro
On Mon, Jul 3, 2023 at 6:12 AM Tomas Vondra
 wrote:
> On 7/2/23 04:09, Thomas Munro wrote:
> > When I added that new error I was thinking about that third case.  We
> > generally expect to detect the end of WAL replay after a crash with an
> > error ("invalid record length ...: wanted 24, got 0" + several other
> > possibilities), but in this rare case it would be silent.  The effect
> > on the first two cases is cosmetic, but certainly annoying.  Perhaps I
> > should go ahead and back-patch the attached change, and then we can
> > discuss whether/how we should do a better job of distinguishing "user
> > requested artificial end of decoding" from "unexpectedly ran out of
> > data" for v17?
> >
>
> Yeah, I think that'd be sensible. IMHO we have a habit of scaring users
> with stuff that might be dangerous/bad, but 99% of the time it's
> actually fine and perhaps even expected. It's almost as if we're
> conditioning people to ignore errors.

Done.

There is CF #2490 "Make message at end-of-recovery less scary".
Perhaps we should think about how to classify this type of failure in
the context of that proposal.




Re: [PATCH] Extend the length of BackgroundWorker.bgw_library_name

2023-07-02 Thread Yurii Rashkovskii
Hi Nathan,

On Fri, Jun 30, 2023 at 2:39 PM Nathan Bossart 
wrote:

>
> In v4, I've introduced a new BGW_LIBLEN macro and set it to the default
> value of MAXPGPATH (1024).  This way, the value can live in bgworker.h like
> the other BGW_* macros do.  Plus, this should make the assertion that
> checks for backward compatibility unnecessary.  Since bgw_library_name is
> essentially a path, I can see the argument that we should just set
> BGW_LIBLEN to MAXPGPATH directly.  I'm curious what folks think about this.
>

Thank you for revising the patch. While this is relatively minor, I think
it should be set to MAXPGPATH directly to clarify their relationship.

-- 
Y.


Re: Making empty Bitmapsets always be NULL

2023-07-02 Thread David Rowley
On Mon, 3 Jul 2023 at 09:27, David Rowley  wrote:
> If nobody else wants to take a look, then I plan to push the v4 + the
> asserts in the next day or so.

Here's the patch which includes those Asserts.  I also made some small
tweaks to a comment.

I understand that Tom thought that the Asserts were a step too far in
[1], but per the bugs found in [2], I think having them is worthwhile.

In the attached, I only added Asserts to the locations where the code
relies on there being no trailing zero words.  I didn't include them
in places like bms_copy() since nothing there would do the wrong thing
if there were trailing zero words.

David

[1] https://postgr.es/m/2686153.1677881...@sss.pgh.pa.us
[2] 
https://postgr.es/m/caj2pmkyckhfbd_omusvyhysqu0-j9t6nz0pl6pwbzsucohw...@mail.gmail.com
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 7ba3cf635b..34fe063632 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -5,8 +5,16 @@
  *
  * A bitmap set can represent any set of nonnegative integers, although
  * it is mainly intended for sets where the maximum value is not large,
- * say at most a few hundred.  By convention, we always represent the
- * empty set by a NULL pointer.
+ * say at most a few hundred.  By convention, we always represent a set with
+ * the minimum possible number of words, i.e, there are never any trailing
+ * zero words.  Enforcing this requires that an empty set is represented as
+ * NULL.  Because an empty Bitmapset is represented as NULL, a non-NULL
+ * Bitmapset always has at least 1 Bitmapword.  We can exploit this fact to
+ * speed up various loops over the Bitmapset's words array by using "do while"
+ * loops instead of "for" loops.  This means the code does not waste time
+ * checking the loop condition before the first iteration.  For Bitmapsets
+ * containing only a single word (likely the majority of them) this halves the
+ * number of loop condition checks.
  *
  *
  * Copyright (c) 2003-2023, PostgreSQL Global Development Group
@@ -64,8 +72,6 @@
 #error "invalid BITS_PER_BITMAPWORD"
 #endif
 
-static bool bms_is_empty_internal(const Bitmapset *a);
-
 
 /*
  * bms_copy - make a palloc'd copy of a bitmapset
@@ -85,20 +91,16 @@ bms_copy(const Bitmapset *a)
 }
 
 /*
- * bms_equal - are two bitmapsets equal?
- *
- * This is logical not physical equality; in particular, a NULL pointer will
- * be reported as equal to a palloc'd value containing no members.
+ * bms_equal - are two bitmapsets equal? or both NULL?
  */
 bool
 bms_equal(const Bitmapset *a, const Bitmapset *b)
 {
-   const Bitmapset *shorter;
-   const Bitmapset *longer;
-   int shortlen;
-   int longlen;
int i;
 
+   Assert(a == NULL || a->words[a->nwords - 1] != 0);
+   Assert(b == NULL || b->words[b->nwords - 1] != 0);
+
/* Handle cases where either input is NULL */
if (a == NULL)
{
@@ -108,30 +110,19 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
}
else if (b == NULL)
return false;
-   /* Identify shorter and longer input */
-   if (a->nwords <= b->nwords)
-   {
-   shorter = a;
-   longer = b;
-   }
-   else
-   {
-   shorter = b;
-   longer = a;
-   }
-   /* And process */
-   shortlen = shorter->nwords;
-   for (i = 0; i < shortlen; i++)
-   {
-   if (shorter->words[i] != longer->words[i])
-   return false;
-   }
-   longlen = longer->nwords;
-   for (; i < longlen; i++)
+
+   /* can't be equal if the word counts don't match */
+   if (a->nwords != b->nwords)
+   return false;
+
+   /* check each word matches */
+   i = 0;
+   do
{
-   if (longer->words[i] != 0)
+   if (a->words[i] != b->words[i])
return false;
-   }
+   } while (++i < a->nwords);
+
return true;
 }
 
@@ -146,36 +137,30 @@ bms_equal(const Bitmapset *a, const Bitmapset *b)
 int
 bms_compare(const Bitmapset *a, const Bitmapset *b)
 {
-   int shortlen;
int i;
 
+   Assert(a == NULL || a->words[a->nwords - 1] != 0);
+   Assert(b == NULL || b->words[b->nwords - 1] != 0);
+
/* Handle cases where either input is NULL */
if (a == NULL)
return (b == NULL) ? 0 : -1;
else if (b == NULL)
return +1;
-   /* Handle cases where one input is longer than the other */
-   shortlen = Min(a->nwords, b->nwords);
-   for (i = shortlen; i < a->nwords; i++)
-   {
-   if (a->words[i] != 0)
-   return +1;
-   }
-   for (i = shortlen; i < b->nwords; i++)
-   {
-   if (b->words[i] != 0)
-   return -1;
-   }
-   /* Process 

Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys

2023-07-02 Thread David Rowley
On Mon, 12 Jun 2023 at 20:20, Richard Guo  wrote:
>  So now the v2 patch looks good to me.

Thank you for reviewing this. I've just pushed the patch.

David




Re: [PATCH] Honor PG_TEST_NOCLEAN for tempdirs

2023-07-02 Thread Michael Paquier
On Fri, Jun 30, 2023 at 09:42:13AM +0200, Daniel Gustafsson wrote:
> Agreed, I'd prefer all branches to work the same for this.

Thanks, done this way across all the branches, then.

> Reading the patch, only one thing stood out:
> 
> -variable PG_TEST_NOCLEAN is set, data directories will be retained
> -regardless of test status.
> +variable PG_TEST_NOCLEAN is set, those directories will be retained
> 
> I would've written "the data directories" instead of "those directories" here.

Adjusted that as well, on top of an extra comment.
--
Michael




Re: Skip collecting decoded changes of already-aborted transactions

2023-07-02 Thread Masahiko Sawada
On Fri, Jun 23, 2023 at 12:39 PM Dilip Kumar  wrote:
>
> On Fri, Jun 9, 2023 at 10:47 AM Masahiko Sawada  wrote:
> >
> > Hi,
> >
> > In logical decoding, we don't need to collect decoded changes of
> > aborted transactions. While streaming changes, we can detect
> > concurrent abort of the (sub)transaction but there is no mechanism to
> > skip decoding changes of transactions that are known to already be
> > aborted. With the attached WIP patch, we check CLOG when decoding the
> > transaction for the first time. If it's already known to be aborted,
> > we skip collecting decoded changes of such transactions. That way,
> > when the logical replication is behind or restarts, we don't need to
> > decode large transactions that already aborted, which helps improve
> > the decoding performance.
> >
> +1 for the idea of checking the transaction status only when we need
> to flush it to the disk or send it downstream (if streaming in
> progress is enabled).  Although this check is costly, since we are
> planning it only for large transactions it is worth it if we can
> occasionally avoid disk or network I/O for the aborted transactions.
>

Thanks.

I've attached the updated patch. With this patch, we check the
transaction status only for large transactions at eviction time. For
regression test purposes, I disable this transaction status check when
logical_replication_mode is set to 'immediate'.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v2-0001-Skip-decoding-already-aborted-transactions.patch
Description: Binary data


Re: check_strxfrm_bug()

2023-07-02 Thread Thomas Munro
On Sun, Jul 2, 2023 at 4:25 AM Noah Misch  wrote:
> On Wed, Jun 28, 2023 at 01:02:21PM +1200, Thomas Munro wrote:
> > On Wed, Jun 28, 2023 at 11:03 AM Thomas Munro  
> > wrote:
> > > The GCC build farm has just received some SPARC hardware new enough to
> > > run modern Solaris (hostname gcc106), so if wrasse were moved over
> > > there we could finally assume all systems have POSIX 2008 (AKA
> > > SUSv4)'s locale_t.
> >
> > That would look something like this.
>
> This removes thirty-eight ifdefs, most of them located in the middle of
> function bodies.  That's far more beneficial than most proposals to raise
> minimum requirements.  +1 for revoking support for wrasse's OS version.
> (wrasse wouldn't move, but it would stop testing v17+.)

Great.  It sounds like I should wait a few days for any other feedback
and then push this patch.  Thanks for looking.




Re: ProcessStartupPacket(): database_name and user_name truncation

2023-07-02 Thread Kyotaro Horiguchi
At Fri, 30 Jun 2023 19:32:50 +0200, "Drouvot, Bertrand" 
 wrote in 
> Hi,
> 
> On 6/30/23 5:54 PM, Tom Lane wrote:
> > Nathan Bossart  writes:
> >> After taking another look at this, I wonder if it'd be better to fail
> >> as
> >> soon as we see the database or user name is too long instead of
> >> lugging
> >> them around when authentication is destined to fail.

For the record, if I understand Nathan correctly, it is what I
suggested in my initial post. If this is correct, +1 for the suggestion.

me> I think we might want to consider outright rejecting the
me> establishment of a connection when the given database name doesn't
me> fit the startup packet

> > If we're agreed that we aren't going to truncate these identifiers,
> > that seems like a reasonable way to handle it.
> > 
> 
> Yeah agree, thanks Nathan for the idea.
> I'll work on a new patch version proposal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: ProcessStartupPacket(): database_name and user_name truncation

2023-07-02 Thread Kyotaro Horiguchi
At Mon, 03 Jul 2023 10:50:45 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> For the record, if I understand Nathan correctly, it is what I
> suggested in my initial post. If this is correct, +1 for the suggestion.
> 
> me> I think we might want to consider outright rejecting the
> me> establishment of a connection when the given database name doesn't
> me> fit the startup packet

Mmm, that's a bit wrong: "doesn't fit the startup packet" should have
been "is too long for a database name".


At Sat, 1 Jul 2023 16:02:06 +0200, "Drouvot, Bertrand" 
 wrote in 
> Please find V2 attached where it's failing as soon as the database
> name or
> user name are detected as overlength.

I find another errcode, "ERRCODE_INVALID_ROLE_SPECIFICATION". I don't
find a clear distinction between the usages of the two, but I think
.._ROLE_.. might be a better fit.


ERRCODE_INVALID_ROLE_SPECIFICATION:
  auth.c:1507:  "could not translate name"
  auth.c:1526:  "could not translate name"
  auth.c:1539:  "realm name too long"
  auth.c:1554:  "translated account name too long"

ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION:
postmaster.c:2268:  "no PostgreSQL user name specified in startup packet"
miscinit.c:756: "role \"%s\" does not exist"
miscinit.c:764: "role with OID %u does not exist"
miscinit.c:794: "role \"%s\" is not permitted to log in"
auth.c:420: "connection requires a valid client certificate"
auth.c:461,468,528,536:  "pg_hba.conf rejects ..."
auth.c:878: MD5 authentication is not supported when 
\"db_user_namespace\" is enabled"
auth-scram.c:1016:  "SCRAM channel binding negotiation error"
auth-scram.c:1349:  "SCRAM channel binding check failed"

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-02 Thread Masahiko Sawada
Hi,

Commit 89e46da5e5 allowed us to use indexes for searching on REPLICA
IDENTITY FULL tables. The documentation explains:

When replica identity full is specified,
indexes can be used on the subscriber side for searching the rows.  Candidate
indexes must be btree, non-partial, and have at least one column reference
(i.e. cannot consist of only expressions).

To be exact, IIUC the column reference must be on the leftmost column
of indexes. Does it make sense to mention that? I've attached the
patch.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


doc_update.patch
Description: Binary data


Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-02 Thread Masahiko Sawada
Hi all,

While testing PG16, I observed that in PG16 there is a big performance
degradation in concurrent COPY into a single relation with 2 - 16
clients in my environment. I've attached a test script that measures
the execution time of COPYing 5GB data in total to the single relation
while changing the number of concurrent insertions, in PG16 and PG15.
Here are the results on my environment (EC2 instance, RHEL 8.6, 128
vCPUs, 512GB RAM):

* PG15 (4b15868b69)
PG15: nclients = 1, execution time = 14.181
PG15: nclients = 2, execution time = 9.319
PG15: nclients = 4, execution time = 5.872
PG15: nclients = 8, execution time = 3.773
PG15: nclients = 16, execution time = 3.202
PG15: nclients = 32, execution time = 3.023
PG15: nclients = 64, execution time = 3.829
PG15: nclients = 128, execution time = 4.111
PG15: nclients = 256, execution time = 4.158

* PG16 (c24e9ef330)
PG16: nclients = 1, execution time = 17.112
PG16: nclients = 2, execution time = 14.084
PG16: nclients = 4, execution time = 27.997
PG16: nclients = 8, execution time = 10.554
PG16: nclients = 16, execution time = 7.074
PG16: nclients = 32, execution time = 4.607
PG16: nclients = 64, execution time = 2.093
PG16: nclients = 128, execution time = 2.141
PG16: nclients = 256, execution time = 2.202

PG16 has better scalability (more than 64 clients) but it took much
more time than PG15, especially at 1 - 16 clients.

The relevant commit is 00d1e02be2 "hio: Use ExtendBufferedRelBy() to
extend tables more efficiently". With commit 1cbbee0338 (the previous
commit of 00d1e02be2), I got better numbers, though it didn't have
better scalability:

PG16: nclients = 1, execution time = 17.444
PG16: nclients = 2, execution time = 10.690
PG16: nclients = 4, execution time = 7.010
PG16: nclients = 8, execution time = 4.282
PG16: nclients = 16, execution time = 3.373
PG16: nclients = 32, execution time = 3.205
PG16: nclients = 64, execution time = 3.705
PG16: nclients = 128, execution time = 4.196
PG16: nclients = 256, execution time = 4.201

While investigating the cause, I found an interesting fact that in
mdzeroextend if I use only either FileFallocate() or FileZero, we can
get better numbers. For example, If I always use FileZero with the
following change:

@@ -574,7 +574,7 @@ mdzeroextend(SMgrRelation reln, ForkNumber forknum,
 * that decision should be made though? For now just use a cutoff of
 * 8, anything between 4 and 8 worked OK in some local testing.
 */
-   if (numblocks > 8)
+   if (false)
{
int ret;

I got:

PG16: nclients = 1, execution time = 16.898
PG16: nclients = 2, execution time = 8.740
PG16: nclients = 4, execution time = 4.656
PG16: nclients = 8, execution time = 2.733
PG16: nclients = 16, execution time = 2.021
PG16: nclients = 32, execution time = 1.693
PG16: nclients = 64, execution time = 1.742
PG16: nclients = 128, execution time = 2.180
PG16: nclients = 256, execution time = 2.296

After further investigation, the performance degradation comes from
calling posix_fallocate() (called via FileFallocate()) and pwritev()
(called via FileZero) alternately, depending on how many blocks we
extend by. And it happens only on the xfs filesystem. Does anyone
observe a similar performance issue with the attached benchmark
script?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
PG15_PGDATA="/home/masahiko/pgsql/15.s/data"
PG16_PGDATA="/home/masahiko/pgsql/16.s/data"
PG15_BIN="/home/masahiko/pgsql/15.s/bin"
PG16_BIN="/home/masahiko/pgsql/16.s/bin"

ROWS=$((100 * 1000 * 1000))
CLIENTS="1 2 4 8 16 32 64 128 256"

${PG15_BIN}/pg_ctl stop -D ${PG15_PGDATA} -mi
${PG16_BIN}/pg_ctl stop -D ${PG16_PGDATA} -mi
rm -rf $PG15_PGDATA $PG16_PGDATA

echo "initializing clusters ..."
${PG15_BIN}/initdb -D $PG15_PGDATA -E UTF8 --no-locale
${PG16_BIN}/initdb -D $PG16_PGDATA -E UTF8 --no-locale

cat <<EOF >> ${PG15_PGDATA}/postgresql.conf
port = 5515
max_wal_size = 50GB
shared_buffers = 20GB
max_connections = 500
EOF
cat <<EOF >> ${PG16_PGDATA}/postgresql.conf
port = 5516
max_wal_size = 50GB
shared_buffers = 20GB
max_connections = 500
EOF

if [ "$1" != "skip_file_init" ]; then
echo "prepare load files..."
${PG16_BIN}/pg_ctl start -D ${PG16_PGDATA}
for c in $CLIENTS
do
	rm -f /tmp/tmp_${c}.data
	${PG16_BIN}/psql -d postgres -p 5516 -X -c "copy (select generate_series(1, $ROWS / $c)) to '/tmp/tmp_${c}.data'"
done
${PG16_BIN}/pg_ctl stop -D ${PG16_PGDATA}
fi

echo "start benchmark ..."
#for version in PG15 PG16
for version in PG16
do
PSQL=""
if [ "$version" == "PG15" ]; then
	PSQL="${PG15_BIN}/psql -p 5515 -d postgres"
	${PG15_BIN}/pg_ctl start -D ${PG15_PGDATA}
else
	PSQL="${PG16_BIN}/psql -p 5516 -d postgres"
	${PG16_BIN}/pg_ctl start -D ${PG16_PGDATA}
fi

${PSQL} -c "create unlogged table test (c int) with (autovacuum_enabled = off)"

for c in $CLIENTS
do
	${PSQL} -c "truncate test" > /dev/null 2>&1

	children=()
	start=`date +%s.%3N`
	

Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-02 Thread Masahiko Sawada
On Mon, Jul 3, 2023 at 11:55 AM Masahiko Sawada  wrote:
>
> After further investigation, the performance degradation comes from
> calling posix_fallocate() (called via FileFallocate()) and pwritev()
> (called via FileZero) alternatively depending on how many blocks we
> extend by. And it happens only on the xfs filesystem.

FYI, the attached simple C program demonstrates that alternately
calling posix_fallocate() and pwrite() causes slow performance
in posix_fallocate():

$ gcc -o test test.c
$ time ./test test.1 1
total   20
fallocate   20
filewrite   0

real0m1.305s
user0m0.050s
sys 0m1.255s

$ time ./test test.2 2
total   20
fallocate   10
filewrite   10

real1m29.222s
user0m0.139s
sys 0m3.139s

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
char *filename = argv[1];
int ratio = atoi(argv[2]);
char block[8192] = {0};
int fd;
int total_len = 0;
int n_fallocate = 0;
int n_filewrite = 0;
int i;

fd = open(filename, O_RDWR | O_CREAT, S_IRWXU);
if (fd < 0)
{
fprintf(stderr, "could not open file %s: %m\n", filename);
return 1;
}

for (i = 0; i < 20; i++)
{
int ret;

if (ratio != 0 && i % ratio == 0)
{
posix_fallocate(fd, total_len, 8192);
n_fallocate++;
}
else
{
pwrite(fd, block, 8192, total_len);
n_filewrite++;
}
total_len += 8192;
}

printf("total\t%d\n", i);
printf("fallocate\t%d\n", n_fallocate);
printf("filewrite\t%d\n", n_filewrite);

close(fd);
return 0;
}

Re: pgsql: Fix search_path to a safe value during maintenance operations.

2023-07-02 Thread Noah Misch
On Fri, Jun 30, 2023 at 11:35:46AM -0700, Nathan Bossart wrote:
> On Thu, Jun 29, 2023 at 10:09:21PM -0700, Nathan Bossart wrote:
> > On Thu, Jun 29, 2023 at 08:53:56PM -0400, Tom Lane wrote:
> >> I'm leaning to Robert's thought that we need to revert this for now,
> >> and think harder about how to make it work cleanly and safely.

Another dimension of compromise could be to make MAINTAIN affect fewer
commands in v16.  Un-revert the part of commit 05e1737 affecting just the
commands it still affects.  For example, limit MAINTAIN and the 05e1737
behavior change to VACUUM, ANALYZE, and REINDEX.  Don't worry about VACUUM or
ANALYZE failing under commit 05e1737, since they would have been failing under
autovacuum since 2018.  A problem index expression breaks both autoanalyze and
REINDEX, hence the inclusion of REINDEX.  The already-failing argument doesn't
apply to CLUSTER or REFRESH MATERIALIZED VIEW, so those commands could join
MAINTAIN in v17.

> > Since it sounds like this is headed towards a revert, here's a patch for
> > removing MAINTAIN and pg_maintain.
> 
> I will revert this next week unless opinions change before then.  I'm
> currently planning to revert on both master and REL_16_STABLE, but another
> option would be to keep it on master while we sort out the remaining
> issues.  Thoughts?

From my reading of the objections, I think they're saying that commit 05e1737
arrived too late and that MAINTAIN is unacceptable without commit 05e1737.  I
think you'd conform to all objections by pushing the revert to v16 and pushing
a roll-forward of commit 05e1737 to master.




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-02 Thread Amit Kapila
On Wed, Jun 28, 2023 at 12:02 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > I have analyzed how we handle this. Please see attached the patch (0003) 
> > > which
> > > allows reusing connection.
> > >
> >
> > Why did you change the application name during the connection?
>
> It was because the lifetime of the tablesync worker is longer than the
> slot's, and the tablesync worker creates temporary replication slots many
> times, per the target relation. The name of each slot contains the relid,
> so I thought that it was not suitable.
>

Okay, but let's try to give a unique application name to each
tablesync worker for the purpose of pg_stat_activity and synchronous
replication (as mentioned in existing comments as well). One idea is
to generate a name like pg__sync_ but feel free
to suggest if you have any better ideas.

> But in the later patch the tablesync worker tries to reuse the slot during the
> synchronization, so in this case the application_name should be same as 
> slotname.
>

Fair enough. I am slightly afraid that if we can't show the benefits
with later patches then we may need to drop them but at this stage I
feel we need to investigate why those are not helping?

-- 
With Regards,
Amit Kapila.




Re: Cleaning up array_in()

2023-07-02 Thread jian he
On Mon, Jun 5, 2023 at 9:48 AM Nikhil Benesch  wrote:
>
> I took a look at 0002 because I attempted a similar but more surgical
> fix in [0].
>
> I spotted a few opportunities for further reducing state tracked by
> `ArrayCount`. You may not find all of these suggestions to be
> worthwhile.

I pulled ArrayCount into a separate C function and regression-tested it
(manually) based on the patch's regression tests.

> 1) `in_quotes` appears to be wholly redundant with `parse_state ==
> ARRAY_QUOTED_ELEM_STARTED`.

removing it works as expected.

> 3) `eoArray` could be replaced with a new `ArrayParseState` of
> `ARRAY_END`. Just a matter of taste, but "end of array" feels like a
> parser state to me.

Works as expected (eliminates one variable).

> I also have a sense that `ndims_frozen` made the distinction between
> `ARRAY_ELEM_DELIMITED` and `ARRAY_LEVEL_DELIMITED` unnecessary, and
> the two states could be merged into a single `ARRAY_DELIMITED` state,
> but I've not pulled on this thread hard enough to say so confidently.

merging these states into one works as expected.




Re: Introduce WAIT_EVENT_EXTENSION and WAIT_EVENT_BUFFER_PIN

2023-07-02 Thread Michael Paquier
On Fri, Jun 09, 2023 at 02:58:42PM +0900, Michael Paquier wrote:
> FWIW, I'm OK with the patch, without the need to worry about the
> performance.  Even if that's the case, we could just mark all these as
> inline and move on..

I am attempting to get all that done for the beginning of the
development cycle, so the first refactoring patch has been applied.
Now into the second, main one..
--
Michael




Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2023-07-02 Thread Michael Paquier
On Fri, Jun 02, 2023 at 11:23:19PM +0200, Daniel Gustafsson wrote:
> Absolutely, let's keep these goalposts in place and deal with that separately.

I have not gone back to this part yet, though I plan to do so.  As we
are at the beginning of the development cycle, I have applied the
patch to remove support for 1.0.1 for now on HEAD.  Let's see what the
buildfarm tells us.
--
Michael




Re: Optionally using a better backtrace library?

2023-07-02 Thread Kyotaro Horiguchi
At Sun, 2 Jul 2023 11:31:56 -0700, Andres Freund  wrote in 
> The state I currently have is very hacky, but if there's interest in
> upstreaming something like this, I could clean it up.

I can't help voting +1.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: several attstattarget-related improvements

2023-07-02 Thread Peter Eisentraut

On 28.06.23 23:30, Tomas Vondra wrote:

On 6/28/23 16:52, Peter Eisentraut wrote:

Here are a few patches related to attstattarget:

- 0001: Change type of pg_statistic_ext.stxstattarget, to match
attstattarget.  Maybe this should go into PG16, for consistency?

- 0002: Add macro for maximum statistics target, instead of hardcoding
it everywhere.

- 0003: Take pg_attribute out of VacAttrStats.  This simplifies some
code, especially for extended statistics, which had to have weird
workarounds.


+1 to all three patches

Not sure about 0001 vs PG16, it'd require catversion bump.


committed (to PG17 for now)





Re: Assert !bms_overlap(joinrel->relids, required_outer)

2023-07-02 Thread Richard Guo
On Fri, Jun 30, 2023 at 11:00 PM Tom Lane  wrote:

> Richard Guo  writes:
> > I think it just makes these two assertions meaningless to skip it for
> > non-nestloop joins if the input paths are for otherrels, because paths
> > would never be parameterized by the member relations.  So these two
> > assertions would always be true for otherrel paths.  I think this is why
> > we have not noticed any problem by now.
>
> After studying this some more, I think that maybe it's the "run
> parameterization tests on the parent relids" bit that is misguided.
> I believe the way it's really working is that all paths arriving
> here are parameterized by top parents, because that's the only thing
> we generate to start with.  A path can only become parameterized
> by an otherrel when we apply reparameterize_path_by_child to it.
> But the only place that happens is in try_nestloop_path itself
> (or try_partial_nestloop_path), and then we immediately wrap it in
> a nestloop join node, which becomes a child of an append that's
> forming a partitionwise join.  The partitionwise join as a
> whole won't be parameterized by any child rels.  So I think that
> a path that's parameterized by a child rel can't exist "in the wild"
> in a way that would allow it to get fed to one of the try_xxx_path
> functions.  This explains why the seeming oversights in the merge
> and hash cases aren't causing a problem.
>
> If this theory is correct, we could simplify try_nestloop_path a
> bit.  I doubt the code savings would matter, but maybe it's
> worth changing for clarity's sake.


Yeah, I think this theory is correct that all paths arriving at
try_xxx_path are parameterized by top parents.  But I do not get how to
simplify try_nestloop_path on the basis of that.  Would you please
elaborate on that?

Thanks
Richard


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-02 Thread Amit Kapila
On Mon, Jul 3, 2023 at 9:42 AM Amit Kapila  wrote:
>
> On Wed, Jun 28, 2023 at 12:02 PM Hayato Kuroda (Fujitsu)
>  wrote:
>
> > But in the later patch the tablesync worker tries to reuse the slot during 
> > the
> > synchronization, so in this case the application_name should be same as 
> > slotname.
> >
>
> Fair enough. I am slightly afraid that if we can't show the benefits
> with later patches then we may need to drop them but at this stage I
> feel we need to investigate why those are not helping?
>

On thinking about this, I think the primary benefit we were expecting
was saving network round trips for slot drop/create, but now that we
anyway need an extra round trip to establish a snapshot, such a
benefit was not visible. This is just a theory, so we should validate
it. Another idea, as discussed before [1], could be to try copying
multiple tables in a single transaction. Now, keeping a transaction
open for a longer time could have side-effects on the publisher node.
So, we probably need to ensure that we don't perform multiple large
syncs in one transaction, and even for smaller tables (and later
sequences) perform it only for some threshold number of tables, which
we can figure out by some tests. Also, another safety check could be
that anytime we need to perform streaming (sync with the apply
worker), we won't copy more tables in the same transaction.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAGPVpCRWEVhXa7ovrhuSQofx4to7o22oU9iKtrOgAOtz_%3DY6vg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Making empty Bitmapsets always be NULL

2023-07-02 Thread Yuya Watari
Hello,

On Mon, Jul 3, 2023 at 9:10 AM David Rowley  wrote:
> Here's the patch which includes those Asserts.  I also made some small
> tweaks to a comment.

Thank you for your reply. I am +1 to your change. I think these
assertions will help someone who changes the Bitmapset implementations
in the future.

-- 
Best regards,
Yuya Watari




Re: Make uselocale protection more consistent

2023-07-02 Thread Peter Eisentraut

On 27.06.23 17:02, Tristan Partin wrote:

This is a patch which implements an issue discussed in bug #17946[0]. It
doesn't fix the overarching issue of the bug, but merely a consistency
issue which was found while analyzing code by Heikki. I had originally
submitted the patch within that thread, but for visibility and the
purposes of the commitfest, I have re-sent it in its own thread.

[0]: 
https://www.postgresql.org/message-id/49dfcad8-90fa-8577-008f-d142e61af...@iki.fi


I notice that HAVE_USELOCALE was introduced much later than 
HAVE_LOCALE_T, and at the time the code was already using uselocale(), 
so perhaps the introduction of HAVE_USELOCALE was unnecessary and should 
be reverted.


I think it would be better to keep HAVE_LOCALE_T as encompassing any of 
the various locale_t-using functions, rather than using HAVE_USELOCALE 
as a proxy for them.  Otherwise you create weird situations like having 
#ifdef HAVE_WCSTOMBS_L inside #ifdef HAVE_USELOCALE, which doesn't make 
sense, I think.






Re: Fix last uninitialized memory warning

2023-07-02 Thread Peter Eisentraut

On 07.06.23 16:31, Tristan Partin wrote:

This patch is really not necessary from a functional point of view. It
is only necessary if we want to silence a compiler warning.

Tested on `gcc (GCC) 13.1.1 20230511 (Red Hat 13.1.1-2)`.

After silencing this warning, all I am left with (given my build
configuration) is:

[1667/2280] Compiling C object src/pl/plpgsql/src/plpgsql.so.p/pl_exec.c.o
In file included from ../src/include/access/htup_details.h:22,
  from ../src/pl/plpgsql/src/pl_exec.c:21:
In function ‘assign_simple_var’,
 inlined from ‘exec_set_found’ at ../src/pl/plpgsql/src/pl_exec.c:8349:2:
../src/include/varatt.h:230:36: warning: array subscript 0 is outside array 
bounds of ‘char[0]’ [-Warray-bounds=]
   230 | (((varattrib_1b_e *) (PTR))->va_tag)
   |^
../src/include/varatt.h:94:12: note: in definition of macro ‘VARTAG_IS_EXPANDED’
94 | (((tag) & ~1) == VARTAG_EXPANDED_RO)
   |^~~
../src/include/varatt.h:284:57: note: in expansion of macro ‘VARTAG_1B_E’
   284 | #define VARTAG_EXTERNAL(PTR)
VARTAG_1B_E(PTR)
   | ^~~
../src/include/varatt.h:301:57: note: in expansion of macro ‘VARTAG_EXTERNAL’
   301 | (VARATT_IS_EXTERNAL(PTR) && 
!VARTAG_IS_EXPANDED(VARTAG_EXTERNAL(PTR)))
   | ^~~
../src/pl/plpgsql/src/pl_exec.c:8537:17: note: in expansion of macro 
‘VARATT_IS_EXTERNAL_NON_EXPANDED’
  8537 | 
VARATT_IS_EXTERNAL_NON_EXPANDED(DatumGetPointer(newvalue)))
   | ^~~

 From my perspective, this warning definitely seems like a false
positive, but I don't know the code well enough to say that for certain.


I cannot reproduce this warning with gcc-13.  Are you using any 
non-standard optimization options?  Could you give your full configure 
and build commands and the OS?






Re: Make uselocale protection more consistent

2023-07-02 Thread Thomas Munro
On Mon, Jul 3, 2023 at 6:13 PM Peter Eisentraut  wrote:
> On 27.06.23 17:02, Tristan Partin wrote:
> > This is a patch which addresses an issue discussed in bug #17946[0]. It
> > doesn't fix the overarching issue of the bug, but merely a consistency
> > issue which Heikki found while analyzing the code. I had originally
> > submitted the patch within that thread, but for visibility and the
> > purposes of the commitfest, I have re-sent it in its own thread.
> >
> > [0]: 
> > https://www.postgresql.org/message-id/49dfcad8-90fa-8577-008f-d142e61af...@iki.fi
>
> I notice that HAVE_USELOCALE was introduced much later than
> HAVE_LOCALE_T, and at the time the code was already using uselocale(),
> so perhaps the introduction of HAVE_USELOCALE was unnecessary and should
> be reverted.
>
> I think it would be better to keep HAVE_LOCALE_T as encompassing any of
> the various locale_t-using functions, rather than using HAVE_USELOCALE
> as a proxy for them.  Otherwise you create weird situations like having
> #ifdef HAVE_WCSTOMBS_L inside #ifdef HAVE_USELOCALE, which doesn't make
> sense, I think.

I propose[1] that we get rid of HAVE_LOCALE_T completely and make
"libc" provider support unconditional.  It's standardised, and every
target system has it, even Windows.  But Windows doesn't have
uselocale().

[1] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGL7CmmzeRhoirzjECmOdABVFTn8fo6gEOaFRF1Oxey6Hw%40mail.gmail.com#aef2f2274b28ff8a36f9b8a598e3cec0




Re: Add information about command path and version of flex in meson output

2023-07-02 Thread Peter Eisentraut

On 11.04.23 07:58, Michael Paquier wrote:

While doing a few things on Windows with meson, I have noticed that,
while we output some information related to bison after a setup step,
there is nothing about flex.

I think that adding something about flex in the "Programs" section
would be pretty useful, particularly for Windows as the command used
could be "flex" as much as "win_flex.exe".


I think this would be useful.

> +  flex_version = run_command(flex, '--version', check: true)
> +  flex_version = flex_version.stdout().split(' ')[1].split('\n')[0]

Maybe this could be combined into one command?

Looks good otherwise.
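If helpful, a sketch of what combining the two assignments might look like
(untested; it assumes meson's string methods chain as shown and that the
version number is the second whitespace-separated token on the first line
of the output):

```meson
# single chained expression instead of two assignments
flex_version = run_command(flex, '--version',
  check: true).stdout().split(' ')[1].split('\n')[0]
```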





Re: Should heapam_estimate_rel_size consider fillfactor?

2023-07-02 Thread Peter Eisentraut

On 14.06.23 20:41, Corey Huinker wrote:

So maybe we should make table_block_relation_estimate_size smarter to
also consider the fillfactor in the "no statistics" branch, per the
attached patch.


I like this a lot. The reasoning is obvious, the fix is simple, it 
doesn't upset any make-check-world tests, and in order to get a 
performance regression we'd need a table whose fillfactor has been 
changed after the data was loaded but before an analyze happens, and 
that's a narrow enough case to accept.


My only nitpick is to swap

(usable_bytes_per_page * fillfactor / 100) / tuple_width

with

(usable_bytes_per_page * fillfactor) / (tuple_width * 100)


as this will eliminate the extra remainder truncation, and it also gets 
the arguments "in order" algebraically.


The fillfactor is in percent, so it makes sense to divide it by 100 
first before doing any further arithmetic with it.  I find your version 
of this to be more puzzling without any explanation.  You could do 
fillfactor/100.0 to avoid the integer division, but then, the comment 
above that says "integer division is intentional here", without any 
further explanation.  I think a bit more explanation of all the 
subtleties here would be good in any case.






Re: Fix regression tests to work with REGRESS_OPTS=--no-locale

2023-07-02 Thread Peter Eisentraut

On 20.06.23 05:46, Michael Paquier wrote:

On Thu, Jun 15, 2023 at 03:52:13PM +0900, Michael Paquier wrote:

It took some time to notice that, which makes me wonder how relevant
this stuff is these days...  Anyway, I would like to do as the others
did and fix it, so I am proposing the attached.


Please find attached a v2 that removes the ENCODING and NO_LOCALE
flags from meson.build and Makefile, that I forgot to clean up
previously.  Note that I am not planning to do anything here until at
least v17 opens for business.


I think it makes sense to make those checks consistent.





Re: Autogenerate some wait events code and documentation

2023-07-02 Thread Michael Paquier
On Thu, May 18, 2023 at 01:36:30PM +0900, Michael Paquier wrote:
> The order looks fine seen from here, thanks!

Now that v17 is open for business, I have looked again at this patch.

perlcritic is formulating three complaints:
./src/backend/utils/activity/generate-waiteventtypes.pl: Loop iterator
is not lexical at line 99, column 1.  See page 108 of PBP.
([Variables::RequireLexicalLoopIterators] Severity: 5)
./src/backend/utils/activity/generate-waiteventtypes.pl: Loop iterator
is not lexical at line 126, column 1.  See page 108 of PBP.
([Variables::RequireLexicalLoopIterators] Severity: 5)
./src/backend/utils/activity/generate-waiteventtypes.pl: Loop iterator
is not lexical at line 181, column 1.  See page 108 of PBP.
([Variables::RequireLexicalLoopIterators] Severity: 5)

These are caused by three foreach loops, where perlcritic wants a
lexical declaration for the iterator variables.
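
The fix for these is mechanical: declare the iterator with "my" so it is
lexically scoped to the loop. A minimal illustration (the hash and
variable names here are made up, not taken from the script):

```perl
use strict;
use warnings;

my %wait_classes = (Activity => 1, Client => 2);    # hypothetical data

# RequireLexicalLoopIterators is satisfied by "my" on the iterator:
foreach my $class (sort keys %wait_classes)
{
    print "$class\n";
}
```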

The indentation was a bit off as well; perltidy v20230309 reported a
few diffs.  Not a big deal.

src/common/meson.build includes the following comment:
# For the server build of pgcommon, depend on lwlocknames_h, because at least
# cryptohash_openssl.c, hmac_openssl.c depend on it. That's arguably a
# layering violation, but ...

The thing is that controldata_utils.c has a dependency on wait events,
so we should add wait_event_types_h to 'sources'.
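
A sketch of the kind of change meant here, assuming the generated-header
variable is exposed under that name (not verified against the tree):

```meson
# src/common/meson.build: the server build of pgcommon needs the
# generated wait-event header before compiling controldata_utils.c
sources += wait_event_types_h
```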

The names chosen (wait_event_types.h, pgstat_wait_event.c,
waiteventnames.txt and generate-wait_event_types.pl) are inconsistent,
comparing them for instance with the lwlock parts.  I have renamed
these a bit, with more underscores.

Building the documentation in a meson/ninja build can be done with the
following command run from the root of the build directory:
ninja alldocs

However this command fails with v10.  The step that fails is:
[6/14] Generating doc/src/sgml/postgres-full.xml with a custom command

It seems to me that the correct thing to do is to add --outdir
@OUTDIR@ to the command?  However, I do see a few problems even after
that:
- The C and H files are still generated in doc/src/sgml/, which is
useless.
- The SGML file wait_event_types.sgml in doc/src/sgml/ seems to be
empty; still, to my surprise, the HTML part was created correctly.
- The SGML file is not needed for the C code.

I think that we should add some options to the perl script to be more
selective with the files generated.  How about having two options
called --docs and --code to select one or the other, then limit what
gets generated in each path?  I guess that it would be cleaner if we
error in the case where both options are defined, and just use some
gotos to redirect to each foreach loop to limit extra indentations in
the script.  Additionally, this would avoid the need to remove the C and
H files from the docs build, which is what the Makefile in doc/ does.
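
One way the option handling could look, sketched with Getopt::Long (the
option and variable names are only a guess at the eventual interface):

```perl
use strict;
use warnings;
use Getopt::Long;

my $gen_docs = 0;
my $gen_code = 0;

GetOptions(
	'docs' => \$gen_docs,
	'code' => \$gen_code) or die "invalid options\n";

# Erroring out when both are given keeps each invocation's set of
# generated files unambiguous.
die "--docs and --code are mutually exclusive\n"
  if $gen_docs && $gen_code;

print $gen_docs ? "generating sgml\n" : "generating code\n";
```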

I have fixed all the issues I've found in v11 attached, except for the
last one (I have added the OUTDIR trick for reference, but that's
incorrect and incomplete).  Could you look at this part?
--
Michael
From d646b6ba4ec743b5294a51875813d614276fd805 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 3 Jul 2023 15:51:40 +0900
Subject: [PATCH v11] Generate automatically wait-event related code and docs

Purpose is to auto-generate those files based on the newly created
waiteventnames.txt file.

Having auto generated files would help to avoid:

- wait event without description like observed in
- https://www.postgresql.org/message-id/CA%2BhUKGJixAHc860Ej9Qzd_z96Z6aoajAgJ18bYfV3Lfn6t9%3D%2BQ%40mail.gmail.com
- orphaned wait event like observed in
- https://www.postgresql.org/message-id/CA%2BhUKGK6tqm59KuF1z%2Bh5Y8fsWcu5v8%2B84kduSHwRzwjB2aa_A%40mail.gmail.com
---
 src/include/Makefile  |2 +-
 src/include/utils/.gitignore  |1 +
 src/include/utils/meson.build |   18 +
 src/include/utils/wait_event.h|  232 +--
 src/backend/Makefile  |   13 +-
 src/backend/storage/lmgr/lwlocknames.txt  |4 +-
 src/backend/utils/activity/.gitignore |3 +
 src/backend/utils/activity/Makefile   |   10 +
 .../activity/generate-wait_event_types.pl |  219 +++
 src/backend/utils/activity/meson.build|   24 +
 src/backend/utils/activity/wait_event.c   |  611 +---
 .../utils/activity/wait_event_names.txt   |  371 +
 src/backend/utils/adt/lockfuncs.c |3 +-
 src/common/meson.build|7 +-
 src/test/ssl/t/002_scram.pl   |3 +-
 doc/src/sgml/.gitignore   |1 +
 doc/src/sgml/Makefile |5 +-
 doc/src/sgml/filelist.sgml|1 +
 doc/src/sgml/meson.build  |   12 +
 doc/src/sgml/monitoring.sgml  | 1271 +
 src/tools/msvc/Solution.pm|   19 +
 src/tools/msvc/clean.bat  |3 +
 22 files changed, 722 insertions(+)