Re: Todo: Teach planner to evaluate multiple windows in the optimal order
On Wed, 15 Feb 2023 at 17:23, John Naylor wrote:
> HEAD:
>
> 4 ^ 8: latency average = 113.976 ms
> 5 ^ 8: latency average = 783.830 ms
> 6 ^ 8: latency average = 3990.351 ms
> 7 ^ 8: latency average = 15793.629 ms
>
> Skip rechecking first key:
>
> 4 ^ 8: latency average = 107.028 ms
> 5 ^ 8: latency average = 732.327 ms
> 6 ^ 8: latency average = 3709.882 ms
> 7 ^ 8: latency average = 14570.651 ms

Thanks for testing that. It's good to see improvements in each of them.

> I gather that planner hack was just a demonstration, so I didn't test it, but
> if that was a move toward something larger I can run additional tests.

Yeah, just a hack. My intention with it was just to prove we had a problem because sometimes Sort -> Incremental Sort was faster than Sort. Ideally, with your change, we'd see that it's always faster to do the full sort in one go. It would be good to see your patch with and without the planner hack patch to ensure sort is now always faster than sort -> incremental sort.

David
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
I wrote:

> it might be worthwhile to "zoom in" with more measurements, but haven't done that yet.

I've attached the script and image for 1 million / random / varying the mod by quarter-log intervals. Unfortunately I didn't get as good results as yesterday. Immediately going from mod 1 to mod 2, sort pushdown regresses sharply and stays regressed up until 1. The tiebreaker patch helps but never removes the regression.

I suspect that I fat-fingered work_mem yesterday, so next I'll pick a badly-performing mod like 32, then range over work_mem and see if that explains anything, especially whether L3 effects are in fact more important in this workload.

--
John Naylor
EDB: http://www.enterprisedb.com

bench_windowsort-jcn-random-finegrained.sh Description: application/shellscript
Fix the description of GUC "max_locks_per_transaction" and "max_pred_locks_per_transaction" in guc_table.c
Hi,

When I refer to the GUC "max_locks_per_transaction", I find that the description of the shared lock table size in pg-doc[1] is inconsistent with the code (guc_table.c). BTW, the GUC "max_predicate_locks_per_xact" has similar problems. I think the descriptions in pg-doc are correct.

- GUC "max_locks_per_transaction"
  Please refer to the macro "NLOCKENTS" used to obtain max_table_size in the function InitLocks.
- GUC "max_predicate_locks_per_xact"
  Please refer to the macro "NPREDICATELOCKTARGETENTS" used to obtain max_table_size in the function InitPredicateLocks.

Attached is a patch to fix the descriptions of these two GUCs in guc_table.c.

[1] - https://www.postgresql.org/docs/devel/runtime-config-locks.html

Regards,
Wang Wei

v1-0001-Fix-the-description-of-GUC-max_locks_per_trans.patch Description: v1-0001-Fix-the-description-of-GUC-max_locks_per_trans.patch
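[Editor's note] For readers following along, the sizing logic the report points at reduces to these two macros. This is quoted from memory of recent sources, so treat the exact spelling as approximate rather than authoritative:

	/* src/include/storage/lock.h: shared lock table sizing */
	#define NLOCKENTS() \
		mul_size(max_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))

	/* src/backend/storage/lmgr/predicate.c: predicate lock table sizing */
	#define NPREDICATELOCKTARGETENTS() \
		mul_size(max_predicate_locks_per_xact, add_size(MaxBackends, max_prepared_xacts))

That is, both tables are sized as the GUC multiplied by (MaxBackends + max_prepared_xacts), which matches the pg-doc wording that the report says the guc_table.c one-line descriptions should follow.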
Re: Reconcile stats in find_tabstat_entry() and get rid of PgStat_BackendFunctionEntry
Hi, On 2/15/23 1:56 AM, Kyotaro Horiguchi wrote: At Tue, 14 Feb 2023 15:43:26 +0100, "Drouvot, Bertrand" wrote in The comment assumed that the function directly returns an entry from shared memory, but now it copies the entry's contents into a palloc'ed memory and stores the sums of some counters for the current transaction in it. Do you think we should update the comment to reflect this change? Good point, thanks! Yeah, definitively, done in V5 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c index f793ac1516..b26e2a5a7a 100644 --- a/src/backend/utils/activity/pgstat_relation.c +++ b/src/backend/utils/activity/pgstat_relation.c @@ -471,20 +471,46 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid) * Find any existing PgStat_TableStatus entry for rel_id in the current * database. If not found, try finding from shared tables. * + * If an entry is found, copy it and increment the copy's counters with their + * subtransactions counterparts. Then return the copy. There is no need for the + * caller to pfree the copy as the MemoryContext will be reset soon after. + * * If no entry found, return NULL, don't create a new one */ PgStat_TableStatus * find_tabstat_entry(Oid rel_id) { PgStat_EntryRef *entry_ref; + PgStat_TableXactStatus *trans; + PgStat_TableStatus *tabentry = NULL; + PgStat_TableStatus *tablestatus = NULL; entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id); if (!entry_ref) + { entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id); + if(!entry_ref) + return tablestatus; + } + + tabentry = (PgStat_TableStatus *) entry_ref->pending; + tablestatus = palloc(sizeof(PgStat_TableStatus)); + *tablestatus = *tabentry; + + /* +* Live subtransactions' counts aren't in t_counts yet. This is not a hot +* code path so it sounds ok to reconcile for tuples_inserted, +* tuples_updated and tuples_deleted even if this is not what the caller +* is interested in. 
+*/ + for (trans = tabentry->trans; trans != NULL; trans = trans->upper) + { + tablestatus->t_counts.t_tuples_inserted += trans->tuples_inserted; + tablestatus->t_counts.t_tuples_updated += trans->tuples_updated; + tablestatus->t_counts.t_tuples_deleted += trans->tuples_deleted; + } - if (entry_ref) - return entry_ref->pending; - return NULL; + return tablestatus; } /* diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 9d707c3521..405d080daa 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -1548,17 +1548,11 @@ pg_stat_get_xact_tuples_inserted(PG_FUNCTION_ARGS) Oid relid = PG_GETARG_OID(0); int64 result; PgStat_TableStatus *tabentry; - PgStat_TableXactStatus *trans; if ((tabentry = find_tabstat_entry(relid)) == NULL) result = 0; else - { - result = tabentry->t_counts.t_tuples_inserted; - /* live subtransactions' counts aren't in t_tuples_inserted yet */ - for (trans = tabentry->trans; trans != NULL; trans = trans->upper) - result += trans->tuples_inserted; - } + result = (int64) (tabentry->t_counts.t_tuples_inserted); PG_RETURN_INT64(result); } @@ -1569,17 +1563,11 @@ pg_stat_get_xact_tuples_updated(PG_FUNCTION_ARGS) Oid relid = PG_GETARG_OID(0); int64 result; PgStat_TableStatus *tabentry; - PgStat_TableXactStatus *trans; if ((tabentry = find_tabstat_entry(relid)) == NULL) result = 0; else - { - result = tabentry->t_counts.t_tuples_updated; - /* live subtransactions' counts aren't in t_tuples_updated yet */ - for (trans = tabentry->trans; trans != NULL; trans = trans->upper) - result += trans->tuples_updated; - } + result = (int64) (tabentry->t_counts.t_tuples_updated); PG_RETURN_INT64(result); } @@ -1590,17 +1578,11 @@ pg_stat_get_xact_tuples_deleted(PG_FUNCTION_ARGS) Oid relid = PG_GETARG_OID(0); int64 result; PgStat_TableStatus *tabentry; - PgStat_TableXactStatus *trans; if ((tabentry = find_tabstat_entry(relid)) == NULL) result = 0; else - { - result = tabentry->t_counts.t_tuples_deleted; - /* live subtransactio
Re: Support logical replication of DDLs
On 2023-Feb-15, Peter Smith wrote: > On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian wrote: > > > > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith wrote: > > > 3. ExecuteGrantStmt > > > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > > + istmt.grantor_uid = grantor; > > > > > > SUGGESTION (comment) > > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > > > didn't change this, as Alvaro said this was not a parsetree. > > Perhaps there is more to do here? Sorry, I did not understand the > details of Alvaro's post [1], but I did not recognize the difference > between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression > either one or both of these places are either wrongly commented, or > maybe are doing something that should not be done. These two cases are different. In ExecGrantStmt we're adding the grantor OID to the InternalGrant struct, which is not a parse node, so there's no strong reason not to modify it (and also the suggested comment change is factually wrong). I don't know if the idea is great, but at least I see no strong objection. In the other case, as I said in [1], the patch proposes to edit the parse node to add the grantor, but I think a better idea might be to change the signature to ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider) so that the function can set the provider there; and caller passes &secondaryObject, which is the method we adopted for this kind of thing. [1] https://postgr.es/m/20230213090752.27ftbb6byiw3qcbl@alvherre.pgsql -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Use of additional index columns in rows filtering
Hi All,

I'd like to report what seems to be a missing optimization opportunity, or understand why it is not possible to achieve.

TLDR; an additional index column B specified in CREATE INDEX .. (A) INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 ORDER BY A during an IndexScan. https://dbfiddle.uk/iehtq44L

Take the following database:

CREATE TABLE t(
  a integer NOT NULL,
  b integer NOT NULL,
  d integer NOT NULL
);

CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
-- I'd expect the index above to behave as the index below for the purpose
-- of this query
-- CREATE UNIQUE INDEX ON t(a,b);

INSERT INTO t(
  SELECT random() * 1 as a, random() * 3 as b, generate_series as d
  FROM generate_series(1,20)
) ON CONFLICT DO NOTHING;

If we filter on `a` and `b` columns while scanning an index created as `(a) INCLUDE (b)`, it seems to be fetching tuples from the heap to check the condition `b = 4` despite both columns being available in the index:

SELECT * FROM t WHERE a > 100 and b = 4 ORDER BY a ASC LIMIT 10;

Here is the plan (notice the high "shared hit"):

Limit (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
  Output: a, b, d
  Buffers: shared hit=198307
  -> Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1)
       Output: a, b, d
       Index Cond: (t.a > 100)
       Filter: (t.b = 4)
       Rows Removed by Filter: 197805
       Buffers: shared hit=198307
Planning:
  Buffers: shared hit=30
Planning Time: 0.201 ms
Execution Time: 84.303 ms

And here is the plan with the index on (a,b):

Limit (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
  Output: a, b, d
  Buffers: shared hit=613
  -> Index Scan using t_a_b_idx on public.t (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
       Output: a, b, d
       Index Cond: ((t.a > 100) AND (t.b = 4))
       Buffers: shared hit=613
Planning:
  Buffers: shared hit=41
Planning Time: 0.314 ms
Execution Time: 6.910 ms

Because the query doesn't sort on `b`, only filters on it while sorting on `a`, I'd expect the indexes `(a) INCLUDE (b)` and `(a,b)` to behave exactly the same with this particular query.

Interestingly, IndexOnlyScan is capable of using additional columns to filter rows without fetching them from the heap, but only for visible tuples:

VACUUM FREEZE t;
SELECT a,b FROM t WHERE a > 100 and b = 4 ORDER BY a ASC LIMIT 10;

Limit (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
  Output: a, b
  Buffers: shared hit=662
  -> Index Only Scan using t_a_include_b on public.t (cost=0.42..6619.76 rows=1 width=8) (actual time=18.477..18.477 rows=0 loops=1)
       Output: a, b
       Index Cond: (t.a > 100)
       Filter: (t.b = 4)
       Rows Removed by Filter: 197771
       Heap Fetches: 0
       Buffers: shared hit=662

Removing VACUUM makes it behave like IndexScan and fetch candidate tuples from the heap, all while returning zero rows in the result.

To make the query plans comparable I had to force index scans on both with:

SET enable_bitmapscan to off;
SET enable_seqscan to off;
SET max_parallel_workers_per_gather = 0;

A self-contained, fully reproducible example is in https://dbfiddle.uk/iehtq44L

Regards,
Maxim
Re: [PATCH] Add pretty-printed XML output option
On Wed, Feb 15, 2023 at 6:10 PM Jim Jones wrote:
>
> On 15.02.23 02:09, Peter Smith wrote:
> > With v8, in my environment, in psql I see something slightly different:
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR: invalid XML document
> > DETAIL: line 1: switching encoding : no input
> > line 1: Document is empty
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR: invalid XML document
> > LINE 1: SELECT xmlformat('');
> > ^
> > DETAIL: line 1: switching encoding : no input
> > line 1: Document is empty
> >
> > ~~
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '
> > ERROR: relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '
> > ^
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '
> > ERROR: relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '
> > ^
> >
> > ~~
>
> Yes... a cfbot also complained about the same thing.
>
> Setting the VERBOSITY to terse might solve this issue:
>
> postgres=# \set VERBOSITY terse
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
>
> postgres=# \set VERBOSITY default
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
> DETAIL: line 1: switching encoding : no input
> ^
> line 1: Document is empty
> ^
>
> v9 wraps the corner test cases with VERBOSITY terse to reduce the error
> message output.

Bingo!! Your v9 patch now passes all 'make check' tests for me.

But I'll leave it to a committer to decide if this VERBOSITY toggle is the best fix. (I don't understand, maybe someone can explain, how the patch managed to mess up the verbosity of the existing tests.)

--
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Feb 15, 2023 at 8:55 AM houzj.f...@fujitsu.com wrote: > > On Wednesday, February 15, 2023 10:34 AM Amit Kapila > wrote: > > > > > > > > > > So names like the below seem correct format: > > > > > > > > a) WAIT_EVENT_LOGICAL_APPLY_SEND_DATA > > > > b) WAIT_EVENT_LOGICAL_LEADER_SEND_DATA > > > > c) WAIT_EVENT_LOGICAL_LEADER_APPLY_SEND_DATA > > > > > > Personally I'm fine even without "LEADER" in the wait event name since > > > we don't have "who is waiting" in it. IIUC a row of pg_stat_activity > > > shows who, and the wait event name shows "what the process is > > > waiting". So I prefer (a). > > > > > > > This logic makes sense to me. So, let's go with (a). > > OK, here is patch that change the event name to > WAIT_EVENT_LOGICAL_APPLY_SEND_DATA. > LGTM. -- With Regards, Amit Kapila.
Re: [PATCH] Add pretty-printed XML output option
On 15.02.23 10:06, Peter Smith wrote:

> Bingo!! Your v9 patch now passes all 'make check' tests for me.

Nice! It also passed all tests in the patch tester.

> But I'll leave it to a committer to decide if this VERBOSITY toggle is the best fix.

I see now other test cases in the xml.sql file that also use this option, so it might be a known "issue".

Shall we now set the patch to "Ready for Committer"?

Thanks a lot for the review!

Best, Jim
Re: Support logical replication of DDLs
On Wed, Feb 15, 2023 at 2:02 PM Alvaro Herrera wrote: > > On 2023-Feb-15, Peter Smith wrote: > > > On Thu, Feb 9, 2023 at 8:55 PM Ajin Cherian wrote: > > > > > > On Fri, Feb 3, 2023 at 11:41 AM Peter Smith wrote: > > > > > 3. ExecuteGrantStmt > > > > > > > > + /* Copy the grantor id needed for DDL deparsing of Grant */ > > > > + istmt.grantor_uid = grantor; > > > > > > > > SUGGESTION (comment) > > > > Copy the grantor id to the parsetree, needed for DDL deparsing of Grant > > > > > > didn't change this, as Alvaro said this was not a parsetree. > > > > Perhaps there is more to do here? Sorry, I did not understand the > > details of Alvaro's post [1], but I did not recognize the difference > > between ExecuteGrantStmt and ExecSecLabelStmt so it was my impression > > either one or both of these places are either wrongly commented, or > > maybe are doing something that should not be done. > > These two cases are different. In ExecGrantStmt we're adding the > grantor OID to the InternalGrant struct, which is not a parse node, so > there's no strong reason not to modify it (and also the suggested > comment change is factually wrong). I don't know if the idea is great, > but at least I see no strong objection. > > In the other case, as I said in [1], the patch proposes to edit the > parse node to add the grantor, but I think a better idea might be to > change the signature to > ExecSecLabelStmt(SecLabelStmt *stmt, ObjectAddress *provider) so that > the function can set the provider there; and caller passes > &secondaryObject, which is the method we adopted for this kind of thing. > +1, that is a better approach to make the required change in ExecSecLabelStmt(). -- With Regards, Amit Kapila.
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On 15.02.2023 10:11, Michael Paquier wrote:

> Which comes down to blame me for both of them.

My only intention was to make postgres better. I'm sorry you took it that way.

> So, please find attached a patch to close the gap in the sample files, for both things, with descriptions of all the field values they can use.

A short and precise description. Nothing to add. Next time I will try to offer a patch at once.

- Pavel Luzanov
Re: [PATCH] Add pretty-printed XML output option
On 2023-Feb-13, Peter Smith wrote:

> Something is misbehaving.
>
> Using the latest HEAD, and before applying the v6 patch, 'make check'
> is working OK.
>
> But after applying the v6 patch, some XML regression tests are failing
> because the DETAIL part of the message indicating the wrong syntax
> position is not getting displayed. Not only for your new tests -- but
> for other XML tests too.

Note that there's another file, xml_2.out, which does not contain the additional part of the DETAIL message. I suspect in Peter's case it's xml_2.out that was originally being used as a comparison basis (not xml.out), but that one is not getting patched, and ultimately the diff is reported for him against xml.out because none of them matches.

An easy way forward might be to manually apply the patch from xml.out to xml_2.out, and edit it by hand to remove the additional lines. See commit 085423e3e326 for a bit of background.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On Wed, Feb 15, 2023 at 01:05:04PM +0300, Pavel Luzanov wrote:

> On 15.02.2023 10:11, Michael Paquier wrote:
>> Which comes down to blame me for both of them.
>
> My only intention was to make postgres better. I'm sorry you took it that way.

You have no need to feel sorry about that. I really appreciate that you took the time to report this issue, so don't worry. My point is that I have committed this code, so basically it is my responsibility to take care of its maintenance.

>> So, please find attached a patch to close the gap in the sample files,
>> for both things, with descriptions of all the field values they can
>> use.
>
> A short and precise description. Nothing to add. Next time I will try to offer a patch at once.

If you have a proposal of a patch, that's always nice to have, but you should not feel obliged to do so, either. Thanks a lot for the report, Pavel!

--
Michael
Re: [PATCH] Add pretty-printed XML output option
On 15.02.23 11:11, Alvaro Herrera wrote:

> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message. I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
>
> An easy way forward might be to manually apply the patch from xml.out
> to xml_2.out, and edit it by hand to remove the additional lines.
>
> See commit 085423e3e326 for a bit of background.

Hi Álvaro,

As my test cases were not specifically about what the error message looks like, I thought that suppressing part of the error messages by setting VERBOSITY terse would suffice. [1] Is this approach not recommended?

Thanks!

Best, Jim

1 - v9 patch: https://www.postgresql.org/message-id/CAHut%2BPtoH8zkBHxv44XyO%2Bo4kW_nZdGhNdVaJ_cpEjrckKDqtw%40mail.gmail.com
Re: Can we do something to help stop users mistakenly using force_parallel_mode?
On Wed, 15 Feb 2023 at 14:10, Andrew Dunstan wrote:

> We'll email them once this is in. Most people are fairly responsive.

I pushed the rename patch earlier. How should we go about making contact with the owners? I'm thinking it may be better coming from you, especially if you think technical details of what exactly should be changed should be included in the email. But I can certainly have a go if you'd rather I did it or you don't have time for this.

Thanks

David
Re: [PATCH] Add pretty-printed XML output option
On 2023-Feb-15, Jim Jones wrote:

> Hi Álvaro,
>
> As my test cases were not specifically about what the error message looks
> like, I thought that suppressing part of the error messages by setting
> VERBOSITY terse would suffice. [1] Is this approach not recommended?

Well, I don't see why we would depart from what we've been doing in the rest of the XML tests. I did see that patch and I thought it was taking the wrong approach.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
Re: run pgindent on a regular basis / scripted manner
> (ITYM "remove as many hurdles as possible"). yes, I messed up rewriting that sentence from "having as few hurdles as possible" to "removing as many hurdles as possible" > So far, we have had the following categories suggested: dirty, staged, > dirty+staged, untracked. Are there any others? The two workflows that make most sense to me personally are: 1. staged (indent anything that you're staging for a commit) 2. dirty+staged+untracked (indent anything you've been working on that is not committed yet) The obvious way of having --dirty, --staged, and --untracked flags would require 3 flags for this second (to me seemingly) common operation. That seems quite unfortunate. So I would propose the following two flags for those purposes: 1. --staged/--cached (--cached is in line with git, but I personally think --staged is clearer, git has --staged-only but that seems long for no reason) 2. --uncommitted And maybe for completeness we could have the following flags, so you could target any combination of staged/untracked/dirty files: 3. --untracked (untracked files only) 4. --dirty (tracked files with changes that are not staged) But I don't know in what workflow people would actually use them. > Another issue is whether or not to restrict these to files under the current > directory. I think we probably should, or at least provide a --relative > option. Good point, I think it makes sense to restrict it to the current directory by default. You can always cd to the root of the repo if you want to format everything.
RE: Time delayed LR (WAS Re: logical replication restrictions)
Dear Andres and other hackers,

> OTOH, if we want to implement the functionality on publisher-side,
> I think we need to first consider the interface.
> We can think of two options (a) Have it as a subscription parameter as the patch
> has now and then pass it as an option to the publisher which it will use to delay;
> (b) Have it defined on publisher-side, say via GUC or some other way.
> The basic idea could be that while processing commit record (in DecodeCommit),
> we can somehow check the value of delay and then use it there to delay sending
> the xact.
> Also, during delay, we need to somehow send the keepalive and process replies,
> probably via a new callback or by some existing callback.
> We also need to handle in-progress and 2PC xacts in a similar way.
> For the former, probably we would need to apply the delay before sending the
> first stream.
> Could you please share what you feel on this direction as well?

I implemented a patch where the delaying is done on the publisher side. In this patch, approach (a) was chosen: min_apply_delay is specified as a subscription parameter, and the apply worker then passes it to the publisher as an output plugin option.

During the delay, the walsender periodically checks and processes replies from the apply worker, and sends keepalive messages if needed. Therefore, the ability to handle keepalives is not lost.

To delay the transaction in the output plugin layer, a new LogicalOutputPlugin API was added. For now, I chose the output plugin layer, but we can consider doing it from the core if there is a better way.

Could you please share your opinion?

Note: thanks to Osumi-san for helping with the implementation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

0001-Time-delayed-logical-replication-on-publisher-side.patch Description: 0001-Time-delayed-logical-replication-on-publisher-side.patch
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
On Wed, Feb 15, 2023 at 3:02 PM David Rowley wrote:
>
> On Wed, 15 Feb 2023 at 17:23, John Naylor wrote:
> > HEAD:
> >
> > 4 ^ 8: latency average = 113.976 ms
> > 5 ^ 8: latency average = 783.830 ms
> > 6 ^ 8: latency average = 3990.351 ms
> > 7 ^ 8: latency average = 15793.629 ms
> >
> > Skip rechecking first key:
> >
> > 4 ^ 8: latency average = 107.028 ms
> > 5 ^ 8: latency average = 732.327 ms
> > 6 ^ 8: latency average = 3709.882 ms
> > 7 ^ 8: latency average = 14570.651 ms

> Yeah, just a hack. My intention with it was just to prove we had a
> problem because sometimes Sort -> Incremental Sort was faster than
> Sort. Ideally, with your change, we'd see that it's always faster to
> do the full sort in one go. It would be good to see your patch with
> and without the planner hack patch to ensure sort is now always faster
> than sort -> incremental sort.

Okay, here's a rerun including the sort hack, and it looks like incremental sort is only ahead with the smallest set, otherwise same or maybe slightly slower:

HEAD:

4 ^ 8: latency average = 113.461 ms
5 ^ 8: latency average = 786.080 ms
6 ^ 8: latency average = 3948.388 ms
7 ^ 8: latency average = 15733.348 ms

tiebreaker:

4 ^ 8: latency average = 106.556 ms
5 ^ 8: latency average = 734.834 ms
6 ^ 8: latency average = 3640.507 ms
7 ^ 8: latency average = 14470.199 ms

tiebreaker + incr sort hack:

4 ^ 8: latency average = 93.998 ms
5 ^ 8: latency average = 740.120 ms
6 ^ 8: latency average = 3715.942 ms
7 ^ 8: latency average = 14749.323 ms

And as far as this:

> I suspect that I fat-fingered work_mem yesterday, so next I'll pick a
> badly-performing mod like 32, then range over work_mem and see if that
> explains anything, especially whether L3 effects are in fact more
> important in this workload.

Attached is a script and image for fixing the input at random / mod32 and varying work_mem. There is not a whole lot of variation here: pushdown regresses and the tiebreaker patch only helped marginally. I'm still not sure why the results from yesterday looked better than today.

--
John Naylor
EDB: http://www.enterprisedb.com

bench_windowsort_workmem-20230215.sh Description: application/shellscript
Re: Refactoring postgres_fdw/connection.c
On Tue, Jan 31, 2023 at 3:44 PM Fujii Masao wrote: > On 2023/01/29 19:31, Etsuro Fujita wrote: > > I agree that if the name of an existing function was bad, we should > > rename it, but I do not think the name pgfdw_get_cleanup_result is > > bad; I think it is good in the sense that it well represents what the > > function waits for. > > > > The patch you proposed changes pgfdw_get_cleanup_result to cover the > > timed_out==NULL case, but I do not think it is a good idea to rename > > it for such a minor reason. That function is used in all supported > > versions, so that would just make back-patching hard. > > As far as I understand, the function name pgfdw_get_cleanup_result is > used because it's used to get the result during abort cleanup as > the comment tells. OTOH new function is used even not during abort clean, > e.g., pgfdw_get_result() calls that new function. So I don't think that > pgfdw_get_cleanup_result is good name in some places. Yeah, I agree on that point. > If you want to leave pgfdw_get_cleanup_result for the existing uses, > we can leave that and redefine it so that it just calls the workhorse > function pgfdw_get_result_timed. +1; that's actually what I proposed upthread. :-) BTW the name "pgfdw_get_result_timed" is a bit confusing to me, because the new function works *without* a timeout condition. We usually append the suffix "_internal", so how about "pgfdw_get_result_internal", to avoid that confusion? > > Yeah, this is intentional; in commit 04e706d42, I coded this to match > > the behavior in the non-parallel-commit mode, which does not call > > CHECK_FOR_INTERRUPT. > > > >> But could you tell me why? > > > > My concern about doing so is that WaitLatchOrSocket is rather > > expensive, so it might lead to useless overhead in most cases. > > pgfdw_get_result() and pgfdw_get_cleanup_result() already call > WaitLatchOrSocket() and CHECK_FOR_INTERRUPTS(). That is, during > commit phase, they are currently called when receiving the result > of COMMIT TRANSACTION command from remote server. Why do we need > to worry about their overhead only when executing DEALLOCATE ALL? DEALLOCATE ALL is a light operation and is issued immediately after executing COMMIT TRANSACTION successfully, so I thought that in most cases it too would be likely to be executed successfully and quickly; there would be less need to do so for DEALLOCATE ALL. > > Anyway, this changes the behavior, so you should show the evidence > > that this is useful. I think this would be beyond refactoring, > > though. > > Isn't it useful to react the interrupts (e.g., shutdown requests) > soon even during waiting for the result of DEALLOCATE ALL? That might be useful, but another concern about this is error handling. The existing code (both in parallel commit and non-parallel commit) ignores any kinds of errors in libpq as well as interrupts when doing DEALLOCATE ALL, and then commits the local transaction, making the remote/local transaction states consistent, but IIUC the patch aborts the local transaction when doing the command, e.g., if WaitLatchOrSocket detected some kind of error in socket access, making the transaction states *inconsistent*, which I don't think would be great. So I'm still not sure this would be acceptable. Best regards, Etsuro Fujita
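[Editor's note] To make the naming idea discussed above concrete, here is a minimal sketch in C of the wrapper arrangement: keep pgfdw_get_cleanup_result() for its existing abort-cleanup callers and make it a thin wrapper over a common workhorse that also handles the no-timeout case. The signatures below are assumptions for illustration only; they do not reproduce the actual patch or postgres_fdw's current prototypes:

	static PGresult *pgfdw_get_result_internal(PGconn *conn, TimestampTz endtime,
											   bool *timed_out);

	/* existing name kept for the abort-cleanup callers, now a thin wrapper */
	static PGresult *
	pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime, bool *timed_out)
	{
		return pgfdw_get_result_internal(conn, endtime, timed_out);
	}

	/* non-cleanup callers wait indefinitely: timed_out == NULL in this sketch */
	static PGresult *
	pgfdw_get_result(PGconn *conn)
	{
		return pgfdw_get_result_internal(conn, 0, NULL);
	}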
Re: [PATCH] Add pretty-printed XML output option
On 15.02.23 12:11, Alvaro Herrera wrote: Well, I don't see why we would depart from what we've been doing in the rest of the XML tests. I did see that patch and I thought it was taking the wrong approach. Fair point. v10 patches the xml.out to xml_2.out - manually removing the additional lines. Thanks for the review! Best, Jim From 835c9ec18255adce9f9ae1e1e5d9e4287bac5452 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Thu, 2 Feb 2023 21:27:16 +0100 Subject: [PATCH v10] Add pretty-printed XML output option This small patch introduces a XML pretty print function. It basically takes advantage of the indentation feature of xmlDocDumpFormatMemory from libxml2 to format XML strings. --- doc/src/sgml/func.sgml | 34 + src/backend/utils/adt/xml.c | 45 src/include/catalog/pg_proc.dat | 3 + src/test/regress/expected/xml.out | 108 src/test/regress/expected/xml_1.out | 57 +++ src/test/regress/expected/xml_2.out | 105 +++ src/test/regress/sql/xml.sql| 32 + 7 files changed, 384 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e09e289a43..a621192425 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14861,6 +14861,40 @@ SELECT xmltable.* ]]> + + +xmlformat + + + xmlformat + + + +xmlformat ( xml ) xml + + + + Converts the given XML value to pretty-printed, indented text. + + + + Example: + + + + diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 079bcb1208..ec12707b5c 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf) } #endif +Datum +xmlformat(PG_FUNCTION_ARGS) +{ +#ifdef USE_LIBXML + + xmlDocPtr doc; + xmlChar*xmlbuf = NULL; + text *arg = PG_GETARG_TEXT_PP(0); + StringInfoData buf; + int nbytes; + + doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); + + if(!doc) + elog(ERROR, "could not parse the given XML document"); + + /** + * xmlDocDumpFormatMemory ( + * xmlDocPtr doc, # the XML document + * xmlChar **xmlbuf, # the memory pointer + * int *nbytes, # the memory length + * int format # 1 = node indenting + *) + */ + + xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); + + xmlFreeDoc(doc); + + if(!nbytes) + elog(ERROR, "could not indent the given XML document"); + + initStringInfo(&buf); + appendStringInfoString(&buf, (const char *)xmlbuf); + + xmlFree(xmlbuf); + + PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); + +#else + NO_XML_SUPPORT(); +return 0; +#endif +} + Datum xmlcomment(PG_FUNCTION_ARGS) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c0f2a8a77c..54e8a6262a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8842,6 +8842,9 @@ { oid => '3053', descr => 'determine if a string is well formed XML content', proname => 'xml_is_well_formed_content', prorettype => 'bool', proargtypes => 'text', prosrc => 'xml_is_well_formed_content' }, +{ oid => '4642', descr => 'Indented text from xml', + proname => 'xmlformat', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlformat' }, # json { oid => '321', descr => 'I/O', diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 3c357a9c7e..8bc8919092 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -1599,3 +1599,111 @@ SELECT * FROM XMLTABLE('.' 
PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH |(1 row) +-- XML format: single line XML string +SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650'); +xmlformat +-- + + + + + Belgian Waffles+ + $5.95+ + Two of our famous Belgian Waffles with plenty of real maple syrup+ + 650+ + + ++ + +(1 row) + +-- XML format: XML string with space, tabs and newline between nodes +SELECT xmlformat(' Belgian Waffles $15.95 + Two of our famous Belgian Waffles with plenty of real maple syrup +650
Re: Minimal logical decoding on standbys
On Thu, Jan 12, 2023 at 11:46 PM Andres Freund wrote:
>
> Hi,
>
> On 2023-01-12 20:08:55 +0530, Ashutosh Sharma wrote:
> > I previously participated in the discussion on "Synchronizing the
> > logical replication slots from Primary to Standby" and one of the
> > purposes of that project was to synchronize logical slots from primary
> > to standby so that if failover occurs, it will not affect the logical
> > subscribers of the old primary much. Can someone help me understand
> > how we are going to solve this problem with this patch? Are we going
> > to encourage users to do LR from standby instead of primary to get rid
> > of such problems during failover?
>
> It only provides a building block towards that. The "Synchronizing the logical
> replication slots from Primary to Standby" project IMO needs all of the
> infrastructure in this patch. With the patch, a logical rep solution can
> e.g. maintain one slot on the primary and one on the standby, and occasionally
> forward the slot on the standby to the position of the slot on the primary. In
> case of a failover it can just start consuming changes from the former
> standby, all the necessary changes are guaranteed to be present.
>
> > Also, one small observation:
> >
> > I just played around with the latest (v38) patch a bit and found that
> > when a new logical subscriber of standby is created, it actually
> > creates two logical replication slots for it on the standby server.
> > May I know the reason for creating an extra replication slot other
> > than the one created by create subscription command? See below:
>
> That's unrelated to this patch. There's no changes to the "higher level"
> logical replication code dealing with pubs and subs, it's all on the "logical
> decoding" level.
>
> I think this is because logical rep wants to be able to concurrently perform
> ongoing replication, and synchronize tables added to the replication set. The
> pg_16399_sync_16392_7187728548042694423 slot should vanish after the initial
> synchronization.

Thanks Andres. I have one more query (both for you and Bertrand). I don't know if this has already been answered somewhere in this mail thread; if yes, please let me know the mail that answers this query.

Will there be a problem if we mandate the use of physical replication slots and hot_standby_feedback to support minimum LD on standby? I know people can do a physical replication setup without a replication slot or even with hot_standby_feedback turned off, but are we going to have any issue if we ask them to use a physical replication slot and turn on hot_standby_feedback for LD on standby? This will reduce the code changes required to do conflict handling for logical slots on standby, which is being done by the v50-0001 and v50-0002* patches currently.

IMHO even in normal scenarios, i.e. when we are not doing LD on standby, we should mandate the use of a physical replication slot.

--
With Regards,
Ashutosh Sharma.
Re: Can we do something to help stop users mistakenly using force_parallel_mode?
On 2023-02-15 We 06:05, David Rowley wrote:

> On Wed, 15 Feb 2023 at 14:10, Andrew Dunstan wrote:
>> We'll email them once this is in. Most people are fairly responsive.
>
> I pushed the rename patch earlier. How should we go about making
> contact with the owners? I'm thinking it may be better coming from
> you, especially if you think technical details of what exactly should
> be changed should be included in the email. But I can certainly have a
> go if you'd rather I did it or you don't have time for this.

Leave it with me.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: pg_statistic MCVs use float4 but extended stats use float8
Hi,

On 2/15/23 02:20, Justin Pryzby wrote:
> It seems odd that stats_ext uses double:
>
> postgres=# SELECT attrelid::regclass, attname, atttypid::regtype, relkind
> FROM pg_attribute a JOIN pg_class c ON c.oid=a.attrelid WHERE
> attname='most_common_freqs';
>       attrelid      |      attname      |      atttypid      | relkind
> --------------------+-------------------+--------------------+---------
>  pg_stats           | most_common_freqs | real[]             | v
>  pg_stats_ext       | most_common_freqs | double precision[] | v
>  pg_stats_ext_exprs | most_common_freqs | real[]             | v
>
> I'm not sure if that's deliberate?

Not really, I'm not sure why I chose float8 and not float4. Likely a case of muscle memory on 64-bit systems.

I wonder if there are practical reasons to change this, i.e. if the float8 can have adverse effects on some systems. Yes, it makes the stats a little bit larger, but I doubt the difference is significant enough to make a difference. Perhaps on 32-bit systems it's worse, because float8 is going to be pass-by-ref there ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Use of additional index columns in rows filtering
On 2/15/23 09:57, Maxim Ivanov wrote: > Hi All, > > I'd like to report what seems to be a missing optimization > opportunity or understand why it is not possible to achieve. > > TLDR; additional index column B specified in CREATE INDEX .. (A) > INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 > ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L > > ... > > Here is the plan (notice high "shared hit"): > > Limit (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 > rows=0 loops=1) > Output: a, b, d > Buffers: shared hit=198307 > -> Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 > rows=1 width=12) (actual time=84.280..84.281 rows=0 loops=1) > Output: a, b, d > Index Cond: (t.a > 100) > Filter: (t.b = 4) > Rows Removed by Filter: 197805 > Buffers: shared hit=198307 > Planning: > Buffers: shared hit=30 > Planning Time: 0.201 ms > Execution Time: 84.303 ms > Yeah. The reason for this behavior is pretty simple: 1) When matching clauses to indexes in match_clause_to_index(), we only look at key columns (nkeycolumns). We'd need to check all columns (ncolumns) and remember if the clause matched a key or included one. 2) index_getnext_slot would need to get "candidate" TIDs using conditions on keys, and then check the clauses on included columns. Seems doable, unless I'm missing some fatal issue. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
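[Editor's note] As a rough illustration of change (1) above, the clause-matching loop could walk all index columns and remember which kind matched, along these lines. This is a sketch only: the includefilters field is hypothetical, and since INCLUDE columns carry no opclass/opfamily, the real code could not reuse match_clause_to_indexcol() for them as-is and would need a separate matching path:

	/* hypothetical variant of the clause-matching loop in indxpath.c */
	for (indexcol = 0; indexcol < index->ncolumns; indexcol++)
	{
		IndexClause *iclause = match_clause_to_indexcol(root, rinfo,
														indexcol, index);

		if (iclause == NULL)
			continue;

		if (indexcol < index->nkeycolumns)
		{
			/* key column: usable as a regular index qual */
			clauseset->indexclauses[indexcol] =
				lappend(clauseset->indexclauses[indexcol], iclause);
		}
		else
		{
			/* INCLUDE column: only usable as a filter applied to index tuples */
			clauseset->includefilters =		/* hypothetical new field */
				lappend(clauseset->includefilters, iclause);
		}
		clauseset->nonempty = true;
		break;
	}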
Re: [PATCH] Add pretty-printed XML output option
Accidentally left the VERBOSE settings out -- sorry! Now it matches the approach used in a xpath test in xml.sql, xml.out, xml_1.out and xml_2.out -- Since different libxml versions emit slightly different -- error messages, we suppress the DETAIL in this test. \set VERBOSITY terse SELECT xpath('/*', ''); ERROR: could not parse XML document \set VERBOSITY default v11 now correctly sets xml_2.out. Best, Jim From 473aab0a0028cd4de515c6a3698a1cda1c987067 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Thu, 2 Feb 2023 21:27:16 +0100 Subject: [PATCH v11] Add pretty-printed XML output option This small patch introduces a XML pretty print function. It basically takes advantage of the indentation feature of xmlDocDumpFormatMemory from libxml2 to format XML strings. --- doc/src/sgml/func.sgml | 34 ++ src/backend/utils/adt/xml.c | 45 + src/include/catalog/pg_proc.dat | 3 + src/test/regress/expected/xml.out | 101 src/test/regress/expected/xml_1.out | 53 +++ src/test/regress/expected/xml_2.out | 101 src/test/regress/sql/xml.sql| 33 + 7 files changed, 370 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e09e289a43..a621192425 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14861,6 +14861,40 @@ SELECT xmltable.* ]]> + + +xmlformat + + + xmlformat + + + +xmlformat ( xml ) xml + + + + Converts the given XML value to pretty-printed, indented text. + + + + Example: + + + + diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 079bcb1208..ec12707b5c 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -473,6 +473,51 @@ xmlBuffer_to_xmltype(xmlBufferPtr buf) } #endif +Datum +xmlformat(PG_FUNCTION_ARGS) +{ +#ifdef USE_LIBXML + + xmlDocPtr doc; + xmlChar*xmlbuf = NULL; + text *arg = PG_GETARG_TEXT_PP(0); + StringInfoData buf; + int nbytes; + + doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL); + + if(!doc) + elog(ERROR, "could not parse the given XML document"); + + /** + * xmlDocDumpFormatMemory ( + * xmlDocPtr doc, # the XML document + * xmlChar **xmlbuf, # the memory pointer + * int *nbytes, # the memory length + * int format # 1 = node indenting + *) + */ + + xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1); + + xmlFreeDoc(doc); + + if(!nbytes) + elog(ERROR, "could not indent the given XML document"); + + initStringInfo(&buf); + appendStringInfoString(&buf, (const char *)xmlbuf); + + xmlFree(xmlbuf); + + PG_RETURN_XML_P(stringinfo_to_xmltype(&buf)); + +#else + NO_XML_SUPPORT(); +return 0; +#endif +} + Datum xmlcomment(PG_FUNCTION_ARGS) diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index c0f2a8a77c..54e8a6262a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -8842,6 +8842,9 @@ { oid => '3053', descr => 'determine if a string is well formed XML content', proname => 'xml_is_well_formed_content', prorettype => 'bool', proargtypes => 'text', prosrc => 'xml_is_well_formed_content' }, +{ oid => '4642', descr => 'Indented text from xml', + proname => 'xmlformat', prorettype => 'xml', + proargtypes => 'xml', prosrc => 'xmlformat' }, # json { oid => '321', descr => 'I/O', diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out index 3c357a9c7e..e45116aaa7 100644 --- a/src/test/regress/expected/xml.out +++ b/src/test/regress/expected/xml.out @@ -1599,3 +1599,104 @@ SELECT * FROM XMLTABLE('.' 
PASSING XMLELEMENT(NAME a) columns a varchar(20) PATH |(1 row) +-- XML format: single line XML string +SELECT xmlformat('Belgian Waffles$5.95Two of our famous Belgian Waffles with plenty of real maple syrup650'); +xmlformat +-- + + + + + Belgian Waffles+ + $5.95+ + Two of our famous Belgian Waffles with plenty of real maple syrup+ + 650+ + + ++ + +(1 row) + +-- XML format: XML string with space, tabs and newline between nodes +SELECT xmlformat(' Belgian Waffles $15.95 + Tw
RE: Add LZ4 compression in pg_dump
On Fri, Jan 27, 2023 2:04 AM gkokola...@pm.me wrote: > > --- Original Message --- > On Thursday, January 26th, 2023 at 12:53 PM, Michael Paquier > wrote: > > > > > > > > On Thu, Jan 26, 2023 at 11:24:47AM +, gkokola...@pm.me wrote: > > > > > I gave this a little bit of thought. I think that ReadHead should not > > > emit a warning, or at least not this warning as it is slightly misleading. > > > It implies that it will automatically turn off data restoration, which is > > > false. Further ahead, the code will fail with a conflicting error message > > > stating that the compression is not available. > > > > > > Instead, it would be cleaner both for the user and the maintainer to > > > move the check in RestoreArchive and make it the sole responsible for > > > this logic. > > > > > > - pg_fatal("cannot restore from compressed archive (compression not > supported in this installation)"); > > + pg_fatal("cannot restore data from compressed archive (compression not > supported in this installation)"); > > Hmm. I don't mind changing this part as you suggest. > > > > -#ifndef HAVE_LIBZ > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP) > > > > - pg_fatal("archive is compressed, but this installation does not support > compression"); > > -#endif > > However I think that we'd better keep the warning, as it can offer a > > hint when using pg_restore -l not built with compression support if > > looking at a dump that has been compressed. > > Fair enough. Please find v27 attached. > Hi, I am interested in this feature and tried the patch. While reading the comments, I noticed some minor things that could possibly be improved (in v27-0003 patch). 1. + /* +* Open a file for writing. +* +* 'mode' can be one of ''w', 'wb', 'a', and 'ab'. Requrires an already +* initialized CompressFileHandle. +*/ + int (*open_write_func) (const char *path, const char *mode, + CompressFileHandle *CFH); There is a redundant single quote in front of 'w'. 2. /* * Callback function for WriteDataToArchive. Writes one block of (compressed) * data to the archive. */ /* * Callback function for ReadDataFromArchive. To keep things simple, we * always read one compressed block at a time. */ Should the function names in the comments be updated? WriteDataToArchive -> writeData ReadDataFromArchive -> readData 3. + Assert(strcmp(mode, "r") == 0 || strcmp(mode, "rb") == 0); Could we use PG_BINARY_R instead of "r" and "rb" here? Regards, Shi Yu
Re: Minimal logical decoding on standbys
Hi,

On 2/15/23 1:32 PM, Ashutosh Sharma wrote:
> Thanks Andres. I have one more query (both for you and Bertrand). I don't
> know if this has already been answered somewhere in this mail thread; if
> yes, please let me know the mail that answers this query. Will there be a
> problem if we mandate the use of physical replication slots and
> hot_standby_feedback to support minimum LD on standby?

I don't think we have to make it mandatory. There are use cases where it's not needed, as mentioned by Andres up-thread [1] (see the comment "The patch deals with this...").

> I know people can do a physical replication setup without a replication
> slot or even with hot_standby_feedback turned off, but are we going to
> have any issue if we ask them to use a physical replication slot and turn
> on hot_standby_feedback for LD on standby? This will reduce the code
> changes required to do conflict handling for logical slots on standby
> which is being done by the v50-0001 and v50-0002* patches currently.

But on the other hand we'd need to ensure that the primary physical slot and HSF set to on on the standby are always in place. That would probably lead to extra code too.

I'm -1 on that, but +1 on the fact that it should be documented (as done in 0006).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: https://www.postgresql.org/message-id/20211028210755.afmwcvpo6ajwdx6n%40alap3.anarazel.de
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On Wed, 15 Feb 2023 at 08:11, Michael Paquier wrote:

> Hmm, I am not sure that adding more examples in the sample files is
> worth the duplication with the docs.

I think you misunderstood what I meant (because I admittedly didn't write it down clearly). I meant that the docs for pg_ident don't include any examples (only descriptions of the new patterns). Attached is a patch that addresses that.

> So, please find attached a patch to close the gap in the sample files,
> for both things, with descriptions of all the field values they can
> use.

LGTM

0001-Add-example-of-new-pg_ident-username-paterns-to-docs.patch Description: Binary data
Re: [PATCH] Add pretty-printed XML output option
Alvaro Herrera writes: > Note that there's another file, xml_2.out, which does not contain the > additional part of the DETAIL message. I suspect in Peter's case it's > xml_2.out that was originally being used as a comparison basis (not > xml.out), but that one is not getting patched, and ultimately the diff > is reported for him against xml.out because none of them matches. > See commit 085423e3e326 for a bit of background. Yeah. That's kind of sad, because it means there are still broken libxml2s out there in 2023. Nonetheless, since there are, it is not optional to fix all three expected-files. regards, tom lane
Silent overflow of interval type
Hello hackers,

I've been testing various edge cases of timestamptz and related types and noticed that, despite being a 16-byte wide type, interval overflows for some timestamptz (8-byte) subtractions (timestamp_mi). A simple example of this would be:

select timestamptz'294276-12-31 23:59:59 UTC' - timestamptz'1582-10-15 00:00:00 UTC';

Yielding:

interval'-106599615 days -08:01:50.551616'

This makes sense from the implementation point of view, since both timestamptz and Interval->TimeOffset are int64. The patch attached simply throws an error when an overflow is detected. However, I'm not sure this is a reasonable approach for a code path that could be very hot in some workloads.

Another consideration is that regardless of the values of the timestamps, the absolute value of the difference can be stored in a uint64. However, that observation has little practical value.

That being said, I'm willing to work on a fix that makes sense and make it commit-ready (or step aside if someone else wants to take over), but I'd also understand if this is marked as "not worth fixing".

Regards,
Nick

From ba326276f79cf847fd1127d0edaf7d7f99a18c8e Mon Sep 17 00:00:00 2001
From: Nick Babadzhanian
Date: Wed, 15 Feb 2023 08:42:00 +0100
Subject: [PATCH] Fix silent interval overflow

---
 src/backend/utils/adt/timestamp.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 47e059a409..88fb4160cb 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -2713,7 +2713,11 @@ timestamp_mi(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
 				 errmsg("cannot subtract infinite timestamps")));
 
-	result->time = dt1 - dt2;
+	/* Subtract dt1 and dt2 with overflow detection */
+	if (unlikely(pg_sub_s64_overflow(dt1, dt2, &result->time)))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERVAL_FIELD_OVERFLOW),
+				 errmsg("interval field out of range")));
 
 	result->month = 0;
 	result->day = 0;
-- 
2.39.1
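[Editor's note] To make the arithmetic concrete: interval's time field can hold at most 2^63 - 1 microseconds, about 106,751,991 days, while the two timestamps above are roughly 106.9 million days apart, so the int64 subtraction wraps. Below is a standalone sketch of the check the patch adds. The microsecond constants are approximations of the two timestamp values (assumptions, for illustration only); PostgreSQL's pg_sub_s64_overflow() in src/include/common/int.h reduces to the same compiler builtin on GCC and Clang:

	#include <stdint.h>
	#include <stdio.h>

	int
	main(void)
	{
		/* PostgreSQL timestamps are int64 microseconds since 2000-01-01 */
		int64_t		dt1 = INT64_C(9223371331199999999);	/* ~294276-12-31, just under INT64_MAX */
		int64_t		dt2 = INT64_C(-13165344000000000);	/* ~1582-10-15, before the epoch */
		int64_t		diff;

		if (__builtin_sub_overflow(dt1, dt2, &diff))
			printf("overflow detected\n");	/* taken: the true span exceeds INT64_MAX */
		else
			printf("diff = %lld us\n", (long long) diff);
		return 0;
	}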
Re: Use of additional index columns in rows filtering
Tomas Vondra writes: > On 2/15/23 09:57, Maxim Ivanov wrote: >> TLDR; additional index column B specified in CREATE INDEX .. (A) >> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 >> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L > Seems doable, unless I'm missing some fatal issue. Partly this is lack of round tuits, but there's another significant issue: there very likely are index entries corresponding to dead heap rows. Applying random user-defined quals to values found in such rows could produce semantic anomalies; for example, divide-by-zero failures even though you deleted all the rows having a zero in that column. This isn't a problem for operators found in operator families, because we trust those to not have undesirable side effects like raising data-dependent errors. But it'd be an issue if we started to apply quals that aren't index quals directly to index rows before doing the heap liveness check. (And, of course, once you've fetched the heap row there's no point in having a special path for columns available from the index.) regards, tom lane
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Wed, Feb 15, 2023 at 04:49:38AM +, Ryo Matsumura (Fujitsu) wrote:
> Hi, hackers.
>
> I found that CREATE DATABASE causes DDL results to be lost after a server crash.
> The direct cause is that the checkpoint skips syncing pages of template1's main fork
> because the buffer status is not marked as BM_PERMANENT in BufferAlloc().

I had some trouble reproducing this when running the commands by hand. But it
reproduces fine like this:

$ ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data&
sleep 2;
psql -h /tmp postgres -c "DROP DATABASE IF EXISTS j" -c "CREATE DATABASE j STRATEGY wal_log" &&
psql -h /tmp template1 -c "CREATE TABLE t(i int)" -c "INSERT INTO t SELECT generate_series(1,9)" -c CHECKPOINT;
kill -9 %1; wait;
./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data&
sleep 9;
psql -h /tmp template1 -c "table t";
kill %1

[1] 29069
2023-02-15 10:10:27.584 CST postmaster[29069] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit
2023-02-15 10:10:27.584 CST postmaster[29069] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 10:10:27.663 CST postmaster[29069] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-02-15 10:10:27.728 CST startup[29074] LOG: database system was shut down at 2023-02-15 10:10:13 CST
2023-02-15 10:10:27.780 CST postmaster[29069] LOG: database system is ready to accept connections
NOTICE: database "j" does not exist, skipping
DROP DATABASE
CREATE DATABASE
CREATE TABLE
INSERT 0 9
2023-02-15 10:10:30.160 CST checkpointer[29072] LOG: checkpoint starting: immediate force wait
2023-02-15 10:10:30.740 CST checkpointer[29072] LOG: checkpoint complete: wrote 943 buffers (5.8%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.070 s, sync=0.369 s, total=0.581 s; sync files=268, longest=0.274 s, average=0.002 s; distance=4322 kB, estimate=4322 kB; lsn=0/BA9E8A0, redo lsn=0/BA9E868
CHECKPOINT
[1]+ Killed ./tmp_install/usr/local/pgsql/bin/postgres -D ./testrun/regress/regress/tmp_check/data
[1] 29088
2023-02-15 10:10:31.664 CST postmaster[29088] LOG: starting PostgreSQL 16devel on x86_64-linux, compiled by gcc-9.4.0, 64-bit
2023-02-15 10:10:31.665 CST postmaster[29088] LOG: listening on IPv4 address "127.0.0.1", port 5432
2023-02-15 10:10:31.724 CST postmaster[29088] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-02-15 10:10:31.780 CST startup[29094] LOG: database system was interrupted; last known up at 2023-02-15 10:10:30 CST
2023-02-15 10:10:33.888 CST startup[29094] LOG: database system was not properly shut down; automatic recovery in progress
2023-02-15 10:10:33.934 CST startup[29094] LOG: redo starts at 0/BA9E868
2023-02-15 10:10:33.934 CST startup[29094] LOG: invalid record length at 0/BA9E918: wanted 24, got 0
2023-02-15 10:10:33.934 CST startup[29094] LOG: redo done at 0/BA9E8A0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-02-15 10:10:34.073 CST checkpointer[29092] LOG: checkpoint starting: end-of-recovery immediate wait
2023-02-15 10:10:34.275 CST checkpointer[29092] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.035 s, sync=0.026 s, total=0.257 s; sync files=2, longest=0.019 s, average=0.013 s; distance=0 kB, estimate=0 kB; lsn=0/BA9E918, redo lsn=0/BA9E918
2023-02-15 10:10:34.321 CST postmaster[29088] LOG: database system is ready to accept connections
2023-02-15 10:10:39.893 CST client backend[29110] psql ERROR: relation "t" does not exist at character 7
2023-02-15 10:10:39.893 CST client backend[29110] psql STATEMENT: table t
ERROR: relation "t" does not exist
Re: Use of additional index columns in rows filtering
On 2/15/23 16:18, Tom Lane wrote: > Tomas Vondra writes: >> On 2/15/23 09:57, Maxim Ivanov wrote: >>> TLDR; additional index column B specified in CREATE INDEX .. (A) >>> INCLUDE(B) is not used to filter rows in queries like WHERE B = $1 >>> ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L > >> Seems doable, unless I'm missing some fatal issue. > > Partly this is lack of round tuits, but there's another significant > issue: there very likely are index entries corresponding to dead heap > rows. Applying random user-defined quals to values found in such rows > could produce semantic anomalies; for example, divide-by-zero failures > even though you deleted all the rows having a zero in that column. > > This isn't a problem for operators found in operator families, because > we trust those to not have undesirable side effects like raising > data-dependent errors. But it'd be an issue if we started to apply > quals that aren't index quals directly to index rows before doing > the heap liveness check. (And, of course, once you've fetched the > heap row there's no point in having a special path for columns > available from the index.) Sure, but we can do the same VM check as index-only scan, right? That would save some of the I/O to fetch the heap tuple, as long as the page is all-visible and the filter eliminates the tuples. It makes the costing a bit trickier, because it needs to consider both how many pages are all-visible and selectivity of the condition. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
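To make the idea concrete, here is a minimal sketch of that visibility-map check, modelled on what index-only scans already do in nodeIndexonlyscan.c. VM_ALL_VISIBLE(), xs_heaptid and ExecQual() are existing APIs; "indexfilterqual" and "iss_VMBuffer" are hypothetical fields invented for illustration, not existing executor state:

/*
 * Sketch only: evaluate a non-index qual against index tuple values, but
 * only when the heap page is known all-visible, so values from dead rows
 * can never reach user-defined operators.  Assumes a loop over fetched
 * index tuples, as in an ordinary index scan.
 */
ItemPointer tid = &scandesc->xs_heaptid;

if (VM_ALL_VISIBLE(scandesc->heapRelation,
				   ItemPointerGetBlockNumber(tid),
				   &node->iss_VMBuffer))
{
	/* page has no dead tuples: safe to apply the filter to index values */
	econtext->ecxt_scantuple = slot;	/* slot filled from the index tuple */
	if (!ExecQual(node->indexfilterqual, econtext))
		continue;				/* skip the heap fetch entirely */
}
/* otherwise fall through, fetch the heap row, and filter it afterwards */

As Tomas notes, the costing would then have to weigh the fraction of all-visible pages against the filter's selectivity, since only the all-visible fraction can skip heap fetches.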
Re: Use of additional index columns in rows filtering
> This isn't a problem for operators found in operator families, because
> we trust those to not have undesirable side effects like raising
> data-dependent errors. But it'd be an issue if we started to apply
> quals that aren't index quals directly to index rows before doing
> the heap liveness check. (And, of course, once you've fetched the
> heap row there's no point in having a special path for columns
> available from the index.)

Assuming operators are pure and don't have global side effects, is it
possible to ignore any error raised during that check? If the tuple is not
visible, the error shouldn't matter; if it is visible, the error will be
reported anyway by the same routine which does the filtering now (ExecQual?).

If not, then limiting this optimization to built-in ops is something I can
live with :)
Re: We shouldn't signal process groups with SIGQUIT
On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
>> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
>> > Not really related: I do wonder how often we end up self-deadlocking in
>> > quickdie(), due to the ereport() not being reentrant. We'll "fix" it soon
>> > after, due to the postmaster's SIGKILL. Perhaps we should turn on
>> > send_abort_for_kill on CI?
>>
>> +1, this seems like it'd be useful in general. I'm guessing this will
>> require a bit of work on the CI side to generate the backtrace.
>
> They're already generated for all current platforms. It's possible that debug
> info for some system libraries is missing, but the most important ones (like
> libc) should be available.
>
> Since yesterday the cfbot website allows to easily find the coredumps, too:
> http://cfbot.cputube.org/highlights/core-7.html

Oh, that's nifty. Any reason not to enable send_abort_for_crash, too?

> There's definitely some work to be done to make the contents of the backtraces
> more useful though. E.g. printing out the program name, the current directory
> (although that doesn't seem to be doable for all programs). For everything but
> Windows that's in src/tools/ci/cores_backtrace.sh.

Got it, thanks for the info.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: some namespace.c refactoring
On 2023-Feb-15, Tom Lane wrote:
> Peter Eisentraut writes:
> > Here are two patches that refactor the mostly repetitive "${object} is
> > visible" and get_${object}_oid() functions in namespace.c. This uses
> > the functions in objectaddress.c to look up the appropriate per-catalog
> > system caches and attribute numbers, similar to other refactoring
> > patches I have posted recently.
>
> This does not look like a simple refactoring patch to me. I have
> very serious concerns first about whether it even preserves the
> existing semantics, and second about whether there is a performance
> penalty.

I suppose there are two possible questionable angles from a performance POV:

1. The new code uses get_object_property_data() when previously there was a
straight dereference after casting to the right struct type. So how expensive
is that? I think the answer to that is "not great", because it does a linear
array scan on ObjectProperty. Maybe we need a better answer.

2. Other accesses to the data are done using SysCacheGetAttr instead of
straight struct access dereferences. I expect that most of the fields being
accessed have attcacheoff set, which allows pretty fast access to the field
in question, so it's not *that* bad. (For cases where attcacheoff is not set,
the original code would also have to deform the tuple.) Still, it's going to
have a nontrivial impact in any microbenchmarking.

That said, I think most of this code is invoked for DDL, where performance is
not so critical; probably just fixing get_object_property_data to not be so
naïve would suffice. Queries are another matter. I can't think of a way to
determine which of these paths are used for queries, so that we can optimize
more (IOW: just plain not rewrite those); manually going through the callers
seems a bit laborious, but perhaps doable.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Thou shalt check the array bounds of all strings (indeed, all arrays), for
surely where thou typest "foo" someone someday shall type
"supercalifragilisticexpialidocious" (5th Commandment for C programmers)
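For what it's worth, the cheapest cure for that naïvety might be memoizing the last hit, since DDL tends to look up the same class repeatedly. A sketch, assuming the existing ObjectPropertyType/ObjectProperty structures in objectaddress.c (field names approximate; the caching itself is the suggested addition):

static const ObjectPropertyType *
get_object_property_data(Oid class_id)
{
	static const ObjectPropertyType *prop_last = NULL;
	int			index;

	/*
	 * Remembering the last match removes most of the linear-scan cost
	 * without adding a new data structure.  (A binary search or a hash
	 * keyed by class OID would be the heavier alternatives.)
	 */
	if (prop_last && prop_last->class_oid == class_id)
		return prop_last;

	for (index = 0; index < lengthof(ObjectProperty); index++)
	{
		if (ObjectProperty[index].class_oid == class_id)
		{
			prop_last = &ObjectProperty[index];
			return prop_last;
		}
	}

	elog(ERROR, "unrecognized class ID: %u", class_id);
	return NULL;				/* keep compiler quiet */
}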
Re: We shouldn't signal process groups with SIGQUIT
Hi,

On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote:
> On Tue, Feb 14, 2023 at 04:20:59PM -0800, Andres Freund wrote:
> > On 2023-02-14 14:23:32 -0800, Nathan Bossart wrote:
> >> On Tue, Feb 14, 2023 at 12:47:12PM -0800, Andres Freund wrote:
> >> > Not really related: I do wonder how often we end up self-deadlocking in
> >> > quickdie(), due to the ereport() not being reentrant. We'll "fix" it soon
> >> > after, due to the postmaster's SIGKILL. Perhaps we should turn on
> >> > send_abort_for_kill on CI?
> >>
> >> +1, this seems like it'd be useful in general. I'm guessing this will
> >> require a bit of work on the CI side to generate the backtrace.
> >
> > They're already generated for all current platforms. It's possible that debug
> > info for some system libraries is missing, but the most important ones (like
> > libc) should be available.
> >
> > Since yesterday the cfbot website allows to easily find the coredumps, too:
> > http://cfbot.cputube.org/highlights/core-7.html
>
> Oh, that's nifty. Any reason not to enable send_abort_for_crash, too?

I think it'd be too noisy. Right now you get just a core dump of the crashed
process, but if we set send_abort_for_crash we'd end up with a lot of core
dumps, making it harder to know what to look at.

We should never need the send_abort_for_kill path, so I don't think the noise
issue applies to the same degree.

Greetings,

Andres
Re: Improve logging when using Huge Pages
On Tue, Feb 14, 2023 at 07:32:56PM -0600, Justin Pryzby wrote: > On Mon, Feb 13, 2023 at 08:18:52PM -0800, Nathan Bossart wrote: >> On Mon, Feb 13, 2023 at 05:22:45PM -0600, Justin Pryzby wrote: >> nitpick: Does this need to be initialized here? > > None of the GUCs' C vars need to be initialized, since the guc machinery > will do it. > > ...but the convention is that they *are* initialized - and that's now > partially enforced. > > See: > d9d873bac67047cfacc9f5ef96ee488f2cb0f1c3 > 7d25958453a60337bcb7bcc986e270792c007ea4 > a73952b795632b2cf5acada8476e7cf75857e9be I see. This looked a little strange to me because many of the other variables are uninitialized. In a73952b, I see that we allow the variables for string GUCs to be initialized to NULL. Anyway, this is only a nitpick. I don't feel strongly about it. >> I'm curious why you chose to make this a string instead of an enum. There >> might be little practical difference, but since there are only three >> possible values, I wonder if it'd be better form to make it an enum. > > It takes more code to write as an enum - see 002.txt. I'm not convinced > this is better. > > But your comment made me fix its , and reconsider the strings, > which I changed to active={unknown/true/false} rather than {unk/on/off}. > It could also be active={unknown/yes/no}... I think unknown/true/false is fine. I'm okay with using a string if no one else thinks it should be an enum (or a bool). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
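For comparison, the enum-GUC flavor being discussed would follow the usual guc_tables.c pattern. The sketch below uses invented names (the actual patch may spell the variable, category, and options differently):

/* possible states, with "unknown" as the initial value */
typedef enum
{
	HUGE_PAGES_ACTIVE_UNKNOWN,
	HUGE_PAGES_ACTIVE_FALSE,
	HUGE_PAGES_ACTIVE_TRUE,
} HugePagesActiveType;

static const struct config_enum_entry huge_pages_active_options[] = {
	{"unknown", HUGE_PAGES_ACTIVE_UNKNOWN, false},
	{"false", HUGE_PAGES_ACTIVE_FALSE, false},
	{"true", HUGE_PAGES_ACTIVE_TRUE, false},
	{NULL, 0, false}
};

static int	huge_pages_active = HUGE_PAGES_ACTIVE_UNKNOWN;

/* hypothetical entry for the ConfigureNamesEnum[] table in guc_tables.c */
{
	{"huge_pages_active", PGC_INTERNAL, PRESET_OPTIONS,
		gettext_noop("Indicates whether huge pages are in use."),
		NULL,
		GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
	},
	&huge_pages_active,
	HUGE_PAGES_ACTIVE_UNKNOWN, huge_pages_active_options,
	NULL, NULL, NULL
},

The trade-off is as Justin says: the enum needs a little more boilerplate, but it gives SHOW a fixed vocabulary for free.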
Re: We shouldn't signal process groups with SIGQUIT
On Wed, Feb 15, 2023 at 10:12:58AM -0800, Andres Freund wrote: > On 2023-02-15 09:57:41 -0800, Nathan Bossart wrote: >> Oh, that's nifty. Any reason not to enable send_abort_for_crash, too? > > I think it'd be too noisy. Right now you get just a core dump of the crashed > process, but if we set send_abort_for_crash we'd end up with a lot of core > dumps, making it harder to know what to look at. > > We should never need the send_abort_for_kill path, so I don't think the noise > issue applies to the same degree. Makes sense. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi,

On 2023-02-15 18:02:11 +0530, Ashutosh Sharma wrote:
> Thanks Andres. I have one more query (both for you and Bertrand). I
> don't know if this has already been answered somewhere in this mail
> thread; if yes, please let me know the mail that answers this query.
>
> Will there be a problem if we mandate the use of physical replication
> slots and hot_standby_feedback to support minimum LD on standby? I
> know people can do a physical replication setup without a replication
> slot or even with hot_standby_feedback turned off, but are we going to
> have any issue if we ask them to use a physical replication slot and
> turn on hot_standby_feedback for LD on standby? This will reduce the
> code changes required to do conflict handling for logical slots on
> standby, which is being done by the v50-0001 and v50-0002* patches
> currently.

I don't think it would. E.g. while restoring from archives we can't rely on
knowing that the slot still exists on the primary.

We can't just do corrupt things, including potentially crashing, when the
configuration is wrong. We can't ensure that the configuration is accurate
all the time. So we need to detect this case. Hence needing to detect
conflicts.

> IMHO even in normal scenarios i.e. when we are not doing LD on
> standby, we should mandate the use of a physical replication slot.

I don't think that's going to fly. There are plenty of scenarios where you
don't want to use a slot, e.g. when you want to limit space use on the
primary.

Greetings,

Andres Freund
Re: recovery modules
On Wed, Feb 15, 2023 at 03:38:21PM +0900, Michael Paquier wrote: > On Mon, Feb 13, 2023 at 05:02:37PM -0800, Nathan Bossart wrote: >> Sorry for then noise, cfbot alerted me to a missing #include, which I've >> added in v13. Sorry for more noise. I noticed that I missed updating the IDENTIFICATION line for shell_archive.c. That's the only change in v14. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 2f32fdd5cfd89600025000c1ddfee30b6f9103b4 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Thu, 9 Feb 2023 13:49:46 -0800 Subject: [PATCH v14 1/1] restructure archive modules API --- contrib/basic_archive/basic_archive.c | 85 --- doc/src/sgml/archive-modules.sgml | 35 ++-- src/backend/Makefile | 2 +- src/backend/archive/Makefile | 18 src/backend/archive/meson.build | 5 ++ .../{postmaster => archive}/shell_archive.c | 32 --- src/backend/meson.build | 1 + src/backend/postmaster/Makefile | 1 - src/backend/postmaster/meson.build| 1 - src/backend/postmaster/pgarch.c | 27 +++--- src/backend/utils/misc/guc_tables.c | 1 + src/include/archive/archive_module.h | 58 + src/include/archive/shell_archive.h | 24 ++ src/include/postmaster/pgarch.h | 39 - 14 files changed, 242 insertions(+), 87 deletions(-) create mode 100644 src/backend/archive/Makefile create mode 100644 src/backend/archive/meson.build rename src/backend/{postmaster => archive}/shell_archive.c (77%) create mode 100644 src/include/archive/archive_module.h create mode 100644 src/include/archive/shell_archive.h diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index 36b7a4814a..d7c227a10b 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -30,9 +30,9 @@ #include #include +#include "archive/archive_module.h" #include "common/int.h" #include "miscadmin.h" -#include "postmaster/pgarch.h" #include "storage/copydir.h" #include "storage/fd.h" #include "utils/guc.h" @@ -40,14 +40,27 @@ PG_MODULE_MAGIC; +typedef struct BasicArchiveData +{ + MemoryContext context; +} BasicArchiveData; + static char *archive_directory = NULL; -static MemoryContext basic_archive_context; -static bool basic_archive_configured(void); -static bool basic_archive_file(const char *file, const char *path); +static void basic_archive_startup(ArchiveModuleState *state); +static bool basic_archive_configured(ArchiveModuleState *state); +static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path); static void basic_archive_file_internal(const char *file, const char *path); static bool check_archive_directory(char **newval, void **extra, GucSource source); static bool compare_files(const char *file1, const char *file2); +static void basic_archive_shutdown(ArchiveModuleState *state); + +static const ArchiveModuleCallbacks basic_archive_callbacks = { + .startup_cb = basic_archive_startup, + .check_configured_cb = basic_archive_configured, + .archive_file_cb = basic_archive_file, + .shutdown_cb = basic_archive_shutdown +}; /* * _PG_init @@ -67,10 +80,6 @@ _PG_init(void) check_archive_directory, NULL, NULL); MarkGUCPrefixReserved("basic_archive"); - - basic_archive_context = AllocSetContextCreate(TopMemoryContext, - "basic_archive", - ALLOCSET_DEFAULT_SIZES); } /* @@ -78,11 +87,28 @@ _PG_init(void) * * Returns the module's archiving callbacks. */ +const ArchiveModuleCallbacks * +_PG_archive_module_init(void) +{ + return &basic_archive_callbacks; +} + +/* + * basic_archive_startup + * + * Creates the module's memory context. 
+ */ void -_PG_archive_module_init(ArchiveModuleCallbacks *cb) +basic_archive_startup(ArchiveModuleState *state) { - cb->check_configured_cb = basic_archive_configured; - cb->archive_file_cb = basic_archive_file; + BasicArchiveData *data; + + data = (BasicArchiveData *) MemoryContextAllocZero(TopMemoryContext, + sizeof(BasicArchiveData)); + data->context = AllocSetContextCreate(TopMemoryContext, + "basic_archive", + ALLOCSET_DEFAULT_SIZES); + state->private_data = (void *) data; } /* @@ -133,7 +159,7 @@ check_archive_directory(char **newval, void **extra, GucSource source) * Checks that archive_directory is not blank. */ static bool -basic_archive_configured(void) +basic_archive_configured(ArchiveModuleState *state) { return archive_directory != NULL && archive_directory[0] != '\0'; } @@ -144,10 +170,12 @@ basic_archive_configured(void) * Archives one file. */ static bool -basic_archive_file(const char *file, const char *path) +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path) { sigjmp_buf local_sigjmp_buf; Mem
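To see the restructured API from the consumer side, here is a minimal hypothetical module skeleton (the "my_*" names are invented); that the startup and shutdown callbacks may be left NULL is my reading of the patch, not a documented guarantee:

#include "postgres.h"

#include "archive/archive_module.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static bool
my_check_configured(ArchiveModuleState *state)
{
	/* e.g. verify a required GUC was set; unconditionally ready here */
	return true;
}

static bool
my_archive_file(ArchiveModuleState *state, const char *file, const char *path)
{
	/* durably copy "path" (the completed WAL segment) somewhere, then... */
	return true;
}

static const ArchiveModuleCallbacks my_callbacks = {
	.startup_cb = NULL,			/* optional, per my reading of the patch */
	.check_configured_cb = my_check_configured,
	.archive_file_cb = my_archive_file,
	.shutdown_cb = NULL			/* likewise optional */
};

const ArchiveModuleCallbacks *
_PG_archive_module_init(void)
{
	return &my_callbacks;
}

Note the design change visible in the diff: instead of filling in a caller-supplied struct, the module now returns a pointer to its own const callback table, and per-backend state lives in ArchiveModuleState's private_data rather than in file-level statics.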
Re: run pgindent on a regular basis / scripted manner
On Mon, Feb 13, 2023 at 07:57:25AM -0500, Andrew Dunstan wrote:
>
> On 2023-02-12 Su 15:59, Justin Pryzby wrote:
> > It seems like if pgindent knows about git, it ought to process only
> > tracked files. Then, it wouldn't need to manually exclude generated
> > files, and it wouldn't process vpath builds and who-knows-what else it
> > finds in CWD.
>
> I don't really want to restrict this to tracked files because it would mean
> you can't pgindent files before you `git add` them.

I think you'd allow indenting files which were either tracked *or* specified
on the command line.

Also, it makes more sense to "add" the file before indenting it, to allow
checking the output and removing unrelated changes. So that doesn't seem to
me like a restriction of any significance.

But I would never want to indent an untracked file unless I specified it.

-- Justin
Re: Move defaults toward ICU in 16?
On Tue, 2023-02-14 at 16:27 -0500, Jonathan S. Katz wrote:
> Would it be any different than a vulnerability in OpenSSL et al?

In principle, no, but sometimes the details matter. I'm just trying to add
data to the discussion.

> It seems that in general, users would see performance gains switching to ICU.

That's great news, and consistent with my experience. I don't think it should
be a driving factor though. If the choice is between platform-independent
semantics (ICU) and performance, platform-independence should be the default.

> I agree with most of your points in [1]. The platform-consistent
> behavior is a good point, especially with more PG deployments running on
> different systems.

Now I think semantics are the most important driver: being consistent across
platforms and based on some kind of trusted independent organization that we
can point to. It feels very wrong to me to explain that sort order is defined
by the operating system on which Postgres happens to run. Saying that it's
defined by ICU, which is part of the Unicode consortium, is much better. It
doesn't eliminate versioning issues, of course, but I think it's a better
explanation for users. Many users have other systems in their data
infrastructure, running on a variety of platforms, and could (in theory) try
to synchronize around a common ICU version to avoid subtle bugs in their data
pipeline.

> Based on the available data, I think it's OK to move towards ICU as the
> default, or preferred, collation provider. I agree (for now) in not
> taking a hard dependency on ICU.

I count several favorable responses, so I'll take it that we (as a community)
are intending to change the default for build and initdb in v16.

Robert expressed some skepticism[1], though I don't see an objection. If I
read his concerns correctly, he's mainly concerned with quality issues like
documentation, bugs, etc. I understand those concerns (I'm the one that
raised them), but they seem like the kind of issues that one finds any time
they dig into a dependency enough. "Setting our sights very high"[1], to me,
would just be ICU with a bit more rigorous attention to quality issues.

[1] https://www.postgresql.org/message-id/CA%2BTgmoYmeGJaW%3DPy9tAZtrnCP%2B_Q%2BzRQthv%3Dzn_HyA_nqEDM-A%40mail.gmail.com

-- Jeff Davis PostgreSQL Contributor Team - AWS
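For reference, opting in to ICU explicitly has been possible since PostgreSQL 15 via initdb's provider options, so the proposal only changes which provider initdb picks by default:

    initdb -D /path/to/datadir --locale-provider=icu --icu-locale=en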
Re: doc: add missing "id" attributes to extension packaging page
On Tue, 14 Feb 2023 12:13:18 -0800 Andres Freund wrote:
> A small note: As-is this fails on CI, because we don't allow network
> access during the docs build anymore (since it always fails these days):
>
> https://cirrus-ci.com/task/5474029402849280?logs=docs_build#L297
>
> [17:02:03.114] time make -s -j${BUILD_JOBS} -C doc
> [17:02:04.092] I/O error : Attempt to load network entity
> http://cdn.docbook.org/release/xsl/current/html/sections.xsl
> [17:02:04.092] warning: failed to load external entity
> "http://cdn.docbook.org/release/xsl/current/html/sections.xsl"
> [17:02:04.092] compilation error: file stylesheet-html-common.xsl
> line 17 element import
> [17:02:04.092] xsl:import : unable to load
> http://cdn.docbook.org/release/xsl/current/html/sections.xsl

This makes me think that it would be useful to add --nonet to the xsltproc
invocations. That would catch this error before it goes to CI.

> I think this is just due to the common URL in docbook packages being
> http://docbook.sourceforge.net/release/xsl/current/*
> Because of that the docbook catalog matching logic won't work for
> that file:
>
> E.g. I have the following in /etc/xml/docbook-xsl.xml, on Debian unstable:
> uriStartString="http://docbook.sourceforge.net/release/xsl/"
> catalog="file:///usr/share/xml/docbook/stylesheet/docbook-xsl/catalog.xml"/>
>
> As all our other references use the sourceforge address, this should too.

Agreed.

I'm also noticing that the existing xsl:import-s all import entire docbook
stylesheets. It does not hurt to do this; the output is unaffected, although
I can't say what it means for build performance. It does keep it simple:
only one import is needed no matter which templates we use the import
mechanism to extend. And by importing "everything" there's no concern about
any (unlikely) changes to the "internals" of the catalog.

Should we import only what we need or all of docbook? I don't know.

Regards, Karl
Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
Re: Silent overflow of interval type
On Wed, Feb 15, 2023 at 7:08 AM Nikolai wrote:
>
> The patch attached simply throws an error when an overflow is
> detected. However I'm not sure this is a reasonable approach for a
> code path that could be very hot in some workloads.

Given the extraordinary amount of overflow checks in the nearby code of
timestamp.c, I'd say that this case should not be an exception.

By chance, did you look at all the other nearby cases? Is it the only place
with overflow? (I took a look too, but haven't found anything suspicious.)

Best regards, Andrey Borodin.
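For the record, the check in question would presumably look like the pattern below. pg_sub_s64_overflow() comes from src/include/common/int.h and, with a reasonable compiler, becomes a single overflow-checked subtraction, which is why the hot-path concern is usually unfounded. That it lands in timestamp_mi()'s subtraction is my reading of the report, not a quote from the patch:

	/* overflow-checked subtraction, in the style timestamp.c already uses */
	if (unlikely(pg_sub_s64_overflow(dt1, dt2, &result->time)))
		ereport(ERROR,
				(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
				 errmsg("interval out of range")));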
Re: doc: add missing "id" attributes to extension packaging page
Hi,

On 2023-02-15 13:34:37 -0600, Karl O. Pinc wrote:
> This makes me think that it would be useful to add --nonet to the
> xsltproc invocations. That would catch this error before it goes to CI.

We are doing that now :)

commit 969509c3f2e3b4c32dcf264f9d642b5ef01319f3
Author: Tom Lane
Date: 2023-02-08 17:15:23 -0500

    Stop recommending auto-download of DTD files, and indeed disable it.

> I'm also noticing that the existing xsl:import-s all import entire
> docbook stylesheets. It does not hurt to do this; the output is
> unaffected, although I can't say what it means for build performance.
> It does keep it simple: only one import is needed no matter which
> templates we use the import mechanism to extend. And by importing
> "everything" there's no concern about any (unlikely) changes to
> the "internals" of the catalog.
>
> Should we import only what we need or all of docbook? I don't know.

It couldn't hurt to check if performance improves when you avoid doing so. I
suspect it won't make much of a difference, because the time is actually
spent evaluating xslt rather than parsing it.

Greetings, Andres Freund
Re: run pgindent on a regular basis / scripted manner
> Also, it makes more sense to "add" the file before indenting it, to
> allow checking the output and removing unrelated changes. So that doesn't
> seem to me like a restriction of any significance.

For my workflow it would be the same, but afaik there are two ways that
people commonly use git (mine is 1):

1. Adding changes/files to the staging area and then committing those
changes: git add (-p)/emacs magit/some other editor integration

2. Just adding everything that's changed and committing all of it:
git add -A + git commit/git commit -a

For workflow 1, a --staged/--cached flag would be enough IMHO. But that's
not at all helpful for workflow 2. That's why I proposed --uncommitted too,
to make indenting easier for workflow 2.

> But I would never want to indent an untracked file unless I specified it.

Would the --uncommitted flag I proposed be enough of an explicit way of
specifying that you want to indent untracked files?
[PATCH] FIx alloc_var() ndigits thinko
Hi,

I came across another harmless thinko in numeric.c. It is harmless since
20/DEC_DIGITS and 40/DEC_DIGITS happen to be exactly 5 and 10 given
DEC_DIGITS == 4, but it should be fixed anyway for correctness IMO.

- alloc_var(var, 20 / DEC_DIGITS);
+ alloc_var(var, (20 + DEC_DIGITS - 1) / DEC_DIGITS);

- alloc_var(var, 40 / DEC_DIGITS);
+ alloc_var(var, (40 + DEC_DIGITS - 1) / DEC_DIGITS);

/Joel

0001-fix-alloc-var-ndigits-thinko.patch Description: Binary data
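To make the arithmetic concrete, here is a tiny standalone illustration of why the ceiling-division spelling is the correct one even though both forms agree today:

#include <stdio.h>

/* ceil(n / d) for positive integers: the idiom used in the fix */
#define CEIL_DIV(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	/* DEC_DIGITS == 4 today: both forms agree, so the thinko is latent */
	printf("%d %d\n", 20 / 4, CEIL_DIV(20, 4));	/* prints: 5 5 */
	/* if DEC_DIGITS were 3, truncating division would under-allocate */
	printf("%d %d\n", 20 / 3, CEIL_DIV(20, 3));	/* prints: 6 7 */
	return 0;
}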
Re: psql: Add role's membership options to the \du+ command
On 2023-02-10 2:27 p.m., David G. Johnston wrote:
> On Fri, Feb 10, 2023 at 2:08 PM David Zhang wrote:
>> I noticed the document psql-ref.sgml has been updated for both `du+` and
>> `dg+`, but only `du` and `\du+` are covered in the regression test. Is
>> that because `dg+` is treated exactly the same as `du+` from a testing
>> point of view?
> Yes.
>> The reason I am asking this question is that I notice that `pg_monitor`
>> also has the detailed information, so I am not sure if more test cases
>> are required.
> Of course it does. Why does that bother you? And what does that have to
> do with the previous paragraph?

There is a default built-in role `pg_monitor` and the behavior changed after
the patch. If `\dg+` and `\du+` are treated as the same, and `make check`
passes, then I assume there is no test case to verify the output of `\duS+`.
My point is: should we consider adding a test case?

Before the patch, the output for `pg_monitor`:

postgres=# \duS+
                               List of roles
  Role name | Attributes | Member of | Description
------------+------------+-----------+-------------
 alice | | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables} |
 pg_checkpoint | Cannot login | {} |
 pg_database_owner | Cannot login | {} |
 pg_execute_server_program | Cannot login | {} |
 pg_maintain | Cannot login | {} |
 pg_monitor | Cannot login | {pg_read_all_settings,pg_read_all_stats,pg_stat_scan_tables} |
 pg_read_all_data | Cannot login | {} |
 pg_read_all_settings | Cannot login | {} |
 pg_read_all_stats | Cannot login | {} |
 pg_read_server_files | Cannot login | {} |
 pg_signal_backend | Cannot login | {} |
 pg_stat_scan_tables | Cannot login | {} |
 pg_use_reserved_connections | Cannot login | {} |
 pg_write_all_data | Cannot login | {} |
 pg_write_server_files | Cannot login | {} |
 ubuntu | Superuser, Create role, Create DB, Replication, Bypass RLS | {} |

After the patch, the output for `pg_monitor`:

postgres=# \duS+
                               List of roles
  Role name | Attributes | Member of | Description
------------+------------+-----------+-------------
 alice | | pg_read_all_settings WITH ADMIN, INHERIT, SET+|
       | | pg_read_all_stats WITH INHERIT               +|
       | | pg_stat_scan_tables                           |
 pg_checkpoint | Cannot login | |
 pg_database_owner | Cannot login | |
 pg_execute_server_program | Cannot login | |
 pg_maintain | Cannot login | |
 pg_monitor | Cannot login | pg_read_all_settings WITH INHERIT, SET +| |
Re: psql: Add role's membership options to the \du+ command
On Wed, Feb 15, 2023 at 2:31 PM David Zhang wrote:
> There is a default built-in role `pg_monitor` and the behavior changed
> after the patch. If `\dg+` and `\du+` are treated as the same, and `make
> check` passes, then I assume there is no test case to verify the output
> of `\duS+`. My point is: should we consider adding a test case?

I mean, either you accept the change in how this meta-command presents its
information or you don't. I don't see how a test case is particularly
beneficial. Or, at least, the pg_monitor role is not special in this regard.
Alice changed too and you don't seem to be including it in your complaint.

David J.
Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf
On Wed, Feb 15, 2023 at 03:40:26PM +0100, Jelte Fennema wrote:
> On Wed, 15 Feb 2023 at 08:11, Michael Paquier wrote:
>> Hmm, I am not sure that adding more examples in the sample files is
>> worth the duplication with the docs.
>
> I think you misunderstood what I meant (because I admittedly didn't
> write it down clearly). I meant the docs for pg_ident don't include
> any examples (only descriptions of the new patterns). Attached is a
> patch that addresses that.

Shouldn't the paragraph above the example file of pg_ident.conf be updated
to reflect the changes you have added? A cleaner idea would be to split that
into two sections. For example, we could keep the current example with
bryanh, ann and bob as it is (splitting it into its own section), and add a
second example with all the new patterns.

>> So, please find attached a patch to close the gap in the sample files,
>> for both things, with descriptions of all the field values they can use.
>
> LGTM

Thanks for the review; applied this part.

-- Michael

signature.asc Description: PGP signature
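For readers following along, the kind of example being discussed might look like the sketch below; the map and user names are made up, and "all" in the PG-USERNAME column is the new keyword this thread added:

# MAPNAME  SYSTEM-USERNAME  PG-USERNAME
# Hypothetical map: let the OS user "admin" log in as any database user.
admins     admin            all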
Re: psql: Add role's membership options to the \du+ command
On 2023-02-15 1:37 p.m., David G. Johnston wrote:
> On Wed, Feb 15, 2023 at 2:31 PM David Zhang wrote:
>> There is a default built-in role `pg_monitor` and the behavior changed
>> after the patch. If `\dg+` and `\du+` are treated as the same, and `make
>> check` passes, then I assume there is no test case to verify the output
>> of `\duS+`. My point is: should we consider adding a test case?
>
> I mean, either you accept the change in how this meta-command presents
> its information or you don't. I don't see how a test case is particularly
> beneficial. Or, at least, the pg_monitor role is not special in this
> regard. Alice changed too and you don't seem to be including it in your
> complaint.

Good improvement, +1.
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 16, 2023 at 12:49 AM Jim Jones wrote:
>
> Accidentally left the VERBOSE settings out -- sorry!
>
> Now it matches the approach used in a xpath test in xml.sql, xml.out,
> xml_1.out and xml_2.out
>
> -- Since different libxml versions emit slightly different
> -- error messages, we suppress the DETAIL in this test.
> \set VERBOSITY terse
> SELECT xpath('/*', '');
> ERROR: could not parse XML document
> \set VERBOSITY default
>
> v11 now correctly sets xml_2.out.
>
> Best, Jim

Firstly, sorry: it seems I made a mistake and was premature in calling bingo
above for v9.
- Today I repeated the v9 'make check' and found it still failing.
- The new xmlformat tests are OK, but some pre-existing xmlparse tests are
broken.
- See the attached file pretty-v9-results.

OTOH, the absence of xml_2.out from that patch appears to be the correct
explanation for why my results have been differing.

Today I fetched and tried the latest v11. It is failing too, but only just.
- See the attached file pretty-v11-results.

It looks to be due only to a whitespace/EOF issue in the xml_2.out:

@@ -1679,4 +1679,4 @@
 -- XML format: empty string
 SELECT xmlformat('');
 ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

--

The attached patch update (v12-0002) fixes the xml_2.out for me.

--
Kind Regards, Peter Smith. Fujitsu Australia

v12-0001-Add-pretty-printed-XML-output-option.patch Description: Binary data
v12-0002-PS-fix-EOF-for-xml_2.out.patch Description: Binary data
pretty-v9-results Description: Binary data
pretty-v11-results Description: Binary data
Re: Perform streaming logical transactions by background workers and parallel apply
LGTM. My only comment is about the commit message.

==
Commit message

d9d7fe6 reuse existing wait event when sending data in apply worker. But we
should have invent a new wait state if we are waiting at a new place, so fix
this.

~

SUGGESTION

d9d7fe6 made use of an existing wait event when sending data from the apply
worker, but we should have invented a new wait state since the code was
waiting at a new place. This patch corrects the mistake by using a new wait
state "LogicalApplySendData".

--
Kind Regards, Peter Smith. Fujitsu Australia
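As a sketch of what the new wait state looks like at the wait point (the enum spelling below is inferred from the suggested name, and the 1-second timeout is purely illustrative):

	/* hypothetical wait loop in the apply worker while sending data */
	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
				   1000L,
				   WAIT_EVENT_LOGICAL_APPLY_SEND_DATA);
	if (rc & WL_LATCH_SET)
		ResetLatch(MyLatch);

The point of a dedicated wait event is that pg_stat_activity then reports "LogicalApplySendData" instead of an unrelated event name borrowed from another wait site.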
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Wed, Feb 15, 2023 at 04:49:38AM +, Ryo Matsumura (Fujitsu) wrote:
> The above is caused by the following call.
> The argument 'permanent' of ReadBufferWithoutRelcache() is passed to
> BufferAlloc() as 'relpersistence'.
>
> [src/backend/commands/]
> 298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno,
> 299                                 RBM_NORMAL, bstrategy, false);

Indeed, setting that to true (as per the attached patch) seems to fix this.
I don't see any reason this _shouldn't_ be true from what I've read so far.
We're reading pg_class, which will probably never be UNLOGGED.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index ef05633bb0..a0259cc593 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath)
 		CHECK_FOR_INTERRUPTS();
 
 		buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno,
-										RBM_NORMAL, bstrategy, false);
+										RBM_NORMAL, bstrategy, true);
 
 		LockBuffer(buf, BUFFER_LOCK_SHARE);
 		page = BufferGetPage(buf);
Re: Silent overflow of interval type
Andrey Borodin writes:
> On Wed, Feb 15, 2023 at 7:08 AM Nikolai wrote:
>> The patch attached simply throws an error when an overflow is
>> detected. However I'm not sure this is a reasonable approach for a
>> code path that could be very hot in some workloads.

> Given the extraordinary amount of overflow checks in the nearby code
> of timestamp.c, I'd say that this case should not be an exception.

Yeah, I don't think this would create a performance problem, at least not if
you're using a compiler that implements pg_sub_s64_overflow reasonably. (And
if you're not, and this bugs you, the answer is to get a better compiler.)

> By chance, did you look at all the other nearby cases? Is it the only
> place with overflow?

That was my immediate reaction as well.

regards, tom lane
Re: Normalization of utility queries in pg_stat_statements
On Wed, Feb 08, 2023 at 12:05:24PM +0900, Michael Paquier wrote:
> Thoughts and comments are welcome. 0001 and 0002 are useful on their own
> to keep track of utilities that use Const and A_Const after going through
> the query jumbling, even if an approach based on the query string or the
> automated query jumbling for utilities is used (the query string approach
> loses a bit of its value). I'll add that to the next commit fest.

While wondering about this stuff over the last few days and discussing it
with Bertrand, I have changed my mind on the point that there is no need to
be that aggressive yet with the normalization of the A_Const nodes, because
the query string normalization of pg_stat_statements is not prepared yet to
handle cases where an A_Const value uses a non-quoted value with whitespaces.

The two cases where I saw an impact are the commands that can define an
isolation level: SET TRANSACTION and BEGIN. For example, applying
normalization to A_Const nodes does the following as of HEAD:

1) BEGIN TRANSACTION READ ONLY, READ WRITE, DEFERRABLE, NOT DEFERRABLE;
   BEGIN TRANSACTION $1 ONLY, $2 WRITE, $3, $4 DEFERRABLE

2) SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
   SET TRANSACTION ISOLATION LEVEL $1 COMMITTED

On top of that, specifying a different isolation level may cause these
commands to be grouped, which is not really cool. All that could be done
incrementally later on, in 17~ or later, depending on the adjustments that
make sense.

Attached is an updated patch set. 0003 is basically the same as v3, which I
have kept around for clarity in case one wants to see the effect of A_Const
normalization on all the related commands, though I am not proposing that
for an upstream integration. 0002 has been completed with a couple of
commands to track all the commands with A_Const, so that we never lose sight
of what happens. 0004 is what I think could be done for PG16, where
normalization affects only Const.

At the end of the day, this reflects the following commands that use Const
nodes because they use queries directly, so the same rules as SELECT and
DMLs apply to them:
- DECLARE
- EXPLAIN
- CREATE MATERIALIZED VIEW
- CTAS, SELECT INTO

Comments and thoughts welcome.

Thanks,
-- Michael

From eb61fe10d0d7399573ebb3a911aed4114742cf25 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Tue, 7 Feb 2023 14:33:20 +0900
Subject: [PATCH v2 1/4] Refactor regression tests of pg_stat_statements

pg_stat_statements.sql acts as the main file for all the core tests of the
module, but things have become a bit hairy over the years as some of the
sub-scenarios tested rely on assumptions that may have been set in a
completely different block, like a GUC setup or a different relation.

This commit refactors the tests of pg_stat_statements a bit, by moving a
few test cases out of pg_stat_statements.sql into their own file, as of:
- Planning-related tests in planning.sql.
- Utilities in utility.sql.

Test scenarios and their results remain the same as the originals.
--- contrib/pg_stat_statements/Makefile | 2 +- .../pg_stat_statements/expected/cleanup.out | 1 + .../expected/pg_stat_statements.out | 284 ++ .../pg_stat_statements/expected/planning.out | 195 .../pg_stat_statements/expected/utility.out | 72 + contrib/pg_stat_statements/meson.build| 3 + contrib/pg_stat_statements/sql/cleanup.sql| 1 + .../sql/pg_stat_statements.sql| 118 +--- contrib/pg_stat_statements/sql/planning.sql | 78 + contrib/pg_stat_statements/sql/utility.sql| 34 +++ 10 files changed, 417 insertions(+), 371 deletions(-) create mode 100644 contrib/pg_stat_statements/expected/cleanup.out create mode 100644 contrib/pg_stat_statements/expected/planning.out create mode 100644 contrib/pg_stat_statements/expected/utility.out create mode 100644 contrib/pg_stat_statements/sql/cleanup.sql create mode 100644 contrib/pg_stat_statements/sql/planning.sql create mode 100644 contrib/pg_stat_statements/sql/utility.sql diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile index edc40c8bbf..78dc4c1d07 100644 --- a/contrib/pg_stat_statements/Makefile +++ b/contrib/pg_stat_statements/Makefile @@ -17,7 +17,7 @@ PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements" LDFLAGS_SL += $(filter -lm, $(LIBS)) REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_stat_statements/pg_stat_statements.conf -REGRESS = pg_stat_statements oldextversions +REGRESS = pg_stat_statements utility planning cleanup oldextversions # Disabled because these tests require "shared_preload_libraries=pg_stat_statements", # which typical installcheck users do not have (e.g. buildfarm clients). NO_INSTALLCHECK = 1 diff --git a/contrib/pg_stat_statements/expected/cleanup.out b/contrib/pg_stat_statements/expected/cleanup.out new file mode 100644 index 00..36bec35c40 --- /dev/null +++ b/contrib/pg_stat_statements/expec
Make set_ps_display faster and easier to use
While doing some benchmarking of some fast-to-execute queries, I see that set_ps_display() popping up on the profiles. Looking a little deeper, there are some inefficiencies in there that we could fix. For example, the following is pretty poor: strlcpy(ps_buffer + ps_buffer_fixed_size, activity, ps_buffer_size - ps_buffer_fixed_size); ps_buffer_cur_len = strlen(ps_buffer); We already know the strlen of the fixed-sized part, so why bother doing strlen on the entire thing? Also, if we did just do strlen(activity), we could just memcpy, which would be much faster than strlcpy's byte-at-a-time method of copying. Adjusting that lead me to notice that we often just pass string constants to set_ps_display(), so we already know the strlen for this at compile time. So maybe we can just have set_ps_display_with_len() and then make a static inline wrapper that does strlen() so that when the compiler can figure out the length, it just hard codes it. After doing that, I went over all usages of set_ps_display() to see if any of those call sites knew the length already in a way that the compiler wouldn't be able to deduce. There were a few cases to adjust when setting the process title to contain the command tag. After fixing up the set_ps_display()s to use set_ps_display_with_len() where possible, I discovered some not so nice code which appends " waiting" onto the process title. Basically, there's a bunch of code that looks like this: const char *old_status; int len; old_status = get_ps_display(&len); new_status = (char *) palloc(len + 8 + 1); memcpy(new_status, old_status, len); strcpy(new_status + len, " waiting"); set_ps_display(new_status); new_status[len] = '\0'; /* truncate off " waiting" */ Seeing that made me wonder if we shouldn't just have something more generic for setting a suffix on the process title. I came up with set_ps_display_suffix() and set_ps_display_remove_suffix(). The above code can just become: set_ps_display_suffix("waiting"); then to remove the "waiting" suffix, just: set_ps_display_remove_suffix(); I considered adding a format version to append the suffix as there's one case that could make use of it, but in the end, decided it might be overkill, so I left that code like: char buffer[32]; sprintf(buffer, "waiting for %X/%X", LSN_FORMAT_ARGS(lsn)); set_ps_display_suffix(buffer); I don't think that's terrible enough to warrant making a va_args version of set_ps_display_suffix(), especially for just 1 instance of it. I also resisted making set_ps_display_suffix_with_len(). The new code should be quite a bit faster already without troubling over that additional function. I've attached the patch. David diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index 80d681b71c..889e20b5dd 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -148,8 +148,6 @@ static bool SyncRepQueueIsOrderedByLSN(int mode); void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) { - char *new_status = NULL; - const char *old_status; int mode; /* @@ -216,15 +214,10 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) /* Alter ps display to show waiting for sync rep. */ if (update_process_title) { - int len; - - old_status = get_ps_display(&len); - new_status = (char *) palloc(len + 32 + 1); - memcpy(new_status, old_status, len); - sprintf(new_status + len, " waiting for %X/%X", - LSN_FORMAT_ARGS(lsn)); - set_ps_display(new_status); - new_status[len] = '\0'; /* truncate off " waiting ..." 
*/ + charbuffer[32]; + + sprintf(buffer, "waiting for %X/%X", LSN_FORMAT_ARGS(lsn)); + set_ps_display_suffix(buffer); } /* @@ -322,12 +315,9 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) MyProc->syncRepState = SYNC_REP_NOT_WAITING; MyProc->waitLSN = 0; - if (new_status) - { - /* Reset ps display */ - set_ps_display(new_status); - pfree(new_status); - } + /* reset ps display to remove the suffix */ + if (update_process_title) + set_ps_display_remove_suffix(); } /* diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 2d6dbc6561..98904a7c05 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -4302,8 +4302,8 @@ void LockBufferForCleanup(Buffer buffer) { BufferDesc *bufHdr; - char *new_status = NULL; TimestampTz waitStart = 0; + boolwaiting = false; boollogged_recovery_conflict = false; Assert(BufferIsPinned(buffer)); @@ -4350,11 +4350,11 @@ LockBufferForCleanup(Buffer buffer)
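Pieced together from the description above (a paraphrase of the interface, not a verbatim excerpt of the patch), the wrapper and the suffix API look roughly like this:

/*
 * With a static inline wrapper, strlen() on a string literal folds to a
 * compile-time constant, so most callers pass the length for free.
 */
static inline void
set_ps_display(const char *activity)
{
	set_ps_display_with_len(activity, strlen(activity));
}

/* ...and the old palloc/memcpy/strcpy " waiting" dance collapses to: */
set_ps_display_suffix("waiting");
/* ... block on the lock / LSN / buffer pin here ... */
set_ps_display_remove_suffix();

The memcpy-based copy also beats strlcpy's byte-at-a-time loop because the destination length is already known, so no scan of the source string is needed.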
Bug in processing conditions on multi-column BRIN indexes
Hi,

While working on the BRIN SK_SEARCHARRAY patch I noticed a silly bug in
handling clauses on multi-column BRIN indexes, introduced in PG13.

Consider a simple table with two columns (a,b) and a multi-column BRIN index
on them:

create table t (a int, b int);

insert into t
select mod(i,1) + 100 * random(),
       mod(i,1) + 100 * random()
from generate_series(1,100) s(i);

create index on t using brin (a int4_minmax_ops, b int4_minmax_ops)
  with (pages_per_range=1);

Let's run a query with a condition on "a":

select * from t where a = 500;

                        QUERY PLAN
----------------------------------------------------------------
 Bitmap Heap Scan on t (actual rows=97 loops=1)
   Recheck Cond: (a = 500)
   Rows Removed by Index Recheck: 53189
   Heap Blocks: lossy=236
   ->  Bitmap Index Scan on t_a_b_idx (actual rows=2360 loops=1)
         Index Cond: (a = 500)
 Planning Time: 0.075 ms
 Execution Time: 8.263 ms
(8 rows)

Now let's add another condition on b:

select * from t where a = 500 and b < 800;

                        QUERY PLAN
----------------------------------------------------------------
 Bitmap Heap Scan on t (actual rows=97 loops=1)
   Recheck Cond: ((a = 500) AND (b < 800))
   Rows Removed by Index Recheck: 101101
   Heap Blocks: lossy=448
   ->  Bitmap Index Scan on t_a_b_idx (actual rows=4480 loops=1)
         Index Cond: ((a = 500) AND (b < 800))
 Planning Time: 0.085 ms
 Execution Time: 14.989 ms
(8 rows)

Well, that's wrong. With one condition we accessed 236 pages, and with an
additional condition - which should reduce the number of heap pages - we
accessed 448 pages. The problem is in bringetbitmap(), which failed to
combine the results from the consistent function correctly (and also does
not abort early).

Here's a patch for that; I'll push it shortly after a bit more testing.

regards

-- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

From bbe40c94e2293849d977da4720ef76d13160347a Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Wed, 15 Feb 2023 17:32:53 +0100
Subject: [PATCH 01/10] Fix handling of multi-column BRIN indexes

When evaluating clauses on multiple keys of multi-column BRIN indexes, the
results should be combined using AND, and we can stop evaluating once we
find a mismatching scan key. The existing code was simply scanning a range
as long as the last batch of scan keys returned true.
---
 src/backend/access/brin/brin.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index de1427a1e0..85ae795949 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -660,7 +660,7 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 							PointerGetDatum(bval),
 							PointerGetDatum(keys[attno - 1]),
 							Int32GetDatum(nkeys[attno - 1]));
-					addrange = DatumGetBool(add);
+					addrange &= DatumGetBool(add);
 				}
 				else
 				{
@@ -681,11 +681,18 @@ bringetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
 							PointerGetDatum(bdesc),
 							PointerGetDatum(bval),
 							PointerGetDatum(keys[attno - 1][keyno]));
-						addrange = DatumGetBool(add);
+						addrange &= DatumGetBool(add);
 						if (!addrange)
 							break;
 					}
 				}
+
+				/*
+				 * We found a clause that eliminates this range. No point
+				 * in evaluating more clauses.
+				 */
+				if (!addrange)
+					break;
 			}
 		}
 	}
-- 2.39.1
Re: Todo: Teach planner to evaluate multiple windows in the optimal order
On Thu, 16 Feb 2023 at 00:45, John Naylor wrote: > Okay, here's a rerun including the sort hack, and it looks like incremental > sort is only ahead with the smallest set, otherwise same or maybe slightly > slower: > > HEAD: > > 4 ^ 8: latency average = 113.461 ms > 5 ^ 8: latency average = 786.080 ms > 6 ^ 8: latency average = 3948.388 ms > 7 ^ 8: latency average = 15733.348 ms > > tiebreaker: > > 4 ^ 8: latency average = 106.556 ms > 5 ^ 8: latency average = 734.834 ms > 6 ^ 8: latency average = 3640.507 ms > 7 ^ 8: latency average = 14470.199 ms > > tiebreaker + incr sort hack: > > 4 ^ 8: latency average = 93.998 ms > 5 ^ 8: latency average = 740.120 ms > 6 ^ 8: latency average = 3715.942 ms > 7 ^ 8: latency average = 14749.323 ms Sad news :( the sort hacks are still quite a bit faster for 4 ^ 8. I was fooling around with the attached (very quickly and crudely put together) patch just there. The idea is to sort during tuplesort_puttuple_common() when the memory consumption goes over some approximation of L3 cache size in the hope we still have cache lines for the tuples in question still. The code is not ideal there as we track availMem rather than the used mem, so maybe I need to do that better as we could cross some boundary without actually having done very much, plus we USEMEM for other reasons too. I found that the patch didn't really help: create table t (a int not null, b int not null); insert into t select (x*random())::int % 100,(x*random())::int % 100 from generate_Series(1,1000)x; vacuum freeze t; select pg_prewarm('t'); show work_mem; work_mem -- 4GB explain (analyze, timing off) select * from t order by a,b; master: Execution Time: 5620.704 ms Execution Time: 5506.705 ms patched: Execution Time: 6801.421 ms Execution Time: 6762.130 ms I suspect it's slower because the final sort must sort the entire array still without knowledge that portions of it are pre-sorted. It would be very interesting to improve this and do some additional work and keep track of the "memtupsortedto" index by pushing them onto a List each time we cross the availMem boundary, then do then qsort just the final portion of the array in tuplesort_performsort() before doing a k-way merge on each segment rather than qsorting the entire thing again. I suspect this would be faster when work_mem exceeds L3 by some large amount. David diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c index 9ca9835aab..65ffe442cd 100644 --- a/src/backend/utils/sort/tuplesort.c +++ b/src/backend/utils/sort/tuplesort.c @@ -216,6 +216,7 @@ struct Tuplesortstate SortTuple *memtuples; /* array of SortTuple structs */ int memtupcount;/* number of tuples currently present */ int memtupsize; /* allocated length of memtuples array */ + int memtupsortedto; /* index of where we've already sorted up to */ boolgrowmemtuples; /* memtuples' growth still underway? 
*/ /* @@ -465,7 +466,7 @@ static bool mergereadnext(Tuplesortstate *state, LogicalTape *srcTape, SortTuple static void dumptuples(Tuplesortstate *state, bool alltuples); static void make_bounded_heap(Tuplesortstate *state); static void sort_bounded_heap(Tuplesortstate *state); -static void tuplesort_sort_memtuples(Tuplesortstate *state); +static void tuplesort_sort_memtuples(Tuplesortstate *state, int start_index); static void tuplesort_heap_insert(Tuplesortstate *state, SortTuple *tuple); static void tuplesort_heap_replace_top(Tuplesortstate *state, SortTuple *tuple); static void tuplesort_heap_delete_top(Tuplesortstate *state); @@ -708,6 +709,7 @@ tuplesort_begin_common(int workMem, SortCoordinate coordinate, int sortopt) */ state->memtupsize = INITIAL_MEMTUPSIZE; state->memtuples = NULL; + state->memtupsortedto = 0; /* * After all of the other non-parallel-related state, we setup all of the @@ -793,6 +795,7 @@ tuplesort_begin_batch(Tuplesortstate *state) state->tapeset = NULL; state->memtupcount = 0; + state->memtupsortedto = 0; /* * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD; @@ -1193,10 +1196,29 @@ tuplesort_puttuple_common(Tuplesortstate *state, SortTuple *tuple, bool useAbbre Assert(!LEADER(state)); +/* just roughly. Must be power-of-2 for efficient division */ +#define EFFECTIVE_CPU_L3_SIZE 16777216 + /* Count the size of the out-of-line data */ if (tuple->tuple != NULL) + { + size_t before_mem = state->availMem / EFFECTIVE_CPU_L3_SIZE; + USEMEM(state, GetMemoryChunkSpace(tuple->tuple)); + /* +* Being a bit more cache awareness to the sort and do a presort +* of the tuple seen thus far while they're likely still sitting in +* L3.
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Tue, Feb 14, 2023 at 8:24 PM John Naylor wrote: > > On Mon, Feb 13, 2023 at 2:51 PM Masahiko Sawada wrote: > > > > On Sat, Feb 11, 2023 at 2:33 PM John Naylor > > wrote: > > > > > > I didn't get any closer to radix-tree regression, > > > > Me neither. It seems that in v26, inserting chunks into node-32 is > > slow but needs more analysis. I'll share if I found something > > interesting. > > If that were the case, then the other benchmarks I ran would likely have > slowed down as well, but they are the same or faster. There is one > microbenchmark I didn't run before: "select * from > bench_fixed_height_search(15)" (15 to reduce noise from growing size class, > and despite the name it measures load time as well). Trying this now shows no > difference: a few runs range 19 to 21ms in each version. That also reinforces > that update_inner is fine and that the move to value pointer API didn't > regress. > > Changing TIDS_PER_BLOCK_FOR_LOAD to 1 to stress the tree more gives (min of > 5, perf run separate from measurements): > > v15 + v26 store: > > mem_allocated | load_ms > ---+- > 98202152 | 553 > > 19.71% postgres postgres [.] tidstore_add_tids > + 31.47% postgres postgres [.] rt_set > = 51.18% > > 20.62% postgres postgres [.] rt_node_insert_leaf >6.05% postgres postgres [.] AllocSetAlloc >4.74% postgres postgres [.] AllocSetFree >4.62% postgres postgres [.] palloc >2.23% postgres postgres [.] SlabAlloc > > v26: > > mem_allocated | load_ms > ---+- > 98202032 | 617 > > 57.45% postgres postgres [.] tidstore_add_tids > > 20.67% postgres postgres [.] local_rt_node_insert_leaf >5.99% postgres postgres [.] AllocSetAlloc >3.55% postgres postgres [.] palloc >3.05% postgres postgres [.] AllocSetFree >2.05% postgres postgres [.] SlabAlloc > > So it seems the store itself got faster when we removed shared memory paths > from the v26 store to test it against v15. > > I thought to favor the local memory case in the tidstore by controlling > inlining -- it's smaller and will be called much more often, so I tried the > following (done in 0007) > > #define RT_PREFIX shared_rt > #define RT_SHMEM > -#define RT_SCOPE static > +#define RT_SCOPE static pg_noinline > > That brings it down to > > mem_allocated | load_ms > ---+- > 98202032 | 590 The improvement makes sense to me. I've also done the same test (with changing TIDS_PER_BLOCK_FOR_LOAD to 1): w/o 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 334 | 445 (1 row) w/ 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 316 | 434 (1 row) On the other hand, with TIDS_PER_BLOCK_FOR_LOAD being 30, the load performance didn't improve: w/0 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 601 | 608 (1 row) w/ 0007 patch: mem_allocated | load_ms | iter_ms ---+-+- 98202032 | 610 | 606 (1 row) That being said, it might be within noise level, so I agree with 0007 patch. > Perhaps some slowdown is unavoidable, but it would be nice to understand why. True. > > > I can think that something like traversing a HOT chain could visit > > offsets out of order. But fortunately we prune such collected TIDs > > before heap vacuum in heap case. > > Further, currently we *already* assume we populate the tid array in order > (for binary search), so we can just continue assuming that (with an assert > added since it's more public in this form). I'm not sure why such basic > common sense evaded me a few versions ago... Right. TidStore is implemented not only for heap, so loading out-of-order TIDs might be important in the future. 
> > > If these are acceptable, I can incorporate them into a later patchset.
> >
> > These are nice improvements! I agree with all changes.
>
> Great, I've squashed these into the tidstore patch (0004). Also added 0005,
> which is just a simplification.

I've attached some small patches to improve the radix tree and tidstore:

We have the following WIP comment in test_radixtree:

// WIP: compiles with warnings because rt_attach is defined but not used
// #define RT_SHMEM

How about unsetting RT_SCOPE to suppress warnings for unused rt_attach and
friends?

FYI I've briefly tested the TidStore with blocksize = 32kB, and it seems to
work fine.

> I squashed the earlier dead code removal into the radix tree patch.

Thanks!

> v27-0008 measures tid store iteration performance and adds a stub function to
> prevent spurious warnings, so the benchmarking module can always be built.
>
> Getting the list of offsets from the
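For readers following the template mechanics, here is a rough sketch of the knobs being discussed, pieced together from the #defines quoted in this thread; the header path, the second instantiation in one translation unit, and the idea that leaving RT_SCOPE undefined yields extern (hence "used") declarations are all assumptions about the WIP patch:

/* local-memory variant: small, called often, so inline aggressively */
#define RT_PREFIX local_rt
#define RT_SCOPE static inline
#include "lib/radixtree.h"

/* shared-memory variant: pg_noinline keeps it off the hot local paths */
#define RT_PREFIX shared_rt
#define RT_SHMEM
#define RT_SCOPE static pg_noinline
#include "lib/radixtree.h"

This is the same include-the-template-twice pattern PostgreSQL already uses for simplehash.h, with RT_PREFIX playing the role of the name mangler.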
Dead code in ps_status.c
Hi,

Here's some archeology I did a while back, but was reminded to post when I
saw David's nearby performance improvements for ps_status.c.

* there are no systems with HAVE_PS_STRINGS (ancient BSD)
* setproctitle_fast() is in all live FreeBSD releases
* setproctitle() is in all other BSDs
* PostgreSQL can't run on GNU/Hurd apparently, for lack of shared
  semaphores, so who would even know if that works?
* IRIX is rusting in peace
* there are no other NeXT-derived systems (NeXTSTEP and OPENSTEP are departed)

Therefore I think it is safe to drop the PS_USE_PS_STRING and
PS_USE_CHANGE_ARGV code branches, remove a bunch of outdated comments and
macro tests, and prune the defunct configure/meson probe.

I guess (defined(sun) && !defined(BSD)) || defined(__svr5__) could be changed
to just defined(sun) (surely there are no other living SysV-derived systems,
and I think non-BSD Sun probably meant "Solaris but not SunOS"), but I don't
know, so I didn't touch that. I think the history here is that the ancient
BSD sendmail code (conf.c) had all this stuff for BSD and SVR5 systems, but
then its setproctitle() function actually moved into the OS so that the
underlying PS_STRINGS stuff wouldn't have to be stable, and indeed it was
not.

From fdca29db1140aac55f9b07d63bf5cdc7555454da Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 16 Feb 2023 15:48:50 +1300 Subject: [PATCH] Remove obsolete code from ps_status.c. We can now remove various code, comments and configure/meson probes for ancient BSD variants, GNU/Hurd, IRIX, NeXT. --- configure | 35 --- configure.ac | 13 --- meson.build| 17 - src/backend/utils/misc/ps_status.c | 55 +- src/include/pg_config.h.in | 3 -- 5 files changed, 9 insertions(+), 114 deletions(-) diff --git a/configure b/configure index 7bb829ddf4..e35769ea73 100755 --- a/configure +++ b/configure @@ -16278,41 +16278,6 @@ cat >>confdefs.h <<_ACEOF _ACEOF -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for PS_STRINGS" >&5 -$as_echo_n "checking for PS_STRINGS... " >&6; } -if ${pgac_cv_var_PS_STRINGS+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h.
*/ -#include <machine/vmparam.h> -#include <sys/exec.h> - -int -main () -{ -PS_STRINGS->ps_nargvstr = 1; -PS_STRINGS->ps_argvstr = "foo"; - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - pgac_cv_var_PS_STRINGS=yes -else - pgac_cv_var_PS_STRINGS=no -fi -rm -f core conftest.err conftest.$ac_objext \ -conftest$ac_exeext conftest.$ac_ext -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_var_PS_STRINGS" >&5 -$as_echo "$pgac_cv_var_PS_STRINGS" >&6; } -if test "$pgac_cv_var_PS_STRINGS" = yes ; then - -$as_echo "#define HAVE_PS_STRINGS 1" >>confdefs.h - -fi - ac_fn_c_check_func "$LINENO" "explicit_bzero" "ac_cv_func_explicit_bzero" if test "x$ac_cv_func_explicit_bzero" = xyes; then : $as_echo "#define HAVE_EXPLICIT_BZERO 1" >>confdefs.h diff --git a/configure.ac b/configure.ac index 137a40a942..af23c15cb2 100644 --- a/configure.ac +++ b/configure.ac @@ -1849,19 +1849,6 @@ AC_CHECK_DECLS([pwritev], [], [AC_LIBOBJ(pwritev)], [#include <sys/uio.h>]) # This is probably only present on macOS, but may as well check always AC_CHECK_DECLS(F_FULLFSYNC, [], [], [#include <fcntl.h>]) -AC_CACHE_CHECK([for PS_STRINGS], [pgac_cv_var_PS_STRINGS], -[AC_LINK_IFELSE([AC_LANG_PROGRAM( -[#include <machine/vmparam.h> -#include <sys/exec.h> -], -[PS_STRINGS->ps_nargvstr = 1; -PS_STRINGS->ps_argvstr = "foo";])], -[pgac_cv_var_PS_STRINGS=yes], -[pgac_cv_var_PS_STRINGS=no])]) -if test "$pgac_cv_var_PS_STRINGS" = yes ; then - AC_DEFINE([HAVE_PS_STRINGS], 1, [Define to 1 if the PS_STRINGS thing exists.]) -fi - AC_REPLACE_FUNCS(m4_normalize([ explicit_bzero getopt diff --git a/meson.build b/meson.build index b5daed9f38..f534704452 100644 --- a/meson.build +++ b/meson.build @@ -2293,23 +2293,6 @@ endif cdata.set('pg_restrict', '__restrict') -if cc.links(''' -#include <machine/vmparam.h> -#include <sys/exec.h> - -int main(void) -{ -PS_STRINGS->ps_nargvstr = 1; -PS_STRINGS->ps_argvstr = "foo"; -} -''', - name: 'PS_STRINGS', args: test_c_args) - cdata.set('HAVE_PS_STRINGS', 1) -else - cdata.set('HAVE_PS_STRINGS', false) -endif - - # Most libraries are included only if they demonstrably provide a function we # need, but libm is an exception: always include it, because there are too # many compilers that play cute optimization games that will break probes for diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c index 06bdc2afd1..d6a3389c29 100644 --- a/src/backend/utils/misc/ps_status.c +++ b/src/backend/utils/misc/ps_status.c @@ -15,10 +15,6 @@ #include "postgres.h" #include <unistd.h> -#ifdef HAVE_PS_STRINGS -#include <machine/vmparam.h> /* for old BSD */ -#include <sys/exec.h> -#endif #if defined(__darwin__) #include <crt_externs.h> #endif @@ -39,16 +35,10 @@ bool update_process_title = DEFAULT_UPDATE_PROCESS_TITLE; *
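To see what survives the cleanup, here is a condensed sketch of the strategy cascade that remains in ps_status.c once the PS_STRINGS and CHANGE_ARGV branches are gone (simplified; the real file tests several more platforms, so this is not the literal source):

/* Condensed sketch of the remaining PS display strategy selection */
#if defined(HAVE_SETPROCTITLE_FAST)
#define PS_USE_SETPROCTITLE_FAST    /* FreeBSD: no syscall per update */
#elif defined(HAVE_SETPROCTITLE)
#define PS_USE_SETPROCTITLE         /* other BSDs */
#elif defined(__linux__) || defined(__darwin__) || defined(__sun)
#define PS_USE_CLOBBER_ARGV         /* overwrite argv[] storage in place */
#elif defined(WIN32)
#define PS_USE_WIN32                /* Windows */
#else
#define PS_USE_NONE
#endif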
Re: Support logical replication of DDLs
On Thu, Feb 16, 2023 at 3:46 AM Zheng Li wrote: > > We have not discussed much about the ownership of replicated objects. > Currently, replicated > objects belong to the subscription owner. However, it makes sense to > allow replicated > objects to keep the same owner from the publisher for certain use > cases otherwise users > may need to run lots of ALTER TABLE/OBJ OWNER TO manually. This issue > has been raised in [1] and [2]. > > I've implemented a prototype to allow replicated objects to have the > same owner from the publisher in > v69-0008-Allow-replicated-objects-to-have-the-same-owner-from.patch. > I also think it would be a helpful addition for users. A few points that come to my mind are: (a) Shouldn't the role have the same privileges (for ex. rolbypassrls or rolsuper) on both sides before we allow this? (b) Isn't it better to first have replication of roles? I think if we have (b) then it would probably be a bit easier because if the subscription has allowed replicating roles and we can confirm that the role is replicated then we don't need to worry about the differences. Now, coming to the implementation, won't it be better if we avoid sending the owner to the subscriber unless it is changed for the replicated command? Consider the current case of tables where we send the schema only if it is changed. This is not a direct mapping but it would be better to avoid sending additional information and then processing it on the subscriber for each command. -- With Regards, Amit Kapila.
Re: REASSIGN OWNED vs ALTER TABLE OWNER TO permission inconsistencies
On Wed, Feb 15, 2023 at 9:01 AM Stephen Frost wrote: > I don't think I really agree that "because a superuser can arrange for > it to not be valid" that it follows that requiring the recipient to have > CREATE permission on the parent object doesn't make sense. Surely for > any of these scenarios, whatever rule we come up with (assuming we have > any rule at all...) a superuser could arrange to make that rule no > longer consistent. Well yes and no. The superuser can always hack things by modifying the system catalogs, but we have plenty of integrity constraints that a superuser can't just casually violate because they feel like it. For example, a superuser is no more able to revoke privileges without revoking the privileges that depend upon them than anyone else. > I'm not really a fan of just dropping the CREATE check. If we go with > "recipient needs CREATE rights" then at least without superuser > intervention and excluding cases where REVOKE's or such are happening, > we should be able to see that only objects where the owners of those > objects have CREATE rights exist in the system. If we drop the CREATE > check entirely then clearly any user who happens to have access to > multiple roles can arrange to have objects owned by any of their roles > in any schema or database they please without any consideration for what > the owner of the parent object's wishes are. That's true, and it is a downside of dropping the CREATE check, but it's also a bit hard to believe that anyone's really getting a lot of value out of the current inconsistent checks. -- Robert Haas EDB: http://www.enterprisedb.com
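To make the inconsistency concrete, a hypothetical session (all role and schema names invented; behavior as described in this thread) could look like this:

-- carol is granted membership in both alice and bob
CREATE ROLE alice;
CREATE ROLE bob;
CREATE ROLE carol LOGIN IN ROLE alice, bob;
CREATE SCHEMA app AUTHORIZATION alice;
CREATE TABLE app.t (id int);
ALTER TABLE app.t OWNER TO alice;

SET ROLE carol;
ALTER TABLE app.t OWNER TO bob;   -- fails: bob lacks CREATE on schema app
REASSIGN OWNED BY alice TO bob;   -- succeeds: no CREATE check is applied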
Re: run pgindent on a regular basis / scripted manner
On Wed, Feb 15, 2023 at 12:45:52PM -0600, Justin Pryzby wrote: > On Mon, Feb 13, 2023 at 07:57:25AM -0500, Andrew Dunstan wrote: > > On 2023-02-12 Su 15:59, Justin Pryzby wrote: > > > It seems like if pgindent knows about git, it ought to process only > > > tracked files. Then, it wouldn't need to manually exclude generated > > > files, and it wouldn't process vpath builds and who-knows-what else it > > > finds in CWD. > > > > I don't really want restrict this to tracked files because it would mean you > > can't pgindent files before you `git add` them. > > I think you'd allow indenting files which were either tracked *or* > specified on the command line. > > Also, it makes a more sense to "add" the file before indenting it, to > allow checking the output and remove unrelated changes. So that doesn't > seem to me like a restriction of any significance. > > But I would never want to indent an untracked file unless I specified > it. Agreed. I use pgindent three ways: 1. Indent everything that changed between master and the current branch. Most common, since I develop nontrivial patches on branches. 2. Indent all staged files. For trivial changes. 3. Indent all tracked files. For typedefs.list changes. That said, pre-2023 pgindent changed untracked files if called without a file list. I've lived with that and could continue to do so.
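As a sketch, those three workflows might look like this from the top of the source tree (pgindent's exact invocation can vary across branches, so treat these as illustrative):

# 1. Indent everything that changed between master and the current branch:
src/tools/pgindent/pgindent $(git diff --name-only master... -- '*.c' '*.h')
# 2. Indent all staged files:
src/tools/pgindent/pgindent $(git diff --name-only --cached -- '*.c' '*.h')
# 3. Indent all tracked C files (e.g. after a typedefs.list change):
src/tools/pgindent/pgindent $(git ls-files -- '*.c' '*.h')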
Re: some namespace.c refactoring
On Tue, Feb 14, 2023 at 02:32:04PM +0100, Peter Eisentraut wrote: > Notes on 0001-Refactor-is-visible-functions.patch: > > Among the functions that are being unified, some check temp schemas and some > skip them. I suppose that this is because some (most) object types cannot > normally be in temp schemas, but this isn't made explicit in the code. I > added a code comment about this, the way I understand it. > > That said, you can create objects explicitly in temp schemas, so I'm not > sure the existing code is completely correct. > + /* > + * Do not look in temp namespace for object types that > don't > + * support temporary objects > + */ > + if (!(classid == RelationRelationId || classid == > TypeRelationId) && > + namespaceId == myTempNamespace) > + continue; I think the reason for the class-specific *IsVisible behavior is alignment with the lookup rules that CVE-2007-2138 introduced (commit aa27977). "CREATE FUNCTION pg_temp.f(...)" works, but calling the resulting function requires a schema-qualified name regardless of search_path. Since *IsVisible functions determine whether you can reach the object without schema qualification, their outcomes shall reflect those CVE-2007-2138 rules.
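A short session illustrating the CVE-2007-2138 lookup rules referenced above (a sketch; the error text is paraphrased):

CREATE FUNCTION pg_temp.f(int) RETURNS int LANGUAGE sql AS 'SELECT $1 + 1';
SELECT pg_temp.f(1);   -- ok: the schema-qualified call is allowed
SELECT f(1);           -- ERROR: function f(integer) does not exist;
                       -- functions are never resolved from pg_temp
                       -- via search_path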
Re: [PATCH] Add pretty-printed XML output option
On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <peter.eisentr...@enterprisedb.com> wrote: > I suggest we call it "xmlformat", which is an established term for this. Some very-very old, rusted memory told me that there was something in the standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard? Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with the ability to specify the number of spaces for indentation: https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.
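If it were attached to XMLSERIALIZE per the standard's optional Feature X069, usage might look like this (hypothetical syntax for PostgreSQL at this point, sketched for discussion):

SELECT XMLSERIALIZE(DOCUMENT '<root><a>1</a><b>2</b></root>'::xml AS text INDENT);

Oracle's variant, linked above, instead takes an indent width, something like INDENT SIZE = 2.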
Re: Minimal logical decoding on standbys
On Wed, Feb 15, 2023 at 11:48 PM Andres Freund wrote: > > Hi, > > On 2023-02-15 18:02:11 +0530, Ashutosh Sharma wrote: > > Thanks Andres. I have one more query (both for you and Bertrand). I > > don't know if this has already been answered somewhere in this mail > > thread, if yes, please let me know the mail that answers this query. > > > > Will there be a problem if we mandate the use of physical replication > > slots and hot_standby_feedback to support minimum LD on standby. I > > know people can do a physical replication setup without a replication > > slot or even with hot_standby_feedback turned off, but are we going to > > have any issue if we ask them to use a physical replication slot and > > turn on hot_standby_feedback for LD on standby. This will reduce the > > code changes required to do conflict handling for logical slots on > > standby which is being done by v50-0001 and v50-0002* patches > > currently. > > I don't think it would. E.g. while restoring from archives we can't rely on > knowing that the slot still exists on the primary. > > We can't just do corrupt things, including potentially crashing, when the > configuration is wrong. We can't ensure that the configuration is accurate all > the time. So we need to detect this case. Hence needing to detect conflicts. > OK. Got it, thanks. > > > IMHO even in normal scenarios i.e. when we are not doing LD on > > standby, we should mandate the use of a physical replication slot. > > I don't think that's going to fly. There are plenty of scenarios where you e.g. don't > want to use a slot, e.g. when you want to limit space use on the primary. > I think this can be controlled via the max_slot_wal_keep_size GUC, if I understand it correctly. -- With Regards, Ashutosh Sharma.
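For reference, the setup under discussion (physical slot plus feedback), with the GUC that bounds slot-retained WAL, might be sketched as follows (values and names illustrative):

# primary's postgresql.conf
max_slot_wal_keep_size = 10GB        # cap WAL retained by slots; -1 = unlimited

# standby's postgresql.conf
hot_standby_feedback = on
primary_slot_name = 'standby1_slot'  # hypothetical physical slot name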
Re: Is psSocketPoll doing the right thing?
On 2023-02-10 10:42, Kyotaro Horiguchi wrote: On 2023/02/09 11:50, Kyotaro Horiguchi wrote: > Hello. > While looking at a patch, I found that pqSocketPoll passes through the > result from poll(2) to the caller and throws away revents. If I > understand it correctly, poll() *doesn't* return -1 or set errno merely > because it has set POLLERR, POLLHUP, POLLNVAL, and POLLRDHUP for some > of the target sockets, and returns 0 unless poll() itself failed to > work. As far as I understand correctly, poll() returns >0 if "revents" has either of those bits, not 0 nor -1. Right, that matches my understanding. If any of the sockets is in any of the states, pqSocketPoll returns a positive value, which makes pqSocketCheck return 1. Finally pqRead/WriteReady return "ready" even though the connection socket is in an error state. Actually, that behavior does no harm, since the succeeding actual read/write will "properly" fail. However, once we use this function to simply check that the socket is sound without doing an actual read/write, that behavior becomes harmful because of the false answer. I agree with you. The current pqSocketCheck could return a false result from a caller's point of view. You're thinking that pqSocketPoll() should check "revents" and return -1 if either of those bits is set? In short, yes. pqSocketPoll() should somehow inform callers about that state. Fortunately pqSocketPoll is a private function, thus we can refactor the function so that it can do that properly. Does this mean that pqSocketPoll or pqSocketCheck somehow returns the poll's result including error conditions (POLLERR, POLLHUP, POLLNVAL) to callers? Then callers filter the result to produce their final result. regards, -- Katsuragi Yuta Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
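A minimal sketch of what such a refactored check could look like, assuming POSIX poll(2); the real pqSocketPoll also handles end-time arithmetic and has a select() fallback, both omitted here:

#include <poll.h>

/*
 * Returns >0 if the socket is ready, 0 on timeout, and -1 on poll()
 * failure or when the socket is in an error state per revents.
 */
static int
socket_poll_checked(int sock, int forRead, int timeout_ms)
{
	struct pollfd input_fd;
	int			rc;

	input_fd.fd = sock;
	input_fd.events = forRead ? POLLIN : POLLOUT;
	input_fd.revents = 0;

	rc = poll(&input_fd, 1, timeout_ms);
	if (rc > 0 && (input_fd.revents & (POLLERR | POLLHUP | POLLNVAL)))
		return -1;			/* surface the error state instead of "ready" */
	return rc;
}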
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 5:37 AM Nathan Bossart wrote: > > On Wed, Feb 15, 2023 at 04:49:38AM +, Ryo Matsumura (Fujitsu) wrote: > > The above is occured by the following call. > > The argument 'permanent' of ReadBufferWithoutRelcache() is passed to > > BufferAlloc() as 'relpersistence'. > > > > [src/backend/commands/] > > 298 buf = ReadBufferWithoutRelcache(rnode, MAIN_FORKNUM, blkno, > > 299 RBM_NORMAL, bstrategy, false); > > Indeed, setting that to true (as per the attached patch) seems to fix this. > I don't see any reason this _shouldn't_ be true from what I've read so far. > We're reading pg_class, which will probably never be UNLOGGED. Yes, there is no reason to pass this as false, seems like this is passed false by mistake. And your patch fixes the issue. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Force testing of query jumbling code in TAP tests
On Tue, Feb 14, 2023 at 10:11:21AM -0800, Andres Freund wrote: > I didn't mean printing in the sense of outputting the statements to the tap > log. Maybe creating a temp table or such for all the queries. And yes, then > doing some top-level analysis on it like you describe sounds like a good idea. One idea would be something like this, which makes sure that reports are generated for the most common query patterns: WITH select_stats AS (SELECT upper(substr(query, 1, 6)) AS select_query FROM pg_stat_statements WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE', 'INSERT', 'DELETE', 'CREATE')) SELECT select_query, count(select_query) > 1 AS some_rows FROM select_stats GROUP BY select_query ORDER BY select_query; Other ideas are welcome. At least this would be a start. -- Michael diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile index 570bf42b58..c60314d195 100644 --- a/src/test/recovery/Makefile +++ b/src/test/recovery/Makefile @@ -9,7 +9,7 @@ # #- -EXTRA_INSTALL=contrib/test_decoding contrib/pg_prewarm +EXTRA_INSTALL=contrib/pg_prewarm contrib/pg_stat_statements contrib/test_decoding subdir = src/test/recovery top_builddir = ../../.. diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl index 13482adbaf..4f26192ffd 100644 --- a/src/test/recovery/t/027_stream_regress.pl +++ b/src/test/recovery/t/027_stream_regress.pl @@ -14,6 +14,17 @@ $node_primary->init(allows_streaming => 1); $node_primary->adjust_conf('postgresql.conf', 'max_connections', '25'); $node_primary->append_conf('postgresql.conf', 'max_prepared_transactions = 10'); + +# Enable pg_stat_statements to force tests to do query jumbling. +# pg_stat_statements.max should be large enough to hold all the entries +# of the regression database. +$node_primary->append_conf( + 'postgresql.conf', + qq{shared_preload_libraries = 'pg_stat_statements' +pg_stat_statements.max = 5 +compute_query_id = 'regress' +}); + # We'll stick with Cluster->new's small default shared_buffers, but since that # makes synchronized seqscans more probable, it risks changing the results of # some test queries. Disable synchronized seqscans to prevent that. @@ -106,6 +117,27 @@ command_ok( [ 'diff', $outputdir . '/primary.dump', $outputdir . '/standby.dump' ], 'compare primary and standby dumps'); +# Check some data from pg_stat_statements. +$node_primary->safe_psql('postgres', 'CREATE EXTENSION pg_stat_statements'); +# This gathers data based on the first characters for some common query types, +# providing coverage for SELECT, DMLs, and some DDLs. +my $result = $node_primary->safe_psql( + 'postgres', + qq{WITH select_stats AS + (SELECT upper(substr(query, 1, 6)) AS select_query + FROM pg_stat_statements + WHERE upper(substr(query, 1, 6)) IN ('SELECT', 'UPDATE', + 'INSERT', 'DELETE', + 'CREATE')) + SELECT select_query, count(select_query) > 1 AS some_rows +FROM select_stats +GROUP BY select_query ORDER BY select_query;}); +is( $result, qq(CREATE|t +DELETE|t +INSERT|t +SELECT|t +UPDATE|t), 'check contents of pg_stat_statements on regression database'); + $node_standby_1->stop; $node_primary->stop;
RE: Allow logical replication to copy tables in binary format
Hi Melih, On Monday, January 30, 2023 7:50 PM Melih Mutlu wrote: > Thanks for reviewing. ... > Well, with this patch, it will begin to fail in the table copy phase... The latest patch hasn't been updated for more than two weeks since the review comments. If you don't have time, I would like to help update the patch for you and other reviewers. Kindly let me know your status. Best Regards, Takamichi Osumi
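For context, the subscription-level option involved looks like this today; with the patch, the initial table synchronization would use binary COPY as well (connection string and names invented):

CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub1
    WITH (binary = true);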
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Wed, 15 Feb 2023 11:29:18 +, "Hayato Kuroda (Fujitsu)" wrote in > Dear Andres and other hackers, > > > OTOH, if we want to implement the functionality on publisher-side, > > I think we need to first consider the interface. > > We can think of two options (a) Have it as a subscription parameter as the > > patch > > has now and > > then pass it as an option to the publisher which it will use to delay; > > (b) Have it defined on publisher-side, say via GUC or some other way. > > The basic idea could be that while processing commit record (in > > DecodeCommit), > > we can somehow check the value of delay and then use it there to delay > > sending > > the xact. > > Also, during delay, we need to somehow send the keepalive and process > > replies, > > probably via a new callback or by some existing callback. > > We also need to handle in-progress and 2PC xacts in a similar way. > > For the former, probably we would need to apply the delay before sending > > the first > > stream. > > Could you please share what you feel on this direction as well ? > > I implemented a patch where the delaying is done on the publisher side. In > this patch, > approach (a) was chosen, in which min_apply_delay is specified as a > subscription > parameter, and then the apply worker passes it to the publisher as an output > plugin option. As Amit-K mentioned, we may need to change the name of the option in this version, since the delay mechanism in this version causes a delay in sending from the publisher rather than delaying apply on the subscriber side. I'm not sure why the output plugin is involved in the delay mechanism. It appears to me that it would be simpler if the delay occurred in the reorder buffer or the logical decoder instead. If I understand correctly, we could delay right before sending only the commit records in this case. If we delay at the publisher end, all changes will be sent at once if !streaming, and otherwise, all changes in a transaction will be spooled at the subscriber end. In any case, the apply worker won't be holding an active transaction unnecessarily. Of course we need to add a mechanism to process keep-alive and status report messages. > During the delay, the walsender periodically checks and processes replies > from the > apply worker and sends keepalive messages if needed. Therefore, the ability > to handle > keepalives is not lost. My understanding is that keep-alives are a different mechanism with a different objective from status reports. Even if the subscriber doesn't send spontaneous or extra status reports at all, the connection can be checked and maintained by keep-alive packets. It is possible to set up an asymmetric configuration where only the walsender sends keep-alives, and none are sent from the peer. Those setups work fine when no apply delay is involved, but they won't work with the patches we're talking about because the subscriber won't respond to the keep-alive packets from the peer. So when I wrote "practically works" in the last mail, this is what I meant. Thus if someone plans to enable apply_delay for logical replication, that person should be aware of some additional subtle restrictions that are required compared to non-delayed setups. > To delay the transaction in the output plugin layer, the new > LogicalOutputPlugin > API was added. For now, I chose the output plugin layer but can consider > doing > it from the core if there is a better way. > > Could you please share your opinion? > > Note: thanks to Osumi-san for helping with the implementation. regards. 
-- Kyotaro Horiguchi NTT Open Source Software Center
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 10:24:13AM +0530, Dilip Kumar wrote: > Yes, there is no reason to pass this as false, seems like this is > passed false by mistake. And your patch fixes the issue. So, if I am understanding this stuff right, this issue can create data corruption once a DDL updates any pages of pg_class stored in a template database that gets copied by this routine. In this case, the patch sent makes sure that any page copied will get written once a checkpoint kicks in. Shouldn't we have at least a regression test for such a scenario? The issue can happen when updating a template database after creating a database from it, which is less worrying than the initial impression I got, still I'd like to think that we should have some coverage as of the special logic this code path relies on for pg_class when reading its buffers. I have not given much attention to this area, but I am a bit suspicious that enforcing the default as WAL_LOG was a good idea for 15~, TBH. We are usually much more conservative when it comes to such choices, switching to the new behavior after a few years would have been wiser.. -- Michael
Re: Dead code in ps_status.c
Thomas Munro writes: > Therefore I think it is safe to drop the PS_USE_PS_STRING and > PS_USE_CHANGE_ARGV code branches, remove a bunch of outdated comments > and macro tests, and prune the defunct configure/meson probe. Seems reasonable. Patch passes an eyeball check. > I guess (defined(sun) && !defined(BSD)) || defined(__svr5__) could be > changed to just defined(sun) (surely there are no other living > SysV-derived systems, and I think non-BSD Sun probably meant "Solaris > but not SunOS"), but I don't know so I didn't touch that. Hm, is "defined(sun)" true on any live systems at all? regards, tom lane
remove duplicate comment.
Hi, The attached patch removes the comment line noting the same as the previous paragraph of the ExecUpdateAct() prolog comment. -- Regards, Amul Sul EDB: http://www.enterprisedb.com diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index f419c47065a..645e62f4a76 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -1948,9 +1948,6 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo, * caller is also in charge of doing EvalPlanQual if the tuple is found to * be concurrently updated. However, in case of a cross-partition update, * this routine does it. - * - * Caller is in charge of doing EvalPlanQual as necessary, and of keeping - * indexes current for the update. */ static TM_Result ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
Missing free_var() at end of accum_sum_final()?
Hi, I noticed the NumericVar's pos_var and neg_var are not free_var()'d at the end of accum_sum_final(). The potential memory leak seems small, since the function is called only once per sum() per worker (and from a few more places), but maybe they should be free'd anyways for correctness? /Joel
Re: Dead code in ps_status.c
On Thu, Feb 16, 2023 at 6:34 PM Tom Lane wrote: > Thomas Munro writes: > > Therefore I think it is safe to drop the PS_USE_PS_STRINGS and > > PS_USE_CHANGE_ARGV code branches, remove a bunch of outdated comments > > and macro tests, and prune the defunct configure/meson probe. > > Seems reasonable. Patch passes an eyeball check. Thanks for looking. > > I guess (defined(sun) && !defined(BSD)) || defined(__svr5__) could be > > changed to just defined(sun) (surely there are no other living > > SysV-derived systems, and I think non-BSD Sun probably meant "Solaris > > but not SunOS"), but I don't know so I didn't touch that. > > Hm, is "defined(sun)" true on any live systems at all? My GCC compile farm account seems to have expired, or something, so I couldn't check on wrasse's host (though whether wrasse is "live" is debatable: Solaris 11.3 has reached EOL, it's just that the CPU is too old to be upgraded, so it's not testing a real OS that anyone would actually run PostgreSQL on). But from some googling[1], I think __sun, __sun__ and sun should all be defined. Ohh, but __svr5__ should not be. Solaris boxes define __svr4__, I was confused by the two fives. __svr5__ was SCO/Unixware, another dead OS[2], so I think we can just remove that one too. So, yeah, I think we should replace (defined(sun) && !defined(BSD)) || defined(__svr5__) with defined(__sun). (Hmph. We have all of __sun__, __sun and sun in the tree.) [1] https://stackoverflow.com/questions/16618604/solaris-and-preprocessor-macros [2] https://en.wikipedia.org/wiki/UNIX_System_V#SVR5_/_UnixWare_7
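So the proposed simplification, as a before/after sketch of the preprocessor test (the real test sits in a longer #elif chain listing more platforms, so this is abbreviated):

/* before (sketch) */
#if (defined(sun) && !defined(BSD)) || defined(__svr5__)
#define PS_USE_CLOBBER_ARGV
#endif

/* after: Solaris is the only surviving SysV descendant here */
#if defined(__sun)
#define PS_USE_CLOBBER_ARGV
#endif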
RE: Time delayed LR (WAS Re: logical replication restrictions)
Dear Horiguchi-san, Thank you for responding! Before modifying the patches, I want to confirm something you said. > As Amit-K mentioned, we may need to change the name of the option in > this version, since the delay mechanism in this version causes a delay > in sending from the publisher rather than delaying apply on the subscriber side. Right, will be changed. > I'm not sure why the output plugin is involved in the delay mechanism. It > appears to me that it would be simpler if the delay occurred in > the reorder buffer or the logical decoder instead. I'm planning to change it, but.. > If I understand correctly, > we could delay right before sending only the commit > records in this case. If we delay at the publisher end, all changes will > be sent at once if !streaming, and otherwise, all changes in a > transaction will be spooled at the subscriber end. In any case, the apply > worker won't be holding an active transaction unnecessarily. What about the parallel case? The latest patch does not reject the combination of parallel streaming mode and delay. If the delay is done at commit and the subscriber uses a parallel apply worker, it may hold locks for a long time. > Of > course we need to add a mechanism to process keep-alive and status > report messages. Could you share a good way to handle keep-alive and status messages if you have one? If we moved this to the decoding layer, it would be strange to call walsender functions directly. > Those setups work fine when no > apply delay is involved, but they won't work with the patches we're > talking about because the subscriber won't respond to the keep-alive > packets from the peer. So when I wrote "practically works" in the > last mail, this is what I meant. I'm not sure about that part. I think that in the latest patch, the subscriber can respond to the keepalive packets from the peer. Also, the publisher can respond to the peer. Could you please tell me if you know of a case where the publisher or subscriber cannot respond to the opposite side? Note that if we apply the publisher-side patch, we don't have to apply the subscriber-side patch. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 06:59:13AM +0100, Joel Jacobson wrote: > I noticed the NumericVar's pos_var and neg_var are not free_var()'d > at the end of accum_sum_final(). > > The potential memory leak seems small, since the function is called > only once per sum() per worker (and from a few more places), but > maybe they should be free'd anyways for correctness? Indeed, it is true that any code path of numeric.c that relies on a NumericVar with an allocation done in its buffer is careful enough to free it, except for generate_series's SRF where one step of the computation is done. I don't see directly why you could not do the following: @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, NumericVar *result) /* And add them together */ add_var(&pos_var, &neg_var, result); + free_var(&pos_var); + free_var(&neg_var); + /* Remove leading/trailing zeroes */ strip_var(result); -- Michael
Re: Support logical replication of global object commands
On Tue, Aug 30, 2022 at 8:09 AM Zheng Li wrote: > > > > I think a publication of ALL OBJECTS sounds intuitive. Does it mean we'll > > > publish all DDL commands, all commit and abort operations in every > > > database if there is such publication of ALL OBJECTS? > > > > > > > Actually, I intend something for global objects. But the main thing > > that is worrying me about this is that we don't have a clean way to > > untie global object replication from database-specific object > > replication. > > I think ultimately we need a clean and efficient way to publish (and > subscribe to) any changes in all databases, preferably in one logical > replication slot. > Agreed. I was thinking that, currently, both the walsender and the slot are database-specific for logical replication. So we need a way to distinguish the WAL for global objects and then avoid filtering based on the slot's database during decoding. I also thought about whether we want to have a WALSender that is not connected to a database for the replication of global objects but I couldn't come up with a reason for doing so. Do you have any thoughts on this matter? -- With Regards, Amit Kapila.
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote: > So, if I am understanding this stuff right, this issue can create data > corruption once a DDL updates any pages of pg_class stored in a > template database that gets copied by this routine. In this case, the > patch sent makes sure that any page copied will get written once a > checkpoint kicks in. Shouldn't we have at least a regression test for > such a scenario? The issue can happen when updating a template > database after creating a database from it, which is less worrying > than the initial impression I got, still I'd like to think that we > should have some coverage as of the special logic this code path > relies on for pg_class when reading its buffers. I was able to quickly hack together a TAP test that seems to reliably fail before the fix and pass afterwards. There's probably a better place for it, though... -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index ef05633bb0..a0259cc593 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -296,7 +296,7 @@ ScanSourceDatabasePgClass(Oid tbid, Oid dbid, char *srcpath) CHECK_FOR_INTERRUPTS(); buf = ReadBufferWithoutRelcache(rlocator, MAIN_FORKNUM, blkno, - RBM_NORMAL, bstrategy, false); + RBM_NORMAL, bstrategy, true); LockBuffer(buf, BUFFER_LOCK_SHARE); page = BufferGetPage(buf); diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build index 209118a639..6e9f8a7c7f 100644 --- a/src/test/recovery/meson.build +++ b/src/test/recovery/meson.build @@ -39,6 +39,7 @@ tests += { 't/031_recovery_conflict.pl', 't/032_relfilenode_reuse.pl', 't/033_replay_tsp_drops.pl', + 't/100_bugs.pl', ], }, } diff --git a/src/test/recovery/t/100_bugs.pl b/src/test/recovery/t/100_bugs.pl new file mode 100644 index 00..6429a352e9 --- /dev/null +++ b/src/test/recovery/t/100_bugs.pl @@ -0,0 +1,24 @@ + +# Copyright (c) 2023, PostgreSQL Global Development Group + +# Tests for various bugs found over time +use strict; +use warnings; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# DDL on template is persisted after creating database using WAL_LOG strategy +my $node = PostgreSQL::Test::Cluster->new('node'); +$node->init; +$node->start; +$node->safe_psql("postgres", "CREATE DATABASE test STRATEGY WAL_LOG;"); +$node->safe_psql("template1", "CREATE TABLE test (a INT);"); +$node->safe_psql("postgres", "CHECKPOINT;"); +$node->stop('immediate'); +$node->start; +my $result = $node->safe_psql("template1", "SELECT count(*) FROM test;"); +is($result, "0", "check table still exists"); +$node->stop; + +done_testing();
Re: remove duplicate comment.
On Thu, Feb 16, 2023 at 11:05:13AM +0530, Amul Sul wrote: > The attached patch removes the comment line noting the same as the > previous paragraph of the ExecUpdateAct() prolog comment. - * Caller is in charge of doing EvalPlanQual as necessary, and of keeping - * indexes current for the update. Indeed, good catch. Both are mentioned in the previous paragraph. -- Michael
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:11 PM Nathan Bossart wrote: > > On Thu, Feb 16, 2023 at 02:26:55PM +0900, Michael Paquier wrote: > > So, if I am understanding this stuff right, this issue can create data > > corruption once a DDL updates any pages of pg_class stored in a > > template database that gets copied by this routine. In this case, the > > patch sent makes sure that any page copied will get written once a > > checkpoint kicks in. Shouldn't we have at least a regression test for > > such a scenario? The issue can happen when updating a template > > database after creating a database from it, which is less worrying > > than the initial impression I got, still I'd like to think that we > > should have some coverage as of the special logic this code path > > relies on for pg_class when reading its buffers. > > I was able to quickly hack together a TAP test that seems to reliably fail > before the fix and pass afterwards. There's probably a better place for > it, though... I think the below change is not relevant to this bug right? diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build index 209118a639..6e9f8a7c7f 100644 --- a/src/test/recovery/meson.build +++ b/src/test/recovery/meson.build @@ -39,6 +39,7 @@ tests += { 't/031_recovery_conflict.pl', 't/032_relfilenode_reuse.pl', 't/033_replay_tsp_drops.pl', + 't/100_bugs.pl', ], }, } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:32 PM Dilip Kumar wrote: > I think the below change is not relevant to this bug right? > > diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build > index 209118a639..6e9f8a7c7f 100644 > --- a/src/test/recovery/meson.build > +++ b/src/test/recovery/meson.build > @@ -39,6 +39,7 @@ tests += { >'t/031_recovery_conflict.pl', >'t/032_relfilenode_reuse.pl', >'t/033_replay_tsp_drops.pl', > + 't/100_bugs.pl', > ], >}, > } Why not? The patch creates 100_bugs.pl so it also adds it to meson.build. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Add pretty-printed XML output option
On 16.02.23 05:38, Nikolay Samokhvalov wrote: On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut wrote: I suggest we call it "xmlformat", which is an established term for this. Some very-very old, rusted memory told me that there was something in standard – and indeed, it seems it described an optional Feature X069, “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should go there, to XMLSERIALIZE, to follow the standard? Oracle also has an option for it in XMLSERIALIZE, although in a slightly different form, with ability to specify the number of spaces for indentation https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231. Hi Nikolay, My first thought was to call it xmlpretty, to make it consistent with the jsonb equivalent "jsonb_pretty". But yes, you make a good observation .. xmlserialize seems to be a much better candidate. I would be willing to refactor my patch if we agree on xmlserialize. Thanks for the suggestion! Jim
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023, at 07:26, Michael Paquier wrote: > Indeed, it is true that any code path of numeric.c that relies on a > NumericVar with an allocation done in its buffer is careful enough to > free it, except for generate_series's SRF where one step of the > computation is done. I don't see directly why you could not do the > following: > @@ -11973,6 +11973,9 @@ accum_sum_final(NumericSumAccum *accum, > NumericVar *result) > /* And add them together */ > add_var(&pos_var, &neg_var, result); > > + free_var(&pos_var); > + free_var(&neg_var); > + Thanks for looking and explaining. I added the free_var() calls after strip_var() to match the similar existing code at the end of sqrt_var(). Not that it matters, but thought it looked nicer to keep them in the same order. /Joel
wrong query result due to wang plan
Hi hackers, I have this query as shown below: select ref_1.r_comment as c0, subq_0.c1 as c1 from public.region as sample_0 right join public.partsupp as sample_1 right join public.lineitem as sample_2 on (cast(null as path) = cast(null as path)) on (cast(null as "timestamp") < cast(null as "timestamp")) inner join public.lineitem as ref_0 on (true) left join (select sample_3.ps_availqty as c1, sample_3.ps_comment as c2 from public.partsupp as sample_3 where false order by c1, c2 ) as subq_0 on (sample_1.ps_supplycost = subq_0.c1 ) right join public.region as ref_1 on (sample_1.ps_availqty = ref_1.r_regionkey ) where ref_1.r_comment is not NULL order by c0, c1; *This query has different results on pg12.12 and on HEAD*, on pg12.12: c0 | c1 -+ even, ironic theodolites according to the bold platelets wa | furiously unusual packages use carefully above the unusual, exp | silent, bold requests sleep slyly across the quickly sly dependencies. furiously silent instructions alongside | special, bold deposits haggle foxes. platelet | special Tiresias about the furiously even dolphins are furi | (5 rows) its plan: QUERY PLAN -- Sort Sort Key: ref_1.r_comment, c1 -> Hash Left Join Hash Cond: (ref_1.r_regionkey = ps_availqty) -> Seq Scan on region ref_1 Filter: (r_comment IS NOT NULL) -> Hash -> Result One-Time Filter: false (9 rows) But on HEAD (pg16devel), its result is below: c0 | c1 + (0 rows) its plan: QUERY PLAN Sort Sort Key: ref_1.r_comment, subq_0.c1 -> Result One-Time Filter: false (4 rows) The attached file includes the table schema info. regards, tender wang
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:37:57PM +0530, Robert Haas wrote: > Why not? The patch creates 100_bugs.pl so it also adds it to meson.build. It would not matter for REL_15_STABLE, but on HEAD all the new TAP test files have to be added in their respective meson.build. If you don't do that, the CFBot would not test it, neither would a local build with meson. Adding a test in src/test/recovery/ is OK by me. This is a recovery case, and that's a.. Bug. Another possible name would be something like 0XX_create_database.pl, now that's me being picky. -- Michael
Re: Missing free_var() at end of accum_sum_final()?
On Thu, Feb 16, 2023 at 08:08:37AM +0100, Joel Jacobson wrote: > I added the free_var() calls after strip_var() to match the similar > existing code at the end of sqrt_var(). > Not that it matters, but thought it looked nicer to keep them in the > same order. WFM. Thanks! -- Michael
Re: [PATCH] Add pretty-printed XML output option
On 16.02.23 00:13, Peter Smith wrote: Today I fetched and tried the latest v11. It is failing too, but only just. - see attached file pretty-v11-results It looks only due to a whitespace EOF issue in the xml_2.out @@ -1679,4 +1679,4 @@ -- XML format: empty string SELECT xmlformat(''); ERROR: invalid XML document -\set VERBOSITY default \ No newline at end of file +\set VERBOSITY default -- The attached patch update (v12-0002) fixes the xml_2.out for me. It works for me too. Thanks for the quick fix! Jim
Re: DDL result is lost by CREATE DATABASE with WAL_LOG strategy
On Thu, Feb 16, 2023 at 12:38 PM Robert Haas wrote: > > On Thu, Feb 16, 2023 at 12:32 PM Dilip Kumar wrote: > > I think the below change is not relevant to this bug right? > > > > diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build > > index 209118a639..6e9f8a7c7f 100644 > > --- a/src/test/recovery/meson.build > > +++ b/src/test/recovery/meson.build > > @@ -39,6 +39,7 @@ tests += { > >'t/031_recovery_conflict.pl', > >'t/032_relfilenode_reuse.pl', > >'t/033_replay_tsp_drops.pl', > > + 't/100_bugs.pl', > > ], > >}, > > } > > Why not? The patch creates 100_bugs.pl so it also adds it to meson.build. Yeah my bad, I somehow assumed this was an existing file. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?
On Wed, Feb 15, 2023 at 01:00:00PM +0530, Bharath Rupireddy wrote: > The v3 patch reduces initialization of iovec array elements which is a > clear win when pg_pwrite_zeros is called for sizes less than BLCKSZ > many times (I assume this is what is needed for the relation extension > lock improvements feature). However, it increases the number of iovec > initialization with zerobuf for the cases when pg_pwrite_zeros is > called for sizes far greater than BLCKSZ (for instance, WAL file > initialization). It seems to me that v3 would do extra initializations only if pg_pwritev_with_retry() does *not* retry its writes, but that's not the case as it retries on a partial write as per its name. The number of iov buffers is strictly capped by remaining_size. FWIW, I find the v3 proposal more elegant. > FWIW, I attached v4 patch, a simplified version of the v2 - it > initializes all the iovec array elements if the total blocks to be > written crosses lengthof(iovec array), otherwise it initializes only > the needed blocks. + static size_t zbuf_sz = BLCKSZ; In v4, what's the advantage of marking that as static? It could actually be dangerous if this is carelessly updated. Well, that's not the case, still.. -- Michael
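To ground the discussion, here is a minimal sketch of iovec-based zero-filling, assuming POSIX pwritev; the real pg_pwrite_zeros/pg_pwritev_with_retry additionally loop over partial writes, which is exactly why initializing only the iovecs a given call needs stays correct:

#include <sys/types.h>
#include <sys/uio.h>

#define ZBUF_SZ 8192			/* stand-in for BLCKSZ */

static char zbuf[ZBUF_SZ];		/* static storage: zero-filled by C rules */

static ssize_t
write_zeros_once(int fd, size_t size, off_t offset)
{
	struct iovec iov[32];
	int			n = 0;
	size_t		remaining = size;

	/* Initialize only as many iovecs as this call actually needs. */
	while (remaining > 0 && n < 32)
	{
		size_t		chunk = remaining < ZBUF_SZ ? remaining : ZBUF_SZ;

		iov[n].iov_base = zbuf;
		iov[n].iov_len = chunk;
		remaining -= chunk;
		n++;
	}
	return pwritev(fd, iov, n, offset);	/* may be partial; caller retries */
}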