Re: tweak to a few index tests to hit ambuildempty() routine.

2022-09-21 Thread a . kozhemyakin
After analyzing this, I found out why we don't reach that Assert but still 
have coverage shown - firstly, it is reached via another test, vacuum; 
secondly, it depends on the gcc optimization flag. We reach that Assert 
only when using -O0.
If we build with -O2 or -Og, that function is not reached (due to 
different results of the heap_prune_satisfies_vacuum() check inside 
heap_page_prune()).
But as the make checks are mostly performed (including the buildfarm 
testing) with -O2/-Og, it looks like after 4fb5c794e5 we have lost the 
coverage provided by 4c51a2d1e4.


Amul Sul wrote on 2022-09-14 14:28:

On Wed, Sep 14, 2022 at 12:16 PM  wrote:


I still wonder: if the assert doesn't catch it, why is that place marked as
covered here?
https://coverage.postgresql.org/src/backend/access/gin/ginvacuum.c.gcov.html



Probably other tests cover that.

Regards,
Amul





Multi-insert related comment in CopyFrom()

2022-09-21 Thread Etsuro Fujita
Hi,

While working on the “Fast COPY FROM based on batch insert” patch, I
noticed this:

else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL &&
 resultRelInfo->ri_TrigDesc->trig_insert_new_table)
{
/*
 * For partitioned tables we can't support multi-inserts when there
 * are any statement level insert triggers. It might be possible to
 * allow partitioned tables with such triggers in the future, but for
 * now, CopyMultiInsertInfoFlush expects that any before row insert
 * and statement level insert triggers are on the same relation.
 */
insertMethod = CIM_SINGLE;
}

I think there is a thinko in the comment; “before” should be after.
Patch attached.

Best regards,
Etsuro Fujita


Fix-thinko-in-comment.patch


Re: RFC: Logging plan of the running query

2022-09-21 Thread Alena Rybakina

 Ok, I get it.

Since GetLockMethodLocalHash() is only used for assertions, this is 
only defined when USE_ASSERT_CHECKING is enabled. This patch uses 
GetLockMethodLocalHash() not only for the assertion purpose, so I 
removed "ifdef USE_ASSERT_CHECKING" for this function. I belive it 
does not lead to skip errors. 

Agree.
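(For reference, the accessor in question is tiny; a minimal sketch of what 
dropping the guard amounts to, assuming the definition in lock.c looks as it 
does on HEAD:)

```c
/*
 * Sketch only: GetLockMethodLocalHash() without the USE_ASSERT_CHECKING
 * guard, so that non-assert code can also look at the backend-local lock
 * table.  The body is unchanged; only the surrounding #ifdef/#endif goes
 * away.
 */
HTAB *
GetLockMethodLocalHash(void)
{
	return LockMethodLocalHash;
}
```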
Since these processes are needed for nested queries, not only for 
AbortSubTransaction[1], added comments on it.


I also noticed it. However, I also discovered that the above function 
declarations are applied to the EXPLAIN command and are used to print 
the details of the executed query.


We have a similar task of printing the plan of an interrupted process 
when a request is made for a specific pid.


In short, I think this task is different, and I added this comment to 
separate these parts.


I feel this comment is unnecessary since the explanation of 
HandleLogQueryPlanInterrupt() is written in explain.c and no functions 
in explain.h have comments in it. 


Yes, I was worried about it. I understood it, thanks for explaining.



AFAIU both of them are processed, since SendProcSignal() sets 
pss_signalFlags for each signal.


```
SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
{
volatile ProcSignalSlot *slot;
...(snip)...
		if (slot->pss_pid == pid)
		{
			/* Atomically set the proper flag */
			slot->pss_signalFlags[reason] = true;
			/* Send signal */
			return kill(pid, SIGUSR1);
```

Comments of ProcSignalReason also say 'We can cope with concurrent 
signals for different reasons'.


```C
/*
 * Reasons for signaling a Postgres child process (a backend or an auxiliary
 * process, like checkpointer).  We can cope with concurrent signals for
 * different reasons.  However, if the same reason is signaled multiple times
 * in quick succession, the process is likely to observe only one notification
 * of it.  This is okay for the present uses.
...
typedef enum
{
	PROCSIG_CATCHUP_INTERRUPT,		/* sinval catchup interrupt */
	PROCSIG_NOTIFY_INTERRUPT,		/* listen/notify interrupt */
	PROCSIG_PARALLEL_MESSAGE,		/* message from cooperating parallel backend */
	PROCSIG_WALSND_INIT_STOPPING,	/* ask walsenders to prepare for shutdown */

	PROCSIG_BARRIER,				/* global barrier interrupt */
	PROCSIG_LOG_MEMORY_CONTEXT,		/* ask backend to log the memory contexts */
	PROCSIG_LOG_QUERY_PLAN,			/* ask backend to log plan of the current query */
...
} ProcSignalReason;
```
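To make the "concurrent reasons" point concrete: on the receiving side each 
reason is tested independently in procsignal_sigusr1_handler(), so the new 
reason cannot clobber an existing one. A rough sketch 
(HandleLogQueryPlanInterrupt() is the handler added by this patch; the rest 
paraphrases existing code):

```c
/* excerpt-style sketch of procsignal_sigusr1_handler() */
if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
	HandleLogMemoryContextInterrupt();

/* new reason proposed by this patch */
if (CheckProcSignal(PROCSIG_LOG_QUERY_PLAN))
	HandleLogQueryPlanInterrupt();
```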


[1] 
https://www.postgresql.org/message-id/8b53b32f-26cc-0531-4ac0-27310e0bef4b%40oss.nttdata.com

Re: Perform streaming logical transactions by background workers and parallel apply

2022-09-21 Thread Peter Smith
Here are some review comments for patch v30-0001.

==

1. Commit message

In addition, the patch extends the logical replication STREAM_ABORT message so
that abort_time and abort_lsn can also be sent which can be used to update the
replication origin in parallel apply worker when the streaming transaction is
aborted. Because this message extension is needed to support parallel
streaming, meaning that parallel streaming is not supported for publications on
servers < PG16.

"meaning that parallel streaming is not supported" -> "parallel
streaming is not supported"

==

2. doc/src/sgml/logical-replication.sgml

@@ -1611,8 +1622,12 @@ CONTEXT:  processing remote data for
replication origin "pg_16395" during "INSER
to the subscriber, plus some reserve for table synchronization.
max_logical_replication_workers must be set to at least
the number of subscriptions, again plus some reserve for the table
-   synchronization.  Additionally the max_worker_processes
-   may need to be adjusted to accommodate for replication workers, at least
+   synchronization. In addition, if the subscription parameter
+   streaming is set to parallel, please
+   increase max_logical_replication_workers according to
+   the desired number of parallel apply workers.  Additionally the
+   max_worker_processes may need to be adjusted to
+   accommodate for replication workers, at least
(max_logical_replication_workers
+ 1).  Note that some extensions and parallel queries
also take worker slots from max_worker_processes.

IMO it looks a bit strange to have "In addition" followed by "Additionally".

Also, "to accommodate for replication workers"? seems like a typo (but
it is not caused by your patch)

BEFORE
In addition, if the subscription parameter streaming is set to
parallel, please increase max_logical_replication_workers according to
the desired number of parallel apply workers.

AFTER (???)
If the subscription parameter streaming is set to parallel,
max_logical_replication_workers should be increased according to the
desired number of parallel apply workers.

==

3. .../replication/logical/applyparallelworker.c - parallel_apply_can_start

+/*
+ * Returns true, if it is allowed to start a parallel apply worker, false,
+ * otherwise.
+ */
+static bool
+parallel_apply_can_start(TransactionId xid)

Seems a slightly complicated comment for a simple boolean function.

SUGGESTION
Returns true/false if it is OK to start a parallel apply worker.

==

4. .../replication/logical/applyparallelworker.c - parallel_apply_free_worker

+ winfo->in_use = false;
+
+ /* Are there enough workers in the pool? */
+ if (napplyworkers > (max_parallel_apply_workers_per_subscription / 2))
+ {

I felt the comment/logic about "enough" needs a bit more description.
At least it should say to refer to the more detailed explanation atop
worker.c

==

5. .../replication/logical/applyparallelworker.c - parallel_apply_setup_dsm

+ /*
+ * Estimate how much shared memory we need.
+ *
+ * Because the TOC machinery may choose to insert padding of oddly-sized
+ * requests, we must estimate each chunk separately.
+ *
+ * We need one key to register the location of the header, and we need two
+ * other keys to track of the locations of the message queue and the error
+ * message queue.
+ */

"track of" -> "keep track of" ?

==

6. src/backend/replication/logical/launcher.c  - logicalrep_worker_detach

 logicalrep_worker_detach(void)
 {
+ /* Stop the parallel apply workers. */
+ if (!am_parallel_apply_worker() && !am_tablesync_worker())
+ {
+ List*workers;
+ ListCell   *lc;

The condition is not very obvious. This is why I previously suggested
adding another macro/function like 'isLeaderApplyWorker'. In the
absence of that, I think the comment needs to be more
descriptive.

SUGGESTION
If this is the leader apply worker then stop the parallel apply workers.

==

7. src/backend/replication/logical/proto.c - logicalrep_read_stream_abort

 void
 logicalrep_write_stream_abort(StringInfo out, TransactionId xid,
-   TransactionId subxid)
+   TransactionId subxid, XLogRecPtr abort_lsn,
+   TimestampTz abort_time, bool abort_info)
 {
  pq_sendbyte(out, LOGICAL_REP_MSG_STREAM_ABORT);

@@ -1175,19 +1179,40 @@ logicalrep_write_stream_abort(StringInfo out,
TransactionId xid,
  /* transaction ID */
  pq_sendint32(out, xid);
  pq_sendint32(out, subxid);
+
+ if (abort_info)
+ {
+ pq_sendint64(out, abort_lsn);
+ pq_sendint64(out, abort_time);
+ }


The new param name 'abort_info' seems misleading.

Maybe a name like 'write_abort_info' is better?

~~~

8. src/backend/replication/logical/proto.c - logicalrep_read_stream_abort

+logicalrep_read_stream_abort(StringInfo in,
+ LogicalRepStreamAbortData *abort_data,
+ bool read_abort_lsn)
 {
- Assert(xid && subxid);
+ Assert(abort_data);
+
+ abort_data->xid = pq_getmsgint(in, 4);
+ abort_data->subxid = pq_getmsgint(in, 4);

- *xid = pq_getmsgint(in, 4);
- *subxid = pq_getmsgint(in, 4);
+ i

Re: SLRUs in the main buffer pool, redux

2022-09-21 Thread Thomas Munro
On Sat, Sep 17, 2022 at 12:41 PM Bagga, Rishu  wrote:
> While I was working on adding the page headers to SLRU pages on your patch, I 
> came across this code where it seems like "MultiXactIdToMemberPage" is 
> mistakenly being used instead of MultiXactIdToOffsetPage in the TrimMultiXact 
> function.

Thanks Rishu.  Right.  Will fix soon in the next version, along with
my long overdue replies to Heikki and Konstantin.




Re: Add common function ReplicationOriginName.

2022-09-21 Thread Peter Smith
On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila  wrote:
>
...

> Can't we use the existing function ReplicationOriginNameForTablesync()
> by passing relid as InvalidOid for this purpose? We need a check
> inside to decide which name to construct, otherwise, it should be
> fine. If we agree with this, then we can change the name of the
> function to something like ReplicationOriginNameForLogicalRep or
> ReplicationOriginNameForLogicalRepWorkers.
>

This suggestion attaches special meaning to the relid param.

Won't it seem a bit strange for the non-tablesync callers (who
probably have a perfectly valid 'relid') to have to pass an InvalidOid
relid just so they can format the correct origin name?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Tighten pg_get_object_address argument checking

2022-09-21 Thread Amit Kapila
On Tue, Sep 20, 2022 at 11:14 PM Peter Eisentraut
 wrote:
>
> For publication schemas (OBJECT_PUBLICATION_NAMESPACE) and user
> mappings (OBJECT_USER_MAPPING), pg_get_object_address() checked the
> array length of the second argument, but not of the first argument.
> If the first argument was too long, it would just silently ignore
> everything but the first argument.  Fix that by checking the length of
> the first argument as well.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: Add common function ReplicationOriginName.

2022-09-21 Thread Amit Kapila
On Wed, Sep 21, 2022 at 3:09 PM Peter Smith  wrote:
>
> On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila  wrote:
> >
> ...
>
> > Can't we use the existing function ReplicationOriginNameForTablesync()
> > by passing relid as InvalidOid for this purpose? We need a check
> > inside to decide which name to construct, otherwise, it should be
> > fine. If we agree with this, then we can change the name of the
> > function to something like ReplicationOriginNameForLogicalRep or
> > ReplicationOriginNameForLogicalRepWorkers.
> >
>
> This suggestion attaches special meaning to the relid param.
>
> Won't it seem a bit strange for the non-tablesync callers (who
> probably have a perfectly valid 'relid') to have to pass an InvalidOid
> relid just so they can format the correct origin name?
>

For non-tablesync workers, relid should always be InvalidOid. See, how
we launch apply workers in ApplyLauncherMain(). Do you see any case
for non-tablesync workers where relid is not InvalidOid?


-- 
With Regards,
Amit Kapila.




Re: making relfilenodes 56 bits

2022-09-21 Thread Dilip Kumar
On Tue, Sep 20, 2022 at 10:44 PM Robert Haas  wrote:

Thanks for the review; please see my responses inline for some of the
comments, the rest are all accepted.

> On Fri, Sep 9, 2022 at 6:02 AM Dilip Kumar  wrote:
> > [ new patch ]
>
> +typedef pg_int64 RelFileNumber;
>
> This seems really random to me. First, why isn't this an unsigned
> type? OID is unsigned and I don't see a reason to change to a signed
> type. But even if we were going to change to a signed type, why
> pg_int64? That is declared like this:
>
> /* Define a signed 64-bit integer type for use in client API declarations. */
> typedef PG_INT64_TYPE pg_int64;
>
> Surely this is not a client API declaration
>
> Note that if we change this a lot of references to INT64_FORMAT will
> need to become UINT64_FORMAT.
>
> I think we should use int64 at the SQL level, because we don't have an
> unsigned 64-bit SQL type, and a signed 64-bit type can hold 56 bits.
> So it would still be Int64GetDatum((int64) rd_rel->relfilenode) or
> similar. But internally I think using unsigned is cleaner.

Yeah, you are right, we can make it uint64.  With respect to this, we
cannot use uint64 directly because that is declared in c.h, and that
cannot be used in postgres_ext.h, IIUC.  So what is the other option?
Maybe we can typedef RelFileNumber similar to what c.h does for uint64,
i.e.

#ifdef HAVE_LONG_INT_64
typedef unsigned long int uint64;
#elif defined(HAVE_LONG_LONG_INT_64)
typedef unsigned long long int uint64;
#endif

And maybe same for UINT64CONST ?

I don't like duplicating this logic, but is there any better
alternative for doing this?  Can we move the existing definitions from
c.h to some common file (common for both client and server)?
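To illustrate the duplication I mean, a hypothetical self-contained
definition in postgres_ext.h could look roughly like this (sketch only;
whether we duplicate the logic like this or move the c.h definitions to a
shared header is exactly the question above):

```c
/* Hypothetical sketch: a 64-bit unsigned RelFileNumber without including
 * c.h, mirroring the HAVE_LONG_INT_64 logic from c.h. */
#ifdef HAVE_LONG_INT_64
typedef unsigned long int RelFileNumber;
#elif defined(HAVE_LONG_LONG_INT_64)
typedef unsigned long long int RelFileNumber;
#endif
```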

>
> +if (relnumber >= ShmemVariableCache->loggedRelFileNumber)
>
> It probably doesn't make any difference, but to me it seems better to
> test flushedRelFileNumber rather than logRelFileNumber here. What do
> you think?

Actually, based on this condition we are logging more, so it makes more
sense to check w.r.t. loggedRelFileNumber; but OTOH, technically,
without flushing the log we are not supposed to use the relfilenumber,
so it makes more sense to test flushedRelFileNumber.  But since both are
the same, I am fine with flushedRelFileNumber.

>  /*
>   * We set up the lockRelId in case anything tries to lock the dummy
> - * relation.  Note that this is fairly bogus since relNumber may be
> - * different from the relation's OID.  It shouldn't really matter though.
> - * In recovery, we are running by ourselves and can't have any lock
> - * conflicts.  While syncing, we already hold AccessExclusiveLock.
> + * relation.  Note we are setting relId to just FirstNormalObjectId which
> + * is completely bogus.  It shouldn't really matter though. In recovery,
> + * we are running by ourselves and can't have any lock conflicts.  While
> + * syncing, we already hold AccessExclusiveLock.
>   */
>  rel->rd_lockInfo.lockRelId.dbId = rlocator.dbOid;
> -rel->rd_lockInfo.lockRelId.relId = rlocator.relNumber;
> +rel->rd_lockInfo.lockRelId.relId = FirstNormalObjectId;
>
> Boy, this makes me uncomfortable. The existing logic is pretty bogus,
> and we're replacing it with some other bogus thing. Do we know whether
> anything actually does try to use this for locking?
>
> One notable difference between the existing logic and your change is
> that, with the existing logic, we use a bogus value that will differ
> from one relation to the next, whereas with this change, it will
> always be the same value. Perhaps el->rd_lockInfo.lockRelId.relId =
> (Oid) rlocator.relNumber would be a more natural adaptation?
>
> +#define CHECK_RELFILENUMBER_RANGE(relfilenumber)\
> +do {\
> +if ((relfilenumber) < 0 || (relfilenumber) > MAX_RELFILENUMBER) \
> +ereport(ERROR,  \
> +errcode(ERRCODE_INVALID_PARAMETER_VALUE),   \
> +errmsg("relfilenumber %lld is out of range",\
> +(long long) (relfilenumber))); \
> +} while (0)
>
> Here, you take the approach of casting the relfilenumber to long long
> and then using %lld. But elsewhere, you use
> INT64_FORMAT/UINT64_FORMAT. If we're going to use this technique, we
> ought to use it everywhere.

Based on the discussion [1], it seems we cannot use
INT64_FORMAT/UINT64_FORMAT while using ereport.  But in all other places
I am using INT64_FORMAT/UINT64_FORMAT.  Does this make sense?

[1] 
https://www.postgresql.org/message-id/20220730113922.qd7qmenwcmzyacje%40alvherre.pgsql

>  typedef struct
>  {
> -Oid reltablespace;
> -RelFileNumber relfilenumber;
> -} RelfilenumberMapKey;
> -
> -typedef struct
> -{
> -RelfilenumberMapKey key;/* lookup key - must be first */
> +RelFileNumber relfilenumber;/* lookup key - must be first */
>  Oid relid; 

Re: Add common function ReplicationOriginName.

2022-09-21 Thread Peter Smith
On Wed, Sep 21, 2022 at 8:08 PM Amit Kapila  wrote:
>
> On Wed, Sep 21, 2022 at 3:09 PM Peter Smith  wrote:
> >
> > On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila  wrote:
> > >
> > ...
> >
> > > Can't we use the existing function ReplicationOriginNameForTablesync()
> > > by passing relid as InvalidOid for this purpose? We need a check
> > > inside to decide which name to construct, otherwise, it should be
> > > fine. If we agree with this, then we can change the name of the
> > > function to something like ReplicationOriginNameForLogicalRep or
> > > ReplicationOriginNameForLogicalRepWorkers.
> > >
> >
> > This suggestion attaches special meaning to the relid param.
> >
> > Won't it seem a bit strange for the non-tablesync callers (who
> > probably have a perfectly valid 'relid') to have to pass an InvalidOid
> > relid just so they can format the correct origin name?
> >
>
> For non-tablesync workers, relid should always be InvalidOid. See, how
> we launch apply workers in ApplyLauncherMain(). Do you see any case
> for non-tablesync workers where relid is not InvalidOid?
>

Hmm, my mistake. I was thinking more of all the calls coming from the
subscriptioncmds.c, but now that I look at it maybe none of those has
any relid either.

OK, I can unify the 2 functions as you suggested. I will post another
patch in a few days.
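For the record, I expect the unified function to look roughly like the
following (a sketch based on the existing name formats; the actual patch may
differ in naming and placement):

```c
/*
 * Sketch: one routine that builds the replication origin name for both the
 * apply worker (relid is InvalidOid) and the tablesync worker cases.
 */
void
ReplicationOriginNameForLogicalRep(Oid suboid, Oid relid,
								   char *originname, Size szoriginname)
{
	if (OidIsValid(relid))
		/* tablesync worker: origin is per subscription and relation */
		snprintf(originname, szoriginname, "pg_%u_%u", suboid, relid);
	else
		/* apply worker: origin is per subscription */
		snprintf(originname, szoriginname, "pg_%u", suboid);
}
```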

--
Kind Regards,
Peter Smith,
Fujitsu Australia.




Re: [PATCH]Feature improvement for MERGE tab completion

2022-09-21 Thread Alvaro Herrera
On 2022-Sep-21, Fujii Masao wrote:

> How about adding something like PartialMatches() that checks whether
> the keywords are included in the input string or not? If so, we can restrict
> some tab-completion rules to operating only on MERGE, as follows. I attached
> the WIP patch (0002 patch) that introduces PartialMatches().
> Is this approach over-complicated? Thoughts?

I think it's fine to backpatch your 0001 to 15 and put 0002 in master.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)




Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-21 Thread Bharath Rupireddy
On Wed, Sep 21, 2022 at 12:15 PM Fujii Masao
 wrote:
>
> This looks overly complicated to me.
>
> Instead of making allocate_backup_state() or reset_backup_state()
> store the label name in BackupState before do_pg_backup_start(),
> how about making do_pg_backup_start() do that after checking its length?
> Seems this can simplify the code very much.
>
> If so, ISTM that we can replace allocate_backup_state() and
> reset_backup_state() with just palloc0() and MemSet(), respectively.
> Also we can remove fill_backup_label_name().

Yes, that makes things simpler. I will post the v10 patch-set soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Perform streaming logical transactions by background workers and parallel apply

2022-09-21 Thread Amit Kapila
On Wed, Sep 21, 2022 at 2:55 PM Peter Smith  wrote:
>
> ==
>
> 3. .../replication/logical/applyparallelworker.c - parallel_apply_can_start
>
> +/*
> + * Returns true, if it is allowed to start a parallel apply worker, false,
> + * otherwise.
> + */
> +static bool
> +parallel_apply_can_start(TransactionId xid)
>
> Seems a slightly complicated comment for a simple boolean function.
>
> SUGGESTION
> Returns true/false if it is OK to start a parallel apply worker.
>

I think this is the style followed at some other places as well. So,
we can leave it.

>
> 6. src/backend/replication/logical/launcher.c  - logicalrep_worker_detach
>
>  logicalrep_worker_detach(void)
>  {
> + /* Stop the parallel apply workers. */
> + if (!am_parallel_apply_worker() && !am_tablesync_worker())
> + {
> + List*workers;
> + ListCell   *lc;
>
> The condition is not very obvious. This is why I previously suggested
> adding another macro/function like 'isLeaderApplyWorker'.
>

How about having a function am_leader_apply_worker() { ...
return OidIsValid(MyLogicalRepWorker->relid) &&
(MyLogicalRepWorker->apply_leader_pid == InvalidPid) ...}?
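Written out as a helper, and assuming a leader apply worker is the one with
no relation assigned (so the relid test is negated relative to the shorthand
above), that would be roughly:

```c
/*
 * Sketch of the suggested helper.  Assumes a leader apply worker has
 * relid == InvalidOid and apply_leader_pid == InvalidPid; the exact test
 * may differ in the final patch.
 */
static inline bool
am_leader_apply_worker(void)
{
	return !OidIsValid(MyLogicalRepWorker->relid) &&
		MyLogicalRepWorker->apply_leader_pid == InvalidPid;
}
```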

>
> 13. src/include/replication/worker_internal.h
>
> + /*
> + * Indicates whether the worker is available to be used for parallel apply
> + * transaction?
> + */
> + bool in_use;
>
> This comment seems backward for this member's name.
>
> SUGGESTION (something like...)
> Indicates whether this ParallelApplyWorkerInfo is currently being used
> by a parallel apply worker processing a transaction. (If this flag is
> false then it means the ParallelApplyWorkerInfo is available for
> re-use by another parallel apply worker.)
>

I am not sure if this is an improvement over the current. The current
comment appears reasonable to me as it is easy to follow.

-- 
With Regards,
Amit Kapila.




Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)

2022-09-21 Thread Bharath Rupireddy
On Wed, Sep 21, 2022 at 12:27 PM Michael Paquier  wrote:
>
> >>> I've also taken help of the error callback mechanism to clean up the
> >>> allocated memory in case of a failure. For do_pg_abort_backup() cases,
> >>> I think we don't need to clean the memory because that gets called on
> >>> proc exit (before_shmem_exit()).
> >>
> >> Memory could still bloat while the process running the SQL functions
> >> is running depending on the error code path, anyway.
> >
> > I didn't get your point. Can you please elaborate it? I think adding
> > error callbacks at the right places would free up the memory for us.
> > Please note that we already are causing memory leaks on HEAD today.
>
> I mean that HEAD makes no effort in freeing this memory in
> TopMemoryContext on session ERROR.

Correct. We can also solve it as part of this commit. Please let me
know your thoughts on 0002 patch.

> I have a few comments on 0001.
>
> +#include 
> Double quotes wanted here.

Ah, my bad. Corrected now.

> deallocate_backup_variables() is the only part of xlogbackup.c that
> includes references of the tablespace map_and backup_label
> StringInfos.  I would be tempted to fully decouple that from
> xlogbackup.c/h for the time being.

There's no problem with it IMO, after all, they are backup related
variables. And that function reduces a bit of duplicate code.

> -   tblspc_map_file = makeStringInfo();
> Not sure that there is a need for a rename here.

We're referring to tablespace_map and backup_label internally all around;
just to be in sync, I wanted to rename it while we're refactoring this
code.

> +void
> +build_backup_content(BackupState *state, bool ishistoryfile, StringInfo str)
> +{
> It would be more natural to have build_backup_content() do by itself
> the initialization of the StringInfo for the contents of backup_label
> and return it as a result of the routine?  This is short-lived in
> xlogfuncs.c when the backup ends.

See the explanation below.

> @@ -248,18 +250,25 @@ perform_base_backup(basebackup_options *opt, bbsink 
> *sink)
> [...]
> +   /* Construct backup_label contents. */
> +   build_backup_content(backup_state, false, backup_label);
>
> Actually, for base backups, perhaps it would be more intuitive to
> build and free the StringInfo of the backup_label when we send it for
> base.tar rather than initializing it at the beginning and freeing it
> at the end?

sendFileWithContent() is in a for-loop, and we are good if we call
build_backup_content() before do_pg_backup_start() just once. Also,
allocating backup_label in the for loop makes error handling trickier:
how do we free it up when sendFileWithContent() errors out? Of course, we
could allocate backup_label once even in the for loop with a bool
first_time sort of variable and store a StringInfo *ptr_backup_label in the
error callback info, but that would make things unnecessarily complex;
instead we're good allocating and creating the backup_label content at the
beginning and freeing it up at the end.

> - * pg_backup_start: set up for taking an on-line backup dump
> + * pg_backup_start: start taking an on-line backup.
>   *
> - * Essentially what this does is to create a backup label file in $PGDATA,
> - * where it will be archived as part of the backup dump.  The label file
> - * contains the user-supplied label string (typically this would be used
> - * to tell where the backup dump will be stored) and the starting time and
> - * starting WAL location for the dump.
> + * This function starts the backup and creates tablespace_map contents.
>
> The last part of the comment is still correct while the former is not,
> so this loses some information.

Added that part before pg_backup_stop() now where it makes sense with
the refactoring.

I'm attaching the v10 patch-set with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From cb93211220d73cfb4ae832ff4e04fd46e706ddfe Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 21 Sep 2022 10:52:06 +
Subject: [PATCH v10] Refactor backup related code

Refactor backup related code, advantages of doing so are following:

1) backup state is more structured now - all in a single structure,
callers can create backup_label contents whenever required, either
during the pg_backup_start or the pg_backup_stop or in between.
2) no parsing of backup_label file lines now, no error checking
for invalid parsing.
3) backup_label and history file contents have most of the things
in common, they can now be created within a single function.
4) makes backup related code extensible and readable.

This introduces new source files xlogbackup.c/.h for backup related
code and adds the new code in there. The xlog.c file has already grown
to ~9000 LOC (as of this time). Eventually, we would want to move
all the backup related code from xlog.c, xlogfuncs.c, elsewhere to
here.

Author: Bharath Rupireddy
Reviewed-by: Michael Paquier
Reviewed-b

Re: Hash index build performance tweak from sorting

2022-09-21 Thread Simon Riggs
On Wed, 21 Sept 2022 at 02:32, David Rowley  wrote:
>
> On Tue, 2 Aug 2022 at 03:37, Simon Riggs  wrote:
> > Using the above test case, I'm getting a further 4-7% improvement on
> > already committed code with the attached patch, which follows your
> > proposal.
> >
> > The patch passes info via a state object, useful to avoid API churn in
> > later patches.
>
> Hi Simon,
>
> I took this patch for a spin and saw a 2.5% performance increase using
> the random INT test that Tom posted. The index took an average of
> 7227.47 milliseconds on master and 7045.05 with the patch applied.

Hi David,

Thanks for tests and review. I'm just jumping on a plane, so may not
respond in detail until next Mon.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




[PATCH] polish the error message of creating proc

2022-09-21 Thread Junwang Zhao
When an error occurs while creating a proc, it should point out the
specific proc kind instead of just printing "function".

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..58af4b48ce 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -373,7 +373,11 @@ ProcedureCreate(const char *procedureName,
 		if (!replace)
 			ereport(ERROR,
 					(errcode(ERRCODE_DUPLICATE_FUNCTION),
-					 errmsg("function \"%s\" already exists with same argument types",
+					 errmsg("%s \"%s\" already exists with same argument types",
+							(oldproc->prokind == PROKIND_AGGREGATE ? "aggregate function" :
+							 oldproc->prokind == PROKIND_PROCEDURE ? "procedure" :
+							 oldproc->prokind == PROKIND_WINDOW ? "window function" :
+							 "function"),
 							procedureName)));
 		if (!pg_proc_ownercheck(oldproc->oid, proowner))
 			aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION,

-- 
Regards
Junwang Zhao


0001-polish-the-error-message-of-creating-proc.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2022-09-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thanks for checking!

> These failed to be applied to the master branch cleanly. Could you update 
> them?

PSA rebased patches. I reviewed them myself and they contain some changes,
e.g., moving GUC-related code to option.c.


> +  this option relies on kernel events exposed by Linux, macOS,
> 
> s/this/This

Fixed.

> 
> + GUC_check_errdetail("pgfdw_health_check_interval must be set
> to 0 on this platform");
> 
> The actual parameter name "postgres_fdw.health_check_interval"
> should be used for the message instead of internal variable name.

Fixed.

> This registered signal handler does lots of things. But that's not acceptable
> and they should be performed outside signal handler. No?


I modified it like v09 and earlier versions, which had a mechanism for registering 
CheckingRemoteServersCallback.
It had been removed because we wanted to keep the core simpler, but IIUC it is needed
if the signal handler just sets some flags.
For simplicity, the core side does not consider the current status of the transaction 
or the running query.
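In other words, the intended shape is the usual set-a-flag-in-the-handler
pattern, roughly like the following (illustrative sketch; the identifiers
other than CheckingRemoteServersCallback are made up and are not necessarily
the patch's actual names):

```c
/* Illustrative sketch of the pattern described above. */
static volatile sig_atomic_t checking_remote_servers_pending = false;

/* Signal handler: only set a flag and poke the latch. */
static void
handle_checking_remote_servers_interrupt(SIGNAL_ARGS)
{
	checking_remote_servers_pending = true;
	SetLatch(MyLatch);
}

/* Called later, outside the signal handler, e.g. near CHECK_FOR_INTERRUPTS(). */
void
ProcessCheckingRemoteServersInterrupt(void)
{
	if (!checking_remote_servers_pending)
		return;

	checking_remote_servers_pending = false;

	/* Run the callback registered by postgres_fdw, if any. */
	if (CheckingRemoteServersCallback != NULL)
		CheckingRemoteServersCallback();
}
```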


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v15-0001-Add-an-infrastracture-for-checking-remote-server.patch


v15-0002-postgres_fdw-Implement-health-check-feature.patch


v15-0003-add-doc.patch


v15-0004-add-test.patch


Re: tweak to a few index tests to hit ambuildempty() routine.

2022-09-21 Thread Alvaro Herrera
On 2022-Sep-21, a.kozhemya...@postgrespro.ru wrote:

> After analyzing this, I found out why we don't reach that Assert but still have
> coverage shown - firstly, it is reached via another test, vacuum; secondly, it
> depends on the gcc optimization flag. We reach that Assert only when using
> -O0.
> If we build with -O2 or -Og, that function is not reached (due to different
> results of the heap_prune_satisfies_vacuum() check inside
> heap_page_prune()).
> But as the make checks are mostly performed (including the buildfarm testing)
> with -O2/-Og, it looks like after 4fb5c794e5 we have lost the coverage
> provided by 4c51a2d1e4.

Hmm, so if we merely revert the change to gin.sql then we still won't
get the coverage back?  I was thinking that a simple change would be to
revert the change from temp to unlogged for that table, and create
another unlogged table; but maybe that's not enough.  Do we need a
better test for GIN vacuuming that works regardless of the optimization
level?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2022-09-21 Thread Damir Belyalov
Thank you for reviewing.
In the previous patch there was an error when processing constraints. The
patch was fixed, but the code grew and became more complicated
(0005-COPY_IGNORE_ERRORS). I also simplified the logic of
safeNextCopyFrom().
You asked why we need subtransactions, and the answer is in this patch. When
processing a row that does not satisfy constraints or INSTEAD OF triggers,
it is necessary to roll back the subtransaction and return the table to its
original state.
Because of this complexity, I had to abandon the constraint and trigger
processing and handle only errors that occur when reading the file. Attaching
a simplified patch (0006-COPY_IGNORE_ERRORS).
I checked these patches against all the COPY regress tests and they worked correctly.


> BTW in v4 patch, data are loaded into the buffer one by one, and when
> the buffer fills up, the data in the buffer are 'replayed' also one by
> one, right?
> Wouldn't this have more overhead than a normal COPY?
>
The data is loaded into the buffer one by one, but in "replay mode"
ignore_errors works like standard COPY. Tuples are added to the slot and,
depending on the type of the slot (CIM_SINGLE or CIM_MULTI), tuples are
replayed in the corresponding case.
For the 0006 patch you can imagine that we divide the for(;;) loop into 2
parts. The first part adds tuples to the buffer and the second part
inserts tuples into the table. These parts don't intersect with each other
and are completed sequentially.
The main idea of replay_buffer is that it is needed for the CIM_SINGLE
case. You can implement the CIM_SINGLE case and see that tuples read before
an error occurred are not added to the table. The logic of the 0005 patch is
similar but with some differences.

As a test, I COPYed slightly larger data with and without ignore_errors
> option.
> There might be other reasons, but I found a performance difference.

Tried to reduce the performance difference by cleaning up replay_buffer,
i.e. by resetting a new memory context created for it - replay_cxt.
```
Before:
Without ignore_errors:
COPY 1000
Time: 15538,579 ms (00:15,539)
With ignore_errors:
COPY 1000
Time: 21289,121 ms (00:21,289)

After:
Without ignore_errors:
COPY 1000
Time: 15318,922 ms (00:15,319)
With ignore_errors:
COPY 1000
Time: 19868,175 ms (00:19,868)
```

 - Put in the documentation that the warnings will not be output for more
> than 101 cases.
>
Yeah, I pointed it out in the doc.


> I applied v4 patch and when canceled the COPY, there was a case I found
> myself left in a transaction.
> Should this situation be prevented from occurring?
>
> ```
> =# copy test from '/tmp/1000.data' with (ignore_errors );
>
> ^CCancel request sent
> ERROR:  canceling statement due to user request
>
> =# truncate test;
> ERROR:  current transaction is aborted, commands ignored until end of
> transaction block
> ```
>
I tried to reproduce your error and could not. The result was the same as
with standard COPY FROM.
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index c25b52d0cb..c99adabcc9 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -34,6 +34,7 @@ COPY { table_name [ ( format_name
 FREEZE [ boolean ]
+IGNORE_ERRORS [ boolean ]
 DELIMITER 'delimiter_character'
 NULL 'null_string'
 HEADER [ boolean | MATCH ]
@@ -233,6 +234,20 @@ COPY { table_name [ ( 

 
+   <varlistentry>
+    <term><literal>IGNORE_ERRORS</literal></term>
+    <listitem>
+     <para>
+      Drops rows that contain malformed data while copying. These are rows
+      containing syntax errors in data, rows with too many or too few columns,
+      rows that result in constraint violations, rows containing columns where
+      the data type's input function raises an error.
+      Outputs warnings about rows with incorrect data (the number of warnings
+      is not more than 100) and the total number of errors.
+     </para>
+    </listitem>
+   </varlistentry>
+

 DELIMITER
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 49924e476a..f41b25f26a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -406,6 +406,7 @@ ProcessCopyOptions(ParseState *pstate,
    bool is_from,
    List *options)
 {
+	bool		ignore_errors_specified = false;
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
@@ -448,6 +449,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_errors") == 0)
+		{
+			if (ignore_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_errors_specified = true;
+			opts_out->ignore_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index e8bb168aea..6474d47d29 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -106,6 +106,12 @@ static char *limit_print

Re: [PATCH] polish the error message of creating proc

2022-09-21 Thread Julien Rouhaud
Hi,

On Wed, Sep 21, 2022 at 07:45:01PM +0800, Junwang Zhao wrote:
> When an error occurs while creating a proc, it should point out the
> specific proc kind instead of just printing "function".

You should have kept the original thread in copy (1), or at least mention it.

> diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
> index a9fe45e347..58af4b48ce 100644
> --- a/src/backend/catalog/pg_proc.c
> +++ b/src/backend/catalog/pg_proc.c
> @@ -373,7 +373,11 @@ ProcedureCreate(const char *procedureName,
>  		if (!replace)
>  			ereport(ERROR,
>  					(errcode(ERRCODE_DUPLICATE_FUNCTION),
> -					 errmsg("function \"%s\" already exists with same argument types",
> +					 errmsg("%s \"%s\" already exists with same argument types",
> +							(oldproc->prokind == PROKIND_AGGREGATE ? "aggregate function" :
> +							 oldproc->prokind == PROKIND_PROCEDURE ? "procedure" :
> +							 oldproc->prokind == PROKIND_WINDOW ? "window function" :
> +							 "function"),
>  							procedureName)));
>  		if (!pg_proc_ownercheck(oldproc->oid, proowner))
>  			aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION,

You can't put part of the message in parameter, as the resulting string isn't
translatable.  You should either use "routine" as a generic term or provide 3
different full messages.
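For the archives, the "3 different full messages" option usually looks
something like the sketch below: each string is a complete, separately
translatable unit, marked with gettext_noop() for extraction and translated
by errmsg() at runtime (sketch only, not a polished patch):

```c
const char *msgfmt;

switch (oldproc->prokind)
{
	case PROKIND_AGGREGATE:
		msgfmt = gettext_noop("aggregate function \"%s\" already exists with same argument types");
		break;
	case PROKIND_PROCEDURE:
		msgfmt = gettext_noop("procedure \"%s\" already exists with same argument types");
		break;
	case PROKIND_WINDOW:
		msgfmt = gettext_noop("window function \"%s\" already exists with same argument types");
		break;
	default:
		msgfmt = gettext_noop("function \"%s\" already exists with same argument types");
		break;
}

ereport(ERROR,
		(errcode(ERRCODE_DUPLICATE_FUNCTION),
		 errmsg(msgfmt, procedureName)));
```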

[1] 
https://www.postgresql.org/message-id/29ea5666.6ce8.1835f4b4992.coremail.qtds_...@126.com




Re: Support pg_attribute_aligned and noreturn in MSVC

2022-09-21 Thread James Coleman
On Tue, Sep 20, 2022 at 9:18 PM Michael Paquier  wrote:
>
> On Tue, Sep 20, 2022 at 08:01:20AM -0400, James Coleman wrote:
> > I don't have access to a Windows machine for testing, but re-reading
> > the documentation it looks like the issue is that our noreturn macro
> > is used after the definition while the MSVC equivalent is used before.
>
> A CI setup would do the job for example, see src/tools/ci/README that
> explains how to set up things.

That's a good reminder; I've been meaning to set that up but haven't
taken the time yet.

> > I've removed that for now (and commented about it); it's not as
> > valuable anyway since it's mostly an indicator for code analysis
> > (human and machine).
>
> Except for the fact that the patch missed to define
> pg_attribute_noreturn() in the MSVC branch, this looks fine to me.  I
> have been looking at what you meant with packing, and I can see the
> business with PACK(), something actually doable with gcc.
>
> That's a first step, at least, and I see no reason not to do it, so
> applied.

Thanks!

James Coleman




Re: Summary function for pg_buffercache

2022-09-21 Thread Aleksander Alekseev
Hi Andres,

> All you need to do is to read BufferDesc->state into a local variable and 
> then make decisions based on that

You are right, thanks.

Here is the corrected patch.
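For anyone following along, the pattern being suggested is roughly the
following (a sketch, not the patch itself): read the state word once per
buffer and test the flag bits on the local copy, without taking the buffer
header lock.

```c
/* Sketch: summarize buffers from a single atomic read of each header's state. */
int			buffers_used = 0;
int			buffers_dirty = 0;

for (int i = 0; i < NBuffers; i++)
{
	BufferDesc *bufHdr = GetBufferDescriptor(i);
	uint32		buf_state = pg_atomic_read_u32(&bufHdr->state);

	if (buf_state & BM_VALID)
		buffers_used++;
	if (buf_state & BM_DIRTY)
		buffers_dirty++;
}
```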

-- 
Best regards,
Aleksander Alekseev


v9-0001-Added-pg_buffercache_summary-function.patch


Re: Mingw task for Cirrus CI

2022-09-21 Thread Melih Mutlu
Hi hackers,

Justin Pryzby , 5 Eyl 2022 Pzt, 14:50 tarihinde şunu
yazdı:

> But cfbot should run the Mingw task for this patch's own commitfest
> entry.  But right now (because cfbot doesn't include the original commit
> message/s), it doesn't get run :(
>

I've been thinking about how to make the mingw task run only for this patch
on cfbot and not for others. TBH, I couldn't come up with a nice way to
achieve this.

Does anyone have any suggestions on this?

Andres Freund , 6 Eyl 2022 Sal, 02:52 tarihinde şunu
yazdı:

> Hi,
>
> On 2022-09-05 06:50:55 -0500, Justin Pryzby wrote:
> > I saw that but hadn't tracked it down yet.  Do you know if the tar
> > failures were from a TAP test added since you first posted the mingw
> > patch, or ??
>
> I think it's that msys now includes tar by default, but not sure.


Seems like msys2 comes with GNU tar now. Previously it was bsd tar under
"/c/Windows/System32/tar.exe", which now we force it to use.


Thanks,
Melih


Re: On login trigger: take three

2022-09-21 Thread Sergey Shinderuk

On 20.09.2022 17:10, Daniel Gustafsson wrote:
On 20 Sep 2022, at 15:43, Sergey Shinderuk  wrote: 
There is a race around setting and clearing of dathasloginevt.


Thanks for the report!  The whole dathasloginevt mechanism is IMO too clunky
and inelegant to go ahead with; I wouldn't be surprised if there are other bugs
lurking there.

Indeed.

CREATE DATABASE doesn't copy dathasloginevt from the template database.
ALTER EVENT TRIGGER ... ENABLE doesn't set dathasloginevt.

In both cases triggers are enabled according to \dy output, but don't 
fire. The admin may not notice it immediately.


--
Sergey Shinderuk          https://postgrespro.com/




Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.

2022-09-21 Thread Ashutosh Bapat
On Mon, Sep 19, 2022 at 8:09 PM Ashutosh Sharma  wrote:
>
> On Mon, Sep 19, 2022 at 5:24 PM Ashutosh Bapat
>  wrote:
> >
> > On Mon, Sep 19, 2022 at 1:43 PM Ashutosh Sharma  
> > wrote:
> > >
> > > On Fri, Sep 9, 2022 at 5:36 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > On Thu, Sep 8, 2022 at 8:32 PM Ashutosh Sharma  
> > > > wrote:
> > > > >
> > > > > On Thu, Sep 8, 2022 at 6:23 PM Ashutosh Bapat
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Sep 8, 2022 at 4:14 PM Ashutosh Sharma 
> > > > > >  wrote:
> > > > Can you please point to the documentation.
> > > >
> > >
> > > AFAIU there is just one documentation. Here is the link for it:
> > >
> > > https://www.postgresql.org/docs/current/view-pg-replication-slots.html
> >
> > Thanks. Description of confirmed_flush_lsn is "The address (LSN) up to
> > which the logical slot's consumer has confirmed receiving data. Data
> > older than this is not available anymore. NULL for physical slots."
> > The second sentence is misleading. AFAIU, it really should be "Data
> > corresponding to the transactions committed before this LSN is not
> > available anymore". WAL before restart_lsn is likely to be removed but
> > WAL with LSN higher than restart_lsn is preserved. This correction
> > makes more sense because of the third sentence.
> >
>
> Thanks for the clarification. Attached is the patch with the changes.
> Please have a look.
>
Looks good to me. Do you want to track this through commitfest app?



-- 
Best Wishes,
Ashutosh Bapat




Re: Tighten pg_get_object_address argument checking

2022-09-21 Thread Peter Eisentraut

On 21.09.22 12:01, Amit Kapila wrote:

On Tue, Sep 20, 2022 at 11:14 PM Peter Eisentraut
 wrote:


For publication schemas (OBJECT_PUBLICATION_NAMESPACE) and user
mappings (OBJECT_USER_MAPPING), pg_get_object_address() checked the
array length of the second argument, but not of the first argument.
If the first argument was too long, it would just silently ignore
everything but the first argument.  Fix that by checking the length of
the first argument as well.


LGTM.


Committed, thanks for checking.





Re: Mingw task for Cirrus CI

2022-09-21 Thread Andres Freund
Hi,

On 2022-09-21 16:18:43 +0300, Melih Mutlu wrote:
> I've been thinking about how to make the mingw task run only for this patch
> on cfbot and not for others. TBH, I couldn't come up with a nice way to
> achieve this.
> 
> Does anyone have any suggestions on this?

Add a commented-out trigger-type: manual and note in the commit message that
the committer needs to adjust that piece.

Greetings,

Andres Freund




Is it time to drop fix-old-flex-code.pl?

2022-09-21 Thread Tom Lane
In view of

Author: John Naylor 
Branch: master [8b878bffa] 2022-09-09 12:55:23 +0700

Bump minimum version of Flex to 2.5.35

I wonder if we should go a little further and get rid of
src/tools/fix-old-flex-code.pl (just in HEAD, to be clear).
That does nothing when flex is 2.5.36 or newer, and even with
older versions it's just suppressing an "unused variable" warning.
I think there's a reasonable case for not caring about that,
or at least valuing it lower than saving one build step.

According to a recent survey, these are the active buildfarm
members still using 2.5.35:

 frogfish  | 2022-08-21 17:59:26 | configure: using flex 2.5.35
 hoverfly  | 2022-09-02 16:02:01 | configure: using flex 2.5.35
 lapwing   | 2022-09-02 16:40:12 | configure: using flex 2.5.35
 skate | 2022-09-02 07:27:10 | configure: using flex 2.5.35
 snapper   | 2022-09-02 13:38:22 | configure: using flex 2.5.35

lapwing uses -Werror, so it will be unhappy, but I don't think it's
unreasonable to say that you should be using fairly modern tools
if you want to use -Werror.

Or we could leave well enough alone.  But the current cycle of
getting rid of ancient portability hacks seems like a good climate
for this to happen.  Thoughts?

regards, tom lane




Removing unused parameter in SnapBuildGetOrBuildSnapshot

2022-09-21 Thread Melih Mutlu
Hi hackers,

Sharing a small patch to remove an unused parameter
in the SnapBuildGetOrBuildSnapshot function in snapbuild.c.

With commit 6c2003f8a1bbc7c192a2e83ec51581c018aa162f,
SnapBuildBuildSnapshot no longer needs a transaction id. This also makes the
xid parameter in SnapBuildGetOrBuildSnapshot useless.
I couldn't see a reason to keep it and decided to remove it.
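The change is essentially just the prototype (plus its callers); assuming
the current declaration in snapbuild.h, it is roughly:

```c
/* before */
extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder,
											TransactionId xid);

/* after: the xid argument is dropped, since SnapBuildBuildSnapshot()
 * no longer uses it */
extern Snapshot SnapBuildGetOrBuildSnapshot(SnapBuild *builder);
```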

Regards,
Melih


0001-Remove-unused-xid-parameter.patch


Re: [PATCH] polish the error message of creating proc

2022-09-21 Thread Junwang Zhao
On Wed, Sep 21, 2022 at 8:17 PM Julien Rouhaud  wrote:
>
> Hi,
>
> On Wed, Sep 21, 2022 at 07:45:01PM +0800, Junwang Zhao wrote:
> > When an error occurs while creating a proc, it should point out the
> > specific proc kind instead of just printing "function".
>
> You should have kept the original thread in copy (1), or at least mention it.

Yeah, thanks for pointing that out, will do that next time.

>
> > diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
> > index a9fe45e347..58af4b48ce 100644
> > --- a/src/backend/catalog/pg_proc.c
> > +++ b/src/backend/catalog/pg_proc.c
> > @@ -373,7 +373,11 @@ ProcedureCreate(const char *procedureName,
> >  		if (!replace)
> >  			ereport(ERROR,
> >  					(errcode(ERRCODE_DUPLICATE_FUNCTION),
> > -					 errmsg("function \"%s\" already exists with same argument types",
> > +					 errmsg("%s \"%s\" already exists with same argument types",
> > +							(oldproc->prokind == PROKIND_AGGREGATE ? "aggregate function" :
> > +							 oldproc->prokind == PROKIND_PROCEDURE ? "procedure" :
> > +							 oldproc->prokind == PROKIND_WINDOW ? "window function" :
> > +							 "function"),
> >  							procedureName)));
> >  		if (!pg_proc_ownercheck(oldproc->oid, proowner))
> >  			aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION,
>
> You can't put part of the message in parameter, as the resulting string isn't
> translatable.  You should either use "routine" as a generic term or provide 3
> different full messages.

I noticed that there are some translations under the backend/po directory,
can we just change
msgid "function \"%s\" already exists with same argument types"
to
msgid "%s \"%s\" already exists with same argument types" ?

>
> [1] 
> https://www.postgresql.org/message-id/29ea5666.6ce8.1835f4b4992.coremail.qtds_...@126.com



-- 
Regards
Junwang Zhao




Re: Removing unused parameter in SnapBuildGetOrBuildSnapshot

2022-09-21 Thread Zhang Mingli
On Sep 21, 2022, 22:22 +0800, Melih Mutlu , wrote:
> Hi hackers,
>
> Sharing a small patch to remove an unused parameter in 
> SnapBuildGetOrBuildSnapshot function in snapbuild.c
>
> With commit 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, SnapBuildBuildSnapshot 
> no longer needs transaction id. This also makes the xid parameter in 
> SnapBuildGetOrBuildSnapshot useless.
> I couldn't see a reason to keep it and decided to remove it.
>
> Regards,
> Melih
+1, Good Catch.

Regards,
Zhang Mingli


Re: [PATCH] polish the error message of creating proc

2022-09-21 Thread Julien Rouhaud
On Wed, Sep 21, 2022 at 10:35:47PM +0800, Junwang Zhao wrote:
> On Wed, Sep 21, 2022 at 8:17 PM Julien Rouhaud  wrote:
> >
> > You can't put part of the message in parameter, as the resulting string 
> > isn't
> > translatable.  You should either use "routine" as a generic term or provide 
> > 3
> > different full messages.
>
> I noticed that there are some translations under the backend/po directory,
> can we just change
> msgid "function \"%s\" already exists with same argument types"
> to
> msgid "%s \"%s\" already exists with same argument types" ?

The problem is that the parameters are passed as-is, so the final emitted
translated string will be a mix of both languages, which isn't acceptable.




Re: why can't a table be part of the same publication as its schema

2022-09-21 Thread Alvaro Herrera
On 2022-Sep-20, Robert Haas wrote:

> > I don't think we should change this behavior that's already in logical
> > replication. While I understand the reasons why "GRANT ... ALL TABLES IN
> > SCHEMA" has a different behavior (i.e. it's not applied to future
> > objects) and do not advocate to change it, I have personally been
> > affected where I thought a permission would be applied to all future
> > objects, only to discover otherwise. I believe it's more intuitive to
> > think that "ALL" applies to "everything, always."
> 
> Nah, there's room for multiple behaviors here. It's reasonable to want
> to add all the tables currently in the schema to a publication (or
> grant permissions on them) and it's reasonable to want to include all
> current and future tables in the schema in a publication (or grant
> permissions on them) too. The reason I don't like the ALL TABLES IN
> SCHEMA syntax is that it sounds like the former, but actually is the
> latter. Based on your link to the email from Tom, I understand now the
> reason why it's like that, but it's still counterintuitive to me.

I already proposed elsewhere that we remove the ALL keyword from there,
which I think serves to reduce confusion (in particular it's no longer
parallel to the GRANT one).  As in the attached.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."
From a72ceafe65c02998761cf21bec2820bf49b00a8f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 21 Sep 2022 16:18:16 +0200
Subject: [PATCH] remove ALL from ALL TABLES IN SCHEMA

---
 doc/src/sgml/logical-replication.sgml |   4 +-
 doc/src/sgml/ref/alter_publication.sgml   |  16 +--
 doc/src/sgml/ref/create_publication.sgml  |  14 +-
 doc/src/sgml/ref/create_subscription.sgml |   2 +-
 doc/src/sgml/system-views.sgml|   2 +-
 src/backend/catalog/pg_publication.c  |   4 +-
 src/backend/commands/publicationcmds.c|   6 +-
 src/backend/parser/gram.y |  18 +--
 src/backend/replication/pgoutput/pgoutput.c   |   3 +-
 src/bin/pg_dump/pg_dump.c |   2 +-
 src/bin/pg_dump/t/002_pg_dump.pl  |  12 +-
 src/bin/psql/tab-complete.c   |  14 +-
 src/test/regress/expected/alter_table.out |   2 +-
 src/test/regress/expected/object_address.out  |   2 +-
 src/test/regress/expected/publication.out | 120 +-
 src/test/regress/sql/alter_table.sql  |   2 +-
 src/test/regress/sql/object_address.sql   |   2 +-
 src/test/regress/sql/publication.sql  | 104 +++
 .../t/025_rep_changes_for_schema.pl   |   6 +-
 src/test/subscription/t/028_row_filter.pl |  12 +-
 src/test/subscription/t/031_column_list.pl|   4 +-
 21 files changed, 175 insertions(+), 176 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 48fd8e33dc..7fe3a1043e 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -700,7 +700,7 @@ test_sub=# SELECT * FROM t3;
  
   
one of the publications was created using
-   FOR ALL TABLES IN SCHEMA and the table belongs to
+   FOR TABLES IN SCHEMA and the table belongs to
the referred schema. This clause does not allow row filters.
   
  
@@ -1530,7 +1530,7 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
Moreover, if untrusted users can create tables, use only
publications that list tables explicitly.  That is to say, create a
subscription FOR ALL TABLES or
-   FOR ALL TABLES IN SCHEMA only when superusers trust
+   FOR TABLES IN SCHEMA only when superusers trust
every user permitted to create a non-temp table on the publisher or the
subscriber.
   
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index d8ed89ee91..fc2d6d4885 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -31,7 +31,7 @@ ALTER PUBLICATION name RENAME TO where publication_object is one of:
 
 TABLE [ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [ WHERE ( expression ) ] [, ... ]
-ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
+TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
 
@@ -71,19 +71,19 @@ ALTER PUBLICATION name RENAME TO 
You must own the publication to use ALTER PUBLICATION.
Adding a table to a publication additionally requires owning that table.
-   The ADD ALL TABLES IN SCHEMA and
-   SET ALL TABLES IN SCHEMA to a publication requires the
+   The ADD TABLES IN SCHEMA and
+   SET TABLES IN SCHEMA to a publication requires the
invoking user to be a superuser.  To alter the owner, you must also be a
direct or indirect member of the new owning role. The new owner must have
CREATE privilege on the database.  Also, th

Re: ICU for global collation

2022-09-21 Thread Peter Eisentraut

On 21.09.22 08:50, Marina Polyakova wrote:

On 2022-09-20 12:59, Peter Eisentraut wrote:

On 17.09.22 10:33, Marina Polyakova wrote:

3.

The locale provider is ICU, but it has not yet been set from the 
template database:



$ initdb --locale-provider icu --icu-locale en-US -D data &&
pg_ctl -D data -l logfile start &&
createdb --icu-locale ru-RU --template template0 mydb
...
createdb: error: database creation failed: ERROR:  ICU locale cannot be
specified unless locale provider is ICU


Please see attached patch for a fix.  Does that work for you?


Yes, it works. The following test checks this fix:


Committed with that test, thanks.  I think that covers all the ICU 
issues you reported for PG15 for now?






Re: [PATCH] polish the error message of creating proc

2022-09-21 Thread Tom Lane
Junwang Zhao  writes:
> I noticed that there are some translations under the backend/po directory,
> can we just change
> msgid "function \"%s\" already exists with same argument types"
> to
> msgid "%s \"%s\" already exists with same argument types" ?

No.  This doesn't satisfy our message translation guidelines [1].
The fact that there are other messages that aren't up to project
standard isn't a license to create more.

More generally: there are probably dozens, if not hundreds, of
messages in the backend that say "function" but nowadays might
also be talking about a procedure.  I'm not sure there's value
in improving just one of them.

I am pretty sure that we made an explicit decision some time back
that it is okay to say "function" when the object could also be
an aggregate or window function.  So you could at least cut this
back to just handling "procedure" and "function".  Or you could
change it to "routine" as Julien suggests, but I think a lot of
people will not think that's an improvement.

regards, tom lane

[1] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES




Re: why can't a table be part of the same publication as its schema

2022-09-21 Thread Jonathan S. Katz

On 9/21/22 10:24 AM, Alvaro Herrera wrote:

On 2022-Sep-20, Robert Haas wrote:


I don't think we should change this behavior that's already in logical
replication. While I understand the reasons why "GRANT ... ALL TABLES IN
SCHEMA" has a different behavior (i.e. it's not applied to future
objects) and do not advocate to change it, I have personally been
affected where I thought a permission would be applied to all future
objects, only to discover otherwise. I believe it's more intuitive to
think that "ALL" applies to "everything, always."


Nah, there's room for multiple behaviors here. It's reasonable to want
to add all the tables currently in the schema to a publication (or
grant permissions on them) and it's reasonable to want to include all
current and future tables in the schema in a publication (or grant
permissions on them) too. The reason I don't like the ALL TABLES IN
SCHEMA syntax is that it sounds like the former, but actually is the
latter. Based on your link to the email from Tom, I understand now the
reason why it's like that, but it's still counterintuitive to me.


I already proposed elsewhere that we remove the ALL keyword from there,
which I think serves to reduce confusion (in particular it's no longer
parallel to the GRANT one).  As in the attached.


[personal, not RMT hat]

I'd be OK with this. It would still allow for "FOR SEQUENCES IN SCHEMA" etc.
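
For concreteness, a rough sketch of what the syntax would look like with the
ALL keyword dropped (statements are illustrative, not taken from the patch's
regression tests), alongside the GRANT form it would no longer mirror:

```sql
CREATE PUBLICATION pub_sales FOR TABLES IN SCHEMA sales;          -- proposed
ALTER PUBLICATION pub_sales ADD TABLES IN SCHEMA archive;         -- proposed
GRANT SELECT ON ALL TABLES IN SCHEMA sales TO reporting_role;     -- unchanged
```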

Jonathan





Re: [PATCH] polish the error message of creating proc

2022-09-21 Thread Junwang Zhao
On Wed, Sep 21, 2022 at 10:53 PM Tom Lane  wrote:
>
> Junwang Zhao  writes:
> > I noticed that there are some translations under the backend/po directory,
> > can we just change
> > msgid "function \"%s\" already exists with same argument types"
> > to
> > msgid "%s \"%s\" already exists with same argument types" ?
>
> No.  This doesn't satisfy our message translation guidelines [1].
> The fact that there are other messages that aren't up to project
> standard isn't a license to create more.
>
> More generally: there are probably dozens, if not hundreds, of
> messages in the backend that say "function" but nowadays might
> also be talking about a procedure.  I'm not sure there's value
> in improving just one of them.
>
> I am pretty sure that we made an explicit decision some time back
> that it is okay to say "function" when the object could also be
> an aggregate or window function.  So you could at least cut this
> back to just handling "procedure" and "function".  Or you could
> change it to "routine" as Julien suggests, but I think a lot of
> people will not think that's an improvement.

Yeah, makes sense, will leave it as is.

>
> regards, tom lane
>
> [1] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES



-- 
Regards
Junwang Zhao




Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.

2022-09-21 Thread Ashutosh Sharma
On Wed, Sep 21, 2022 at 7:21 PM Ashutosh Bapat
 wrote:
>
> On Mon, Sep 19, 2022 at 8:09 PM Ashutosh Sharma  wrote:
> >
> > On Mon, Sep 19, 2022 at 5:24 PM Ashutosh Bapat
> >  wrote:
> > >
> > > On Mon, Sep 19, 2022 at 1:43 PM Ashutosh Sharma  
> > > wrote:
> > > >
> > > > On Fri, Sep 9, 2022 at 5:36 PM Ashutosh Bapat
> > > >  wrote:
> > > > >
> > > > > On Thu, Sep 8, 2022 at 8:32 PM Ashutosh Sharma 
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Sep 8, 2022 at 6:23 PM Ashutosh Bapat
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Sep 8, 2022 at 4:14 PM Ashutosh Sharma 
> > > > > > >  wrote:
> > > > > Can you please point to the documentation.
> > > > >
> > > >
> > > > AFAIU there is just one documentation. Here is the link for it:
> > > >
> > > > https://www.postgresql.org/docs/current/view-pg-replication-slots.html
> > >
> > > Thanks. Description of confirmed_flush_lsn is "The address (LSN) up to
> > > which the logical slot's consumer has confirmed receiving data. Data
> > > older than this is not available anymore. NULL for physical slots."
> > > The second sentence is misleading. AFAIU, it really should be "Data
> > > corresponding to the transactions committed before this LSN is not
> > > available anymore". WAL before restart_lsn is likely to be removed but
> > > WAL with LSN higher than restart_lsn is preserved. This correction
> > > makes more sense because of the third sentence.
> > >
> >
> > Thanks for the clarification. Attached is the patch with the changes.
> > Please have a look.
> >
> Looks good to me. Do you want to track this through commitfest app?
>

Yeah, I've added an entry for it in the commitfest app and marked it
as ready for committer. Thanks for the suggestion.
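
For reference, the columns whose description is being adjusted can be
inspected directly; a minimal illustrative query (assuming at least one slot
exists) looks like this:

```sql
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
-- confirmed_flush_lsn is NULL for physical slots; for logical slots it is
-- the LSN up to which the consumer has confirmed receiving data.
```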

--
With Regards,
Ashutosh Sharma.




Re: [PoC] Let libpq reject unexpected authentication requests

2022-09-21 Thread Jacob Champion
On Fri, Sep 16, 2022 at 1:29 PM Jacob Champion  wrote:
> I'm happy to implement proofs of concept for that, or any other ideas,
> given the importance of getting this "right enough" the first time.
> Just let me know.

v8 rebases over the postgres_fdw HINT changes; there are no functional
differences.

--Jacob
From 2343bc37ebdbb6847d6a537f094a5de660274cc3 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Fri, 24 Jun 2022 15:40:42 -0700
Subject: [PATCH v8 2/2] Add sslcertmode option for client certificates

The sslcertmode option controls whether the server is allowed and/or
required to request a certificate from the client. There are three
modes:

- "allow" is the default and follows the current behavior -- a
  configured  sslcert is sent if the server requests one (which, with
  the current implementation, will happen whenever TLS is negotiated).

- "disable" causes the client to refuse to send a client certificate
  even if an sslcert is configured.

- "require" causes the client to fail if a client certificate is never
  sent and the server opens a connection anyway. This doesn't add any
  additional security, since there is no guarantee that the server is
  validating the certificate correctly, but it may help troubleshoot
  more complicated TLS setups.

sslcertmode=require needs the OpenSSL implementation to support
SSL_CTX_set_cert_cb(). Notably, LibreSSL does not.
---
 configure| 11 ++---
 configure.ac |  4 +-
 doc/src/sgml/libpq.sgml  | 54 +++
 src/include/pg_config.h.in   |  3 ++
 src/interfaces/libpq/fe-auth.c   | 21 +
 src/interfaces/libpq/fe-connect.c| 55 
 src/interfaces/libpq/fe-secure-openssl.c | 40 -
 src/interfaces/libpq/libpq-int.h |  3 ++
 src/test/ssl/t/001_ssltests.pl   | 43 ++
 src/tools/msvc/Solution.pm   |  8 
 10 files changed, 234 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index 4efed743a1..86cbf5d68a 100755
--- a/configure
+++ b/configure
@@ -12904,13 +12904,14 @@ else
 fi
 
   fi
-  # Function introduced in OpenSSL 1.0.2.
-  for ac_func in X509_get_signature_nid
+  # Functions introduced in OpenSSL 1.0.2. LibreSSL doesn't have all of these.
+  for ac_func in X509_get_signature_nid SSL_CTX_set_cert_cb
 do :
-  ac_fn_c_check_func "$LINENO" "X509_get_signature_nid" "ac_cv_func_X509_get_signature_nid"
-if test "x$ac_cv_func_X509_get_signature_nid" = xyes; then :
+  as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
+ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
+if eval test \"x\$"$as_ac_var"\" = x"yes"; then :
   cat >>confdefs.h <<_ACEOF
-#define HAVE_X509_GET_SIGNATURE_NID 1
+#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1
 _ACEOF
 
 fi
diff --git a/configure.ac b/configure.ac
index 967f7e7209..936c4f4d07 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1350,8 +1350,8 @@ if test "$with_ssl" = openssl ; then
  AC_SEARCH_LIBS(CRYPTO_new_ex_data, [eay32 crypto], [], [AC_MSG_ERROR([library 'eay32' or 'crypto' is required for OpenSSL])])
  AC_SEARCH_LIBS(SSL_new, [ssleay32 ssl], [], [AC_MSG_ERROR([library 'ssleay32' or 'ssl' is required for OpenSSL])])
   fi
-  # Function introduced in OpenSSL 1.0.2.
-  AC_CHECK_FUNCS([X509_get_signature_nid])
+  # Functions introduced in OpenSSL 1.0.2. LibreSSL doesn't have all of these.
+  AC_CHECK_FUNCS([X509_get_signature_nid SSL_CTX_set_cert_cb])
   # Functions introduced in OpenSSL 1.1.0. We used to check for
   # OPENSSL_VERSION_NUMBER, but that didn't work with 1.1.0, because LibreSSL
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1d6c02058e..821078ea6a 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1810,6 +1810,50 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  sslcertmode
+  
+   
+This option determines whether a client certificate may be sent to the
+server, and whether the server is required to request one. There are
+three modes:
+
+
+ 
+  disable
+  
+   
+a client certificate is never sent, even if one is provided via
+
+   
+  
+ 
+
+ 
+  allow (default)
+  
+   
+a certificate may be sent, if the server requests one and it has
+been provided via sslcert
+   
+  
+ 
+
+ 
+  require
+  
+   
+the server must request a certificate. The
+connection will fail if the server authenticates the client despite
+not requesting or receiving one.
+   
+  
+ 
+
+   
+  
+ 
+
  
   sslrootcert
   
@@ -7968,6 +8012,16 @

Re: Silencing the remaining clang 15 warnings

2022-09-21 Thread Tom Lane
HEAD and v15 now compile cleanly for me with clang 15.0.0,
but I find that there's still work to do in the back branches:

* There are new(?) -Wunused-but-set-variable warnings in every older
branch, which we evidently cleaned up or rewrote at one point or
another.  I think this is definitely worth fixing in the in-support
branches.  I'm a bit less sure if it's worth the trouble in the
out-of-support branches.

* The 9.2 and 9.3 branches spew boatloads of 'undeclared library function'
warnings about strlcpy() and related functions.  This is evidently
because 16fbac39f was only back-patched as far as 9.4.  There are
enough of these to be pretty annoying if you're trying to build those
branches with clang, so I think this is clearly justified for
back-patching into the older out-of-support branches, assuming that
the patch will work there.  (I see that 9.2 and 9.3 were still on the
prior version of autoconf, so it might not be an easy change.)

I also observe that 9.2 and 9.3 produce

float.c:1278:29: warning: implicit conversion from 'int' to 'float' changes 
value from 2147483647 to 2147483648 [-Wimplicit-const-int-float-conversion]

This is because cbdb8b4c0 was only back-patched as far as 9.4.
However, I think that that would *not* be fit material for
back-patching into out-of-support branches, since our policy
for them is "no behavioral changes".

regards, tom lane




Re: Is it time to drop fix-old-flex-code.pl?

2022-09-21 Thread Julien Rouhaud
On Wed, Sep 21, 2022 at 10:21:25AM -0400, Tom Lane wrote:
>
> lapwing uses -Werror, so it will be unhappy, but I don't think it's
> unreasonable to say that you should be using fairly modern tools
> if you want to use -Werror.

I think that the -Werror helped to find problems multiple times (although less
often than the ones due to the 32-bit arch).  If it's worth keeping, I could see
whether I can get a 2.5.36 flex to compile if we drop fix-old-flex-code.pl.




Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-09-21 Thread Jacob Champion
On Tue, Sep 20, 2022 at 4:19 PM Jacob Champion  wrote:
> > 2. Add support to pass on the OAuth bearer token. In this
> > obtaining the bearer token is left to 3rd party application or user.
> >
> > ./psql -U  -d 'dbname=postgres
> > oauth_client_id= oauth_bearer_token=
>
> This hurts, but I think people are definitely going to ask for it, given
> the frightening practice of copy-pasting these (incredibly sensitive
> secret) tokens all over the place...

After some further thought -- in this case, you already have an opaque
Bearer token (and therefore you already know, out of band, which
provider needs to be used), you're willing to copy-paste it from
whatever service you got it from, and you have an extension plugged
into Postgres on the backend that verifies this Bearer blob using some
procedure that Postgres knows nothing about.

Why do you need the OAUTHBEARER mechanism logic at that point? Isn't
that identical to a custom password scheme? It seems like that could
be handled completely by Samay's pluggable auth proposal.

--Jacob




Re: Query Jumbling for CALL and SET utility statements

2022-09-21 Thread Fujii Masao




On 2022/09/19 15:29, Drouvot, Bertrand wrote:

Please find attached v6 taking care of the remarks mentioned above.


Thanks for updating the patch!

+SET pg_stat_statements.track_utility = TRUE;
+
+-- PL/pgSQL procedure and pg_stat_statements.track = all
+-- we drop and recreate the procedures to avoid any caching funnies
+SET pg_stat_statements.track_utility = FALSE;

Could you tell me why track_utility is enabled just before it's disabled?

Could you tell me what actually happens if we don't drop and
recreate the procedures? I'd like to know what "any caching funnies" are.

+SELECT pg_stat_statements_reset();
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";

This test set for the procedures is executed with the following
four conditions, respectively. Do we really need all of these tests?

track = top, track_utility = true
track = top, track_utility = false
track = all, track_utility = true
track = all, track_utility = false

+begin;
+prepare transaction 'tx1';
+insert into test_tx values (1);
+commit prepared 'tx1';

The test set of 2PC commands is also executed with track_utility = on
and off, respectively. But why do we need to run that test when
track_utility = off?

-   if (query->utilityStmt)
+   if (query->utilityStmt && !jstate)
{
if (pgss_track_utility && 
!PGSS_HANDLED_UTILITY(query->utilityStmt))

"pgss_track_utility" should be
"pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Query JITing with LLVM ORC

2022-09-21 Thread João Paulo Labegalini de Carvalho
Hi all,

I am working on a project with LLVM ORC that led us to PostgreSQL as a
target application. We were surprised to learn that PGSQL already uses
LLVM ORC to JIT certain queries.

I would love to know what motivated this feature and what it is currently
being used for, as it is not enabled by default.

Thanks.

-- 
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carva...@ic.unicamp.br
joao.carva...@ualberta.ca


Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

2022-09-21 Thread Ashutosh Sharma
On Tue, Sep 20, 2022 at 5:13 PM Bharath Rupireddy
 wrote:
>
> On Mon, Sep 19, 2022 at 8:19 PM Ashutosh Sharma  wrote:
> >
> > Hi All,
> >
> > Currently, we have pg_current_wal_insert_lsn and pg_walfile_name sql
> > functions which gives us information about the next wal insert
> > location and the WAL file that the next wal insert location belongs
> > to. Can we have a binary version of these sql functions?
>
> +1 for the idea in general.
>
> As said, pg_waldump seems to be the right candidate. I think we want
> the lsn of the last WAL record and its info and the WAL file name
> given an input data directory or just the pg_wal directory or any
> directory where WAL files are located. For instance, one can use this
> on an archive location containing archived WAL files or on a node
> where pg_receivewal is receiving WAL files. Am I missing any other
> use-cases?
>

Yeah, we can either add this functionality to pg_waldump or maybe add
a new binary itself that would return this information.
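
For reference, the SQL-level functions being mirrored here look like this
(illustrative; they require a running server, which is exactly what the
proposed binary would avoid):

```sql
SELECT pg_current_wal_insert_lsn() AS insert_lsn,
       pg_walfile_name(pg_current_wal_insert_lsn()) AS walfile;
```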

--
With Regards,
Ashutosh Sharma.




Re: why can't a table be part of the same publication as its schema

2022-09-21 Thread Amit Kapila
On Wed, Sep 21, 2022 at 8:24 PM Jonathan S. Katz  wrote:
>
> On 9/21/22 10:24 AM, Alvaro Herrera wrote:
> > On 2022-Sep-20, Robert Haas wrote:
> >
> >>> I don't think we should change this behavior that's already in logical
> >>> replication. While I understand the reasons why "GRANT ... ALL TABLES IN
> >>> SCHEMA" has a different behavior (i.e. it's not applied to future
> >>> objects) and do not advocate to change it, I have personally been
> >>> affected where I thought a permission would be applied to all future
> >>> objects, only to discover otherwise. I believe it's more intuitive to
> >>> think that "ALL" applies to "everything, always."
> >>
> >> Nah, there's room for multiple behaviors here. It's reasonable to want
> >> to add all the tables currently in the schema to a publication (or
> >> grant permissions on them) and it's reasonable to want to include all
> >> current and future tables in the schema in a publication (or grant
> >> permissions on them) too. The reason I don't like the ALL TABLES IN
> >> SCHEMA syntax is that it sounds like the former, but actually is the
> >> latter. Based on your link to the email from Tom, I understand now the
> >> reason why it's like that, but it's still counterintuitive to me.
> >
> > I already proposed elsewhere that we remove the ALL keyword from there,
> > which I think serves to reduce confusion (in particular it's no longer
> > parallel to the GRANT one).  As in the attached.
>

Thanks for working on this.

> [personal, not RMT hat]
>
> I'd be OK with this. It would still allow for "FOR SEQUENCES IN SCHEMA" etc.
>

I also think this is reasonable. It can later be extended to have an
option to exclude/include future tables with a publication option.
Also, if we want to keep it compatible with FOR ALL TABLES syntax, we
can later add ALL as an optional keyword in the syntax.

-- 
With Regards,
Amit Kapila.




Re: [RFC] building postgres with meson - v13

2022-09-21 Thread Tom Lane
Andres Freund  writes:
> I think we should:

> - convert windows to build with ninja - it builds faster, runs all tests,
>   parallelizes tests. That means that msbuild based builds don't have coverage
>   via CI / cfbot, but we don't currently have the resources to test both.

Check.  The sooner we can get rid of the custom MSVC scripts, the better,
because now we'll be on the hook to maintain *three* build systems.

> - add a linux build using meson, we currently can afford building both with
>   autoconf and meson for linux

Right.

> I'm less clear on whether we should convert macos / freebsd to meson at this
> point?

We certainly could debug/polish the meson stuff just on linux and windows
for now, but is there a reason to wait on the others?

regards, tom lane




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-21 Thread Nathan Bossart
On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote:
> In trying to wrap the SIMD code behind layers of abstraction, the latest
> patch (and Nathan's cleanup) threw it away in almost all cases. To explain,
> we need to talk about how vectorized code deals with the "tail" that is too
> small for the register:
> 
> 1. Use a one-by-one algorithm, like we do for the pg_lfind* variants.
> 2. Read some junk into the register and mask off false positives from the
> result.
> 
> There are advantages to both depending on the situation.
> 
> Patch v5 and earlier used #2. Patch v6 used #1, so if a node16 has 15
> elements or less, it will iterate over them one-by-one exactly like a
> node4. Only when full with 16 will the vector path be taken. When another
> entry is added, the elements are copied to the next bigger node, so there's
> a *small* window where it's fast.
> 
> In short, this code needs to be lower level so that we still have full
> control while being portable. I will work on this, and also the related
> code for node dispatch.

Is it possible to use approach #2 here, too?  AFAICT space is allocated for
all of the chunks, so there wouldn't be any danger in searching all of them
and discarding any results >= node->count.  Granted, we're depending on the
number of chunks always being a multiple of elements-per-vector in order to
avoid the tail path, but that seems like a reasonably safe assumption that
can be covered with comments.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Proposal to use JSON for Postgres Parser format

2022-09-21 Thread Matthias van de Meent
On Tue, 20 Sept 2022 at 17:29, Alvaro Herrera  wrote:
>
> On 2022-Sep-20, Matthias van de Meent wrote:
>
> > Allow me to add: compressability
> >
> > In the thread surrounding [0] there were complaints about the size of
> > catalogs, and specifically the template database. Significant parts of
> > that (688kB of 8080kB a fresh PG14 database) are in pg_rewrite, which
> > consists mostly of serialized Nodes. If we're going to replace our
> > current NodeToText infrastructure, we'd better know we can effectively
> > compress this data.
>
> True.  Currently, the largest ev_action values compress pretty well.  I
> think if we wanted this to be more succinct, we would have to invent some
> binary format -- perhaps something like Protocol Buffers: it'd be stored
> in the binary format in catalogs, but for output it would be converted
> into something easy to read (we already do this for
> pg_statistic_ext_data for example).  We'd probably lose compressibility,
> but that'd be okay because the binary format would already remove most
> of the redundancy by nature.
>
> Do we want to go there?

I don't think that a binary format would be much better for
debugging/fixing than an optimization of the current textual format
when combined with compression. As I mentioned in that thread, there
is a lot of improvement possible with the existing format, and I think
any debugging of serialized nodes would greatly benefit from using a
textual format.

Then again, I also agree that this argument doesn't hold its weight
when storage and output formats are going to be different. I trust
that any new tooling introduced as a result of this thread will be
better than what we have right now.

As for best format: I don't know. The current format is usable, and a
better format would not store any data for default values. JSON can do
that, but I could think of many formats that could do the same (Smile,
BSON, xml, etc.).

I do not think that protobuf is the best choice for storage, though,
because it has its own rules on what it considers a default value and
what it does or does not serialize: zero is considered the only
default for numbers, as is the empty string for text, etc.
I think it is all right for general use, but with e.g. `location: -1`
in just about every parse node we'd probably want to select our own
values to ignore during (de)serialization of fields.

Kind regards,

Matthias van de Meent




Re: [RFC] building postgres with meson - v13

2022-09-21 Thread Andres Freund
Hi,

On 2022-09-21 13:56:37 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I think we should:
> 
> > - convert windows to build with ninja - it builds faster, runs all tests,
> >   parallelizes tests. That means that msbuild based builds don't have 
> > coverage
> >   via CI / cfbot, but we don't currently have the resources to test both.
> 
> Check.  The sooner we can get rid of the custom MSVC scripts, the better,
> because now we'll be on the hook to maintain *three* build systems.

Agreed. I think the only "major" missing thing is the windows resource file
generation stuff, which is mostly done in one of the "later" commits. Also
need to test a few more of the optional dependencies (ICU, gettext, ...) on
windows (I did test zlib, lz4, zstd). And of course get a bit of wider
exposure than "just me and CI".


> > I'm less clear on whether we should convert macos / freebsd to meson at this
> > point?
> 
> We certainly could debug/polish the meson stuff just on linux and windows
> for now, but is there a reason to wait on the others?

No - freebsd and macos have worked in CI for a long time. I was wondering
whether we want more coverage for autoconf in CI, but thinking about it
further, it's more important to have the meson coverage.

Greetings,

Andres Freund




Re: [RFC] building postgres with meson - v13

2022-09-21 Thread Justin Pryzby
On Wed, Sep 21, 2022 at 09:46:30AM -0700, Andres Freund wrote:
> I think we should:
> 
> - convert windows to build with ninja - it builds faster, runs all tests,
>   parallelizes tests. That means that msbuild based builds don't have coverage
>   via CI / cfbot, but we don't currently have the resources to test both.

+1

If multiple Windows (or other) tasks are going to exist, I think they
should have separate "ci-os-only" conditions, like windows-msvc,
windows-ninja, ...  It should be possible to run only one.




Re: Query JITing with LLVM ORC

2022-09-21 Thread Thomas Munro
On Thu, Sep 22, 2022 at 4:17 AM João Paulo Labegalini de Carvalho
 wrote:
> I am working on a project with LLVM ORC that led us to PostgreSQL as a target 
> application. We were surprised by learning that PGSQL already uses LLVM ORC 
> to JIT certain queries.

It JITs expressions but not whole queries.  Query execution at the
tuple-flow level is still done using a C call stack the same shape as
the query plan, but it *could* be transformed to a different control
flow that could be run more efficiently and perhaps JITed.  CCing
Andres who developed all this and had some ideas about that...

> I would love to know what motivated this feature and for what it is being 
> currently used for,

https://www.postgresql.org/docs/current/jit-reason.html

> as it is not enabled by default.

It's enabled by default in v12 and higher (if you built with
--with-llvm, as packagers do), but not always used:

https://www.postgresql.org/docs/current/jit-decision.html




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2022-09-21 Thread Nathan Bossart
On Tue, Sep 20, 2022 at 03:12:00PM -0700, Peter Geoghegan wrote:
> On Mon, Sep 12, 2022 at 2:01 PM Peter Geoghegan  wrote:
>> I'd like to talk about one such technique on this thread. The attached
>> WIP patch reduces the size of xl_heap_freeze_page records by applying
>> a simple deduplication process.
> 
> Attached is v2, which I'm just posting to keep CFTester happy. No real
> changes here.

This idea seems promising.  I see that you called this patch a
work-in-progress, so I'm curious what else you are planning to do with it.

As I'm reading this thread and the patch, I'm finding myself wondering if
it's worth exploring using wal_compression for these records instead.  I
think you've essentially created an efficient compression mechanism for
this one type of record, but I'm assuming that lz4/zstd would also yield
some rather substantial improvements for this kind of data.  Presumably a
generic WAL record compression mechanism could be reused for other large
records, too.  That could be much easier than devising a deduplication
strategy for every record type.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_auth_members.grantor is bunk

2022-09-21 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 18, 2022 at 1:26 PM Robert Haas  wrote:
>> CI is happier with this version, so I've committed 0001. If no major
>> problems emerge, I'll proceed with 0002 as well.

> Done.

Shouldn't the CF entry [1] be closed as committed?

regards, tom lane

[1] https://commitfest.postgresql.org/39/3745/




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2022-09-21 Thread Peter Geoghegan
On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart  wrote:
> This idea seems promising.  I see that you called this patch a
> work-in-progress, so I'm curious what else you are planning to do with it.

I really just meant that the patch wasn't completely finished at that
point. I hadn't yet convinced myself that I mostly had it right. I'm
more confident now.

> As I'm reading this thread and the patch, I'm finding myself wondering if
> it's worth exploring using wal_compression for these records instead.

The term deduplication works better than compression here because
we're not actually decompressing anything in the REDO routine. Rather,
the REDO routine processes each freeze plan by processing all affected
tuples in order. To me this seems like the natural way to structure
things -- the WAL records are much smaller, but in a way that's kind
of incidental. The approach taken by the patch just seems like the
natural approach, given the specifics of how freezing works at a high
level.

> I think you've essentially created an efficient compression mechanism for
> this one type of record, but I'm assuming that lz4/zstd would also yield
> some rather substantial improvements for this kind of data.

I don't think of it that way. I've used the term "deduplication" to
advertise the patch, but that's mostly just a description of what
we're doing in the patch relative to what we do on HEAD today. There
is nothing truly clever in the patch. We see a huge amount of
redundancy among tuples from the same page in practically all cases,
for reasons that have everything to do with what freezing is, and how
it works at a high level. The thought process that led to my writing
this patch was more high level than appearances suggest. (I often
write patches that combine high level and low level insights in some
way or other, actually.)

Theoretically there might not be very much redundancy within each
xl_heap_freeze_page record, with the right workload, but in practice a
decrease of 4x or more is all but guaranteed once you have more than a
few tuples to freeze on each page. If there are other WAL records that
are as space inefficient as xl_heap_freeze_page is, then I'd be
surprised -- it is *unusually* space inefficient (like I said, I
suspect that this may have something to do with the fact that it was
originally designed under time pressure). So I don't expect that this
patch tells us much about what we should do for any other WAL record.
I certainly *hope* that it doesn't, at least.

> Presumably a
> generic WAL record compression mechanism could be reused for other large
> records, too.  That could be much easier than devising a deduplication
> strategy for every record type.

It's quite possible that that's a good idea, but that should probably
work as an additive thing. That's something that I think of as a
"clever technique", whereas I'm focussed on just not being naive in
how we represent this one specific WAL record type.

-- 
Peter Geoghegan




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2022-09-21 Thread Peter Geoghegan
On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan  wrote:
> > Presumably a
> > generic WAL record compression mechanism could be reused for other large
> > records, too.  That could be much easier than devising a deduplication
> > strategy for every record type.
>
> It's quite possible that that's a good idea, but that should probably
> work as an additive thing. That's something that I think of as a
> "clever technique", whereas I'm focussed on just not being naive in
> how we represent this one specific WAL record type.

BTW, if you wanted to pursue something like this, that would work with
many different types of WAL record, ISTM that a "medium level" (not
low level) approach might be the best place to start. In particular,
the way that page offset numbers are represented in many WAL records
is quite space inefficient.  A domain-specific approach built with
some understanding of how page offset numbers tend to look in practice
seems promising.

The representation of page offset numbers in PRUNE and VACUUM heapam
WAL records (and in index WAL records) always just stores an array of
2 byte OffsetNumber elements. It probably wouldn't be all that
difficult to come up with a simple scheme for compressing an array of
OffsetNumbers in WAL records. It certainly doesn't seem like it would
be all that difficult to get it down to 1 byte per offset number in
most cases (even greater improvements seem doable).

That could also be used for the xl_heap_freeze_page record type --
though only after this patch is committed. The patch makes the WAL
record use a simple array of page offset numbers, just like in
PRUNE/VACUUM records. That's another reason why the approach
implemented by the patch seems like "the natural approach" to me. It's
much closer to how heapam PRUNE records work (we have a variable
number of arrays of page offset numbers in both cases).

-- 
Peter Geoghegan




Re: Query JITing with LLVM ORC

2022-09-21 Thread João Paulo Labegalini de Carvalho
Hi Thomas,

It JITs expressions but not whole queries.


Thanks for the clarification.


> Query execution at the
> tuple-flow level is still done using a C call stack the same shape as
> the query plan, but it *could* be transformed to a different control
> flow that could be run more efficiently and perhaps JITed.


I see, so there is room for extending the use of ORC JIT in PGSQL.


> CCing
> Andres who developed all this and had some ideas about that...
>

Thanks for CCing Andres, it will be great to hear from him.


> > I would love to know what motivated this feature and for what it is
> being currently used for,
>
> https://www.postgresql.org/docs/current/jit-reason.html


In that link I found the README under src/backend/jit, which was very
helpful.

> as it is not enabled by default.
>
> It's enabled by default in v12 and higher (if you built with
> --with-llvm, as packagers do), but not always used:
>
> https://www.postgresql.org/docs/current/jit-decision.html
>

Good to know. I compiled from the REL_14_5 tag and did a simple experiment
to contrast building with and w/o passing --with-llvm.
I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up,
and 120 of measurement time.
The number of transactions per minute was about the same with & w/o JITing.
Is this expected? Should I use a different benchmark to observe a
performance difference?

Regards,

-- 
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carva...@ic.unicamp.br
joao.carva...@ualberta.ca


Re: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-09-21 Thread Jacob Champion
On Wed, Sep 21, 2022 at 3:10 PM Andrey Chudnovskiy
 wrote:
> We can support both passing the token from an upstream client and libpq 
> implementing OAUTH2 protocol to obtaining one.

Right, I agree that we could potentially do both.

> Libpq passing toked directly from an upstream client is useful in other 
> scenarios:
> 1. Enterprise clients, built with .Net / Java and using provider-specific 
> authentication libraries, like MSAL for AAD. Those can also support more 
> advance provider-specific token acquisition flows.
> 2. Resource-tight (like IoT) clients. Those can be compiled without optional 
> libpq flag not including the iddawc or other dependency.

What I don't understand is how the OAUTHBEARER mechanism helps you in
this case. You're short-circuiting the negotiation where the server
tells the client what provider to use and what scopes to request, and
instead you're saying "here's a secret string, just take it and
validate it with magic."

I realize the ability to pass an opaque token may be useful, but from
the server's perspective, I don't see what differentiates it from the
password auth method plus a custom authenticator plugin. Why pay for
the additional complexity of OAUTHBEARER if you're not going to use
it?

--Jacob




Re: Query JITing with LLVM ORC

2022-09-21 Thread Tom Lane
=?UTF-8?Q?Jo=C3=A3o_Paulo_Labegalini_de_Carvalho?=  
writes:
> Good to know. I compiled from the REL_14_5 tag and did a simple experiment
> to contrast building with and w/o passing --with-llvm.
> I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up,
> and 120 of measurement time.
> The number of transactions per minute was about the same with & w/o JITing.
> Is this expected? Should I use a different benchmark to observe a
> performance difference?

TPC-C is mostly short queries, so we aren't likely to choose to use JIT
(and if we did, it'd likely be slower).  You need a long query that will
execute the same expressions over and over for it to make sense to
compile them.  Did you check whether any JIT was happening there?
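
One quick way to check (a sketch; the query is made up, and setting
jit_above_cost to 0 is only for experimentation): EXPLAIN (ANALYZE) prints a
JIT section only when compilation actually happened.

```sql
SHOW jit;                -- must be on (default in v12+ when built --with-llvm)
SET jit_above_cost = 0;  -- force the planner to consider JIT regardless of cost
EXPLAIN (ANALYZE)
SELECT sum(a) FROM generate_series(1, 1000000) AS t(a);
-- Look for a trailing "JIT:" block reporting functions, timings, and options.
```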

There are a bunch of issues in this area concerning whether our cost
models are good enough to accurately predict whether JIT is a good
idea.  But single-row fetches and updates are basically never going
to use it, nor should they.

regards, tom lane




Re: [PoC] Let libpq reject unexpected authentication requests

2022-09-21 Thread Peter Eisentraut

On 21.09.22 17:33, Jacob Champion wrote:

On Fri, Sep 16, 2022 at 1:29 PM Jacob Champion  wrote:

I'm happy to implement proofs of concept for that, or any other ideas,
given the importance of getting this "right enough" the first time.
Just let me know.


v8 rebases over the postgres_fdw HINT changes; there are no functional
differences.


So let's look at the two TODO comments you have:

 * TODO: how should !auth_required interact with an incomplete
 * SCRAM exchange?

What specific combination of events are you thinking of here?


/*
 * If implicit GSS auth has already been performed via GSS
 * encryption, we don't need to have performed an
 * AUTH_REQ_GSS exchange.
 *
 * TODO: check this assumption. What mutual auth guarantees
 * are made in this case?
 */

I don't understand the details involved here, but I would be surprised 
if this assumption is true.  For example, does GSS encryption deal with 
user names and a user name map?  I don't see how these can be 
equivalent.  In any case, it seems to me that it would be safer to *not* 
make this assumption at first and then have someone more knowledgeable 
make the argument that it would be safe.






Re: Query JITing with LLVM ORC

2022-09-21 Thread Thomas Munro
On Thu, Sep 22, 2022 at 10:35 AM Tom Lane  wrote:
> =?UTF-8?Q?Jo=C3=A3o_Paulo_Labegalini_de_Carvalho?=  
> writes:
> > Good to know. I compiled from the REL_14_5 tag and did a simple experiment
> > to contrast building with and w/o passing --with-llvm.
> > I ran the TPC-C benchmark with 1 warehouse, 10 terminals, 20min of ramp-up,
> > and 120 of measurement time.
> > The number of transactions per minute was about the same with & w/o JITing.
> > Is this expected? Should I use a different benchmark to observe a
> > performance difference?
>
> TPC-C is mostly short queries, so we aren't likely to choose to use JIT
> (and if we did, it'd likely be slower).  You need a long query that will
> execute the same expressions over and over for it to make sense to
> compile them.  Did you check whether any JIT was happening there?

See also the proposal thread which has some earlier numbers from TPC-H.

https://www.postgresql.org/message-id/flat/20170901064131.tazjxwus3k2w3ybh%40alap3.anarazel.de




Re: Query JITing with LLVM ORC

2022-09-21 Thread Thomas Munro
On Thu, Sep 22, 2022 at 10:04 AM João Paulo Labegalini de Carvalho
 wrote:
>building with and w/o passing --with-llvm

BTW you can also just turn it off with runtime settings, no need to rebuild.




Re: [RFC] building postgres with meson - v13

2022-09-21 Thread Andres Freund
Hi,

On 2022-09-21 09:46:30 -0700, Andres Freund wrote:
> After that I am planning to split the "ci" commit so that it converts a few of
> the CI tasks to use meson, without adding all the other platforms I added for
> development. I think that's important to get in soon, given that it'll
> probably take a bit until the buildfarm grows meson coverage and because it
> provides cfbot coverage which seems important for now as well.
> 
> I think we should:
> 
> - convert windows to build with ninja - it builds faster, runs all tests,
>   parallelizes tests. That means that msbuild based builds don't have coverage
>   via CI / cfbot, but we don't currently have the resources to test both.

I was working on that and hit an issue that took me a while to resolve: once I
tested only the "main" meson commit plus CI, the windows task was running out
of memory. There was an outage of the CI provider at the same time, so I first
blamed it on that. But it turns out to be "legitimately" high memory usage
related to debug symbols - the only reason CI didn't show that before was that
it's incidentally fixed as an indirect consequence of using precompiled
headers, in a later commit. Argh.  It can also be fixed by the option required
to use ccache at some point, so I'll do that for now.

Greetings,

Andres Freund




Re: [PoC] Let libpq reject unexpected authentication requests

2022-09-21 Thread Jacob Champion
On Wed, Sep 21, 2022 at 3:36 PM Peter Eisentraut
 wrote:
> So let's look at the two TODO comments you have:
>
>   * TODO: how should !auth_required interact with an incomplete
>   * SCRAM exchange?
>
> What specific combination of events are you thinking of here?

Let's say the client is using `require_auth=!password`. If the server
starts a SCRAM exchange but doesn't finish it, the connection will
still succeed with the current implementation (because it satisfies
the "none" case). This is also true for a client setting of
`require_auth=scram-sha-256,none`. I think this is potentially
dangerous, but it mirrors the current behavior of libpq and I'm not
sure that we should change it as part of this patch.

>  /*
>   * If implicit GSS auth has already been performed via GSS
>   * encryption, we don't need to have performed an
>   * AUTH_REQ_GSS exchange.
>   *
>   * TODO: check this assumption. What mutual auth guarantees
>   * are made in this case?
>   */
>
> I don't understand the details involved here, but I would be surprised
> if this assumption is true.  For example, does GSS encryption deal with
> user names and a user name map?

To my understanding, yes. There are explicit tests for it.

> In any case, it seems to me that it would be safer to *not*
> make this assumption at first and then have someone more knowledgeable
> make the argument that it would be safe.

I think I'm okay with that, regardless. Here's one of the wrinkles:
right now, both of the following connstrings work:

require_auth=gss gssencmode=require
require_auth=gss gssencmode=prefer

If we don't treat gssencmode as providing GSS auth, then the first
case will always fail, because there will be no GSS authentication
packet over an encrypted connection. Likewise, the second case will
almost always fail, unless the server doesn't support gssencmode at
all (so why are you using prefer?).

If you're okay with those limitations, I will rip out the code. The
reason I'm not too worried about it is, I don't think it makes much
sense to be strict about your authentication requirements while at the
same time leaving the choice of transport encryption up to the server.

Thanks,
--Jacob




pg_basebackup --create-slot-if-not-exists?

2022-09-21 Thread Ashwin Agrawal
Currently, pg_basebackup has
--create-slot option to create slot if not already exists or
--slot to use existing slot

This means one needs to know whether a slot with the given name already
exists before invoking the command. If a pg_basebackup --create-slot
<> command fails for some reason after creating the slot, re-triggering
the same command fails with an ERROR that the slot already exists. One then
either needs to delete the slot and retrigger, or needs to check before
retriggering whether the slot exists and, based on that, alter the command
to avoid passing the --create-slot option. This is inconvenient when
automating on top of pg_basebackup, as checking for slot presence before
invoking pg_basebackup involves issuing separate SQL commands. It would be
really helpful for such scenarios if, similar to CREATE TABLE,
pg_basebackup could have an IF NOT EXISTS kind of semantic. (The limitation
most likely comes from the CREATE REPLICATION SLOT protocol itself.)
Thoughts?
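
For illustration, this is the kind of separate pre-check automation currently
has to run (the slot name is hypothetical) before deciding whether to pass
--create-slot:

```sql
SELECT EXISTS (
    SELECT 1 FROM pg_replication_slots WHERE slot_name = 'my_backup_slot'
) AS slot_present;
```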

-- 
*Ashwin Agrawal (VMware)*


Re: pg_basebackup's --gzip switch misbehaves

2022-09-21 Thread Justin Pryzby
On Tue, Sep 13, 2022 at 04:13:20PM +0900, Michael Paquier wrote:
> diff --git a/src/common/compression.c b/src/common/compression.c
> index da3c291c0f..ac26287d54 100644
> --- a/src/common/compression.c
> +++ b/src/common/compression.c
> @@ -249,36 +299,49 @@ expect_integer_value(char *keyword, char *value, 
> pg_compress_specification *resu
>  char *
>  validate_compress_specification(pg_compress_specification *spec)
>  {
> + int min_level = 1;
> + int max_level = 1;
> + int default_level = 0;
> +
>   /* If it didn't even parse OK, it's definitely no good. */
>   if (spec->parse_error != NULL)
>   return spec->parse_error;
>  
>   /*
> -  * If a compression level was specified, check that the algorithm 
> expects
> -  * a compression level and that the level is within the legal range for
> -  * the algorithm.
> +  * Check that the algorithm expects a compression level and it is
> +  * is within the legal range for the algorithm.
>*/
> - if ((spec->options & PG_COMPRESSION_OPTION_LEVEL) != 0)
> + switch (spec->algorithm)
>   {
> - int min_level = 1;
> - int max_level;
> -
> - if (spec->algorithm == PG_COMPRESSION_GZIP)
> + case PG_COMPRESSION_GZIP:
>   max_level = 9;
> - else if (spec->algorithm == PG_COMPRESSION_LZ4)
> +#ifdef HAVE_LIBZ
> + default_level = Z_DEFAULT_COMPRESSION;
> +#endif
> + break;
> + case PG_COMPRESSION_LZ4:
>   max_level = 12;
> - else if (spec->algorithm == PG_COMPRESSION_ZSTD)
> + default_level = 0;  /* fast mode */
> + break;
> + case PG_COMPRESSION_ZSTD:
>   max_level = 22;

I should've suggested to add:

>   min_level = -7;

which has been supported since zstd 1.3.4 (and postgres requires 1.4.0).

I think at some point (maybe before releasing 1.3.4) the range was
extended to much lower (very negative) levels.  It's possible to query
the library about the lowest supported compression level, but then
there's a complication regarding the client-side library version vs the
server-side version.  So it seems better to just use -7.

-- 
Justin




Re: pg_basebackup --create-slot-if-not-exists?

2022-09-21 Thread David G. Johnston
On Wednesday, September 21, 2022, Ashwin Agrawal 
wrote:

> Currently, pg_basebackup has
> --create-slot option to create slot if not already exists or
> --slot to use existing slot
>
> Which means it needs knowledge on if the slot with the given name already
> exists or not before invoking the command. If pg_basebackup --create-slot
> <> command fails for some reason after creating the slot. Re-triggering the
> same command fails with ERROR slot already exists. Either then need to
> delete the slot and retrigger Or need to add a check before retriggering
> the command to check if the slot exists and based on the same alter the
> command to avoid passing --create-slot option. This poses
> inconvenience while automating on top of pg_basebackup. As checking for
> slot presence before invoking pg_basebackup unnecessarily involves issuing
> separate SQL commands. Would be really helpful for such scenarios if
> similar to CREATE TABLE, pg_basebackup can have IF NOT EXISTS kind of
> semantic. (Seems the limitation most likely is coming from CREATE
> REPLICATION SLOT protocol itself), Thoughts?
>

What’s the use case for automating pg_basebackup with a named replication
slot created by the pg_basebackup command?  Why can you not leverage a
temporary replication slot (i.e., omit --slot)?  ISTM the create option is
basically obsolete now.

David J.


Re: pg_basebackup --create-slot-if-not-exists?

2022-09-21 Thread Michael Paquier
On Wed, Sep 21, 2022 at 05:34:20PM -0700, David G. Johnston wrote:
> What’s the use case for automating pg_basebackup with a named replication
> slot created by the pg_basebackup command?  Why can you not leverage a
> temporary replication slot (i.e., omit —slot). ISTM the create option is
> basically obsolete now.

+1.

Perhaps the ask would ease some monitoring around pg_replication_slots
with a fixed slot name?  One thing that could come into play is the
possibility to enforce the use of a temporary slot with a name given
by -S, as pg_basebackup makes the slot permanent in this case.  Still,
the temporary slot name being pg_basebackup_${PID} makes the slot
searchable with a pattern.
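
For example, a monitoring query matching that temporary-slot naming pattern
could look like this (illustrative):

```sql
SELECT slot_name, temporary, active
FROM pg_replication_slots
WHERE slot_name LIKE 'pg\_basebackup\_%';
```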
--
Michael




Re: make additional use of optimized linear search routines

2022-09-21 Thread Michael Paquier
On Wed, Sep 21, 2022 at 02:28:08PM +0800, Richard Guo wrote:
> I wonder if there are other code paths we can replace with the linear
> search routines. I tried to search for them but no luck.

I have been looking at a couple of simple patterns across the tree but
no luck here either.  Well, if someone spots something, it could
always be done later.  For now I have applied the bits discussed on
this thread.
--
Michael




Re: pg_basebackup --create-slot-if-not-exists?

2022-09-21 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 21, 2022 at 05:34:20PM -0700, David G. Johnston wrote:
>> What’s the use case for automating pg_basebackup with a named replication
>> slot created by the pg_basebackup command?  Why can you not leverage a
>> temporary replication slot (i.e., omit —slot). ISTM the create option is
>> basically obsolete now.

> +1.

ISTM there'd also be some security concerns, ie what if there's a
pre-existing slot (created by a hostile user, perhaps) that has
properties different from what you expect?  I realize that slot
creation is a pretty high-privilege operation, but it's not
superuser-only.

In any case I agree with the point that --create-slot seems
rather obsolete.  If you are trying to resume in a previous
replication stream (which seems like the point of persistent
slots) then the slot had better already exist.  If you are
satisfied with just starting replication from the current
instant, then a temp slot seems like what you want.

regards, tom lane




Re: pg_basebackup's --gzip switch misbehaves

2022-09-21 Thread Michael Paquier
On Wed, Sep 21, 2022 at 07:31:48PM -0500, Justin Pryzby wrote:
> I think at some point (maybe before releasing 1.3.4) the range was
> increased to very large(small), negative levels.  It's possible to query
> the library about the lowest supported compression level, but then
> there's a complication regarding the client-side library version vs the
> server-side version.  So it seems better to just use -7.

Indeed.  Contrary to the default level, there are no variables for the
minimum and maximum levels.  As you are pointing out, a lookup at 
zstd_compress.c shows that we have ZSTD_minCLevel() and
ZSTD_maxCLevel() that assign the bounds.  Both are available since
1.4.0.  We still need a backend-side check as the level passed with a
BASE_BACKUP command would only be validated there.  It seems to me
that this is going to be less of a headache in the long term if we
just use those routines at runtime, as zstd wants to keep some freedom
with the min and max bounds for the compression level; at least, that's
the flexibility this gives the library.  So I would tweak things
as the attached.
--
Michael
diff --git a/src/common/compression.c b/src/common/compression.c
index e40ce98ef3..0c6bb9177b 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -324,8 +324,9 @@ validate_compress_specification(pg_compress_specification *spec)
 			default_level = 0;	/* fast mode */
 			break;
 		case PG_COMPRESSION_ZSTD:
-			max_level = 22;
 #ifdef USE_ZSTD
+			max_level = ZSTD_maxCLevel();
+			min_level = ZSTD_minCLevel();
 			default_level = ZSTD_CLEVEL_DEFAULT;
 #endif
 			break;
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index f63c912e97..5cff3d3c07 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2757,8 +2757,10 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
-1), for lz4 an integer
between 1 and 12 (default 0 for fast compression
mode), and for zstd an integer between
-   1 and 22 (default
-   ZSTD_CLEVEL_DEFAULT or 3).
+   ZSTD_minCLevel() (usually -131072)
+   and ZSTD_maxCLevel() (usually 22)
+   (default ZSTD_CLEVEL_DEFAULT or
+   3).
   
 
   




Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-09-21 Thread Bharath Rupireddy
On Wed, Sep 21, 2022 at 8:03 AM Bharath Rupireddy
 wrote:
>
> On Fri, Aug 19, 2022 at 5:27 PM Bharath Rupireddy
>  wrote:
> >
> > Please review the attached v6 patch.
>
> I'm attaching the v7 patch rebased on to the latest HEAD.

v7 patch was failing on Windows [1]. This is because of the full path
name being sent to dir_existsfile() instead of just sending the temp
file name. I fixed the issue. Thanks Michael Paquier for an offlist
chat.

PSA v8 patch.

[1]
t/010_pg_basebackup.pl ... 134/?
#   Failed test 'pg_basebackup reports checksum mismatch stderr
/(?^s:^WARNING.*checksum verification failed)/'
#   at t/010_pg_basebackup.pl line 769.
#   'unrecognized win32 error code: 123WARNING:
checksum verification failed in file "./base/5/16399", block 0:
calculated 4C09 but expected B3F6
# pg_basebackup: error: checksum error occurred

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 4bb81868810f55675a2abb97531d70ae8e0e3405 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 22 Sep 2022 01:07:18 +
Subject: [PATCH v8] Make WAL file initialization by pg_receivewal atomic

While initializing WAL files with zeros (i.e. padding), pg_receivewal
can fail for any reason, for instance no space left on device or a
host server crash, which may leave partially padded WAL files because
of which pg_receivewal won't come up after the host server restarts.
The error it emits is "error: write-ahead log file "foo.partial" has
x bytes, should be 0 or y" (where y is typically the WAL file size
and x < y).

To fix this, one needs to manually intervene and delete the partially
padded WAL file, which requires an internal understanding of what
the issue is and is often a time-consuming process in production
environments.

The above problem can also exist for pg_basebackup as it uses the
same walmethods.c function for initialization.

To address the above problem, make WAL file initialization atomic
i.e. first create a temporary file, pad it and then rename it to the
target file. This is similar to what postgres does to make WAL file
initialization atomic.

Authors: Bharath Rupireddy, SATYANARAYANA NARLAPURAM
Reviewed-by: Cary Huang, mahendrakar s
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAHg%2BQDcVUss6ocOmbLbV5f4YeGLhOCt%2B1x2yLNfG2H_eDwU8tw%40mail.gmail.com
---
 src/bin/pg_basebackup/pg_receivewal.c | 22 ++
 src/bin/pg_basebackup/walmethods.c| 98 ---
 2 files changed, 111 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c
index f98ec557db..1cbc992e67 100644
--- a/src/bin/pg_basebackup/pg_receivewal.c
+++ b/src/bin/pg_basebackup/pg_receivewal.c
@@ -237,6 +237,28 @@ is_xlogfilename(const char *filename, bool *ispartial,
 		return true;
 	}
 
+	/*
+	 * File looks like a temporary partial uncompressed WAL file that was
+	 * leftover during pre-padding phase from a previous crash, delete it now
+	 * as we don't need it.
+	 */
+	if (fname_len == XLOG_FNAME_LEN + strlen(".partial.tmp") &&
+		strcmp(filename + XLOG_FNAME_LEN, ".partial.tmp") == 0)
+	{
+		/* 12 is length of string ".partial.tmp" */
+		char	path[MAXPGPATH + XLOG_FNAME_LEN + 12];
+
+		snprintf(path, sizeof(path), "%s/%s", basedir, filename);
+
+		if (unlink(path) < 0)
+			pg_log_error("could not remove file \"%s\"", path);
+
+		if (verbose)
+			pg_log_info("removed file \"%s\"", path);
+
+		return false;
+	}
+
 	/* File does not look like something we know */
 	return false;
 }
diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index bc2e83d02b..ed7b3185b0 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -118,7 +118,11 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname,
    const char *temp_suffix, size_t pad_to_size)
 {
 	DirectoryMethodData *dir_data = (DirectoryMethodData *) wwmethod;
-	char		tmppath[MAXPGPATH];
+	char	targetpath[MAXPGPATH];
+	char	tmpfilename[MAXPGPATH];
+	char 	tmpsuffixpath[MAXPGPATH];
+	bool	shouldcreatetempfile = false;
+	int			flags;
 	char	   *filename;
 	int			fd;
 	DirectoryMethodFile *f;
@@ -134,8 +138,9 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname,
 	clear_error(wwmethod);
 
 	filename = dir_get_file_name(wwmethod, pathname, temp_suffix);
-	snprintf(tmppath, sizeof(tmppath), "%s/%s",
+	snprintf(targetpath, sizeof(targetpath), "%s/%s",
 			 dir_data->basedir, filename);
+	snprintf(tmpfilename, MAXPGPATH, "%s.tmp", filename);
 	pg_free(filename);
 
 	/*
@@ -143,12 +148,57 @@ dir_open_for_write(WalWriteMethod *wwmethod, const char *pathname,
 	 * the file descriptor is important for dir_sync() method as gzflush()
 	 * does not do any system calls to fsync() to make changes permanent on
 	 * disk.
+	 *
+	 * Create a temporary file for padding and no compression cases to 

Re: why can't a table be part of the same publication as its schema

2022-09-21 Thread Amit Kapila
On Wed, Sep 21, 2022 at 1:15 AM Jonathan S. Katz  wrote:
>
> (RMT hat on, unless otherwise noted)
>
> On 9/20/22 9:42 AM, Robert Haas wrote:
> > On Mon, Sep 19, 2022 at 11:03 PM Jonathan S. Katz  
> > wrote:
> >> For #1 (allowing calls that have schema/table overlap...), there appears
> >> to be both a patch that allows this (reversing[8]), and a suggestion for
> >> dealing with a corner-case that is reasonable, i.e. disallowing adding
> >> schemas to a publication when specifying column-lists. Do we think we
> >> can have consensus on this prior to the RC1 freeze?
> >
> > I am not sure whether we can or should rush a fix in that fast, but I
> > agree with this direction.
>
> The RMT met today to discuss this.
>
> We did agree that the above is an open item that should be resolved
> before this release. While it is an accepted pattern for us to "ERROR"
> on unsupported behavior and then later introduce said behavior, we do
> agree with Peter's original post in this thread and would like it resolved.
>

As there seems to be agreement on this direction, I think it is
better to commit the patch in this release (before RC1) to avoid users
seeing any behavior change in a later release. If the proposed
behavior for one of the cases (disallowing adding schemas to a
publication when specifying column-lists) turns out to be too
restrictive for users, we can make it less restrictive in a future
release. I am planning to commit the current patch [1] tomorrow unless
the RMT or anyone else thinks otherwise.

[1] - 
https://www.postgresql.org/message-id/OS0PR01MB57162E862758402F978725CD944B9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions

2022-09-21 Thread Bharath Rupireddy
On Wed, Sep 21, 2022 at 9:53 PM Ashutosh Sharma  wrote:
>
> Yeah, we can either add this functionality to pg_waldump or maybe add
> a new binary itself that would return this information.

IMV, a separate tool isn't the way to go. Since pg_waldump already
reads WAL files and decodes WAL records, what's proposed here can be
added as additional functionality of pg_waldump.

It would be great if an initial patch were posted here.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: Allow logical replication to copy tables in binary format

2022-09-21 Thread osumi.takami...@fujitsu.com
Hi


Few more minor comments.

On Tuesday, August 16, 2022 2:04 AM Melih Mutlu  wrote:
> 
> 
>   My main concern is to break a scenario that was previously working (14
> -> 15) but after a subscriber upgrade
>   it won't (14 -> 16).
> 
> Fair concern. Some cases that might break the logical replication with version
> upgrade would be:
...
> 3- Copying in binary format would work with the same schemas. Currently,
> logical replication does not require the exact same schemas in publisher and
> subscriber.
> This is an additional restriction that comes with the COPY command.
> 
> If a logical replication has been set up with different schemas and 
> subscription
> is created with the binary option, then yes this would break things.
> This restriction can be clearly stated and wouldn't be unexpected though.
> 
> I'm also okay with allowing binary copy only for v16 or later, if you think 
> it would
> be safer and no one disagrees with that.
> What are your thoughts?
I agree with the direction to support binary copy for v16 and later.

IIUC, binary-format replication with different data types fails even during
the apply phase on HEAD.
I take that to mean the upgrade concern only applies to a scenario where the
user executes only the initial table synchronization between the publisher
and subscriber and doesn't replicate any data in the apply phase after that.
I would say this isn't a valid scenario, so your proposal makes sense.


Best Regards,
Takamichi Osumi



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-21 Thread wangw.f...@fujitsu.com
On Tues, Sep 20, 2022 at 18:30 PM Önder Kalacı  wrote:
> Thanks for the reviews, attached v12.

Thanks for your patch. Here is a question and a comment:

1. In the function GetCheapestReplicaIdentityFullPath.
+   if (rel->pathlist == NIL)
+   {
+       /*
+        * A sequential scan could have been dominated by by an index scan
+        * during make_one_rel(). We should always have a sequential scan
+        * before set_cheapest().
+        */
+       Path   *seqScanPath = create_seqscan_path(root, rel, NULL, 0);
+
+       add_path(rel, seqScanPath);
+   }

This is a question I'm not sure about:
Do we need this part to add a sequential scan?

I think in our case the sequential scan seems to have been added already by
the function make_one_rel (see function set_plain_rel_pathlist). If I am
missing something, please let me know. BTW, there is a typo in the above
comment: `by by`.

2. In the file execReplication.c.
+#ifdef USE_ASSERT_CHECKING
+#include "catalog/index.h"
+#endif
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
+#ifdef USE_ASSERT_CHECKING
+#include "replication/logicalrelation.h"
+#endif

I think it's fine to only add `logicalrelation.h` here, because `index.h` is
already included by `logicalrelation.h`.

Regards,
Wang wei


Re: pg_basebackup's --gzip switch misbehaves

2022-09-21 Thread Justin Pryzby
On Thu, Sep 22, 2022 at 10:25:11AM +0900, Michael Paquier wrote:
> On Wed, Sep 21, 2022 at 07:31:48PM -0500, Justin Pryzby wrote:
> > I think at some point (maybe before releasing 1.3.4) the range was
> > increased to very large(small), negative levels.  It's possible to query
> > the library about the lowest supported compression level, but then
> > there's a complication regarding the client-side library version vs the
> > server-side version.  So it seems better to just use -7.
> 
> Indeed.  Contrary to the default level, there are no variables for the
> minimum and maximum levels.  As you are pointing out, a lookup at 
> zstd_compress.c shows that we have ZSTD_minCLevel() and
> ZSTD_maxCLevel() that assign the bounds.  Both are available since
> 1.4.0.  We still need a backend-side check as the level passed with a
> BASE_BACKUP command would be only validated there.  It seems to me 
> that this is going to be less of a headache in the long-term if we
> just use those routines at runtime, as zstd wants to keep some freedom
> with the min and max bounds for the compression level, at least that's
> the flexibility that this gives the library.  So I would tweak things
> as the attached.

Okay.  Will that complicate tests at all?  It looks like it's not an
issue for the tests currently proposed in the CF APP.
https://commitfest.postgresql.org/39/3835/

However the patch ends up, +0.75 to backpatch it to v15 rather than
calling it a new feature in v16.

-- 
Justin




Re: pg_basebackup's --gzip switch misbehaves

2022-09-21 Thread Tom Lane
Justin Pryzby  writes:
> However the patch ends up, +0.75 to backpatch it to v15 rather than
> calling it a new feature in v16.

I don't have any opinion on the concrete merits of this change,
but I want to note that 15rc1 wraps on Monday, and we don't like
people pushing noncritical changes shortly before a wrap.  There
is not a lot of time for fooling around here.

regards, tom lane




Re: Query JITing with LLVM ORC

2022-09-21 Thread João Paulo Labegalini de Carvalho
Tom & Thomas:

Thank you so much, those are very useful comments.

I noticed that I didn't make my intentions very clear. My team's goal is to
evaluate whether there are any gains in JITing PostgreSQL itself, or at least
parts of it, and not the expressions or parts of a query.

The rationale for using PostgreSQL is that DBs are long-running applications,
so the cost of JITing can be amortized.

We have a prototype LLVM IR pass that outlines functions in a program to
JIT and an ORC-based runtime to re-compile functions. Our goal is to see
improvements due to target/sub-target specialization.

The reason I was looking at benchmarks is to have a workload to profile
PostgreSQL and find its bottlenecks. The hot functions would then be
outlined for JITing.



On Wed., Sep. 21, 2022, 4:54 p.m. Thomas Munro, 
wrote:

> On Thu, Sep 22, 2022 at 10:04 AM João Paulo Labegalini de Carvalho
>  wrote:
> >building with and w/o passing --with-llvm
>
> BTW you can also just turn it off with runtime settings, no need to
> rebuild.
>


Re: Fix snapshot name for SET TRANSACTION documentation

2022-09-21 Thread Fujii Masao




On 2022/09/21 14:40, Fujii Masao wrote:



On 2022/09/21 12:01, Japin Li wrote:


Hi hackers,

In 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, we changed the snapshot name
when exporting a snapshot; however, there is one place where we missed
updating the snapshot name in the documentation.  Attached is a patch to fix it.


Thanks for the patch! Looks good to me.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: make additional use of optimized linear search routines

2022-09-21 Thread Nathan Bossart
On Thu, Sep 22, 2022 at 09:52:57AM +0900, Michael Paquier wrote:
> On Wed, Sep 21, 2022 at 02:28:08PM +0800, Richard Guo wrote:
>> I wonder if there are other code paths we can replace with the linear
>> search routines. I tried to search for them but no luck.
> 
> I have been looking at a couple of simple patterns across the tree but
> no luck here either.  Well, if someone spots something, it could
> always be done later.  For now I have applied the bits discussed on
> this thread.

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2022-09-21 Thread Nathan Bossart
On Wed, Sep 21, 2022 at 02:41:28PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan  wrote:
>> > Presumably a
>> > generic WAL record compression mechanism could be reused for other large
>> > records, too.  That could be much easier than devising a deduplication
>> > strategy for every record type.
>>
>> It's quite possible that that's a good idea, but that should probably
>> work as an additive thing. That's something that I think of as a
>> "clever technique", whereas I'm focussed on just not being naive in
>> how we represent this one specific WAL record type.
> 
> BTW, if you wanted to pursue something like this, that would work with
> many different types of WAL record, ISTM that a "medium level" (not
> low level) approach might be the best place to start. In particular,
> the way that page offset numbers are represented in many WAL records
> is quite space inefficient.  A domain-specific approach built with
> some understanding of how page offset numbers tend to look in practice
> seems promising.

I wouldn't mind giving this a try.

> The representation of page offset numbers in PRUNE and VACUUM heapam
> WAL records (and in index WAL records) always just stores an array of
> 2 byte OffsetNumber elements. It probably wouldn't be all that
> difficult to come up with a simple scheme for compressing an array of
> OffsetNumbers in WAL records. It certainly doesn't seem like it would
> be all that difficult to get it down to 1 byte per offset number in
> most cases (even greater improvements seem doable).
> 
> That could also be used for the xl_heap_freeze_page record type --
> though only after this patch is committed. The patch makes the WAL
> record use a simple array of page offset numbers, just like in
> PRUNE/VACUUM records. That's another reason why the approach
> implemented by the patch seems like "the natural approach" to me. It's
> much closer to how heapam PRUNE records work (we have a variable
> number of arrays of page offset numbers in both cases).

Yeah, it seems likely that we could pack offsets in single bytes in many
cases.  A more sophisticated approach could even choose how many bits to
use per offset based on the maximum in the array.  Furthermore, we might be
able to make use of SIMD instructions to mitigate any performance penalty.

I'm tempted to start by just using single-byte offsets when possible since
that should be relatively simple while still yielding a decent improvement
for many workloads.  What do you think?
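
As a rough sketch of the single-byte variant (all of the types and names
below are purely illustrative, not a proposal for the actual record layout):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OffsetNumber;	/* stand-in for the real 2-byte typedef */

/*
 * Illustrative only: write "n" offsets into "dest" using one byte per
 * element when every offset fits in 8 bits, otherwise copy them verbatim.
 * Returns the number of bytes written; *one_byte tells the decoder which
 * width was used (in a real record this would be a flag bit somewhere).
 */
static size_t
encode_offsets(uint8_t *dest, const OffsetNumber *src, int n, bool *one_byte)
{
	OffsetNumber max = 0;

	for (int i = 0; i < n; i++)
		max = (src[i] > max) ? src[i] : max;

	*one_byte = (max <= UINT8_MAX);

	if (*one_byte)
	{
		for (int i = 0; i < n; i++)
			dest[i] = (uint8_t) src[i];
		return (size_t) n;
	}

	memcpy(dest, src, sizeof(OffsetNumber) * n);
	return sizeof(OffsetNumber) * n;
}
```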

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

2022-09-21 Thread Nathan Bossart
On Wed, Sep 21, 2022 at 02:11:36PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart  
> wrote:
>> Presumably a
>> generic WAL record compression mechanism could be reused for other large
>> records, too.  That could be much easier than devising a deduplication
>> strategy for every record type.
> 
> It's quite possible that that's a good idea, but that should probably
> work as an additive thing. That's something that I think of as a
> "clever technique", whereas I'm focussed on just not being naive in
> how we represent this one specific WAL record type.

Got it.  I think that's a fair point.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pg_basebackup's --gzip switch misbehaves

2022-09-21 Thread Michael Paquier
On Wed, Sep 21, 2022 at 11:43:56PM -0400, Tom Lane wrote:
> I don't have any opinion on the concrete merits of this change,
> but I want to note that 15rc1 wraps on Monday, and we don't like
> people pushing noncritical changes shortly before a wrap.  There
> is not a lot of time for fooling around here.

If I were to do it in the next couple of hours, we'd still have quite
a couple of days of coverage, which is plenty as far as I understand?

That said, it is not a critical change.  Just to give some numbers,
for a fresh initdb instance, base.tar.zst is at:
- 3.6MB at level 0.
- 3.8MB at level 1.
- 3.6MB at level 2.
- 4.3MB at level -1.
- 4.6MB at level -2.
- 6.1MB at level -7.

I am not sure if there would be a huge demand for this much control
beyond the current [1,22] range, but the library wants to control the
bounds dynamically and has the APIs to allow that.
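
For illustration only, a minimal sketch of the kind of runtime bounds check
this enables (assuming libzstd >= 1.4.0; the function and message here are
made up, not the actual backend code):

```c
#include <stdbool.h>
#include <stdio.h>
#include <zstd.h>

/* Illustrative only: validate a requested zstd compression level at runtime. */
static bool
zstd_level_in_bounds(int level)
{
	int		min_level = ZSTD_minCLevel();	/* available since zstd 1.4.0 */
	int		max_level = ZSTD_maxCLevel();

	if (level < min_level || level > max_level)
	{
		fprintf(stderr, "compression level %d is out of range %d..%d\n",
				level, min_level, max_level);
		return false;
	}
	return true;
}
```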
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-09-21 Thread Michael Paquier
On Thu, Sep 22, 2022 at 07:16:41AM +0530, Bharath Rupireddy wrote:
> t/010_pg_basebackup.pl ... 134/?
> #   Failed test 'pg_basebackup reports checksum mismatch stderr
> /(?^s:^WARNING.*checksum verification failed)/'
> #   at t/010_pg_basebackup.pl line 769.
> #   'unrecognized win32 error code: 123WARNING:
> checksum verification failed in file "./base/5/16399", block 0:
> calculated 4C09 but expected B3F6
> # pg_basebackup: error: checksum error occurred

Shouldn't we extend the mapping table in win32error.c so that the
information provided is more useful when this error is seen, then?
There could be other code paths able to trigger this failure, or other
hackers working on separate features who could benefit from this
extra information.
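
For illustration, the kind of entry being discussed would look roughly like
this (the table layout and the errno chosen here are assumptions, not the
actual win32error.c contents; win32 error 123 is ERROR_INVALID_NAME):

```c
#include <errno.h>

#ifndef ERROR_INVALID_NAME
#define ERROR_INVALID_NAME 123	/* from <winerror.h> on Windows */
#endif

/* Illustrative fragment of a win32 error code -> errno mapping table */
static const struct
{
	unsigned long winerr;
	int			doserr;
} win32_errno_map[] = {
	{ERROR_INVALID_NAME, ENOENT},
};
```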
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-21 Thread John Naylor
On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart 
wrote:
>
> On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote:
>
> > In short, this code needs to be lower level so that we still have full
> > control while being portable. I will work on this, and also the related
> > code for node dispatch.
>
> Is it possible to use approach #2 here, too?  AFAICT space is allocated for
> all of the chunks, so there wouldn't be any danger in searching all them
> and discarding any results >= node->count.

Sure, the caller could pass the maximum node capacity, and then check if
the returned index is within the range of the node count.
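
A very rough sketch of that caller-side check, with entirely made-up names:

```c
#include <stdint.h>

/*
 * Made-up sketch: let the search helper scan the node's full capacity (so a
 * vectorized helper never needs a scalar tail), then discard any hit that
 * falls beyond the number of valid entries.
 */
static inline int
node_search_chunk(const uint8_t *chunks, int capacity, int count, uint8_t key)
{
	/* stand-in for a pg_lfind-style vectorized search over "capacity" bytes */
	for (int i = 0; i < capacity; i++)
	{
		if (chunks[i] == key)
			return (i < count) ? i : -1;	/* ignore hits in unused slots */
	}
	return -1;
}
```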

> Granted, we're depending on the
> number of chunks always being a multiple of elements-per-vector in order to
> avoid the tail path, but that seems like a reasonably safe assumption that
> can be covered with comments.

Actually, we don't need to depend on that at all. When I said "junk" above,
that can be any bytes, as long as we're not reading off the end of
allocated memory. We'll never do that here, since the child pointers/values
follow. In that case, the caller can hard-code the  size (it would even
happen to work now to multiply rt_node_kind by 16, to be sneaky). One thing
I want to try soon is storing fewer than 16/32 etc entries, so that the
whole node fits comfortably inside a power-of-two allocation. That would
allow us to use aset without wasting space for the smaller nodes, which
would be faster and possibly would solve the fragmentation problem Andres
referred to in

https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de

While on the subject, I wonder how important it is to keep the chunks in
the small nodes in sorted order. That adds branches and memmove calls, and
is the whole reason for the recent "pg_lfind_ge" function.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: pg_basebackup's --gzip switch misbehaves

2022-09-21 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 21, 2022 at 11:43:56PM -0400, Tom Lane wrote:
>> I don't have any opinion on the concrete merits of this change,
>> but I want to note that 15rc1 wraps on Monday, and we don't like
>> people pushing noncritical changes shortly before a wrap.  There
>> is not a lot of time for fooling around here.

> If I were to do it in the next couple of hours, we'd still have quite
> a couple of days of coverage, which is plenty as far as I understand?

Sure.  I'd say we have 48 hours to choose whether to put this in v15.
But not more than that.

regards, tom lane




Re: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER

2022-09-21 Thread Andrey Chudnovsky
First, my message from my corp email wasn't displayed in the thread;
that is what Jacob replied to, so let me post it here for context:

> We can support both passing the token from an upstream client and libpq 
> implementing OAUTH2 protocol to obtain one.
>
> Libpq implementing OAUTHBEARER is needed for community/3rd party tools to 
> have user-friendly authentication experience:
>
> 1. For community client tools, like pg_admin, psql etc.
>   Example experience: pg_admin would be able to open a popup dialog to 
> authenticate customers and keep refresh tokens to avoid asking the user 
> frequently.
> 2. For 3rd party connectors supporting generic OAUTH with any provider. 
> Useful for datawiz clients, like Tableau or ETL tools. Those can support both 
> user and client OAUTH flows.
>
> Libpq passing the token directly from an upstream client is useful in other 
> scenarios:
> 1. Enterprise clients, built with .Net / Java and using provider-specific 
> authentication libraries, like MSAL for AAD. Those can also support more 
> advanced provider-specific token acquisition flows.
> 2. Resource-tight (like IoT) clients. Those can be compiled without the 
> optional libpq flag not including the iddawc or other dependency.

-
On this:

> What I don't understand is how the OAUTHBEARER mechanism helps you in
> this case. You're short-circuiting the negotiation where the server
> tells the client what provider to use and what scopes to request, and
> instead you're saying "here's a secret string, just take it and
> validate it with magic."
>
> I realize the ability to pass an opaque token may be useful, but from
> the server's perspective, I don't see what differentiates it from the
> password auth method plus a custom authenticator plugin. Why pay for
> the additional complexity of OAUTHBEARER if you're not going to use
> it?

Yes, passing a token as a new auth method won't make much sense in
isolation. However:
1. Since OAUTHBEARER is supported in the ecosystem, passing a token as
a way to authenticate with OAUTHBEARER is more consistent (IMO) than
passing it as a password.
2. Validation on the backend side doesn't depend on whether the token
is obtained by libpq or transparently passed by the upstream client.
3. A single OAUTH auth method on the server side for both scenarios
would allow both enterprise clients with their own token acquisition
and community clients using libpq flows to connect as the same PG
users/roles.

On Wed, Sep 21, 2022 at 8:36 PM Jacob Champion  wrote:
>
> On Wed, Sep 21, 2022 at 3:10 PM Andrey Chudnovskiy
>  wrote:
> > We can support both passing the token from an upstream client and libpq 
> > implementing OAUTH2 protocol to obtaining one.
>
> Right, I agree that we could potentially do both.
>
> > Libpq passing the token directly from an upstream client is useful in other 
> > scenarios:
> > 1. Enterprise clients, built with .Net / Java and using provider-specific 
> > authentication libraries, like MSAL for AAD. Those can also support more 
> > advance provider-specific token acquisition flows.
> > 2. Resource-tight (like IoT) clients. Those can be compiled without 
> > optional libpq flag not including the iddawc or other dependency.
>
> What I don't understand is how the OAUTHBEARER mechanism helps you in
> this case. You're short-circuiting the negotiation where the server
> tells the client what provider to use and what scopes to request, and
> instead you're saying "here's a secret string, just take it and
> validate it with magic."
>
> I realize the ability to pass an opaque token may be useful, but from
> the server's perspective, I don't see what differentiates it from the
> password auth method plus a custom authenticator plugin. Why pay for
> the additional complexity of OAUTHBEARER if you're not going to use
> it?
>
> --Jacob
>
>
>
>




Re: Background writer and checkpointer in crash recovery

2022-09-21 Thread Michael Paquier
On Thu, Sep 15, 2022 at 10:19:01PM -0500, Justin Pryzby wrote:
> I don't know that this warrants an Opened Item, but I think some fix
> ought to be applied to v15, whether that happens this week or next
> month.

With RC1 getting close, I have looked at that again and applied a
patch that resets the ps display after recovery is done, bringing us
back to the same state as PG <= 14 in the startup process.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory

2022-09-21 Thread Bharath Rupireddy
On Thu, Sep 22, 2022 at 10:07 AM Michael Paquier  wrote:
>
> On Thu, Sep 22, 2022 at 07:16:41AM +0530, Bharath Rupireddy wrote:
> > t/010_pg_basebackup.pl ... 134/?
> > #   Failed test 'pg_basebackup reports checksum mismatch stderr
> > /(?^s:^WARNING.*checksum verification failed)/'
> > #   at t/010_pg_basebackup.pl line 769.
> > #   'unrecognized win32 error code: 123WARNING:
> > checksum verification failed in file "./base/5/16399", block 0:
> > calculated 4C09 but expected B3F6
> > # pg_basebackup: error: checksum error occurred
>
> Shouldn't we extend the mapping table in win32error.c so that the
> information provided is more useful when this error is seen, then?
> There could be other code paths able to trigger this failure, or other
> hackers working on separate features who could benefit from this
> extra information.

Thanks. I will start a separate thread to discuss that.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [RFC] building postgres with meson - v13

2022-09-21 Thread Andres Freund
Hi,

On 2022-09-21 09:46:30 -0700, Andres Freund wrote:
> I'm planning to commit this today, unless somebody wants to argue against
> that.

And done!

Changes:
- fixed a few typos (thanks Thomas)
- less duplication in the CI tasks
- removed an incomplete implementation of the target for abbrevs.txt - do we
  even want to have that?
- plenty of hand-wringing on my part


I also rebased my meson git tree, which still has plenty of additional test
platforms (netbsd, openbsd, debian sid, fedora rawhide, centos 8, centos 7,
opensuse tumbleweed), but without the autoconf versions of those targets.  I
also added a commit that translates most of the CompilerWarnings task to
meson.  I still need to add a headerscheck / cpluspluscheck target.

Greetings,

Andres Freund




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-21 Thread Masahiko Sawada
On Thu, Sep 22, 2022 at 1:46 PM John Naylor
 wrote:
>
>
> On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart  
> wrote:
> >
> > On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote:
> >
> > > In short, this code needs to be lower level so that we still have full
> > > control while being portable. I will work on this, and also the related
> > > code for node dispatch.
> >
> > Is it possible to use approach #2 here, too?  AFAICT space is allocated for
> > all of the chunks, so there wouldn't be any danger in searching all them
> > and discarding any results >= node->count.
>
> Sure, the caller could pass the maximum node capacity, and then check if the 
> returned index is within the range of the node count.
>
> > Granted, we're depending on the
> > number of chunks always being a multiple of elements-per-vector in order to
> > avoid the tail path, but that seems like a reasonably safe assumption that
> > can be covered with comments.
>
> Actually, we don't need to depend on that at all. When I said "junk" above, 
> that can be any bytes, as long as we're not reading off the end of allocated 
> memory. We'll never do that here, since the child pointers/values follow. In 
> that case, the caller can hard-code the  size (it would even happen to work 
> now to multiply rt_node_kind by 16, to be sneaky). One thing I want to try 
> soon is storing fewer than 16/32 etc entries, so that the whole node fits 
> comfortably inside a power-of-two allocation. That would allow us to use aset 
> without wasting space for the smaller nodes, which would be faster and 
> possibly would solve the fragmentation problem Andres referred to in
>
> https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
>
> While on the subject, I wonder how important it is to keep the chunks in the 
> small nodes in sorted order. That adds branches and memmove calls, and is the 
> whole reason for the recent "pg_lfind_ge" function.

Good point. While keeping the chunks in the small nodes in sorted
order is useful for visiting all keys in sorted order, additional
branches and memmove calls could be slow.
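
For example (purely illustrative), a sorted insert needs a shift, while an
unsorted node could simply append:

```c
#include <stdint.h>
#include <string.h>

/* Keep the chunk array sorted: shift the tail up before writing the key. */
static void
insert_chunk_sorted(uint8_t *chunks, int count, int idx, uint8_t key)
{
	memmove(&chunks[idx + 1], &chunks[idx], count - idx);
	chunks[idx] = key;
}

/* Unsorted alternative: no search for the position and no shift. */
static void
insert_chunk_unsorted(uint8_t *chunks, int count, uint8_t key)
{
	chunks[count] = key;
}
```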

Regards,

-- 
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com