Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-12 Thread Will Mortensen
I suppose if it's correct that we need to lock the table first (at least
in ACCESS SHARE mode), an option to LOCK perhaps makes
more sense. Maybe you could specify two modes like:

LOCK TABLE IN _lockmode_ MODE AND THEN WAIT FOR CONFLICTS WITH _waitmode_ MODE;

But that might be excessive. :-D And I don't know if there's any
reason to use a _lockmode_ other than ACCESS SHARE.

On Wed, Jan 11, 2023 at 11:03 PM Will Mortensen  wrote:
>
> Hi Andres,
>
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund  wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it. To my very
> limited understanding, from looking at the existing callers and the
> implementation of LOCK, that would look something like this
> (assuming we're in a SQL command like LOCK and calling unmodified
> WaitForLockers() with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.
>
> Obviously a deadlock is possible if transactions end up waiting for each
> other, just as when taking table or row locks, etc., but it seems like this
> would be detected as usual?




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-12 Thread John Naylor
On Thu, Jan 12, 2023 at 12:44 PM Masahiko Sawada  wrote:
>
> On Wed, Jan 11, 2023 at 12:13 PM John Naylor  wrote:
> >
> > On Tue, Jan 10, 2023 at 7:08 PM Masahiko Sawada  wrote:

> I agree to keep this as a template.

Okay, I'll squash the previous patch and work on cleaning up the internals.
I'll keep the external APIs the same so that your work on vacuum
integration can be easily rebased on top of that, and we can work
independently.

> From the vacuum integration
> perspective, it would be better if we can use a common data type for
> shared and local. It makes sense to have different data types if the
> radix trees have different values types.

I agree it would be better, all else being equal. I have some further
thoughts below.

> > > It looks no problem in terms of vacuum integration, although I've not
> > > fully tested yet. TID store uses the radix tree as the main storage,
> > > and with the template radix tree, the data types for shared and
> > > non-shared will be different. TID store can have an union for the
> > > radix tree and the structure would be like follows:
> >
> > > /* Storage for Tids */
> > > union tree
> > > {
> > > local_radix_tree*local;
> > > shared_radix_tree   *shared;
> > > };
> >
> > We could possibly go back to using a common data type for this, but
> > with unused fields in each setting, as before. We would have to be more
> > careful of things like the 32-bit crash from a few weeks ago.
>
> One idea to have a common data type without unused fields is to use
> radix_tree a base class. We cast it to radix_tree_shared or
> radix_tree_local depending on the flag is_shared in radix_tree. For
> instance we have like (based on non-template version),

> struct radix_tree
> {
>     bool          is_shared;
>     MemoryContext context;
> };

That could work in principle. My first impression is that just a memory
context is not much of a base class. Also, casts can creep into a large
number of places.

Another thought came to mind: I'm guessing the TID store is unusual --
meaning most uses of radix tree will only need one kind of memory
(local/shared). I could be wrong about that, and it _is_ a guess about the
future. If true, then it makes more sense that only code that needs both
memory kinds should be responsible for keeping them separate.

The template might be easier for future use cases if shared memory were
all-or-nothing, meaning either

- completely different functions and types depending on RT_SHMEM
- use branches (like v8)

The union sounds like a good thing to try, but do whatever seems right.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Allow logical replication to copy tables in binary format

2023-01-12 Thread Melih Mutlu
Hi,

Thanks for your reviews.

Takamichi Osumi (Fujitsu)  wrote on Thu, Jan 12, 2023 at 06:07:

> On Wednesday, January 11, 2023 7:45 PM Melih Mutlu 
> wrote:
> (1-1) There is no need to log the version and the query by elog here.
> (1-2) Also, I suggest we can remove the server_version variable itself,
>   because we have only one actual reference for it.
>   There is a style that we call walrcv_server_version in the
>   'if condition' directly like existing codes in
> fetch_remote_table_info().
> (1-3) Suggestion to improve comments.
>   FROM:
>   /* If the publisher is v16 or later, copy in binary format.*/
>   TO:
>   /* If the publisher is v16 or later, copy data in the required data
> format. */
>

Forgot to remove that LOG line. Removed it now and applied other
suggestions too.


> I think if we change the order of this part of check like below, then
> it would look more aligned with other existing test codes introduced by
> this patch.
>

Right. Changed it to make it more aligned with the rest.

Thanks,
-- 
Melih Mutlu
Microsoft


v6-0001-Allow-logical-replication-to-copy-table-in-binary.patch
Description: Binary data


Re: Lazy allocation of pages required for verifying FPI consistency

2023-01-12 Thread Kyotaro Horiguchi
At Mon, 9 Jan 2023 20:00:00 +0530, Bharath Rupireddy  wrote in
> I propose to statically allocate these two pages using PGAlignedBlock
> structure lazily in verifyBackupPageConsistency() to not waste dynamic
> memory worth 2*BLCKSZ bytes. I'm attaching a small patch herewith.
> 
> Thoughts?

IMHO, it's a bit scary to me to push the execution stack down by
that large a size. I would rather accept the (current) possible memory
waste, which affects only the startup process, than dig that deep into
the stack.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Lazy allocation of pages required for verifying FPI consistency

2023-01-12 Thread Julien Rouhaud
On Thu, Jan 12, 2023 at 4:29 PM Kyotaro Horiguchi  wrote:
>
> At Mon, 9 Jan 2023 20:00:00 +0530, Bharath Rupireddy  wrote in
> > I propose to statically allocate these two pages using PGAlignedBlock
> > structure lazily in verifyBackupPageConsistency() to not waste dynamic
> > memory worth 2*BLCKSZ bytes. I'm attaching a small patch herewith.
> >
> > Thoughts?
>
> IMHO, it's a bit scary to me to push the execution stack down by
> that large a size. I would rather accept the (current) possible memory
> waste, which affects only the startup process, than dig that deep into
> the stack.

+1




Re: Flush SLRU counters in checkpointer process

2023-01-12 Thread Anthonin Bonnefoy
On Wed, Jan 11, 2023 at 5:33 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-11 10:29:06 +0100, Anthonin Bonnefoy wrote:
> > Currently, the Checkpointer process only reports SLRU statistics at
> server
> > shutdown, leading to delayed statistics for SLRU flushes. This patch
> adds a
> > flush of SLRU stats to the end of checkpoints.
>
> Hm. I wonder if we should do this even earlier, by the
> pgstat_report_checkpointer() calls in CheckpointWriteDelay().
>
> I'm inclined to move the pgstat_report_wal() and pgstat_report_slru() calls
> into pgstat_report_checkpointer() to avoid needing to care about all the
> individual places.
>
That would make sense. I've created a new patch with everything moved into
pgstat_report_checkpointer().
I did split the checkpointer flush out into a pgstat_flush_checkpointer()
function, as that seemed more readable. Thoughts?


> > @@ -505,6 +505,7 @@ CheckpointerMain(void)
> >   /* Report pending statistics to the cumulative stats
> system */
> >   pgstat_report_checkpointer();
> >   pgstat_report_wal(true);
> > + pgstat_report_slru(true);
>
> Why do we need a force parameter if all callers use it?

Good point. I've written the same signature as pgstat_report_wal but
there's no need for the nowait parameter.


flush-slru-counters-v2.patch
Description: Binary data


Re: Fix condition in shm_toc and remove unused function shm_toc_freespace.

2023-01-12 Thread Richard Guo
On Thu, Jan 12, 2023 at 2:50 PM Zhang Mingli  wrote:

> On Jan 12, 2023, 14:34 +0800, Zhang Mingli , wrote:
>
> Some conditions in shm_toc_insert and shm_toc_allocate are bogus, like:
> if (toc_bytes + nbytes > total_bytes || toc_bytes + nbytes < toc_bytes)
> Remove the condition `toc_bytes + nbytes < toc_bytes` and take a
> sizeof(shm_toc_entry) into account in shm_toc_allocate though
> shm_toc_allocate does that too.
>
>   shm_toc_insert does that too, and we can report the error earlier.
>

I don't think we should consider sizeof(shm_toc_entry) in the 'if'
condition in shm_toc_allocate, as this function is not in charge of
allocating a new TOC entry.  That's what shm_toc_insert does.

Other parts of this patch look good to me.

Thanks
Richard


Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-12 Thread Jelte Fennema
> Simpler is better when it comes to authentication

I definitely agree with that, and if we didn't have existing
parsing logic for pg_hba.conf I would agree. But given the existing
logic for pg_hba.conf, I think the path of least surprises is to
support all of the same patterns that pg_hba.conf supports.

It also makes the code simpler, as we can simply reuse the existing
check_role function. I removed the lines you quoted since they are
actually not strictly necessary. They only change the detection logic a
bit in the case of a \1 existing in the string, and I'm not sure what
the desired behaviour is for those.

> I would be really tempted to extract and commit that
> independently of the rest, actually, to provide some coverage of the
> substitution case in the peer test.

I split up that patch in two parts now and added the tests in a new 0001
patch.

> 0002 and 0003 need careful thinking.

0002 should change no behaviour, since it simply stores the token in
the IdentLine struct, but doesn't start using the quoted or the regex field
yet. 0003 is debatable indeed. To me it makes sense conceptually, but
having a literal \1 in a username seems like an unlikely scenario and
there might be pg_ident.conf files in existence where the \1 is quoted
that would break because of this change. I haven't removed 0003 from
the patch set yet, but I kinda feel that the advantage is probably not
worth the risk of breakage.

0004 adds some breakage too. But there I think the advantages far outweigh
the risk of breakage, both because the added functionality is a much bigger
advantage, and because we only risk breaking setups where there exist users
that are called "all", or that start with a literal + or /. Only "all" seems
like a somewhat reasonable username, but such a user existing seems
unlikely to me given all its special meaning in pg_hba.conf. (I added this
consideration to the commit message.)

> > The main uncertainty I have is if the case insensitivity check is
> > actually needed in check_role. It seems like a case insensitive
> > check against the database user shouldn't actually be necessary.
> > I only understand the need for the case insensitive check against
> > the system user. But I have too little experience with LDAP/kerberos
> > to say for certain. So for now I kept the existing behaviour to
> > not regress.

You didn't write a response about this, but you did quote it. Did you intend
to respond to it?

> Applied 0001

Awesome :)


Finally, one separate thing I noticed is that regcomp_auth_token only
checks the / prefix, but doesn't check if the token was quoted or not.
So even if it's quoted it will be interpreted as a regex. Maybe we should
change that? At least for the regex parsing that is not released yet.


v4-0003-Only-expand-1-in-pg_ident.conf-when-not-quoted.patch
Description: Binary data


v4-0001-Add-tests-for-1-in-pg_ident.conf-to-0003_peer.pl.patch
Description: Binary data


v4-0002-Store-pg_user-as-AuthToken-in-IdentLine.patch
Description: Binary data


v4-0004-Support-same-user-patterns-in-pg_ident.conf-as-in.patch
Description: Binary data


Re: Add BufFileRead variants with short read and EOF detection

2023-01-12 Thread Peter Eisentraut

On 10.01.23 07:20, Amit Kapila wrote:

The existing uses of %m are wrong.  This was already fixed once in
7897e3bb902c557412645b82120f4d95f7474906, but the affected areas of code
were apparently developed at around the same time and didn't get the
fix.  So I have attached a separate patch to fix this first, which could
be backpatched.


+1. The patch no longer applies due to a recent commit.


The original patch is then rebased on top of that.  I have adjusted the
error message to include the file set name if available.

What this doesn't keep is the purpose of the temp file in some cases,
like "hash-join temporary file".  We could maybe make this an additional
argument or an error context, but it seems cumbersome in any case.


Yeah, we can do that but not sure if it is worth doing any of those
because there are already other places that don't use the exact
context.


Ok, updated patches attached.

From d1cd2f2748ecf34831cee565b29725df548da567 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 6 Jan 2023 11:57:35 +0100
Subject: [PATCH v3 1/2] Fix some BufFileRead() error reporting

Remove "%m" from error messages where errno would be bogus.  Add short
read byte counts where appropriate.

This is equivalent to what was done in
7897e3bb902c557412645b82120f4d95f7474906, but some code was apparently
developed concurrently to that and not updated accordingly.

Discussion: https://www.postgresql.org/message-id/flat/f3501945-c591-8cc3-5ef0-b72a2e0ea...@enterprisedb.com
---
 src/backend/backup/backup_manifest.c     |  3 ++-
 src/backend/replication/logical/worker.c | 33 ++--
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/src/backend/backup/backup_manifest.c b/src/backend/backup/backup_manifest.c
index 21ca2941dc..d325ef047a 100644
--- a/src/backend/backup/backup_manifest.c
+++ b/src/backend/backup/backup_manifest.c
@@ -371,7 +371,8 @@ SendBackupManifest(backup_manifest_info *manifest, bbsink *sink)
 		if (rc != bytes_to_read)
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from temporary file: %m")));
+					 errmsg("could not read from temporary file: read only %zu of %zu bytes",
+							rc, bytes_to_read)));
 		bbsink_manifest_contents(sink, bytes_to_read);
 		manifest_bytes_done += bytes_to_read;
 	}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 79cda39445..f3856c9843 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2063,7 +2063,7 @@ apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,
 	nchanges = 0;
 	while (true)
 	{
-		int			nbytes;
+		size_t		nbytes;
 		int			len;
 
 		CHECK_FOR_INTERRUPTS();
@@ -2079,8 +2079,8 @@ apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,
 		if (nbytes != sizeof(len))
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from streaming transaction's changes file \"%s\": %m",
-							path)));
+					 errmsg("could not read from streaming transaction's changes file \"%s\": read only %zu of %zu bytes",
+							path, nbytes, sizeof(len))));
 
 		if (len <= 0)
 			elog(ERROR, "incorrect length %d in streaming transaction's changes file \"%s\"",
@@ -2090,11 +2090,12 @@ apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,
 			buffer = repalloc(buffer, len);
 
 		/* and finally read the data into the buffer */
-		if (BufFileRead(stream_fd, buffer, len) != len)
+		nbytes = BufFileRead(stream_fd, buffer, len);
+		if (nbytes != len)
 			ereport(ERROR,
 					(errcode_for_file_access(),
-					 errmsg("could not read from streaming transaction's changes file \"%s\": %m",
-							path)));
+					 errmsg("could not read from streaming transaction's changes file \"%s\": read only %zu of %zu bytes",
+							path, nbytes, (size_t) len)));
 
 		BufFileTell(stream_fd, &fileno, &offset);
 
@@ -3992,6 +3993,7 @@ static void
 subxact_info_read(Oid subid, TransactionId xid)
 {
 	char		path[MAXPGPATH];
+	size_t		nread;


PG11 to PG14 Migration Slowness

2023-01-12 Thread Vigneshk Kvignesh
Hi,

  I'm migrating our existing PG instances from PG11.4 to PG14.3. I
have around 5 million tables in a single database. When migrating using
pg_upgrade, it takes 3 hours for the process to complete. I'm not sure if
this is the intended behaviour or whether we're missing something here.
 Most of the tables (90%) of the 5 million are foreign tables. On analysis
we found that most of the time is spent in pg_dump (~2.3 hours). In pg_dump,
the getTableAttrs() and dumpTable() functions take the most time, approx 1
hour each, since we're processing table by table. Also, there are no columns
with default values, which if present might take some time. We're using
PG14's pg_upgrade binary for the process.
Since we have all these tables in one database, parallelism doesn't
have any effect here. Can the binary upgrade for a single database be made
to run in parallel?
 Kindly advise us if we have missed anything here, and of possible
solutions to this problem.
Have added more info on the process below.

No. of Tables: 5 Million
Time Taken: 3 Hours
Command Used: $PG14_UPGRADE -Uroot -b $PG11_DIR/bin -B $PG14_DIR/bin -d
$PG11_DIR/data -D $PG14_DIR/data -k -r -j32
Version: PG11.4 to PG14.3
Environment: CentOS machine (32 cores(Intel), 128GB RAM)


Thanks and Regards,
Vignesh K.


Named Operators

2023-01-12 Thread Gurjeet Singh
A technically correct name for this feature would be Readable Names for
Operators, or Pronounceable Names for Operators. But I'd like to call
it Named Operators.

With this patch in place, the users can name the operators as
:some_pronounceable_name: instead of having to choose from the special
characters like #^&@. For example, users will be able to create and
use operators like:

select
expr1 :distance: expr2,
expr3 :contains_all: expr4,
expr5 :contains_any: expr6
expr7 :contains_exactly_two_of: expr8
from mytable;

instead of being forced to use these:

select
expr1 <#> expr2,
expr3 ?&  expr4,
expr5 ?|  expr6
expr7 ??!! expr8 -- ¯\_(ツ)_/¯
from mytable;

I think Named Operators will significantly improve the readability
of queries.

After a little trial-and-error, it was easy to develop the scan.l
rules to implement this feature, without flex barfing. The hard part
has been convincing myself that this is a safe implementation, even
though there are no regressions in `make check`. I am unsure of this
implementation's compatibility with the SQL Spec, and I'm not able to
envision problems with its interaction with some current or potential
feature of Postgres. So I'd really appreciate feedback from someone
who is conversant with the SQL Spec.

If the colon character being used as a delimiter poses a
challenge, other good candidates for the delimiter seem to be ~, ^,
or `, although I haven't tested any of these to see if they cause a
regression. The colon character would be preferable for the delimiter,
since it is already used in the typecast :: operator.

I tried to strip the delimiters/colons from the name right in the
scanner, primarily because that would allow the identifier part of the
name to be as long as NAMEDATALEN-1, just like other identifiers
Postgres allows. An added benefit of stripping the delimiters was that
the rest of the code, and catalogs/storage, wouldn't have to see the
delimiters. But stripping the delimiters made the code brittle; some
places in the code now had to be taught different handling depending on
whether the operator name was coming from the user command or from
the catalogs. I had to special-case code in pg_dump as well. To share
code with frontends like pg_dump, I had to place code in src/common/.
I was still not able to address some obvious bugs.

By retaining the : delimiters in the name, the code became much
simpler; pg_dump support came for free, and the bugs became a non-issue.
To see how much code and complexity was reduced, see this commit [1].
The downside of retaining the delimiters is that the identifier part of
the name can be no more than NAMEDATALEN-3 in length.

Because of the minimal changes to the scanner rules, and no
changes in the grammar, I don't think there's any impact on precedence
and associativity rules of the operators. I'd be happy to learn
otherwise.

Here's a rudimentary test case to demonstrate the feature:

create operator :add_point: (function = box_add, leftarg = box,
rightarg = point);
create table test(a box);
insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
select a as original, a :add_point: '(1,1)' as modified from test;
drop operator :add_point:(box, point);

Feedback will be much appreciated!

[1]: Commit: Don't strip the delimiters
https://github.com/gurjeet/postgres/commit/62d11a578e5325c32109bb2a55a624d0d89d4b7e

[2]: Git branch named_operators
https://github.com/gurjeet/postgres/tree/named_operators

Best regards,
Gurjeet
http://Gurje.et


named_operators.patch
Description: Binary data


Re: Named Operators

2023-01-12 Thread Gurjeet Singh
Please see attached a slightly updated patch. There were some comment
changes sitting in uncommitted in Git worktree, that were missed.

Best regards,
Gurjeet
http://Gurje.et


named_operators_v1.patch
Description: Binary data


Re: Lazy allocation of pages required for verifying FPI consistency

2023-01-12 Thread Bharath Rupireddy
On Thu, Jan 12, 2023 at 1:59 PM Kyotaro Horiguchi  wrote:
>
> At Mon, 9 Jan 2023 20:00:00 +0530, Bharath Rupireddy  wrote in
> > I propose to statically allocate these two pages using PGAlignedBlock
> > structure lazily in verifyBackupPageConsistency() to not waste dynamic
> > memory worth 2*BLCKSZ bytes. I'm attaching a small patch herewith.
> >
> > Thoughts?
>
> IMHO, it's a bit scary to me to push the execution stack down by
> that large a size. I would rather accept the (current) possible memory
> waste, which affects only the startup process, than dig that deep into
> the stack.

On the contrary, PGAlignedBlock is already used elsewhere in the code,
including some hot paths. verifyBackupPageConsistency() is not
something that gets called always, i.e. WAL consistency checks are done
conditionally - only when wal_consistency_checking is enabled for
the rmgr, or when the WAL record is flagged with XLR_CHECK_CONSISTENCY
(core doesn't set that flag; only an external module, if any, does).

I really don't see much of a problem in allocating them statically and
pushing them closer to where they're being used. If this is really a
concern, at the least the dynamic allocation could be pushed down into
verifyBackupPageConsistency() IMO, with if (first_time) { allocate two
blocks with palloc } and then reusing them. This at least saves some
memory on the heap for most of the servers out there.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Remove nonmeaningful prefixes in PgStat_* fields

2023-01-12 Thread Drouvot, Bertrand

Hi hackers,

Please find attached a patch to $SUBJECT.

It is a preliminary patch for [1].

The main ideas are: 1) to have consistent naming between the pg_stat_get*() 
functions
and their associated counters and 2) to define the new macros in [1] the same 
way as it
has been done in 8018ffbf58 (aka not using the prefixes in the macros).

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

[1]: 
https://www.postgresql.org/message-id/flat/89606d96-cd94-af74-18f3-c7ab2b684...@gmail.com

diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index 6139a27fee..edf79ebc8f 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -121,16 +121,16 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 
 	pending = entry_ref->pending;
 
-	fcu->fs = &pending->f_counts;
+	fcu->fs = &pending->counts;
 
 	/* save stats for this function, later used to compensate for recursion */
-	fcu->save_f_total_time = pending->f_counts.f_total_time;
+	fcu->save_total_time = pending->counts.total_time;
 
 	/* save current backend-wide total time */
 	fcu->save_total = total_func_time;
 
 	/* get clock time as of function start */
-	INSTR_TIME_SET_CURRENT(fcu->f_start);
+	INSTR_TIME_SET_CURRENT(fcu->start);
 }
 
 /*
@@ -146,41 +146,41 @@ void
 pgstat_end_function_usage(PgStat_FunctionCallUsage *fcu, bool finalize)
 {
 	PgStat_FunctionCounts *fs = fcu->fs;
-	instr_time	f_total;
-	instr_time	f_others;
-	instr_time	f_self;
+	instr_time	total;
+	instr_time	others;
+	instr_time	self;
 
 	/* stats not wanted? */
 	if (fs == NULL)
 		return;
 
 	/* total elapsed time in this function call */
-	INSTR_TIME_SET_CURRENT(f_total);
-	INSTR_TIME_SUBTRACT(f_total, fcu->f_start);
+	INSTR_TIME_SET_CURRENT(total);
+	INSTR_TIME_SUBTRACT(total, fcu->start);
 
 	/* self usage: elapsed minus anything already charged to other calls */
-	f_others = total_func_time;
-	INSTR_TIME_SUBTRACT(f_others, fcu->save_total);
-	f_self = f_total;
-	INSTR_TIME_SUBTRACT(f_self, f_others);
+	others = total_func_time;
+	INSTR_TIME_SUBTRACT(others, fcu->save_total);
+	self = total;
+	INSTR_TIME_SUBTRACT(self, others);
 
 	/* update backend-wide total time */
-	INSTR_TIME_ADD(total_func_time, f_self);
+	INSTR_TIME_ADD(total_func_time, self);
 
 	/*
-	 * Compute the new f_total_time as the total elapsed time added to the
-	 * pre-call value of f_total_time.  This is necessary to avoid
+	 * Compute the new total_time as the total elapsed time added to the
+	 * pre-call value of total_time.  This is necessary to avoid
 	 * double-counting any time taken by recursive calls of myself.  (We do
 	 * not need any similar kluge for self time, since that already excludes
 	 * any recursive calls.)
 	 */
-	INSTR_TIME_ADD(f_total, fcu->save_f_total_time);
+	INSTR_TIME_ADD(total, fcu->save_total_time);
 
 	/* update counters in function stats table */
 	if (finalize)
-		fs->f_numcalls++;
-	fs->f_total_time = f_total;
-	INSTR_TIME_ADD(fs->f_self_time, f_self);
+		fs->numcalls++;
+	fs->total_time = total;
+	INSTR_TIME_ADD(fs->self_time, self);
 }
 
 /*
@@ -203,11 +203,11 @@ pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	if (!pgstat_lock_entry(entry_ref, nowait))
 		return false;
 
-	shfuncent->stats.f_numcalls += localent->f_counts.f_numcalls;
-	shfuncent->stats.f_total_time +=
-		INSTR_TIME_GET_MICROSEC(localent->f_counts.f_total_time);
-	shfuncent->stats.f_self_time +=
-		INSTR_TIME_GET_MICROSEC(localent->f_counts.f_self_time);
+	shfuncent->stats.numcalls += localent->counts.numcalls;
+	shfuncent->stats.total_time +=
+		INSTR_TIME_GET_MICROSEC(localent->counts.total_time);
+	shfuncent->stats.self_time +=
+		INSTR_TIME_GET_MICROSEC(localent->counts.self_time);
 
 	pgstat_unlock_entry(entry_ref);
 
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 1730425de1..737ec525b3 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -38,9 +38,9 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter inserted_pre_truncdrop;
 	PgStat_Counter updated_pre_truncdrop;
 	PgStat_Counter deleted_pre_truncdrop;
-	Oid			t_id;			/* table's OID */
-	bool		t_shared;		/* is it a 

Re: Named Operators

2023-01-12 Thread Matthias van de Meent
On Thu, 12 Jan 2023 at 10:16, Gurjeet Singh  wrote:
>
> Technically correct name of this feature would be Readable Names for
> Operators, or Pronounceable Names for Operators. But I'd like to call
> it Named Operators.
>
> With this patch in place, the users can name the operators as
> :some_pronounceable_name: instead of having to choose from the special
> characters like #^&@.
> [...]
> I think Named Operators will significantly improve the readability
> of queries.

Couldn't the user better opt to call the functions that implement the
operator directly if they want more legible operations? So, from your
example, `SELECT box_add(a, b)` instead of `SELECT a :add_point: b`?

I'm -1 on the chosen syntax; :name: shadows common variable
substitution patterns including those of psql.

Kind regards,

Matthias van de Meent




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-12 Thread shveta malik
On Wed, Jan 11, 2023 at 6:16 PM Hayato Kuroda (Fujitsu)  wrote:
>
> > 2.
> > I think users can set ' wal_receiver_status_interval ' to 0 or more
> > than 'wal_sender_timeout'. But is this a frequent use-case scenario or
> > do we see DBAs setting these in such a way by mistake? If so, then I
> > think, it is better to give Warning message in such a case when a user
> > tries to create or alter a subscription with a large 'min_apply_delay'
> > (>=  'wal_sender_timeout') , rather than leaving it to the user's
> > understanding that WalSender may repeatedly timeout in such a case.
> > Parse_subscription_options and AlterSubscription can be modified to
> > log a warning. Any thoughts?
>
> Yes, DBAs may set wal_receiver_status_interval to more than 
> wal_sender_timeout by
> mistake.
>
> But to handle the scenario we must compare between min_apply_delay *on 
> subscriber*
> and wal_sender_timeout *on publisher*. Both values are not transferred to 
> opposite
> sides, so the WARNING cannot be raised. I considered that such a mechanism 
> seemed
> to be complex. The discussion around [1] may be useful.
>
> [1]: 
> https://www.postgresql.org/message-id/CAA4eK1Lq%2Bh8qo%2BrqGU-E%2BhwJKAHYocV54y4pvou4rLysCgYD-g%40mail.gmail.com
>

okay, I see. So even when 'wal_receiver_status_interval' is set to 0,
no log/warning is needed when the user tries to set min_apply_delay>0?
Are we good with doc alone?

One trivial correction in config.sgml:
+   terminates due to the timeout errors. Hence, make sure this parameter
+   shorter than the wal_sender_timeout of the publisher.
Hence, make sure this parameter is shorter...  

thanks
Shveta




UPDATE operation terminates logical replication receiver process due to an assertion

2023-01-12 Thread v . davydov

Dear all,

I think I've found a problem in logical replication that was introduced 
recently in the patch:


Fix calculation of which GENERATED columns need to be updated
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f7836ff651ad710fef52fa87b248ecdfc6468dc

There is an assertion that accidentally terminates the logical replication
worker process. The assertion was introduced in that patch. To reproduce
the problem, Postgres should be compiled with assertions enabled. The
problem appears when executing an UPDATE operation on a non-empty table
with GENERATED columns and a BEFORE UPDATE trigger. The problem seems to
appear on the latest snapshots of the 13 and 14 versions (sorry, I haven't
tested other versions).


Stack:
--
TRAP: FailedAssertion("relinfo->ri_GeneratedExprs != NULL", File: "execUtils.c", Line: 1292)
postgres: logical replication worker for subscription 16401 (ExceptionalCondition+0x89)[0x55838760b902]
postgres: logical replication worker for subscription 16401 (ExecGetExtraUpdatedCols+0x90)[0x558387314bd8]
postgres: logical replication worker for subscription 16401 (ExecGetAllUpdatedCols+0x1c)[0x558387314c20]
postgres: logical replication worker for subscription 16401 (ExecUpdateLockMode+0x19)[0x558387306ce3]
postgres: logical replication worker for subscription 16401 (ExecBRUpdateTriggers+0xc7)[0x5583872debe8]
postgres: logical replication worker for subscription 16401 (ExecSimpleRelationUpdate+0x122)[0x55838730dca7]
postgres: logical replication worker for subscription 16401 (+0x43d32f)[0x55838745632f]
postgres: logical replication worker for subscription 16401 (+0x43e382)[0x558387457382]
postgres: logical replication worker for subscription 16401 (+0x43e5d3)[0x5583874575d3]
postgres: logical replication worker for subscription 16401 (+0x43e76b)[0x55838745776b]
postgres: logical replication worker for subscription 16401 (ApplyWorkerMain+0x3ac)[0x558387457e8b]
postgres: logical replication worker for subscription 16401 (StartBackgroundWorker+0x253)[0x5583874157ed]
postgres: logical replication worker for subscription 16401 (+0x40e9c9)[0x5583874279c9]
postgres: logical replication worker for subscription 16401 (+0x40eb43)[0x558387427b43]
postgres: logical replication worker for subscription 16401 (+0x40fd28)[0x558387428d28]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f08cd44b520]
/lib/x86_64-linux-gnu/libc.so.6(__select+0xbd)[0x7f08cd52474d]
postgres: logical replication worker for subscription 16401 (+0x410ceb)[0x558387429ceb]
postgres: logical replication worker for subscription 16401 (PostmasterMain+0xbf3)[0x55838742ac4d]
postgres: logical replication worker for subscription 16401 (main+0x20c)[0x55838736076d]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f08cd432d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f08cd432e40]

How to reproduce:
-
1. Create a master-replica configuration with logical replication enabled. 
The initial schema is shown below:


CREATE TABLE gtest26 (
 a int PRIMARY KEY,
 b int GENERATED ALWAYS AS (a * 2) STORED
 );

CREATE FUNCTION gtest_trigger_func() RETURNS trigger
 LANGUAGE plpgsql
 AS $$
 BEGIN
 IF tg_op IN ('DELETE', 'UPDATE') THEN
 RAISE INFO '%: %: old = %', TG_NAME, TG_WHEN, OLD;
 END IF;
 IF tg_op IN ('INSERT', 'UPDATE') THEN
 RAISE INFO '%: %: new = %', TG_NAME, TG_WHEN, NEW;
 END IF;
 IF tg_op = 'DELETE' THEN
 RETURN OLD;
 ELSE
 RETURN NEW;
 END IF;
 END
 $$;

CREATE TRIGGER gtest1 BEFORE DELETE OR UPDATE ON gtest26
 FOR EACH ROW
 WHEN (OLD.b < 0) -- ok
 EXECUTE PROCEDURE gtest_trigger_func();

INSERT INTO gtest26(a) VALUES (-2), (0), (3);

2. The problem appears if you execute the following SQL on the master 
node:


UPDATE gtest26 SET a = a + 1;

I'm not sure whether this assertion is the proper one, or how to properly 
fix the issue. That's why I'm asking the community for help. 
Thank you in advance.


With best regards,
Vitaly




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-12 Thread shveta malik
On Thu, Jan 12, 2023 at 10:34 AM Amit Kapila  wrote:
>
> On Thu, Jan 12, 2023 at 9:54 AM Peter Smith  wrote:
> >
> >
> > doc/src/sgml/monitoring.sgml
> >
> > 5. pg_stat_subscription
> >
> > @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM
> > pg_stat_activity WHERE wait_event i
> >
> >   
> >
> > +   apply_leader_pid integer
> > +  
> > +  
> > +   Process ID of the leader apply worker, if this process is a apply
> > +   parallel worker. NULL if this process is a leader apply worker or a
> > +   synchronization worker.
> > +  
> > + 
> > +
> > + 
> > +  
> > relid oid
> >
> >
> > OID of the relation that the worker is synchronizing; null for the
> > -   main apply worker
> > +   main apply worker and the parallel apply worker
> >
> >   
> >
> > 5a.
> >
> > (Same as general comment #1 about terminology)
> >
> > "apply_leader_pid" --> "leader_apply_pid"
> >
>
> How about naming this as just leader_pid? I think it could be helpful
> in the future if we decide to parallelize initial sync (aka parallel
> copy) because then we could use this for the leader PID of parallel
> sync workers as well.
>
> --

I still prefer leader_apply_pid.
leader_pid does not tell which 'operation' it belongs to. 'apply'
makes it clear that it is an apply-related process.

The terms used in the patch look very confusing. I had to read a few lines
multiple times to understand them.

1.
The summary says 'main_worker_pid' is to be added, but I do not see
'main_worker_pid' added in pg_stat_subscription; instead I see
'apply_leader_pid'. Am I missing something? Also, as stated above,
'leader_apply_pid' makes more sense.
It is better to correct it everywhere (apply leader --> leader apply).
Once that is done, it can be reviewed again.

thanks
Shveta




Re: Named Operators

2023-01-12 Thread Gurjeet Singh
On Thu, Jan 12, 2023 at 1:49 AM Matthias van de Meent
 wrote:
>
> On Thu, 12 Jan 2023 at 10:16, Gurjeet Singh  wrote:
> >
> > Technically correct name of this feature would be Readable Names for
> > Operators, or Pronounceable Names for Operators. But I'd like to call
> > it Named Operators.
> >
> > With this patch in place, the users can name the operators as
> > :some_pronounceable_name: instead of having to choose from the special
> > characters like #^&@.
> > [...]
> > I think Named Operators will significantly improve the readability
> > of queries.
>
> Couldn't the user better opt to call the functions that implement the
> operator directly if they want more legible operations? So, from your
> example, `SELECT box_add(a, b)` instead of `SELECT a :add_point: b`?

Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with plain function
calls. See the COMMUTATOR, HASHES, etc. clauses of the CREATE OPERATOR
command.

https://www.postgresql.org/docs/current/sql-createoperator.html
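For illustration, here's a minimal sketch of what defining such an operator looks like today, including one of the planner-facing clauses (the function and operator names are made up for the example):

```sql
-- A commutative "absolute-value equality" operator for integers.
CREATE FUNCTION abs_eq(int4, int4) RETURNS boolean
    AS 'SELECT abs($1) = abs($2)'
    LANGUAGE SQL IMMUTABLE STRICT;

CREATE OPERATOR === (
    LEFTARG = int4,
    RIGHTARG = int4,
    FUNCTION = abs_eq,
    COMMUTATOR = ===    -- lets the planner swap the operands
);

SELECT -3 === 3;   -- true
```

None of these planner hints are available for a bare `abs_eq(a, b)` call, which is the point being made above.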

This proposal is primarily a replacement for the myriad of
hard-to-pronounce operators that users have to memorize. For example,
it'd be nice to have readable names for the PostGIS operators.

https://postgis.net/docs/reference.html#Operators

For someone who's reading/troubleshooting a PostGIS query, when they
encounter the operator <<| in a query for the first time, they'd have
to open up the docs. But if the query used the :strictly_below:
operator, there'd be no need to switch to the docs and lose context.

> I'm -1 on the chosen syntax; :name: shadows common variable
> substitution patterns including those of psql.

Ah, thanks for the reminder! Early on, before I had written any code, I
remember discarding the colon (:) as a delimiter choice, precisely because
it is used for variable substitution in psql, and perhaps in some drivers
as well. But in the rush of implementing and wrangling code, I forgot
about that argument altogether.

I'll consider using one of the other special characters. Do you have
any suggestions?


Best regards,
Gurjeet
http://Gurje.et




Re: PG11 to PG14 Migration Slowness

2023-01-12 Thread Ilya Anfimov
On Thu, Jan 12, 2023 at 02:45:41PM +0530, Vigneshk Kvignesh wrote:
>Hi,
> 
>  I'm migrating  our existing PG instances from PG11.4  to PG14.3. I
>have around 5 Million Tables in a single database. When migrating using
>pg_upgrade, its taking 3 hours for the process to complete. I'm not sure
>if its the intended behaviour or we're missing something here.

 Yes. In fact, you have good hardware, and I would have expected a longer
time on average.

> Most of the tables (90%) in 5 Million are foreign tables. On analysis
>found that most of the time is spent in pg_dump (~2.3 hours). In pg_dump
>getTableAttrs(), dumpTable() functions take the most time, approx 1 hour
>each since we're processing table by table. Also, there are no columns
>with default values, which if present might take some time. We're using
>PG14's pg_upgrade binary for the process.
>Since we have all these tables in one database, parallelism doesn't
>have any effect here. Can we make binary upgrade for a single database run
>in parallel ?
> Kindly advise us if we have missed anything here and possible
>solutions for this problem.

 I don't see any problem. A three-hour downtime every three years
for such a setup... You are lucky you have only that.

 But you could try some logical replication to the new server
version for upgrading, if you really want to bother. (Well,
pglogical is my preference at that scale, but the three general
options are: built-in logical replication, pglogical, and Slony.)
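A minimal sketch of the built-in logical replication route (assuming the schema is copied separately, e.g. with pg_dump --schema-only, and noting that logical replication does not cover foreign tables):

```sql
-- On the old (PG11) server:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new (PG14) server, after loading the schema:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=oldhost dbname=mydb'
    PUBLICATION upgrade_pub;
```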



>So we're not sure on what we missed here.
>Have added more info on the process below.
>No. of Tables: 5 Million
>Time Taken: 3 Hours
>Command Used: $PG14_UPGRADE -Uroot -b $PG11_DIR/bin -B $PG14_DIR/bin -d
>$PG11_DIR/data -D $PG14_DIR/data -k -r -j32
>Version: PG11.4 to PG14.3
>Environment: CentOS machine (32 cores(Intel), 128GB RAM)
> 
>Thanks and Regards,
>Vignesh K.




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-12 Thread Amit Kapila
On Thu, Jan 12, 2023 at 4:21 PM shveta malik  wrote:
>
> On Thu, Jan 12, 2023 at 10:34 AM Amit Kapila  wrote:
> >
> > On Thu, Jan 12, 2023 at 9:54 AM Peter Smith  wrote:
> > >
> > >
> > > doc/src/sgml/monitoring.sgml
> > >
> > > 5. pg_stat_subscription
> > >
> > > @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM
> > > pg_stat_activity WHERE wait_event i
> > >
> > >   
> > >
> > > +   apply_leader_pid integer
> > > +  
> > > +  
> > > +   Process ID of the leader apply worker, if this process is a apply
> > > +   parallel worker. NULL if this process is a leader apply worker or 
> > > a
> > > +   synchronization worker.
> > > +  
> > > + 
> > > +
> > > + 
> > > +  
> > > relid oid
> > >
> > >
> > > OID of the relation that the worker is synchronizing; null for the
> > > -   main apply worker
> > > +   main apply worker and the parallel apply worker
> > >
> > >   
> > >
> > > 5a.
> > >
> > > (Same as general comment #1 about terminology)
> > >
> > > "apply_leader_pid" --> "leader_apply_pid"
> > >
> >
> > How about naming this as just leader_pid? I think it could be helpful
> > in the future if we decide to parallelize initial sync (aka parallel
> > copy) because then we could use this for the leader PID of parallel
> > sync workers as well.
> >
> > --
>
> I still prefer leader_apply_pid.
> leader_pid does not tell which 'operation' it belongs to. 'apply'
> gives the clarity that it is apply related process.
>

But then do you suggest that if tomorrow we allow parallel sync
workers, we add a separate column leader_sync_pid? That doesn't
sound like a good idea to me, and moreover one can refer to the docs
for clarification.

-- 
With Regards,
Amit Kapila.




Beautify pg_walinspect docs a bit

2023-01-12 Thread Bharath Rupireddy
Hi,

As discussed [1], here's a patch to beautify the pg_walinspect docs,
similar to the pageinspect docs. The existing pg_walinspect docs call out
the column names explicitly and then also show them in the function
execution examples, which is duplicative. Also, \x isn't used, so some of
the execution outputs are misaligned.

Thoughts?

[1] https://www.postgresql.org/message-id/Y7+gQy/louwk4...@paquier.xyz

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Beutify-pg_walinspect-docs-a-bit.patch
Description: Binary data


Re: Add a new pg_walinspect function to extract FPIs from WAL records

2023-01-12 Thread Bharath Rupireddy
On Thu, Jan 12, 2023 at 11:23 AM Michael Paquier  wrote:
>
> On Wed, Jan 11, 2023 at 06:59:18PM +0530, Bharath Rupireddy wrote:
> > I've done it that way for pg_get_wal_fpi_info. If this format looks
> > okay, I can propose to do the same for other functions (for
> > backpatching too) in a separate thread though.
>
> My vote would be to make that happen first, to have in place cleaner
> basics for the docs.  I could just do it and move on..

Sure. Posted a separate patch here -
https://www.postgresql.org/message-id/CALj2ACVGcUpziGgQrcT-1G3dHWQQfWjYBu1YQ2ypv9y86dgogg%40mail.gmail.com

> This reminds me of the slot advancing, where we discarded this case
> just because it is useful to enforce a LSN far in the future.
> Honestly, I cannot think of any case where I would use this level of
> validation, especially having *two* functions to decide one behavior
> or the other: this stuff could just use one function and use for
> example NULL as a setup to enforce the end of WAL, on top of a LSN far
> ahead..  But well..

I understand. I don't mind discussing something like [1] with the
following behaviour and discarding till_end_of_wal functions
altogether:
If start_lsn is NULL, error out/return NULL.
If end_lsn isn't specified, default to NULL, then determine the end_lsn.
If end_lsn is specified as NULL, then determine the end_lsn.
If end_lsn is specified as non-NULL, then check whether it is greater
than start_lsn; if yes, go ahead and do the job, otherwise error out.
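In terms of the merged function, that behaviour would look roughly like this (a sketch of the proposal, not a committed interface):

```sql
-- end_lsn omitted (defaults to NULL) or explicitly NULL:
-- read through to the current end of WAL.
SELECT count(*) FROM pg_get_wal_records_info('0/1000000');
SELECT count(*) FROM pg_get_wal_records_info('0/1000000', NULL);

-- Explicit bounded range: end_lsn must be greater than start_lsn.
SELECT count(*) FROM pg_get_wal_records_info('0/1000000', '0/2000000');

-- NULL start_lsn: error out / return NULL.
SELECT count(*) FROM pg_get_wal_records_info(NULL);
```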

I'll think a bit more on this and perhaps discuss it separately.

> > Hm. How about having pg_get_wal_fpi_info_till_end_of_wal() then?
>
> I don't really want to make the interface more bloated with more
> functions than necessary, TBH, though I get your point to push for
> consistency in these functions.  This makes me really wonder whether
> we'd better make relax all the existing functions, but I may get
> outvoted.

I'll keep the FPI extract function simple, as proposed in the patch, and
not write a till_end_of_wal version. If one needs to get all the
FPIs till the end of WAL, one can always determine the end_lsn with
pg_current_wal_flush_lsn() (or pg_last_wal_replay_lsn() when in recovery)
and use the proposed function.

Note that I kept the FPI extract function's test simple: ensure an FPI
gets generated and check that the new function can fetch it, skipping LSN
and other FPI sanity checks. A full FPI sanity check needs
pageinspect, and we don't want to create that dependency just for tests.
And comparing the FPI's LSN with the WAL record's LSN requires us to fetch
the LSN from the raw bytea stream. As there's no straightforward way to
convert raw bytes from bytea to pg_lsn, doing that in a platform-agnostic
manner (little-endian vs. big-endian comes into play here) is
a no-go IMO. FWIW, for a little-endian system I had to do [2].

Therefore I stick to the simple test unless there's a better way.

I'm attaching the v6 patch for further review.

[1]
CREATE FUNCTION pg_get_wal_records_info(IN start_lsn pg_lsn,
IN end_lsn pg_lsn DEFAULT NULL,
OUT start_lsn pg_lsn,
OUT end_lsn pg_lsn,
OUT prev_lsn pg_lsn,
OUT xid xid,
OUT resource_manager text,
OUT record_type text,
OUT record_length int4,
OUT main_data_length int4,
OUT fpi_length int4,
OUT description text,
OUT block_ref text
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_get_wal_records_info'
LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE;

CREATE FUNCTION pg_get_wal_stats(IN start_lsn pg_lsn,
IN end_lsn pg_lsn DEFAULT NULL,
IN  per_record boolean DEFAULT false,
OUT "resource_manager/record_type" text,
OUT count int8,
OUT count_percentage float8,
OUT record_size int8,
OUT record_size_percentage float8,
OUT fpi_size int8,
OUT fpi_size_percentage float8,
OUT combined_size int8,
OUT combined_size_percentage float8
)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'pg_get_wal_stats'
LANGUAGE C CALLED ON NULL INPUT IMMUTABLE PARALLEL SAFE;

[2]

select 
'\xb80789012c00601f00200420df02e09f3800c09f3800a09f380080'::bytea
AS fpi \gset
select substring(:'fpi' from 3 for 16) AS rlsn \gset
select substring(:'rlsn' from 7 for 2) || substring(:'rlsn' from 5 for
2) || substring(:'rlsn' from 3 for 2) || substring(:'rlsn' from 1 for
2) AS hi \gset
select substring(:'rlsn' from 15 for 2) || substring(:'rlsn' from 13
for 2) || substring(:'rlsn' from 11 for 2) || substring(:'rlsn' from 9
for 2) AS lo \gset
select (:'hi' || '/' || :'lo')::pg_lsn;

postgres=# select (:'hi' || '/' || :'lo')::pg_lsn;
  pg_lsn
---
 0/18907B8
(1 row)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v6-0001-Add-FPI-extract-function-to-pg_walinspect.patch
Description: Binary data


Re: psql's FETCH_COUNT (cursor) is not being respected for CTEs

2023-01-12 Thread Daniel Verite
Tom Lane wrote:

> I agree that it seems like a good idea to try.
> There will be more per-row overhead, but the increase in flexibility
> is likely to justify that.

Here's a POC patch implementing row-by-row fetching.

If it wasn't for the per-row overhead, we could probably get rid of
ExecQueryUsingCursor() and use row-by-row fetches whenever
FETCH_COUNT is set, independently of the form of the query.

However the difference in processing time seems to be substantial: on
some quick tests with FETCH_COUNT=1, I'm seeing almost a 1.5x
increase on large datasets. I assume it's the cost of more allocations.
I would have hoped that avoiding the FETCH queries and associated
round-trips with the cursor method would compensate for that, but it
doesn't appear to be the case, at least with a fast local connection.

So in this patch, psql still uses the cursor method if the
query starts with "select", and falls back to row-by-row fetching in
the main code (ExecQueryAndProcessResults) otherwise.
Anyway, it solves the main issue of the over-consumption of memory
for CTEs and update/insert queries returning large result sets.
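For example, with the patch, a query like the following would be fetched row by row instead of being accumulated in full (illustrative psql session):

```sql
\set FETCH_COUNT 100
WITH big AS (SELECT generate_series(1, 10000000) AS g)
SELECT * FROM big;
```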


Best regards,

-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 00627830c4..d3de9d8336 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -372,6 +372,7 @@ AcceptResult(const PGresult *result, bool show_error)
 		{
 			case PGRES_COMMAND_OK:
 			case PGRES_TUPLES_OK:
+			case PGRES_SINGLE_TUPLE:
 			case PGRES_EMPTY_QUERY:
 			case PGRES_COPY_IN:
 			case PGRES_COPY_OUT:
@@ -675,13 +676,13 @@ PrintNotifications(void)
  * Returns true if successful, false otherwise.
  */
 static bool
-PrintQueryTuples(const PGresult *result, const printQueryOpt *opt,
+PrintQueryTuples(const PGresult **result, int nresults, const printQueryOpt *opt,
  FILE *printQueryFout)
 {
 	bool		ok = true;
 	FILE	   *fout = printQueryFout ? printQueryFout : pset.queryFout;
 
-	printQuery(result, opt ? opt : &pset.popt, fout, false, pset.logfile);
+	printQueryChunks(result, nresults, opt ? opt : &pset.popt, fout, false, pset.logfile);
 	fflush(fout);
 	if (ferror(fout))
 	{
@@ -958,7 +959,7 @@ PrintQueryResult(PGresult *result, bool last,
 			else if (last && pset.crosstab_flag)
 success = PrintResultInCrosstab(result);
 			else if (last || pset.show_all_results)
-success = PrintQueryTuples(result, opt, printQueryFout);
+success = PrintQueryTuples((const PGresult**)&result, 1, opt, printQueryFout);
 			else
 success = true;
 
@@ -1369,6 +1370,47 @@ DescribeQuery(const char *query, double *elapsed_msec)
 	return OK;
 }
 
+/*
+ * Check if an output stream for \g needs to be opened, and if
+ * yes, open it.
+ * Return false if an error occurred, true otherwise.
+ */
+static bool
+SetupGOutput(PGresult *result, FILE **gfile_fout, bool *is_pipe)
+{
+	ExecStatusType status = PQresultStatus(result);
+	if (pset.gfname != NULL &&			/* there is a \g file or program */
+		*gfile_fout == NULL &&   /* and it's not already opened */
+		(status == PGRES_TUPLES_OK ||
+		 status == PGRES_SINGLE_TUPLE ||
+		 status == PGRES_COPY_OUT))
+	{
+		if (openQueryOutputFile(pset.gfname, gfile_fout, is_pipe))
+		{
+			if (*is_pipe)
+disable_sigpipe_trap();
+		}
+		else
+			return false;
+	}
+	return true;
+}
+
+static void
+CloseGOutput(FILE *gfile_fout, bool is_pipe)
+{
+	/* close \g file if we opened it */
+	if (gfile_fout)
+	{
+		if (is_pipe)
+		{
+			pclose(gfile_fout);
+			restore_sigpipe_trap();
+		}
+		else
+			fclose(gfile_fout);
+	}
+}
 
 /*
  * ExecQueryAndProcessResults: utility function for use by SendQuery()
@@ -1400,10 +1442,16 @@ ExecQueryAndProcessResults(const char *query,
 	bool		success;
 	instr_time	before,
 after;
+	int fetch_count = pset.fetch_count;
 	PGresult   *result;
+
 	FILE	   *gfile_fout = NULL;
 	bool		gfile_is_pipe = false;
 
+	PGresult   **result_array = NULL; /* to collect results in single row mode */
+	int64		total_tuples = 0;
+	int			ntuples;
+
 	if (timing)
 		INSTR_TIME_SET_CURRENT(before);
 
@@ -1424,6 +1472,33 @@ ExecQueryAndProcessResults(const char *query,
 		return -1;
 	}
 
+	/*
+	 * If FETCH_COUNT is set and the context allows it, use the single row
+	 * mode to fetch results and have no more than FETCH_COUNT rows in
+	 * memory.
+	 */
+	if (fetch_count > 0 && !pset.crosstab_flag && !pset.gexec_flag && !is_watch
+		&& !pset.gset_prefix && pset.show_all_results)
+	{
+		/*
+		 * The row-by-row fetch is not enabled when SHOW_ALL_RESULTS is false,
+		 * since we would need to accumulate all rows before knowing
+		 * whether they need to be discarded or displayed, which contradicts
+		 * FETCH_COUNT.
+		 */
+		if (!PQsetSingleRowMode(pset.db))
+		{
+			pg_log_warning("fetching results in single row mode is unavailable");
+			fetch_count = 0;
+		}
+		else
+		{
+			result_array = (PGresult**) pg_malloc(fetch_count * sizeof(PGresult*));
+		}
+	}
+	else
+		fetch_count = 0;		/* d

RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-12 Thread houzj.f...@fujitsu.com
On Thursday, January 12, 2023 7:08 PM Amit Kapila  
wrote:
> 
> On Thu, Jan 12, 2023 at 4:21 PM shveta malik  wrote:
> >
> > On Thu, Jan 12, 2023 at 10:34 AM Amit Kapila 
> wrote:
> > >
> > > On Thu, Jan 12, 2023 at 9:54 AM Peter Smith 
> wrote:
> > > >
> > > >
> > > > doc/src/sgml/monitoring.sgml
> > > >
> > > > 5. pg_stat_subscription
> > > >
> > > > @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event
> > > > FROM pg_stat_activity WHERE wait_event i
> > > >
> > > >   
> > > > > > > role="column_definition">
> > > > +   apply_leader_pid
> integer
> > > > +  
> > > > +  
> > > > +   Process ID of the leader apply worker, if this process is a 
> > > > apply
> > > > +   parallel worker. NULL if this process is a leader apply worker 
> > > > or a
> > > > +   synchronization worker.
> > > > +  
> > > > + 
> > > > +
> > > > + 
> > > > +   > > > + role="column_definition">
> > > > relid oid
> > > >
> > > >
> > > > OID of the relation that the worker is synchronizing; null for 
> > > > the
> > > > -   main apply worker
> > > > +   main apply worker and the parallel apply worker
> > > >
> > > >   
> > > >
> > > > 5a.
> > > >
> > > > (Same as general comment #1 about terminology)
> > > >
> > > > "apply_leader_pid" --> "leader_apply_pid"
> > > >
> > >
> > > How about naming this as just leader_pid? I think it could be
> > > helpful in the future if we decide to parallelize initial sync (aka
> > > parallel
> > > copy) because then we could use this for the leader PID of parallel
> > > sync workers as well.
> > >
> > > --
> >
> > I still prefer leader_apply_pid.
> > leader_pid does not tell which 'operation' it belongs to. 'apply'
> > gives the clarity that it is apply related process.
> >
> 
> But then do you suggest that tomorrow if we allow parallel sync workers then
> we have a separate column leader_sync_pid? I think that doesn't sound like a
> good idea and moreover one can refer to docs for clarification.

I agree that leader_pid would be better, not only for a future parallel
copy/sync feature, but also because it's more consistent with the
leader_pid column in pg_stat_activity.
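With that rename, identifying parallel apply workers would look something like this (an illustrative query against the patched view, not committed behaviour):

```sql
-- Parallel apply workers are the rows whose leader_pid is set.
SELECT pid, leader_pid, relid
FROM pg_stat_subscription
WHERE leader_pid IS NOT NULL;
```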

And here is the version patch which addressed Peter's comments and renamed all
the related stuff to leader_pid.

Best Regards,
Hou zj


v79-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch
Description:  v79-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch


v79-0001-Add-leader_pid-to-pg_stat_subscription.patch
Description: v79-0001-Add-leader_pid-to-pg_stat_subscription.patch


v79-0002-Stop-extra-worker-if-GUC-was-changed.patch
Description: v79-0002-Stop-extra-worker-if-GUC-was-changed.patch


v79-0003-Add-GUC-stream_serialize_threshold-and-test-seri.patch
Description:  v79-0003-Add-GUC-stream_serialize_threshold-and-test-seri.patch


RE: Perform streaming logical transactions by background workers and parallel apply

2023-01-12 Thread houzj.f...@fujitsu.com
On Thursday, January 12, 2023 12:24 PM Peter Smith  
wrote:
> 
> Hi, here are some review comments for patch v78-0001.

Thanks for your comments.

> ==
> 
> General
> 
> 1. (terminology)
> 
> AFAIK everywhere until now we’ve been referring everywhere
> (docs/comments/code) to the parent apply worker as the "leader apply
> worker". Not the "main apply worker". Not the "apply leader worker".
> Not any other variations...
> 
> From this POV I think the worker member "apply_leader_pid" would be better
> named "leader_apply_pid",  but I see that this was already committed to
> HEAD differently.
> 
> Maybe it is not possible (or you don't want) to change that internal member
> name but IMO at least all the new code and docs should try to be using
> consistent terminology (e.g. leader_apply_XXX) where possible.
> 
> ==
> 
> Commit message
> 
> 2.
> 
> main_worker_pid is Process ID of the leader apply worker, if this process is a
> apply parallel worker. NULL if this process is a leader apply worker or a
> synchronization worker.
> 
> IIUC, this text is just cut/paste from the monitoring.sgml. In a review 
> comment
> below I suggest some changes to that text, so then this commit message
> should also change to be the same.

Changed.

> ~~
> 
> 3.
> 
> The new column can make it easier to distinguish leader apply worker and
> apply parallel worker which is also similar to the 'leader_pid' column in
> pg_stat_activity.
> 
> SUGGESTION
> The new column makes it easier to distinguish parallel apply workers from
> other kinds of workers. It is implemented this way to be similar to the
> 'leader_pid' column in pg_stat_activity.

Changed.

> ==
> 
> doc/src/sgml/logical-replication.sgml
> 
> 4.
> 
> +   being synchronized. Moreover, if the streaming transaction is applied in
> +   parallel, there will be additional workers.
> 
> SUGGESTION
> there will be additional workers -> there may be additional parallel apply
> workers

Changed.

> ==
> 
> doc/src/sgml/monitoring.sgml
> 
> 5. pg_stat_subscription
> 
> @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM
> pg_stat_activity WHERE wait_event i
> 
>   
>
> +   apply_leader_pid integer
> +  
> +  
> +   Process ID of the leader apply worker, if this process is a apply
> +   parallel worker. NULL if this process is a leader apply worker or a
> +   synchronization worker.
> +  
> + 
> +
> + 
> +  
> relid oid
>
>
> OID of the relation that the worker is synchronizing; null for the
> -   main apply worker
> +   main apply worker and the parallel apply worker
>
>   
> 
> 5a.
> 
> (Same as general comment #1 about terminology)
> 
> "apply_leader_pid" --> "leader_apply_pid"

I changed this and all related stuff to "leader_pid" as I agree with Amit that
this might be useful for future features and is more consistent with the
leader_pid in pg_stat_activity.

> 
> ~~
> 
> 5b.
> 
> The current text feels awkward. I see it was copied from the similar text of
> 'pg_stat_activity' but perhaps it can be simplified a bit.
> 
> SUGGESTION
> Process ID of the leader apply worker if this process is a parallel apply 
> worker;
> otherwise NULL.

I slightly adjusted this according to Amit's suggestion, which I think
provides more information:

"Process ID of the leader apply worker, if this process is a parallel apply 
worker. NULL if this process is a leader apply worker, does not participate 
in parallel apply, or is a synchronization worker."

> ~~
> 
> 5c.
> BEFORE
> null for the main apply worker and the parallel apply worker
> 
> AFTER
> null for the leader apply worker and parallel apply workers

Changed.

> ~~
> 
> 5c.
> 
> relid oid
>
>
> OID of the relation that the worker is synchronizing; null for the
> -   main apply worker
> +   main apply worker and the parallel apply worker
>
> 
> 
> main apply worker -> leader apply worker
> 

Changed.

> ~~~
> 
> 6.
> 
> @@ -3212,7 +3223,7 @@ SELECT pid, wait_event_type, wait_event FROM
> pg_stat_activity WHERE wait_event i
>
>
> Last write-ahead log location received, the initial value of
> -   this field being 0
> +   this field being 0; null for the parallel apply worker
>
>   
> 
> BEFORE
> null for the parallel apply worker
> 
> AFTER
> null for parallel apply workers
> 

Changed.

> ~~~
> 
> 7.
> 
> @@ -3221,7 +3232,8 @@ SELECT pid, wait_event_type, wait_event FROM
> pg_stat_activity WHERE wait_event i
> last_msg_send_time timestamp
> with time zone
>
>
> -   Send time of last message received from origin WAL sender
> +   Send time of last message received from origin WAL sender; null for
> the
> +   parallel apply worker
>
>   
> 
> (same as #6)
> 
> BEFORE
> null for the parallel apply worker
> 
> AFTER
> null for parallel apply workers
> 

Changed.

> ~~~

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-12 Thread Melih Mutlu
Hi,

I've a question about 032_apply_delay.pl.

> +# Test ALTER SUBSCRIPTION. Delay 86460 seconds (1 day 1 minute).
> +$node_subscriber->safe_psql('postgres',
> +"ALTER SUBSCRIPTION tap_sub SET (min_apply_delay = 8646)"
> +);
> +
> +# New row to trigger apply delay.
> +$node_publisher->safe_psql('postgres',
> +"INSERT INTO test_tab VALUES (0, 'foobar')");
> +


I couldn't quite see how these lines test whether ALTER SUBSCRIPTION
successfully worked.
Don't we need to check that min_apply_delay really changed as a result?
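For instance, the test could additionally verify the catalog value on the subscriber (the column name here is assumed from the patch under review, not an existing catalog column):

```sql
-- After the ALTER SUBSCRIPTION, confirm the stored delay changed:
SELECT subminapplydelay
FROM pg_subscription
WHERE subname = 'tap_sub';
```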

But also I see that subscription.sql already tests this ALTER SUBSCRIPTION
behaviour.

Best,
-- 
Melih Mutlu
Microsoft


Re: Named Operators

2023-01-12 Thread Matthias van de Meent
On Thu, 12 Jan 2023 at 11:59, Gurjeet Singh  wrote:
>
> On Thu, Jan 12, 2023 at 1:49 AM Matthias van de Meent
>  wrote:
> >
> > On Thu, 12 Jan 2023 at 10:16, Gurjeet Singh  wrote:
> > >
> > > Technically correct name of this feature would be Readable Names for
> > > Operators, or Pronounceable Names for Operators. But I'd like to call
> > > it Named Operators.
> > >
> > > With this patch in place, the users can name the operators as
> > > :some_pronounceable_name: instead of having to choose from the special
> > > characters like #^&@.
> > > [...]
> > > I think Named Operators will significantly improve the readability
> > > of queries.
> >
> > Couldn't the user better opt to call the functions that implement the
> > operator directly if they want more legible operations? So, from your
> > example, `SELECT box_add(a, b)` instead of `SELECT a :add_point: b`?
>
> Matter of taste, I guess. But more importantly, defining an operator
> gives you many additional features that the planner can use to
> optimize your query differently, which it can't do with functions. See
> the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.

I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions underlying operators and
treat them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?

E.g. an index on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, since the function behind OPERATOR(pg_catalog.+) for (int8, int8)
is int8pl.

Kind regards,

Matthias van de Meent




Re: Add index scan progress to pg_stat_progress_vacuum

2023-01-12 Thread Imseih (AWS), Sami
Thanks for the feedback and I apologize for the delay in response.

>I think the problem here is that you're basically trying to work around the
>lack of an asynchronous state update mechanism between leader and workers. The
>workaround is to add a lot of different places that poll whether there has
>been any progress. And you're not doing that by integrating with the existing
>machinery for interrupt processing (i.e. CHECK_FOR_INTERRUPTS()), but by
>developing a new mechanism.

>I think your best bet would be to integrate with HandleParallelMessages().

You are correct. I have been trying to work around the async nature
of parallel workers performing the index vacuum. As you have pointed out,
integrating with HandleParallelMessages does appear to be the proper way.
Doing so will also avoid having to do any progress updates in the index AMs.

In the attached patch, the parallel workers send a new protocol message
type to the leader, called 'P', which signals the leader that it should
handle a progress update. The leader then performs the progress update by
invoking a callback set in the ParallelContext; this is done inside
HandleParallelMessages. In the index vacuum case, the callback is
parallel_vacuum_update_progress.

The new message does not contain a payload; it merely signals the leader
that it can invoke a progress update.

Also, if the leader performs the index vacuum itself, it can call
parallel_vacuum_update_progress directly inside vacuumparallel.c.

Regards,

Sami Imseih
Amazon Web Services (AWS)






v22-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch
Description: v22-0001-Add-2-new-columns-to-pg_stat_progress_vacuum.-Th.patch


Re: on placeholder entries in view rule action query's range table

2023-01-12 Thread Andrew Dunstan


On 2023-01-12 Th 00:12, Justin Pryzby wrote:
> On Wed, Jan 11, 2023 at 10:45:33PM -0500, Tom Lane wrote:
>> Amit Langote  writes:
>>>  On Thu, Jan 12, 2023 at 10:06 AM Tom Lane  wrote:
 I've pushed this with some cleanup --- aside from fixing
 outfuncs/readfuncs, I did some more work on the comments, which
 I think you were too sloppy about.
>>> Thanks a lot for the fixes.
>> It looks like we're not out of the woods on this: the buildfarm
>> members that run cross-version-upgrade tests are all unhappy.
>> Most of them are not reporting any useful details, but I suspect
>> that they are barfing because dumps from the old server include
>> table-qualified variable names in some CREATE VIEW commands while
>> dumps from HEAD omit the qualifications.  I don't see any
>> mechanism in TestUpgradeXversion.pm that could deal with that
>> conveniently, and in any case we'd have to roll out a client
>> script update to the affected animals.  I fear we may have to
>> revert this pending development of better TestUpgradeXversion.pm
>> support.
> There's a diffs available for several of them:
>
> - SELECT citext_table.id,
> -citext_table.name
> + SELECT id,
> +name
>
> It looks like TestUpgradeXversion.pm is using the diff command I sent to
> get tighter bounds on allowable changes.
>
> 20210415153722.gl6...@telsasoft.com
>
> It's ugly and a terrible hack, and I don't know whether anyone would say
> it's good enough, but one could probably avoid the diff like:
>
> sed -r '/CREATE/,/^$/{ s/\w+\.//g }'
>
> You'd still have to wait for it to be deployed, though.


That looks quite awful. I don't think you could persuade me to deploy it
(We don't use sed anyway). It might be marginally better if the pattern
were /CREATE.*VIEW/ and we ignored that first line, but it still seems
awful to me.

Another approach might be simply to increase the latitude allowed for
old versions <= 15 with new versions >= 16. Currently we allow 90 for
cases where the versions differ, but we could increase it to, say, 200
in such cases (we'd need to experiment a bit to find the right limit).


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Minimal logical decoding on standbys

2023-01-12 Thread Ashutosh Sharma
Hi,

On Thu, Jan 12, 2023 at 5:29 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 1/11/23 7:04 PM, Drouvot, Bertrand wrote:
> > Hi,
> >
> > Please find V38 attached, I'll look at the other comments you've done in 
> > [1] on 0004 and 0006.
> >
> >

Sorry for joining late; I totally missed it. AFAICU, with this patch
users can now do logical replication (LR) from a standby, whereas
previously they could only do it from the primary server.

To start with, I have one small question:

I previously participated in the discussion on "Synchronizing the
logical replication slots from Primary to Standby" and one of the
purposes of that project was to synchronize logical slots from primary
to standby so that if failover occurs, it will not affect the logical
subscribers of the old primary much. Can someone help me understand
how we are going to solve this problem with this patch? Are we going
to encourage users to do LR from standby instead of primary to get rid
of such problems during failover?

Also, one small observation:

I just played around with the latest (v38) patch a bit and found that
when a new logical subscriber of standby is created, it actually
creates two logical replication slots for it on the standby server.
May I know the reason for creating an extra replication slot other
than the one created by create subscription command? See below:

Subscriber:
=
create subscription t1_sub connection 'host=127.0.0.1 port=38500
dbname=postgres user=ashu' publication t1_pub;

Standby:
===
postgres=# \x
postgres=# select * from pg_replication_slots;
-[ RECORD 1 ]-------+----------------------------------------
slot_name           | pg_16399_sync_16392_7187728548042694423
plugin              | pgoutput
slot_type           | logical
datoid              | 5
database            | postgres
temporary           | f
active              | t
active_pid          | 112595
xmin                |
catalog_xmin        | 760
restart_lsn         | 0/3082528
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
two_phase           | f
-[ RECORD 2 ]-------+----------------------------------------
slot_name           | t1_sub
plugin              | pgoutput
slot_type           | logical
datoid              | 5
database            | postgres
temporary           | f
active              | t
active_pid          | 111940
xmin                |
catalog_xmin        | 760
restart_lsn         | 0/30824F0
confirmed_flush_lsn | 0/3082528
wal_status          | reserved
safe_wal_size       |
two_phase           | f

May I know the reason for creating pg_16399_sync_16392_7187728548042694423?

--
With Regards,
Ashutosh Sharma.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-01-12 Thread Masahiko Sawada
On Thu, Jan 12, 2023 at 5:21 PM John Naylor
 wrote:
>
> On Thu, Jan 12, 2023 at 12:44 PM Masahiko Sawada  
> wrote:
> >
> > On Wed, Jan 11, 2023 at 12:13 PM John Naylor
> >  wrote:
> > >
> > > On Tue, Jan 10, 2023 at 7:08 PM Masahiko Sawada  
> > > wrote:
>
> > I agree to keep this as a template.
>
> Okay, I'll squash the previous patch and work on cleaning up the internals. 
> I'll keep the external APIs the same so that your work on vacuum integration 
> can be easily rebased on top of that, and we can work independently.

Thanks!

>
> > From the vacuum integration
> > perspective, it would be better if we can use a common data type for
> > shared and local. It makes sense to have different data types if the
> > radix trees have different values types.
>
> I agree it would be better, all else being equal. I have some further 
> thoughts below.
>
> > > > It looks no problem in terms of vacuum integration, although I've not
> > > > fully tested yet. TID store uses the radix tree as the main storage,
> > > > and with the template radix tree, the data types for shared and
> > > > non-shared will be different. TID store can have an union for the
> > > > radix tree and the structure would be like follows:
> > >
> > > > /* Storage for Tids */
> > > > union tree
> > > > {
> > > > local_radix_tree*local;
> > > > shared_radix_tree   *shared;
> > > > };
> > >
> > > We could possibly go back to using a common data type for this, but with 
> > > unused fields in each setting, as before. We would have to be more 
> > > careful of things like the 32-bit crash from a few weeks ago.
> >
> > One idea to have a common data type without unused fields is to use
> > radix_tree a base class. We cast it to radix_tree_shared or
> > radix_tree_local depending on the flag is_shared in radix_tree. For
> > instance we have like (based on non-template version),
>
> > struct radix_tree
> > {
> > boolis_shared;
> > MemoryContext context;
> > };
>
> That could work in principle. My first impression is, just a memory context 
> is not much of a base class. Also, casts can creep into a large number of 
> places.
>
> Another thought came to mind: I'm guessing the TID store is unusual -- 
> meaning most uses of radix tree will only need one kind of memory 
> (local/shared). I could be wrong about that, and it _is_ a guess about the 
> future. If true, then it makes more sense that only code that needs both 
> memory kinds should be responsible for keeping them separate.

True.

>
> The template might be easier for future use cases if shared memory were 
> all-or-nothing, meaning either
>
> - completely different functions and types depending on RT_SHMEM
> - use branches (like v8)
>
> The union sounds like a good thing to try, but do whatever seems right.

I've implemented the idea of using union. Let me share WIP code for
discussion, I've attached three patches that can be applied on top of
v17-0009 patch. v17-0010 implements missing shared memory support
functions such as RT_DETACH and RT_GET_HANDLE, and some fixes.
v17-0011 patch adds TidStore, and v17-0012 patch is the vacuum
integration.

Overall, TidStore implementation with the union idea doesn't look so
ugly to me. But I got many compiler warnings about unused radix tree
functions like:

tidstore.c:99:19: warning: 'shared_rt_delete' defined but not used
[-Wunused-function]

I'm not sure there is a convenient way to suppress this warning but
one idea is to have some macros to specify what operations are
enabled/declared.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v17-0010-fix-shmem-support.patch
Description: Binary data


v17-0011-Add-TIDStore-to-store-sets-of-TIDs-ItemPointerDa.patch
Description: Binary data


v17-0012-Use-TIDStore-for-storing-dead-tuple-TID-during-l.patch
Description: Binary data


Re: on placeholder entries in view rule action query's range table

2023-01-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-01-12 Th 00:12, Justin Pryzby wrote:
>> It's ugly and a terrible hack, and I don't know whether anyone would say
>> it's good enough, but one could can probably avoid the diff like:
>> sed -r '/CREATE/,/^$/{ s/\w+\.//g }'

> That looks quite awful. I don't think you could persuade me to deploy it
> (We don't use sed anyway). It might be marginally better if the pattern
> were /CREATE.*VIEW/ and we ignored that first line, but it still seems
> awful to me.

Yeah, does not sound workable: it would risk ignoring actual problems.

I was wondering whether we could store a per-version patch or Perl
script that edits the old dump file to remove known discrepancies
from HEAD.  If well-maintained, that could eliminate the need for the
arbitrary "fuzz factors" that are in TestUpgradeXversion.pm right now.
I'd really want these files to be kept in the community source tree,
though, so that we do not need a new BF client release to change them.

This isn't the first time this has come up, but now we have a case
where it's actually blocking development, so maybe it's time to
make something happen.  If you want I can work on a patch for the
BF client.

regards, tom lane




Re: Remove source code display from \df+?

2023-01-12 Thread Magnus Hagander
On Thu, Jan 12, 2023 at 6:23 AM Pavel Stehule 
wrote:

>
>
> On Wed, Jan 11, 2023 at 10:11 PM Tom Lane wrote:
>
>> Pavel Stehule  writes:
>> > On Wed, Jan 11, 2023 at 7:31 PM Magnus Hagander wrote:
>> >> This is only about Internal and C, isn't it? Isn't the oid of these
>> >> static, and identified by INTERNALlanguageId and ClanguageId
>> respectively?
>> >> So we could just have the query show the prosrc column if the language
>> oid
>> >> is one of those two, and otherwise show "Please use \sf to view the
>> >> source"?
>>
>> > I think it can be acceptable - maybe we can rename the column "source
>> code"
>> > like "internal name" or some like that.
>>
>> Yeah, "source code" has always been kind of a misnomer for these
>> languages.
>>
>> > again I don't think printing  "Please use \sf to view the source"? "
>> often
>> > can be user friendly.  \? is clear and \sf is easy to use
>>
>> We could shorten it to "See \sf" or something like that.  But if we change
>> the column header to "internal name" or the like, then the column just
>> obviously doesn't apply for non-internal languages, so leaving it null
>> should be fine.
>>
>
> +1
>
>
Sure, that works for me as well. I agree the suggested text was way too
long, I was more thinking of "something in this direction" rather than that
exact text. But yes, with a change of names, we can leave it NULL as well.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Improving btree performance through specializing by key shape, take 2

2023-01-12 Thread David Christensen
Hi Matthias,

I'm going to look at this patch series if you're still interested.  What was 
the status of your final performance testing for the 0008 patch alone vs the 
specialization series?  Last I saw on the thread you were going to see if the 
specialization was required or not.

Best,

David

Re: pgsql: Add new GUC createrole_self_grant.

2023-01-12 Thread Robert Haas
On Wed, Jan 11, 2023 at 7:53 PM David G. Johnston
 wrote:
> Just wanted to chime in and say Robert has eloquently put into words much
> of what I have been thinking here, and that I concur that guiding the DBA to 
> use care with the power they have been provided is a sane position to take.
>
> +1, and thank you.

Thanks!

Here's a patch. In it I make three changes, only one of which is
directly relevant to the topic at hand:

1. Add a sentence to the documentation on writing SECURITY DEFINER
functions safely concerning createrole_self_grant.
2. Add a sentence to the documentation on SECURITY DEFINER referring
to the section about writing such functions safely.
3. Remove a note discussing the fact that pre-8.3 versions did not
have SET clauses for functions.

I can separate this into multiple patches if desired. And of course
you, Tom, or others may have suggestions on which of these changes
should be included at all or how to word them better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


crsgdoc.patch
Description: Binary data


PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-12 Thread Xing Guo
Hi hackers,

I was running a static analyser against PostgreSQL and found two return
statements in the PL/Python module that are not safe: returning from the
middle of a PG_TRY() block skips PG_END_TRY(), leaving the error-handling
stack in an inconsistent state. Patch is attached.

-- 
Best Regards,
Xing
diff --git a/src/pl/plpython/plpy_exec.c b/src/pl/plpython/plpy_exec.c
index 923703535a..c0e4a81283 100644
--- a/src/pl/plpython/plpy_exec.c
+++ b/src/pl/plpython/plpy_exec.c
@@ -414,12 +414,12 @@ PLy_function_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc)
 	PyObject   *volatile args = NULL;
 	int			i;
 
+	args = PyList_New(proc->nargs);
+	if (!args)
+		return NULL;
+
 	PG_TRY();
 	{
-		args = PyList_New(proc->nargs);
-		if (!args)
-			return NULL;
-
 		for (i = 0; i < proc->nargs; i++)
 		{
 			PLyDatumToOb *arginfo = &proc->args[i];
@@ -690,12 +690,12 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, PLyProcedure *proc, HeapTuple *r
 	PyObject   *volatile pltdata = NULL;
 	char	   *stroid;
 
+	pltdata = PyDict_New();
+	if (!pltdata)
+		return NULL;
+
 	PG_TRY();
 	{
-		pltdata = PyDict_New();
-		if (!pltdata)
-			return NULL;
-
 		pltname = PLyUnicode_FromString(tdata->tg_trigger->tgname);
 		PyDict_SetItemString(pltdata, "name", pltname);
 		Py_DECREF(pltname);


Re: Named Operators

2023-01-12 Thread Tom Lane
Matthias van de Meent  writes:
> I'm -1 on the chosen syntax; :name: shadows common variable
> substitution patterns including those of psql.

Yeah, this syntax is DOA because of that.  I think almost
anything you might invent is going to have conflict risks.

We could probably make it work by allowing the existing OPERATOR
syntax to take things that look like names as well as operators,
like

expr3 OPERATOR(contains_all) expr4

But that's bulky enough that nobody will care to use it.

On the whole I don't see this proposal going anywhere.
There's too much investment in the existing operator names,
and too much risk of conflicts if you try to shorten the
syntax.

regards, tom lane




Re: allowing for control over SET ROLE

2023-01-12 Thread Robert Haas
On Thu, Jan 12, 2023 at 12:09 AM Noah Misch  wrote:
> I think this is good to go modulo one or two things:
>
> > Subject: [PATCH v2] More documentation update for GRANT ... WITH SET OPTION.
> >
> > Update the reference pages for various ALTER commands that
> > mentioned that you must be a member of role that will be the
> > new owner to instead say that you must be able to SET ROLE
> > to the new owner. Update ddl.sgml's generate statement on this
>
> s/generate/general/

Oops, yes.

> > --- a/doc/src/sgml/ref/grant.sgml
> > +++ b/doc/src/sgml/ref/grant.sgml
> > @@ -298,6 +298,20 @@ GRANT  > class="parameter">role_name [, ...] TO  > This option defaults to TRUE.
> >
> >
> > +  
> > +   To create an object owned by another role or give ownership of an 
> > existing
> > +   object to another role, you must have the ability to SET
> > +   ROLE to that role; otherwise, commands such as ALTER
> > +   ... OWNER TO or CREATE DATABASE ... OWNER
> > +   will fail.  However, a user who inherits the privileges of a role but 
> > does
> > +   not have the ability to SET ROLE to that role may be
> > +   able to obtain full access to the role by manipulating existing objects
> > +   owned by that role (e.g. they could redefine an existing function to act
> > +   as a Trojan horse).  Therefore, if a role's privileges are to be 
> > inherited
> > +   but should not be accessible via SET ROLE, it should 
> > not
> > +   own any SQL objects.
> > +  
>
> I recommend deleting the phrase "are to be inherited but" as superfluous.  The
> earlier sentence's mention will still be there.  WITH SET FALSE + NOINHERIT is
> a combination folks should not use or should use only when the role has no
> known privileges.

I don't think I agree with this suggestion. If the privileges aren't
going to be inherited, it doesn't matter whether the role owns SQL
objects or not. And I think that there are two notable use cases for
SET FALSE + NOINHERIT (or SET FALSE + INHERIT FALSE). First, a
grant with SET FALSE, INHERIT FALSE, ADMIN TRUE gives you the ability
to administer a role without inheriting its privileges or being able
to SET ROLE to it. You could grant yourself those abilities if you
want, but you don't have them straight off. In fact, CREATE ROLE
issued by a non-superuser creates such a grant implicitly as of
cf5eb37c5ee0cc54c80d95c1695d7fca1f7c68cb. Second, SET FALSE, INHERIT
FALSE could be used to set up groups for pg_hba.conf matching without
conferring privileges.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Named Operators

2023-01-12 Thread David G. Johnston
On Thu, Jan 12, 2023 at 3:59 AM Gurjeet Singh  wrote:

> On Thu, Jan 12, 2023 at 1:49 AM Matthias van de Meent
>  wrote:
>
> > I'm -1 on the chosen syntax; :name: shadows common variable
> > substitution patterns including those of psql.
>
> I'll consider using one of the other special characters. Do you have
> any suggestions?
>
>
The R language uses %...% to denote custom operators.

That would be a bit annoying for dynamic SQL using format though...

Do we have to choose?  There are 15 allowed characters for operator names
presently (aside from + and -), could we define the rule that an operator
name can contain any sequence of alphabetic+underscore+space? characters so
long as the first and last symbol of the operator name is one of those 15
characters?

Another appealing option would be the non-matching but complementary pair
<...> (I'd consider removing these from the 15 choices if we go that route)

SELECT 1 <...> 2;

I would probably avoid requiring back-ticks given their usage as identifier
quoting in other systems - probably remove it from the 15 choices if we go
that route.

David J.


Re: daitch_mokotoff module

2023-01-12 Thread Dag Lem
Paul Ramsey  writes:

> On Mon, Jan 2, 2023 at 2:03 PM Dag Lem  wrote:
>
>> I also improved on the documentation example (using Full Text Search).
>> AFAIK you can't make general queries like that using arrays, however in
>> any case I must admit that text arrays seem like more natural building
>> blocks than space delimited text here.
>
> This is a fun addition to fuzzystrmatch.

I'm glad to hear it! :-)

>
> While it's a little late in the game, I'll just put it out there:
> daitch_mokotoff() is way harder to type than soundex_dm(). Not sure
> how you feel about that.

I chose the name in order to follow the naming of the other functions in
fuzzystrmatch, which as far as I can tell are given the name which each
algorithm is known by.

Personally I don't think it's worth it to deviate from the naming of the
other functions just to avoid typing a few characters, and I certainly
don't think daitch_mokotoff is any harder to get right than
levenshtein_less_equal ;-)

So, if I were to decide, I wouldn't change the name of the function.
However I'm obviously not calling the shots on what goes into PostgreSQL
- perhaps someone else would like to weigh in on this?

>
> On the documentation, I found the leap directly into the tsquery
> example a bit too big. Maybe start with a very simple example,
>
> --
> dm=# SELECT daitch_mokotoff('Schwartzenegger'),
> daitch_mokotoff('Swartzenegger');
>
>  daitch_mokotoff | daitch_mokotoff
> -+-
>  {479465}| {479465}
> --
>
> Then transition into a more complex example that illustrates the GIN
> index technique you mention in the text, but do not show:
>
> --
> CREATE TABLE dm_gin (source text, dm text[]);
>
> INSERT INTO dm_gin (source) VALUES
> ('Swartzenegger'),
> ('John'),
> ('James'),
> ('Steinman'),
> ('Steinmetz');
>
> UPDATE dm_gin SET dm = daitch_mokotoff(source);
>
> CREATE INDEX dm_gin_x ON dm_gin USING GIN (dm);
>
> SELECT * FROM dm_gin WHERE dm && daitch_mokotoff('Schwartzenegger');
> --

Sure, I can do that. You don't think this much example text will be
TL;DR?

>
> And only then go into the tsearch example. Incidentally, what does the
> tsearch approach provide that the simple GIN approach does not?

The example shows how to do a simultaneous match on first AND last
names, where the first and last names (any number of names) are stored
in the same indexed column, and the order of the names in the index and
the search term does not matter.

If you were to use the GIN "&&" operator, you would get a match if
either the first OR the last name matches. If you were to use the GIN
"@>" operator, you would *not* get a match if the search term contains
more soundex codes than the indexed name.

E.g. this yields a correct match:
SELECT soundex_tsvector('John Yamson') @@ soundex_tsquery('John Jameson');

While this yields a false positive:
SELECT (daitch_mokotoff('John') || daitch_mokotoff('Yamson')) && 
(daitch_mokotoff('John') || daitch_mokotoff('Doe'));

And this yields a false negative:
SELECT (daitch_mokotoff('John') || daitch_mokotoff('Yamson')) @> 
(daitch_mokotoff('John') || daitch_mokotoff('Jameson'));

This may be explained better by simply showing the output of
soundex_tsvector and soundex_tsquery:

SELECT soundex_tsvector('John Yamson');
 soundex_tsvector 
--
 '16':1 '164600':3 '46':2

SELECT soundex_tsquery('John Jameson');
  soundex_tsquery  
---
 ( '16' | '46' ) & ( '164600' | '464600' )

> Ideally explain that briefly before launching into the example. With
> all the custom functions and so on it's a little involved, so maybe if
> there's not a huge win in using that approach drop it entirely?

I believe this functionality is quite useful, and that it's actually
what's called for in many situations. So, I'd rather not drop this
example.

>
> ATB,
> P
>

Best regards,

Dag Lem




Re: split TOAST support out of postgres.h

2023-01-12 Thread Robert Haas
On Wed, Jan 11, 2023 at 1:14 AM Noah Misch  wrote:
> If the patch had just made postgres.h include varatt.h, like it does elog.h,
> I'd consider that change a nonnegative.  Grouping things is nice, even if it
> makes compilation a bit slower.  That also covers your frontend use case.  How
> about that?

I'm not direly opposed to that, but I'm also unconvinced that having
the varatt.h stuff is important enough relative to other things to
justify giving it a privileged place forever.

> I agree fixing any one extension is easy enough.  Thinking back to the
> htup_details.h refactor, I found the aggregate pain unreasonable relative to
> alleged benefits, even though each individual extension wasn't too bad.
> SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).

What annoyed me about that refactoring is that, in most cases where I
had been including htup.h, I had to change it to include
htup_details.h. In the main source tree, too, we've got 31 places that
include access/htup.h and 241 that include access/htup_details.h. It's
hard to argue that it was worth splitting the file given those
numbers, and in fact I think that change was a mistake. But that
doesn't mean every such change is a mistake.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-12 Thread Takamichi Osumi (Fujitsu)
On Thursday, January 12, 2023 12:04 PM Kyotaro Horiguchi 
 wrote:
> At Wed, 11 Jan 2023 12:46:24 +, "Hayato Kuroda (Fujitsu)"
>  wrote in
> > them. Which version is better?
> 
> 
> Some comments from a quick look, different from the above.
Horiguchi-san, thanks for your review !


> + CONNECTION 'host=192.168.1.50 port=5432 user=foo
> dbname=foodb'
> 
> I understand that we (not PG people, but IT people) are supposed to use in
> documents a certain set of special addresses that is guaranteed not to be
> routed in the field.
> 
> > TEST-NET-1 : 192.0.2.0/24
> > TEST-NET-2 : 198.51.100.0/24
> > TEST-NET-3 : 203.0.113.0/24
> 
> (I found 192.83.123.89 in the postgres_fdw doc, but it'd be another issue..)
Fixed. If necessary we can create another thread for this.

> + if (strspn(tmp, "-0123456789 ") == strlen(tmp))
> 
> Do we need to bother spending another memory block for apparent non-digits
> here?
Yes. The characters are necessary to handle an issue reported in [1].
The issue happened when the user input a negative value: the length
comparison between strspn and strlen came out differently, and the
input value was recognized as seconds when no unit was given. This
led to a wrong error message for the user.

The addition of those characters solves the issue.

> + errmsg(INT64_FORMAT " ms
> is outside the valid range for parameter
> +\"%s\"",
> 
> We don't use INT64_FORMAT in translatable message strings. Cast then
> use %lld instead.
Thanks for teaching us. Fixed.

> This message looks unfriendly as it doesn't suggest the valid range, and it
> shows the input value in a different unit from what was in the input. I think
> we can spell it as "\"%s\" is outside the valid range for subscription
> parameter \"%s\" (0 ..  in milliseconds)"
Makes sense. I incorporated the valid range, aligned with the format of
recovery_min_apply_delay. FYI, the physical replication GUC doesn't write
the units for the range, as shown below; I followed and applied this style.

---
LOG:  -1 ms is outside the valid range for parameter "recovery_min_apply_delay" 
(0 .. 2147483647)
FATAL:  configuration file 
"/home/k5user/new/pg/l/make_v15/slave/postgresql.conf" contains errors
---

> + int64   min_apply_delay;
> ..
> + if (ms < 0 || ms > INT_MAX)
> 
> Why is the variable wider than required?
You are right. Fixed.

> + errmsg("%s and %s are mutually
> exclusive options",
> +"min_apply_delay > 0",
> "streaming = parallel"));
> 
> Mmm. Couldn't we refuse 0 as min_apply_delay?
Sorry, the previous patch's behavior wasn't consistent with this error message.

In the previous patch, if we ran ALTER SUBSCRIPTION with streaming =
parallel and min_apply_delay = 0 (changed from a positive value) at the
same time, the command failed, although it should succeed according to
this time-delayed feature's specification. We fixed this part by adding
some more checks in AlterSubscription().

By the way, we should allow users to change min_apply_delay to 0
whenever they want from a different value, so we didn't restrict
this kind of operation.

> + sub->minapplydelay > 0)
> ...
> + if (opts.min_apply_delay > 0 &&
> 
> Is there any reason for the differenciation?
Yes. The former checks the existing subscription configuration.
For example, if we run ALTER SUBSCRIPTION with streaming = 'parallel'
on a subscription created with min_apply_delay = '1 day', we
need to reject the command. The latter applies to new settings.


> +
>   errmsg("cannot set %s for subscription with %s",
> +
> "streaming = parallel", "min_apply_delay > 0"));
> 
> I think that this should be more like human-speaking. Say, "cannot enable
> min_apply_delay for subscription in parallel streaming mode" or something..
> The same is applicable to the nearby message.
Reworded the error messages. Please check.

> +static void maybe_delay_apply(TimestampTz ts);
> 
>   apply_spooled_messages(FileSet *stream_fileset, TransactionId xid,
> -XLogRecPtr lsn)
> +XLogRecPtr lsn, TimestampTz ts)
> 
> "ts" looks too generic. Couldn't it be more specific?
> We need an explanation for the parameter in the function comment.
Changed it to finish_ts, since it indicates commit/prepare time.
This terminology should be aligned with finish lsn.

> + if (!am_parallel_apply_worker())
> + {
> + Assert(ts > 0);
> + maybe_delay_apply(ts);
> 
> It seems to me better that the if condition and assertion are checked inside
> maybe_delay_apply().
Fixed.


[1] - 
https://www.postgresql.org/message-id/CALDaNm3Bpzhh60nU-keuGxMPb-OhcqsfpCN3ysfCfCJ-2ShYPA%40mail.gmail.com


Best Regards,
Takamichi Osumi



v15-0001-Time-

Re: PG11 to PG14 Migration Slowness

2023-01-12 Thread Tom Lane
Vigneshk Kvignesh  writes:
>   I'm migrating  our existing PG instances from PG11.4  to PG14.3. I
> have around 5 Million Tables in a single database. When migrating using
> pg_upgrade, it's taking 3 hours for the process to complete. I'm not sure if
> it's the intended behaviour or we're missing something here.

There was some work done in v15 to make pg_dump deal better with
zillions of tables.  Don't know if you can consider retargeting
to v15, or how much the speedups would help in your particular
situation.

Why are you using 14.3, when the current release is 14.6?

regards, tom lane




Re: daitch_mokotoff module

2023-01-12 Thread Paul Ramsey



> On Jan 12, 2023, at 7:30 AM, Dag Lem  wrote:
> 
> Paul Ramsey  writes:
> 
>> On Mon, Jan 2, 2023 at 2:03 PM Dag Lem  wrote:
>> 
>>> I also improved on the documentation example (using Full Text Search).
>>> AFAIK you can't make general queries like that using arrays, however in
>>> any case I must admit that text arrays seem like more natural building
>>> blocks than space delimited text here.
>> 
>> This is a fun addition to fuzzystrmatch.
> 
> I'm glad to hear it! :-)
> 
>> 
>> While it's a little late in the game, I'll just put it out there:
>> daitch_mokotoff() is way harder to type than soundex_dm(). Not sure
>> how you feel about that.
> 
> I chose the name in order to follow the naming of the other functions in
> fuzzystrmatch, which as far as I can tell are given the name which each
> algorithm is known by.
> 
> Personally I don't think it's worth it to deviate from the naming of the
> other functions just to avoid typing a few characters, and I certainly
> don't think daitch_mokotoff is any harder to get right than
> levenshtein_less_equal ;-)

Good points :)

> 
>> 
>> On the documentation, I found the leap directly into the tsquery
>> example a bit too big. Maybe start with a very simple example,
>> 
>> --
>> dm=# SELECT daitch_mokotoff('Schwartzenegger'),
>>daitch_mokotoff('Swartzenegger');
>> 
>> daitch_mokotoff | daitch_mokotoff
>> -+-
>> {479465}| {479465}
>> --
>> 
>> Then transition into a more complex example that illustrates the GIN
>> index technique you mention in the text, but do not show:
>> 
>> --
>> CREATE TABLE dm_gin (source text, dm text[]);
>> 
>> INSERT INTO dm_gin (source) VALUES
>>('Swartzenegger'),
>>('John'),
>>('James'),
>>('Steinman'),
>>('Steinmetz');
>> 
>> UPDATE dm_gin SET dm = daitch_mokotoff(source);
>> 
>> CREATE INDEX dm_gin_x ON dm_gin USING GIN (dm);
>> 
>> SELECT * FROM dm_gin WHERE dm && daitch_mokotoff('Schwartzenegger');
>> --
> 
> Sure, I can do that. You don't think this much example text will be
> TL;DR?

I can only speak for myself, but examples are the meat of documentation 
learning, so as long as they come with enough explanatory context to be legible 
it's worth having them, IMO.

> 
>> 
>> And only then go into the tsearch example. Incidentally, what does the
>> tsearch approach provide that the simple GIN approach does not?
> 
> The example shows how to do a simultaneous match on first AND last
> names, where the first and last names (any number of names) are stored
> in the same indexed column, and the order of the names in the index and
> the search term does not matter.
> 
> If you were to use the GIN "&&" operator, you would get a match if
> either the first OR the last name matches. If you were to use the GIN
> "@>" operator, you would *not* get a match if the search term contains
> more soundex codes than the indexed name.
> 
> E.g. this yields a correct match:
> SELECT soundex_tsvector('John Yamson') @@ soundex_tsquery('John Jameson');
> 
> While this yields a false positive:
> SELECT (daitch_mokotoff('John') || daitch_mokotoff('Yamson')) && 
> (daitch_mokotoff('John') || daitch_mokotoff('Doe'));
> 
> And this yields a false negative:
> SELECT (daitch_mokotoff('John') || daitch_mokotoff('Yamson')) @> 
> (daitch_mokotoff('John') || daitch_mokotoff('Jameson'));
> 
> This may be explained better by simply showing the output of
> soundex_tsvector and soundex_tsquery:
> 
> SELECT soundex_tsvector('John Yamson');
> soundex_tsvector 
> --
> '16':1 '164600':3 '46':2
> 
> SELECT soundex_tsquery('John Jameson');
>  soundex_tsquery  
> ---
> ( '16' | '46' ) & ( '164600' | '464600' )
> 
>> Ideally explain that briefly before launching into the example. With
>> all the custom functions and so on it's a little involved, so maybe if
>> there's not a huge win in using that approach drop it entirely?
> 
> I believe this functionality is quite useful, and that it's actually
> what's called for in many situations. So, I'd rather not drop this
> example.

Sounds good

P

> 
>> 
>> ATB,
>> P
>> 
> 
> Best regards,
> 
> Dag Lem





RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-12 Thread Takamichi Osumi (Fujitsu)
Hi, Shveta


Thanks for your comments!
On Thursday, January 12, 2023 6:51 PM shveta malik  
wrote:
> > Yes, DBAs may set wal_receiver_status_interval to more than
> > wal_sender_timeout by mistake.
> >
> > But to handle the scenario we must compare between min_apply_delay *on
> > subscriber* and wal_sender_timeout *on publisher*. Both values are not
> > transferred to opposite sides, so the WARNING cannot be raised. I
> > considered that such a mechanism seemed to be complex. The discussion
> around [1] may be useful.
> >
> > [1]:
> >
> https://www.postgresql.org/message-id/CAA4eK1Lq%2Bh8qo%2BrqGU-E%2B
> hwJK
> > AHYocV54y4pvou4rLysCgYD-g%40mail.gmail.com
> >
> 
> okay, I see. So even when 'wal_receiver_status_interval' is set to 0, no
> log/warning is needed when the user tries to set min_apply_delay>0?
> Are we good with doc alone?
Yes. As far as I can remember, we don't emit a log message or warning
for this kind of combination of parameters (in the context of
timeouts either). So, it should be fine.


> One trivial correction in config.sgml:
> +   terminates due to the timeout errors. Hence, make sure this parameter
> +   shorter than the wal_sender_timeout of the
> publisher.
> Hence, make sure this parameter is shorter...  
Fixed.

Kindly have a look at the latest patch shared in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB83739C6133B50DDA8BAD1601EDFD9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi





Re: on placeholder entries in view rule action query's range table

2023-01-12 Thread Justin Pryzby
On Thu, Jan 12, 2023 at 09:54:09AM -0500, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 2023-01-12 Th 00:12, Justin Pryzby wrote:
> >> It's ugly and a terrible hack, and I don't know whether anyone would say
>> it's good enough, but one could probably avoid the diff like:
> >> sed -r '/CREATE/,/^$/{ s/\w+\.//g }'
> 
> > That looks quite awful. I don't think you could persuade me to deploy it
> > (We don't use sed anyway). It might be marginally better if the pattern
> > were /CREATE.*VIEW/ and we ignored that first line, but it still seems
> > awful to me.
> 
> Yeah, does not sound workable: it would risk ignoring actual problems.
> 
> I was wondering whether we could store a per-version patch or Perl
> script that edits the old dump file to remove known discrepancies
> from HEAD.  If well-maintained, that could eliminate the need for the
> arbitrary "fuzz factors" that are in TestUpgradeXversion.pm right now.
> I'd really want these files to be kept in the community source tree,
> though, so that we do not need a new BF client release to change them.
> 
> This isn't the first time this has come up, but now we have a case
> where it's actually blocking development, so maybe it's time to
> make something happen.  If you want I can work on a patch for the
> BF client.

What about also including a dump from an old version, too ?
Then the upgrade test can test actual upgrades.

A new dump file would need to be updated at every release; the old ones
could stick around, maybe forever.

-- 
Justin




Re: on placeholder entries in view rule action query's range table

2023-01-12 Thread Tom Lane
Justin Pryzby  writes:
> What about also including a dump from an old version, too ?
> Then the upgrade test can test actual upgrades.

The BF clients already do that (if enabled), but they work from
up-to-date installations of the respective branch tips.  I'd not
want to have some branches including hypothetical output of
other branches, because it'd be too easy for those files to get
out of sync and deliver misleading answers.

regards, tom lane




RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-12 Thread Takamichi Osumi (Fujitsu)
Hi, Melih


On Thursday, January 12, 2023 10:12 PM Melih Mutlu  
wrote:
> I've a question about 032_apply_delay.pl.
> ...
> I couldn't quite see how these lines test whether ALTER SUBSCRIPTION 
> successfully worked.
> Don't we need to check that min_apply_delay really changed as a result?
Yeah, we should check it from the POV of the apply worker's debug logs.

The latest patch posted in [1] addressed your concern,
by checking the logged delay time in the server log.

I'd say what we could do is to check that the logged delay time is long
enough after the ALTER SUBSCRIPTION command.
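In TAP terms, such a check might look like this sketch (the subscription
name and log-message wording are assumed here, not taken from the actual
patch):

```perl
# Hypothetical sketch: verify the apply worker picked up the new delay
# by watching the subscriber's server log after ALTER SUBSCRIPTION.
$node_subscriber->safe_psql('postgres',
	"ALTER SUBSCRIPTION mysub SET (min_apply_delay = '3s')");

# Assumed log line; the exact text depends on the patch's elog() call.
$node_subscriber->wait_for_log(
	qr/logical replication apply delay: 3000 ms/);
```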

Please have a look at the patch.


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB83739C6133B50DDA8BAD1601EDFD9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi





Re: Add SHELL_EXIT_CODE to psql

2023-01-12 Thread Maxim Orlov
Unfortunately, cirrus-ci is still not happy:
https://cirrus-ci.com/task/6502730475241472

[23:14:34.206] time make -s -j${BUILD_JOBS} world-bin
[23:14:43.174] psqlscanslash.l: In function ‘evaluate_backtick’:
[23:14:43.174] psqlscanslash.l:827:11: error: implicit declaration of
function ‘WIFSTOPPED’ [-Werror=implicit-function-declaration]
[23:14:43.174] 827 | exit_code = WSTOPSIG(close_exit_code);
[23:14:43.174] | ^~
[23:14:43.174] psqlscanslash.l:828:16: error: implicit declaration of
function ‘WSTOPSIG’ [-Werror=implicit-function-declaration]
[23:14:43.174] 828 |
[23:14:43.174] | ^
[23:14:43.174] cc1: all warnings being treated as errors

>
and on FreeBSD

[23:13:03.914] cc -o ...
[23:13:03.914] ld: error: undefined symbol: WEXITSTATUS
[23:13:03.914] >>> referenced by command.c:5036
(../src/bin/psql/command.c:5036)
[23:13:03.914] >>>
src/bin/psql/psql.p/command.c.o:(exec_command_shell_escape)
[23:13:03.914] cc: error: linker command failed with exit code 1 (use -v to
see invocation)

and on Windows

[23:19:51.870] meson-generated_.._psqlscanslash.c.obj : error LNK2019:
unresolved external symbol WIFSTOPPED referenced in function
evaluate_backtick
[23:19:52.197] meson-generated_.._psqlscanslash.c.obj : error LNK2019:
unresolved external symbol WSTOPSIG referenced in function evaluate_backtick
[23:19:52.197] src\bin\psql\psql.exe : fatal error LNK1120: 2 unresolved
externals

I believe we need the proper includes.

-- 
Best regards,
Maxim Orlov.


Re: split TOAST support out of postgres.h

2023-01-12 Thread Peter Eisentraut

On 10.01.23 09:48, Peter Eisentraut wrote:

On 10.01.23 08:39, Noah Misch wrote:

On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:

On 30.12.22 17:50, Tom Lane wrote:

Peter Eisentraut  writes:

On 28.12.22 16:07, Tom Lane wrote:
I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?



See this incremental patch set.


Wow, 41 files requiring varatt.h is a lot fewer than I would have 
guessed.

I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.


committed


SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse.  I would revert this.


Well, that was sort of my thinking, but people seemed to like this.  I'm 
happy to consider alternatives.


Given the subsequent discussion, I'll keep it as is for now but consider 
it a semi-open item.  It's easy to change.






Re: Transparent column encryption

2023-01-12 Thread Peter Eisentraut

On 10.01.23 18:26, Mark Dilger wrote:

I wonder if logical replication could be made to work more easily with this 
feature.  Specifically, subscribers of encrypted columns will need the 
encrypted column encryption key (CEK) and the name of the column master key 
(CMK) as they exist on the publisher, but getting access to those is not automated 
as far as I can see. It doesn't come through automatically as part of a 
subscription, and publishers can't publish the pg_catalog tables where the 
keys are kept (because publishing system tables is not supported).  Is it 
reasonable to make available the CEK and CMK to subscribers in an automated 
fashion, to facilitate setting up logical replication with less manual 
distribution of key information?  Is this already done, and I'm just not 
recognizing that you've done it?


This would be done as part of DDL replication.


Can we do anything about the attack vector wherein a malicious DBA simply 
copies the encrypted datum from one row to another?


We discussed this earlier [0].  This patch is not that feature.  We 
could get there eventually, but it would appear to be an immense amount 
of additional work.  We have to start somewhere.



[0]: 
https://www.postgresql.org/message-id/4fbcf5540633699fc3d81ffb59cb0ac884673a7c.ca...@vmware.com






Re: drop postmaster symlink

2023-01-12 Thread Peter Eisentraut

On 23.11.22 21:32, Joe Conway wrote:

Yeah. Also, I don't think it's generally too hard to find the parent
process anyway, because at least on my system, the other ones end up
with ps display that looks like "postgres: logical replication
launcher" or whatever. The main process doesn't set the ps status
display, so that's the only one that shows a full path to the
executable in the ps status, which is how I usually spot it. That has
the advantage that it doesn't matter which name was used to launch it,
too.


I think it is a problem that one of the most widely used packagings of 
PostgreSQL uses techniques that are directly contradicting the 
PostgreSQL documentation and are also inconsistent with other widely 
used packagings.  Users might learn this "trick" but then can't reuse it 
elsewhere, and conversely those who come from other systems might not be 
able to reuse their scripts.  That is annoying.


FWIW, the reason I took note of the postmaster symlink in the first 
place a few years ago was because selinux treats execution of programs 
from symlinks differently than from actual files.


This is another such case, where knowledge about selinux configuration 
cannot be transported between Linux distributions.


I almost feel that issues like this make a stronger case for removing 
the postmaster symlink than if it hadn't actually been in use, since the 
removal would serve to unify the landscape for the benefit of users.





Re: Named Operators

2023-01-12 Thread Isaac Morland
On Thu, 12 Jan 2023 at 05:59, Gurjeet Singh  wrote:

I'll consider using one of the other special characters. Do you have
> any suggestions?
>

What about backticks (`)? They are allowed as operator characters but do
not otherwise appear in the lexical syntax as far as I can tell:
https://www.postgresql.org/docs/current/sql-syntax-lexical.html


Re: Remove source code display from \df+?

2023-01-12 Thread Isaac Morland
On Thu, 12 Jan 2023 at 10:04, Magnus Hagander  wrote:

We could shorten it to "See \sf" or something like that.  But if we change
>>> the column header to "internal name" or the like, then the column just
>>> obviously doesn't apply for non-internal languages, so leaving it null
>>> should be fine.
>>>
>>
>> +1
>>
>>
> Sure, that works for me as well. I agree the suggested text was way too
> long, I was more thinking of "something in this direction" rather than that
> exact text. But yes, with a change of names, we can leave it NULL as well.
>

Thanks everybody. So based on the latest discussion I will:

1) rename the column from “Source code” to “Internal name”; and
2) change the contents to NULL except when the language (identified by oid)
is INTERNAL or C.

Patch forthcoming, I hope.


Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-12 Thread Robert Haas
On Mon, Jan 9, 2023 at 8:40 PM Peter Geoghegan  wrote:
> That's not what the patch does. It doubles the time that the anti-wrap
> no-autocancellation behaviors kick in, up to a maximum of 1 billion
> XIDs/MXIDs. So it goes from autovacuum_freeze_max_age to
> autovacuum_freeze_max_age x 2, without changing the basic fact that we
> initially launch autovacuums that advance relfrozenxid/relminmxid when
> the autovacuum_freeze_max_age threshold is first crossed.

I'm skeptical about this kind of approach.

I do agree that it's good to slowly increase the aggressiveness of
VACUUM as we get further behind, rather than having big behavior
changes all at once, but I think that should happen by smoothly
varying various parameters rather than by making discrete behavior
changes at a whole bunch of different times. For instance, when VACUUM
goes into emergency mode, it stops respecting the vacuum delay. I
think that's great, but it happens all at once, and maybe it would be
better if it didn't. We could consider gradually ramping the vacuum
delay from 100% down to 0% instead of having it happen all at once.
Maybe that's not the right idea, I don't know, and a naive
implementation might be worse than nothing, but I think it has some
chance of being worth consideration.

But what the kind of change you're proposing here does is create
another threshold where the behavior changes suddenly, and I think
that's challenging from the point of view of understanding the
behavior of the system. The behavior already changes when you hit
vacuum_freeze_min_age and then again when you hit
vacuum_freeze_table_age and then there's also
autovacuum_freeze_max_age and xidWarnLimit and xidStopLimit and a few
others, and these settings all interact in pretty complex ways. The
more conditional logic we add to that, the harder it becomes to
understand what's actually happening. You see a system where
age(relfrozenxid) = 673m and you need a calculator and a spreadsheet
to figure out what the vacuum behavior is at that point. Honestly, I
think we already have a problem with the behaviors here being too
complex for normal human beings to understand them, and I think that
the kinds of changes you are proposing here could make that quite a
bit worse.
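For what it's worth, the starting point of that calculator exercise is
usually a catalog query along these lines (standard catalog columns, not
tied to this patch):

```sql
-- How far behind is each table, relative to the freeze threshold?
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 5;
```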

Now, you might reply to the above by saying, well, some behaviors
can't vary continuously. vacuum_cost_limit can perhaps be phased out
gradually, but autocancellation seems like something that you must
either do, or not do. I would agree with that. But what I'm saying is
that we ought to favor having those kinds of behaviors all engage at
the same point rather than at different times. I'm not saying that
there can't ever be good reasons to separate out different behaviors
and have them engage at different times, but I think we will end up
better off if we minimize that sort of thing as much as we reasonably
can. In your opening email you write "Why should the
PROC_VACUUM_FOR_WRAPAROUND behavior happen on *exactly* the same
timeline as the one used to launch an antiwraparound autovacuum,
though?" and my answer is "because that's easier to understand and I
don't see that it has much of a downside."

I did take a look at the post-mortem to which you linked, but I am not
quite sure how that bears on the behavior change under discussion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Named Operators

2023-01-12 Thread Tom Lane
Isaac Morland  writes:
> What about backticks (`)? They are allowed as operator characters but do
> not otherwise appear in the lexical syntax as far as I can tell:
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html

Since they're already allowed as operator characters, you can't
use them for this purpose without breaking existing use-cases.
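For example, something like this is presumably already legal today
(illustrative operator name), so reserving backticks would break it:

```sql
-- A user-defined operator whose name contains a backtick
CREATE OPERATOR |` (FUNCTION = int4pl, LEFTARG = int, RIGHTARG = int);
SELECT 1 |` 2;   -- 3
```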

Even if they were completely unused, I'd be pretty hesitant to
adopt them for this purpose because of the potential confusion
for users coming from mysql.

Pretty much the only available syntax space is curly braces,
and I don't really want to give those up for this either.
(One has to assume that the SQL committee has their eyes
on those too.)

regards, tom lane




Re: Refactor recordExtObjInitPriv()

2023-01-12 Thread Peter Eisentraut

On 12.01.23 01:04, Nathan Bossart wrote:

-classoid == AggregateRelationId ||

I noticed that AggregateRelationId isn't listed in the ObjectProperty
array, so I think recordExtObjInitPriv() will begin erroring for that
classoid instead of ignoring it like we do today.


Hmm, we do have some extensions in contrib that add aggregates (citext, 
intagg).  I suspect that the aggregate function is actually registered 
into the extension via its pg_proc entry, so this wouldn't actually 
matter.  But maybe the commenting should be clearer?






Re: Remove nonmeaningful prefixes in PgStat_* fields

2023-01-12 Thread Alvaro Herrera
On 2023-Jan-12, Drouvot, Bertrand wrote:

> Please find attached a patch to $SUBJECT.
> 
> It is a preliminary patch for [1].
> 
> The main ideas are: 1) to have consistent naming between the pg_stat_get*() 
> functions
> and their associated counters and 2) to define the new macros in [1] the same 
> way as it
> has been done in 8018ffbf58 (aka not using the prefixes in the macros).

I don't like this at all.  With these prefixes in place, it's much more
likely that you'll be able to grep the whole source tree and not run
into tons of false positives.  If you remove them, that tends to be not
very workable.  If we use these commits as precedent for expanding this
sort of renaming across the tree, we'll soon end up with a very
grep-unfriendly code base.

The PGSTAT_ACCUM_DBCOUNT and PG_STAT_GET_DBENTRY macros are just one
argument away from being able to generate the variable name including
the prefix, anyway.  I don't know why we had to rename everything in
order to do 8018ffbf5895, and if I had my druthers, we'd undo that.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: on placeholder entries in view rule action query's range table

2023-01-12 Thread Andrew Dunstan


On 2023-01-12 Th 09:54, Tom Lane wrote:
>
> I was wondering whether we could store a per-version patch or Perl
> script that edits the old dump file to remove known discrepancies
> from HEAD.  If well-maintained, that could eliminate the need for the
> arbitrary "fuzz factors" that are in TestUpgradeXversion.pm right now.
> I'd really want these files to be kept in the community source tree,
> though, so that we do not need a new BF client release to change them.
>
> This isn't the first time this has come up, but now we have a case
> where it's actually blocking development, so maybe it's time to
> make something happen.  If you want I can work on a patch for the
> BF client.
>
>   


I wouldn't worry too much about the client for now. What we'd need is a)
a place in the source code where we know to find the module b) a module
name c) one or more functions to call to make the adjustment(s).

so, say in src/test/perl we have PostgreSQL/AdjustUpgrade.pm with a
subroutine adjust_dumpfile($oldversion, $dumpfile).

That would be fairly easy to look for and call, and a good place to
start. More ambitiously we might also provide a function to do most of
the pre_upgrade adjustments made in TestUpgradeXversion.pm at lines
405-604. But let's walk before we try to run. This is probably a good
time to be doing this as I want to push out a new release pretty soon to
deal with the up-to-date check issues.
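A minimal skeleton of that module might look like the following (names
per the suggestion above; the per-version rules are placeholders):

```perl
package PostgreSQL::AdjustUpgrade;

use strict;
use warnings;

use Exporter 'import';
our @EXPORT_OK = qw(adjust_dumpfile);

# Edit the dump text taken from $oldversion so that it matches what
# a fresh dump from HEAD would produce.
sub adjust_dumpfile
{
	my ($oldversion, $dump) = @_;

	if ($oldversion lt '16')
	{
		# placeholder: strip discrepancies known for pre-16 dumps
		$dump =~ s/known old spelling/new spelling/g;
	}
	return $dump;
}

1;
```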


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Refactor recordExtObjInitPriv()

2023-01-12 Thread Tom Lane
Peter Eisentraut  writes:
> On 12.01.23 01:04, Nathan Bossart wrote:
> -  classoid == AggregateRelationId ||
>> I noticed that AggregateRelationId isn't listed in the ObjectProperty
>> array, so I think recordExtObjInitPriv() will begin erroring for that
>> classoid instead of ignoring it like we do today.

> Hmm, we do have some extensions in contrib that add aggregates (citext, 
> intagg).  I suspect that the aggregate function is actually registered 
> into the extension via its pg_proc entry, so this wouldn't actually 
> matter.  But maybe the commenting should be clearer?

Yeah, I don't believe that AggregateRelationId is used in object
addresses; we just refer to pg_proc for any kind of function including
aggregates.  Note that there is no "oid" column in pg_aggregate.

regards, tom lane




Re: Remove nonmeaningful prefixes in PgStat_* fields

2023-01-12 Thread Tom Lane
Alvaro Herrera  writes:
> I don't like this at all.  With these prefixes in place, it's much more
> likely that you'll be able to grep the whole source tree and not run
> into tons of false positives.  If you remove them, that tends to be not
> very workable.  If we use these commits as precedent for expanding this
> sort of renaming across the tree, we'll soon end up with a very
> grep-unfriendly code base.

+1, that was my immediate fear as well.

regards, tom lane




Re: on placeholder entries in view rule action query's range table

2023-01-12 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-01-12 Th 09:54, Tom Lane wrote:
>> I was wondering whether we could store a per-version patch or Perl
>> script that edits the old dump file to remove known discrepancies
>> from HEAD.

> so, say in src/test/perl we have PostgreSQL/AdjustUpgrade.pm with a
> subroutine adjust_dumpfile($oldversion, $dumpfile).

Seems reasonable.  I was imagining per-old-version .pm files, but
there's likely to be a fair amount of commonality between what to
do for different old versions, so probably that approach would be
too duplicative.

> That would be fairly easy to look for and call, and a good place to
> start. More ambitiously we might also provide a function to do most of
> the pre_upgrade adjustments made in TestUpgradeXversion.pm at lines
> 405-604. But let's walk before we try to run.

I think that part is also very very important to abstract out of the
BF client.  We've been burnt on that before too.  So, perhaps one
subroutine that can apply updates to the source DB just before
we dump it, and then a second that can edit the dump file after?

We could imagine a third custom subroutine that abstracts the
actual dump file comparison, but I'd prefer to get to a place
where we just expect exact match after the edit step.

I'll work on a straw-man patch.

regards, tom lane




Re: Refactor recordExtObjInitPriv()

2023-01-12 Thread Nathan Bossart
On Thu, Jan 12, 2023 at 12:20:50PM -0500, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 12.01.23 01:04, Nathan Bossart wrote:
>> - classoid == AggregateRelationId ||
>>> I noticed that AggregateRelationId isn't listed in the ObjectProperty
>>> array, so I think recordExtObjInitPriv() will begin erroring for that
>>> classoid instead of ignoring it like we do today.
> 
>> Hmm, we do have some extensions in contrib that add aggregates (citext, 
>> intagg).  I suspect that the aggregate function is actually registered 
>> into the extension via its pg_proc entry, so this wouldn't actually 
>> matter.  But maybe the commenting should be clearer?
> 
> Yeah, I don't believe that AggregateRelationId is used in object
> addresses; we just refer to pg_proc for any kind of function including
> aggregates.  Note that there is no "oid" column in pg_aggregate.

Got it, thanks for clarifying.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Named Operators

2023-01-12 Thread Vik Fearing

On 1/12/23 18:14, Tom Lane wrote:


Pretty much the only available syntax space is curly braces,
and I don't really want to give those up for this either.
(One has to assume that the SQL committee has their eyes
on those too.)


They are used in row pattern recognition.
--
Vik Fearing





Re: What object types should be in schemas?

2023-01-12 Thread Alvaro Herrera
On 2023-Jan-11, Peter Eisentraut wrote:

> How does one decide whether something should be in a schema or not?  The
> current state feels intuitively correct, but I can't determine any firm way
> to decide.
> 
> Over in the column encryption thread, the patch proposes to add various key
> types as new object types.  For simplicity, I just stuck them directly under
> database, but I don't know whether that is correct.

I think one important criterion to think about is how does encryption work
when you have per-customer (or per-whatever) schemas.  Is the concept of
a column encryption [objtype] a thing that you would like to set up per
customer?  In that case, you will probably want that object to live in
that customer's schema.  Otherwise, you'll force the DBA to come up with
a naming scheme that includes the customer name in the column encryption
object.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)




Re: Remove nonmeaningful prefixes in PgStat_* fields

2023-01-12 Thread Andres Freund
Hi,

On 2023-01-12 18:12:52 +0100, Alvaro Herrera wrote:
> On 2023-Jan-12, Drouvot, Bertrand wrote:
>
> > Please find attached a patch to $SUBJECT.
> >
> > It is a preliminary patch for [1].
> >
> > The main ideas are: 1) to have consistent naming between the pg_stat_get*() 
> > functions
> > and their associated counters and 2) to define the new macros in [1] the 
> > same way as it
> > has been done in 8018ffbf58 (aka not using the prefixes in the macros).
>
> I don't like this at all.  With these prefixes in place, it's much more
> likely that you'll be able to grep the whole source tree and not run
> into tons of false positives.  If you remove them, that tends to be not
> very workable.  If we use these commits as precedent for expanding this
> sort of renaming across the tree, we'll soon end up with a very
> grep-unfriendly code base.

The problem with that is that the prefixes are used completely inconsistently
- and have been for a long time. And not just between the different type of
stats. Compare e.g. PgStat_TableCounts with PgStat_TableXactStatus and
PgStat_StatTabEntry. Whereas PgStat_FunctionCounts and PgStat_StatFuncEntry
both use it.  Right now there's no way to remember where to add the t_ prefix,
and where not.

IMO the reason to rename here isn't to abolish prefixes, it's to be halfway
consistent within nearby code. And the code overwhelmingly doesn't use the
prefixes.

Greetings,

Andres Freund




Re: Minimal logical decoding on standbys

2023-01-12 Thread Andres Freund
Hi,

On 2023-01-12 20:08:55 +0530, Ashutosh Sharma wrote:
> I previously participated in the discussion on "Synchronizing the
> logical replication slots from Primary to Standby" and one of the
> purposes of that project was to synchronize logical slots from primary
> to standby so that if failover occurs, it will not affect the logical
> subscribers of the old primary much. Can someone help me understand
> how we are going to solve this problem with this patch? Are we going
> to encourage users to do LR from standby instead of primary to get rid
> of such problems during failover?

It only provides a building block towards that. The "Synchronizing the logical
replication slots from Primary to Standby" project IMO needs all of the
infrastructure in this patch. With the patch, a logical rep solution can
e.g. maintain one slot on the primary and one on the standby, and occasionally
forward the slot on the standby to the position of the slot on the primary. In
case of a failover it can just start consuming changes from the former
standby, all the necessary changes are guaranteed to be present.
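As a concrete sketch of that "occasionally forward" step (slot name and
LSN here are made up):

```sql
-- On the standby: move its copy of the slot up to where the
-- primary's slot currently is, so a failover can resume from there.
SELECT pg_replication_slot_advance('failover_slot',
                                   '0/3000060'::pg_lsn);
```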


> Also, one small observation:
> 
> I just played around with the latest (v38) patch a bit and found that
> when a new logical subscriber of standby is created, it actually
> creates two logical replication slots for it on the standby server.
> May I know the reason for creating an extra replication slot other
> than the one created by create subscription command? See below:

That's unrelated to this patch. There's no changes to the "higher level"
logical replication code dealing with pubs and subs, it's all on the "logical
decoding" level.

I think this is because logical rep wants to be able to concurrently perform
ongoing replication, and synchronize tables added to the replication set. The
pg_16399_sync_16392_7187728548042694423 slot should vanish after the initial
synchronization.

Greetings,

Andres Freund




Re: recovery modules

2023-01-12 Thread Nathan Bossart
On Thu, Jan 12, 2023 at 03:30:40PM +0900, Michael Paquier wrote:
> On Wed, Jan 11, 2023 at 11:29:01AM -0800, Nathan Bossart wrote:
>> I initially created a separate basic_restore module, but decided to fold it
>> into basic_archive to simplify the patch and tests.  I hesitated to rename
>> it because it already exists in v15, and since it deals with creating and
>> restoring archive files, the name still seemed somewhat accurate.  That
>> being said, I don't mind renaming it if that's what folks want.
> 
> I've done that in the past for pg_verify_checksums -> pg_checksums, so
> I would not mind renaming it so as it reflects better its role.
> (Being outvoted is fine for me if this suggestion sounds bad).

IMHO I don't think there's an urgent need to rename it, but if there's a
better name that people like, I'm happy to do so.

> Saying that, 0001 seems fine on its own (minus the redo LSN/TLI with
> the duplication for the segment name build), so I would be tempted to
> get this one done.  My gut tells me that we'd better remove the
> duplication and just pass down the two fields to
> shell_archive_cleanup() and shell_recovery_end(), with the segment
> name given to ExecuteRecoveryCommand()..

I moved the duplicated logic to its own function in v6.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 078e81ce389680fb98a8b5c2fb6cfb3fb42f892e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 23 Dec 2022 16:35:25 -0800
Subject: [PATCH v6 1/3] Move the code to restore files via the shell to a
 separate file.

This is preparatory work for allowing more extensibility in this
area.
---
 src/backend/access/transam/Makefile|   1 +
 src/backend/access/transam/meson.build |   1 +
 src/backend/access/transam/shell_restore.c | 158 +
 src/backend/access/transam/xlog.c  |  37 +++--
 src/backend/access/transam/xlogarchive.c   | 120 +---
 src/include/access/xlogarchive.h   |   7 +-
 6 files changed, 197 insertions(+), 127 deletions(-)
 create mode 100644 src/backend/access/transam/shell_restore.c

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 661c55a9db..099c315d03 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -19,6 +19,7 @@ OBJS = \
 	multixact.o \
 	parallel.o \
 	rmgr.o \
+	shell_restore.o \
 	slru.o \
 	subtrans.o \
 	timeline.o \
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index 8920c1bfce..3031c2f6cf 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -7,6 +7,7 @@ backend_sources += files(
   'multixact.c',
   'parallel.c',
   'rmgr.c',
+  'shell_restore.c',
   'slru.c',
   'subtrans.c',
   'timeline.c',
diff --git a/src/backend/access/transam/shell_restore.c b/src/backend/access/transam/shell_restore.c
new file mode 100644
index 00..f52f0b92a4
--- /dev/null
+++ b/src/backend/access/transam/shell_restore.c
@@ -0,0 +1,158 @@
+/*-
+ *
+ * shell_restore.c
+ *
+ * These recovery functions use a user-specified shell command (e.g., the
+ * restore_command GUC).
+ *
+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/backend/access/transam/shell_restore.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+
+#include "access/xlogarchive.h"
+#include "access/xlogrecovery.h"
+#include "common/archive.h"
+#include "common/percentrepl.h"
+#include "storage/ipc.h"
+#include "utils/wait_event.h"
+
+static void ExecuteRecoveryCommand(const char *command,
+   const char *commandName, bool failOnSignal,
+   uint32 wait_event_info,
+   const char *lastRestartPointFileName);
+
+bool
+shell_restore(const char *file, const char *path,
+			  const char *lastRestartPointFileName)
+{
+	char	   *cmd;
+	int			rc;
+
+	/* Build the restore command to execute */
+	cmd = BuildRestoreCommand(recoveryRestoreCommand, path, file,
+			  lastRestartPointFileName);
+
+	ereport(DEBUG3,
+			(errmsg_internal("executing restore command \"%s\"", cmd)));
+
+	/*
+	 * Copy xlog from archival storage to XLOGDIR
+	 */
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+	rc = system(cmd);
+	pgstat_report_wait_end();
+
+	pfree(cmd);
+
+	/*
+	 * Remember, we rollforward UNTIL the restore fails so failure here is
+	 * just part of the process... that makes it difficult to determine
+	 * whether the restore failed because there isn't an archive to restore,
+	 * or because the administrator has specified the restore program
+	 * incorrectly.  We have to assume the former.
+	 *
+	 * However, if the failure was due to any sort of signal, it's best to
+	 * punt and abort recovery.  (If we "return false" here, upper levels will
+	 * assume that recovery is complete 

Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-12 Thread Andres Freund
Hi,

On 2023-01-12 08:38:57 +0100, Drouvot, Bertrand wrote:
> On 1/11/23 11:59 PM, Andres Freund wrote:
> > > Now that this patch renames some fields
> > 
> > I don't mind renaming the fields - the prefixes really don't provide 
> > anything
> > useful. But it's not clear why this is related to this patch? You could just
> > include the f_ prefix in the macro, no?
> > 
> > 
> 
> Right, but the idea is to take the same approach that the one used in 
> 8018ffbf58 (where placing the prefixes in the macro
> would have been possible too).

I'm not super happy about that patch tbh.


> > Probably should remove PgStat_BackendFunctionEntry.
> 
> I think that would be a 3rd patch, agree?

Yep.



> > I now see that PG_STAT_GET_DBENTRY_FLOAT8 already exists, defined the same
> > way. But the name feels misleading enough that I'd be inclined to rename 
> > it?
> > 
> 
> PG_STAT_GET_FUNCENTRY_FLOAT8_MS looks good by me. Waiting on what we'll 
> decide for
> the existing PG_STAT_GET_DBENTRY_FLOAT8 (so that I can align for the 
> PG_STAT_GET_FUNCENTRY_FLOAT8).

+1



> > Although I suspect this actually hints at an architectural thing that could 
> > be
> > fixed better: Perhaps we should replace find_tabstat_entry() with a version
> > returning a fully "reconciled" PgStat_StatTabEntry?  It feels quite wrong to
> > have that intimate knowledge of the subtransaction stuff in pgstatfuncs.c
> > and about how the different counts get combined.
> > 
> > I think that'd allow us to move the definition of PgStat_TableStatus to
> > PgStat_TableXactStatus, PgStat_TableCounts to pgstat_internal.h. Which 
> > feels a
> > heck of a lot cleaner.
> 
> Yeah, I think that would be for a 4th patch, agree?

Yea. I am of multiple minds about the ordering. I can see benefits on fixing
the architectural issue before reducing duplication in the accessor with a
macro. The reason is that if we addressed the architectural issue, the
difference between the xact and non-xact version will be very minimal, and
could even be generated by the same macro.

Greetings,

Andres Freund




Re: Using WaitEventSet in the postmaster

2023-01-12 Thread Andres Freund
Hi,

On 2023-01-12 20:35:43 +1300, Thomas Munro wrote:
> Subject: [PATCH] Fix WaitEventSetWait() buffer overrun.
> 
> The WAIT_USE_EPOLL and WAIT_USE_KQUEUE implementations of
> WaitEventSetWaitBlock() confused the size of their internal buffer with
> the size of the caller's output buffer, and could ask the kernel for too
> many events.  In fact the set of events retrieved from the kernel needs
> to be able to fit in both buffers, so take the minimum of the two.
> 
> The WAIT_USE_POLL and WAIT_USE_WIN32 implementations didn't have this
> confusion.

> This probably didn't come up before because we always used the same
> number in both places, but commit 7389aad6 calculates a dynamic size at
> construction time, while using MAXLISTEN for its output event buffer on
> the stack.  That seems like a reasonable thing to want to do, so
> consider this to be a pre-existing bug worth fixing.

> As reported by skink, valgrind and Tom Lane.
> 
> Discussion: https://postgr.es/m/901504.1673504836%40sss.pgh.pa.us

Makes sense. We should backpatch this, I think?

Greetings,

Andres Freund




Re: drop postmaster symlink

2023-01-12 Thread Joe Conway

On 1/12/23 12:00, Peter Eisentraut wrote:

On 23.11.22 21:32, Joe Conway wrote:

Yeah. Also, I don't think it's generally too hard to find the parent
process anyway, because at least on my system, the other ones end up
with ps display that looks like "postgres: logical replication
launcher" or whatever. The main process doesn't set the ps status
display, so that's the only one that shows a full path to the
executable in the ps status, which is how I usually spot it. That has
the advantage that it doesn't matter which name was used to launch it,
too.


I think it is a problem that one of the most widely used packagings of
PostgreSQL uses techniques that are directly contradicting the
PostgreSQL documentation and are also inconsistent with other widely
used packagings.  Users might learn this "trick" but then can't reuse it
elsewhere, and conversely those who come from other systems might not be
able to reuse their scripts.  That is annoying.

FWIW, the reason I took note of the postmaster symlink in the first 
place a few years ago was because selinux treats execution of programs 
from symlinks differently than from actual files.


This is another such case, where knowledge about selinux configuration
cannot be transported between Linux distributions.

I almost feel that issues like this make a stronger case for removing
the postmaster symlink than if it hadn't actually been in use, since the
removal would serve to unify the landscape for the benefit of users.


To be clear, I am completely in agreement with you about removing the 
symlink. I just wanted to be sure Devrim was alerted because I knew he 
had a strong opinion on this topic ;-)


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Named Operators

2023-01-12 Thread David G. Johnston
On Thu, Jan 12, 2023 at 10:14 AM Tom Lane  wrote:

> Isaac Morland  writes:
> > What about backticks (`)? They are allowed as operator characters but do
> > not otherwise appear in the lexical syntax as far as I can tell:
> > https://www.postgresql.org/docs/current/sql-syntax-lexical.html
>
> Since they're already allowed as operator characters, you can't
> use them for this purpose without breaking existing use-cases.
>
>
IIUC, specifically the fact that an operator is defined to start with one
of those symbols and end at the first non-symbol.  We can't change the
allowed set of non-symbols at this point, without defining something else
to denote the start of an operator.

David J.


Re: PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-12 Thread Nathan Bossart
On Thu, Jan 12, 2023 at 11:19:29PM +0800, Xing Guo wrote:
> I was running static analyser against PostgreSQL and found there're 2
> return statements in PL/Python module which is not safe. Patch is
> attached.

Is the problem that PG_exception_stack and error_context_stack aren't
properly reset?

> @@ -690,12 +690,12 @@ PLy_trigger_build_args(FunctionCallInfo fcinfo, 
> PLyProcedure *proc, HeapTuple *r
>   PyObject   *volatile pltdata = NULL;
>   char   *stroid;
>  
> + pltdata = PyDict_New();
> + if (!pltdata)
> + return NULL;
> +
>   PG_TRY();
>   {
> - pltdata = PyDict_New();
> - if (!pltdata)
> - return NULL;
> -
>   pltname = PLyUnicode_FromString(tdata->tg_trigger->tgname);
>   PyDict_SetItemString(pltdata, "name", pltname);
>   Py_DECREF(pltname);

There's another "return" later on in this PG_TRY block.  I wonder if it's
possible to detect this sort of thing at compile time.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Can we let extensions change their dumped catalog schemas?

2023-01-12 Thread Jacob Champion
On Wed, Jan 11, 2023 at 1:03 PM Tom Lane  wrote:
> Jacob Champion  writes:
> > Right, I think it would have to be opt-in. Say, a new control file
> > option dump_version or some such.
>
> That would require all the installed extensions to cope with this
> the same way, which does not seem like a great assumption.

How so? Most installed extensions would not opt into a version dump,
I'd imagine.

Or do you mean that the version dump would apply retroactively to
older versions of the extension, even if it wasn't needed in the past?

--Jacob




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-01-12 Thread mahendrakar s
Hi All,

Changes added to Jacob's patch(v2) as per the discussion in the thread.

The changes allow the customer to send the OAUTH BEARER token through the psql
connection string.

Example:
psql  -U u...@example.com -d 'dbname=postgres oauth_bearer_token=abc'

To configure OAUTH, the pg_hba.conf line looks like:
local   all all oauth
 provider=oauth_provider issuer="https://example.com" scope="openid email"

We also added hook to libpq to pass on the metadata about the issuer.

Thanks,
Mahendrakar.


On Sat, 17 Dec 2022 at 04:48, Jacob Champion 
wrote:
>
> On Mon, Dec 12, 2022 at 9:06 PM Andrey Chudnovsky
>  wrote:
> > If your concern is extension not honoring the DBA configured values:
> > Would a server-side logic to prefer HBA value over extension-provided
> > resolve this concern?
>
> Yeah. It also seals the role of the extension here as "optional".
>
> > We are definitely biased towards the cloud deployment scenarios, where
> > direct access to .hba files is usually not offered at all.
> > Let's find the middle ground here.
>
> Sure. I don't want to make this difficult in cloud scenarios --
> obviously I'd like for Timescale Cloud to be able to make use of this
> too. But if we make this easy for a lone DBA (who doesn't have any
> institutional power with the providers) to use correctly and securely,
> then it should follow that the providers who _do_ have power and
> resources will have an easy time of it as well. The reverse isn't
> necessarily true. So I'm definitely planning to focus on the DBA case
> first.
>
> > A separate reason for creating this pre-authentication hook is further
> > extensibility to support more metadata.
> > Specifically when we add support for OAUTH flows to libpq, server-side
> > extensions can help bridge the gap between the identity provider
> > implementation and OAUTH/OIDC specs.
> > For example, that could allow the Github extension to provide an OIDC
> > discovery document.
> >
> > I definitely see identity providers as institutional actors here which
> > can be given some power through the extension hooks to customize the
> > behavior within the framework.
>
> We'll probably have to make some compromises in this area, but I think
> they should be carefully considered exceptions and not a core feature
> of the mechanism. The gaps you point out are just fragmentation, and
> adding custom extensions to deal with it leads to further
> fragmentation instead of providing pressure on providers to just
> implement the specs. Worst case, we open up new exciting security
> flaws, and then no one can analyze them independently because no one
> other than the provider knows how the two sides work together anymore.
>
> Don't get me wrong; it would be naive to proceed as if the OAUTHBEARER
> spec were perfect, because it's clearly not. But if we need to make
> extensions to it, we can participate in IETF discussions and make our
> case publicly for review, rather than enshrining MS/GitHub/Google/etc.
> versions of the RFC and enabling that proliferation as a Postgres core
> feature.
>
> > Obtaining a token is an asynchronous process with a human in the loop.
> > Not sure if expecting a hook function to return a token synchronously
> > is the best option here.
> > Can that be an optional return value of the hook in cases when a token
> > can be obtained synchronously?
>
> I don't think the hook is generally going to be able to return a token
> synchronously, and I expect the final design to be async-first. As far
> as I know, this will need to be solved for the builtin flows as well
> (you don't want a synchronous HTTP call to block your PQconnectPoll
> architecture), so the hook should be able to make use of whatever
> solution we land on for that.
>
> This is hand-wavy, and I don't expect it to be easy to solve. I just
> don't think we have to solve it twice.
>
> Have a good end to the year!
> --Jacob


v3-0001-libpq-add-OAUTHBEARER-SASL-mechanism-and-call-back-hooks.patch.gz
Description: application/gzip


v3-0002-backend-add-OAUTHBEARER-SASL-mechanishm.patch.gz
Description: application/gzip


v3-0004-common-jsonapi-support-FRONTEND-clients.patch.gz
Description: application/gzip


v3-0003-simple-oauth_provider-extension.patch.gz
Description: application/gzip


Re: drop postmaster symlink

2023-01-12 Thread Devrim Gündüz
Hi,

On Thu, 2023-01-12 at 13:35 -0500, Joe Conway wrote:
> To be clear, I am completely in agreement with you about removing the
> symlink. I just wanted to be sure Devrim was alerted because I knew
> he  had a strong opinion on this topic ;-)

Red Hat's own packages, and thus their users, may be unhappy about that
too. They also call postmaster directly.

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-12 Thread Peter Geoghegan
On Thu, Jan 12, 2023 at 9:12 AM Robert Haas  wrote:
> I do agree that it's good to slowly increase the aggressiveness of
> VACUUM as we get further behind, rather than having big behavior
> changes all at once, but I think that should happen by smoothly
> varying various parameters rather than by making discrete behavior
> changes at a whole bunch of different times.

In general I tend to agree, but, as you go on to acknowledge yourself,
this particular behavior is inherently discrete. Either the
PROC_VACUUM_FOR_WRAPAROUND behavior is in effect, or it isn't.

In many important cases the only kind of autovacuum that ever runs
against a certain big table is antiwraparound autovacuum. And
therefore every autovacuum that runs against the table must
necessarily not be auto cancellable. These are the cases where we see
disastrous interactions with automated DDL, such as a TRUNCATE run by
a cron job (to stop those annoying antiwraparound autovacuums) -- a
heavyweight lock traffic jam that causes the application to lock up.

All that I really want to do here is give an autovacuum that *can* be
auto cancelled *some* non-zero chance to succeed with these kinds of
tables. TRUNCATE completes immediately, so the AEL is no big deal.
Except when it's blocked behind an antiwraparound autovacuum. That
kind of interaction is occasionally just disastrous. Even just the
tiniest bit of wiggle room could avoid it in most cases, possibly even
almost all cases.

> Maybe that's not the right idea, I don't know, and a naive
> implementation might be worse than nothing, but I think it has some
> chance of being worth consideration.

It's a question of priorities. The failsafe isn't supposed to be used
(when it is it is a kind of a failure), and so presumably only kicks
in on very rare occasions, where nobody was paying attention anyway.
So far I've heard no complaints about this, but I've heard lots of
complaints about the antiwrap autocancellation behavior.

> The behavior already changes when you hit
> vacuum_freeze_min_age and then again when you hit
> vacuum_freeze_table_age and then there's also
> autoovacuum_freeze_max_age and xidWarnLimit and xidStopLimit and a few
> others, and these setting all interact in pretty complex ways. The
> more conditional logic we add to that, the harder it becomes to
> understand what's actually happening.

In general I strongly agree. In fact that's a big part of what
motivates my ongoing work on VACUUM. The user experience is central.

As Andres pointed out, presenting antiwraparound autovacuums as kind
of an emergency thing but also somehow a routine thing is just
horribly confusing. I want to make them into an emergency thing in
every sense -- something that you as a user can reasonably expect to
never see (like the failsafe). But if you do see one, then that's a
useful signal of an underlying problem with contention, say from
automated DDL that pathologically cancels autovacuums again and again.

> Now, you might reply to the above by saying, well, some behaviors
> can't vary continuously. vacuum_cost_limit can perhaps be phased out
> gradually, but autocancellation seems like something that you must
> either do, or not do. I would agree with that. But what I'm saying is
> that we ought to favor having those kinds of behaviors all engage at
> the same point rather than at different times.

Right now aggressive VACUUMs do just about all freezing at the same
time, to the extent that many users seem to think that it's a totally
different thing with totally different responsibilities to regular
VACUUM. Doing everything at the same time like that causes huge
practical problems, and is very confusing.

I think that users will really appreciate having only one kind of
VACUUM/autovacuum (since the other patch gets rid of discrete
aggressive mode VACUUMs). I want "table age autovacuuming" (as I
propose to call it) come to be seen as not any different to any other
autovacuum, such as an "insert tuples" autovacuum or a "dead tuples"
autovacuum. The difference is only in how autovacuum.c triggers the
VACUUM, not in any runtime behavior. That's an important goal here.

> I did take a look at the post-mortem to which you linked, but I am not
> quite sure how that bears on the behavior change under discussion.

The post-mortem involved a single "DROP TRIGGER" that caused chaos
when it interacted with the auto cancellation behavior. It would
usually complete instantly, so the AEL wasn't actually disruptive,
but one day antiwraparound autovacuum made the cron job effectively
block all reads and writes for hours.

The similar outages I was called in to help with personally had either
an automated TRUNCATE or an automated CREATE INDEX. Had autovacuum
only been willing to yield once or twice, then it probably would have
been fine -- the situation probably would have worked itself out
naturally. That's the best outcome you can hope for.

--
Peter Geoghegan




Re: Transaction timeout

2023-01-12 Thread Nathan Bossart
On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote:
> I've rewritten this part to correctly report all timeouts that did
> happen. However there's now a tricky comma-formatting code which was
> tested only manually.

I suspect this will make translation difficult.

>> > > > + ahprintf(AH, "SET transaction_timeout = 0;\n");
>> > >
>> > > Hm - why is that the right thing to do?
>> > Because transaction_timeout has effects of statement_timeout.
>>
>> I guess it's just following precedent - but it seems a bit presumptuous
>> to just disable safety settings a DBA might have set up. That makes some
>> sense for e.g. idle_in_transaction_session_timeout, because I think
>> e.g. parallel backup can lead to a connection being idle for a bit.
> 
> I do not know. My reasoning - everywhere we turn off
> statement_timeout, we should turn off transaction_timeout too.
> But I have no strong opinion here. I left this code as is in the patch
> so far. For the same reason I did not change anything in
> pg_backup_archiver.c.

From 8383486's commit message:

We disable statement_timeout and lock_timeout during dump and restore,
to prevent any global settings that might exist from breaking routine
backups.

I imagine changing this could disrupt existing servers that depend on these
overrides during backups, although I think Andres has a good point about
disabling safety settings.  This might be a good topic for another thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Exposing the lock manager's WaitForLockers() to SQL

2023-01-12 Thread Andres Freund
Hi,

On 2023-01-11 23:03:30 -0800, Will Mortensen wrote:
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund  wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it.

I know that WaitForLockers() is an existing function :). I'm not sure it's
entirely suitable for your use case. So I mainly wanted to point out that if
you end up writing a separate version of it, you still need to integrate with
the deadlock detection. WaitForLockers() does that by actually acquiring a
lock on the "transaction" it's waiting for.


> To my very limited understanding, from looking at the existing callers and
> the implementation of LOCK, that would look something like this (assuming
> we're in a SQL command like LOCK and calling unmodified WaitForLockers()
> with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.

I suspect that keeping the AccessShareLock while doing the WaitForLockers() is
likely to increase the deadlock risk noticeably.  I think for the use case you
might get away with resolving the relation names, building the locktags, and
then release the lock, before calling WaitForLockers. If somebody drops the
table or such, you'd presumably still get desired behaviour that way, without
the increased deadlock risk.

Greetings,

Andres Freund




Re: Transaction timeout

2023-01-12 Thread Andrey Borodin
On Thu, Jan 12, 2023 at 11:24 AM Nathan Bossart
 wrote:
>
> On Sun, Dec 18, 2022 at 12:53:31PM -0800, Andrey Borodin wrote:
> > I've rewritten this part to correctly report all timeouts that did
> > happen. However there's now a tricky comma-formatting code which was
> > tested only manually.
>
> I suspect this will make translation difficult.
I use special functions for this like _()

char* lock_reason = lock_timeout_occurred ? _("lock timeout") : "";

and then
ereport(ERROR, (errcode(err_code),
 errmsg("canceling statement due to %s%s%s%s%s", lock_reason, comma1,
 stmt_reason, comma2, tx_reason)));

I hope it will be translatable...

> >> > > > + ahprintf(AH, "SET transaction_timeout = 0;\n");
> >> > >
> >> > > Hm - why is that the right thing to do?
> >> > Because transaction_timeout has effects of statement_timeout.
> >>
> >> I guess it's just following precedent - but it seems a bit presumptuous
> >> to just disable safety settings a DBA might have set up. That makes some
> >> sense for e.g. idle_in_transaction_session_timeout, because I think
> >> e.g. parallel backup can lead to a connection being idle for a bit.
> >
> > I do not know. My reasoning - everywhere we turn off
> > statement_timeout, we should turn off transaction_timeout too.
> > But I have no strong opinion here. I left this code as is in the patch
> > so far. For the same reason I did not change anything in
> > pg_backup_archiver.c.
>
> From 8383486's commit message:
>
> We disable statement_timeout and lock_timeout during dump and restore,
> to prevent any global settings that might exist from breaking routine
> backups.
>
> I imagine changing this could disrupt existing servers that depend on these
> overrides during backups, although I think Andres has a good point about
> disabling safety settings.  This might be a good topic for another thread.
>
+1.

Thanks for the review!

Best regards, Andrey Borodin.




Re: GUC for temporarily disabling event triggers

2023-01-12 Thread Daniel Gustafsson
> On 11 Jan 2023, at 17:38, vignesh C  wrote:
> 
> On Tue, 29 Nov 2022 at 18:16, Daniel Gustafsson  wrote:
>> 
>>> On 3 Nov 2022, at 21:47, Daniel Gustafsson  wrote:
>> 
>>> The patch adds a new GUC, ignore_event_trigger with two option values, 'all'
>>> and 'none' (the login event patch had 'login' as well).
>> 
>> The attached v2 fixes a small bug which caused testfailures the CFBot.
> 
> The patch does not apply on top of HEAD as in [1], please post a rebased 
> patch:

The attached rebased v3 fixes the conflict.

--
Daniel Gustafsson   https://vmware.com/



v3-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


errdetail vs errdetail_log?

2023-01-12 Thread Christophe Pettus
What's the distinction between errdetail and errdetail_log in the ereport 
interface?



Re: errdetail vs errdetail_log?

2023-01-12 Thread Andres Freund
On 2023-01-12 12:28:39 -0800, Christophe Pettus wrote:
> What's the distinction between errdetail and errdetail_log in the ereport 
> interface?

Only goes to the server log, not to the client.




Re: GUC for temporarily disabling event triggers

2023-01-12 Thread Ted Yu
On Thu, Jan 12, 2023 at 12:26 PM Daniel Gustafsson  wrote:

> > On 11 Jan 2023, at 17:38, vignesh C  wrote:
> >
> > On Tue, 29 Nov 2022 at 18:16, Daniel Gustafsson  wrote:
> >>
> >>> On 3 Nov 2022, at 21:47, Daniel Gustafsson  wrote:
> >>
> >>> The patch adds a new GUC, ignore_event_trigger with two option values,
> 'all'
> >>> and 'none' (the login event patch had 'login' as well).
> >>
> >> The attached v2 fixes a small bug which caused testfailures the CFBot.
> >
> > The patch does not apply on top of HEAD as in [1], please post a rebased
> patch:
>
> The attached rebased v3 fixes the conflict.
>
> --
> Daniel Gustafsson   https://vmware.com/
>
> Hi,

`this GUC allows to temporarily suspending event triggers.`

It would be better to mention the name of the GUC in the description.
Typo: suspending -> suspend

w.r.t. guc `ignore_event_trigger`, since it is supposed to disable event
triggers for a short period of time, is there a mechanism to turn it off
(IGNORE_EVENT_TRIGGER_ALL) automatically?

Cheers


Re: errdetail vs errdetail_log?

2023-01-12 Thread Christophe Pettus



> On Jan 12, 2023, at 12:35, Andres Freund  wrote:
> 
> On 2023-01-12 12:28:39 -0800, Christophe Pettus wrote:
>> What's the distinction between errdetail and errdetail_log in the ereport 
>> interface?
> 
> Only goes to the server log, not to the client.

Thanks!



Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-12 Thread Robert Haas
On Thu, Jan 12, 2023 at 2:22 PM Peter Geoghegan  wrote:
> All that I really want to do here is give an autovacuum that *can* be
> auto cancelled *some* non-zero chance to succeed with these kinds of
> tables. TRUNCATE completes immediately, so the AEL is no big deal.
> Except when it's blocked behind an antiwraparound autovacuum. That
> kind of interaction is occasionally just disastrous. Even just the
> tiniest bit of wiggle room could avoid it in most cases, possibly even
> almost all cases.

I doubt it. Wiggle room that's based on the XID threshold being
different for one behavior vs. another can easily fail to produce any
benefit, because there's no guarantee that the autovacuum launcher
will ever try to launch a worker against that table while the XID is
in the range where you'd get one behavior and not the other.  I've
long thought that the fact that vacuum_freeze_table_age is documented
as capped at 0.95 * autovacuum_freeze_max_age is silly for just this
reason. The interval that you're proposing is much wider so the
chances of getting a benefit are greater, but supposing that it's
going to solve it in most cases seems like an exercise in unwarranted
optimism.

In fact, I would guess that in fact it will very rarely solve the
problem. Normally, the XID age of a table never reaches
autovacuum_freeze_max_age in the first place. If it does, there's some
reason. Maybe there's a really old open transaction or an abandoned
replication slot or an unresolved 2PC transaction. Maybe the
autovacuum system is overloaded and no table is getting visited
regularly because the system just can't keep up. Or maybe there are
regular AELs being taken on the table at issue. If there's only an AEL
taken against a table once in blue moon, some autovacuum attempt ought
to succeed before we reach autovacuum_freeze_max_age. Flipping that
around, if we reach autovacuum_freeze_max_age without advancing
relfrozenxid, and an AEL shows up behind us in the lock queue, it's
really likely that the reason *why* we've reached
autovacuum_freeze_max_age is that this same thing has happened to
every previous autovacuum attempt and they all cancelled themselves.
If we cancel ourselves too, we're just postponing resolution of the
problem to some future point when we decide to stop cancelling
ourselves. That's not a win.

> I think that users will really appreciate having only one kind of
> VACUUM/autovacuum (since the other patch gets rid of discrete
> aggressive mode VACUUMs). I want "table age autovacuuming" (as I
> propose to call it) to come to be seen as no different from any other
> autovacuum, such as an "insert tuples" autovacuum or a "dead tuples"
> autovacuum. The difference is only in how autovacuum.c triggers the
> VACUUM, not in any runtime behavior. That's an important goal here.

I don't agree with that goal. I think that having different kinds of
autovacuums with different, identifiable names and corresponding,
easily-identifiable behaviors is really important for troubleshooting.
Trying to remove those distinctions and make everything look the same
will not keep autovacuum from getting itself into trouble. It will
just make it harder to understand what's happening when it does.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Using WaitEventSet in the postmaster

2023-01-12 Thread Thomas Munro
On Fri, Jan 13, 2023 at 7:26 AM Andres Freund  wrote:
> On 2023-01-12 20:35:43 +1300, Thomas Munro wrote:
> > Subject: [PATCH] Fix WaitEventSetWait() buffer overrun.

> Makes sense. We should backpatch this, I think?

Done.




Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2023-01-12 Thread Peter Geoghegan
On Thu, Jan 12, 2023 at 1:08 PM Robert Haas  wrote:
> I doubt it. Wiggle room that's based on the XID threshold being
> different for one behavior vs. another can easily fail to produce any
> benefit, because there's no guarantee that the autovacuum launcher
> will ever try to launch a worker against that table while the XID is
> in the range where you'd get one behavior and not the other.

Of course it's true that in general it might not succeed in
forestalling the auto cancellation behavior. You can say something
similar about approximately anything like this. For example, there is
no absolute guarantee that any autovacuum will ever complete. But we
still try!

> I've long thought that the fact that vacuum_freeze_table_age is documented
> as capped at 0.95 * autovacuum_freeze_max_age is silly for just this
> reason. The interval that you're proposing is much wider so the
> chances of getting a benefit are greater, but supposing that it's
> going to solve it in most cases seems like an exercise in unwarranted
> optimism.

I don't claim to be dealing in certainties, especially about the final
outcome. Whether or not you accept my precise claim is perhaps not
important, in the end. What is important is that we give things a
chance to succeed, based on the information that we have available,
with a constant eye towards avoiding disaster scenarios.

Some of the problems with VACUUM seem to be cases where VACUUM takes
on a potentially ruinous obligation, that it cannot possibly meet in
some rare cases that do come up sometimes -- like the cleanup lock
behavior. Is a check for $1000 written by me really worth less than a
check written by me for a billion dollars? They're both nominally
equivalent guarantees about an outcome, after all, though one has a
far greater monetary value. Which would you value more, subjectively?

Nothing is guaranteed -- even (and perhaps especially) strong guarantees.

> In fact, I would guess that it will very rarely solve the
> problem. Normally, the XID age of a table never reaches
> autovacuum_freeze_max_age in the first place. If it does, there's some
> reason.

Probably, but none of this matters at all if the table age never
reaches autovacuum_freeze_max_age in the first place. We're only
talking about tables where that isn't the case, by definition.
Everything else is out of scope here.

> Maybe there's a really old open transaction or an abandoned
> replication slot or an unresolved 2PC transaction. Maybe the
> autovacuum system is overloaded and no table is getting visited
> regularly because the system just can't keep up. Or maybe there are
> regular AELs being taken on the table at issue.

Maybe an asteroid hits the datacenter, making all of these
considerations irrelevant. But perhaps it won't!

> If there's only an AEL
> taken against a table once in a blue moon, some autovacuum attempt ought
> to succeed before we reach autovacuum_freeze_max_age. Flipping that
> around, if we reach autovacuum_freeze_max_age without advancing
> relfrozenxid, and an AEL shows up behind us in the lock queue, it's
> really likely that the reason *why* we've reached
> autovacuum_freeze_max_age is that this same thing has happened to
> every previous autovacuum attempt and they all cancelled themselves.

Why do you assume that a previous autovacuum ever got launched in the
first place? There is always going to be a certain kind of table that
can only get an autovacuum when its table age crosses
autovacuum_freeze_max_age. And it's not just static tables -- there is
very good reason to have doubts about the statistics that drive
autovacuum. Plus vacuum_freeze_table_age works very unreliably (which
is why my big VACUUM patch more or less relegates it to a
compatibility option, while retaining a more sophisticated notion of
table age creating pressure to advance relfrozenxid).

Under the scheme from this autovacuum patch, it really does become
reasonable to make a working assumption that there was a previous
autovacuum, that failed (likely due to the autocancellation behavior,
as you said). We must have tried and failed in an earlier autovacuum,
once we reach the point of needing an antiwraparound autovacuum
(meaning a table age autovacuum which cannot be autocancelled) --
which is not the case today at all. If nothing else, table age
autovacuums will have been scheduled much earlier on -- they will have
at least started up, barring pathological cases.

That's a huge difference in the strength of the signal, compared to today.

The super aggressive autocancellation behavior is actually
proportionate to the problem at hand. Kind of like how if you go to
the doctor and tell them you have a headache, they don't schedule you
for emergency brain surgery. What they do is tell you to take an
aspirin, and make sure that you stay well hydrated -- if the problem
doesn't go away after a few days, then call back, reassess. Perhaps it
really will be a brain tumor, but there is nothing to gain and
everything to lose.

Re: Patch: Global Unique Index

2023-01-12 Thread Cary Huang

On 2022-11-29 6:16 p.m., Tom Lane wrote:

Assuming that you are inserting into index X, and you've checked
index Y to find that it has no conflicts, what prevents another
backend from inserting a conflict into index Y just after you look?
AIUI the idea is to prevent that by continuing to hold an exclusive
lock on the whole index Y until you've completed the insertion.
Perhaps there's a better way to do that, but it's not what was
described.


During inserts, the global unique index patch does not acquire an exclusive
lock on the whole index Y while checking it for uniqueness; it
acquires a low-level AccessShareLock on Y and releases it after
checking. So while it is checking, another backend can still insert a
duplicate into index Y. If that happens, a "transaction level lock"
wait will be triggered.


For example.

Say backend A inserts into index X, and checks index Y to find no 
conflict, and backend B inserts a conflict into index Y right after. In 
this case, backend B still has to check index X for conflicts, and it will
fetch the duplicate tuple that has been inserted by A, but it cannot
declare a duplicate error yet, because the transaction in which backend A
inserted the conflicting tuple is still in progress. At this
moment, backend B has to wait for backend A to commit / abort before it 
can continue. This is how "transaction level lock" prevents concurrent 
insert conflicts.


There is a chance of deadlock if the conflicting insertions done by A 
and B happen at roughly the same time, where both backends trigger 
"transaction level lock" to wait for each other to commit/abort. If this 
is the case, PG's deadlock detection code will error out one of the 
backends.  It should be okay because it means one of the backends tries 
to insert a conflict. The purpose of a global unique index is also to 
error out backends trying to insert duplicates. In the end the effects 
are the same, it's just that the error says deadlock detected instead of 
duplicate detected.


If backend B did not insert a conflicting tuple, no transaction lock 
wait will be triggered, and therefore no deadlock will happen.


Regards
Cary Huang
---
HighGo Software Canada







Experimenting with Postmaster variable scope

2023-01-12 Thread Thomas Munro
Hi,

While working on the postmaster latch stuff, one of the things I
looked into, but de-scoped for now, is how the postmaster code would
look if it didn't use global variables to track its sockets, children
and state (ie now that it's no longer necessary for technical
reasons).  Here's the quick experimental patch I came up with that
lifts most of the global variables out of postmaster.c and puts them
into a struct Postmaster, which is allocated in the postmaster and
freed in forked children.  Then 'pm' gets passed around to postmaster
subroutines and all references to X are replaced with pm->X (so
pm->ListenSockets, pm->WalWriterPid, pm->WalReceiverRequested, etc).

Unfortunately bgworker.c isn't quite so easy to refactor along these
lines, because its list of background workers, which you might think
should be in Postmaster private memory in the Postmaster struct much
like pm->BackendList, also needs to be accessible globally for
extensions to be able to register them in their init hook.  Perhaps
there should be separate 'running' and 'registered' worker lists.
That stopped me in my tracks (decisions are so much harder than
mechanical changes...), but I thought I'd share this concept patch
anyway...  This is not a proposal for 16, more of a sketch to see what
people's appetite is for global variable removal projects, which
(IMHO) increase clarity about module boundaries, but I guess also have
back-patching and code churn costs.
From acc6b4a8961b93c7f07a09181a4369c01a2685a8 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 27 Dec 2022 03:46:56 +1300
Subject: [PATCH] Move postmaster's state out of global variables.

Historically, the postmaster's state for tracking its children had to
live in global variables, because it had to be accessible from signal
handlers.

After commit 7389aad6, we can shift most of it into a struct that is
allocated on the heap, and freed in child processes.

This is a mechanical change: variable names remain the same, they're
just moved into a Postmaster struct and accessed as pm->X rather than X,
with pm passed down to relevant subroutines.

XXX This is not complete.  There is more postmaster state in bgworker.c.
The problem is that BackgroundWorkerList probably belongs in Postmaster
struct, as far as tracking running bgworkers goes, but it's also
accessed by RegisterBackgroundWorker(), called by extensions, to
register bgworkers that should be running.

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f5ea381c53..c983f4aef2 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -304,8 +304,8 @@ static WorkerInfo MyWorkerInfo = NULL;
 int			AutovacuumLauncherPid = 0;
 
 #ifdef EXEC_BACKEND
-static pid_t avlauncher_forkexec(void);
-static pid_t avworker_forkexec(void);
+static pid_t avlauncher_forkexec(Postmaster *pm);
+static pid_t avworker_forkexec(Postmaster *pm);
 #endif
 NON_EXEC_STATIC void AutoVacWorkerMain(int argc, char *argv[]) pg_attribute_noreturn();
 NON_EXEC_STATIC void AutoVacLauncherMain(int argc, char *argv[]) pg_attribute_noreturn();
@@ -360,7 +360,7 @@ static void avl_sigusr2_handler(SIGNAL_ARGS);
  * Format up the arglist, then fork and exec.
  */
 static pid_t
-avlauncher_forkexec(void)
+avlauncher_forkexec(Postmaster *pm)
 {
 	char	   *av[10];
 	int			ac = 0;
@@ -372,7 +372,7 @@ avlauncher_forkexec(void)
 
 	Assert(ac < lengthof(av));
 
-	return postmaster_forkexec(ac, av);
+	return postmaster_forkexec(pm, ac, av);
 }
 
 /*
@@ -390,12 +390,12 @@ AutovacuumLauncherIAm(void)
  * postmaster.
  */
 int
-StartAutoVacLauncher(void)
+StartAutoVacLauncher(Postmaster *pm)
 {
 	pid_t		AutoVacPID;
 
 #ifdef EXEC_BACKEND
-	switch ((AutoVacPID = avlauncher_forkexec()))
+	switch ((AutoVacPID = avlauncher_forkexec(pm)))
 #else
 	switch ((AutoVacPID = fork_process()))
 #endif
@@ -411,7 +411,7 @@ StartAutoVacLauncher(void)
 			InitPostmasterChild();
 
 			/* Close the postmaster's sockets */
-			ClosePostmasterPorts(false);
+			ClosePostmasterPorts(pm, false);
 
 			AutoVacLauncherMain(0, NULL);
 			break;
@@ -1437,7 +1437,7 @@ avl_sigusr2_handler(SIGNAL_ARGS)
  * Format up the arglist, then fork and exec.
  */
 static pid_t
-avworker_forkexec(void)
+avworker_forkexec(Postmaster *pm)
 {
 	char	   *av[10];
 	int			ac = 0;
@@ -1449,7 +1449,7 @@ avworker_forkexec(void)
 
 	Assert(ac < lengthof(av));
 
-	return postmaster_forkexec(ac, av);
+	return postmaster_forkexec(pm, ac, av);
 }
 
 /*
@@ -1468,12 +1468,12 @@ AutovacuumWorkerIAm(void)
  * This code is heavily based on pgarch.c, q.v.
  */
 int
-StartAutoVacWorker(void)
+StartAutoVacWorker(struct Postmaster *pm)
 {
 	pid_t		worker_pid;
 
 #ifdef EXEC_BACKEND
-	switch ((worker_pid = avworker_forkexec()))
+	switch ((worker_pid = avworker_forkexec(pm)))
 #else
 	switch ((worker_pid = fork_process()))
 #endif
@@ -1489,7 +1489,7 @@ StartAutoVacWorker(void)
 			InitPostmasterChild();
 
 			/* Close the postmaster's sockets */
-			ClosePostmast

Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-12 Thread Jacob Champion
On Wed, Sep 14, 2022 at 7:54 AM Maxim Orlov  wrote:
> For the patch itself, I think it is better to use a more precise time 
> function in libpq_prng_init or call it only once.
> Thought it is a special corner case, imagine all the connection attempts at 
> first second will be seeded with the save
> value, i.e. will attempt to connect to the same host. I think, this is not we 
> want to achieve.

Just a quick single-issue review, but I agree with Maxim that having
one PRNG, seeded once, would be simpler -- with the tangible benefit
that it would eliminate weird behavior on simultaneous connections
when the client isn't using OpenSSL. (I'm guessing a simple lock on a
global PRNG would be less overhead than the per-connection
strong_random machinery, too, but I have no data to back that up.) The
test seed could then be handled globally as well (envvar?) so that you
don't have to introduce a debug-only option into the connection
string.

Overall, I like the concept.

--Jacob




Re: document the need to analyze partitioned tables

2023-01-12 Thread Nathan Bossart
On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
> On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
>> I've pushed the last version, and backpatched it to 10 (not sure I'd
>> call it a bugfix, but I certainly agree with Justin it's worth
>> mentioning in the docs, even on older branches).
> 
> I'd like to suggest an improvement to this.  The current wording could
> be read to mean that dead tuples won't get cleaned up in partitioned tables.

Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage.  But I see what you mean.  Readers
might misinterpret this to mean that autovacuum will not process the
partitions.  There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions."  So,
while the current wording is technically correct, I think it'd be
reasonable to expand it to help avoid confusion.

Here is my take on the wording:

Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables.  Instead, autovacuum
processes the individual partitions that are regular tables.  This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables.  Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.

[0] 
https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Beautify pg_walinspect docs a bit

2023-01-12 Thread Michael Paquier
On Thu, Jan 12, 2023 at 05:29:39PM +0530, Bharath Rupireddy wrote:
> As discussed [1], here's a patch to beautify pg_walinspect docs
> similar to pageinspect docs. The existing pg_walinspect docs calls out
> the column names explicitly and then also shows them in the function
> execution examples, which is duplicative and overly verbose. Also \x
> isn't used so some of the execution outputs are out of indentation.
> 
> Thoughts?

Thanks, this looked basically fine, so applied.  I have tweaked a few
sentences while reviewing the docs, while on it.  I have decided to
remove the example where we specify per_record=true for
pg_get_wal_stats(), as it does not bring much value while bloating the
whole, and the parameter is clearly documented.
--
Michael


signature.asc
Description: PGP signature


Re: mprove tab completion for ALTER EXTENSION ADD/DROP

2023-01-12 Thread vignesh C
On Thu, 12 Jan 2023 at 05:22, Michael Paquier  wrote:
>
> On Wed, Jan 11, 2023 at 10:29:25PM +0530, vignesh C wrote:
> > I too felt keeping it simpler is better. How about using the simple
> > first version of patch itself?
>
> Okay, I have just done that, then, after checking that all the object
> types were covered (28).  Note that PROCEDURAL LANGUAGE has been
> removed for simplicity.

Thanks for pushing this.

Regards,
Vignesh




PG_SETMASK() archeology

2023-01-12 Thread Thomas Munro
Hi,

This is a follow-up for commit c94ae9d8.  It's in the spirit of other
recent changes to remove noise from ancient pre-standard systems.

The reason we introduced PG_SETMASK() in the first place was to
support one particular system that was very slow to adopt the POSIX
signals stuff:  NeXTSTEP 3.x.

From some time in the dark ages before our current repo begins until
'97 we used sigprocmask() freely.  Then commit a5494a2d added a
sigsetmask() fallback for NeXTSTEP (that's a pre-standard function
inherited from '80s BSD).  In 1999 we added the PG_SETMASK() macro to
avoid repeating #ifdef HAVE_SIGPROCMASK to select between them at each
call site (commit 47937403676).  I have no personal knowledge of those
systems; I wonder if it was already effectively quite defunct while we
were adding the macro, but I dunno (NS 4.x never shipped?, but its
living descendant OSX had already shipped that year).

Then we invented a bogus reason to need the macro for a couple more
decades: our Windows simulated signal layer accidentally implemented
the old BSD interface instead of the standard one, as complained about
in commit a65e0864.

That's all ancient history now, and I think we might as well drop the
macro to make our source a tiny bit less weird for new players, with a
slightly richer interface.  Trivial patch attached.
From 1233157f7e7e0ea8c36bc4eaaf3ed2c89f3541df Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 9 Jan 2023 12:10:11 +1300
Subject: [PATCH] Retire PG_SETMASK() macro.

In the 90s we needed to deal with computers that still had the
pre-standard signal masking APIs.  That's hasn't been necessary for a
very long time on Unix systems (and since c94ae9d8 on Windows system).
The wrapper doesn't expose all the functionality, it's not part of the
API that extensions are expected to be using (or if they are, the change
will be trivial) and it's generally better to use standardised
interfaces.  So, drop the macro and open-code its expansion.
---
 src/backend/access/transam/xact.c |  4 ++--
 src/backend/postmaster/autovacuum.c   |  4 ++--
 src/backend/postmaster/bgworker.c |  2 +-
 src/backend/postmaster/bgwriter.c |  2 +-
 src/backend/postmaster/checkpointer.c |  2 +-
 src/backend/postmaster/pgarch.c   |  2 +-
 src/backend/postmaster/postmaster.c   | 12 ++--
 src/backend/postmaster/startup.c  |  2 +-
 src/backend/postmaster/syslogger.c|  2 +-
 src/backend/postmaster/walwriter.c|  2 +-
 src/backend/replication/walreceiver.c |  2 +-
 src/backend/tcop/postgres.c   |  4 ++--
 src/backend/utils/init/miscinit.c |  4 ++--
 src/include/libpq/pqsignal.h  |  2 --
 14 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d85e313908..b876401260 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2755,7 +2755,7 @@ AbortTransaction(void)
 	 * handler.  We do this fairly early in the sequence so that the timeout
 	 * infrastructure will be functional if needed while aborting.
 	 */
-	PG_SETMASK(&UnBlockSig);
+	sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
 
 	/*
 	 * check the current transaction state
@@ -5115,7 +5115,7 @@ AbortSubTransaction(void)
 	 * handler.  We do this fairly early in the sequence so that the timeout
 	 * infrastructure will be functional if needed while aborting.
 	 */
-	PG_SETMASK(&UnBlockSig);
+	sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
 
 	/*
 	 * check the current transaction state
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f5ea381c53..ff6149a179 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -568,7 +568,7 @@ AutoVacLauncherMain(int argc, char *argv[])
 	PG_exception_stack = &local_sigjmp_buf;
 
 	/* must unblock signals before calling rebuild_database_list */
-	PG_SETMASK(&UnBlockSig);
+	sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
 
 	/*
 	 * Set always-secure search path.  Launcher doesn't connect to a database,
@@ -1589,7 +1589,7 @@ AutoVacWorkerMain(int argc, char *argv[])
 	/* We can now handle ereport(ERROR) */
 	PG_exception_stack = &local_sigjmp_buf;
 
-	PG_SETMASK(&UnBlockSig);
+	sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
 
 	/*
 	 * Set always-secure search path, so malicious users can't redirect user
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index e7a4a7136a..0dd22b2351 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -726,7 +726,7 @@ SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
 static void
 bgworker_die(SIGNAL_ARGS)
 {
-	PG_SETMASK(&BlockSig);
+	sigprocmask(SIG_SETMASK, &BlockSig, NULL);
 
 	ereport(FATAL,
 			(errcode(ERRCODE_ADMIN_SHUTDOWN),
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 69667f0eb4..9bb47da404 100644
--- a/src/backend/postmaster/bgwriter.c
++

Re: pgsql: Add new GUC createrole_self_grant.

2023-01-12 Thread David G. Johnston
On Thu, Jan 12, 2023 at 8:11 AM Robert Haas  wrote:

> On Wed, Jan 11, 2023 at 7:53 PM David G. Johnston
>  wrote:
> > Just wanted to chime in and say Robert has eloquently put into words
> much of what I have been thinking here, and that I concur that guiding the
> DBA to use care with the power they have been provided is a sane position
> to take.
> >
> > +1, and thank you.
>
> Thanks!
>
> Here's a patch. In it I make three changes, only one of which is
> directly relevant to the topic at hand:
>
> 1. Add a sentence to the documentation on writing SECURITY FUNCTIONS
> safely concerning createrole_self_grant.
> 2. Add a sentence to the documentation on SECURITY DEFINER referring
> to the section about writing such functions safely.
> 3. Remove a note discussing the fact that pre-8.3 versions did not
> have SET clauses for functions.
>
> I can separate this into multiple patches if desired. And of course
> you, Tom, or others may have suggestions on which of these changes
> should be included at all or how to word them better.
>
>
I'm still not really feeling how security definer is the problematic option
here.  Security invoker scares me much more.

postgres=# create role unpriv login;
CREATE ROLE
postgres=# create role creator createrole login;
CREATE ROLE
postgres=# grant pg_read_all_data to creator with admin option;
postgres=#
create procedure makeunprivpowerful() LANGUAGE sql AS $$
grant pg_read_all_data to unpriv;
$$;
CREATE PROCEDURE
postgres=# alter procedure makeunprivpowerful() owner to unpriv;
ALTER PROCEDURE

postgres=# \c postgres unpriv
You are now connected to database "postgres" as user "unpriv".
postgres=> call makeunprivpowerful();
ERROR:  must have admin option on role "pg_read_all_data"
CONTEXT:  SQL function "makeunprivpowerful" statement 1

postgres=# \c postgres creator
You are now connected to database "postgres" as user "creator".
postgres=> call makeunprivpowerful();
CALL

Personally I think the best advice for a CREATEROLE granted user is to
never call routines they themselves don't own, or at least only those they
have reviewed thoroughly and understand the consequences of.  Regardless of
security definer/invoker.

In short, a low-privilege user can basically run anything without much fear
because they can't do harm anyway.  Thus security invoker applies to them,
and having security definer gives them the ability to do some things
without needing to have permanent higher privileges.  These are useful,
security related attributes with respect to them.

For a high-privilege I would argue that neither security invoker nor
definer are relevant considerations from a security point-of-view.  If you
have high-privilege you must know what it is you are executing before you
execute it and anything bad it causes you to do using your privileges, or
higher if you run a security definer function blindly, is an administrative
failure, not a security hole.

I think it would be good to move the goalposts here a bit with respect to
encouraging safe behavior.  But I also don't really think that it is fair
to make this a prerequisite for the feature.

If we cannot write a decent "why" sentence for the proposed paragraph I say
we don't commit it (the cross-ref should go in):

If the security definer function intends to create roles, and if it
is running as a non-superuser, createrole_self_grant
should also be set to a known value using the SET
clause.

This is a convenience feature that a CREATEROLE user can leverage if they
so choose.  Anything bad coming of it is going to be strictly less worse
than whatever can happen just because the CREATEROLE user is being
careless. Whoever gets the admin privilege grant from the superuser when
the role is created may or may not have two other self-granted memberships
on the newly created role.  Do the two optional grants really mean anything
important here compared to the newly created object and superuser-granted
admin privilege (which means that regardless of the GUC the same end state
can eventually be reached anyway)?

David J.


Re: PL/Python: Fix return in the middle of PG_TRY() block.

2023-01-12 Thread Michael Paquier
On Thu, Jan 12, 2023 at 10:44:33AM -0800, Nathan Bossart wrote:
> There's another "return" later on in this PG_TRY block.  I wonder if it's
> possible to detect this sort of thing at compile time.

Note also:
src/pl/tcl/pltcl.c- *   PG_CATCH();
src/pl/tcl/pltcl.c- *   {
src/pl/tcl/pltcl.c- *   pltcl_subtrans_abort(interp, oldcontext, oldowner);
src/pl/tcl/pltcl.c- *   return TCL_ERROR;
src/pl/tcl/pltcl.c- *   }

This is documented once, repeated twice:
src/pl/plpython/plpy_spi.c- *   PG_CATCH();
src/pl/plpython/plpy_spi.c- *   {
src/pl/plpython/plpy_spi.c- *   
src/pl/plpython/plpy_spi.c- *   PLy_spi_subtransaction_abort(oldcontext, oldowner);
src/pl/plpython/plpy_spi.c- *   return NULL;
src/pl/plpython/plpy_spi.c- *   }

I don't quite get why this would be a sane thing to do here when
aborting a subtransaction in pl/python, but my experience with this
code is limited.
--
Michael




Re: Missed condition-variable wakeups on FreeBSD

2023-01-12 Thread Thomas Munro
On Sun, Feb 27, 2022 at 8:07 AM Tom Lane  wrote:
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes.  Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.

FTR I noticed that my animal elver hadn't reported for a while and
logged in and found it locked up, a bit like that, but in test_shm_mq
rather than PHJ.  It's FreeBSD 13.1 on a fairly fast 16 way amd64 (in
a jail), and it was testing REL_12_STABLE, so using poll() for
WaitEventSetWaitBlock().  When I connected a debugger then
disconnected, it was unblocked and continued.  Unfortunately I didn't
get any new information :-(




Re: [EXTERNAL] Re: [PATCH] Support using "all" for the db user in pg_ident.conf

2023-01-12 Thread Michael Paquier
On Thu, Jan 12, 2023 at 10:10:02AM +0100, Jelte Fennema wrote:
> It also makes the code simpler, as we can simply reuse the
> check_role function. I removed the lines you quoted
> since those are actually not strictly necessary. They only change
> the detection logic a bit in case of a \1 existing in the string.
> And I'm not sure what the desired behaviour is for those.

Hmm.  This is a very good point.  0004 gets really easy to follow
now.

> I split up that patch in two parts now and added the tests in a new 0001
> patch.

Thanks, applied 0001.

> 0002 should change no behaviour, since it simply stores the token in
> the IdentLine struct, but doesn't start using the quoted or the regex field
> yet. 0003 is debatable indeed. To me it makes sense conceptually, but
> having a literal \1 in a username seems like an unlikely scenario and
> there might be pg_ident.conf files in existence where the \1 is quoted
> that would break because of this change. I haven't removed 0003 from
> the patch set yet, but I kinda feel that the advantage is probably not
> worth the risk of breakage.

0003 would allow folks to use \1 in a Postgres username if quoted.  My
choice would be to agree with you here.  Even if folks applying quotes
would not be able anymore to replace the pattern, the risk seems a bit
remote?  I would suspect that basically everybody does not rely on
'\1' being in the middle of pg-username string, using it only as a
strict replacement of the result coming from system-username to keep a
simpler mapping between the PG roles and the krb5/gss system roles.
Even if they use a more complex schema that depends on strstr(),
things would break if they began the pg-username with quotes.  Put
simply, I'd agree with your 0003.

> 0004 adds some breakage too. But there I think the advantages far outweigh
> the risk of breakage. Both because added functionality is a much bigger
> advantage, and because we only risk breaking when there exist users that
> are called "all", start with a literal + or start with a literal /.
> Only "all" seems
> like a somewhat reasonable username, but such a user existing seems
> unlikely to me given all its special meaning in pg_hba.conf. (I added this
> consideration to the commit message)

I don't see how much that's different from the recent discussion with
regexps added for databases and users to pg_hba.conf.  And consistency
sounds pretty good to me here.

> Finally, one separate thing I noticed is that regcomp_auth_token only
> checks the / prefix, but doesn't check if the token was quoted or not.
> So even if it's quoted it will be interpreted as a regex. Maybe we should
> change that? At least for the regex parsing that is not released yet.

regcomp_auth_token() should not decide to compile a regexp depending
on if an AuthToken is quoted or not.  Regexps can have commas, and
this would impact the case of database or role lists in HBA entries.
And that could be an issue with spaces as well?  See the docs for
patterns like:
db1,"/^db\d{2,4}$",db2

Point taken that we don't care about lists for pg_ident entries,
though.

> You didn't write a response about this, but you did quote it. Did you intend
> to respond to it?

Nah, I should have deleted it.  I had no useful opinion on this
particular point.
--
Michael



