Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila wrote: > > On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada wrote: > > > > On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila wrote: > > > > > > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada > > > wrote: > > > > > > > > This patch should have the fix for the issue that Shi yu reported. Shi > > > > yu, could you please test it again with this patch? > > > > > > > > > > Can you explain the cause of the failure and your fix for the same? > > > > @@ -1694,6 +1788,8 @@ out: > > /* be tidy */ > > if (ondisk) > > pfree(ondisk); > > + if (catchange_xip) > > + pfree(catchange_xip); > > > > Regarding the above code in the previous version patch, looking at the > > generated assembler code shared by Shi yu offlist, I realized that the > > “if (catchange_xip)” is removed (folded) by gcc optimization. This is > > because we dereference catchange_xip before null-pointer check as > > follow: > > > > + /* copy catalog modifying xacts */ > > + sz = sizeof(TransactionId) * catchange_xcnt; > > + memcpy(ondisk_c, catchange_xip, sz); > > + COMP_CRC32C(ondisk->checksum, ondisk_c, sz); > > + ondisk_c += sz; > > > > Since sz is 0 in this case, memcpy doesn’t do anything actually. > > > > By checking the assembler code, I’ve confirmed that gcc does the > > optimization for these code and setting > > -fno-delete-null-pointer-checks flag prevents the if statement from > > being folded. Also, I’ve confirmed that adding the check if > > "catchange.xcnt > 0” before the null-pointer check also can prevent > > that. Adding a check if "catchange.xcnt > 0” looks more robust. I’ve > > added a similar check for builder->committed.xcnt as well for > > consistency. builder->committed.xip could have no transactions. > > > > Good work. I wonder without comments this may create a problem in the > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > freeing the memory any less robust. Also, for consistency, we can use > a similar check based on xcnt in the SnapBuildRestore to free the > memory in the below code: > + /* set catalog modifying transactions */ > + if (builder->catchange.xip) > + pfree(builder->catchange.xip); I would hesitate to add comments about preventing the particular optimization. I think we do null-pointer-check-then-pfree many place. It seems to me that checking the array length before memcpy is more natural than checking both the array length and the array existence before pfree. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
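[Editor's illustration] A minimal standalone sketch (invented names, not the actual snapbuild.c code) of the interaction described above: because passing a NULL pointer to memcpy() is undefined even with a zero size, gcc may assume the pointer is non-NULL after the call and fold a later "if (ptr)" check away; guarding the memcpy() on the element count avoids that assumption.

    #include <stdlib.h>
    #include <string.h>

    /*
     * Sketch only: without the "xcnt > 0" guard, the unguarded memcpy() lets
     * the compiler assume xip != NULL, so the later "if (xip)" check may be
     * deleted (-fno-delete-null-pointer-checks would prevent that folding).
     */
    static void
    serialize_xids(char *dst, int *xip, size_t xcnt)
    {
        size_t  sz = sizeof(int) * xcnt;

        if (xcnt > 0)           /* the fix: check the length, not the pointer */
            memcpy(dst, xip, sz);

        if (xip)                /* stays meaningful thanks to the guard above */
            free(xip);
    }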
RE: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Sunday, July 17, 2022 9:59 PM Masahiko Sawada wrote: > I've attached patches for all supported branches including the master. Hi, Minor comments for REL14. (1) There are some foreign characters in the patches (in the commit message) When I had a look at your patch for back branches with some editor, I could see some unfamiliar full-width characters like below two cases, mainly around "single quotes" in the sentences. Could you please check the entire patches, probably by some tool that helps you to detect this kind of characters ? * the 2nd paragraph of the commit message ...mark the transaction as containing catalog changes if it窶冱 in the list of the initial running transactions ... * the 3rd paragraph of the same It doesn窶冲 have the information on which (sub) transaction has catalog changes FYI, this comment applies to other patches for REL13, REL12, REL11, REL10. (2) typo in the commit message FROM: To fix this problem, this change the reorder buffer so that... TO: To fix this problem, this changes the reorder buffer so that... (3) typo in ReorderBufferProcessInitialXacts + /* +* Remove transactions that would have been processed and we don't need to +* keep track off anymore. Kindly change FROM: keep track off TO: keep track of Best Regards, Takamichi Osumi
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 19.07.22 at 00:52, Martin Kalcher wrote: On the contrary! I am pretty sure there are people out there wanting sampling-without-shuffling. I will think about that. I gave it some thought. Even though there might be use cases where a stable order is desired, I would consider them edge cases, not worth the additional complexity. I personally would not expect array_sample() to return elements in any specific order. I looked up some sample() implementations. None of them makes guarantees about the order of the resulting array or explicitly states that the resulting array is in random or selection order. - Python random.sample [0] - Ruby Array#sample [1] - Rust rand::seq::SliceRandom::choose_multiple [2] - Julia StatsBase.sample [3] (stable order needs an explicit request) [0] https://docs.python.org/3/library/random.html#random.sample [1] https://ruby-doc.org/core-3.0.0/Array.html#method-i-sample [2] https://docs.rs/rand/0.6.5/rand/seq/trait.SliceRandom.html#tymethod.choose_multiple [3] https://juliastats.org/StatsBase.jl/stable/sampling/#StatsBase.sample
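[Editor's illustration] A usage sketch of the proposed function (not committed syntax): a caller who does want a stable order can request it explicitly by re-sorting the sampled elements.

    -- array_sample() is the proposed function; no output order is guaranteed,
    -- so a stable order is requested explicitly:
    SELECT array_agg(x ORDER BY x)
    FROM unnest(array_sample(ARRAY[1, 2, 3, 4, 5, 6], 3)) AS t(x);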
Re: NAMEDATALEN increase because of non-latin languages
I wrote: > On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > Hm. Wouldn't it make sense to just use the normal tuple deforming routines and > > then map the results to the structs? > > I wasn't sure if they'd be suitable for this, but if they are, that'd make this easier and more maintainable. I'll look into it. This would seem to have its own problems: heap_deform_tuple writes to passed arrays of datums and bools. The lower level parts like fetchatt and nocachegetattr return datums, so still need some generated boilerplate. Some of these also assume they can write cached offsets on a passed tuple descriptor. I'm thinking where the first few attributes are fixed length, not null, and (because of AIX) not double-aligned, we can do a single memcpy on multiple columns at once. That will still be a common pattern after namedata is varlen. Otherwise, use helper functions/macros similar to the above but instead of passing a tuple descriptor, use info we have at compile time. -- John Naylor EDB: http://www.enterprisedb.com
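[Editor's illustration] A rough sketch of that idea (the helper is invented, not part of any patch): when the leading attributes are fixed-length and NOT NULL, their bytes form a contiguous prefix immediately after the tuple header, so they can be copied into the matching struct prefix in one memcpy rather than deformed attribute by attribute.

    #include "postgres.h"
    #include "access/htup_details.h"

    /*
     * Sketch only: copy the leading fixed-length, NOT NULL attributes of a
     * heap tuple into the matching prefix of a catalog struct, assuming the
     * struct prefix has the same layout (no alignment padding differences).
     */
    static inline void
    deform_fixed_prefix(HeapTuple tup, void *dst, Size prefix_len)
    {
        char   *src = (char *) tup->t_data + tup->t_data->t_hoff;

        memcpy(dst, src, prefix_len);
    }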
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila wrote in > Good work. I wonder without comments this may create a problem in the > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > freeing the memory any less robust. Also, for consistency, we can use > a similar check based on xcnt in the SnapBuildRestore to free the > memory in the below code: > + /* set catalog modifying transactions */ > + if (builder->catchange.xip) > + pfree(builder->catchange.xip); But xip must be positive there. We can add a comment explains that. +* Array of transactions and subtransactions that had modified catalogs +* and were running when the snapshot was serialized. +* +* We normally rely on HEAP2_NEW_CID and XLOG_XACT_INVALIDATIONS records to +* know if the transaction has changed the catalog. But it could happen that +* the logical decoding decodes only the commit record of the transaction. +* This array keeps track of the transactions that have modified catalogs (Might be only me, but) "track" makes me think that xids are added and removed by activities. On the other hand the array just remembers catalog-modifying xids in the last life until the all xids in the list gone. +* and were running when serializing a snapshot, and this array is used to +* add such transactions to the snapshot. +* +* This array is set once when restoring the snapshot, xids are removed (So I want to add "only" between "are removed"). +* from the array when decoding xl_running_xacts record, and then eventually +* becomes empty. + catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder); catchange_xip is allocated in the current context, but ondisk is allocated in builder->context. I see it kind of inconsistent (even if the current context is same with build->context). + if (builder->committed.xcnt > 0) + { It seems to me comitted.xip is always non-null, so we don't need this. I don't strongly object to do that, though. -* Remove TXN from its containing list. +* Remove TXN from its containing lists. The comment body only describes abut txn->nodes. I think we need to add that for catchange_node. + Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns)); (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..). (xcnt == rb->catchange_ntxns) is not what should be checked here. The assert just requires that catchange_txns and catchange_ntxns are consistent so it should be checked just after dlist_empty.. I think. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Mon, Jul 18, 2022 at 8:49 PM Amit Kapila wrote: > > On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada wrote: > > > > On Fri, Jul 15, 2022 at 3:32 PM shiy.f...@fujitsu.com > > wrote: > > > > > > > I've attached patches for all supported branches including the master. > > > > For back branch patches, > * Wouldn't it be better to move purge logic into the function > SnapBuildPurge* function for the sake of consistency? Agreed. > * Do we really need ReorderBufferInitialXactsSetCatalogChanges()? > Can't we instead have a function similar to > SnapBuildXidHasCatalogChanges() as we have for the master branch? That > will avoid calling it when the snapshot > state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT Seems a good idea. We would need to pass the information about (parsed->xinfo & XACT_XINFO_HAS_INVALS) to the function but probably we can change ReorderBufferXidHasCatalogChanges() so that it checks the RBTXN_HAS_CATALOG_CHANGES flag and then the initial running xacts array. BTW on backbranches, I think that the reason why we add initial_running_xacts stuff to ReorderBuffer is that we cannot modify SnapBuild that could be serialized. Can we add a (private) array for the initial running xacts in snapbuild.c instead of adding new variables to ReorderBuffer? That way, the code would become more consistent with the changes on the master branch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 4:28 PM osumi.takami...@fujitsu.com wrote: > > On Sunday, July 17, 2022 9:59 PM Masahiko Sawada > wrote: > > I've attached patches for all supported branches including the master. > Hi, > > > Minor comments for REL14. > > (1) There are some foreign characters in the patches (in the commit message) > > When I had a look at your patch for back branches with some editor, > I could see some unfamiliar full-width characters like below two cases, > mainly around "single quotes" in the sentences. > > Could you please check the entire patches, > probably by some tool that helps you to detect this kind of characters ? > > * the 2nd paragraph of the commit message > > ...mark the transaction as containing catalog changes if it窶冱 in the list of > the > initial running transactions ... > > * the 3rd paragraph of the same > > It doesn窶冲 have the information on which (sub) transaction has catalog > changes > > FYI, this comment applies to other patches for REL13, REL12, REL11, REL10. > > > (2) typo in the commit message > > FROM: > To fix this problem, this change the reorder buffer so that... > TO: > To fix this problem, this changes the reorder buffer so that... > > > (3) typo in ReorderBufferProcessInitialXacts > > + /* > +* Remove transactions that would have been processed and we don't > need to > +* keep track off anymore. > > > Kindly change > FROM: > keep track off > TO: > keep track of Thank you for the comments! I'll address these comments in the next version patch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
At Tue, 19 Jul 2022 16:02:26 +0900, Masahiko Sawada wrote in > On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila wrote: > > Good work. I wonder without comments this may create a problem in the > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > > freeing the memory any less robust. Also, for consistency, we can use > > a similar check based on xcnt in the SnapBuildRestore to free the > > memory in the below code: > > + /* set catalog modifying transactions */ > > + if (builder->catchange.xip) > > + pfree(builder->catchange.xip); > > I would hesitate to add comments about preventing the particular > optimization. I think we do null-pointer-check-then-pfree many place. > It seems to me that checking the array length before memcpy is more > natural than checking both the array length and the array existence > before pfree. Anyway according to commit message of 46ab07ffda, POSIX forbits memcpy(NULL, NULL, 0). It seems to me that it is the cause of the false (or over) optimization. So if we add some comment, it would be for memcpy, not pfree.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
At Tue, 19 Jul 2022 16:57:14 +0900 (JST), Kyotaro Horiguchi wrote in > At Tue, 19 Jul 2022 16:02:26 +0900, Masahiko Sawada > wrote in > > On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila wrote: > > > Good work. I wonder without comments this may create a problem in the > > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > > > freeing the memory any less robust. Also, for consistency, we can use > > > a similar check based on xcnt in the SnapBuildRestore to free the > > > memory in the below code: > > > + /* set catalog modifying transactions */ > > > + if (builder->catchange.xip) > > > + pfree(builder->catchange.xip); > > > > I would hesitate to add comments about preventing the particular > > optimization. I think we do null-pointer-check-then-pfree many place. > > It seems to me that checking the array length before memcpy is more > > natural than checking both the array length and the array existence > > before pfree. > > Anyway according to commit message of 46ab07ffda, POSIX forbits > memcpy(NULL, NULL, 0). It seems to me that it is the cause of the > false (or over) optimization. So if we add some comment, it would be > for memcpy, not pfree.. For clarilty, I meant that I don't think we need that comment. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi, thanks for your reply. Amit Kapila , 18 Tem 2022 Pzt, 08:29 tarihinde şunu yazdı: > On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı > wrote: > > > > Hi hackers, > > > > > > It is often not feasible to use `REPLICA IDENTITY FULL` on the > publication, because it leads to full table scan > > > > per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` > impracticable -- probably other > > > > than some small number of use cases. > > > > IIUC, this proposal is to optimize cases where users can't have a > unique/primary key for a relation on the subscriber and those > relations receive lots of updates or deletes? > Yes, that is right. In a similar perspective, I see this patch useful for reducing the "use primary key/unique index" requirement to "use any index" for a reasonably performant logical replication with updates/deletes. > > It seems that in favorable cases it will improve performance but we > should consider unfavorable cases as well. Two things that come to > mind in that regard are (a) while choosing index/seq. scan paths, the > patch doesn't account for cost for tuples_equal() which needs to be > performed for index scans, (b) it appears to me that the patch decides > which index to use the first time it opens the rel (or if the rel gets > invalidated) on subscriber and then for all consecutive operations it > uses the same index. It is quite possible that after some more > operations on the table, using the same index will actually be > costlier than a sequence scan or some other index scan > Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this. Currently, I can see two options and happy to hear your take on these (or maybe another idea?) - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense. However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder. So, please see the next idea as well. - Ask users to manually pick the index they want to use: Currently, the main complexity of the patch comes with the planner related code. In fact, if you look into the logical replication related changes, those are relatively modest changes. If we can drop the feature that Postgres picks the index, and provide a user interface to set the indexes per table in the subscription, we can probably have an easier patch to review & test. For example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i` type of an API. This also needs some coding, but probably much simpler than the current code. And, obviously, this pops up the question of can users pick the right index? Probably not always, but at least that seems like a good start to use this performance improvement. Thoughts? Thanks, Onder Kalaci
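[Editor's illustration] To make the setup and the second idea concrete, a rough SQL sketch (the ALTER SUBSCRIPTION syntax below is only the proposal from this mail, not an existing command):

    -- publisher: no primary key or unique index is available, so use full-row identity
    ALTER TABLE t REPLICA IDENTITY FULL;

    -- subscriber: with the patch, a suitable index could replace the per-tuple
    -- sequential scan for replicated UPDATE/DELETE
    CREATE INDEX t_a_idx ON t (a);

    -- proposed (hypothetical) user interface for picking the index manually
    ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX t_a_idx;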
Re: Costing elided SubqueryScans more nearly correctly
On 2022-Jul-19, Richard Guo wrote: > On Tue, Jul 19, 2022 at 1:30 AM Tom Lane wrote: > > WFM. (I'd fixed the comment typo in my patch, but I don't mind if > > you get there first.) Ah, I see now you had other grammatical fixes and even more content there. > +1 The fix looks good to me. Thanks, pushed. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi wrote: Thank you for the comments! > > At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila > wrote in > > Good work. I wonder without comments this may create a problem in the > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > > freeing the memory any less robust. Also, for consistency, we can use > > a similar check based on xcnt in the SnapBuildRestore to free the > > memory in the below code: > > + /* set catalog modifying transactions */ > > + if (builder->catchange.xip) > > + pfree(builder->catchange.xip); > > But xip must be positive there. We can add a comment explains that. > Yes, if we add the comment for it, probably we need to explain a gcc's optimization but it seems to be too much to me. > > +* Array of transactions and subtransactions that had modified > catalogs > +* and were running when the snapshot was serialized. > +* > +* We normally rely on HEAP2_NEW_CID and XLOG_XACT_INVALIDATIONS > records to > +* know if the transaction has changed the catalog. But it could > happen that > +* the logical decoding decodes only the commit record of the > transaction. > +* This array keeps track of the transactions that have modified > catalogs > > (Might be only me, but) "track" makes me think that xids are added and > removed by activities. On the other hand the array just remembers > catalog-modifying xids in the last life until the all xids in the list > gone. > > +* and were running when serializing a snapshot, and this array is > used to > +* add such transactions to the snapshot. > +* > +* This array is set once when restoring the snapshot, xids are > removed > > (So I want to add "only" between "are removed"). > > +* from the array when decoding xl_running_xacts record, and then > eventually > +* becomes empty. Agreed. WIll fix. > > > + catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder); > > catchange_xip is allocated in the current context, but ondisk is > allocated in builder->context. I see it kind of inconsistent (even if > the current context is same with build->context). Right. I thought that since the lifetime of catchange_xip is short, until the end of SnapBuildSerialize() function we didn't need to allocate it in builder->context. But given ondisk, we need to do that for catchange_xip as well. Will fix it. > > > + if (builder->committed.xcnt > 0) > + { > > It seems to me comitted.xip is always non-null, so we don't need this. > I don't strongly object to do that, though. But committed.xcnt could be 0, right? We don't need to copy anything by calling memcpy with size = 0 in this case. Also, it looks more consistent with what we do for catchange_xcnt. > > -* Remove TXN from its containing list. > +* Remove TXN from its containing lists. > > The comment body only describes abut txn->nodes. I think we need to > add that for catchange_node. Will add. > > > + Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns)); > > (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..). > (xcnt == rb->catchange_ntxns) is not what should be checked here. The > assert just requires that catchange_txns and catchange_ntxns are > consistent so it should be checked just after dlist_empty.. I think. > If we want to check if catchange_txns and catchange_ntxns are consistent, should we check (xcnt == rb->catchange_ntxns) as well, no? This function requires the caller to use rb->catchange_ntxns as the length of the returned array. 
I think this assertion ensures that the actual length of the array is consistent with the length we pre-calculated. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: System column support for partitioned tables using heap
> What is motivating you to want to see the xmax value here? It's not an > unreasonable thing to want to do, IMHO, but it's a little bit niche so > I'm just curious what the motivation is. Yeah, I figured it was niche when I saw so little mention of the issue. My reason for xmax() in the result is to break down the affected rows count into an insert count, and a modified estimate. Not super critical, but helpful. I've built out some simple custom logging table in out system for this kind of detail, and folks have been wanting to break down rows submitted, rows inserted, and rows updated a bit better. Rows submitted is easy and rows inserted is too...update is an estimate as I'm not using anything fancy with xmax() to sort out what exactly happened. For clarification, we're not using an ORM, and may need to support straggling clients, so our push cycle works like this: * Create a view with the fields expected in the insert. I figured I'd use CREATE VIEW instead of CREATE TYPE as then I can quickly check out the details against live data, and I still get a custom compound type. * Write a function that accepts an array of view_name_type. I *love* Postgres' typing system, It has spoiled me forever. Can't submit badly formatted objects from the client, they're rejected automatically. * Write a client-side routine to package data as an array and push it into the insert handling function. The function unnests the array, and then the actual insert code draws from the unpacked values. If I need to extend the table, I can add a new function that knows about the revised fields, and revise (when necessary) earlier supported formats to map to new types/columns/defaults. There are few CTEs in there, including one that does the main insert and returns the xmax(). That lets me distinguish xmax = 0 (insert) from xmax <> 0 (not an insert). > I do agree with you that it would be nice if this worked better than > it does, but I don't really know exactly how to make that happen. The > column list for a partitioned table must be fixed at the time it is > created, but we do not know what partitions might be added in the > future, and thus we don't know whether they will have an xmax column. > I guess we could have tried to work things out so that a 0 value would > be passed up from children that lack an xmax column, and that would > allow the parent to have such a column, but I don't feel too bad that > we didn't do that ... should I? You should never feel bad about anything ;-) You and others on that thread contribute so much that I'm getting value out of. I had it in mind that it would be nice to have some kind of catalog/abstraction that would make it possible to interrogate what system columns are available on a table/partition based on access method. In my vague notion, that might make some of the other ideas from that thread, such as index-oriented stores with quite different physical layouts, easier to implement. But, it's all free when you aren't the one who can write the code. I've switched the partition-based tables back to returning * on the insert CTE, and then aggregating that to add to a log table and the client result. It's fine. A rich result summary would be very nice. As in rows added/modified/deleted on whatever table(s). If anyone ever decides to implement such a structure for MERGE, it would be nice to see it retrofitted to the other data modification commands where RETURNING works. 
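[Editor's illustration] A minimal sketch of that pattern (table, type and column names are invented, and it assumes an upsert-style INSERT ... ON CONFLICT, which is one way a nonzero xmax can come back): freshly inserted rows return xmax = 0, while rows taken through the conflict/update path return a nonzero xmax.

    WITH ins AS (
        INSERT INTO item (id, payload)
        SELECT v.id, v.payload
        FROM unnest($1::item_v1[]) AS v      -- $1: array-of-view-type parameter
        ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload
        RETURNING xmax
    )
    SELECT count(*) FILTER (WHERE xmax = 0)       AS inserted,
           count(*) FILTER (WHERE NOT (xmax = 0)) AS updated
    FROM ins;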
On Tue, Jul 19, 2022 at 6:13 AM Robert Haas wrote: > On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx > wrote: > > This fails on a partitioned table because xmax() may not exist. In fact, > it does exist in all of those tables, but the system doesn't know how to > guarantee that. I know which tables are partitioned, and can downgrade the > result on partitioned tables to the count(*) I've been using to date. But > now I'm wondering if working with xmax() like this is a poor idea going > forward. I don't want to lean on a feature/behavior that's likely to > change. For example, I noticed the other day that MERGE does not support > RETURNING. > > > > I'd appreciate any insight or advice you can offer. > > What is motivating you to want to see the xmax value here? It's not an > unreasonable thing to want to do, IMHO, but it's a little bit niche so > I'm just curious what the motivation is. > > I do agree with you that it would be nice if this worked better than > it does, but I don't really know exactly how to make that happen. The > column list for a partitioned table must be fixed at the time it is > created, but we do not know what partitions might be added in the > future, and thus we don't know whether they will have an xmax column. > I guess we could have tried to work things out so that a 0 value would > be passed up from children that lack an xmax column, and that would > allow the parent to have such a column, but I don't feel too bad that > we didn't do that ... should I? > > -- > Robert Haas > EDB: http://www
Re: System column support for partitioned tables using heap
> The column list for a partitioned table must be fixed at the time it is > created, but we do not know what partitions might be added in the > future, and thus we don't know whether they will have an xmax column. Right, seeing what you're meaning there. It's fantastic that a partition might be an FDW to a system that has no concept at all of anything like a "system column", or something with an alternative AM to heap that has a different set of system columns. That flexibility in partitions is super valuable. I'd love to be able to convert old partitions into column stores, for example. (I think that Citus offers that feature now.) I guess if anyone ever felt it was worth the effort, maybe whatever checks are done at attach-partition time for the column list could also enforce meta/system columns. If missing, a shimming mechanism would be pretty necessary. Sounds like a lot of work for not much gain, at least in this narrow case. Thanks again for answering. On Tue, Jul 19, 2022 at 6:43 PM Morris de Oryx wrote: > > What is motivating you to want to see the xmax value here? It's not an > > unreasonable thing to want to do, IMHO, but it's a little bit niche so > > I'm just curious what the motivation is. > > Yeah, I figured it was niche when I saw so little mention of the issue. > > My reason for xmax() in the result is to break down the affected rows > count into an insert count, and a modified estimate. Not super critical, > but helpful. I've built out some simple custom logging table in out system > for this kind of detail, and folks have been wanting to break down rows > submitted, rows inserted, and rows updated a bit better. Rows submitted is > easy and rows inserted is too...update is an estimate as I'm not using > anything fancy with xmax() to sort out what exactly happened. > > For clarification, we're not using an ORM, and may need to support > straggling clients, so our push cycle works like this: > > * Create a view with the fields expected in the insert. I figured I'd use > CREATE VIEW instead of CREATE TYPE as then I can quickly check out the > details against live data, and I still get a custom compound type. > > * Write a function that accepts an array of view_name_type. I *love* Postgres' > typing system, It has spoiled me forever. Can't submit badly formatted > objects from the client, they're rejected automatically. > > * Write a client-side routine to package data as an array and push it into > the insert handling function. The function unnests the array, and then the > actual insert code draws from the unpacked values. If I need to extend the > table, I can add a new function that knows about the revised fields, and > revise (when necessary) earlier supported formats to map to new > types/columns/defaults. > > There are few CTEs in there, including one that does the main insert and > returns the xmax(). That lets me distinguish xmax = 0 (insert) from xmax <> > 0 (not an insert). > > > I do agree with you that it would be nice if this worked better than > > it does, but I don't really know exactly how to make that happen. The > > column list for a partitioned table must be fixed at the time it is > > created, but we do not know what partitions might be added in the > > future, and thus we don't know whether they will have an xmax column. 
> > I guess we could have tried to work things out so that a 0 value would > > be passed up from children that lack an xmax column, and that would > > allow the parent to have such a column, but I don't feel too bad that > > we didn't do that ... should I? > > You should never feel bad about anything ;-) You and others on that thread > contribute so much that I'm getting value out of. > > I had it in mind that it would be nice to have some kind of > catalog/abstraction that would make it possible to interrogate what system > columns are available on a table/partition based on access method. In my > vague notion, that might make some of the other ideas from that thread, > such as index-oriented stores with quite different physical layouts, easier > to implement. But, it's all free when you aren't the one who can write the > code. > > I've switched the partition-based tables back to returning * on the insert > CTE, and then aggregating that to add to a log table and the client result. > It's fine. A rich result summary would be very nice. As in rows > added/modified/deleted on whatever table(s). If anyone ever decides to > implement such a structure for MERGE, it would be nice to see it > retrofitted to the other data modification commands where RETURNING works. > > On Tue, Jul 19, 2022 at 6:13 AM Robert Haas wrote: > >> On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx >> wrote: >> > This fails on a partitioned table because xmax() may not exist. In >> fact, it does exist in all of those tables, but the system doesn't know how >> to guarantee that. I know which tables are partitioned, and can downgrade >> the result on partitioned tables to the count
Expose last replayed timeline ID along with last replayed LSN
Hi, At times it's useful to know the last replayed WAL record's timeline ID (especially on the standbys that are lagging in applying WAL while failing over - for reporting, logging and debugging purposes). AFAICS, there's no function that exposes the last replayed TLI. We can either change the existing pg_last_wal_replay_lsn() to report TLI along with the LSN which might break compatibility, or introduce a new function pg_last_wal_replay_info() that emits both LSN and TLI. I'm fine with either of the approaches, but for now, I'm attaching a WIP patch that adds a new function pg_last_wal_replay_info(). Thoughts? Regards, Bharath Rupireddy. v1-0001-Add-new-function-pg_last_wal_replay_info.patch Description: Binary data
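[Editor's illustration] Hypothetical usage of the proposed function (the exact column names depend on the WIP patch):

    SELECT * FROM pg_last_wal_replay_info();
     last_replay_lsn | last_replay_timeline_id
    -----------------+-------------------------
     0/3000148       |                       1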
Memory leak fix in psql
Hi, I think there is a newly introduced memory leak in your patch d2d3547. I have tried to fix it in the attached patch. Kindly have a check. Regards, Tang v1-0001-fix-the-memory-leak-in-validateSQLNamePattern.patch Description: v1-0001-fix-the-memory-leak-in-validateSQLNamePattern.patch
Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages
On Tue, Jul 19, 2022 at 12:00 AM Nathan Bossart wrote: > > Overall, these patches look reasonable. > > On Mon, Jul 18, 2022 at 04:24:12PM +0530, Bharath Rupireddy wrote: > > record. Because the entire content of data pages is saved in the > > - log on the first page modification after a checkpoint (assuming > > + WAL record on the first page modification after a checkpoint (assuming > > is not disabled), all pages > > changed since the checkpoint will be restored to a consistent > > state. > > nitpick: I would remove the word "record" in this change. Done. PSA v5 patch set. Regards, Bharath Rupireddy. v5-0001-Use-WAL-segment-instead-of-log-segment.patch Description: Binary data v5-0002-Replace-log-record-with-WAL-record-in-docs.patch Description: Binary data
Re: Fast COPY FROM based on batch insert
On 18/7/2022 13:22, Etsuro Fujita wrote: On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov wrote: On 3/22/22 06:54, Etsuro Fujita wrote: * To allow foreign multi insert, the patch made an invasive change to the existing logic to determine whether to use multi insert for the target relation, adding a new member ri_usesMultiInsert to the ResultRelInfo struct, as well as introducing a new function ExecMultiInsertAllowed(). But I’m not sure we really need such a change. Isn’t it reasonable to *adjust* the existing logic to allow foreign multi insert when possible? Of course, such approach would look much better, if we implemented it. I'll ponder how to do it. I rewrote the decision logic to something much simpler and much less invasive, which reduces the patch size significantly. Attached is an updated patch. What do you think about that? While working on the patch, I fixed a few issues as well: + if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL) + resultRelInfo->ri_BatchSize = + resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo); When determining the batch size, I think we should check if the ExecForeignBatchInsert callback routine is also defined, like other places such as execPartition.c. For consistency I fixed this by copying-and-pasting the code from that file. +* Also, the COPY command requires a non-zero input list of attributes. +* Therefore, the length of the attribute list is checked here. +*/ + if (!cstate->volatile_defexprs && + list_length(cstate->attnumlist) > 0 && + !contain_volatile_functions(cstate->whereClause)) + target_resultRelInfo->ri_usesMultiInsert = + ExecMultiInsertAllowed(target_resultRelInfo); I think “list_length(cstate->attnumlist) > 0” in the if-test would break COPY FROM; it currently supports multi-inserting into *plain* tables even in the case where they have no columns, but this would disable the multi-insertion support in that case. postgres_fdw would not be able to batch into zero-column foreign tables due to the INSERT syntax limitation (i.e., the syntax does not allow inserting multiple empty rows into a zero-column table in a single INSERT statement). Which is the reason why this was added to the if-test? But I think some other FDWs might be able to, so I think we should let the FDW decide whether to allow batching even in that case, when called from GetForeignModifyBatchSize. So I removed the attnumlist test from the patch, and modified postgresGetForeignModifyBatchSize as such. I might miss something, though. Thanks a lot, maybe you forgot this code: /* * If a partition's root parent isn't allowed to use it, neither is the * partition. */ if (rootResultRelInfo->ri_usesMultiInsert) leaf_part_rri->ri_usesMultiInsert = ExecMultiInsertAllowed(leaf_part_rri); Also, maybe to describe in documentation, if the value of batch_size is more than 1, the ExecForeignBatchInsert routine have a chance to be called? -- regards, Andrey Lepikhov Postgres Professional
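[Editor's illustration] For reference, the batch-size check discussed earlier in this message (mirroring what execPartition.c does) has roughly the following shape; this is a sketch, not the patch text:

    /* Determine the batch size only if both batching callbacks are provided. */
    if (resultRelInfo->ri_FdwRoutine != NULL &&
        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL &&
        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert != NULL)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
    else
        resultRelInfo->ri_BatchSize = 1;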
Re: [PATCH] Introduce array_shuffle() and array_sample()
Hi Martin, I didn't look at the code yet but I very much like the idea. Many thanks for working on this! It's a pity your patch was too late for the July commitfest. In September, please keep an eye on cfbot [1] to make sure your patch applies properly. > As Tom's investigation showed, there is no consensus in the code if > multi-dimensional arrays are treated as arrays-of-arrays or not. We need > to decide what should be the correct treatment. Here are my two cents. >From the user perspective I would expect that by default a multidimensional array should be treated as an array of arrays. So for instance: ``` array_shuffle([ [1,2], [3,4], [5,6] ]) ``` ... should return something like: ``` [ [3,4], [1,2], [5,6] ] ``` Note that the order of the elements in the internal arrays is preserved. However, I believe there should be an optional argument that overrides this behavior. For instance: ``` array_shuffle([ [1,2], [3,4], [5,6] ], depth => 2) ``` BTW, while on it, shouldn't we add similar functions for JSON and/or JSONB? Or is this going to be too much for a single discussion? [1]: http://cfbot.cputube.org/ -- Best regards, Aleksander Alekseev
Re: Memory leak fix in psql
On Tue, 19 Jul 2022 at 17:02, tanghy.f...@fujitsu.com wrote: > Hi > > I think there is a newly introduced memory leak in your patch d2d3547. > Try to fix it in the attached patch. > Kindly to have a check. > Yeah, it leaks, and the patch can fix it. After looking around, I found psql/describe.c also has some memory leaks, attached a patch to fix these leaks. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. >From 3036a0986f8ff490c133930524e2d5f5104249ff Mon Sep 17 00:00:00 2001 From: Japin Li Date: Tue, 19 Jul 2022 18:27:25 +0800 Subject: [PATCH v1 1/1] Fix the memory leak in psql describe --- src/bin/psql/describe.c | 150 1 file changed, 150 insertions(+) diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 0ce38e4b4c..11a441f52f 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -112,7 +112,10 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem) "n.nspname", "p.proname", NULL, "pg_catalog.pg_function_is_visible(p.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 4;"); @@ -182,7 +185,10 @@ describeAccessMethods(const char *pattern, bool verbose) NULL, "amname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -244,7 +250,10 @@ describeTablespaces(const char *pattern, bool verbose) NULL, "spcname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -534,7 +543,10 @@ describeFunctions(const char *functypes, const char *func_pattern, "n.nspname", "p.proname", NULL, "pg_catalog.pg_function_is_visible(p.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } for (int i = 0; i < num_arg_patterns; i++) { @@ -561,7 +573,10 @@ describeFunctions(const char *functypes, const char *func_pattern, true, false, nspname, typname, ft, tiv, NULL, 3)) + { +termPQExpBuffer(&buf); return false; + } } else { @@ -682,7 +697,10 @@ describeTypes(const char *pattern, bool verbose, bool showSystem) "pg_catalog.format_type(t.oid, NULL)", "pg_catalog.pg_type_is_visible(t.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); @@ -836,7 +854,10 @@ describeOperators(const char *oper_pattern, "n.nspname", "o.oprname", NULL, "pg_catalog.pg_operator_is_visible(o.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } if (num_arg_patterns == 1) appendPQExpBufferStr(&buf, " AND o.oprleft = 0\n"); @@ -866,7 +887,10 @@ describeOperators(const char *oper_pattern, true, false, nspname, typname, ft, tiv, NULL, 3)) + { +termPQExpBuffer(&buf); return false; + } } else { @@ -953,7 +977,10 @@ listAllDbs(const char *pattern, bool verbose) if (!validateSQLNamePattern(&buf, pattern, false, false, NULL, "d.datname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); @@ -1106,7 +1133,10 @@ permissionsList(const char *pattern) "n.nspname", "c.relname", NULL, "n.nspname !~ '^pg_' AND pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); @@ -1177,7 +1207,10 @@ listDefaultACLs(const char *pattern) "pg_catalog.pg_get_userbyid(d.defaclrole)", NULL, NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 3;"); @@ -1428,7 +1461,10 @@ describeTableDetails(const char *pattern, bool verbose, bool showSystem) 
"n.nspname", "c.relname", NULL, "pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 2, 3;"); @@ -3614,7 +3650,10 @@ describeRoles(const char *pattern, bool verbose, bool showSystem) if (!validateSQLNamePattern(&buf, pattern, false, false, NULL, "r.rolname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -3739,11 +3778,17 @@ listDbRoleSettings(const char *pattern, const char *pattern2) gettext_noop("Settings")); if (!validateSQLNamePattern(&buf, pattern, false, false, NULL, "r.rolname", NULL, NULL, &havewhere, 1)) + { + termPQExpBuffer(&buf); return false; + } if (!validateSQLNamePattern(&buf, pattern2, havewhere, false, NULL, "d.datname", NULL, NULL, NULL, 1)) + { + termPQE
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 18.07.22 at 23:48, Martin Kalcher wrote: If we go with (1) array_shuffle() and array_sample() should shuffle each element individually and always return a one-dimensional array. select array_shuffle('{{1,2},{3,4},{5,6}}'); --- {1,4,3,5,6,2} select array_sample('{{1,2},{3,4},{5,6}}', 3); -- {1,4,3} If we go with (2) both functions should only operate on the first dimension and shuffle whole subarrays and keep the dimensions intact. select array_shuffle('{{1,2},{3,4},{5,6}}'); - {{3,4},{1,2},{5,6}} select array_sample('{{1,2},{3,4},{5,6}}', 2); --- {{3,4},{1,2}} Having thought about it, I would go with (2). It gives the user the ability to decide whether or not array-of-arrays behavior is desired. If he wants the behavior of (1) he can flatten the array before applying array_shuffle(). Unfortunately there is no array_flatten() function (at the moment) and the user would have to work around it with unnest() and array_agg().
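[Editor's illustration] That workaround could look roughly like this (array_shuffle() being the proposed function):

    -- flatten the multidimensional array with unnest()/array_agg(), then
    -- shuffle the resulting one-dimensional array
    SELECT array_shuffle(array_agg(x))
    FROM unnest('{{1,2},{3,4},{5,6}}'::int[]) AS t(x);
    -- e.g. {5,1,4,2,6,3}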
Re: Proposal: allow database-specific role memberships
Kenaniah Cerny wrote: > Rebased yet again... > > On Mon, Jul 4, 2022 at 1:17 PM Kenaniah Cerny wrote: > The "unsafe_tests" directory is where the pre-existing role tests were > located. According to the readme of the "unsafe_tests" directory, the tests > contained within are not run during "make installcheck" because they could > have side-effects that seem undesirable for a production installation. This > seemed like a reasonable location as the new tests that this patch > introduces also modifies the "state" of the database cluster by adding, > modifying, and removing roles & databases (including template1). ok, I missed the purpose of "unsafe_tests" so far, thanks. > Regarding roles_is_member_of(), the nuance is that role "A" in your example > would only be considered a member of role "B" (and by extension role "C") > when connected to the database in which "A" was granted database-specific > membership to "B". > Conversely, when connected to any other database, "A" would not be > considered to be a member of "B". > > This patch is designed to solve the scenarios in which one may want to > grant constrained access to a broader set of privileges. For example, > membership in "pg_read_all_data" effectively grants SELECT and USAGE rights > on everything (implicitly cluster-wide in today's implementation). By > granting a role membership to "pg_read_all_data" within the context of a > specific database, the grantee's read-everything privilege is effectively > constrained to just that specific database (as membership within > "pg_read_all_data" would not otherwise be held). ok, I tried to view the problem rather from general perspective. However, the permissions like "pg_read_all_data" are unusual in that they are rather strong and at the same time they are usually located at the top of the groups hierarchy. I've got no better idea how to solve the problem. A few more comments on the patch: * It's not clear from the explanation of the GRANT ... IN DATABASE ... / GRANT ... IN CURRENT DATABASE ... that, even if "membership in ... will be effective only when the recipient is connected to the database ...", the ADMIN option might not be "fully effective". I refer to the part of the regression tests starting with -- Ensure database-specific admin option can only grant within that database For example, "role_read_34" does have the ADMIN option for the "pg_read_all_data" role and for the "db_4" database: GRANT pg_read_all_data TO role_read_34 IN DATABASE db_4 WITH ADMIN OPTION; (in other words, "role_read_34" does have the database-specific membership in "pg_read_all_data"), but it cannot use the option (in other words, cannot use some ability resulting from that membership) unless the session to that database is active: \connect db_3 SET SESSION AUTHORIZATION role_read_34; ... GRANT pg_read_all_data TO role_granted IN CURRENT DATABASE; -- success GRANT pg_read_all_data TO role_granted IN DATABASE db_3; -- notice NOTICE: role "role_granted" is already a member of role "pg_read_all_data" in database "db_3" GRANT pg_read_all_data TO role_granted IN DATABASE db_4; -- error ERROR: must have admin option on role "pg_read_all_data" Specifically on the regression tests: * The function check_memberships() has no parameters - is there a reason not to use a view? * I'm not sure if the pg_auth_members catalog can contain InvalidOid in other columns than dbid. Thus I think that the query in check_memberships() only needs an outer JOIN for the pg_database table, while the other joins can be inner. 
* In this part SET SESSION AUTHORIZATION role_read_12_noinherit; SELECT * FROM data; -- error SET ROLE role_read_12; -- error SELECT * FROM data; -- error I think you don't need to query the table again if the SET ROLE statement failed and the same query had been executed before the SET ROLE. -- Antonin Houska Web: https://www.cybertec-postgresql.com
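[Editor's illustration] Regarding the check_memberships() point above, a rough sketch of what a view could look like (dbid here is the column the patch adds to pg_auth_members, so this is illustrative only): the join against pg_database is the only one that needs to be an outer join, since dbid may be invalid for cluster-wide grants.

    CREATE VIEW memberships AS
    SELECT g.rolname AS granted_role,
           r.rolname AS member,
           d.datname AS database          -- NULL for cluster-wide memberships
    FROM pg_auth_members m
         JOIN pg_roles g ON g.oid = m.roleid
         JOIN pg_roles r ON r.oid = m.member
         LEFT JOIN pg_database d ON d.oid = m.dbid;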
Re: Handle infinite recursion in logical replication setup
On Mon, Jul 18, 2022 at 9:46 PM vignesh C wrote: > > I have updated the patch to handle the origin value case > insensitively. The attached patch has the changes for the same. > Thanks, the patch looks mostly good to me. I have made a few changes in 0001 patch which are as follows: (a) make a comparison of origin names in maybe_reread_subscription similar to slot names as in future we may support origin names other than 'any' and 'none', (b) made comment changes at few places and minor change in one of the error message, (c) ran pgindent and slightly changed the commit message. I am planning to push this day after tomorrow unless there are any comments/suggestions. -- With Regards, Amit Kapila. v35-0001-Allow-uses-to-skip-logical-replication-of-data-h.patch Description: Binary data
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Hi, On Tue, Jul 19, 2022 at 4:09 AM Andrew Dunstan wrote: > On 2022-07-15 Fr 17:07, Andres Freund wrote: > > Perhaps you could post your current state? I might be able to help resolving > > some of the problems. > > Ok. Here is the state of things. This has proved to be rather more > intractable than I expected. Almost all the legwork here has been done > by Amit Langote, for which he deserves both my thanks and considerable > credit, but I take responsibility for it. > > I just discovered today that this scheme is failing under > "force_parallel_mode = regress". I have as yet no idea if that can be > fixed simply or not. The errors Andrew mentions here had to do with a bug of the new coercion evaluation logic. The old code in ExecEvalJsonExpr() would skip coercion evaluation and thus also the sub-transaction associated with it for some JsonExprs that the new code would not and that didn't sit well with the invariant that a parallel worker shouldn't try to start a sub-transaction. That bug has been fixed in the attached updated version. > Apart from that I think the main outstanding issue > is to fill in the gaps in llvm_compile_expr(). About that, I was wondering if the blocks in llvm_compile_expr() need to be hand-coded to match what's added in ExecInterpExpr() or if I've missed some tool that can be used instead? -- Thanks, Amit Langote EDB: http://www.enterprisedb.com v2-0002-Evaluate-various-JsonExpr-sub-expressions-using-p.patch Description: Binary data v2-0003-Use-one-ExprState-to-implement-JsonItemCoercions.patch Description: Binary data v2-0001-in-JsonExprState-just-store-a-pointer-to-the-inpu.patch Description: Binary data
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 1:43 PM Kyotaro Horiguchi wrote: > > At Tue, 19 Jul 2022 16:57:14 +0900 (JST), Kyotaro Horiguchi > wrote in > > At Tue, 19 Jul 2022 16:02:26 +0900, Masahiko Sawada > > wrote in > > > On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila > > > wrote: > > > > Good work. I wonder without comments this may create a problem in the > > > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > > > > freeing the memory any less robust. Also, for consistency, we can use > > > > a similar check based on xcnt in the SnapBuildRestore to free the > > > > memory in the below code: > > > > + /* set catalog modifying transactions */ > > > > + if (builder->catchange.xip) > > > > + pfree(builder->catchange.xip); > > > > > > I would hesitate to add comments about preventing the particular > > > optimization. I think we do null-pointer-check-then-pfree many place. > > > It seems to me that checking the array length before memcpy is more > > > natural than checking both the array length and the array existence > > > before pfree. > > > > Anyway according to commit message of 46ab07ffda, POSIX forbits > > memcpy(NULL, NULL, 0). It seems to me that it is the cause of the > > false (or over) optimization. So if we add some comment, it would be > > for memcpy, not pfree.. > > For clarilty, I meant that I don't think we need that comment. > Fair enough. I think commit 46ab07ffda clearly explains why it is a good idea to add a check as Sawada-San did in his latest version. I also agree that we don't any comment for this change. -- With Regards, Amit Kapila.
Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
On Tue, Apr 26, 2022 at 6:31 AM Michael Paquier wrote: > > On Mon, Apr 25, 2022 at 01:34:38PM -0700, Nathan Bossart wrote: > > I took another look at the example output, and I think I agree that logging > > the total time for logical decoding operations is probably the best path > > forward. This information would be enough to clue an administrator into > > the possible causes of lengthy checkpoints, but it also wouldn't disrupt > > the readability of the log statement too much. > > + /* translator: the placeholders after first %s show > restartpoint/checkpoint options */ > + (errmsg("%s starting:%s%s%s%s%s%s%s%s", > + restartpoint ? > _("restartpoint") : _("checkpoint"), > > 0001 breaks translation, as "checkpoint/restartpoint" and "starting" > would treated as separate terms to translate. That would not matter > for English, but it does in French where we'd say "début du > checkpoint". You could fix that by adding "starting" to each > refactored term or build a string. 0002 does the latter, so my take > is that you should begin using a StringInfo in 0001. Thanks for reviewing. I've addressed the review comments, PSA v10 patch. Note that we can't use StringInfo as the checkpointer memory context doesn't allow pallocs in the critical section and the checkpoint can sometimes be run in the critical section. I've also added the total number of WAL files a checkpoint has processed (scanned the pg_wal directory) while removing old WAL files. This helps to estimate the pg_wal disk space at the time of a particular checkpoint, especially useful for debugging issues. [1] sample output: 2022-07-19 10:33:45.378 UTC [3027866] LOG: checkpoint starting: wal 2022-07-19 10:33:51.434 UTC [3027866] LOG: checkpoint complete: wrote 50 buffers (0.3%); 0 WAL file(s) added, 12 removed, 35 recycled, 76 processed; write=3.651 s, sync=0.011 s, total=6.136 s; sync files=11, longest=0.004 s, average=0.001 s; distance=770045 kB, estimate=770045 kB; lsn=0/95000260, redo lsn=0/7968; logical decoding file(s) processing=0.007 s Regards, Bharath Rupireddy. From b113dfb9e876a84c2121b64ab94a9d10c3c670b4 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Tue, 19 Jul 2022 10:35:29 + Subject: [PATCH v10] Add time taken for processing logical decoding files to checkpoint log At times, there can be many snapshot and mapping files under pg_logical dir that the checkpoint might have to delete/fsync based on the cutoff LSN which can increase the checkpoint time. Add stats related to these files to better understand the delays or time spent by the checkpointer processing them. Also, add total number of WAL files processed during checkpoint to the log message. This will be useful for debugging issues like total time taken by checkpoint (if there are many WAL files) and estimate the disk space occupied by pg_wal at the time of checkpoint. 
--- src/backend/access/transam/xlog.c | 30 +++--- src/include/access/xlog.h | 5 + 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index b809a2152c..20c1689ed2 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3603,6 +3603,8 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr, insertTLI); } } + + CheckpointStats.ckpt_segs_processed++; } FreeDir(xldir); @@ -6092,7 +6094,8 @@ LogCheckpointEnd(bool restartpoint) sync_msecs, total_msecs, longest_msecs, -average_msecs; +average_msecs, +l_dec_ops_msecs; uint64 average_sync_time; CheckpointStats.ckpt_end_t = GetCurrentTimestamp(); @@ -6129,6 +6132,9 @@ LogCheckpointEnd(bool restartpoint) CheckpointStats.ckpt_sync_rels; average_msecs = (long) ((average_sync_time + 999) / 1000); + l_dec_ops_msecs = TimestampDifferenceMilliseconds(CheckpointStats.l_dec_ops_start_t, + CheckpointStats.l_dec_ops_end_t); + /* * ControlFileLock is not required to see ControlFile->checkPoint and * ->checkPointCopy here as we are the only updator of those variables at @@ -6137,16 +6143,18 @@ LogCheckpointEnd(bool restartpoint) if (restartpoint) ereport(LOG, (errmsg("restartpoint complete: wrote %d buffers (%.1f%%); " - "%d WAL file(s) added, %d removed, %d recycled; " + "%d WAL file(s) added, %d removed, %d recycled, %d processed; " "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s; " "sync files=%d, longest=%ld.%03d s, average=%ld.%03d s; " "distance=%d kB, estimate=%d kB; " - "lsn=%X/%X, redo lsn=%X/%X", + "lsn=%X/%X, redo lsn=%X/%X; " + "logical decoding file(s) processing=%ld.%03d s", CheckpointStats.ckpt_bufs_written, (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers, CheckpointStats.ckpt_segs_added, CheckpointStats.ckpt_segs_rem
errdetail/errhint style
Most of these are new in v15. In any case, I'm not sure if the others ought to be backpatched. There may be additional fixes to make with the same grepping. >From a33bd79c2f36046463489fbd37c76d7f0c3f7a8e Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 18 Jul 2022 13:54:38 -0500 Subject: [PATCH] errdetail/hint messages should be capitalized and end with a dot https://www.postgresql.org/docs/current/error-style-guide.html#id-1.10.6.4.7 git grep 'errdetail("[[:lower:]]' git grep 'errdetail(".*".*;' |grep -v '\."' See also: 501ed02cf6f4f60c3357775eb07578aebc912d3a 730422afcdb6784bbe20efc65de72156d470b0c4 --- contrib/basic_archive/basic_archive.c | 4 +-- .../postgres_fdw/expected/postgres_fdw.out| 6 ++-- contrib/postgres_fdw/option.c | 4 +-- src/backend/commands/publicationcmds.c| 2 +- src/backend/commands/tablecmds.c | 2 +- src/backend/parser/parse_expr.c | 4 +-- src/backend/parser/parse_jsontable.c | 12 src/backend/utils/adt/jsonpath_exec.c | 2 +- src/backend/utils/adt/jsonpath_gram.y | 2 +- src/backend/utils/misc/guc.c | 2 +- src/test/regress/expected/jsonb_sqljson.out | 28 +-- src/test/regress/expected/jsonpath.out| 2 +- src/test/regress/expected/publication.out | 2 +- src/test/regress/expected/sqljson.out | 10 +++ src/test/regress/expected/triggers.out| 2 +- 15 files changed, 42 insertions(+), 42 deletions(-) diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index bba767c8f36..776a386e352 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -111,7 +111,7 @@ check_archive_directory(char **newval, void **extra, GucSource source) */ if (strlen(*newval) + 64 + 2 >= MAXPGPATH) { - GUC_check_errdetail("archive directory too long"); + GUC_check_errdetail("Archive directory too long."); return false; } @@ -122,7 +122,7 @@ check_archive_directory(char **newval, void **extra, GucSource source) */ if (stat(*newval, &st) != 0 || !S_ISDIR(st.st_mode)) { - GUC_check_errdetail("specified archive directory does not exist"); + GUC_check_errdetail("Specified archive directory does not exist."); return false; } diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index ebf9ea35988..ade797159dc 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9590,7 +9590,7 @@ HINT: Target server's authentication method must be changed or password_require -- Unpriv user cannot make the mapping passwordless ALTER USER MAPPING FOR CURRENT_USER SERVER loopback_nopw OPTIONS (ADD password_required 'false'); ERROR: password_required=false is superuser-only -HINT: User mappings with the password_required option set to false may only be created or modified by the superuser +HINT: User mappings with the password_required option set to false may only be created or modified by the superuser. SELECT 1 FROM ft1_nopw LIMIT 1; ERROR: password is required DETAIL: Non-superuser cannot connect if the server does not request a password. 
@@ -9611,10 +9611,10 @@ SELECT 1 FROM ft1_nopw LIMIT 1; ALTER USER MAPPING FOR CURRENT_USER SERVER loopback_nopw OPTIONS (SET password_required 'true'); ALTER USER MAPPING FOR CURRENT_USER SERVER loopback_nopw OPTIONS (ADD sslcert 'foo.crt'); ERROR: sslcert and sslkey are superuser-only -HINT: User mappings with the sslcert or sslkey options set may only be created or modified by the superuser +HINT: User mappings with the sslcert or sslkey options set may only be created or modified by the superuser. ALTER USER MAPPING FOR CURRENT_USER SERVER loopback_nopw OPTIONS (ADD sslkey 'foo.key'); ERROR: sslcert and sslkey are superuser-only -HINT: User mappings with the sslcert or sslkey options set may only be created or modified by the superuser +HINT: User mappings with the sslcert or sslkey options set may only be created or modified by the superuser. -- We're done with the role named after a specific user and need to check the -- changes to the public mapping. DROP USER MAPPING FOR CURRENT_USER SERVER loopback_nopw; diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c index cd2ef234d66..95dde056eba 100644 --- a/contrib/postgres_fdw/option.c +++ b/contrib/postgres_fdw/option.c @@ -193,7 +193,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("password_required=false is superuser-only"), - errhint("User mappings with the password_required option set to false may only be created or modified by the superuser"))); + errhint("User mappings with the password_required option set to false may only be created or modified by the superuser."))); } else if (strcm
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada wrote: > > On Mon, Jul 18, 2022 at 8:49 PM Amit Kapila wrote: > > > > On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada > > wrote: > > > > > > On Fri, Jul 15, 2022 at 3:32 PM shiy.f...@fujitsu.com > > > wrote: > > > > > > > > > > I've attached patches for all supported branches including the master. > > > > > > > For back branch patches, > > * Wouldn't it be better to move purge logic into the function > > SnapBuildPurge* function for the sake of consistency? > > Agreed. > > > * Do we really need ReorderBufferInitialXactsSetCatalogChanges()? > > Can't we instead have a function similar to > > SnapBuildXidHasCatalogChanges() as we have for the master branch? That > > will avoid calling it when the snapshot > > state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT > > Seems a good idea. We would need to pass the information about > (parsed->xinfo & XACT_XINFO_HAS_INVALS) to the function but probably > we can change ReorderBufferXidHasCatalogChanges() so that it checks > the RBTXN_HAS_CATALOG_CHANGES flag and then the initial running xacts > array. > Let's try to keep this as much similar to the master branch patch as possible. > BTW on backbranches, I think that the reason why we add > initial_running_xacts stuff to ReorderBuffer is that we cannot modify > SnapBuild that could be serialized. Can we add a (private) array for > the initial running xacts in snapbuild.c instead of adding new > variables to ReorderBuffer? > While thinking about this, I wonder if the current patch for back branches can lead to an ABI break as it changes the exposed structure? If so, it may be another reason to change it to some other way probably as you are suggesting. -- With Regards, Amit Kapila.
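For reference, the "private array in snapbuild.c" idea mentioned above would keep the back-branch state file-local instead of enlarging ReorderBuffer, along these lines (names are illustrative, not from any committed patch):
```c
/* snapbuild.c (back branches): sketch of file-local state for the initial
 * running xacts, avoiding new ReorderBuffer members and thus any change to
 * the size of an exposed struct. */
static TransactionId *InitialRunningXacts = NULL;
static int	NInitialRunningXacts = 0;
```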
Re: errdetail/errhint style
On 2022-Jul-19, Justin Pryzby wrote: > https://www.postgresql.org/docs/current/error-style-guide.html#id-1.10.6.4.7 > > git grep 'errdetail("[[:lower:]]' > git grep 'errdetail(".*".*;' |grep -v '\."' Hmm, +1, though a few of these are still missing ending periods after your patch. -- Álvaro Herrera  Breisgau, Deutschland — https://www.EnterpriseDB.com/ "The problem with the facetime model is not just that it's demoralizing, but that the people pretending to work interrupt the ones actually working." (Paul Graham)
Re: Memory leak fix in psql
On Tue, Jul 19, 2022 at 06:41:13PM +0800, Japin Li wrote: > After looking around, I found psql/describe.c also has some memory leaks, > attached a patch to fix these leaks. Indeed. There are quite a few of them, so let's fix all that. You have missed a couple of code paths in objectDescription(). -- Michael signature.asc Description: PGP signature
Re: System column support for partitioned tables using heap
On Tue, Jul 19, 2022 at 4:44 AM Morris de Oryx wrote: > My reason for xmax() in the result is to break down the affected rows count > into an insert count, and a modified estimate. Not super critical, but > helpful. I've built out some simple custom logging table in out system for > this kind of detail, and folks have been wanting to break down rows > submitted, rows inserted, and rows updated a bit better. Rows submitted is > easy and rows inserted is too...update is an estimate as I'm not using > anything fancy with xmax() to sort out what exactly happened. I wonder whether you could just have the CTEs bubble up 1 or 0 and then sum them at some stage, instead of relying on xmax. Presumably your UPSERT simulation knows which thing it did in each case. For MERGE itself, I wonder if some information about this should be included in the command tag. It looks like MERGE already includes some sort of row count in the command tag, but I guess perhaps it doesn't distinguish between inserts and updates. I don't know why we couldn't expose multiple values this way, though. -- Robert Haas EDB: http://www.enterprisedb.com
Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)
Hi! Improved my patch by adding block subtransactions. The block size is determined by the REPLAY_BUFFER_SIZE parameter. I used the idea of a buffer for accumulating tuples in it. If we read REPLAY_BUFFER_SIZE rows without errors, the subtransaction will be committed. If we find an error, the subtransaction will rollback and the buffer will be replayed containing tuples. In the patch REPLAY_BUFFER_SIZE equals 3, but it can be changed to any other number (for example 1000). There is an idea to create a GUC parameter for it. Also maybe create a GUC parameter for the number of occurring WARNINGS by rows with errors. For CIM_MULTI and CIM_MULTI_CONDITIONAL cases the buffer is not needed. It is needed for the CIM_SINGLE case. Tests: -- CIM_MULTI case CREATE TABLE check_ign_err (n int, m int, k int); COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS; WARNING: COPY check_ign_err, line 2: "2 2 2 2" WARNING: COPY check_ign_err, line 3: "3 3" WARNING: COPY check_ign_err, line 4, column n: "a" WARNING: COPY check_ign_err, line 5, column m: "b" WARNING: COPY check_ign_err, line 6, column n: "" 1 1 1 2 2 2 2 3 3 a 4 4 5 b b 7 7 7 \. SELECT * FROM check_ign_err; WARNING: COPY check_ign_err, line 2: "2 2 2 2" WARNING: COPY check_ign_err, line 3: "3 3" WARNING: COPY check_ign_err, line 4, column n: "a" WARNING: COPY check_ign_err, line 5, column m: "b" WARNING: COPY check_ign_err, line 6, column n: "" n | m | k ---+---+--- 1 | 1 | 1 7 | 7 | 7 (2 rows) ## -- CIM_SINGLE case -- BEFORE row trigger CREATE TABLE trig_test(n int, m int); CREATE FUNCTION fn_trig_before () RETURNS TRIGGER AS ' BEGIN INSERT INTO trig_test VALUES(NEW.n, NEW.m); RETURN NEW; END; ' LANGUAGE plpgsql; CREATE TRIGGER trig_before BEFORE INSERT ON check_ign_err FOR EACH ROW EXECUTE PROCEDURE fn_trig_before(); COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS; WARNING: COPY check_ign_err, line 2: "2 2 2 2" WARNING: COPY check_ign_err, line 3: "3 3" WARNING: COPY check_ign_err, line 4, column n: "a" WARNING: COPY check_ign_err, line 5, column m: "b" WARNING: COPY check_ign_err, line 6, column n: "" 1 1 1 2 2 2 2 3 3 a 4 4 5 b b 7 7 7 \. SELECT * FROM check_ign_err; n | m | k ---+---+--- 1 | 1 | 1 7 | 7 | 7 (2 rows) ## -- INSTEAD OF row trigger CREATE VIEW check_ign_err_view AS SELECT * FROM check_ign_err; CREATE FUNCTION fn_trig_instead_of () RETURNS TRIGGER AS ' BEGIN INSERT INTO check_ign_err VALUES(NEW.n, NEW.m, NEW.k); RETURN NEW; END; ' LANGUAGE plpgsql; CREATE TRIGGER trig_instead_of INSTEAD OF INSERT ON check_ign_err_view FOR EACH ROW EXECUTE PROCEDURE fn_trig_instead_of(); COPY check_ign_err_view FROM STDIN WITH IGNORE_ERRORS; WARNING: COPY check_ign_err, line 2: "2 2 2 2" WARNING: COPY check_ign_err, line 3: "3 3" WARNING: COPY check_ign_err, line 4, column n: "a" WARNING: COPY check_ign_err, line 5, column m: "b" WARNING: COPY check_ign_err, line 6, column n: "" SELECT * FROM check_ign_err; 1 1 1 2 2 2 2 3 3 a 4 4 5 b b 7 7 7 \. SELECT * FROM check_ign_err_view; n | m | k ---+---+--- 1 | 1 | 1 7 | 7 | 7 (2 rows) ## -- foreign table case in postgres_fdw extension ## -- volatile function in WHERE clause COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS WHERE n = floor(random()*(1-1+1))+1; /* find values equal 1 */ WARNING: COPY check_ign_err, line 2: "2 2 2 2" WARNING: COPY check_ign_err, line 3: "3 3" WARNING: COPY check_ign_err, line 4, column n: "a" WARNING: COPY check_ign_err, line 5, column m: "b" WARNING: COPY check_ign_err, line 6, column n: "" SELECT * FROM check_ign_err; 1 1 1 2 2 2 2 3 3 a 4 4 5 b b 7 7 7 \. 
SELECT * FROM check_ign_err; n | m | k ---+---+--- 1 | 1 | 1 (1 row) ## -- CIM_MULTI_CONDITIONAL case -- INSERT triggers for partition tables CREATE TABLE check_ign_err (n int, m int, k int) PARTITION BY RANGE (n); CREATE TABLE check_ign_err_part1 PARTITION OF check_ign_err FOR VALUES FROM (1) TO (4); CREATE TABLE check_ign_err_part2 PARTITION OF check_ign_err FOR VALUES FROM (4) TO (8); CREATE FUNCTION fn_trig_before_part () RETURNS TRIGGER AS ' BEGIN INSERT INTO trig_test VALUES(NEW.n, NEW.m); RETURN NEW; END; ' LANGUAGE plpgsql; CREATE TRIGGER trig_before_part BEFORE INSERT ON check_ign_err FOR EACH ROW EXECUTE PROCEDURE fn_trig_before_part(); COPY check_ign_err FROM STDIN WITH IGNORE_ERRORS; WARNING: COPY check_ign_err, line 2: "2 2 2 2" WARNING: COPY check_ign_err, line 3: "3 3" WARNING: COPY check_ign_err, line 4, column n: "a" WARNING: COPY check_ign_err, line 5, column m: "b" WARNING: COPY check_ign_err, line 6, column n: "" SELECT * FROM check_ign_err; 1 1 1 2 2 2 2 3 3 a 4 4 5 b b 7 7 7 \. n | m | k ---+---+--- 1 | 1 | 1 7 | 7 | 7 (2 rows) Thanks for feedback. Regards, Damir From 6bf216
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 2:01 PM Masahiko Sawada wrote: > > On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi > wrote: > > > > > > > + Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns)); > > > > (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..). > > (xcnt == rb->catchange_ntxns) is not what should be checked here. The > > assert just requires that catchange_txns and catchange_ntxns are > > consistent so it should be checked just after dlist_empty.. I think. > > > > If we want to check if catchange_txns and catchange_ntxns are > consistent, should we check (xcnt == rb->catchange_ntxns) as well, no? > This function requires the caller to use rb->catchange_ntxns as the > length of the returned array. I think this assertion ensures that the > actual length of the array is consistent with the length we > pre-calculated. > Right, so, I think it is better to keep this assertion but remove (xcnt > 0) part as pointed out by Horiguchi-San. -- With Regards, Amit Kapila.
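To make the agreed-on shape concrete, here is a rough sketch of the collection loop and the assertion as discussed; the identifiers follow the patch under discussion and may differ in the final version:
```c
/* Rough sketch only; field and member names are taken from the patch. */
dlist_iter	iter;
TransactionId *xids;
int			xcnt = 0;

if (dlist_empty(&rb->catchange_txns))
	return NULL;

xids = (TransactionId *) palloc(sizeof(TransactionId) * rb->catchange_ntxns);

dlist_foreach(iter, &rb->catchange_txns)
{
	ReorderBufferTXN *txn = dlist_container(ReorderBufferTXN,
											catchange_node, iter.cur);

	xids[xcnt++] = txn->xid;
}

/* keep the consistency check, minus the redundant (xcnt > 0) term */
Assert(xcnt == rb->catchange_ntxns);
```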
Re: errdetail/errhint style
On Tue, Jul 19, 2022 at 02:31:28PM +0200, Alvaro Herrera wrote: > Hmm, +1, though a few of these are still missing ending periods after > your patch. +1. -- Michael signature.asc Description: PGP signature
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Mon, Jul 18, 2022 at 6:43 PM Tom Lane wrote: > Um ... why is "the order in which the elements were chosen" a concept > we want to expose? ISTM sample() is a black box in which notionally > the decisions could all be made at once. I agree with that. But I also think it's fine for the elements to be returned in a shuffled order rather than the original order. > > I really think this function needs to grow an algorithm argument that can > > be used to specify stuff like ordering, replacement/without-replacement, > > etc...just some enums separated by commas that can be added to the call. > > I think you might run out of gold paint somewhere around here. I'm > still not totally convinced we should bother with the sample() function > at all, let alone that it needs algorithm variants. At some point we > say to the user "here's a PL, write what you want for yourself". I don't know what gold paint has to do with anything here, but I agree that David's proposal seems to be moving the goalposts a very long way. The thing is, as Martin points out, these functions already exist in a bunch of other systems. For one example I've used myself, see https://underscorejs.org/ I probably wouldn't have called a function to put a list into a random order "shuffle" in a vacuum, but it seems to be common nomenclature these days. I believe that if you don't make reference to Fisher-Yates in the documentation, they kick you out of the cool programmers club. -- Robert Haas EDB: http://www.enterprisedb.com
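For reference, the Fisher-Yates shuffle alluded to here is tiny; a minimal sketch in C, using random() purely as a stand-in for whatever PRNG the backend implementation would actually use:
```c
#include <stdlib.h>

/* Fisher-Yates: uniformly shuffle n elements of array in place. */
static void
shuffle_ints(int *array, int n)
{
	for (int i = n - 1; i > 0; i--)
	{
		int			j = random() % (i + 1); /* 0 <= j <= i; modulo bias ignored */
		int			tmp = array[i];

		array[i] = array[j];
		array[j] = tmp;
	}
}
```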
Re: Memory leak fix in psql
On Tue, 19 Jul 2022 at 20:32, Michael Paquier wrote: > On Tue, Jul 19, 2022 at 06:41:13PM +0800, Japin Li wrote: >> After looking around, I found psql/describe.c also has some memory leaks, >> attached a patch to fix these leaks. > > Indeed. There are quite a bit of them, so let's fix all that. You > have missed a couple of code paths in objectDescription(). Thanks for reviewing. Attached fix the memory leak in objectDescription(). Please consider v2 for further review. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. >From b2bcc3a1bac67b8b414f2025607f8dd35e096289 Mon Sep 17 00:00:00 2001 From: Japin Li Date: Tue, 19 Jul 2022 18:27:25 +0800 Subject: [PATCH v2 1/1] Fix the memory leak in psql describe --- src/bin/psql/describe.c | 168 1 file changed, 168 insertions(+) diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 0ce38e4b4c..7a070a6cd0 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -112,7 +112,10 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem) "n.nspname", "p.proname", NULL, "pg_catalog.pg_function_is_visible(p.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 4;"); @@ -182,7 +185,10 @@ describeAccessMethods(const char *pattern, bool verbose) NULL, "amname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -244,7 +250,10 @@ describeTablespaces(const char *pattern, bool verbose) NULL, "spcname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -534,7 +543,10 @@ describeFunctions(const char *functypes, const char *func_pattern, "n.nspname", "p.proname", NULL, "pg_catalog.pg_function_is_visible(p.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } for (int i = 0; i < num_arg_patterns; i++) { @@ -561,7 +573,10 @@ describeFunctions(const char *functypes, const char *func_pattern, true, false, nspname, typname, ft, tiv, NULL, 3)) + { +termPQExpBuffer(&buf); return false; + } } else { @@ -682,7 +697,10 @@ describeTypes(const char *pattern, bool verbose, bool showSystem) "pg_catalog.format_type(t.oid, NULL)", "pg_catalog.pg_type_is_visible(t.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); @@ -836,7 +854,10 @@ describeOperators(const char *oper_pattern, "n.nspname", "o.oprname", NULL, "pg_catalog.pg_operator_is_visible(o.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } if (num_arg_patterns == 1) appendPQExpBufferStr(&buf, " AND o.oprleft = 0\n"); @@ -866,7 +887,10 @@ describeOperators(const char *oper_pattern, true, false, nspname, typname, ft, tiv, NULL, 3)) + { +termPQExpBuffer(&buf); return false; + } } else { @@ -953,7 +977,10 @@ listAllDbs(const char *pattern, bool verbose) if (!validateSQLNamePattern(&buf, pattern, false, false, NULL, "d.datname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); @@ -1106,7 +1133,10 @@ permissionsList(const char *pattern) "n.nspname", "c.relname", NULL, "n.nspname !~ '^pg_' AND pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); @@ -1177,7 +1207,10 @@ listDefaultACLs(const char *pattern) "pg_catalog.pg_get_userbyid(d.defaclrole)", NULL, NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, 
"ORDER BY 1, 2, 3;"); @@ -1253,7 +1286,10 @@ objectDescription(const char *pattern, bool showSystem) false, "n.nspname", "pgc.conname", NULL, "pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } /* Domain constraint descriptions */ appendPQExpBuffer(&buf, @@ -1277,7 +1313,10 @@ objectDescription(const char *pattern, bool showSystem) false, "n.nspname", "pgc.conname", NULL, "pg_catalog.pg_type_is_visible(t.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } /* Operator class descriptions */ appendPQExpBuffer(&buf, @@ -1301,7 +1340,10 @@ objectDescription(const char *pattern, bool showSystem) "n.nspname", "o.opcname", NULL, "pg_catalog.pg_opclass_is_visible(o.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } /* Operator family descriptions */ appendPQExpBuffe
Re: Windows now has fdatasync()
Thomas Munro writes: > The reason we need it for macOS is that they have had fdatasync > function for many years now, and configure detects it, but they > haven't ever declared it in a header, so we (accidentally?) do it in > c.h. We didn't set that up for Apple! The commit that added it was > 33cc5d8a, which was about a month before Apple shipped the first > version of OS X (and long before they defined the function). So there > must have been another Unix with that problem, lost in the mists of > time. It might have just been paranoia, but I doubt it. Back then we were still dealing with lots of systems that didn't have every function described in SUS v2. If you poked around in the mail archives you could likely find some associated discussion, but I'm too lazy for that ... regards, tom lane
Re: [PATCH] Introduce array_shuffle() and array_sample()
On 2022-07-19 Tu 07:15, Martin Kalcher wrote: > Am 18.07.22 um 23:48 schrieb Martin Kalcher: >> >> If we go with (1) array_shuffle() and array_sample() should shuffle >> each element individually and always return a one-dimensional array. >> >> select array_shuffle('{{1,2},{3,4},{5,6}}'); >> --- >> {1,4,3,5,6,2} >> >> select array_sample('{{1,2},{3,4},{5,6}}', 3); >> -- >> {1,4,3} >> >> If we go with (2) both functions should only operate on the first >> dimension and shuffle whole subarrays and keep the dimensions intact. >> >> select array_shuffle('{{1,2},{3,4},{5,6}}'); >> - >> {{3,4},{1,2},{5,6}} >> >> select array_sample('{{1,2},{3,4},{5,6}}', 2); >> --- >> {{3,4},{1,2}} >> > > Having thought about it, i would go with (2). It gives the user the > ability to decide wether or not array-of-arrays behavior is desired. > If he wants the behavior of (1) he can flatten the array before > applying array_shuffle(). Unfortunately there is no array_flatten() > function (at the moment) and the user would have to work around it > with unnest() and array_agg(). > > Why not have an optional second parameter for array_shuffle that indicates whether or not to flatten? e.g. array_shuffle(my_array, flatten => true) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [PATCH] Introduce array_shuffle() and array_sample()
On Tue, Jul 19, 2022 at 9:53 AM Andrew Dunstan wrote: > > Having thought about it, i would go with (2). It gives the user the > > ability to decide wether or not array-of-arrays behavior is desired. > > If he wants the behavior of (1) he can flatten the array before > > applying array_shuffle(). Unfortunately there is no array_flatten() > > function (at the moment) and the user would have to work around it > > with unnest() and array_agg(). > > Why not have an optional second parameter for array_shuffle that > indicates whether or not to flatten? e.g. array_shuffle(my_array, > flatten => true) IMHO, if we think that's something many people are going to want, it would be better to add an array_flatten() function, because that could be used for a variety of purposes, whereas an option to this function can only be used for this function. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 9:25 PM Amit Kapila wrote: > > On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada wrote: > > > > On Mon, Jul 18, 2022 at 8:49 PM Amit Kapila wrote: > > > > > > On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada > > > wrote: > > > > > > > > On Fri, Jul 15, 2022 at 3:32 PM shiy.f...@fujitsu.com > > > > wrote: > > > > > > > > > > > > > I've attached patches for all supported branches including the master. > > > > > > > > > > For back branch patches, > > > * Wouldn't it be better to move purge logic into the function > > > SnapBuildPurge* function for the sake of consistency? > > > > Agreed. > > > > > * Do we really need ReorderBufferInitialXactsSetCatalogChanges()? > > > Can't we instead have a function similar to > > > SnapBuildXidHasCatalogChanges() as we have for the master branch? That > > > will avoid calling it when the snapshot > > > state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT > > > > Seems a good idea. We would need to pass the information about > > (parsed->xinfo & XACT_XINFO_HAS_INVALS) to the function but probably > > we can change ReorderBufferXidHasCatalogChanges() so that it checks > > the RBTXN_HAS_CATALOG_CHANGES flag and then the initial running xacts > > array. > > > > Let's try to keep this as much similar to the master branch patch as possible. > > > BTW on backbranches, I think that the reason why we add > > initial_running_xacts stuff to ReorderBuffer is that we cannot modify > > SnapBuild that could be serialized. Can we add a (private) array for > > the initial running xacts in snapbuild.c instead of adding new > > variables to ReorderBuffer? > > > > While thinking about this, I wonder if the current patch for back > branches can lead to an ABI break as it changes the exposed structure? > If so, it may be another reason to change it to some other way > probably as you are suggesting. Yeah, it changes the size of ReorderBuffer, which is not good. Changing the function names and arguments would also break ABI. So probably we cannot do the above idea of removing ReorderBufferInitialXactsSetCatalogChanges() as well. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: [PATCH] Introduce array_shuffle() and array_sample()
Robert Haas writes: > On Tue, Jul 19, 2022 at 9:53 AM Andrew Dunstan wrote: >> Why not have an optional second parameter for array_shuffle that >> indicates whether or not to flatten? e.g. array_shuffle(my_array, >> flatten => true) > IMHO, if we think that's something many people are going to want, it > would be better to add an array_flatten() function, because that could > be used for a variety of purposes, whereas an option to this function > can only be used for this function. Agreed. Whether it's really needed, I dunno --- I don't recall the issue having come up before. After taking a second look through https://www.postgresql.org/docs/current/functions-array.html it seems like the things that treat arrays as flat even when they are multi-D are just * unnest(), which is more or less forced into that position by our type system: it has to be anyarray returning anyelement, not anyarray returning anyelement-or-anyarray-depending. * array_to_string(), which maybe could have done it differently but can reasonably be considered a variant of unnest(). * The containment/overlap operators, which are kind of their own thing anyway. Arguably they got this wrong, though I suppose it's too late to rethink that. Everything else either explicitly rejects more-than-one-D arrays or does something that is compatible with thinking of them as arrays-of-arrays. So I withdraw my original position. These functions should just shuffle or select in the array's first dimension, preserving subarrays. Or else be lazy and reject more-than-one-D arrays; but it's probably not that hard to handle them. regards, tom lane
Re: pgsql: Default to hidden visibility for extension libraries where possi
[ Redirecting thread to -hackers from -committers ] On 2022-Jul-19, Tom Lane wrote: > Alvaro Herrera writes: > > Do you just need to send a patch to add an exports.txt file to > > src/pl/plpgsql/src/ for these functions? > > The precedent of plpython says that PGDLLEXPORT markers are sufficient. > But yeah, we need a list of exactly which functions need to be > re-exposed. I imagine pldebugger has its own needs. A reasonable guess. I went as far as downloading pldebugger and compiling it, but it doesn't have a test suite of its own, so I couldn't verify anything about it. I did notice that plpgsql_check is calling function load_external_function(), and that doesn't appear in pldebugger. I wonder if the find_rendezvous_variable business is at play. Anyway, the minimal patch that makes plpgsql_check tests pass is attached. This seems a bit random. Maybe it'd be better to have a plpgsql_internal.h with functions that are exported only for plpgsql itself, and keep plpgsql.h with a set of functions, all marked PGDLLEXPORT, that are for external use. ... oh, and: $ postmaster -c shared_preload_libraries=plugin_debugger 2022-07-19 16:27:24.006 CEST [742142] FATAL: cannot request additional shared memory outside shmem_request_hook -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ >From e0759245de3e0fcad7a6b2c3e9a637d3bdffe2a4 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 19 Jul 2022 16:19:03 +0200 Subject: [PATCH] add PGDLLEXPORTS to plpgsql for plpgsql_check benefit --- src/pl/plpgsql/src/plpgsql.h | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h index 1914160272..20dd5ec060 100644 --- a/src/pl/plpgsql/src/plpgsql.h +++ b/src/pl/plpgsql/src/plpgsql.h @@ -1231,10 +1231,10 @@ extern PLpgSQL_plugin **plpgsql_plugin_ptr; /* * Functions in pl_comp.c */ -extern PLpgSQL_function *plpgsql_compile(FunctionCallInfo fcinfo, +extern PGDLLEXPORT PLpgSQL_function *plpgsql_compile(FunctionCallInfo fcinfo, bool forValidator); extern PLpgSQL_function *plpgsql_compile_inline(char *proc_source); -extern void plpgsql_parser_setup(struct ParseState *pstate, +extern PGDLLEXPORT void plpgsql_parser_setup(struct ParseState *pstate, PLpgSQL_expr *expr); extern bool plpgsql_parse_word(char *word1, const char *yytxt, bool lookup, PLwdatum *wdatum, PLword *word); @@ -1246,7 +1246,7 @@ extern PLpgSQL_type *plpgsql_parse_wordtype(char *ident); extern PLpgSQL_type *plpgsql_parse_cwordtype(List *idents); extern PLpgSQL_type *plpgsql_parse_wordrowtype(char *ident); extern PLpgSQL_type *plpgsql_parse_cwordrowtype(List *idents); -extern PLpgSQL_type *plpgsql_build_datatype(Oid typeOid, int32 typmod, +extern PGDLLEXPORT PLpgSQL_type *plpgsql_build_datatype(Oid typeOid, int32 typmod, Oid collation, TypeName *origtypname); extern PLpgSQL_variable *plpgsql_build_variable(const char *refname, int lineno, @@ -1257,7 +1257,7 @@ extern PLpgSQL_rec *plpgsql_build_record(const char *refname, int lineno, bool add2namespace); extern PLpgSQL_recfield *plpgsql_build_recfield(PLpgSQL_rec *rec, const char *fldname); -extern int plpgsql_recognize_err_condition(const char *condname, +extern PGDLLEXPORT int plpgsql_recognize_err_condition(const char *condname, bool allow_sqlstate); extern PLpgSQL_condition *plpgsql_parse_err_condition(char *condname); extern void plpgsql_adddatum(PLpgSQL_datum *newdatum); @@ -1280,7 +1280,7 @@ extern void plpgsql_exec_event_trigger(PLpgSQL_function *func, extern void plpgsql_xact_cb(XactEvent event, 
void *arg); extern void plpgsql_subxact_cb(SubXactEvent event, SubTransactionId mySubid, SubTransactionId parentSubid, void *arg); -extern Oid plpgsql_exec_get_datum_type(PLpgSQL_execstate *estate, +extern PGDLLEXPORT Oid plpgsql_exec_get_datum_type(PLpgSQL_execstate *estate, PLpgSQL_datum *datum); extern void plpgsql_exec_get_datum_type_info(PLpgSQL_execstate *estate, PLpgSQL_datum *datum, @@ -1296,7 +1296,7 @@ extern void plpgsql_ns_push(const char *label, extern void plpgsql_ns_pop(void); extern PLpgSQL_nsitem *plpgsql_ns_top(void); extern void plpgsql_ns_additem(PLpgSQL_nsitem_type itemtype, int itemno, const char *name); -extern PLpgSQL_nsitem *plpgsql_ns_lookup(PLpgSQL_nsitem *ns_cur, bool localmode, +extern PGDLLEXPORT PLpgSQL_nsitem *plpgsql_ns_lookup(PLpgSQL_nsitem *ns_cur, bool localmode, const char *name1, const char *name2, const char *name3, int *names_used); extern PLpgSQL_nsitem *plpgsql_ns_lookup_label(PLpgSQL_nsitem *ns_cur, @@ -1306,7 +1306,7 @@ extern PLpgSQL_nsitem *plpgsql_ns_find_nearest_loop(PLpgSQL_nsitem *ns_cur); /* * Other functions in pl_funcs.c */ -extern const char *plpgsql_stmt_typename(PLpgSQL_stmt *stmt); +extern PGDLLEXPORT const char *plpgsql_stmt_typename(PLpgSQL_stmt *s
Re: pgsql: Default to hidden visibility for extension libraries where possi
On Tue, Jul 19, 2022 at 04:28:07PM +0200, Alvaro Herrera wrote: > ... oh, and: > > $ postmaster -c shared_preload_libraries=plugin_debugger > 2022-07-19 16:27:24.006 CEST [742142] FATAL: cannot request additional > shared memory outside shmem_request_hook This has been reported multiple times (including on one of my own projects), see https://www.postgresql.org/message-id/flat/81f82c00-8818-91f3-96fa-47976f94662b%40pm.me for the last report.
Re: pgsql: Default to hidden visibility for extension libraries where possi
Hi út 19. 7. 2022 v 16:28 odesílatel Alvaro Herrera napsal: > [ Redirecting thread to -hackers from -committers ] > > On 2022-Jul-19, Tom Lane wrote: > > > Alvaro Herrera writes: > > > > Do you just need to send a patch to add an exports.txt file to > > > src/pl/plpgsql/src/ for these functions? > > > > The precedent of plpython says that PGDLLEXPORT markers are sufficient. > > But yeah, we need a list of exactly which functions need to be > > re-exposed. I imagine pldebugger has its own needs. > > A reasonable guess. I went as far as downloading pldebugger and > compiling it, but it doesn't have a test suite of its own, so I couldn't > verify anything about it. I did notice that plpgsql_check is calling > function load_external_function(), and that doesn't appear in > pldebugger. I wonder if the find_rendezvous_variable business is at > play. > > Anyway, the minimal patch that makes plpgsql_check tests pass is > attached. This seems a bit random. Maybe it'd be better to have a > plpgsql_internal.h with functions that are exported only for plpgsql > itself, and keep plpgsql.h with a set of functions, all marked > PGDLLEXPORT, that are for external use. > > I can confirm that the attached patch fixes plpgsql_check. Thank you Pavel > > ... oh, and: > > $ postmaster -c shared_preload_libraries=plugin_debugger > 2022-07-19 16:27:24.006 CEST [742142] FATAL: cannot request additional > shared memory outside shmem_request_hook > > -- > Álvaro HerreraBreisgau, Deutschland — > https://www.EnterpriseDB.com/ >
Re: Costing elided SubqueryScans more nearly correctly
Alvaro Herrera writes: > Thanks, pushed. Pushed the original patch now too. regards, tom lane
Re: pgsql: Default to hidden visibility for extension libraries where possi
Hi, On 2022-07-19 16:28:07 +0200, Alvaro Herrera wrote: > Anyway, the minimal patch that makes plpgsql_check tests pass is > attached. This seems a bit random. Maybe it'd be better to have a > plpgsql_internal.h with functions that are exported only for plpgsql > itself, and keep plpgsql.h with a set of functions, all marked > PGDLLEXPORT, that are for external use. It does seem a bit random. But I think we probably should err on the side of adding more declarations, rather than the opposite. I like the plpgsql_internal.h idea, but probably done separately? Greetings, Andres Freund
Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size
Hi, On 2022-07-19 20:40:11 +0900, Amit Langote wrote: > About that, I was wondering if the blocks in llvm_compile_expr() need > to be hand-coded to match what's added in ExecInterpExpr() or if I've > missed some tool that can be used instead? The easiest way is to just call an external function for the implementation of the step. But yes, otherwise you need to handcraft it. Greetings, Andres Freund
Re: pgsql: Default to hidden visibility for extension libraries where possi
Hi On Tue, 19 Jul 2022 at 17:31, Andres Freund wrote: > Hi, > > On 2022-07-19 16:28:07 +0200, Alvaro Herrera wrote: > > Anyway, the minimal patch that makes plpgsql_check tests pass is > > attached. This seems a bit random. Maybe it'd be better to have a > > plpgsql_internal.h with functions that are exported only for plpgsql > > itself, and keep plpgsql.h with a set of functions, all marked > > PGDLLEXPORT, that are for external use. > > It does seem a bit random. But I think we probably should err on the side > of > adding more declarations, rather than the opposite. > > This list can be extended. I think plpgsql_check is maybe the only extension that uses code from another extension directly. This is really not common usage. > > I like the plpgsql_internal.h idea, but probably done separately? > It can be. I have no problem with it or with an exports.txt file. > Greetings, > > Andres Freund >
Re: pgsql: Default to hidden visibility for extension libraries where possi
Hi, On 2022-07-19 17:37:11 +0200, Pavel Stehule wrote: > út 19. 7. 2022 v 17:31 odesílatel Andres Freund napsal: > > > Hi, > > > > On 2022-07-19 16:28:07 +0200, Alvaro Herrera wrote: > > > Anyway, the minimal patch that makes plpgsql_check tests pass is > > > attached. This seems a bit random. Maybe it'd be better to have a > > > plpgsql_internal.h with functions that are exported only for plpgsql > > > itself, and keep plpgsql.h with a set of functions, all marked > > > PGDLLEXPORT, that are for external use. > > > > It does seem a bit random. But I think we probably should err on the side > > of > > adding more declarations, rather than the oposite. > > > > This list can be extended. I think plpgsql_check is maybe one extension > that uses code from another extension directly. This is really not common > usage. There's a few more use cases, e.g. transform modules. Hence exposing e.g. many plpython helpers. > I have not any problem with it or with exports.txt file. Just to be clear, there shouldn't be any use exports.txt here, just a few PGDLLEXPORTs. Greetings, Andres Freund
Autovacuum worker spawning strategy
Hello all, While investigating a problem in a PG14 instance I noticed that autovacuum workers stop processing other databases when a database has a temporary table with age older than `autovacuum_freeze_max_age`. To test that I added a custom logline showing which database the about to spawned autovacuum worker will target. Here are the details: ``` test=# select oid,datname from pg_database; oid | datname ---+--- 13757 | postgres 32850 | test 1 | template1 13756 | template0 (4 rows) ``` Here are the loglines under normal circumstances: ``` 2022-07-19 11:24:29.406 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13756) 2022-07-19 11:24:44.406 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=1) 2022-07-19 11:24:59.406 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=32850) 2022-07-19 11:25:14.406 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:25:29.417 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13756) 2022-07-19 11:25:44.417 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=1) 2022-07-19 11:25:59.418 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=32850) 2022-07-19 11:26:14.417 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:26:29.429 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13756) 2022-07-19 11:26:44.430 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=1) 2022-07-19 11:26:59.432 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=32850) 2022-07-19 11:27:14.429 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:27:29.442 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13756) 2022-07-19 11:27:44.441 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=1) 2022-07-19 11:27:59.446 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=32850) 2022-07-19 11:28:14.442 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:28:29.454 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13756) 2022-07-19 11:28:44.454 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=1) 2022-07-19 11:28:59.458 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=32850) 2022-07-19 11:29:14.443 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:29:29.465 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=32850) 2022-07-19 11:29:44.485 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=1) 2022-07-19 11:29:59.499 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13756) ``` But when I create a temp table and make it older than `autovacuum_freeze_max_age` I get this: ``` 2022-07-19 11:30:14.496 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:30:29.495 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:30:44.507 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:30:59.522 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:31:14.536 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:31:29.551 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:31:44.565 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:31:59.579 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:32:14.591 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:32:29.606 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:32:44.619 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:32:59.631 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:33:14.643 -03 [18627] WARNING: AUTOVACUUM 
WORKER SPAWNED (db=13757) 2022-07-19 11:33:29.655 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:33:44.667 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:33:59.679 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:34:14.694 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:34:29.707 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:34:44.719 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:34:59.732 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) 2022-07-19 11:35:14.743 -03 [18627] WARNING: AUTOVACUUM WORKER SPAWNED (db=13757) ``` This is actually expected behavior judging by the code logic: https://github.com/postgres/postgres/blob/master/src/backend/postmaster/autovacuum.c#L1201 PG prioritizes databases that need to be frozen and since a temporary table can't be frozen by a process other than the session that created it, that DB will remain a priority until the table is dropped. I acknowledge that having a temp table existing for long enough to reach `autovacuum_freeze_max_age` is a problem itself as the table will never be frozen and if age reaches 2 billion the instance will shut down. That being said, perhaps there is room for improvement in the AV worker spawning strategy to avoid leaving other DB
Re: Convert planner's AggInfo and AggTransInfo to Nodes
On 18.07.22 18:08, Tom Lane wrote: I'm kind of tempted to mount an effort to get rid of as many of pathnodes.h's "read_write_ignore" annotations as possible. Some are necessary to prevent infinite recursion, and others represent considered judgments that they'd bloat node dumps more than they're worth --- but I think quite a lot of them arose from plain laziness about updating outfuncs.c. With the infrastructure we have now, that's no longer a good reason. That was my impression as well, and I agree it would be good to sort that out.
Re: NAMEDATALEN increase because of non-latin languages
Hi, On 2022-07-19 14:30:34 +0700, John Naylor wrote: > I wrote: > > > On Mon, Jul 18, 2022 at 9:58 AM Andres Freund wrote: > > > Hm. Wouldn't it make sense to just use the normal tuple deforming > routines and > > > then map the results to the structs? > > > > I wasn't sure if they'd be suitable for this, but if they are, that'd > make this easier and more maintainable. I'll look into it. > > This would seem to have its own problems: heap_deform_tuple writes to > passed arrays of datums and bools. The lower level parts like fetchatt and > nocachegetattr return datums, so still need some generated boilerplate. > Some of these also assume they can write cached offsets on a passed tuple > descriptor. Sure. But that'll just be a form of conversion we do all over, rather than encoding low-level data layout details. Basically struct->member1 = DatumGetInt32(values[0]); struct->member2 = DatumGetChar(values[1]); etc. > I'm thinking where the first few attributes are fixed length, not null, and > (because of AIX) not double-aligned, we can do a single memcpy on multiple > columns at once. That will still be a common pattern after namedata is > varlen. Otherwise, use helper functions/macros similar to the above but > instead of passing a tuple descriptor, use info we have at compile time. I think that might be over-optimizing things. I don't think we do these conversions at a rate that's high enough to warrant it - the common stuff should be in relcache etc. It's possible that we might want to optimize the catcache case specifically - but that'd be more optimizing memory usage than "conversion" imo. Greetings, Andres Freund
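To make the suggestion concrete, here is a minimal sketch of deforming a catalog tuple with the regular routine and then mapping the datums onto the in-memory struct; the pg_class attributes are chosen purely as an example:
```c
/* Sketch: deform with heap_deform_tuple(), then map datums to the struct. */
Datum		values[Natts_pg_class];
bool		isnull[Natts_pg_class];
Form_pg_class form = (Form_pg_class) palloc0(sizeof(FormData_pg_class));

heap_deform_tuple(tuple, RelationGetDescr(rel), values, isnull);

/* fixed-width attributes */
form->relpages = DatumGetInt32(values[Anum_pg_class_relpages - 1]);
form->relnatts = DatumGetInt16(values[Anum_pg_class_relnatts - 1]);

/* a name attribute; with varlen names this would read a text datum instead */
namestrcpy(&form->relname,
		   NameStr(*DatumGetName(values[Anum_pg_class_relname - 1])));
```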
Re: [PATCH] Log details for client certificate failures
On Fri, Jul 15, 2022 at 4:45 PM Andres Freund wrote: > On 2022-07-15 14:51:38 -0700, Jacob Champion wrote: > > That seems much worse than escaping for this particular patch; if your > > cert's Common Name is in (non-ASCII) UTF-8 then all you'll see is > > "CN=?" in the log lines that were supposed to be helping you > > root-cause. Escaping would be much more helpful in this case. > > I'm doubtful that's all that common. Probably not, but the more systems that support it without weird usability bugs, the more common it will hopefully become. > But either way, I suggest a separate patch to deal with that... Proposed fix attached, which uses \x-escaping for bytes outside of printable ASCII. Thanks, --Jacob From 3c8e910a75c74f85b640f7d728205c276b1d1c51 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Mon, 18 Jul 2022 15:01:19 -0700 Subject: [PATCH] Don't reflect unescaped cert data to the logs Commit 3a0e385048 introduced a new path for unauthenticated bytes from the client certificate to be printed unescaped to the logs. There are a handful of these already, but it doesn't make sense to keep making the problem worse. \x-escape any unprintable bytes. The test case introduces a revoked UTF-8 certificate. This requires the addition of the `-utf8` flag to `openssl req`. Since the existing certificates all use an ASCII subset, this won't modify the existing certificates' subjects if/when they get regenerated; this was verified experimentally with $ make sslfiles-clean $ make sslfiles Unfortunately the test can't be run as-is yet due to a test timing issue; see 55828a6b60. --- src/backend/libpq/be-secure-openssl.c | 74 --- src/test/ssl/conf/client-revoked-utf8.config | 13 src/test/ssl/ssl/client-crldir/9bb9e3c3.r0| 19 ++--- src/test/ssl/ssl/client-revoked-utf8.crt | 18 + src/test/ssl/ssl/client-revoked-utf8.key | 27 +++ src/test/ssl/ssl/client.crl | 19 ++--- .../ssl/ssl/root+client-crldir/9bb9e3c3.r0| 19 ++--- src/test/ssl/ssl/root+client.crl | 19 ++--- src/test/ssl/sslfiles.mk | 10 ++- src/test/ssl/t/001_ssltests.pl| 13 src/test/ssl/t/SSL/Backend/OpenSSL.pm | 3 +- 11 files changed, 167 insertions(+), 67 deletions(-) create mode 100644 src/test/ssl/conf/client-revoked-utf8.config create mode 100644 src/test/ssl/ssl/client-revoked-utf8.crt create mode 100644 src/test/ssl/ssl/client-revoked-utf8.key diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 9cec6866a3..ad0d79a511 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -35,6 +35,7 @@ #include "storage/fd.h" #include "storage/latch.h" #include "tcop/tcopprot.h" +#include "utils/builtins.h" #include "utils/memutils.h" /* @@ -1082,16 +1083,17 @@ dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata) } /* - * Examines the provided certificate name, and if it's too long to log, modifies - * and truncates it. The return value is NULL if no truncation was needed; it - * otherwise points into the middle of the input string, and should not be - * freed. + * Examines the provided certificate name, and if it's too long to log or + * contains unprintable ASCII, escapes and truncates it. The return value is + * always a new palloc'd string. 
*/ static char * -truncate_cert_name(char *name) +prepare_cert_name(char *name) { size_t namelen = strlen(name); - char *truncated; + char *truncated = name; + char *escaped; + int i; /* * Common Names are 64 chars max, so for a common case where the CN is the @@ -1101,19 +1103,37 @@ truncate_cert_name(char *name) */ #define MAXLEN 71 - if (namelen <= MAXLEN) - return NULL; - - /* - * Keep the end of the name, not the beginning, since the most specific - * field is likely to give users the most information. - */ - truncated = name + namelen - MAXLEN; - truncated[0] = truncated[1] = truncated[2] = '.'; + if (namelen > MAXLEN) + { + /* + * Keep the end of the name, not the beginning, since the most specific + * field is likely to give users the most information. + */ + truncated = name + namelen - MAXLEN; + truncated[0] = truncated[1] = truncated[2] = '.'; + namelen = MAXLEN; + } #undef MAXLEN - return truncated; + escaped = palloc(namelen * 4 + 1); /* one byte becomes four: "\xCC" */ + i = 0; + + for (char *c = truncated; *c; c++) + { + /* Keep printable characters including space. Escape everything else. */ + if (*c >= 0x20 && *c <= 0x7E) + escaped[i++] = *c; + else + { + escaped[i++] = '\\'; + escaped[i++] = 'x'; + i += hex_encode(c, 1, &escaped[i]); + } + } + + escaped[i] = '\0'; + return escaped; } /* @@ -1156,21 +1176,24 @@ verify_cb(int ok, X509_STORE_CTX *ctx) { char *subject, *issuer; - char *sub_truncated, - *iss_truncated; + char *sub_prepar
Re: [PATCH] Log details for client certificate failures
Hi, On 2022-07-19 09:07:31 -0700, Jacob Champion wrote: > On Fri, Jul 15, 2022 at 4:45 PM Andres Freund wrote: > > On 2022-07-15 14:51:38 -0700, Jacob Champion wrote: > > > That seems much worse than escaping for this particular patch; if your > > > cert's Common Name is in (non-ASCII) UTF-8 then all you'll see is > > > "CN=?" in the log lines that were supposed to be helping you > > > root-cause. Escaping would be much more helpful in this case. > > > > I'm doubtful that's all that common. > > Probably not, but the more systems that support it without weird > usability bugs, the more common it will hopefully become. > > > But either way, I suggest a separate patch to deal with that... > > Proposed fix attached, which uses \x-escaping for bytes outside of > printable ASCII. I don't think this should be open coded in the ssl part of the code. IMO this should replace the existing ascii escape function instead. I strongly oppose open coding this functionality in prepare_cert_name(). Greetings, Andres Freund
Re: Memory leak fix in psql
Hi, On 2022-07-19 21:08:53 +0800, Japin Li wrote: > From b2bcc3a1bac67b8b414f2025607f8dd35e096289 Mon Sep 17 00:00:00 2001 > From: Japin Li > Date: Tue, 19 Jul 2022 18:27:25 +0800 > Subject: [PATCH v2 1/1] Fix the memory leak in psql describe > > --- > src/bin/psql/describe.c | 168 > 1 file changed, 168 insertions(+) > > diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c > index 0ce38e4b4c..7a070a6cd0 100644 > --- a/src/bin/psql/describe.c > +++ b/src/bin/psql/describe.c > @@ -112,7 +112,10 @@ describeAggregates(const char *pattern, bool verbose, > bool showSystem) > "n.nspname", > "p.proname", NULL, > > "pg_catalog.pg_function_is_visible(p.oid)", > NULL, 3)) > + { > + termPQExpBuffer(&buf); > return false; > + } > > appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 4;"); Adding copy over copy of this same block doesn't seem great. Can we instead add a helper for it or such? Greetings, Andres Freund
Re: [PATCH] Log details for client certificate failures
[resending to list] On 7/19/22 09:14, Andres Freund wrote: > IMO this should replace the existing ascii escape function instead. That will affect the existing behavior of application_name and cluster_name; is that acceptable? --Jacob
Re: Convert planner's AggInfo and AggTransInfo to Nodes
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > It seems like a reasonable idea, but I don't know enough to judge the > wider ramifications of it. But one thing that the patch should also do, > is switch to using the l*_node() functions instead of manual casting. Hm, I didn't bother with that on the grounds that there's no question what should be in those two lists. But I guess it's not much extra code, so pushed that way. regards, tom lane
Re: Memory leak fix in psql
Andres Freund writes: > On 2022-07-19 21:08:53 +0800, Japin Li wrote: >> +{ >> +termPQExpBuffer(&buf); >> return false; >> +} > Adding copy over copy of this same block doesn't seem great. Can we instead > add a helper for it or such? The usual style in these files is something like if (bad things happened) goto fail; ... fail: termPQExpBuffer(&buf); return false; Yeah, it's old school, but please let's not have a few functions that do it randomly differently from all their neighbors. regards, tom lane
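For illustration, the pattern being described, applied to a describe.c-style function; this is a sketch only, "describeSomething" is hypothetical, and the validateSQLNamePattern() arguments are copied from one of the existing call sites in the patch upthread:
```c
static bool
describeSomething(const char *pattern)
{
	PQExpBufferData buf;
	PGresult   *res;

	initPQExpBuffer(&buf);
	printfPQExpBuffer(&buf, "SELECT d.datname FROM pg_catalog.pg_database d\n");

	if (!validateSQLNamePattern(&buf, pattern, false, false,
								NULL, "d.datname", NULL, NULL, NULL, 1))
		goto error_return;

	appendPQExpBufferStr(&buf, "ORDER BY 1;");

	res = PSQLexec(buf.data);
	termPQExpBuffer(&buf);
	if (!res)
		return false;

	/* ... render res with printQuery() as the existing functions do ... */

	PQclear(res);
	return true;

error_return:
	termPQExpBuffer(&buf);
	return false;
}
```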
Re: [PATCH] Log details for client certificate failures
Jacob Champion writes: > On 7/19/22 09:14, Andres Freund wrote: >> IMO this should replace the existing ascii escape function instead. > That will affect the existing behavior of application_name and > cluster_name; is that acceptable? I think Andres' point is exactly that these should all act alike. Having said that, I struggle to see why we are panicking about badly encoded log data from this source while blithely ignoring the problems posed by non-ASCII role names, database names, and tablespace names. regards, tom lane
Re: Memory leak fix in psql
> On Jul 19, 2022, at 2:02 AM, tanghy.f...@fujitsu.com wrote: > > I think there is a newly introduced memory leak in your patch d2d3547. I agree. Thanks for noticing, and for the patch! > Try to fix it in the attached patch. > Kindly to have a check. This looks ok, but comments down-thread seem reasonable, so I suspect a new patch will be needed. Would you like to author it, or would you prefer that I, as the guilty party, do so? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Autovacuum worker spawning strategy
On Tue, Jul 19, 2022 at 12:40:06PM -0300, Rafael Thofehrn Castro wrote: > PG prioritizes databases that need to be frozen and since a temporary table > can't > be frozen by a process other than the session that created it, that DB will > remain > a priority until the table is dropped. > > I acknowledge that having a temp table existing for long enough to reach > `autovacuum_freeze_max_age` > is a problem itself as the table will never be frozen and if age reaches 2 > billion > the instance will shut down. That being said, perhaps there is room for > improvement > in the AV worker spawning strategy to avoid leaving other DBs in the dark. > > This database where I spotted the problem is from a customer that consumes > 100m xacts/hour > and makes extensive uses of temp tables to load data, so that scenario can > actually > happen. I wonder if it's worth tracking a separate datfrozenxid that does not include stuff that is beyond autovacuum's control, like temporary tables. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Allow placeholders in ALTER ROLE w/o superuser
On Tue, Jul 19, 2022 at 12:55:14AM -0500, Steve Chavez wrote: > Taking your options into consideration, for me the correct behaviour should > be: > > - The ALTER ROLE placeholder should always be stored with a PGC_USERSET > GucContext. It's a placeholder anyway, so it should be the less restrictive > one. If the user wants to define it as PGC_SUSET or other this should be > done through a custom extension. > - When an extension claims the placeholder, we should check the > DefineCustomXXXVariable GucContext with PGC_USERSET. If there's a match, > then the value gets applied, otherwise WARN or ERR. > The role GUCs get applied at login time right? So at this point we can > WARN or ERR about the defined role GUCs. > > What do you think? Hm. I would expect ALTER ROLE to store the PGC_SUSET context when executed by a superuser or a role with privileges via pg_parameter_acl. Storing all placeholder GUC settings as PGC_USERSET would make things more restrictive than they are today. For example, it would no longer be possible to apply any ALTER ROLE settings from superusers for placeholders that later become custom GUCS. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
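For context, the comparison being discussed is between the context recorded for the placeholder (by ALTER ROLE ... SET) and the context the extension later passes when it claims the variable. Here is a minimal sketch of an extension claiming a placeholder as superuser-only; "myext.secret" and the other names are illustrative:
```c
#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static char *myext_secret = NULL;

void		_PG_init(void);

void
_PG_init(void)
{
	/*
	 * Claim the "myext.secret" placeholder as superuser-only.  A value that
	 * a non-superuser previously stored via ALTER ROLE ... SET myext.secret
	 * would then have to be rejected or warned about, which is the case
	 * under discussion above.
	 */
	DefineCustomStringVariable("myext.secret",
							   "Example superuser-only setting.",
							   NULL,
							   &myext_secret,
							   "",
							   PGC_SUSET,
							   0,
							   NULL, NULL, NULL);
}
```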
StrategyAM for IndexAMs
I'm preparing the way for a later patch that would allow unique hash indexes to be primary keys. There are various parts to the problem. I was surprised at how many times we hardcode BTREE_AM_OID and associated BT Strategy Numbers in many parts of the code, so have been looking for ways to reconcile how Hash and Btree strategies work and potentially remove hardcoding. There are various comments that say we need a way to be able to define which Strategy Numbers are used by indexams. I came up with a rather simple way: the indexam just says "I'm like a btree", which allows you to avoid adding hundreds of operator classes for the new index, since that is cumbersome and insecure. Specifically, we add a "strategyam" field to the IndexAmRoutine that allows an indexam to declare whether it is like a btree, like a hash index or another am. This then allows us to KEEP the hardcoded BTREE_AM_OID tests, but point them at the strategyam rather than the relam, which can be cached in various places as needed. No catalog changes needed. I've coded this up and it works fine. The attached patch is still incomplete because we use this in a few places and they all need to be checked. So before I do that, it seems sensible to agree the approach. (Obviously, there are hundreds of places where BTEqualStrategyNumber is hardcoded, and this doesn't change that at all, in case that wasn't clear). Comments welcome on this still WIP patch. -- Simon Riggshttp://www.EnterpriseDB.com/ strategyam.v1.patch Description: Binary data
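A rough sketch of the mechanism described above; the strategyam field is the proposal itself, while the calling code shown here is illustrative only and not taken from the attached patch:

/* in the index AM's handler: declare whose strategy numbers this AM uses */
IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);
/* ... fill in the rest of the routine ... */
amroutine->strategyam = BTREE_AM_OID;	/* "my strategies are btree-like" */

/* a caller that today hardcodes the relation's AM, e.g. */
if (rel->rd_rel->relam == BTREE_AM_OID)
	strategy = BTLessStrategyNumber;

/* would instead consult the declared strategy AM */
if (rel->rd_indam->strategyam == BTREE_AM_OID)
	strategy = BTLessStrategyNumber;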
Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages
On Tue, Jul 19, 2022 at 02:43:59PM +0530, Bharath Rupireddy wrote: > Done. PSA v5 patch set. LGTM. I've marked this as ready-for-committer. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Log details for client certificate failures
Hi, On 2022-07-19 12:39:43 -0400, Tom Lane wrote: > Having said that, I struggle to see why we are panicking about badly > encoded log data from this source while blithely ignoring the problems > posed by non-ASCII role names, database names, and tablespace names. I think we should fix these as well. I'm not as concerned about post-auth encoding issues (i.e. tablespace name) as about pre-auth data (role name, database name) - obviously being allowed to log in already is a pretty good filter... Greetings, Andres Freund
Re: First draft of the PG 15 release notes
On Mon, Jul 18, 2022 at 08:23:23PM -0500, Justin Pryzby wrote: > > Increase hash_mem_multiplier default to 2.0 (Peter Geoghegan) > > This allows query hash operations to use double the amount of work_mem > > memory as other operations. > > I wonder if it's worth pointing out that a query may end up using not just 2x > more memory (since work_mem*hash_mem_multiplier is per node), but 2**N more, > for N nodes. Uh, I said "per operation" so people might realize there can be multiple work_mem memory operations per query. I don't think I can do more in this text. I can't think of a way to improve it without making it more confusing. > > Remove pg_dump's --no-synchronized-snapshots option since all supported > > server versions support synchronized snapshots (Tom Lane) > > It'd be better to put that after the note about dropping support for upgrading > clusters older than v9.2 in psql/pg_dump/pg_upgrade. Well, I put the --no-synchronized-snapshots item in incompatibilities since it is a user-visible change that might require script adjustments. However, I put the limit of pg_dump to 9.2 and greater into the pg_dump section. Are you suggesting I move the--no-synchronized-snapshots item down there? That doesn't match with the way I have listed other incompatibilities so I am resistant to do that. > > Enable system and TOAST btree indexes to efficiently store duplicates > > (Peter Geoghegan) > > Say "btree indexes on system [and TOAST] tables" Okay, updated to: Allow btree indexes on system and TOAST tables to efficiently store duplicates (Peter Geoghegan) > > Prevent changes to columns only indexed by BRIN indexes from disabling HOT > > updates (Josef Simanek) > > This was reverted Ah, yes, removed. > > Generate periodic log message during slow server starts (Nitin Jadhav, > > Robert Haas) > messages plural > > > Messages report the cause of the delay. The time interval for notification > > is controlled by the new server variable log_startup_progress_interval. > *The messages Ah, yes, fixed. > > Add server variable shared_memory_size to report the size of allocated > > shared memory (Nathan Bossart) > > Add server variable shared_memory_size_in_huge_pages to report the number > > of huge memory pages required (Nathan Bossart) > > Maybe these should say server *parameter* since they're not really "variable". Uh, I think of parameters as something passed. We do call them server variables, or read-only server variables. I can add "read-only" but it seems odd. > > 0. Add support for LZ4 and Zstandard compression of server-side base > > backups (Jeevan Ladhe, Robert Haas) > > 1. Allow pg_basebackup to use LZ4 and Zstandard compression on server-side > > base backup files (Dipesh Pandit, Jeevan Ladhe) > > 2. Allow pg_basebackup's --compress option to control the compression > > method and options (Michael Paquier, Robert Haas) > >New options include server-gzip (gzip on the server), client-gzip (same > > as gzip). > > 3. Allow pg_basebackup to compress on the server side and decompress on the > > client side before storage (Dipesh Pandit) > >This is accomplished by specifying compression on the server side and > > plain output format. > > I still think these expose the incremental development rather than the > user-facing change. Well, they are in different parts of the system, though they are clearly all related. I am afraid merging them would be even more confusing. > 1. It seems wrong to say "server-side" since client-side compression with > LZ4/zstd is also supported. Agreed. 
I changed it to: Allow pg_basebackup to do LZ4 and Zstandard server-side compression on base backup files (Dipesh Pandit, Jeevan Ladhe) > 2. It's confusing to say that the new options are server-gzip and client-gzip, > since it just mentioned new algorithms; I see your point since there will be new options for LZ4 and Zstandard too, so I just removed that paragraph. > 3. I'm not sure this needs to be mentioned at all; maybe it should be a > "detail" following the item about server-side compression. See my concerns above --- it seems too complex to merge into something else. However, I am open to an entire rewrite of these items. > > Tables added to the listed schemas in the future will also be replicated. > > "Tables later added" is clearer. Otherwise "in the future" sounds like maybe > in v16 or v17 we'll start replicating those tables. Agreed, new wording: Tables added later to the listed schemas will also be replicated. > > Allow subscribers to stop logical replication application on error (Osumi > > Takamichi, Mark Dilger) > "application" sounds off. Agreed. New text is: Allow subscribers to stop the application of logical replication changes on error > > Add new default WAL-logged method for database creation (Dilip Kumar) > "New default" sounds off. Say "Add new WAL-logged method for database > creation, used by d
Re: System catalog documentation chapter
On Mon, Jul 18, 2022 at 09:22:24PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > On Sat, Jul 16, 2022 at 10:53:17AM +0200, Peter Eisentraut wrote: > >> Maybe this whole notion that "system views" is one thing is not suitable. > > > Are you thinking we should just call the chapter "System Catalogs and > > Views" and just place them alphabetically in a single chapter? > > I didn't think that was Peter's argument at all. He's complaining > that "system views" isn't a monolithic category, which is a reasonable > point, especially since we have a bunch of built-in views that appear > in other chapters. But to then also confuse them with catalogs isn't > improving the situation. I think I see now --- system _tables_ are really not for user consumption but system views often are. I am thinking the best approach is to move most of the system views out of the system views section and into the sections where they make sense. > The views that are actually reinterpretations of catalog contents should > probably be documented near the catalogs. But a lot of stuff in that > chapter is no such thing. For example, it's really unclear why Right. > pg_backend_memory_contexts is documented here and not somewhere near > the stats views. We also have stuff like pg_available_extensions, Right. > pg_file_settings, and pg_timezone_names, which are reporting ground truth > of some sort that didn't come from the catalogs. I'm not sure if those > belong near the catalogs or not. I am thinking some of those need to be in the Server Configuration chapter. > The larger point, perhaps, is that this whole area is underneath > "Part VII: Internals", and that being the case what you would expect > to find here is stuff that we don't intend people to interact with > in day-to-day usage. Most of the "system views" are specifically > intended for day-to-day use, maybe only by DBAs, but nonetheless they > are user-facing in a way that the catalogs aren't. > > Maybe we should move them all to Part IV, in a chapter or chapters > adjacent to the Information Schema chapter. Or maybe try to separate > "user" views from "DBA" views, and put user views in Part IV while > DBA views go into a new chapter in Part III, near the stats views. I am going to look at moving system views that make sense into the chapters where their contents are mentioned. I don't think having a central list of views is really helping us because we expect the views to be used in ways the system catalogs would not be. I will develop a proposed patch. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: First draft of the PG 15 release notes
On Tue, Jul 19, 2022 at 01:24:30PM -0400, Bruce Momjian wrote: > > > Remove pg_dump's --no-synchronized-snapshots option since all supported > > > server versions support synchronized snapshots (Tom Lane) > > > > It'd be better to put that after the note about dropping support for > > upgrading > > clusters older than v9.2 in psql/pg_dump/pg_upgrade. > > Well, I put the --no-synchronized-snapshots item in incompatibilities > since it is a user-visible change that might require script adjustments. > However, I put the limit of pg_dump to 9.2 and greater into the pg_dump > section. Are you suggesting I move the--no-synchronized-snapshots item > down there? That doesn't match with the way I have listed other > incompatibilities so I am resistant to do that. I'd rather see the "limit support to v9.2" be moved or added to the "incompatibilities" section, maybe with "remove --no-synchronized-snapshots" as a secondary sentence. > > > 0. Add support for LZ4 and Zstandard compression of server-side base > > > backups (Jeevan Ladhe, Robert Haas) > > > 1. Allow pg_basebackup to use LZ4 and Zstandard compression on > > > server-side base backup files (Dipesh Pandit, Jeevan Ladhe) > > > 2. Allow pg_basebackup's --compress option to control the compression > > > method and options (Michael Paquier, Robert Haas) > > >New options include server-gzip (gzip on the server), client-gzip > > > (same as gzip). > > > 3. Allow pg_basebackup to compress on the server side and decompress on > > > the client side before storage (Dipesh Pandit) > > >This is accomplished by specifying compression on the server side and > > > plain output format. > > > > I still think these expose the incremental development rather than the > > user-facing change. > > > 1. It seems wrong to say "server-side" since client-side compression with > > LZ4/zstd is also supported. > > Agreed. I changed it to: > >Allow pg_basebackup to do LZ4 and Zstandard server-side compression >on base backup files (Dipesh Pandit, Jeevan Ladhe) This still misses the point that those compression algs are also supported on the client side, so it seems misleading to mention "server-side" support. > > > Allow custom scan provders to indicate if they support projections (Sven > > > Klemm) > > > The default is now that custom scan providers can't support projections, > > > so they need to be updated for this release. > > > > Per the commit message, they don't "need" to be updated. > > I think this should say "The default now assumes that a custom scan provider > > does not support projections; to retain optimal performance, they should be > > updated to indicate whether that's supported. > > Okay, I went with this text: > > The default is now that custom scan providers are assumed to not > support projections; those that do need to be updated for this > release. I'd say "those that do *will need to be updated" otherwise the sentence can sound like it means "those that need to be updated [will] ..." Thanks, -- Justin
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 7/18/22 20:45, Tom Lane wrote: > Tomas Vondra writes: >> Thanks. I'll switch this to "needs review" now. > > OK, I looked through this, and attach some review suggestions in the > form of a delta patch. (0001 below is your two patches merged, 0002 > is my delta.) A lot of the delta is comment-smithing, but not all. > Thanks! > After reflection I think that what you've got, ie use reltuples but > don't try to sample if reltuples <= 0, is just fine. The remote > would only have reltuples <= 0 in a never-analyzed table, which > shouldn't be a situation that persists for long unless the table > is tiny. Also, if reltuples is in error, the way to bet is that > it's too small thanks to the table having been enlarged. But > an error in that direction doesn't hurt us: we'll overestimate > the required sample_frac and pull back more data than we need, > but we'll still end up with a valid sample of the right size. > So I doubt it's worth the complication to try to correct based > on relpages etc. (Note that any such correction would almost > certainly end in increasing our estimate of reltuples. But > it's safer to have an underestimate than an overestimate.) > I mostly agree, particularly for the non-partitioned case. I we want to improve sampling for partitioned cases (where the foreign table is just one of many partitions), I think we'd have to rework how we determine sample size for each partition. Now we simply calculate that from relpages, which seems quite fragile (different amounts of bloat, different tuple densities) and somewhat strange for FDW serves that don't use the same "page" concept. So it may easily happen we determine bogus sample sizes for each partition. The difficulties when calculating the sample_frac is just a secondary issue. OTOH the concept of a "row" seems way more general, so perhaps acquire_inherited_sample_rows should use reltuples, and if we want to do correction it should happen at this stage already. > I messed around with the sample_frac choosing logic slightly, > to make it skip pointless calculations if we decide right off > the bat to disable sampling. That way we don't need to worry > about avoiding zero divide, nor do we have to wonder if any > of the later calculations could misbehave. > Thanks. > I left your logic about "disable if saving fewer than 100 rows" > alone, but I have to wonder if using an absolute threshold rather > than a relative one is well-conceived. Sampling at a rate of > 99.9 percent seems pretty pointless, but this code is perfectly > capable of accepting that if reltuples is big enough. So > personally I'd do that more like > > if (sample_frac > 0.95) > method = ANALYZE_SAMPLE_OFF; > > which is simpler and would also eliminate the need for the previous > range-clamp step. I'm not sure what the right cutoff is, but > your "100 tuples" constant is just as arbitrary. > I agree there probably is not much difference between a threshold on sample_frac directly and number of rows, at least in general. My reasoning for switching to "100 rows" is that in most cases the network transfer is probably more costly than "local costs", and 5% may be quite a few rows (particularly with higher statistics target). I guess the proper approach would be to make some simple costing, but that seems like an overkill. > I rearranged the docs patch too. Where you had it, analyze_sampling > was between fdw_startup_cost/fdw_tuple_cost and the following para > discussing them, which didn't seem to me to flow well at all. 
I ended > up putting analyze_sampling in its own separate list. You could almost > make a case for giving it its own , but I concluded that was > probably overkill. > Thanks. > One thing I'm not happy about, but did not touch here, is the expense > of the test cases you added. On my machine, that adds a full 10% to > the already excessively long runtime of postgres_fdw.sql --- and I > do not think it's buying us anything. It is not this module's job > to test whether bernoulli sampling works on partitioned tables. > I think you should just add enough to make sure we exercise the > relevant code paths in postgres_fdw itself. > Right, I should have commented on that. The purpose of those tests was verifying that if we change the sampling method on server/table, the generated query changes accordingly, etc. But that's a bit futile because we don't have a good way of verifying what query was used - it worked during development, as I added elog(WARNING). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Convert planner's AggInfo and AggTransInfo to Nodes
Peter Eisentraut writes: > On 18.07.22 18:08, Tom Lane wrote: >> I'm kind of tempted to mount an effort to get rid of as many of >> pathnodes.h's "read_write_ignore" annotations as possible. Some are >> necessary to prevent infinite recursion, and others represent considered >> judgments that they'd bloat node dumps more than they're worth --- but >> I think quite a lot of them arose from plain laziness about updating >> outfuncs.c. With the infrastructure we have now, that's no longer >> a good reason. > That was my impression as well, and I agree it would be good to sort > that out. I had a go at doing this, and ended up with something that seems reasonable for now (attached). The thing that'd have to be done to make additional progress is to convert a lot of partitioning-related structs into full Nodes. That seems like it might possibly be worth doing, but I don't feel like doing it. I doubt that making planner node dumps smarter is a sufficient excuse for that anyway. (But possibly if we then larded related code with castNode() and sibling macros, there'd be enough of a gain in type-safety to justify it?) I learned a couple of interesting things along the way: * I'd thought we already had outfuncs support for writing an array of node pointers. We don't, but it's easily added. I chose to write the array with parenthesis decoration, mainly because that eases moving around it in emacs. * WRITE_OID_ARRAY and WRITE_BOOL_ARRAY needed extension to handle a null array pointer. I think we should make all the WRITE_FOO_ARRAY macros work alike, so I added that to all of them. I first tried to make the rest work like WRITE_INDEX_ARRAY, but that failed because readfuncs.c isn't expecting "<>" for an empty array; it's expecting nothing at all. (Note there is no readfuncs equivalent to WRITE_INDEX_ARRAY.) What I've done here is to change WRITE_INDEX_ARRAY to work like the others and print nothing for an empty array, but I wonder if now wouldn't be a good time to redefine the serialized representation to be more robust. I'm imagining "<>" for a NULL array pointer and "(item item item)" otherwise, allowing a cross-check that we're getting the right number of items. * gen_node_support.pl was being insufficiently careful about parsing type names, so I tightened its regexes a bit. 
regards, tom lane diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl index f3309c3000..bee48696c7 100644 --- a/src/backend/nodes/gen_node_support.pl +++ b/src/backend/nodes/gen_node_support.pl @@ -441,6 +441,8 @@ foreach my $infile (@ARGV) $type =~ s/\s*$//; # strip space between type and "*" (pointer) */ $type =~ s/\s+\*$/*/; + # strip space between type and "**" (array of pointers) */ + $type =~ s/\s+\*\*$/**/; die "$infile:$lineno: cannot parse data type in \"$line\"\n" @@ -745,8 +747,8 @@ _equal${n}(const $n *a, const $n *b) unless $equal_ignore || $t eq 'CoercionForm'; } } - # scalar type pointer - elsif ($t =~ /(\w+)\*/ and elem $1, @scalar_types) + # arrays of scalar types + elsif ($t =~ /^(\w+)\*$/ and elem $1, @scalar_types) { my $tt = $1; if (!defined $array_size_field) @@ -780,13 +782,14 @@ _equal${n}(const $n *a, const $n *b) print $eff "\tCOMPARE_SCALAR_FIELD($f);\n" unless $equal_ignore; } # node type - elsif ($t =~ /(\w+)\*/ and elem $1, @node_types) + elsif (($t =~ /^(\w+)\*$/ or $t =~ /^struct\s+(\w+)\*$/) + and elem $1, @node_types) { print $cff "\tCOPY_NODE_FIELD($f);\n"unless $copy_ignore; print $eff "\tCOMPARE_NODE_FIELD($f);\n" unless $equal_ignore; } # array (inline) - elsif ($t =~ /\w+\[/) + elsif ($t =~ /^\w+\[\w+\]$/) { print $cff "\tCOPY_ARRAY_FIELD($f);\n"unless $copy_ignore; print $eff "\tCOMPARE_ARRAY_FIELD($f);\n" unless $equal_ignore; @@ -894,11 +897,16 @@ _read${n}(void) my @a = @{ $node_type_info{$n}->{field_attrs}{$f} }; # extract per-field attributes - my $read_write_ignore = 0; + my $array_size_field; my $read_as_field; + my $read_write_ignore = 0; foreach my $a (@a) { - if ($a =~ /^read_as\(([\w.]+)\)$/) + if ($a =~ /^array_size\(([\w.]+)\)$/) + { +$array_size_field = $1; + } + elsif ($a =~ /^read_as\(([\w.]+)\)$/) { $read_as_field = $1; } @@ -1015,19 +1023,10 @@ _read${n}(void) print $off "\tWRITE_ENUM_FIELD($f, $t);\n"; print $rff "\tREAD_ENUM_FIELD($f, $t);\n" unless $no_read; } - # arrays - elsif ($t =~ /(\w+)(\*|\[)/ and elem $1, @scalar_types) + # arrays of scalar types + elsif ($t =~ /^(\w+)(\*|\[\w+\])$/ and elem $1, @scalar_types) { my $tt = uc $1; - my $array_size_field; - foreach my $a (@a) - { -if ($a =~ /^array_size\(([\w.]+)\)$/) -{ - $array_size_field = $1; - last; -} - } if (!defined $array_size_field) { die "no array size defined for $n.$f of type $t\n"; @@ -1080,11 +1079,38 @@ _
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > I we want to improve sampling for partitioned cases (where the foreign > table is just one of many partitions), I think we'd have to rework how > we determine sample size for each partition. Now we simply calculate > that from relpages, which seems quite fragile (different amounts of > bloat, different tuple densities) and somewhat strange for FDW serves > that don't use the same "page" concept. > So it may easily happen we determine bogus sample sizes for each > partition. The difficulties when calculating the sample_frac is just a > secondary issue. > OTOH the concept of a "row" seems way more general, so perhaps > acquire_inherited_sample_rows should use reltuples, and if we want to do > correction it should happen at this stage already. Yeah, there's definitely something to be said for changing that to be based on rowcount estimates instead of physical size. I think it's a matter for a different patch though, and not a reason to hold up this one. regards, tom lane
Re: [PATCH] Introduce array_shuffle() and array_sample()
Am 19.07.22 um 16:20 schrieb Tom Lane: So I withdraw my original position. These functions should just shuffle or select in the array's first dimension, preserving subarrays. Or else be lazy and reject more-than-one-D arrays; but it's probably not that hard to handle them. Here is a patch with dimension aware array_shuffle() and array_sample(). If you think array_flatten() is desirable, i can take a look at it. Maybe a second parameter would be nice to specify the target dimension: select array_flatten(array[[[1,2],[3,4]],[[5,6],[7,8]]], 1); --- {1,2,3,4,5,6,7,8} select array_flatten(array[[[1,2],[3,4]],[[5,6],[7,8]]], 2); --- {{1,2,3,4},{5,6,7,8}} MartinFrom 2aa6d72ff0f4d8835ee2f09f8cdf16b7e8005e56 Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles the elements of an array. * array_sample() chooses n elements from an array by random. Shuffling of arrays can already be accomplished with SQL using unnest() and array_agg(order by random()). But that is very slow compared to the new functions. In addition the new functions are dimension aware. --- doc/src/sgml/func.sgml | 35 + src/backend/utils/adt/arrayfuncs.c | 189 +++ src/include/catalog/pg_proc.dat | 6 + src/test/regress/expected/arrays.out | 60 + src/test/regress/sql/arrays.sql | 17 +++ 5 files changed, 307 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b6783b7ad0..6b96897244 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19395,6 +19395,41 @@ SELECT NULLIF(value, '(none)') ... + + + + array_sample + +array_sample ( array anyarray, n integer ) +anyarray + + +Returns n randomly chosen elements from array. +The order of the elements in resulting array is unspecified. + + +array_sample(ARRAY[[1,2],[3,4],[5,6]], 2) +{{5,6},{3,4}} + + + + + + + array_shuffle + +array_shuffle ( anyarray ) +anyarray + + +Shuffles the first dimension of the array. + + +array_shuffle(ARRAY[[1,2],[3,4],[5,6]]) +{{5,6},{3,4},{1,2}} + + + diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index fb167f226a..3de1b0db30 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -6872,3 +6872,192 @@ trim_array(PG_FUNCTION_ARGS) PG_RETURN_DATUM(result); } + +/* + * Produce array with max n random items from given array in random order. + * + * array: array object to examine (must not be NULL) + * n: max number of items + * elmtyp, elmlen, elmbyval, elmalign: info for the datatype of the items + * + * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info + * from the system catalogs, given the elmtype. However, the caller is + * in a better position to cache this info across multiple uses, or even + * to hard-wire values if the element type is hard-wired. + */ +static ArrayType * +array_shuffle_n(ArrayType *array, int n, +Oid elmtyp, int elmlen, bool elmbyval, char elmalign) +{ + ArrayType *result; + int ndim, + *dims, + *lbs, +rdims[MAXDIM], +nelm, +nitem, +i, +j, +k; + Datum elm, + *elms, + *relms; + bool nul, + *nuls, + *rnuls; + + ndim = ARR_NDIM(array); + dims = ARR_DIMS(array); + lbs = ARR_LBOUND(array); + + if (ndim < 1 || dims[0] < 1 || n < 1) + return construct_empty_array(elmtyp); + + deconstruct_array(array, elmtyp, elmlen, elmbyval, elmalign, + &relms, &rnuls, &nelm); + + /* Calculate number of elements per item. */ + nelm = (ndim > 1) ? 
ArrayGetNItems(ndim - 1, dims + 1) : 1; + elms = relms; + nuls = rnuls; + nitem = dims[0]; + n = Min(n, nitem); + + /* + * Shuffle array using Fisher-Yates algorithm. Swap head with an randomly + * chosen item and increment head. + */ + for (i = 0; i < n; i++) + { + k = (rand() % (nitem - i)) * nelm; + + /* Swap all elements in head with elements in item k. */ + for (j = 0; j < nelm; j++) + { + elm = *elms; + nul = *nuls; + *elms = elms[k]; + *nuls = nuls[k]; + elms[k] = elm; + nuls[k] = nul; + elms++; + nuls++; + } + } + + memcpy(rdims, dims, ndim * sizeof(int)); + rdims[0] = n; + + result = construct_md_array(relms, rnuls, ndim, rdims, lbs, +elmtyp, elmlen, elmbyval, elmalign); + + pfree(relms); + pfree(rnuls); + + return result; +} + +/* + * Shuffle the elements of an array. + */ +Datum +array_shuffle(PG_FUNCTION_ARGS) +{ + ArrayType *array, + *result; + int16 elmlen; + bool elmbyval; + char elmalign; + Oid elmtyp; + TypeCacheEntry *typentry; + i
Re: pg_parameter_aclcheck() and trusted extensions
Nathan Bossart writes: >> I think this is because GUCArrayReset() is the only caller of >> validate_option_array_item() that sets skipIfNoPermissions to true. The >> others fall through to set_config_option(), which does a >> pg_parameter_aclcheck(). So, you are right. > Here's a small patch that seems to fix this case. Yeah, this is more or less what I was thinking of. > However, I wonder if a > better way to fix this is to provide a way to stop set_config_option() from > throwing errors (e.g., setting elevel to -1). That way, we could remove > the manual permissions checks in favor of always using the real ones, which > might help prevent similar bugs in the future. I thought about that for a bit. You could almost do it today if you passed elevel == DEBUG5; the ensuing log chatter for failures would be down in the noise compared to everything else you would see with min_messages cranked down that far. However, (1) As things stand, set_config_option()'s result does not distinguish no-permissions failures from other problems, so we'd need some rejiggering of its API anyway. (2) As you mused upthread, it's possible that ACL_SET isn't what we should be checking here, but some more-specific privilege. So I'd just as soon keep this privilege check separate from set_config_option's. I'll push ahead with fixing it like this. regards, tom lane
Re: First draft of the PG 15 release notes
On Tue, Jul 19, 2022 at 01:13:07PM -0500, Justin Pryzby wrote: > On Tue, Jul 19, 2022 at 01:24:30PM -0400, Bruce Momjian wrote: > > Well, I put the --no-synchronized-snapshots item in incompatibilities > > since it is a user-visible change that might require script adjustments. > > However, I put the limit of pg_dump to 9.2 and greater into the pg_dump > > section. Are you suggesting I move the--no-synchronized-snapshots item > > down there? That doesn't match with the way I have listed other > > incompatibilities so I am resistant to do that. > > I'd rather see the "limit support to v9.2" be moved or added to the > "incompatibilities" section, maybe with "remove --no-synchronized-snapshots" > as a secondary sentence. Is removing support for an older version an incompatibility --- I didn't think so. > > > > 0. Add support for LZ4 and Zstandard compression of server-side base > > > > backups (Jeevan Ladhe, Robert Haas) > > > > 1. Allow pg_basebackup to use LZ4 and Zstandard compression on > > > > server-side base backup files (Dipesh Pandit, Jeevan Ladhe) > > > > 2. Allow pg_basebackup's --compress option to control the compression > > > > method and options (Michael Paquier, Robert Haas) > > > >New options include server-gzip (gzip on the server), client-gzip > > > > (same as gzip). > > > > 3. Allow pg_basebackup to compress on the server side and decompress on > > > > the client side before storage (Dipesh Pandit) > > > >This is accomplished by specifying compression on the server side > > > > and plain output format. > > > > > > I still think these expose the incremental development rather than the > > > user-facing change. > > > > > 1. It seems wrong to say "server-side" since client-side compression with > > > LZ4/zstd is also supported. > > > > Agreed. I changed it to: > > > >Allow pg_basebackup to do LZ4 and Zstandard server-side compression > >on base backup files (Dipesh Pandit, Jeevan Ladhe) > > This still misses the point that those compression algs are also supported on > the client side, so it seems misleading to mention "server-side" support. I reworked that paragraph in the attached patch. What we did was to add server-side gzip/LZ/ZSTD, and client-side LZ/ZSTD. (We already had client-side gzip.) Hopefully the new text is clearer. You can see the new output here: https://momjian.us/pgsql_docs/release-15.html > > > > Allow custom scan provders to indicate if they support projections > > > > (Sven Klemm) > > > > The default is now that custom scan providers can't support > > > > projections, so they need to be updated for this release. > > > > > > Per the commit message, they don't "need" to be updated. > > > I think this should say "The default now assumes that a custom scan > > > provider > > > does not support projections; to retain optimal performance, they should > > > be > > > updated to indicate whether that's supported. > > > > Okay, I went with this text: > > > > The default is now that custom scan providers are assumed to not > > support projections; those that do need to be updated for this > > release. > > I'd say "those that do *will need to be updated" otherwise the sentence can > sound like it means "those that need to be updated [will] ..." Oh, good point, done. Cumulative patch attached. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. 
Mark Batterson diff --git a/doc/src/sgml/release-15.sgml b/doc/src/sgml/release-15.sgml new file mode 100644 index dfa3c74..1cf6375 *** a/doc/src/sgml/release-15.sgml --- b/doc/src/sgml/release-15.sgml *** Author: Tom Lane *** 544,551 The default is now that custom scan providers are assumed to not ! support projections; those that do need to be updated for this ! release. --- 544,551 The default is now that custom scan providers are assumed to not ! support projections; those that do will need to be updated for ! this release. *** Author: Robert Haas 2022-03-08 [7cf085f07] Add support for zstd base backup compression. --> !Allow pg_basebackup to do LZ4 and !Zstandard server-side compression on base backup files (Dipesh !Pandit, Jeevan Ladhe) - - - - !Allow pg_basebackup's !--compress option to control the compression !method and options (Michael Paquier, Robert Haas) --- 2495,2515 2022-02-11 [751b8d23b] pg_basebackup: Allow client-side LZ4 (de)compression. Author: Robert Haas 2022-03-08 [7cf085f07] Add support for zstd base backup compression. + Author: Robert Haas + 2022-01-24 [0ad803291] Server-side gzip compression. --> !Allow pg_baseback
Re: Slow standby snapshot
Hello, Andrey. > I've looked into v5. Thanks! Patch is updated accordingly your remarks. Best regards, Michail. From 1301a262dea7f541c11092851e82f10932150ee3 Mon Sep 17 00:00:00 2001 From: Michail Nikolaev Date: Tue, 19 Jul 2022 23:50:53 +0300 Subject: [PATCH v6] Currently KnownAssignedXidsGetAndSetXmin requires an iterative loop through KnownAssignedXids array, including xids marked as invalid. Performance impact is especially noticeable in the presence of long (few seconds) transactions on primary, big number (few thousands) of max_connections and high read workload on standby. Most of the cpu spent on looping throw KnownAssignedXids while almost all xid are invalid anyway. KnownAssignedXidsCompress removes invalid xids from time to time, but performance is still affected. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To increase performance we lazily maintain an offset in the KnownAssignedXidsNextOffset array to skip known to be invalid xids. KnownAssignedXidsNextOffset does not always point to “next” valid xid, it is just some offset safe to skip (known to be filled by only invalid xids). --- src/backend/storage/ipc/procarray.c | 57 - 1 file changed, 48 insertions(+), 9 deletions(-) diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c index dadaa958a8..f613ae2f09 100644 --- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c @@ -271,6 +271,7 @@ static TransactionId cachedXidIsNotInProgress = InvalidTransactionId; */ static TransactionId *KnownAssignedXids; static bool *KnownAssignedXidsValid; +static int32 *KnownAssignedXidsNextOffset; static TransactionId latestObservedXid = InvalidTransactionId; /* @@ -450,6 +451,12 @@ CreateSharedProcArray(void) ShmemInitStruct("KnownAssignedXidsValid", mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS), &found); + KnownAssignedXidsNextOffset = (int32 *) +ShmemInitStruct("KnownAssignedXidsNextOffset", +mul_size(sizeof(int32), TOTAL_MAX_CACHED_SUBXIDS), +&found); + for (int i = 0; i < TOTAL_MAX_CACHED_SUBXIDS; i++) + KnownAssignedXidsNextOffset[i] = 1; } } @@ -4539,7 +4546,15 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) * XID entry itself. This preserves the property that the XID entries are * sorted, so we can do binary searches easily. Periodically we compress * out the unused entries; that's much cheaper than having to compress the - * array immediately on every deletion. + * array immediately on every deletion. Also, we lazily maintain an offset + * in KnownAssignedXidsNextOffset[] array to skip known to be invalid xids. + * It helps to skip the gaps; it could significantly increase performance in + * the case of long transactions on the primary. KnownAssignedXidsNextOffset + * is s updated during taking the snapshot. The KnownAssignedXidsNextOffset + * contains not an offset to the next valid xid but a number which tends to + * the offset to next valid xid. KnownAssignedXidsNextOffset[] values read + * and updated without additional locking because four-bytes read-writes are + * assumed to be atomic. 
* * The actually valid items in KnownAssignedXids[] and KnownAssignedXidsValid[] * are those with indexes tail <= i < head; items outside this subscript range @@ -4577,7 +4592,7 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid) * must happen) * * Compressing the array is O(S) and requires exclusive lock * * Removing an XID is O(logS) and requires exclusive lock - * * Taking a snapshot is O(S) and requires shared lock + * * Taking a snapshot is O(S), amortized O(N) next call; requires shared lock * * Checking for an XID is O(logS) and requires shared lock * * In comparison, using a hash table for KnownAssignedXids would mean that @@ -4637,12 +4652,13 @@ KnownAssignedXidsCompress(bool force) * re-aligning data to 0th element. */ compress_index = 0; - for (i = tail; i < head; i++) + for (i = tail; i < head; i += KnownAssignedXidsNextOffset[i]) { if (KnownAssignedXidsValid[i]) { KnownAssignedXids[compress_index] = KnownAssignedXids[i]; KnownAssignedXidsValid[compress_index] = true; + KnownAssignedXidsNextOffset[compress_index] = 1; compress_index++; } } @@ -4745,6 +4761,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid, { KnownAssignedXids[head] = next_xid; KnownAssignedXidsValid[head] = true; + KnownAssignedXidsNextOffset[head] = 1; TransactionIdAdvance(next_xid); head++; } @@ -4960,7 +4977,7 @@ KnownAssignedXidsRemovePreceding(TransactionId removeXid) tail = pArray->tailKnownAssignedXids; head = pArray->headKnownAssignedXids; - for (i = tail; i < head; i++) + for (i = tail; i < head; i += KnownAssignedXidsNextOffset[i]) { if (KnownAssignedXidsValid[i]) { @@ -4983,7 +5000,7 @@ KnownAssignedXidsRemovePreceding(TransactionId removeXid)
Re: pg_parameter_aclcheck() and trusted extensions
On Tue, Jul 19, 2022 at 04:27:08PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> However, I wonder if a >> better way to fix this is to provide a way to stop set_config_option() from >> throwing errors (e.g., setting elevel to -1). That way, we could remove >> the manual permissions checks in favor of always using the real ones, which >> might help prevent similar bugs in the future. > > I thought about that for a bit. You could almost do it today if you > passed elevel == DEBUG5; the ensuing log chatter for failures would be > down in the noise compared to everything else you would see with > min_messages cranked down that far. However, > > (1) As things stand, set_config_option()'s result does not distinguish > no-permissions failures from other problems, so we'd need some rejiggering > of its API anyway. > > (2) As you mused upthread, it's possible that ACL_SET isn't what we should > be checking here, but some more-specific privilege. So I'd just as soon > keep this privilege check separate from set_config_option's. I think we'd also need to keep the manual permissions checks for placeholders, so it wouldn't save much, anyway. > I'll push ahead with fixing it like this. Sounds good. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Log details for client certificate failures
On Tue, Jul 19, 2022 at 10:09 AM Andres Freund wrote: > On 2022-07-19 12:39:43 -0400, Tom Lane wrote: > > Having said that, I struggle to see why we are panicking about badly > > encoded log data from this source while blithely ignoring the problems > > posed by non-ASCII role names, database names, and tablespace names. > > I think we should fix these as well. I'm not as concerned about post-auth > encoding issues (i.e. tablespace name) as about pre-auth data (role name, > database name) - obviously being allowed to log in already is a pretty good > filter... v2 adds escaping to pg_clean_ascii(). My original attempt used StringInfo allocation, but that didn't play well with guc_malloc(), so I switched to a two-pass API where the caller allocates. Let me know if I'm missing something obvious; this way is more verbose than I'd like... Thanks, --Jacob From d99b59652f0b8479a5df0bc50357b5c4617f9fc2 Mon Sep 17 00:00:00 2001 From: Jacob Champion Date: Tue, 19 Jul 2022 10:48:58 -0700 Subject: [PATCH v2 1/2] pg_clean_ascii(): escape bytes rather than lose them Rather than replace each unprintable byte with a '?' character, replace it with a hex escape instead. The API is now two-pass (one pass to get the escaped length of the string, the second pass to perform the escaping), in order to allow the use of guc_malloc'd buffers. --- src/backend/postmaster/postmaster.c | 4 +-- src/backend/utils/misc/guc.c| 10 ++-- src/common/string.c | 38 ++--- src/include/common/string.h | 2 +- 4 files changed, 40 insertions(+), 14 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 1c25457526..8f5cdf4380 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2284,9 +2284,9 @@ retry1: */ if (strcmp(nameptr, "application_name") == 0) { - char *tmp_app_name = pstrdup(valptr); + char *tmp_app_name = palloc(pg_clean_ascii(valptr, NULL)); - pg_clean_ascii(tmp_app_name); + pg_clean_ascii(valptr, tmp_app_name); port->application_name = tmp_app_name; } diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 0328029d43..2e1a7af315 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -12480,7 +12480,10 @@ static bool check_application_name(char **newval, void **extra, GucSource source) { /* Only allow clean ASCII chars in the application name */ - pg_clean_ascii(*newval); + char *buf = guc_malloc(ERROR, pg_clean_ascii(*newval, NULL)); + + pg_clean_ascii(*newval, buf); + *newval = buf; return true; } @@ -12496,7 +12499,10 @@ static bool check_cluster_name(char **newval, void **extra, GucSource source) { /* Only allow clean ASCII chars in the cluster name */ - pg_clean_ascii(*newval); + char *buf = guc_malloc(ERROR, pg_clean_ascii(*newval, NULL)); + + pg_clean_ascii(*newval, buf); + *newval = buf; return true; } diff --git a/src/common/string.c b/src/common/string.c index 16940d1fa7..82a8afa4a9 100644 --- a/src/common/string.c +++ b/src/common/string.c @@ -22,6 +22,7 @@ #endif #include "common/string.h" +#include "lib/stringinfo.h" /* @@ -59,9 +60,11 @@ strtoint(const char *pg_restrict str, char **pg_restrict endptr, int base) /* - * pg_clean_ascii -- Replace any non-ASCII chars with a '?' char + * pg_clean_ascii -- Replace any non-ASCII chars with a "\xXX" string * - * Modifies the string passed in which must be '\0'-terminated. + * Puts an escaped copy into the dst buffer, which must be at least as big as + * the return value of pg_clean_ascii(src, NULL). 
The input string must be + * '\0'-terminated. * * This function exists specifically to deal with filtering out * non-ASCII characters in a few places where the client can provide an almost @@ -73,22 +76,39 @@ strtoint(const char *pg_restrict str, char **pg_restrict endptr, int base) * In general, this function should NOT be used- instead, consider how to handle * the string without needing to filter out the non-ASCII characters. * - * Ultimately, we'd like to improve the situation to not require stripping out - * all non-ASCII but perform more intelligent filtering which would allow UTF or + * Ultimately, we'd like to improve the situation to not require replacing all + * non-ASCII but perform more intelligent filtering which would allow UTF or * similar, but it's unclear exactly what we should allow, so stick to ASCII only * for now. */ -void -pg_clean_ascii(char *str) +size_t +pg_clean_ascii(const char *str, char *dst) { - /* Only allow clean ASCII chars in the string */ - char *p; + const char *p; + size_t i = 0; for (p = str; *p != '\0'; p++) { + /* Only allow clean ASCII chars in the string */ if (*p < 32 || *p > 126) - *p = '?'; + { + if (dst) +snprintf(&dst[i], 5, "\\x%02x", (unsigned char) *p); + i += 4; + } + else + { + if (dst) +dst[
Re: [PATCH] Log details for client certificate failures
Hi, On 2022-07-19 15:08:38 -0700, Jacob Champion wrote: > v2 adds escaping to pg_clean_ascii(). My original attempt used > StringInfo allocation, but that didn't play well with guc_malloc(), so > I switched to a two-pass API where the caller allocates. Let me know > if I'm missing something obvious; this way is more verbose than I'd > like... Hm, that's pretty awkward. Perhaps we can have a better API for everything but guc.c? Or alternatively, perhaps we can just make pg_clean_ascii() return NULL if allocation failed and then guc_strdup() the result in guc.c? If we end up needing a two phase approach, why use the same function for both phases? That seems quite awkward. Greetings, Andres Freund
Re: pgsql: Default to hidden visibility for extension libraries where possi
Hi, On 2022-07-19 16:28:07 +0200, Alvaro Herrera wrote: > Anyway, the minimal patch that makes plpgsql_check tests pass is > attached. Do you want to commit that or should I? Greetings, Andres Freund
Re: pgsql: Default to hidden visibility for extension libraries where possi
Hi, On 2022-07-19 08:31:49 -0700, Andres Freund wrote: > But I think we probably should err on the side of adding more > declarations, rather than the oposite. To see if there's other declarations that should be added, I used https://codesearch.debian.net/search?q=load_external_function&literal=1&perpkg=1 which shows plpgsql_check and hstore_pllua. All the hstore symbols for the latter are exported already. Greetings, Andres Freund
Re: Convert planner's AggInfo and AggTransInfo to Nodes
I wrote: > * WRITE_OID_ARRAY and WRITE_BOOL_ARRAY needed extension to handle a null > array pointer. I think we should make all the WRITE_FOO_ARRAY macros > work alike, so I added that to all of them. I first tried to make the > rest work like WRITE_INDEX_ARRAY, but that failed because readfuncs.c > isn't expecting "<>" for an empty array; it's expecting nothing at > all. (Note there is no readfuncs equivalent to WRITE_INDEX_ARRAY.) > What I've done here is to change WRITE_INDEX_ARRAY to work like the > others and print nothing for an empty array, but I wonder if now > wouldn't be a good time to redefine the serialized representation > to be more robust. I'm imagining "<>" for a NULL array pointer and > "(item item item)" otherwise, allowing a cross-check that we're > getting the right number of items. Concretely, about like this. regards, tom lane diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 2b85f97f39..fe97d559b6 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -16,6 +16,7 @@ #include +#include "access/attnum.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "nodes/bitmapset.h" @@ -96,48 +97,30 @@ static void outChar(StringInfo str, char c); (appendStringInfoString(str, " :" CppAsString(fldname) " "), \ outBitmapset(str, node->fldname)) +/* Write a variable-length array of AttrNumber */ #define WRITE_ATTRNUMBER_ARRAY(fldname, len) \ - do { \ - appendStringInfoString(str, " :" CppAsString(fldname) " "); \ - for (int i = 0; i < len; i++) \ - appendStringInfo(str, " %d", node->fldname[i]); \ - } while(0) + (appendStringInfoString(str, " :" CppAsString(fldname) " "), \ + writeAttrNumberCols(str, node->fldname, len)) +/* Write a variable-length array of Oid */ #define WRITE_OID_ARRAY(fldname, len) \ - do { \ - appendStringInfoString(str, " :" CppAsString(fldname) " "); \ - for (int i = 0; i < len; i++) \ - appendStringInfo(str, " %u", node->fldname[i]); \ - } while(0) + (appendStringInfoString(str, " :" CppAsString(fldname) " "), \ + writeOidCols(str, node->fldname, len)) -/* - * This macro supports the case that the field is NULL. For the other array - * macros, that is currently not needed. - */ +/* Write a variable-length array of Index */ #define WRITE_INDEX_ARRAY(fldname, len) \ - do { \ - appendStringInfoString(str, " :" CppAsString(fldname) " "); \ - if (node->fldname) \ - for (int i = 0; i < len; i++) \ -appendStringInfo(str, " %u", node->fldname[i]); \ - else \ - appendStringInfoString(str, "<>"); \ - } while(0) + (appendStringInfoString(str, " :" CppAsString(fldname) " "), \ + writeIndexCols(str, node->fldname, len)) +/* Write a variable-length array of int */ #define WRITE_INT_ARRAY(fldname, len) \ - do { \ - appendStringInfoString(str, " :" CppAsString(fldname) " "); \ - for (int i = 0; i < len; i++) \ - appendStringInfo(str, " %d", node->fldname[i]); \ - } while(0) + (appendStringInfoString(str, " :" CppAsString(fldname) " "), \ + writeIntCols(str, node->fldname, len)) +/* Write a variable-length array of bool */ #define WRITE_BOOL_ARRAY(fldname, len) \ - do { \ - appendStringInfoString(str, " :" CppAsString(fldname) " "); \ - for (int i = 0; i < len; i++) \ - appendStringInfo(str, " %s", booltostr(node->fldname[i])); \ - } while(0) - + (appendStringInfoString(str, " :" CppAsString(fldname) " "), \ + writeBoolCols(str, node->fldname, len)) #define booltostr(x) ((x) ? 
"true" : "false") @@ -196,6 +179,37 @@ outChar(StringInfo str, char c) outToken(str, in); } +/* + * common implementation for scalar-array-writing functions + * + * The data format is either "<>" for a NULL pointer or "(item item item)". + * fmtstr must include a leading space. + * convfunc can be empty, or the name of a conversion macro or function. + */ +#define WRITE_SCALAR_ARRAY(fnname, datatype, fmtstr, convfunc) \ +static void \ +fnname(StringInfo str, const datatype *arr, int len) \ +{ \ + if (arr != NULL) \ + { \ + appendStringInfoChar(str, '('); \ + for (int i = 0; i < len; i++) \ + appendStringInfo(str, fmtstr, convfunc(arr[i])); \ + appendStringInfoChar(str, ')'); \ + } \ + else \ + appendStringInfoString(str, "<>"); \ +} + +WRITE_SCALAR_ARRAY(writeAttrNumberCols, AttrNumber, " %d",) +WRITE_SCALAR_ARRAY(writeOidCols, Oid, " %u",) +WRITE_SCALAR_ARRAY(writeIndexCols, Index, " %u",) +WRITE_SCALAR_ARRAY(writeIntCols, int, " %d",) +WRITE_SCALAR_ARRAY(writeBoolCols, bool, " %s", booltostr) + +/* + * Print a List. + */ static void _outList(StringInfo str, const List *node) { diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index a2391280be..c77c3984ca 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -502,97 +502,45 @@ readDatum(bool typbyval) } /* - * readAttrNumberCols - */ -AttrNumber * -readAttrNumberCols(int numCols) -{ - int tokenLength, -i; - const char *token; - AttrN
Re: [Commitfest 2022-07] Begins Now
On 7/18/22 02:53, Alvaro Herrera wrote: On 2022-Jul-18, Aleksander Alekseev wrote: Hi hackers, If someone put a lot of review into a patchset a few months ago, they absolutely deserve credit. But if that entry has been sitting with no feedback this month, why is it useful to keep that Reviewer around? As I recall, several committers reported before that they use Reviewers field in the CF application when writing the commit message. I would argue that this is the reason. Maybe we need two separate reviewer columns -- one for credits (historical tracking) and one for people currently reviewing a patch. So we definitely expect an email "soon" from someone in the second column, but not from somebody who is only in the first column. +1 -- Joe Conway RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Memory leak fix in psql
On Tue, Jul 19, 2022 at 12:36:31PM -0400, Tom Lane wrote: > Yeah, it's old school, but please let's not have a few functions that > do it randomly differently from all their neighbors. True enough. And it is not like we should free the PQExpBuffer given by the caller in validateSQLNamePattern(). -- Michael signature.asc Description: PGP signature
Re: Memory leak fix in psql
On Tue, Jul 19, 2022 at 09:43:21AM -0700, Mark Dilger wrote: > This looks ok, but comments down-thread seem reasonable, so I > suspect a new patch will be needed. Would you like to author it, or > would you prefer that I, as the guilty party, do so? If any of you could update the patch, that would be great. Thanks! -- Michael signature.asc Description: PGP signature
Re: Add a test for "cannot truncate foreign table"
On 2022/07/15 16:52, Fujii Masao wrote: On 2022/07/08 17:03, Etsuro Fujita wrote: Rather than doing so, I'd vote for adding a test case to file_fdw, as proposed in the patch, because that would be much simpler and much less expensive. So ISTM that most agreed to push Nagata-san's patch adding the test for TRUNCATE on foreign table with file_fdw. So barring any objection, I will commit the patch. Pushed. Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Add a test for "cannot truncate foreign table"
On Wed, 20 Jul 2022 09:38:17 +0900 Fujii Masao wrote: > > > On 2022/07/15 16:52, Fujii Masao wrote: > > > > > > On 2022/07/08 17:03, Etsuro Fujita wrote: > >> Rather than doing so, I'd vote for adding a test case to file_fdw, as > >> proposed in the patch, because that would be much simpler and much > >> less expensive. > > > > So ISTM that most agreed to push Nagata-san's patch adding the test for > > TRUNCATE on foreign table with file_fdw. So barring any objection, I will > > commit the patch. > > Pushed. Thanks! Thanks! > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION -- Yugo NAGATA
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
At Tue, 19 Jul 2022 17:31:07 +0900, Masahiko Sawada wrote in > On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi > wrote: > > At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila > > wrote in > > > Good work. I wonder without comments this may create a problem in the > > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > > > freeing the memory any less robust. Also, for consistency, we can use > > > a similar check based on xcnt in the SnapBuildRestore to free the > > > memory in the below code: > > > + /* set catalog modifying transactions */ > > > + if (builder->catchange.xip) > > > + pfree(builder->catchange.xip); > > > > But xip must be positive there. We can add a comment explains that. > > > > Yes, if we add the comment for it, probably we need to explain a gcc's > optimization but it seems to be too much to me. Ah, sorry. I confused with other place in SnapBuildPurgeCommitedTxn. I agree to you, that we don't need additional comment *there*. > > + catchange_xip = > > ReorderBufferGetCatalogChangesXacts(builder->reorder); > > > > catchange_xip is allocated in the current context, but ondisk is > > allocated in builder->context. I see it kind of inconsistent (even if > > the current context is same with build->context). > > Right. I thought that since the lifetime of catchange_xip is short, > until the end of SnapBuildSerialize() function we didn't need to > allocate it in builder->context. But given ondisk, we need to do that > for catchange_xip as well. Will fix it. Thanks. > > + if (builder->committed.xcnt > 0) > > + { > > > > It seems to me comitted.xip is always non-null, so we don't need this. > > I don't strongly object to do that, though. > > But committed.xcnt could be 0, right? We don't need to copy anything > by calling memcpy with size = 0 in this case. Also, it looks more > consistent with what we do for catchange_xcnt. Mmm. the patch changed that behavior. AllocateSnapshotBuilder always allocate the array with a fixed size. SnapBuildAddCommittedTxn still assumes builder->committed.xip is non-NULL. SnapBuildRestore *kept* ondisk.builder.commited.xip populated with a valid array pointer. But the patch allows committed.xip be NULL, thus in that case, SnapBuildAddCommitedTxn calls repalloc(NULL) which triggers assertion failure. > > + Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns)); > > > > (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..). > > (xcnt == rb->catchange_ntxns) is not what should be checked here. The > > assert just requires that catchange_txns and catchange_ntxns are > > consistent so it should be checked just after dlist_empty.. I think. > > > > If we want to check if catchange_txns and catchange_ntxns are > consistent, should we check (xcnt == rb->catchange_ntxns) as well, no? > This function requires the caller to use rb->catchange_ntxns as the > length of the returned array. I think this assertion ensures that the > actual length of the array is consistent with the length we > pre-calculated. Sorry again. Please forget the comment about xcnt == rb->catchange_ntxns.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages
At Tue, 19 Jul 2022 09:58:28 -0700, Nathan Bossart wrote in > On Tue, Jul 19, 2022 at 02:43:59PM +0530, Bharath Rupireddy wrote: > > Done. PSA v5 patch set. > > LGTM. I've marked this as ready-for-committer. I found the following sentence in config.sgml: "Specifies the minimum size of past log file segments kept in the pg_wal directory" Also, postgresql.conf.sample contains "logfile segment" in a few lines. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: errdetail/errhint style
On Tue, Jul 19, 2022 at 09:51:13PM +0900, Michael Paquier wrote: > +1. I have looked at that and added the extra periods, and applied it. Thanks, Justin. -- Michael signature.asc Description: PGP signature
Re: Expose last replayed timeline ID along with last replayed LSN
At Tue, 19 Jul 2022 14:28:40 +0530, Bharath Rupireddy wrote in > Hi, > > At times it's useful to know the last replayed WAL record's timeline > ID (especially on the standbys that are lagging in applying WAL while > failing over - for reporting, logging and debugging purposes). AFAICS, > there's no function that exposes the last replayed TLI. We can either > change the existing pg_last_wal_replay_lsn() to report TLI along with > the LSN which might break the compatibility or introduce a new > function pg_last_wal_replay_info() that emits both LSN and TLI. I'm > fine with either of the approaches, but for now, I'm attaching a WIP > patch that adds a new function pg_last_wal_replay_info(). > > Thoughts? There was a more comprehensive discussion [1], which went nowhere. [1] https://www.postgresql.org/message-id/20191211052002.GK72921%40paquier.xyz regards. -- Kyotaro Horiguchi NTT Open Source Software Center
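As a usage sketch only: the second function below is what this mail proposes, it does not exist in any released PostgreSQL, and its name and result columns are assumptions taken from the description above.

-- Available today on a standby: the last replayed LSN, but not its timeline.
SELECT pg_last_wal_replay_lsn();

-- Proposed in the attached WIP patch (hypothetical output shape):
-- the last replayed LSN together with its timeline ID.
SELECT * FROM pg_last_wal_replay_info();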
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Wed, Jul 20, 2022 at 9:58 AM Kyotaro Horiguchi wrote: > > At Tue, 19 Jul 2022 17:31:07 +0900, Masahiko Sawada > wrote in > > On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi > > wrote: > > > At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila > > > wrote in > > > > Good work. I wonder without comments this may create a problem in the > > > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before > > > > freeing the memory any less robust. Also, for consistency, we can use > > > > a similar check based on xcnt in the SnapBuildRestore to free the > > > > memory in the below code: > > > > + /* set catalog modifying transactions */ > > > > + if (builder->catchange.xip) > > > > + pfree(builder->catchange.xip); > > > > > > But xip must be positive there. We can add a comment explains that. > > > > > > > Yes, if we add the comment for it, probably we need to explain a gcc's > > optimization but it seems to be too much to me. > > Ah, sorry. I confused with other place in SnapBuildPurgeCommitedTxn. > I agree to you, that we don't need additional comment *there*. > > > > + catchange_xip = > > > ReorderBufferGetCatalogChangesXacts(builder->reorder); > > > > > > catchange_xip is allocated in the current context, but ondisk is > > > allocated in builder->context. I see it kind of inconsistent (even if > > > the current context is same with build->context). > > > > Right. I thought that since the lifetime of catchange_xip is short, > > until the end of SnapBuildSerialize() function we didn't need to > > allocate it in builder->context. But given ondisk, we need to do that > > for catchange_xip as well. Will fix it. > > Thanks. > > > > + if (builder->committed.xcnt > 0) > > > + { > > > > > > It seems to me comitted.xip is always non-null, so we don't need this. > > > I don't strongly object to do that, though. > > > > But committed.xcnt could be 0, right? We don't need to copy anything > > by calling memcpy with size = 0 in this case. Also, it looks more > > consistent with what we do for catchange_xcnt. > > Mmm. the patch changed that behavior. AllocateSnapshotBuilder always > allocate the array with a fixed size. SnapBuildAddCommittedTxn still > assumes builder->committed.xip is non-NULL. SnapBuildRestore *kept* > ondisk.builder.commited.xip populated with a valid array pointer. But > the patch allows committed.xip be NULL, thus in that case, > SnapBuildAddCommitedTxn calls repalloc(NULL) which triggers assertion > failure. IIUC the patch doesn't allow committed.xip to be NULL since we don't overwrite it if builder->committed.xcnt is 0 (i.e., ondisk.builder.committed.xip is NULL): builder->committed.xcnt = ondisk.builder.committed.xcnt; /* We only allocated/stored xcnt, not xcnt_space xids ! */ /* don't overwrite preallocated xip, if we don't have anything here */ if (builder->committed.xcnt > 0) { pfree(builder->committed.xip); builder->committed.xcnt_space = ondisk.builder.committed.xcnt; builder->committed.xip = ondisk.builder.committed.xip; } Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
Hackers, Currently, if we have a query such as: SELECT a,b, COUNT(*) FROM a INNER JOIN b on a.a = b.x GROUP BY a,b ORDER BY a DESC, b; With enable_hashagg = off, we get the following plan: QUERY PLAN --- GroupAggregate Group Key: a.a, a.b -> Sort Sort Key: a.a DESC, a.b -> Merge Join Merge Cond: (a.a = b.x) -> Sort Sort Key: a.a -> Seq Scan on a -> Sort Sort Key: b.x -> Seq Scan on b We can see that the merge join picked to sort the input on a.a rather than a.a DESC. This is due to the way select_outer_pathkeys_for_merge() only picks the query_pathkeys as a prefix of the join pathkeys if we can find all of the join pathkeys in the query_pathkeys. I think we can relax this now that we have incremental sort. I think a better way to limit this is to allow a prefix of the query_pathkeys providing that covers *all* of the join pathkeys. That way, for the above query, it leaves it open for the planner to do the Merge Join by sorting by a.a DESC then just do an Incremental Sort to get the GroupAggregate input sorted by a.b. I've attached a patch for this and it changes the plan for the above query to: QUERY PLAN GroupAggregate Group Key: a.a, a.b -> Incremental Sort Sort Key: a.a DESC, a.b Presorted Key: a.a -> Merge Join Merge Cond: (a.a = b.x) -> Sort Sort Key: a.a DESC -> Seq Scan on a -> Sort Sort Key: b.x DESC -> Seq Scan on b The current behaviour is causing me a bit of trouble in plan regression for the ORDER BY / DISTINCT aggregate improvement patch that I have pending. Is there any reason that we shouldn't do this? David diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 11de286e60..cf08fcac51 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -1924,11 +1924,13 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root, * Since we assume here that a sort is required, there is no particular use * in matching any available ordering of the outerrel. (joinpath.c has an * entirely separate code path for considering sort-free mergejoins.) Rather, - * it's interesting to try to match the requested query_pathkeys so that a - * second output sort may be avoided; and failing that, we try to list "more - * popular" keys (those with the most unmatched EquivalenceClass peers) - * earlier, in hopes of making the resulting ordering useful for as many - * higher-level mergejoins as possible. + * it's interesting to try to match, or match a prefix of the requested + * query_pathkeys so that a second output sort may be avoided. We can get + * away with just a prefix of the query_pathkeys when that prefix covers the + * entire join condition. Failing that, we try to list "more popular" keys + * (those with the most unmatched EquivalenceClass peers) earlier, in hopes + * of making the resulting ordering useful for as many higher-level mergejoins + * as possible. */ List * select_outer_pathkeys_for_merge(PlannerInfo *root, @@ -1998,11 +2000,16 @@ select_outer_pathkeys_for_merge(PlannerInfo *root, /* * Find out if we have all the ECs mentioned in query_pathkeys; if so we -* can generate a sort order that's also useful for final output. There is -* no percentage in a partial match, though, so we have to have 'em all. +* can generate a sort order that's also useful for final output. 
If we +* only have a prefix of the query_pathkeys, and that prefix is the entire +* join condition, then it's useful to use the prefix as the pathkeys as +* this increases the chances that an incremental sort will be able to be +* used by the upper planner. */ if (root->query_pathkeys) { + int matches = 0; + foreach(lc, root->query_pathkeys) { PathKey*query_pathkey = (PathKey *) lfirst(lc); @@ -2015,6 +2022,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root, } if (j >= necs) break; /* didn't find match */ + + matches++; } /* if we got to the end of the list, we have them all */ if (lc == NULL) @@ -2037,6 +2046,22 @@ select_outer_pathkeys_for_merge(PlannerInfo *root, } } } + + /* +* If we didn't find a full match, but matched all of the join clauses
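For anyone wanting to reproduce the plans quoted above, a rough setup along these lines should do; the table definitions and row counts are assumptions, chosen only to mirror the example query, and with the patch applied the second (Incremental Sort) plan shape is the expected one.

CREATE TABLE a (a int, b int);
CREATE TABLE b (x int);

INSERT INTO a SELECT i % 1000, i FROM generate_series(1, 100000) i;
INSERT INTO b SELECT i % 1000 FROM generate_series(1, 100000) i;
ANALYZE a, b;

SET enable_hashagg = off;
SET enable_hashjoin = off;
SET enable_nestloop = off;

EXPLAIN (COSTS OFF)
SELECT a, b, COUNT(*)
FROM a INNER JOIN b ON a.a = b.x
GROUP BY a, b
ORDER BY a DESC, b;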
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Tue, Jul 19, 2022 at 7:28 PM Masahiko Sawada wrote: > > On Tue, Jul 19, 2022 at 9:25 PM Amit Kapila wrote: > > > > On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada > > wrote: > > > > > > > > BTW on backbranches, I think that the reason why we add > > > initial_running_xacts stuff to ReorderBuffer is that we cannot modify > > > SnapBuild that could be serialized. Can we add a (private) array for > > > the initial running xacts in snapbuild.c instead of adding new > > > variables to ReorderBuffer? > > > > > > > While thinking about this, I wonder if the current patch for back > > branches can lead to an ABI break as it changes the exposed structure? > > If so, it may be another reason to change it to some other way > > probably as you are suggesting. > > Yeah, it changes the size of ReorderBuffer, which is not good. > So, are you planning to give a try with your idea of making a private array for the initial running xacts? I am not sure but I guess you are proposing to add it in SnapBuild structure, if so, that seems safe as that structure is not exposed. > Changing the function names and arguments would also break ABI. So > probably we cannot do the above idea of removing > ReorderBufferInitialXactsSetCatalogChanges() as well. > Why do you think we can't remove ReorderBufferInitialXactsSetCatalogChanges() from the back branch patch? I think we don't need to change the existing function ReorderBufferXidHasCatalogChanges() but instead can have a wrapper like SnapBuildXidHasCatalogChanges() similar to master branch patch. -- With Regards, Amit Kapila.
RE: Memory leak fix in psql
On Tuesday, July 19, 2022 7:41 PM, Japin Li wrote: > After looking around, I found psql/describe.c also has some memory > leaks, > attached a patch to fix these leaks. Thanks for your check and improvement. Your fix LGTM so please allow me to merge it in the attached patch. Based on your rebased version, now this new patch version is V3. Regards, Tang v3-0001-fix-the-memory-leak-in-psql-describe.patch Description: v3-0001-fix-the-memory-leak-in-psql-describe.patch
Re: System column support for partitioned tables using heap
On Tue, Jul 19, 2022 at 10:38 PM Robert Haas wrote: > For MERGE itself, I wonder if some information about this should be > included in the command tag. It looks like MERGE already includes some > sort of row count in the command tag, but I guess perhaps it doesn't > distinguish between inserts and updates. I don't know why we couldn't > expose multiple values this way, though. It would be great to get some sort of feedback from MERGE accessible through SQL results, even if that doesn't come in the form of a RETURNING list. > I wonder whether you could just have the CTEs bubble up 1 or 0 and > then sum them at some stage, instead of relying on xmax. Presumably > your UPSERT simulation knows which thing it did in each case. It might help if I show a sample insert handling function. The issue is with the line at the end of the top CTE, insert_rows: returning xmax as inserted_transaction_id), That's what fails on partitions. Is there an alternative way to test what happened to the row(s)? here's the full function. . I wrote a code generator, so I don't have to hand-code all of these bits for each table+version: -- Create a function to accept an array of rows formatted as item_type_v1 for UPSERT into item_type. DROP FUNCTION IF EXISTS types_plus.insert_item_type_v1 (types_plus.item_type_v1[]); CREATE OR REPLACE FUNCTION types_plus.insert_item_type_v1 (data_in types_plus.item_type_v1[]) RETURNS TABLE ( insert_countinteger, estimated_update_count integer, transaction_id text) LANGUAGE SQL BEGIN ATOMIC -- The CTE below is a roundabout way of returning an insertion count from a pure SQL function in Postgres. WITH inserted_rows as ( INSERT INTO item_type ( id, marked_for_deletion, name_) SELECT rows_in.id, rows_in.marked_for_deletion, rows_in.name_ FROM unnest(data_in) as rows_in ON CONFLICT(id) DO UPDATE SET marked_for_deletion = EXCLUDED.marked_for_deletion, name_ = EXCLUDED.name_ returning xmax as inserted_transaction_id), status_data AS ( select count(*) FILTER (where inserted_transaction_id = 0) AS insert_count, count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count, pg_current_xact_id_if_assigned()::text AS transaction_id from inserted_rows), insert_log_entry AS ( INSERT INTO insert_log ( data_file_id, ib_version, job_run_id, schema_name, table_name, records_submitted, insert_count, estimated_update_count) SELECT coalesce_session_variable( 'data_file_id', '')::uuid, coalesce_session_variable('ib_version'), -- Default result is '' coalesce_session_variable( 'job_run_id', '')::uuid, 'ascendco', 'item_type', (select cardinality(data_in)), insert_count, estimated_update_count FROM status_data ) -- Final output/result. select insert_count, estimated_update_count, transaction_id from status_data; END;
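One way to get at what Robert suggests, sketched minimally here against an assumed plain item_type(id int primary key, name_ text) table (the logging CTE and session variables from the generated function are left out): run the UPDATE and the INSERT as separate data-modifying CTEs and count their RETURNING rows, which is equivalent to summing the 1s Robert mentions, so nothing needs to inspect xmax and the counts work the same against partitioned tables. Unlike INSERT ... ON CONFLICT, this form is not safe against concurrent inserts of the same keys.

WITH data_in (id, name_) AS (
    VALUES (1, 'widget'), (2, 'gadget')
),
updated AS (
    UPDATE item_type t
       SET name_ = d.name_
      FROM data_in d
     WHERE t.id = d.id
    RETURNING t.id
),
inserted AS (
    INSERT INTO item_type (id, name_)
    SELECT d.id, d.name_
      FROM data_in d
     WHERE NOT EXISTS (SELECT 1 FROM updated u WHERE u.id = d.id)
    RETURNING id
)
SELECT (SELECT count(*) FROM inserted) AS insert_count,
       (SELECT count(*) FROM updated)  AS estimated_update_count;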
Re: Windows now has fdatasync()
Ok, I've pushed the Windows patch. I'll watch the build farm to see if I've broken any of the frankentoolchain Windows animals. Mikael kindly upgraded conchuela, so that leaves just prairiedog without fdatasync. I've attached a patch to drop the configure probe for that once prairiedog's host is reassigned to new duties, if we're agreed on that. While in this part of the code I noticed another anachronism that could be cleaned up: our handling of the old pre-standard BSD O_FSYNC flag. Pulling on that I noticed I could remove a bunch of associated macrology. From b78c07a573c9f7e634eb9d33f57b07e9c37bfb47 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Wed, 20 Jul 2022 12:32:26 +1200 Subject: [PATCH 1/2] Remove O_FSYNC and associated macros. O_FSYNC was a pre-standard way of spelling O_SYNC. It's not needed on any modern system. We can just use O_SYNC directly, if it exists, and get rid of our OPEN_SYNC_FLAG macro. Likewise for O_DSYNC, we can just use it directly, if it exists, and get rid of our OPEN_DATASYNC_FLAG macro. The only complication is that we still want to avoid choosing open_datasync as a default if O_DSYNC has the same value as O_SYNC, so there is no change in behavior. --- src/backend/access/transam/xlog.c | 20 ++-- src/bin/pg_test_fsync/pg_test_fsync.c | 12 ++-- src/include/access/xlogdefs.h | 19 +-- 3 files changed, 17 insertions(+), 34 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9854b51c6a..1da3b8eb2e 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -171,10 +171,10 @@ const struct config_enum_entry sync_method_options[] = { #ifdef HAVE_FDATASYNC {"fdatasync", SYNC_METHOD_FDATASYNC, false}, #endif -#ifdef OPEN_SYNC_FLAG +#ifdef O_SYNC {"open_sync", SYNC_METHOD_OPEN, false}, #endif -#ifdef OPEN_DATASYNC_FLAG +#ifdef O_DSYNC {"open_datasync", SYNC_METHOD_OPEN_DSYNC, false}, #endif {NULL, 0, false} @@ -7894,10 +7894,10 @@ get_sync_bit(int method) /* * Optimize writes by bypassing kernel cache with O_DIRECT when using - * O_SYNC/O_FSYNC and O_DSYNC. But only if archiving and streaming are - * disabled, otherwise the archive command or walsender process will read - * the WAL soon after writing it, which is guaranteed to cause a physical - * read if we bypassed the kernel cache. We also skip the + * O_SYNC and O_DSYNC. But only if archiving and streaming are disabled, + * otherwise the archive command or walsender process will read the WAL + * soon after writing it, which is guaranteed to cause a physical read if + * we bypassed the kernel cache. We also skip the * posix_fadvise(POSIX_FADV_DONTNEED) call in XLogFileClose() for the same * reason. 
* @@ -7921,13 +7921,13 @@ get_sync_bit(int method) case SYNC_METHOD_FSYNC_WRITETHROUGH: case SYNC_METHOD_FDATASYNC: return 0; -#ifdef OPEN_SYNC_FLAG +#ifdef O_SYNC case SYNC_METHOD_OPEN: - return OPEN_SYNC_FLAG | o_direct_flag; + return O_SYNC | o_direct_flag; #endif -#ifdef OPEN_DATASYNC_FLAG +#ifdef O_DSYNC case SYNC_METHOD_OPEN_DSYNC: - return OPEN_DATASYNC_FLAG | o_direct_flag; + return O_DSYNC | o_direct_flag; #endif default: /* can't happen (unless we are out of sync with option array) */ diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c index f7bc199a30..6739214eb8 100644 --- a/src/bin/pg_test_fsync/pg_test_fsync.c +++ b/src/bin/pg_test_fsync/pg_test_fsync.c @@ -300,7 +300,7 @@ test_sync(int writes_per_op) printf(LABEL_FORMAT, "open_datasync"); fflush(stdout); -#ifdef OPEN_DATASYNC_FLAG +#ifdef O_DSYNC if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1) { printf(NA_FORMAT, _("n/a*")); @@ -407,8 +407,8 @@ test_sync(int writes_per_op) printf(LABEL_FORMAT, "open_sync"); fflush(stdout); -#ifdef OPEN_SYNC_FLAG - if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1) +#ifdef O_SYNC + if ((tmpfile = open_direct(filename, O_RDWR | O_SYNC | PG_BINARY, 0)) == -1) { printf(NA_FORMAT, _("n/a*")); fs_warning = true; @@ -466,7 +466,7 @@ test_open_syncs(void) static void test_open_sync(const char *msg, int writes_size) { -#ifdef OPEN_SYNC_FLAG +#ifdef O_SYNC int tmpfile, ops, writes; @@ -475,8 +475,8 @@ test_open_sync(const char *msg, int writes_size) printf(LABEL_FORMAT, msg); fflush(stdout); -#ifdef OPEN_SYNC_FLAG - if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1) +#ifdef O_SYNC + if ((tmpfile = open_direct(filename, O_RDWR | O_SYNC | PG_BINARY, 0)) == -1) printf(NA_FORMAT, _("n/a*")); else { diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h index a47e3eeb1f..a720169b17 100644 --- a/src/include/access/xlogdefs.h +++ b/src/include/access/xlogdefs.h @@ -70,27 +70,10 @@ typedef uint16 RepOriginId; * default method.
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Wed, Jul 20, 2022 at 12:11 PM Amit Kapila wrote: > > On Tue, Jul 19, 2022 at 7:28 PM Masahiko Sawada wrote: > > > > On Tue, Jul 19, 2022 at 9:25 PM Amit Kapila wrote: > > > > > > On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada > > > wrote: > > > > > > > > > > > BTW on backbranches, I think that the reason why we add > > > > initial_running_xacts stuff to ReorderBuffer is that we cannot modify > > > > SnapBuild that could be serialized. Can we add a (private) array for > > > > the initial running xacts in snapbuild.c instead of adding new > > > > variables to ReorderBuffer? > > > > > > > > > > While thinking about this, I wonder if the current patch for back > > > branches can lead to an ABI break as it changes the exposed structure? > > > If so, it may be another reason to change it to some other way > > > probably as you are suggesting. > > > > Yeah, it changes the size of ReorderBuffer, which is not good. > > > > So, are you planning to give a try with your idea of making a private > array for the initial running xacts? Yes. > I am not sure but I guess you are > proposing to add it in SnapBuild structure, if so, that seems safe as > that structure is not exposed. We cannot add it in SnapBuild as it gets serialized to the disk. > > > Changing the function names and arguments would also break ABI. So > > probably we cannot do the above idea of removing > > ReorderBufferInitialXactsSetCatalogChanges() as well. > > > > Why do you think we can't remove > ReorderBufferInitialXactsSetCatalogChanges() from the back branch > patch? I think we don't need to change the existing function > ReorderBufferXidHasCatalogChanges() but instead can have a wrapper > like SnapBuildXidHasCatalogChanges() similar to master branch patch. IIUC we need to change SnapBuildCommitTxn() but it's exposed. Currently, we call like DecodeCommit() -> SnapBuildCommitTxn() -> ReorderBufferXidHasCatalogChanges(). If we have a wrapper function, we call like DecodeCommit() -> SnapBuildCommitTxn() -> SnapBuildXidHasCatalogChanges() -> ReorderBufferXidHasCatalogChanges(). In SnapBuildXidHasCatalogChanges(), we need to check if the transaction has XACT_XINFO_HAS_INVALS, which means DecodeCommit() needs to pass either parsed->xinfo or (parsed->xinfo & XACT_XINFO_HAS_INVALS != 0) down to SnapBuildXidHasCatalogChanges(). However, since SnapBuildCommitTxn(), between DecodeCommit() and SnapBuildXidHasCatalogChanges(), is exposed we cannot change it. Another idea would be to have functions, say SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter does actual work of handling transaction commits and both SnapBuildCommitTxn() and SnapBuildCommit() call SnapBuildCommitTxnWithXInfo() with different arguments. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
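To make that alternative concrete, here is a rough C sketch of the shape described above (signatures abbreviated from memory and bodies elided; the two new names simply follow the proposal in this mail): the already-exported function keeps its argument list, and both it and the new xinfo-aware entry point funnel into one internal worker.

typedef unsigned int uint32;
typedef uint32 TransactionId;
typedef unsigned long long XLogRecPtr;
typedef struct SnapBuild SnapBuild;		/* opaque here */

/* internal worker: does the actual commit handling, now with xinfo available */
static void
SnapBuildCommitTxn_ext(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
					   int nsubxacts, TransactionId *subxacts, uint32 xinfo)
{
	/* ... what the exported function does today, plus the xinfo checks ... */
}

/* existing exported function keeps its signature, so no ABI break */
void
SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
				   int nsubxacts, TransactionId *subxacts)
{
	SnapBuildCommitTxn_ext(builder, lsn, xid, nsubxacts, subxacts, 0);
}

/* new entry point so DecodeCommit() can pass the parsed xinfo down */
void
SnapBuildCommitTxnWithXInfo(SnapBuild *builder, XLogRecPtr lsn,
							TransactionId xid, int nsubxacts,
							TransactionId *subxacts, uint32 xinfo)
{
	SnapBuildCommitTxn_ext(builder, lsn, xid, nsubxacts, subxacts, xinfo);
}

Existing callers of SnapBuildCommitTxn() keep working unchanged, while the decoder can switch to the WithXInfo variant on the back branches.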
Re: Memory leak fix in psql
On Wed, Jul 20, 2022 at 03:14:35AM +, tanghy.f...@fujitsu.com wrote: > Your fix LGTM so please allow me to merge it in the attached patch. > Based on your rebased version, now this new patch version is V3. What about the argument of upthread where we could use a goto in functions where there are multiple pattern validation checks? Per se v4 attached. -- Michael From ccc8bb83baf618ea1a481193403a9009cd4a24a5 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 20 Jul 2022 12:47:52 +0900 Subject: [PATCH v4] fix the memory leak in psql describe --- src/bin/psql/describe.c | 239 +++- 1 file changed, 212 insertions(+), 27 deletions(-) diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 88d92a08ae..1c40cc6d59 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -112,7 +112,10 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem) "n.nspname", "p.proname", NULL, "pg_catalog.pg_function_is_visible(p.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 4;"); @@ -182,7 +185,10 @@ describeAccessMethods(const char *pattern, bool verbose) NULL, "amname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -244,7 +250,10 @@ describeTablespaces(const char *pattern, bool verbose) NULL, "spcname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); @@ -534,7 +543,9 @@ describeFunctions(const char *functypes, const char *func_pattern, "n.nspname", "p.proname", NULL, "pg_catalog.pg_function_is_visible(p.oid)", NULL, 3)) - return false; + { + goto error_return; + } for (int i = 0; i < num_arg_patterns; i++) { @@ -561,7 +572,9 @@ describeFunctions(const char *functypes, const char *func_pattern, true, false, nspname, typname, ft, tiv, NULL, 3)) -return false; + { +goto error_return; + } } else { @@ -599,6 +612,10 @@ describeFunctions(const char *functypes, const char *func_pattern, PQclear(res); return true; + +error_return: + termPQExpBuffer(&buf); + return false; } @@ -682,7 +699,10 @@ describeTypes(const char *pattern, bool verbose, bool showSystem) "pg_catalog.format_type(t.oid, NULL)", "pg_catalog.pg_type_is_visible(t.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); @@ -836,7 +856,9 @@ describeOperators(const char *oper_pattern, "n.nspname", "o.oprname", NULL, "pg_catalog.pg_operator_is_visible(o.oid)", NULL, 3)) - return false; + { + goto error_return; + } if (num_arg_patterns == 1) appendPQExpBufferStr(&buf, " AND o.oprleft = 0\n"); @@ -866,7 +888,9 @@ describeOperators(const char *oper_pattern, true, false, nspname, typname, ft, tiv, NULL, 3)) -return false; + { +goto error_return; + } } else { @@ -890,6 +914,10 @@ describeOperators(const char *oper_pattern, PQclear(res); return true; + +error_return: + termPQExpBuffer(&buf); + return false; } @@ -953,7 +981,10 @@ listAllDbs(const char *pattern, bool verbose) if (!validateSQLNamePattern(&buf, pattern, false, false, NULL, "d.datname", NULL, NULL, NULL, 1)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1;"); res = PSQLexec(buf.data); @@ -1106,7 +1137,10 @@ permissionsList(const char *pattern) "n.nspname", "c.relname", NULL, "n.nspname !~ '^pg_' AND pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) + { + termPQExpBuffer(&buf); return false; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); @@ -1177,16 +1211,15 @@ 
listDefaultACLs(const char *pattern) "pg_catalog.pg_get_userbyid(d.defaclrole)", NULL, NULL, 3)) - return false; + { + goto error_return; + } appendPQExpBufferStr(&buf, "ORDER BY 1, 2, 3;"); res = PSQLexec(buf.data); if (!res) - { - termPQExpBuffer(&buf); - return false; - } + goto error_return; myopt.nullPrint = NULL; printfPQExpBuffer(&buf, _("Default access privileges")); @@ -1200,6 +1233,10 @@ listDefaultACLs(const char *pattern) termPQExpBuffer(&buf); PQclear(res); return true; + +error_return: + termPQExpBuffer(&buf); + return false; } @@ -1253,7 +1290,9 @@ objectDescription(const char *pattern, bool showSystem) false, "n.nspname", "pgc.conname", NULL, "pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) - return false; + { + goto error_return; + } /* Domain constraint descriptions */ appendPQExpBuffer(&buf, @@ -1277,7 +1316,9 @@ objectDescription(const char *pattern, bool show
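The structural point of the patch, shown standalone with hypothetical helper names (not psql's actual ones): with several validation steps that can fail, a single error_return label keeps the buffer release in one place instead of repeating it before every early return.

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* minimal stand-in for the PQExpBuffer machinery */
typedef struct { char *data; } buf_t;

static void buf_init(buf_t *buf) { buf->data = strdup(""); }
static void buf_term(buf_t *buf) { free(buf->data); buf->data = NULL; }

/* hypothetical validation step that can fail */
static bool validate(buf_t *buf, const char *pattern) { (void) buf; return pattern != NULL; }

bool
describe_something(const char *pat1, const char *pat2)
{
	buf_t	buf;

	buf_init(&buf);

	/* every failure path funnels through the one cleanup label below */
	if (!validate(&buf, pat1))
		goto error_return;
	if (!validate(&buf, pat2))
		goto error_return;

	buf_term(&buf);
	return true;

error_return:
	buf_term(&buf);
	return false;
}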
Re: Is select_outer_pathkeys_for_merge() too strict now we have Incremental Sorts?
On Wed, 20 Jul 2022 at 15:02, David Rowley wrote: > I've attached a patch for this and it changes the plan for the above query to: Looks like I based that patch on the wrong branch. Here's another version of the patch that's based on master. David diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index b5d6c977ee..5f0ead2db8 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -1890,11 +1890,13 @@ find_mergeclauses_for_outer_pathkeys(PlannerInfo *root, * Since we assume here that a sort is required, there is no particular use * in matching any available ordering of the outerrel. (joinpath.c has an * entirely separate code path for considering sort-free mergejoins.) Rather, - * it's interesting to try to match the requested query_pathkeys so that a - * second output sort may be avoided; and failing that, we try to list "more - * popular" keys (those with the most unmatched EquivalenceClass peers) - * earlier, in hopes of making the resulting ordering useful for as many - * higher-level mergejoins as possible. + * it's interesting to try to match, or match a prefix of the requested + * query_pathkeys so that a second output sort may be avoided or an + * incremental sort may be done instead. We can get away with just a prefix + * of the query_pathkeys when that prefix covers the entire join condition. + * Failing that, we try to list "more popular" keys (those with the most + * unmatched EquivalenceClass peers) earlier, in hopes of making the resulting + * ordering useful for as many higher-level mergejoins as possible. */ List * select_outer_pathkeys_for_merge(PlannerInfo *root, @@ -1964,11 +1966,16 @@ select_outer_pathkeys_for_merge(PlannerInfo *root, /* * Find out if we have all the ECs mentioned in query_pathkeys; if so we -* can generate a sort order that's also useful for final output. There is -* no percentage in a partial match, though, so we have to have 'em all. +* can generate a sort order that's also useful for final output. If we +* only have a prefix of the query_pathkeys, and that prefix is the entire +* join condition, then it's useful to use the prefix as the pathkeys as +* this increases the chances that an incremental sort will be able to be +* used by the upper planner. */ if (root->query_pathkeys) { + int matches = 0; + foreach(lc, root->query_pathkeys) { PathKey*query_pathkey = (PathKey *) lfirst(lc); @@ -1981,6 +1988,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root, } if (j >= necs) break; /* didn't find match */ + + matches++; } /* if we got to the end of the list, we have them all */ if (lc == NULL) @@ -2003,6 +2012,23 @@ select_outer_pathkeys_for_merge(PlannerInfo *root, } } } + + /* +* If we didn't find a full match, but matched all of the join clauses +* then we'll make use of these as partially sorted input is better +* than nothing for the upper planner as it may lead to incremental +* sorts instead of full sorts. 
+*/ + else if (matches == nClauses) + { + pathkeys = list_copy_head(root->query_pathkeys, matches); + + /* we have all of the join pathkeys, so nothing more to do */ + pfree(ecs); + pfree(scores); + + return pathkeys; + } } /* diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index e1d9d971d6..e83c552159 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -2437,6 +2437,37 @@ select count(*) from 1 (1 row) +set enable_hashjoin = 0; +set enable_nestloop = 0; +set enable_hashagg = 0; +-- +-- check that we use the pathkeys from a prefix of the group by / order by +-- clause for the join pathkeys when that prefix covers all join quals. We +-- expect this to lead to an incremental sort for the group by / order by. +-- +explain (costs off) +select x.thousand, x.twothousand, count(*) +from tenk1 x inner join tenk1 y on x.thousand = y.thousand +group by x.thousand, x.twothousand +order by x.thousand desc, x.twothousand; +QUERY PLAN +-- + GroupAggregate + Group Key: x.thousand, x.twothousand + -> Incremental Sort + Sort Key: x.thousand DESC
Re: Memory leak fix in psql
On Wed, 20 Jul 2022 at 11:51, Michael Paquier wrote: > On Wed, Jul 20, 2022 at 03:14:35AM +, tanghy.f...@fujitsu.com wrote: >> Your fix LGTM so please allow me to merge it in the attached patch. >> Based on your rebased version, now this new patch version is V3. > > What about the argument of upthread where we could use a goto in > functions where there are multiple pattern validation checks? Per se > v4 attached. +1. LGTM. -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.