Re: Remove a unused argument from qual_is_pushdown_safe
On Mon, Nov 28, 2022 at 3:40 PM Michael Paquier wrote: > > On Fri, Nov 25, 2022 at 04:05:13PM +0800, Richard Guo wrote: > >> I wonder if we need to revise the comment atop qual_is_pushdown_safe() > >> too which says > >> > >> * rinfo is a restriction clause applying to the given subquery (whose > RTE > >> * has index rti in the parent query). > >> > >> since there is no 'given subquery' after we remove it from the params. > > I was thinking about this point, and it seems to me that we could just > do s/the given subquery/a subquery/. But perhaps you have a different > view on the matter? I think the new wording is good. Thanks for the change. Thanks Richard
RE: Avoid streaming the transaction which are skipped (in corner cases)
On Sun, Nov 27, 2022 1:33 PM Dilip Kumar wrote: > > On Sat, Nov 26, 2022 at 12:15 PM Amit Kapila > wrote: > > > > On Fri, Nov 25, 2022 at 5:38 PM Amit Kapila > wrote: > > > > > > On Fri, Nov 25, 2022 at 1:35 PM Dilip Kumar > wrote: > > > > > > > > During DecodeCommit() for skipping a transaction we use ReadRecPtr to > > > > check whether to skip this transaction or not. Whereas in > > > > ReorderBufferCanStartStreaming() we use EndRecPtr to check whether > to > > > > stream or not. Generally it will not create a problem but if the > > > > commit record itself is adding some changes to the transaction(e.g. > > > > snapshot) and if the "start_decoding_at" is in between ReadRecPtr and > > > > EndRecPtr then streaming will decide to stream the transaction where > > > > as DecodeCommit will decide to skip it. And for handling this case in > > > > ReorderBufferForget() we call stream_abort(). > > > > > > > > > > The other cases are probably where we don't have FilterByOrigin or > > > dbid check, for example, > XLOG_HEAP2_NEW_CID/XLOG_XACT_INVALIDATIONS. > > > We anyway actually don't send anything for such cases except empty > > > start/stop messages. Can we add some flag to txn which says that there > > > is at least one change like DML that we want to stream? > > > > > > > We can probably think of using txn_flags for this purpose. > > In the attached patch I have used txn_flags to identify whether it has > any streamable change or not and the transaction will not be selected > for streaming unless it has at least one streamable change. > Thanks for your patch. I saw that the patch added a check when selecting largest transaction, but in addition to ReorderBufferCheckMemoryLimit(), the transaction can also be streamed in ReorderBufferProcessPartialChange(). Should we add the check in this function, too? 
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 9a58c4bfb9..108737b02f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -768,7 +768,8 @@ ReorderBufferProcessPartialChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (ReorderBufferCanStartStreaming(rb) &&
 		!(rbtxn_has_partial_change(toptxn)) &&
-		rbtxn_is_serialized(txn))
+		rbtxn_is_serialized(txn) &&
+		rbtxn_has_streamable_change(txn))
 		ReorderBufferStreamTXN(rb, toptxn);
 }

Regards,
Shi yu
Re: Logical Replication Custom Column Expression
On Fri, Nov 25, 2022 at 9:43 PM Stavros Koureas wrote: > > Yes, if the property is on the subscription side then it should be applied > for all the tables that the connected publication is exposing. > So if the property is enabled you should be sure that this origin column > exists to all of the tables that the publication is exposing... > > Sure this is the complete idea, that the subscriber should match the PK of > origin, > As the subscription table will contain same key values from different > origins, for example: > > For publisher1 database table > id pk integer | value character varying > 1 | testA1 > 2 | testA2 > > For publisher2 database table > id pk integer | value character varying > 1 | testB1 > 2 | testB2 > > For subscriber database table > origin pk character varying | id pk integer | value character varying > publisher1 | 1 | testA1 > publisher1 | 2 | testA2 > publisher2 | 1 | testB1 > publisher2 | 2 | testB2 > > All statements INSERT, UPDATE, DELETE should always include the predicate of > the origin. > This sounds similar to what I had posted [1] although I was saying the generated column value might be the *subscriber* name, not the origin publisher name. (where are you getting that value from -- somehow from the subscriptions' CONNECTION dbname?) Anyway, regardless of the details, please note -- my idea was really intended just as a discussion starting point to demonstrate that required functionality might be achieved using a simpler syntax than what had been previously suggested. But in practice there may be some problems with this approach -- e.g. how will the initial tablesync COPY efficiently assign these subscriber name column values? -- [1] https://www.postgresql.org/message-id/CAHut%2BPuZowXd7Aa7t0nqjP6afHMwJarngzeMq%2BQP0vE2KKLOgQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
Failed Assert while pgstat_unlink_relation
Hi,

While reviewing/testing one of the patches I found the following Assert:
#0 0x55c6312ba639 in pgstat_unlink_relation (rel=0x7fb73bcbac58) at pgstat_relation.c:161
#1 0x55c6312bbb5a in pgstat_relation_delete_pending_cb (entry_ref=0x55c6335563a8) at pgstat_relation.c:839
#2 0x55c6312b72bc in pgstat_delete_pending_entry (entry_ref=0x55c6335563a8) at pgstat.c:1124
#3 0x55c6312be3f1 in pgstat_release_entry_ref (key=..., entry_ref=0x55c6335563a8, discard_pending=true) at pgstat_shmem.c:523
#4 0x55c6312bee9a in pgstat_drop_entry (kind=PGSTAT_KIND_RELATION, dboid=5, objoid=40960) at pgstat_shmem.c:867
#5 0x55c6312c034a in AtEOXact_PgStat_DroppedStats (xact_state=0x55c6334baac8, isCommit=false) at pgstat_xact.c:97
#6 0x55c6312c0240 in AtEOXact_PgStat (isCommit=false, parallel=false) at pgstat_xact.c:55
#7 0x55c630df8bee in AbortTransaction () at xact.c:2861
#8 0x55c630df94fd in AbortCurrentTransaction () at xact.c:3352

I could reproduce this issue with the following steps:
create table t1(c1 int);
BEGIN;
CREATE TABLE t (c1 int);
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;

Regards,
Vignesh
Re: Understanding, testing and improving our Windows filesystem code
On Mon, Nov 28, 2022 at 8:58 PM Ian Lawrence Barwick wrote: > For my understanding, does this entry supersede the proposal in > https://commitfest.postgresql.org/40/3347/ ? I think so (Victor hasn't commented). Patch 0004 derives from Victor's patch and has him as primary author still, but I made some changes: * remove obsolete version check code * provide fallback code for systems where it doesn't work (after some research to determine that there are such systems, and what they do) * test that it's really more POSIX-like and demonstrate what that means (building on 0003) Patch 0003 is a set of file system semantics tests that work on Unix, but also exercise those src/port/*.c wrappers on Windows and show differences from Unix semantics. Some of these tests also verify various bugfixes already committed, so they've been pretty useful to me already even though they aren't in the tree yet. Patches 0001 and 0002 are generic, unrelated to this Windows stuff, and provide a simple way to write unit tests for small bits of C code without a whole PostgreSQL server. That's something that has been proposed in the abstract many times before by many people. Here I've tried to be minimalist about it, just what I needed for the higher-numbered patches, building on existing technologies (TAP).
Re: Report roles in pg_upgrade pg_ prefix check
> On 28 Nov 2022, at 02:18, Michael Paquier wrote:
>
> On Thu, Nov 24, 2022 at 12:31:09PM +0100, Daniel Gustafsson wrote:
>> Looking at a recent pg_upgrade thread I happened to notice that the check for
>> roles with a pg_ prefix only reports the error, not the roles it found. Other
>> similar checks where the user is expected to alter the old cluster typically
>> report the found objects in a textfile. The attached adds reporting to make
>> that class of checks consistent (the check for prepared transactions, which
>> also isn't reporting, is different IMO as it doesn't expect ALTER commands).
>>
>> As this check is only executed against the old cluster, the patch removes the
>> check when printing the error.
>
> +1. A backpatch would be nice, though not strictly mandatory as
> that's not a bug fix.

Yeah, it doesn't really qualify since this is not a bugfix.

> + ntups = PQntuples(res);
> + i_rolname = PQfnumber(res, "rolname");
>
> Would it be worth adding the OID on top of the role name in the
> generated report? That would be a free meal.

We are a bit inconsistent in how much detail we include in the report textfiles, so we could do that without breaking any consistency in reporting. Looking at other checks, the below format would match what we already do fairly well:

(oid=)

Done in the attached.

--
Daniel Gustafsson
https://vmware.com/

v2-0001-Report-incompatible-roles-in-pg_upgrade-checking.patch
Description: Binary data
Failed Assert in pgstat_assoc_relation
Hi,

While reviewing/testing one of the patches I found the following Assert:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139624429171648) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139624429171648) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139624429171648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x7efcda6e3476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x7efcda6c97f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x5590bf283139 in ExceptionalCondition (conditionName=0x5590bf468170 "rel->pgstat_info->relation == NULL", fileName=0x5590bf46812b "pgstat_relation.c", lineNumber=143) at assert.c:66
#6 0x5590bf0ce5f8 in pgstat_assoc_relation (rel=0x7efcce996a48) at pgstat_relation.c:143
#7 0x5590beb83046 in initscan (scan=0x5590bfbf4af8, key=0x0, keep_startblock=false) at heapam.c:343
#8 0x5590beb8466f in heap_beginscan (relation=0x7efcce996a48, snapshot=0x5590bfb5a520, nkeys=0, key=0x0, parallel_scan=0x0, flags=449) at heapam.c:1223
#9 0x5590bf02af39 in table_beginscan (rel=0x7efcce996a48, snapshot=0x5590bfb5a520, nkeys=0, key=0x0) at ../../../src/include/access/tableam.h:891
#10 0x5590bf02bf8a in DefineQueryRewrite (rulename=0x5590bfb281d0 "_RETURN", event_relid=16387, event_qual=0x0, event_type=CMD_SELECT, is_instead=true, replace=false, action=0x5590bfbf4aa8) at rewriteDefine.c:447
#11 0x5590bf02b5ab in DefineRule (stmt=0x5590bfb285c0, queryString=0x5590bfb277a8 "CREATE RULE \"_RETURN\" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;") at rewriteDefine.c:213

I could reproduce this issue with the following steps:
create table t1(c int);
BEGIN;
CREATE TABLE t (c int);
SAVEPOINT q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
select * from t;
ROLLBACK TO q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
ROLLBACK;

Regards,
Vignesh
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Mon, Nov 28, 2022 at 1:46 PM shiy.f...@fujitsu.com wrote: > > Thanks for your patch. > > I saw that the patch added a check when selecting largest transaction, but in > addition to ReorderBufferCheckMemoryLimit(), the transaction can also be > streamed in ReorderBufferProcessPartialChange(). Should we add the check in > this function, too? > > diff --git a/src/backend/replication/logical/reorderbuffer.c > b/src/backend/replication/logical/reorderbuffer.c > index 9a58c4bfb9..108737b02f 100644 > --- a/src/backend/replication/logical/reorderbuffer.c > +++ b/src/backend/replication/logical/reorderbuffer.c > @@ -768,7 +768,8 @@ ReorderBufferProcessPartialChange(ReorderBuffer *rb, > ReorderBufferTXN *txn, > */ > if (ReorderBufferCanStartStreaming(rb) && > !(rbtxn_has_partial_change(toptxn)) && > - rbtxn_is_serialized(txn)) > + rbtxn_is_serialized(txn) && > + rbtxn_has_streamable_change(txn)) > ReorderBufferStreamTXN(rb, toptxn); > } You are right we need this in ReorderBufferProcessPartialChange() as well. I will fix this in the next version. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Bug in row_number() optimization
On 28.11.2022 03:23, David Rowley wrote:
> On Sat, 26 Nov 2022 at 05:19, Tom Lane wrote:
>> Sergey Shinderuk writes:
>>> What about user-defined operators? I created my own <= operator for
>>> int8 which returns true on null input, and put it in a btree operator
>>> class. Admittedly, it's weird that (null <= 1) evaluates to true. But
>>> does it violate the contract of the btree operator class or something?
>>> Didn't find a clear answer in the docs.
>>
>> It's pretty unlikely that this would work during an actual index scan.
>> I'm fairly sure that btree (and other index AMs) have hard-wired
>> assumptions that index operators are strict.
>
> If we're worried about that then we could just restrict this
> optimization to only work with strict quals.

Not sure this is necessary if btree operators must be strict anyway.

> The proposal to copy the datums into the query context does not seem to
> me to be a good idea. If there are a large number of partitions then it
> sounds like we'll leak lots of memory. We could invent some partition
> context that we reset after each partition, but that's probably more
> complexity than it would be worth.

Ah, good point.

> I've attached a draft patch to move the code to nullify the aggregate
> results so that's only done once per partition and adjusted the planner
> to limit this to strict quals.

Not quite sure that we don't need to do anything for the
WINDOWAGG_PASSTHROUGH_STRICT case. Although we won't return any more
tuples for the current partition, we still call ExecProject with
dangling pointers. Is it okay?

+ if (!func_strict(opexpr->opfuncid))
+ return false;

Should return true instead?

--
Sergey Shinderuk
https://postgrespro.com/
[PATCH] Check snapshot argument of index_beginscan and family
Hi hackers,

A colleague of mine (cc'ed) reported that he was able to pass a NULL snapshot to index_beginscan() and it even worked to a certain degree.

I took my toy extension [1] and replaced the argument with NULL as an experiment:

```
eax=# CREATE EXTENSION experiment;
CREATE EXTENSION
eax=# SELECT phonebook_lookup_index('Alice');
 phonebook_lookup_index
------------------------
                     -1
(1 row)

eax=# SELECT phonebook_insert('Bob', 456);
 phonebook_insert
------------------
                1
(1 row)

eax=# SELECT phonebook_lookup_index('Alice');
 phonebook_lookup_index
------------------------
                     -1
(1 row)

eax=# SELECT phonebook_insert('Alice', 123);
 phonebook_insert
------------------
                2
(1 row)

eax=# SELECT phonebook_lookup_index('Alice');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
```

So evidently it really works as long as the index doesn't find any matching rows.

This could be really confusing for the extension authors so here is a patch that adds corresponding Asserts().

[1]: https://github.com/afiskon/postgresql-extensions/tree/main/005-table-access

--
Best regards,
Aleksander Alekseev

v1-0001-Check-snapshot-argument-of-index_beginscan-and-fa.patch
Description: Binary data
Re: [PATCH] Check snapshot argument of index_beginscan and family
Hi, Alexander!

> A colleague of mine (cc'ed) reported that he was able to pass a NULL
> snapshot to index_beginscan() and it even worked to a certain degree.
>
> I took my toy extension [1] and replaced the argument with NULL as an
> experiment:
>
> ```
> eax=# CREATE EXTENSION experiment;
> CREATE EXTENSION
> eax=# SELECT phonebook_lookup_index('Alice');
>  phonebook_lookup_index
> ------------------------
>                      -1
> (1 row)
>
> eax=# SELECT phonebook_insert('Bob', 456);
>  phonebook_insert
> ------------------
>                 1
> (1 row)
>
> eax=# SELECT phonebook_lookup_index('Alice');
>  phonebook_lookup_index
> ------------------------
>                      -1
> (1 row)
>
> eax=# SELECT phonebook_insert('Alice', 123);
>  phonebook_insert
> ------------------
>                 2
> (1 row)
>
> eax=# SELECT phonebook_lookup_index('Alice');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> ```
>
> So evidently it really works as long as the index doesn't find any
> matching rows.
>
> This could be really confusing for the extension authors so here is a
> patch that adds corresponding Asserts().
>
> [1]: https://github.com/afiskon/postgresql-extensions/tree/main/005-table-access

I think it's a nice catch and worth fixing. The one thing I don't agree with is using asserts to handle the error that can appear, because most probably the server is built with assertions off, and in that case there will still be a crash. I'd do this with report ERROR. Otherwise, the patch looks right and worth committing.

Kind regards,
Pavel Borisov.
Re: Failed Assert in pgstat_assoc_relation
On Mon, Nov 28, 2022 at 8:10 PM vignesh C wrote: > > Hi, > > While reviewing/testing one of the patches I found the following Assert: > #0 __pthread_kill_implementation (no_tid=0, signo=6, > threadid=139624429171648) at ./nptl/pthread_kill.c:44 > #1 __pthread_kill_internal (signo=6, threadid=139624429171648) at > ./nptl/pthread_kill.c:78 > #2 __GI___pthread_kill (threadid=139624429171648, > signo=signo@entry=6) at ./nptl/pthread_kill.c:89 > #3 0x7efcda6e3476 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/posix/raise.c:26 > #4 0x7efcda6c97f3 in __GI_abort () at ./stdlib/abort.c:79 > #5 0x5590bf283139 in ExceptionalCondition > (conditionName=0x5590bf468170 "rel->pgstat_info->relation == NULL", > fileName=0x5590bf46812b "pgstat_relation.c", lineNumber=143) at > assert.c:66 > #6 0x5590bf0ce5f8 in pgstat_assoc_relation (rel=0x7efcce996a48) > at pgstat_relation.c:143 > #7 0x5590beb83046 in initscan (scan=0x5590bfbf4af8, key=0x0, > keep_startblock=false) at heapam.c:343 > #8 0x5590beb8466f in heap_beginscan (relation=0x7efcce996a48, > snapshot=0x5590bfb5a520, nkeys=0, key=0x0, parallel_scan=0x0, > flags=449) at heapam.c:1223 > #9 0x5590bf02af39 in table_beginscan (rel=0x7efcce996a48, > snapshot=0x5590bfb5a520, nkeys=0, key=0x0) at > ../../../src/include/access/tableam.h:891 > #10 0x5590bf02bf8a in DefineQueryRewrite (rulename=0x5590bfb281d0 > "_RETURN", event_relid=16387, event_qual=0x0, event_type=CMD_SELECT, > is_instead=true, replace=false, action=0x5590bfbf4aa8) > at rewriteDefine.c:447 > #11 0x5590bf02b5ab in DefineRule (stmt=0x5590bfb285c0, > queryString=0x5590bfb277a8 "CREATE RULE \"_RETURN\" AS ON SELECT TO t > DO INSTEAD SELECT * FROM t1;") at rewriteDefine.c:213 > > I could reproduce this issue with the following steps: > create table t1(c int); > BEGIN; > CREATE TABLE t (c int); > SAVEPOINT q; > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > select * from t; > ROLLBACK TO q; > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * 
FROM t1;
> ROLLBACK;
>
> Regards,
> Vignesh

I think what is happening here is that the previous relation is not unlinked when pgstat_init_relation() is called, because the relation is now a view, and for relations without storage the relation is not unlinked in pgstat_init_relation():

void
pgstat_init_relation(Relation rel)
{
	char		relkind = rel->rd_rel->relkind;

	/*
	 * We only count stats for relations with storage and partitioned tables
	 */
	if (!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE)
	{
		rel->pgstat_enabled = false;
		rel->pgstat_info = NULL;
		return;
	}

There is logic in DefineQueryRewrite() which converts a relation to a view when you create such a rule, as the test case does. So initially the relation had storage and pgstat_info was linked; then the table was converted to a view, but on init the previous relation was not unlinked, and when it later tries to link a new relation, the assert fails, saying a previous relation is already linked to pgstat_info.

I have made a small patch with a fix, but I am not sure if this is the right way to fix this.

regards,
Ajin Cherian
Fujitsu Australia

pgstat_assoc_fix_for_views.patch
Description: Binary data
Re: Another multi-row VALUES bug
On Wed, 23 Nov 2022 at 18:56, Tom Lane wrote: > > I wonder if somehow we could just make one pass over > all the VALUES RTEs, and process each one as needed? The problem > is to identify the relevant target relation, I guess. > I have been thinking about that some more, but I think it would be pretty difficult to achieve. Part of the problem is that the targetlist processing and VALUES RTE processing are quite closely coupled (because of things like GENERATED ALWAYS columns). Both rewriteTargetListIU() and rewriteValuesRTE() rely on being passed the VALUES RTE that the targetlist is reading from, and rewriteValuesRTE() then relies on extra information returned by rewriteTargetListIU(). Also, there's the way that DEFAULTs from updatable views work, which means that the DEFAULTs in a VALUES RTE won't necessarily all come from the same target relation. So I think it would be much harder to do the VALUES RTE processing anywhere other than where it's being done right now, and even if it could be done elsewhere, it would be a very invasive change, and therefore hard to back-patch. That, of course, leaves the problem of identifying the right VALUES RTE to process. A different way to do this, without relying on the contents of the targetlist, is to note that, while processing a product query, what we really want to do is ignore any VALUES RTEs from the original query, since they will have already been processed. There should then never be more than one VALUES RTE left to process -- the one from the rule action. This can be done by exploiting the fact that in product queries, the rtable always consists of the rtable from the original query followed by the rtable from the rule action, so we just need to ignore the right number of RTEs at the start of the rtable. 
Of course that would break if we ever changed the way rewriteRuleAction() worked, but at least it only depends on that one other place in the code, which has been stable for a long time, so the risk of future breakage seems managable. Regards, Dean diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c new file mode 100644 index fb0c687..1e3efbb --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -418,6 +418,10 @@ rewriteRuleAction(Query *parsetree, * NOTE: because planner will destructively alter rtable, we must ensure * that rule action's rtable is separate and shares no substructure with * the main rtable. Hence do a deep copy here. + * + * Note also that RewriteQuery() relies on the fact that RT entries from + * the original query appear at the start of the expanded rtable, so + * beware of changing this. */ sub_action->rtable = list_concat(copyObject(parsetree->rtable), sub_action->rtable); @@ -3622,9 +3626,13 @@ rewriteTargetView(Query *parsetree, Rela * * rewrite_events is a list of open query-rewrite actions, so we can detect * infinite recursion. + * + * orig_rt_length is the length of the originating query's rtable, for product + * queries created by fireRules(), and 0 otherwise. This is used to skip any + * already-processed VALUES RTEs from the original query. 
*/ static List * -RewriteQuery(Query *parsetree, List *rewrite_events) +RewriteQuery(Query *parsetree, List *rewrite_events, int orig_rt_length) { CmdType event = parsetree->commandType; bool instead = false; @@ -3648,7 +3656,7 @@ RewriteQuery(Query *parsetree, List *rew if (ctequery->commandType == CMD_SELECT) continue; - newstuff = RewriteQuery(ctequery, rewrite_events); + newstuff = RewriteQuery(ctequery, rewrite_events, 0); /* * Currently we can only handle unconditional, single-statement DO @@ -3722,6 +3730,7 @@ RewriteQuery(Query *parsetree, List *rew RangeTblEntry *rt_entry; Relation rt_entry_relation; List *locks; + int product_orig_rt_length; List *product_queries; bool hasUpdate = false; int values_rte_index = 0; @@ -3743,23 +3752,30 @@ RewriteQuery(Query *parsetree, List *rew */ if (event == CMD_INSERT) { + ListCell *lc2; RangeTblEntry *values_rte = NULL; /* - * If it's an INSERT ... VALUES (...), (...), ... there will be a - * single RTE for the VALUES targetlists. + * Test if it's a multi-row INSERT ... VALUES (...), (...), ... by + * looking for a VALUES RTE in the fromlist. For product queries, + * we must ignore any already-processed VALUES RTEs from the + * original query. These appear at the start of the rangetable. */ - if (list_length(parsetree->jointree->fromlist) == 1) + foreach(lc2, parsetree->jointree->fromlist) { -RangeTblRef *rtr = (RangeTblRef *) linitial(parsetree->jointree->fromlist); +RangeTblRef *rtr = (RangeTblRef *) lfirst(lc2); -if (IsA(rtr, RangeTblRef)) +if (IsA(rtr, RangeTblRef) && rtr->rtindex > orig_rt_length) { RangeTblEntry *rte = rt_fetch(rtr->rtindex, parsetree->rtable);
Re: [PATCH] Check snapshot argument of index_beginscan and family
Hi Pavel,

Thanks for the feedback!

> I think it's a nice catch and worth fixing. The one thing I don't
> agree with is using asserts for handling the error that can appear
> because most probably the server is built with assertions off and in
> this case, there still will be a crash in this case. I'd do this with
> report ERROR. Otherwise, the patch looks right and worth committing.

Normally a developer is not supposed to pass NULLs there, so I figured having extra if's in the release builds isn't worth it. Personally I wouldn't mind using ereport(), but my intuition tells me that the committers are not going to approve this :)

Let's see what the rest of the community thinks.

--
Best regards,
Aleksander Alekseev
Re: Introduce a new view for checkpointer related stats
On Sat, Nov 26, 2022 at 4:32 AM Andres Freund wrote: > Thanks Andres for reviewing. > > May I know what it means to deprecate pg_stat_bgwriter columns? > > Add a note to the docs saying that the columns will be removed. Done. > > Are > > you suggesting to add deprecation warnings to corresponding functions > > pg_stat_get_bgwriter_buf_written_clean(), > > pg_stat_get_bgwriter_maxwritten_clean(), pg_stat_get_buf_alloc() and > > pg_stat_get_bgwriter_stat_reset_time() and in the docs? > > I'm thinking of the checkpoint related columns in pg_stat_bgwriter. Added note in the docs alongside each deprecated pg_stat_bgwriter's checkpoint related column. > If we move, rather than duplicate, the pg_stat_bgwriter columns to > pg_stat_checkpointer, everyone will have to update their monitoring scripts > when upgrading and will need to add version dependency if they monitor > multiple versions. If we instead keep the duplicated columns in > pg_stat_bgwriter for 5 years, users can reloy on pg_stat_checkpointer in all > supported versions. Agree. However, it's a bit difficult to keep track of deprecated things and come back after 5 years to clean them up unless "some" postgres hacker/developer/user notices it again. Perhaps, adding a separate section, say 'Deprecated and To-Be-Removed, in https://wiki.postgresql.org/wiki/Main_Page is a good idea. > To be clear, it isn't a very heavy burden for users to make these > adjustments. But if it only costs us a few lines to keep the old columns for a > bit, that seems worth it. Yes. I'm attaching the v3 patch for further review. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From ec09bfcde836cb23f090fd6088b89f0fb9a68000 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Mon, 28 Nov 2022 10:07:55 + Subject: [PATCH v3] Introduce a new view for checkpointer related stats pg_stat_bgwriter view currently reports checkpointer stats as well. 
It is that way because historically checkpointer was part of bgwriter until the commits 806a2ae and bf405ba, that went into PG 9.2, separated them out. It is time for us to separate checkpointer stats to its own view called pg_stat_checkpointer. For now, we deprecate the use of these checkpointer related fields under pg_stat_bgwriter view for smoother transitioning. Eventually, we need to remove them from pg_stat_bgwriter view. Bump catalog version. Author: Bharath Rupireddy Reviewed-by: Andres Freund, Bertrand Drouvot Discussion: https://www.postgresql.org/message-id/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com --- doc/src/sgml/monitoring.sgml | 165 +- src/backend/catalog/system_views.sql | 17 +- .../utils/activity/pgstat_checkpointer.c | 1 + src/backend/utils/adt/pgstatfuncs.c | 19 +- src/include/catalog/pg_proc.dat | 22 ++- src/include/pgstat.h | 1 + src/test/recovery/t/029_stats_restart.pl | 6 +- src/test/regress/expected/rules.out | 14 +- src/test/regress/expected/stats.out | 21 ++- src/test/regress/sql/stats.sql| 12 +- 10 files changed, 236 insertions(+), 42 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 5579b8b9e0..c69d3b21c0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -448,6 +448,15 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_checkpointerpg_stat_checkpointer + One row only, showing statistics about the + checkpointer process's activity. See + + pg_stat_checkpointer for details. + + + pg_stat_walpg_stat_wal One row only, showing statistics about WAL activity. See @@ -3654,7 +3663,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i checkpoints_timed bigint - Number of scheduled checkpoints that have been performed + Number of scheduled checkpoints that have been performed. 
Use of this + field under pg_stat_bgwriter view has been + deprecated, because the field actually shows checkpointer process + activity and is moved to a new view called + pg_stat_checkpointer. @@ -3663,7 +3676,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i checkpoints_req bigint - Number of requested checkpoints that have been performed + Number of requested checkpoints that have been performed. Use of this + field under pg_stat_bgwriter view has been + deprecated, because the field actually shows checkpointer process + activity and is moved to a new view called + pg_stat_checkpointer. @@ -3673,7 +3690,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_a
Re: Perform streaming logical transactions by background workers and parallel apply
On Sun, Nov 27, 2022 at 9:43 AM houzj.f...@fujitsu.com wrote: > > Attach the new version patch which addressed all comments so far. > Few comments on v52-0001* 1. pa_free_worker() { ... + /* Free the worker information if the worker exited cleanly. */ + if (!winfo->error_mq_handle) + { + pa_free_worker_info(winfo); + + if (winfo->in_use && + !hash_search(ParallelApplyWorkersHash, &xid, HASH_REMOVE, NULL)) + elog(ERROR, "hash table corrupted"); pa_free_worker_info() pfrees the winfo, so how is it legal to winfo->in_use in the above check? Also, why is this check (!winfo->error_mq_handle) required in the first place in the patch? The worker exits cleanly only when the leader apply worker sends a SIGINT signal and in that case, we already detach from the error queue and clean up other worker information. 2. +HandleParallelApplyMessages(void) +{ ... ... + foreach(lc, ParallelApplyWorkersList) + { + shm_mq_result res; + Size nbytes; + void*data; + ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc); + + if (!winfo->error_mq_handle) + continue; Similar to the previous comment, it is not clear whether we need this check. If required, can we add a comment to indicate the case where it happens to be true? Note, there is a similar check for winfo->error_mq_handle in pa_wait_for_xact_state(). Please add some comments if that is required. 3. Why is there apply_worker_clean_exit() at the end of ParallelApplyWorkerMain()? Normally either the leader worker stops parallel apply, or parallel apply gets stopped because of a parameter change, or exits because of error, and in none of those cases it can hit this code path unless I am missing something. Additionally, I think in LogicalParallelApplyLoop, we will never receive zero-length messages so that is also wrong and should be converted to elog(ERROR,..). 4. 
I think in logicalrep_worker_detach(), we should detach from the shm error queue so that the parallel apply worker won't try to send a termination message back to the leader worker. 5. pa_send_data() { ... + if (startTime == 0) + startTime = GetCurrentTimestamp(); ... What is the use of getting the current timestamp before waitlatch logic, if it is not used before that? It seems that is for the time logic to look correct. We can probably reduce the 10s interval to 9s for that. In this function, we need to add some comments to indicate why the current logic is used, and also probably we can refer to the comments atop this file. 6. I think it will be better if we keep stream_apply_worker local to applyparallelworker.c by exposing functions to cache/resetting the required info. 7. Apart from the above, I have made a few changes in the comments and some miscellaneous cosmetic changes in the attached. Kindly include these in the next version unless you see a problem with any change. -- With Regards, Amit Kapila. changes_amit_v52_1.patch Description: Binary data
Re: Logical Replication Custom Column Expression
Sure, I understand, and I also do not have good knowledge of what else could be influenced by such a change. Using the subscriber name as the column value has no benefit for this idea of merging multiple upstreams with the same primary keys; later you mention the "connection dbname", which could be a possibility. I do not fully understand the part "how will the initial tablesync COPY efficiently assign these subscriber name column values?" Why is it difficult, during the initial sync, to put the same value on all rows from the same origin? On Mon, Nov 28, 2022 at 10:16 AM, Peter Smith wrote: > On Fri, Nov 25, 2022 at 9:43 PM Stavros Koureas > wrote: > > > > Yes, if the property is on the subscription side then it should be > applied for all the tables that the connected publication is exposing. > > So if the property is enabled you should be sure that this origin column > exists to all of the tables that the publication is exposing... > > > > Sure this is the complete idea, that the subscriber should match the PK > of origin, > > As the subscription table will contain same key values from different > origins, for example: > > > > For publisher1 database table > > id pk integer | value character varying > > 1 | testA1 > > 2 | testA2 > > > > For publisher2 database table > > id pk integer | value character varying > > 1 | testB1 > > 2 | testB2 > > > > For subscriber database table > > origin pk character varying | id pk integer | value character varying > > publisher1 | 1 | testA1 > > publisher1 | 2 | testA2 > > publisher2 | 1 | testB1 > > publisher2 | 2 | testB2 > > > > All statements INSERT, UPDATE, DELETE should always include the > predicate of the origin. > > > > This sounds similar to what I had posted [1] although I was saying the > generated column value might be the *subscriber* name, not the origin > publisher name. (where are you getting that value from -- somehow from > the subscriptions' CONNECTION dbname?) 
> > Anyway, regardless of the details, please note -- my idea was really > intended just as a discussion starting point to demonstrate that > required functionality might be achieved using a simpler syntax than > what had been previously suggested. But in practice there may be some > problems with this approach -- e.g. how will the initial tablesync > COPY efficiently assign these subscriber name column values? > > -- > [1] > https://www.postgresql.org/message-id/CAHut%2BPuZowXd7Aa7t0nqjP6afHMwJarngzeMq%2BQP0vE2KKLOgQ%40mail.gmail.com > > Kind Regards, > Peter Smith. > Fujitsu Australia. >
Re: Reducing power consumption on idle servers
On Sun, Nov 27, 2022 at 6:57 PM Thomas Munro wrote: > The main documentation of pg_promote() etc now has "The parameter > promote_trigger_file has been removed" in the > places where the GUC was previously mentioned. Perhaps we should just > remove the mentions completely (it's somehow either too much and not > enough information), but I'm OK with leaving those in for now until > some better place exists for historical notes. I think we should remove those mentions. Otherwise the documentation just collects mentions of an increasing number of things that are no longer relevant. And, as you say, we can't presume that future readers would know what promote_trigger_file even is. -- Robert Haas EDB: http://www.enterprisedb.com
Re: O(n) tasks cause lengthy startups and checkpoints
On Sun, 27 Nov 2022 at 23:34, Nathan Bossart wrote: > > Thanks for taking a look! > > On Thu, Nov 24, 2022 at 05:31:02PM +, Simon Riggs wrote: > > * not sure I believe that everything it does can always be aborted out > > of and shutdown - to achieve that you will need a > > CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least > > I did something like this earlier, but was advised to simply let the > functions finish as usual during shutdown [0]. I think this is what the > checkpointer process does today, anyway. If we say "The custodian is not an essential process and can shutdown quickly when requested.", and yet we know it's not true in all cases, then that will lead to misunderstandings and bugs. If we perform a restart and the custodian is performing extra work that delays shutdown, then it also delays restart. Given the title of the thread, we should be looking to improve that, or at least know it occurred. > > * not sure why you want immediate execution of custodian tasks - I > > feel supporting two modes will be a lot harder. For me, I would run > > locally when !IsUnderPostmaster and also in an Assert build, so we can > > test it works right - i.e. running in its own process is just a > > production optimization for performance (which is the stated reason > > for having this) > > I added this because 0004 involves requesting a task from the postmaster, > so checking for IsUnderPostmaster doesn't work. Those tasks would always > run immediately. However, we could use IsPostmasterEnvironment instead, > which would allow us to remove the "immediate" argument. I did it this way > in v14. Thanks > > 0005 seems good from what I know > > * There is no check to see if it worked in any sane time > > What did you have in mind? Should the custodian begin emitting WARNINGs > after a while? I think it might be useful if it logged anything that took an "extended period", TBD. Maybe that is already covered by startup process logging. 
Please tell me that still works? > > Rather than explicitly use DEBUG1 everywhere I would have an > > #define CUSTODIAN_LOG_LEVEL LOG > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > line change in a later phase of Beta > > I can create a separate patch for this, but I don't think I've ever seen > this sort of thing before. Much of recovery is coded that way, for the same reason. > Is the idea just to help with debugging during > the development phase? "Just", yes. Tests would be desirable also, under src/test/modules. -- Simon Riggs http://www.EnterpriseDB.com/
Re: Perform streaming logical transactions by background workers and parallel apply
On Mon, Nov 28, 2022 at 12:49 PM Peter Smith wrote: > ... > > 17. > @@ -388,10 +401,9 @@ static inline void cleanup_subxact_info(void); > /* > * Serialize and deserialize changes for a toplevel transaction. > */ > -static void stream_cleanup_files(Oid subid, TransactionId xid); > static void stream_open_file(Oid subid, TransactionId xid, > bool first_segment); > -static void stream_write_change(char action, StringInfo s); > +static void stream_write_message(TransactionId xid, char action, StringInfo > s); > static void stream_close_file(void); > > 17a. > > I felt just saying "file/files" is too vague. All the references to > the file should be consistent, so IMO everything would be better named > like: > > "stream_cleanup_files" -> "stream_msg_spoolfile_cleanup()" > "stream_open_file" -> "stream_msg_spoolfile_open()" > "stream_close_file" -> "stream_msg_spoolfile_close()" > "stream_write_message" -> "stream_msg_spoolfile_write_msg()" > > ~ > > 17b. > IMO there is not enough distinction here between function names > stream_write_message and stream_write_change. e.g. You cannot really > tell from their names what might be the difference. > > ~~~ > I think the only new function needed by this patch is stream_write_message, so I don't see why to change all the others for that. I see two possibilities to make the name better: (a) name the function stream_open_and_write_change, or (b) pass a new argument (boolean open) to stream_write_change ... > > src/include/replication/worker_internal.h > > 33. LeaderFileSetState > > +/* State of fileset in leader apply worker. */ > +typedef enum LeaderFileSetState > +{ > + LEADER_FILESET_UNKNOWN, > + LEADER_FILESET_BUSY, > + LEADER_FILESET_ACCESSIBLE > +} LeaderFileSetState; > > 33a. > > Missing from typedefs.list? > > ~ > > 33b. > > I thought some more explanatory comments for the meaning of > BUSY/ACCESSIBLE should be here. > > ~ > > 33c. > > READY might be a better value than ACCESSIBLE > > ~ > > 33d. 
> I'm not sure what usefulness the "LEADER_" and "Leader" prefixes > give here. Maybe a name like PartialFileSetState is more meaningful? > > e.g. like this? > > typedef enum PartialFileSetState > { > FS_UNKNOWN, > FS_BUSY, > FS_READY > } PartialFileSetState; > > ~ > All your suggestions in this point look good to me. > > ~~~ > > > 35. globals > > /* > + * Indicates whether the leader apply worker needs to serialize the > + * remaining changes to disk due to timeout when sending data to the > + * parallel apply worker. > + */ > + bool serialize_changes; > > 35a. > I wonder if the comment would be better to also mention "via shared memory". > > SUGGESTION > > Indicates whether the leader apply worker needs to serialize the > remaining changes to disk due to timeout when attempting to send data > to the parallel apply worker via shared memory. > > ~ > I think the comment should say " .. the leader apply worker serialized remaining changes ..." > 35b. > I wonder if a more informative variable name might be > serialize_remaining_changes? > I think this needlessly makes the variable name long. -- With Regards, Amit Kapila.
Re: TAP output format in pg_regress
> On 27 Nov 2022, at 11:22, Nikolay Shaplov wrote: > In a message of Saturday, 26 November 2022, 23:35:45 MSK, Daniel Gustafsson wrote: > I would suggest using the word immediate instead of noatexit. This would make the > code much more sensible to me. I think noatexit is clearer since the codepath is specifically to avoid any registered atexit functions. The point of this function is to be able to call bail while in a function registered with atexit() so I think the current name is better. > I've also rewritten the comment, the way I would understand it better, if I > read it for the first time. I am not sure about my English, but the key features > there are: > > - "terminate process" instead of "exit". Too many exits in the sentence, > better to use synonyms wherever possible. Sure, I can do that before pushing if the current version of the patch is acceptable. >> That would if so make the output something like the below. Personally I >> think the "test" prefix adds little value since everything printed are test >> suites, and we are already today using indentation for grouping parallel >> tests. > > So this extra offset indicates that the test is being included in a parallel > group? Guess it's not really obvious... Grouping parallel tests via an initial list of tests and then indenting each test with whitespace was committed 22 years ago. While there might be better ways to do this, the lack of complaints so far at least seems to indicate that it isn't all too terrible. > Theoretically TAP 14 has subtests and these parallel tests look like > subtests... but TAP 14 is not supported by modern harnesses.. Parallel tests aren't subtests though, they are single top-level tests which run in parallel to each other. -- Daniel Gustafsson https://vmware.com/
Re: Non-decimal integer literals
On 23.11.22 17:25, Dean Rasheed wrote: Taking a quick look, I noticed that there are no tests for negative values handled in the parser. Giving that a spin shows that make_const() fails to correctly identify the base of negative non-decimal integers in the T_Float case, causing it to fall through to numeric_in() and fail: Fixed in new patch. From 2d7f41981187df904e3d985f2770d9b5c26e9d7c Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Mon, 28 Nov 2022 09:24:20 +0100 Subject: [PATCH v11] Non-decimal integer literals Add support for hexadecimal, octal, and binary integer literals: 0x42F 0o273 0b100101 per SQL:202x draft. This adds support in the lexer as well as in the integer type input functions. Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb...@enterprisedb.com --- doc/src/sgml/syntax.sgml | 34 src/backend/catalog/information_schema.sql | 6 +- src/backend/catalog/sql_features.txt | 1 + src/backend/parser/parse_node.c| 37 +++- src/backend/parser/scan.l | 101 --- src/backend/utils/adt/numutils.c | 170 -- src/fe_utils/psqlscan.l| 78 +++-- src/interfaces/ecpg/preproc/pgc.l | 106 ++- src/test/regress/expected/int2.out | 80 + src/test/regress/expected/int4.out | 80 + src/test/regress/expected/int8.out | 80 + src/test/regress/expected/numerology.out | 193 - src/test/regress/sql/int2.sql | 22 +++ src/test/regress/sql/int4.sql | 22 +++ src/test/regress/sql/int8.sql | 22 +++ src/test/regress/sql/numerology.sql| 51 +- 16 files changed, 974 insertions(+), 109 deletions(-) diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 93ad71737f51..956182e7c6a8 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -694,6 +694,40 @@ Numeric Constants + + Additionally, non-decimal integer constants can be used in these forms: + +0xhexdigits +0ooctdigits +0bbindigits + + hexdigits is one or more hexadecimal digits + (0-9, A-F), octdigits is one or more octal + digits (0-7), bindigits is one or more binary + 
digits (0 or 1). Hexadecimal digits and the radix prefixes can be in + upper or lower case. Note that only integers can have non-decimal forms, + not numbers with fractional parts. + + + + These are some examples of this: +0b100101 +0B10011001 +0o273 +0O755 +0x42f +0X + + + + + + Nondecimal integer constants are currently only supported in the range + of the bigint type (see ). + + + integer bigint diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql index 18725a02d1fb..95c27a625e7e 100644 --- a/src/backend/catalog/information_schema.sql +++ b/src/backend/catalog/information_schema.sql @@ -119,7 +119,7 @@ CREATE FUNCTION _pg_numeric_precision(typid oid, typmod int4) RETURNS integer WHEN 1700 /*numeric*/ THEN CASE WHEN $2 = -1 THEN null - ELSE (($2 - 4) >> 16) & 65535 + ELSE (($2 - 4) >> 16) & 0xFFFF END WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/ WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/ @@ -147,7 +147,7 @@ CREATE FUNCTION _pg_numeric_scale(typid oid, typmod int4) RETURNS integer WHEN $1 IN (1700) THEN CASE WHEN $2 = -1 THEN null - ELSE ($2 - 4) & 65535 + ELSE ($2 - 4) & 0xFFFF END ELSE null END; @@ -163,7 +163,7 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */ THEN CASE WHEN $2 < 0 THEN 6 ELSE $2 END WHEN $1 IN (1186) /* interval */ - THEN CASE WHEN $2 < 0 OR $2 & 65535 = 65535 THEN 6 ELSE $2 & 65535 END + THEN CASE WHEN $2 < 0 OR $2 & 0xFFFF = 0xFFFF THEN 6 ELSE $2 & 0xFFFF END ELSE null END; diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt index 8704a42b60a8..abad216b7ee4 100644 --- a/src/backend/catalog/sql_features.txt +++ b/src/backend/catalog/sql_features.txt @@ -527,6 +527,7 @@ T652 SQL-dynamic statements in SQL routines NO T653 SQL-schema statements in external routines YES T654 SQL-dynamic statements in external routines NO T655 Cyclically dependent routines YES +T661 Non-decimal 
integer literals YES SQL:202x draft T811 Basic SQL/JSON constructor functions NO T812 SQL/JSON: JSON_OBJECTAGG
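The three documented literal forms are easy to illustrate outside the lexer. The sketch below (Python, whose 0x/0o/0b prefixes happen to match the SQL:202x draft forms; the regex and function name are illustrative, not part of the patch) validates a literal against the documented grammar and converts it:

```python
import re

# Forms documented above: 0x hex, 0o octal, 0b binary; the prefix and the
# hex digits may be upper or lower case, and at least one digit must follow.
NON_DECIMAL = re.compile(r'^0(?:[xX][0-9A-Fa-f]+|[oO][0-7]+|[bB][01]+)$')

def parse_non_decimal(literal: str) -> int:
    """Parse a non-decimal integer literal of the documented forms."""
    if not NON_DECIMAL.match(literal):
        raise ValueError(f"invalid non-decimal literal: {literal!r}")
    # Python's int(s, 0) understands the same radix prefixes.
    return int(literal, 0)
```

For example, parse_non_decimal("0x42F") yields 1071 and parse_non_decimal("0b100101") yields 37, matching what the patched integer input functions would accept.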
Re: Reducing power consumption on idle servers
Robert Haas writes: > I think we should remove those mentions. Otherwise the documentation > just collects mentions of an increasing number of things that are no > longer relevant. Yeah, I think the same. There will be a release-note entry, and I don't object to having something about it in appendix-obsolete.sgml; but we shouldn't clutter the main docs with it. regards, tom lane
Re: Remove a unused argument from qual_is_pushdown_safe
Michael Paquier writes: > On Mon, Nov 28, 2022 at 11:54:45AM +0900, Michael Paquier wrote: >> On Fri, Nov 25, 2022 at 04:05:13PM +0800, Richard Guo wrote: > I wonder if we need to revise the comment atop qual_is_pushdown_safe() > too which says > > * rinfo is a restriction clause applying to the given subquery (whose RTE > * has index rti in the parent query). > > since there is no 'given subquery' after we remove it from the params. > I was thinking about this point, and it seems to me that we could just > do s/the given subquery/a subquery/. But perhaps you have a different > view on the matter? My viewpoint is that this change is misguided. Even if the current coding of qual_is_pushdown_safe doesn't happen to reference the subquery, it might need to in future. regards, tom lane
Re: Remove a unused argument from qual_is_pushdown_safe
> On 28 Nov 2022, at 15:15, Tom Lane wrote: > My viewpoint is that this change is misguided. Even if the current > coding of qual_is_pushdown_safe doesn't happen to reference the > subquery, it might need to in future. If I understand the code correctly the variable has some value in terms of "documenting the code" (for lack of better terminology), and I would assume virtually every modern compiler to figure out it's not needed. -- Daniel Gustafsson https://vmware.com/
Bug in wait time when waiting on nested subtransaction
Issue in current XactLockTableWait, starting with 3c27944fb2141, discovered while reviewing https://commitfest.postgresql.org/40/3806/ Test demonstrating the problem is 001-isotest-tuplelock-subxact.v1.patch A narrative description of the issue follows: session1 - requests multiple nested subtransactions like this: BEGIN; ... SAVEPOINT subxid1; ... SAVEPOINT subxid2; ... If another session2 sees an xid from subxid2, it waits. If subxid2 aborts, then session2 sees the abort and can continue processing normally. However, if subxid2 subcommits, then the lock wait moves from subxid2 to the topxid. If subxid1 subsequently aborts, it will also abort subxid2, but session2 waiting for subxid2 to complete doesn't see this and waits for topxid instead. Which means that it waits longer than it should, and later arriving lock waiters may actually get served first. So it's a bug, but not too awful, since in many cases people don't use nested subtransactions, and if they do use SAVEPOINTs, they don't often close them using RELEASE. And in most cases the extra wait time is not very long, hence why nobody ever reported this issue. Thanks to Robert Haas and Julien Tachoires for asking the question "are you sure the existing coding is correct?". You were both right; it is not. How to fix? Correct lock wait can be achieved while a subxid is running if we do either * a lock table entry for the subxid OR * a subtrans entry that points to its immediate parent So we have these options 1. Removing the XactLockTableDelete() call in CommitSubTransaction(). That releases lock waiters earlier than expected, which requires pushups in XactLockTableWait() to cope with that (which are currently broken). Why do we do it? To save shmem space in the lock table should anyone want to run a transaction that contains thousands of subtransactions, or more. 
So option (1) alone would eventually cause us to run out of space in the lock table and a transaction would receive ERRORs rather than be allowed to run for a long time. 2. In XactLockTableWait(), replace the call to SubTransGetParent(), so we go up the levels one by one as we did before. However, (2) causes huge subtrans contention and if we implemented that and backpatched it the performance issues could be significant. So my feeling is that if we do (2) then we should not backpatch it. So both (1) and (2) have issues. The main result from patch https://commitfest.postgresql.org/40/3806/ is that having subtrans point direct to topxid is very good for performance in XidIsInMVCCSnapshot(), and presumably other places also. This bug prevents the simple application of a patch to improve performance. So now we need a stronger mix of ideas to both resolve the bug and fix the subtrans contention issue in HEAD. My preferred solution would be a mix of the above, call it option (3) 3. a) Include the lock table entry for the first 64 subtransactions only, so that we limit shmem. For those first 64 entries, have the subtrans point direct to top, since this makes a subtrans lookup into an O(1) operation, which is important for performance of later actions. b) For any subtransactions after first 64, delete the subxid lock on subcommit, to save shmem, but make subtrans point to the immediate parent (only), not the topxid. That fixes the bug, but causes performance problems in XidInMVCCSnapshot() and others, so we also do c) and d) c) At top level commit, go back and alter subtrans again for subxids so now it points to the topxid, so that we avoid O(N) behavior in XidInMVCCSnapshot() and other callers. Additional cost for longer transactions, but it saves considerable cost in later callers that need to call GetTopmostTransaction. d) Optimize SubTransGetTopmostTransaction() so it retrieves entries page-at-a-time. 
This will reduce the contention of repeatedly re-visiting the same page(s) and ensure that a page is less often paged out when we are still using it. Thoughts? -- Simon Riggshttp://www.EnterpriseDB.com/ 001-isotest-tuplelock-subxact.v1.patch Description: Binary data
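To see why the direct-to-topxid links in option (3) matter for lookup cost, here is a toy model (not the real SLRU-based subtrans code; the names and XID values are invented) contrasting the per-level parent walk of option (2) with an O(1) direct link:

```python
def topmost_via_parents(xid, parent_of):
    """Walk up one parent link at a time, counting lookups.

    With per-level links (option 2) the cost grows with nesting depth;
    with links straight to the topxid (option 3a) it is a single lookup.
    """
    steps = 0
    while xid in parent_of:
        xid = parent_of[xid]
        steps += 1
    return xid, steps

# A topxid 100 with 64 nested subtransactions 101..164, chained per level.
chain = {xid: xid - 1 for xid in range(101, 165)}

# Option (3a): every subtrans entry points straight at the topxid.
direct = {xid: 100 for xid in range(101, 165)}
```

Resolving the deepest subxid (164) costs 64 lookups through the chain but only one through the direct map, which is the performance benefit the thread attributes to having subtrans point directly to the top.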
[PATCH] Infinite loop while acquiring new TOAST Oid
Hi hackers! While working on Pluggable TOAST we've detected defective behavior on tables with large amounts of TOASTed data - queries freeze and the DB stalls. Further investigation led us to the loop around the GetNewOidWithIndex function call - when all available Oids already exist in the related TOAST table, this loop continues infinitely. The data type used for the value ID is uint32, an unsigned int with a maximum value of 4294967295, which allows at most 4294967295 records in a TOAST table. That is not a very large number for modern databases, and this is a major problem for production systems. A quick fix for this problem is limiting the GetNewOidWithIndex loop to some reasonable number of iterations, defined by a related macro, and returning an error if there is still no available Oid. Please check the attached patch; any feedback is appreciated. -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/ 0001_infinite_new_toast_oid_v1.patch Description: Binary data
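The proposed bounded retry can be modeled in miniature like this (a sketch, not the patch itself: the max_attempts cap and the is_used callback stand in for the real macro and for the index probe inside GetNewOidWithIndex):

```python
class OidExhaustedError(Exception):
    """Raised when no free OID is found within the attempt cap."""

def get_new_oid(is_used, start, max_attempts):
    """Scan candidate OIDs, wrapping around the 32-bit space, but give up
    after max_attempts instead of looping forever."""
    oid = start
    for _ in range(max_attempts):
        # OID 0 (InvalidOid) is never handed out; skip it on wraparound.
        if oid != 0 and not is_used(oid):
            return oid
        oid = (oid + 1) & 0xFFFFFFFF
    raise OidExhaustedError(f"no free OID after {max_attempts} attempts")
```

The point of the cap is exactly the behavior change described above: a full (or pathologically dense) OID space produces a clean error instead of an unbounded loop that stalls the backend.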
Re: fixing CREATEROLE
On Thu, Nov 24, 2022 at 2:41 AM wrote: > INHERITCREATEDROLES and SETCREATEDROLES behave much like DEFAULT > PRIVILEGES. What about something like: > > ALTER DEFAULT PRIVILEGES FOR alice > GRANT TO alice WITH INHERIT FALSE, SET TRUE, ADMIN TRUE > > The "abbreviated grant" is very much abbreviated, because the original > syntax GRANT a TO b is already quite short to begin with, i.e. there is > no ON ROLE or something like that in it. > > The initial DEFAULT privilege would be INHERIT FALSE, SET FALSE, ADMIN > TRUE, I guess? I don't know if changing the syntax from A to B is really getting us anywhere. I generally agree that the ALTER DEFAULT PRIVILEGES syntax looks nicer than the CREATE/ALTER ROLE syntax, but I'm not sure that's a sufficient reason to move the control over this behavior to ALTER DEFAULT PRIVILEGES. One thing to consider is that, as I've designed this, whether or not ADMIN is included in the grant is non-negotiable. I am, at least at present, inclined to think that was the right call, partly because Mark Dilger expressed a lot of concern about the CREATEROLE user losing control over the role they'd just created, and allowing ADMIN to be turned off would have exactly that effect. Plus a grant with INHERIT FALSE, SET FALSE, ADMIN FALSE would end up being almost identical to no grant at all, which seems pointless. Basically, without ADMIN, the implicit GRANT fails to accomplish its intended purpose, so I don't like having that as a possibility. The other thing that's a little weird about the syntax which you propose is that it's not obviously related to CREATE ROLE. The intent of the patch as implemented is to allow control over only the implicit GRANT that is created when a new role is created, not all grants that might be created by or to a particular user. Setting defaults for all grants doesn't seem like a particularly good idea to me, but it's definitely a different idea than what the patch proposes to do. 
I did spend some time thinking about trying to tie this to the CREATEROLE syntax itself. For example, instead of CREATE ROLE alice CREATEROLE INHERITCREATEDROLES SETCREATEDROLES you could write CREATE ROLE alice CREATEROLE WITH (INHERIT TRUE, SET TRUE) or something like this. That would avoid introducing new, lengthy keywords that are just concatenations of other English words, a kind of syntax that doesn't look particularly nice to me and probably is less friendly to non-English speakers as well. I didn't do it that way because the parser support would be more complex, but I could. CREATEROLE would have to become a keyword again, but that's not a catastrophe. Another idea would be to break the CREATEROLE stuff off from CREATE ROLE entirely and put it all into GRANT. You could imagine syntax like GRANT CREATEROLE (or CREATE ROLE?) TO role_specification WITH (INHERIT TRUE/FALSE, SET TRUE/FALSE). There are a few potential issues with this direction. One, if we did this, then CREATEROLE probably ought to become inheritable, because that's the way grants work in general, and this likely shouldn't be an exception, but this would be a behavior change. However, if it is the consensus that such a behavior change would be an improvement, that might be OK. Two, I wonder what we'd do about the GRANTED BY role_specification clause. We could leave it out, but that would be asymmetric with other GRANT commands. We could also support it and record that information and make this work more like other cases, including, I suppose, the possibility of dependent grants. We'd have to think about what that means exactly. If you revoke CREATEROLE from someone who has granted CREATEROLE to others, I suppose that's a clear dependent grant and needs to be recursively revoked. But what about the implicit grants that were created because the person had CREATEROLE? Are those also dependent grants? And what about the roles themselves? 
Should revoking CREATEROLE drop the roles that the user in question created? That gets complicated, because those roles might own objects. That's scary, because you might not expect revoking a role permission to result in tables getting dropped. It's also problematic, because those tables might be in some other database where they are inaccessible to the current session. All in all I'm inclined to think that recursing to the roles themselves is a bad plan, but it's debatable. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Removing another gen_node_support.pl special case
On 27.11.22 02:39, Tom Lane wrote: I got confused about how we were managing EquivalenceClass pointers in the copy/equal infrastructure, and it took me awhile to remember that the reason it works is that gen_node_support.pl has hard-wired knowledge about that. I think that's something we'd be best off dropping in favor of explicit annotations on affected fields. Hence, I propose the attached. This results in zero change in the generated copy/equal code. I suppose the question is whether this behavior is something that is a property of the EquivalenceClass type as such or something that is specific to each individual field.
Re: Removing another gen_node_support.pl special case
Peter Eisentraut writes: > On 27.11.22 02:39, Tom Lane wrote: >> I got confused about how we were managing EquivalenceClass pointers >> in the copy/equal infrastructure, and it took me awhile to remember >> that the reason it works is that gen_node_support.pl has hard-wired >> knowledge about that. I think that's something we'd be best off >> dropping in favor of explicit annotations on affected fields. >> Hence, I propose the attached. This results in zero change in >> the generated copy/equal code. > I suppose the question is whether this behavior is something that is a > property of the EquivalenceClass type as such or something that is > specific to each individual field. That's an interesting point, but what I'm on about is that I don't want the behavior buried in gen_node_support.pl. I think there's a reasonable argument to be made that equal_as_scalar *is* a field-level property not a node-level property. I agree that for the copy case you could argue it differently, and I also agree that it seems error-prone to have to remember to label fields this way. I notice that EquivalenceClass is already marked as no_copy_equal, which means that gen_node_support.pl can know that emitting a recursive node-copy or node-compare request is a bad idea. What do you think of using the patch as it stands, plus a cross-check that we don't emit COPY_NODE_FIELD or COMPARE_NODE_FIELD if the target node type is no_copy or no_equal? This is different from just silently applying scalar copy/equal, in that (a) it's visibly under the programmer's control, and (b) it's not hard to imagine wanting to use other solutions such as copy_as(NULL). (More generally, I suspect that there are other useful cross-checks gen_node_support.pl could be making. I had a to-do item to think about that, but it didn't get to the top of the list yet.) regards, tom lane
Re: Add 64-bit XIDs into PostgreSQL 15
On Sat, Nov 26, 2022 at 4:08 AM Chris Travers wrote: > I didn't see any changes to pg_upgrade to make this change possible on > upgrade. Is that also outside of the scope of your patch set? If so how is > that continuity supposed to be ensured? The scheme is documented in their 0006 patch, in a README.XID file. I'm not entirely confident that it's the best design and have argued against it in the past, but it's not crazy. More generally, while I think there's plenty of stuff to be concerned about in this patch set and while I'm somewhat skeptical about the likelihood of its getting or staying committed, I can't really understand your concerns in particular. The thrust of your concern seems to be that if we allow people to get further behind, recovery will be more difficult. I'm not sure I see the problem. Suppose that we adopt this proposal and that it is bug-free. Now, consider a user who gets 8 billion XIDs behind. They probably have to vacuum pretty much every page in the database to do that, or at least every page in the tables that haven't been vacuumed recently. But that would likely also be true if they were 800 million XIDs behind, as is possible today. The effort to catch up doesn't increase linearly with how far behind you are, and is always bounded by the DB size. It is true that if the table is progressively bloating, it is likely to be more bloated by the time you are 8 billion XIDs behind than it was when you were 800 million XIDs behind. I don't see that as a very good reason not to adopt this patch, because you can bloat the table by an arbitrarily large amount while consuming only a small number of XIDs, even just 1 XID. Protecting against bloat is good, but shutting down the database when the XID age reaches a certain value is not a particularly effective way of doing that, so saying that we'll be hurting people by not shutting down the database at the point where we do so today doesn't ring true to me. 
I think that most people who get to the point of wraparound shutdown have workloads where bloat isn't a huge issue, because those who do start having problems with the bloat way before they run out of XIDs. It would be entirely possible to add a parameter to the system that says "hey, you know we can keep running even if we're a shazillion XIDs behind, but instead shut down when we are behind by this number of XIDs." Then, if somebody wants to force an automatic shutdown at that point, they could, and I think that then the scenario you're worried about just can't happen any more. But isn't that a little bit silly? You could also just monitor how far behind you are and page the DBA when you get behind by more than a certain number of XIDs. Then, you wouldn't be risking a shutdown, and you'd still be able to stay on top of the XID ages of your tables. Philosophically, I disagree with the idea of shutting down the database completely in any situation in which a reasonable alternative exists. Losing read and write availability is really bad, and I don't think it's what users want. I think that most users want the database to degrade gracefully when things do not go according to plan. Ideally, they'd like everything to Just Work, but reasonable users understand that sometimes there are going to be problems, and in my experience, what makes them happy is when the database acts to contain the scope of the problem so that it affects their workload as little as possible, rather than acting to magnify the problem so that it impacts their workload as much as possible. This patch, implementation and design concerns to one side, does that. I don't believe there's a single right answer to the question of what to do about vacuum falling behind, and I think it's worth exploring multiple avenues to improve the situation. 
You can have vacuum never run on a table at all, say because all of the workers are busy elsewhere, or because the table is locked until the heat death of the universe. You can have vacuum run on a table but too slowly to do any good, because of the vacuum cost delay mechanism. You can have vacuum run and finish but do little good because of prepared transactions or replication slots or long-running queries. It's reasonable to think about what kinds of steps might help in those different scenarios, and especially to think about what kind of steps might help in multiple cases. We should do that. But, I don't think any of that means that we can ignore the need for some kind of expansion of the XID space forever. Computers are getting faster. It's already possible to burn through the XID space in hours, and the number of hours is going to go down over time and maybe eventually the right unit will be minutes, or even seconds. Sometime before then, we need to do something to make the runway bigger, or else just give up on PostgreSQL being a relevant piece of software. Perhaps the thing we need to do is not exactly this, but if not, it's probably a sibling or cousin of this. -- Rober
Re: predefined role(s) for VACUUM and ANALYZE
On 2022-11-23 We 18:54, Nathan Bossart wrote: > On Wed, Nov 23, 2022 at 02:56:28PM -0500, Andrew Dunstan wrote: >> I have committed the first couple of these to get them out of the way. > Thanks! > >> But I think we need a bit of cleanup in the next patch. >> vacuum_is_relation_owner() looks like it's now rather misnamed. Maybe >> vacuum_is_permitted_for_relation()? Also I think we need a more thorough >> reworking of the comments around line 566. And I think we need a more >> detailed explanation of why the change in vacuum_rel is ok, and if it is >> OK we should adjust the head comment on the function. >> >> In any case I think this comment would be better English with "might" >> instead of "may": >> >> /* user may have the ANALYZE privilege */ > I've attempted to address all your feedback in v13. Please let me know if > anything needs further reworking. Thanks, pushed. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Bug in wait time when waiting on nested subtransaction
On Mon, Nov 28, 2022 at 10:28 AM Simon Riggs wrote: > So we have these options > > 1. Removing the XactLockTableDelete() call in CommitSubTransaction(). > That releases lock waiters earlier than expected, which requires > pushups in XactLockTableWait() to cope with that (which are currently > broken). Why do we do it? To save shmem space in the lock table should > anyone want to run a transaction that contains thousands of > subtransactions, or more. So option (1) alone would eventually cause > us to run out of space in the lock table and a transaction would > receive ERRORs rather than be allowed to run for a long time. This seems unprincipled to me. The point of having a lock on the subtransaction in the lock table is so that people can wait for the subtransaction to end. If we don't remove the lock table entry at subtransaction end, then that lock table entry doesn't serve that purpose any more. > 2. In XactLockTableWait(), replace the call to SubTransGetParent(), so > we go up the levels one by one as we did before. However, (2) causes > huge subtrans contention and if we implemented that and backpatched it > the performance issues could be significant. So my feeling is that if > we do (2) then we should not backpatch it. What I find suspicious about the coding of this function is that it doesn't distinguish between commits and aborts at all. Like, if a subtransaction commits, the changes don't become globally visible until the parent transaction also commits. If a subtransaction aborts, though, what happens to the top-level XID doesn't seem to matter at all. The comment seems to agree: * Note that this does the right thing for subtransactions: if we wait on a * subtransaction, we will exit as soon as it aborts or its top parent commits. 
I feel like what I'd expect to see given this comment is code which (1) waits until the supplied XID is no longer running, (2) checks whether the XID aborted, and if so return at once, and (3) otherwise recurse to the parent XID. But the code doesn't do that. Perhaps that's not actually the right thing to do, since it seems like a big behavior change, but then I don't understand the comment. Incidentally, one possible optimization here to try to reduce locking traffic would be to try waiting for the top-parent first using a conditional lock acquisition. If that works, cool. If not, go back around and work up the tree level by level. Since that path would only be taken in the unhappy case where we're probably going to have to wait anyway, the cost probably wouldn't be too bad. > My preferred solution would be a mix of the above, call it option (3) > > 3. > a) Include the lock table entry for the first 64 subtransactions only, > so that we limit shmem. For those first 64 entries, have the subtrans > point direct to top, since this makes a subtrans lookup into an O(1) > operation, which is important for performance of later actions. > > b) For any subtransactions after first 64, delete the subxid lock on > subcommit, to save shmem, but make subtrans point to the immediate > parent (only), not the topxid. That fixes the bug, but causes > performance problems in XidInMVCCSnapshot() and others, so we also do > c) and d) > > c) At top level commit, go back and alter subtrans again for subxids > so now it points to the topxid, so that we avoid O(N) behavior in > XidInMVCCSnapshot() and other callers. Additional cost for longer > transactions, but it saves considerable cost in later callers that > need to call GetTopmostTransaction. > > d) Optimize SubTransGetTopmostTransaction() so it retrieves entries > page-at-a-time. This will reduce the contention of repeatedly > re-visiting the same page(s) and ensure that a page is less often > paged out when we are still using it. 
I'm honestly not very sanguine about changing pg_subtrans to point straight to the topmost XID. It's only OK to do that if there's absolutely nothing that needs to know the full tree structure, and the present case looks like an instance where we would like to have the full tree structure. I would not be surprised if there are others. That said, it seems a lot less likely that this would be an issue once the top-level transaction is no longer running. At that point, all the subtransactions are no longer running either: they either committed or they rolled back, and I can't see a reason why any code should care about anything other than which of those two things happened. So I think your idea in (c) might have some possibilities. You could also flip that idea around and have readers replace immediate parent pointers with top-parent pointers opportunistically, but I'm not sure that's actually any better. As you present it in (c) above, there's a risk of going back and updating CLOG state that no one will ever look at. But if you flipped it around and did it on the read side, then you'd have the risk of a bunch of backends trying to do it at the same time. I'm not sure whether either of those things
Re: Introduce a new view for checkpointer related stats
On Tue, Nov 22, 2022 at 3:53 PM Andres Freund wrote: > I think we should consider deprecating the pg_stat_bgwriter columns but > leaving them in place for a few years. New stuff should only be added to > pg_stat_checkpointer, but we don't need to break old monitoring queries. I vote to just remove them. I think that most people won't update their queries until they are forced to do so. I don't think it matters very much when we force them to do that. Our track record in following through on deprecations is pretty bad too, which is another consideration. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Decouple last important WAL record LSN from WAL insert locks
Hi, On 2022-11-28 11:42:19 +0530, Bharath Rupireddy wrote: > On Sun, Nov 27, 2022 at 2:43 AM Andres Freund wrote: > > On 2022-11-23 19:12:07 +0530, Bharath Rupireddy wrote: > > > While working on something else, I noticed that each WAL insert lock > > > tracks its own last important WAL record's LSN (lastImportantAt) and > > > both the bgwriter and checkpointer later computes the max > > > value/server-wide last important WAL record's LSN via > > > GetLastImportantRecPtr(). While doing so, each WAL insertion lock is > > > acquired in exclusive mode in a for loop. This seems like too much > > > overhead to me. > > > > GetLastImportantRecPtr() should be a very rare operation, so it's fine for > > it > > to be expensive. The important thing is for the maintenance of the > > underlying > > data to be very cheap. > > > > > I quickly coded a patch (attached herewith) that > > > tracks the server-wide last important WAL record's LSN in > > > XLogCtlInsert (lastImportantPos) protected with a spinlock and gets > > > rid of lastImportantAt from each WAL insert lock. > > > > That strikes me as a very bad idea. It adds another point of contention to a > > very very hot code path, to make a very rare code path cheaper. > > Thanks for the response. I agree that GetLastImportantRecPtr() gets > called rarely, however, what concerns me is that it's taking all the > WAL insertion locks when it gets called. So what? It's far from the only operation doing so. And in contrast to most of the other places (c.f. WALInsertLockAcquireExclusive()) it only takes one of them at a time. > Is tracking lastImportantPos as pg_atomic_uint64 in XLogCtlInsert any > better than an explicit spinlock? I think it's better on platforms > where atomics are supported, however, it boils down to using a spin > lock on the platforms where atomics aren't supported. A central atomic in XLogCtlInsert would be better than a spinlock protected variable, but still bad. 
We do *not* want to have more central state that needs to be manipulated, we want *less*. If we wanted to optimize this - and I haven't seen any evidence it's worth doing so - we should just optimize the lock acquisitions in GetLastImportantRecPtr() away. *Without* centralizing the state. Greetings, Andres Freund
Re: TAP output format in pg_regress
On 2022-11-28 14:13:16 +0100, Daniel Gustafsson wrote: > > On 27 Nov 2022, at 11:22, Nikolay Shaplov wrote: > > In a message of Saturday, 26 November 2022 at 23:35:45 MSK, Daniel > > Gustafsson wrote: > > > I would suggest to use the word immediate instead of noatexit. This will make the > > code much more sensible for me. > I think noatexit is clearer since the codepath is specifically to avoid any > registered atexit functions. The point of this function is to be able to call > bail while in a function registered with atexit() so I think the current name > is better. +1
Re: O(n) tasks cause lengthy startups and checkpoints
On 2022-11-28 13:08:57 +, Simon Riggs wrote: > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart wrote: > > > Rather than explicitly use DEBUG1 everywhere I would have an > > > #define CUSTODIAN_LOG_LEVEL LOG > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > > > line change in a later phase of Beta > > > > I can create a separate patch for this, but I don't think I've ever seen > > this sort of thing before. > > Much of recovery is coded that way, for the same reason. I think that's not a good thing to copy without a lot more justification than "some old code also does it that way". It's sometimes justified, but also makes code harder to read (one doesn't know what it does without looking up the #define, line length).
Re: O(n) tasks cause lengthy startups and checkpoints
On Mon, Nov 28, 2022 at 1:31 PM Andres Freund wrote: > On 2022-11-28 13:08:57 +, Simon Riggs wrote: > > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart > > wrote: > > > > Rather than explicitly use DEBUG1 everywhere I would have an > > > > #define CUSTODIAN_LOG_LEVEL LOG > > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one > > > > line change in a later phase of Beta > > > > > > I can create a separate patch for this, but I don't think I've ever seen > > > this sort of thing before. > > > > Much of recovery is coded that way, for the same reason. > > I think that's not a good thing to copy without a lot more justification than > "some old code also does it that way". It's sometimes justified, but also > makes code harder to read (one doesn't know what it does without looking up > the #define, line length). Yeah. If people need some of the log messages at a higher level during development, they can patch their own copies. I think there might be some argument for having a facility that lets you pick subsystems or even individual messages that you want to trace and pump up the log level for just those call sites. But I don't know exactly what that would look like, and I don't think inventing one-off mechanisms for particular cases is a good idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Failed Assert in pgstat_assoc_relation
vignesh C writes: > I could reproduce this issue with the following steps: > create table t1(c int); > BEGIN; > CREATE TABLE t (c int); > SAVEPOINT q; > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > select * from t; > ROLLBACK TO q; > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > ROLLBACK; Uh-huh. I've not bothered to trace this in detail, but presumably what is happening is that the first CREATE RULE converts the table to a view, and then the ROLLBACK undoes that so far as the catalogs are concerned, but probably doesn't undo related pg_stats state changes fully. Then we're in a bad state that will cause problems. (It still crashes if you replace the second CREATE RULE with "select * from t".) As far as HEAD is concerned, maybe it's time to nuke the whole convert-table-to-view kluge entirely? Only pg_dump older than 9.4 will emit such code, so we're really about out of reasons to keep on maintaining it. However, I'm not sure that removing that code in v15 will fly, so maybe we need to make the new pg_stats code a little more robust against the possibility of a relkind change. regards, tom lane
Re: Failed Assert in pgstat_assoc_relation
Hi, On 2022-11-28 13:37:16 -0500, Tom Lane wrote: > vignesh C writes: > > I could reproduce this issue with the following steps: > > create table t1(c int); > > BEGIN; > > CREATE TABLE t (c int); > > SAVEPOINT q; > > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > > select * from t; > > ROLLBACK TO q; > > CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1; > > ROLLBACK; > > Uh-huh. I've not bothered to trace this in detail, but presumably > what is happening is that the first CREATE RULE converts the table > to a view, and then the ROLLBACK undoes that so far as the catalogs > are concerned, but probably doesn't undo related pg_stats state > changes fully. Then we're in a bad state that will cause problems. > (It still crashes if you replace the second CREATE RULE with > "select * from t".) Yea. I haven't yet fully traced through this, but presumably relcache inval doesn't fix this because we don't want to lose pending stats after DDL. Perhaps we need to add a rule about not swapping pgstat* in RelationClearRelation() when relkind changes? > As far as HEAD is concerned, maybe it's time to nuke the whole > convert-table-to-view kluge entirely? Only pg_dump older than > 9.4 will emit such code, so we're really about out of reasons > to keep on maintaining it. Sounds good to me. > However, I'm not sure that removing that code in v15 will fly, Agreed, at the very least that'd increase memory usage. > so maybe we need to make the new pg_stats code a little more > robust against the possibility of a relkind change. Possibly via the relcache code. Greetings, Andres Freund
Re: Another multi-row VALUES bug
Dean Rasheed writes: > A different way to do this, without relying on the contents of the > targetlist, is to note that, while processing a product query, what we > really want to do is ignore any VALUES RTEs from the original query, > since they will have already been processed. There should then never > be more than one VALUES RTE left to process -- the one from the rule > action. > This can be done by exploiting the fact that in product queries, the > rtable always consists of the rtable from the original query followed > by the rtable from the rule action, so we just need to ignore the > right number of RTEs at the start of the rtable. Of course that would > break if we ever changed the way rewriteRuleAction() worked, but at > least it only depends on that one other place in the code, which has > been stable for a long time, so the risk of future breakage seems > managable. This looks like a good solution. I didn't actually test the patch, but it passes an eyeball check. I like the fact that we can verify that we find only one candidate VALUES RTE. regards, tom lane
Re: predefined role(s) for VACUUM and ANALYZE
On Mon, Nov 28, 2022 at 12:13:13PM -0500, Andrew Dunstan wrote: > pushed. Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Bug in wait time when waiting on nested subtransaction
On 2022-Nov-28, Simon Riggs wrote: > A narrative description of the issue follows: > session1 - requests multiple nested subtransactions like this: > BEGIN; ... > SAVEPOINT subxid1; ... > SAVEPOINT subxid2; ... > However, if subxid2 subcommits, then the lock wait moves from subxid2 > to the topxid. Hmm, do we really do that? Seems very strange .. it sounds to me like the lock should have been transferred to subxid1 (which is subxid2's parent), not to the top-level Xid. Maybe what the user wanted was to release subxid1 before establishing subxid2? Or do they want to continue to be able to rollback to subxid1 after establishing subxid2? (but why?) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: fixing CREATEROLE
Robert Haas: I don't know if changing the syntax from A to B is really getting us anywhere. I generally agree that the ALTER DEFAULT PRIVILEGES syntax looks nicer than the CREATE/ALTER ROLE syntax, but I'm not sure that's a sufficient reason to move the control over this behavior to ALTER DEFAULT PRIVILEGES. The list of role attributes can currently be roughly divided into the following categories: - Settings with role-specific values: CONNECTION LIMIT, PASSWORD, VALID UNTIL. It's hard to imagine storing them anywhere else, because they need to have a different value for each role. Those are not just "flags" like all the other attributes. - Two special attributes in INHERIT and BYPASSRLS regarding security/privileges. Those were invented because there was no other syntax to do the same thing. Those could be interpreted as privileges to do something, too - but lacking the ability to do that explicitly. There is no SET BYPASSRLS ON/OFF or SET INHERIT ON/OFF. Of course the INHERIT case is now a bit different, because there is the inherit grant option you introduced. - Cluster-wide privileges: SUPERUSER, CREATEDB, CREATEROLE, LOGIN, REPLICATION. Those can't be granted on some kind of object, because there is no such global object. You could imagine inventing some kind of global CLUSTER object and doing something like GRANT SUPERUSER ON CLUSTER TO alice; instead. Turning those into role attributes was the choice made instead. Most likely it would have been only a syntactic difference anyway: Even if there was something like GRANT .. ON CLUSTER, you'd most likely implement that as... storing those grants as role attributes. Your patch is introducing a new category of role attributes - those that affect default behavior. But there is already a way to express this right now, and that's ALTER DEFAULT PRIVILEGES in this case. Imho, the question asked should not be "why change from syntax A to B?" 
but rather: Why introduce a new category of role attributes, when there is a way to express the same concept already? I can't see any compelling reason for that, yet. Or not "yet", but rather "anymore". If I understood and remember correctly, you implemented it in a way that a user could not change those new attributes on their own role. This is in fact different to how ALTER DEFAULT PRIVILEGES works, so you could have made an argument that this was better expressed as role attributes. But I think this was asked and agreed on to act differently, so that the user can change this default behavior of what happens when they create a role for themselves. And now this reason is gone - there is no reason NOT to implement it as DEFAULT PRIVILEGES. One thing to consider is that, as I've designed this, whether or not ADMIN is included in the grant is non-negotiable. I am, at least at present, inclined to think that was the right call, partly because Mark Dilger expressed a lot of concern about the CREATEROLE user losing control over the role they'd just created, and allowing ADMIN to be turned off would have exactly that effect. Plus a grant with INHERIT FALSE, SET FALSE, ADMIN FALSE would end up being almost identical to no grant at all, which seems pointless. Basically, without ADMIN, the implicit GRANT fails to accomplish its intended purpose, so I don't like having that as a possibility. With how you implemented it right now, is it possible to do the following? CREATE ROLE alice; REVOKE ADMIN OPTION FOR alice FROM CURRENT_USER; If the answer is yes, then there is no reason to allow a user to set a shortcut for SET and INHERIT, but not for ADMIN. If the answer is no, then you could just not allow specifying the ADMIN option in the ALTER DEFAULT PRIVILEGES statement and always force it to be TRUE. The other thing that's a little weird about the syntax which you propose is that it's not obviously related to CREATE ROLE. 
The intent of the patch as implemented is to allow control over only the implicit GRANT that is created when a new role is created, not all grants that might be created by or to a particular user. Setting defaults for all grants doesn't seem like a particularly good idea to me, but it's definitely a different idea than what the patch proposes to do. Before I proposed that I was confused for a moment about this, too - but it turns out to be wrong. ALTER DEFAULT PRIVILEGES in general works as: When object A is created, issue a GRANT ON A automatically. In my proposal, the "object" is not the GRANT of that role. It's the role itself. So the default privileges express what should happen when the role is created. The default privileges would NOT affect a regular GRANT role TO role_spec command. They only run that command when a role is created. I did spend some time thinking about trying to tie this to the CREATEROLE syntax itself. For example, instead of CREATE ROLE alice CREATEROLE INHERITCREATEDROLES SETCREATEDROLES you could write CREATE ROLE alic
Re: Add tracking of backend memory allocated to pg_stat_activity
On 2022-11-26 22:10:06 -0500, Reid Thompson wrote: >- zero allocated bytes after fork to avoid double counting postmaster > allocations I still don't understand this - postmaster shouldn't be counted at all. It doesn't have a PgBackendStatus. There simply shouldn't be any tracked allocations from it. Greetings, Andres Freund
Re: TAP output format in pg_regress
In a message of Monday, 28 November 2022 at 21:28:48 MSK, Andres Freund wrote: > On 2022-11-28 14:13:16 +0100, Daniel Gustafsson wrote: > > > On 27 Nov 2022, at 11:22, Nikolay Shaplov wrote: > > > In a message of Saturday, 26 November 2022 at 23:35:45 MSK, Daniel > > > Gustafsson wrote: > > > > > > I would suggest to use the word immediate instead of noatexit. This will make > > > the > > > code much more sensible for me. > > > > I think noatexit is clearer since the codepath is specifically to avoid > > any > > registered atexit functions. The point of this function is to be able to > > call bail while in a function registered with atexit() so I think the > > current name is better. > > +1 Ok, then I would not insist on it. > > So this extra offset indicates that test is being included into parallel > > group? Guess it's not really obvious... > > Grouping parallel tests via an initial list of test and then indenting each > test with whitespace was committed 22 years ago. While there might be > better > ways to do this, the lack of complaints so far at least seems to indicate > that it isn't all too terrible. Ok, it was there for 22 years, it will do no harm if it is left this way for some time :-) From my reviewer's point of view the patch is ready for commit. Thank you for your patience :-) PS Should I change commitfest status? -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su
Re: Collation version tracking for macOS
On Wed, Nov 23, 2022 at 12:09 AM Thomas Munro wrote: > OK. Time for a new list of the various models we've discussed so far: > > 1. search-by-collversion: We introduce no new "library version" > concept to COLLATION and DATABASE object and little or no new syntax. > > 2. lib-version-in-providers: We introduce a separate provider value > for each ICU version, for example ICU63, plus an unversioned ICU like > today. > > 3. lib-version-in-attributes: We introduce daticuversion (alongside > datcollversion) and collicuversion (alongside collversion). Similar > to the above, but it's a separate property and the provider is always > ICU. New syntax for CREATE/ALTER COLLATION/DATABASE to set and change > ICU_VERSION. > > 4. lib-version-in-locale: "63:en" from earlier versions. That was > mostly a strawman proposal to avoid getting bogged down in > syntax/catalogue/model change discussions while trying to prove that > dlopen would even work. It doesn't sound like anyone really likes > this. > > 5. lib-version-in-collversion: We didn't explicitly discuss this > before, but you hinted at it: we could just use u_getVersion() in > [dat]collversion. I'd like to vote against #3 at least in the form that's described here. If we had three more libraries providing collations, it's likely that they would need versioning, too. So if we add an explicit notion of provider version, then it ought not to be specific to libicu. I think it's OK to decide that different library versions are different providers (your option #2), or that they are the same provider but give rise to different collations (your option #4), or that there can be multiple version of each collation which are distinguished by some additional provider version field (your #3 made more generic). I don't really understand #1 or #5 well enough to have an educated opinion, but I do think that #1 seems a bit magical. 
It hopes that the combination of a collation name and a datcollversion will be sufficient to find exactly one matching collation in a list of provided libraries. The advantage of that, as I understand it, is that if you do something to your system that causes the number of matches to go from one to zero, you can just throw another library on the pile and get the number back up to one. Woohoo! But there's a part of me that worries: what if the number goes up to two, and they're not all the same? Probably that's something that shouldn't happen, but if it does then I think there's kind of no way to fix it. With the other options, if there's some way to jigger the catalog state to match what you want to happen, you can always repair the situation somehow, because the library to be used for each collation is explicitly specified in some way, and you just have to get it to match what you want to have happen. I don't know too much about this, though, so I might have it all wrong. -- Robert Haas EDB: http://www.enterprisedb.com
Re: TAP output format in pg_regress
> On 28 Nov 2022, at 20:02, Nikolay Shaplov wrote: > From my reviewer's point of view patch is ready for commit. > > Thank you for your patience :-) Thanks for review. The attached tweaks a few comments and attempts to address the compiler warning error in the CFBot CI. Not sure I entirely agree with the compiler there but here is an attempt to work around it at least (by always copying the va_list for the logfile). It also adds a missing va_end for the logfile va_list. I hope this is close to a final version of this patch (commitmessage needs a bit of work though). > PS Should I change commitfest status? Sure, go ahead. -- Daniel Gustafsson https://vmware.com/ v14-0001-Change-pg_regress-output-format-to-be-TAP-compli.patch Description: Binary data
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 11:57 AM wrote: > Robert Haas: > > I don't know if changing the syntax from A to B is really getting us > > anywhere. I generally agree that the ALTER DEFAULT PRIVILEGES syntax > > looks nicer than the CREATE/ALTER ROLE syntax, but I'm not sure that's > > a sufficient reason to move the control over this behavior to ALTER > > DEFAULT PRIVILEGES. > > Your patch is introducing a new category of role attributes - those that > are affecting default behavior. But there is already a way to express > this right now, and that's ALTER DEFAULT PRIVILEGES in this case. I do not like ALTER DEFAULT PRIVILEGES (ADP) for this. I don't really like defaults, period, for this. The role doing the creation and the role being created are both in scope when the command is executed and if anything it is the role doing the creation that is receiving the privileges not the role being created. For ADP, the role being created gets the privileges and it is objects not in the scope of the executed command that are being affected. > > One thing to consider is that, as I've designed > > this, whether or not ADMIN is included in the grant is non-negotiable. > > I am, at least at present, inclined to think that was the right call, > > partly because Mark Dilger expressed a lot of concern about the > > CREATEROLE user losing control over the role they'd just created, and > > allowing ADMIN to be turned off would have exactly that effect. Plus a > > grant with INHERIT FALSE, SET FALSE, ADMIN FALSE would end up being > > almost identical to no grant at all, which seems pointless. Basically, > > without ADMIN, the implicit GRANT fails to accomplish its intended > > purpose, so I don't like having that as a possibility. > > With how you implemented it right now, is it possible to do the following? 
>
> CREATE ROLE alice;
> REVOKE ADMIN OPTION FOR alice FROM CURRENT_USER;
>
> If the answer is yes, then there is no reason to allow a user to set a shortcut for SET and INHERIT, but not for ADMIN.
>
> If the answer is no, then you could just not allow specifying the ADMIN option in the ALTER DEFAULT PRIVILEGES statement and always force it to be TRUE.

A prior email described that the creation of a role by a CREATEROLE role results in the necessary creation of an ADMIN grant from the creator to the new role, granted by the bootstrap superuser (or, possibly, whichever superuser granted CREATEROLE). That REVOKE will not work, as there would be no existing "grant by current_user over alice granted by current_user" immediately after current_user creates alice.

> Or we could just decide not to, because is it really that hard to just issue a GRANT statement immediately after CREATE ROLE, when you want to have SET or INHERIT options on that role?

The answer to that question was "yes it is too hard" a while back, and as such DEFAULT PRIVILEGES were introduced.

A quick tally of the thread so far:

No Defaults needed: David J., Mark?, Tom?
Defaults needed - attached to role directly: Robert
Defaults needed - defined within Default Privileges: Walther?

The capability itself seems orthogonal to the rest of the patch. I think we can "Fix CREATEROLE" without any feature regarding optional default behaviors, and would suggest this patch be so limited and that another thread be started for discussion of how a default-specifying mechanism should look (assuming one is wanted at all). Let's not let a usability debate distract us from fixing a real problem.

David J.
Re: Bug in wait time when waiting on nested subtransaction
On Mon, 28 Nov 2022 at 18:53, Alvaro Herrera wrote:
> On 2022-Nov-28, Simon Riggs wrote:
>
> > A narrative description of the issue follows:
> > session1 - requests multiple nested subtransactions like this:
> > BEGIN; ...
> > SAVEPOINT subxid1; ...
> > SAVEPOINT subxid2; ...
>
> > However, if subxid2 subcommits, then the lock wait moves from subxid2 to the topxid.
>
> Hmm, do we really do that? Seems very strange .. it sounds to me like the lock should have been transferred to subxid1 (which is subxid2's parent), not to the top-level Xid.

Correct; that is exactly what I'm saying, and why we have a bug since 3c27944fb2141.

> Maybe what the user wanted was to release subxid1 before establishing subxid2? Or do they want to continue to be able to rollback to subxid1 after establishing subxid2? (but why?)

This isn't a description of a user's actions, it is a script that illustrates the bug in XactLockTableWait(). Perhaps a better example would be nested code blocks with EXCEPTION clauses where the outer block fails, e.g.

DO $$
BEGIN
  PERFORM 1;
  BEGIN
    PERFORM 1;
  EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 's2';
  END;
  RAISE division_by_zero;  -- now back in outer subxact, which now fails
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE 's1';
END;
$$;

Of course, debugging this is harder since there is no way to return the current subxid in SQL.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 1:56 PM wrote:
> And now this reason is gone - there is no reason NOT to implement it as DEFAULT PRIVILEGES.

I think there is, and it's this, which you wrote further down:

> In my proposal, the "object" is not the GRANT of that role. It's the role itself. So the default privileges express what should happen when the role is created. The default privileges would NOT affect a regular GRANT role TO role_spec command. They only run that command when a role is created.

I agree that this is what you are proposing, but it is not what your proposed syntax says. Your proposed syntax simply says ALTER DEFAULT PRIVILEGES .. GRANT. Users who read that are going to think it controls the default behavior for all grants, because that's what the syntax says. If the proposed syntax mentioned CREATE ROLE someplace, maybe that would have some potential. A proposal to make a command that controls CREATE ROLE and only CREATE ROLE, and mentions neither CREATE nor ROLE anywhere in the syntax, is never going to be acceptable.

> With how you implemented it right now, is it possible to do the following?
>
> CREATE ROLE alice;
> REVOKE ADMIN OPTION FOR alice FROM CURRENT_USER;
>
> If the answer is yes, then there is no reason to allow a user to set a shortcut for SET and INHERIT, but not for ADMIN.
>
> If the answer is no, then you could just not allow specifying the ADMIN option in the ALTER DEFAULT PRIVILEGES statement and always force it to be TRUE.

It's no. Well, OK, you can do it, but it doesn't revoke anything, because you can only revoke your own grant, not the bootstrap superuser's grant.

> attributes vs. default privileges. Or we could just decide not to, because is it really that hard to just issue a GRANT statement immediately after CREATE ROLE, when you want to have SET or INHERIT options on that role?
It's not difficult in the sense that climbing Mount Everest is difficult, but it makes the user experience as a CREATEROLE non-superuser quite noticeably different from being a superuser. Having a way to paper over such differences is, in my opinion, an important usability feature.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: fixing CREATEROLE
David G. Johnston:
> A quick tally of the thread so far:
>
> No Defaults needed: David J., Mark?, Tom?
> Defaults needed - attached to role directly: Robert
> Defaults needed - defined within Default Privileges: Walther?

s/Walther/Wolfgang

> The capability itself seems orthogonal to the rest of the patch to track these details better. I think we can "Fix CREATEROLE" without any feature regarding optional default behaviors and would suggest this patch be so limited and that another thread be started for discussion of (assuming a default specifying mechanism is wanted overall) how it should look. Let's not let a usability debate distract us from fixing a real problem.

+1

I didn't argue for whether defaults are needed in this case or not. I just said that ADP is better for defaults than role attributes are. Or the other way around: I think role attributes are not a good way to express those.

Personally, I'm in the No Defaults needed camp, too.

Best, Wolfgang
Re: Bug in wait time when waiting on nested subtransaction
On Mon, 28 Nov 2022 at 17:38, Robert Haas wrote:
> On Mon, Nov 28, 2022 at 10:28 AM Simon Riggs wrote:
> > So we have these options
> >
> > 1. Removing the XactLockTableDelete() call in CommitSubTransaction(). That releases lock waiters earlier than expected, which requires pushups in XactLockTableWait() to cope with that (which are currently broken). Why do we do it? To save shmem space in the lock table should anyone want to run a transaction that contains thousands of subtransactions, or more. So option (1) alone would eventually cause us to run out of space in the lock table and a transaction would receive ERRORs rather than be allowed to run for a long time.
>
> This seems unprincipled to me. The point of having a lock on the subtransaction in the lock table is so that people can wait for the subtransaction to end. If we don't remove the lock table entry at subtransaction end, then that lock table entry doesn't serve that purpose any more.

An easy point to confuse: "subtransaction to end". The subtransaction is "still running" to other backends even AFTER it has been subcommitted, but its state now transfers to the parent.

So the subtransaction doesn't cease running until it aborts, one of its parents aborts, or the top level commits. The subxid lock should, on principle, exist until one of those events occurs. It doesn't, but that is an optimization, for the stated reason. (All of the above is current behavior.)

> > 2. In XactLockTableWait(), replace the call to SubTransGetParent(), so we go up the levels one by one as we did before. However, (2) causes huge subtrans contention, and if we implemented that and backpatched it the performance issues could be significant. So my feeling is that if we do (2) then we should not backpatch it.
>
> What I find suspicious about the coding of this function is that it doesn't distinguish between commits and aborts at all.
> Like, if a subtransaction commits, the changes don't become globally visible until the parent transaction also commits. If a subtransaction aborts, though, what happens to the top-level XID doesn't seem to matter at all. The comment seems to agree:
>
> * Note that this does the right thing for subtransactions: if we wait on a
> * subtransaction, we will exit as soon as it aborts or its top parent commits.
>
> I feel like what I'd expect to see given this comment is code which (1) waits until the supplied XID is no longer running, (2) checks whether the XID aborted, and if so return at once, and (3) otherwise recurse to the parent XID. But the code doesn't do that. Perhaps that's not actually the right thing to do, since it seems like a big behavior change, but then I don't understand the comment.

As I mention above, the correct behavior is that the subxact doesn't cease running until it aborts, one of its parents aborts, or the top level commits. Which is slightly different from the comment, which may explain why the bug exists.

> Incidentally, one possible optimization here to try to reduce locking traffic would be to try waiting for the top-parent first using a conditional lock acquisition. If that works, cool. If not, go back around and work up the tree level by level. Since that path would only be taken in the unhappy case where we're probably going to have to wait anyway, the cost probably wouldn't be too bad.

That sounds like a potential bug fix (not just an optimization).

> > My preferred solution would be a mix of the above, call it option (3)
> >
> > 3.
> > a) Include the lock table entry for the first 64 subtransactions only, so that we limit shmem. For those first 64 entries, have the subtrans point direct to top, since this makes a subtrans lookup into an O(1) operation, which is important for performance of later actions.
> > b) For any subtransactions after the first 64, delete the subxid lock on subcommit, to save shmem, but make subtrans point to the immediate parent (only), not the topxid. That fixes the bug, but causes performance problems in XidInMVCCSnapshot() and others, so we also do c) and d)
> >
> > c) At top level commit, go back and alter subtrans again for subxids so now it points to the topxid, so that we avoid O(N) behavior in XidInMVCCSnapshot() and other callers. Additional cost for longer transactions, but it saves considerable cost in later callers that need to call GetTopmostTransaction.
> >
> > d) Optimize SubTransGetTopmostTransaction() so it retrieves entries page-at-a-time. This will reduce the contention of repeatedly re-visiting the same page(s) and ensure that a page is less often paged out when we are still using it.
>
> I'm honestly not very sanguine about changing pg_subtrans to point straight to the topmost XID. It's only OK to do that if there's absolutely nothing that needs to know the full tree structure, and the present case looks like an instance w
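The level-by-level rule being debated here (a subxact keeps "running" until it aborts, an ancestor aborts, or the top level commits) can be sketched as a toy model. This is deliberately not PostgreSQL code — the names and data structures are invented for illustration only — but it shows why an abort at any level should end the wait, while commits must propagate all the way to the top:

```python
# Toy model (not the real XactLockTableWait()) of the level-by-level wait
# logic: once an xid stops running, an abort ends the wait immediately,
# while a subcommit transfers the waiter to the immediate parent.

ABORTED, COMMITTED = "aborted", "committed"

def wait_for_xact(xid, parent, outcome):
    """Return the xid whose fate finally unblocks the waiter.

    parent:  dict mapping a subxid to its immediate parent xid
    outcome: dict mapping each finished xid to ABORTED or COMMITTED
    """
    while True:
        # In the real code this step is a lock wait; in this toy model
        # every xid has already finished.
        if outcome[xid] == ABORTED:
            return xid            # an abort anywhere ends the wait
        if xid not in parent:
            return xid            # top-level commit: changes now visible
        xid = parent[xid]         # subcommit: fate transfers to the parent

# subxid2 (102) subcommitted into subxid1 (101), which subcommitted into
# the top-level xact (100), which then committed:
parent = {102: 101, 101: 100}
outcome = {102: COMMITTED, 101: COMMITTED, 100: COMMITTED}
print(wait_for_xact(102, parent, outcome))  # -> 100

# Same tree, but subxid1 rolled back: the wait ends at 101, not 100.
outcome = {102: COMMITTED, 101: ABORTED, 100: COMMITTED}
print(wait_for_xact(102, parent, outcome))  # -> 101
```

The bug under discussion corresponds to jumping straight from 102 to the topmost xid 100 instead of walking through 101, which loses the chance to notice 101's abort.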
Re: Bug in wait time when waiting on nested subtransaction
On Mon, Nov 28, 2022 at 2:45 PM Simon Riggs wrote:
> An easy point to confuse: "subtransaction to end". The subtransaction is "still running" to other backends even AFTER it has been subcommitted, but its state now transfers to the parent.
>
> So the subtransaction doesn't cease running until it aborts, one of its parents aborts, or the top level commits. The subxid lock should, on principle, exist until one of those events occurs. It doesn't, but that is an optimization, for the stated reason.

That's not what "running" means to me. Running means it's started and hasn't yet committed or rolled back.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 12:42 PM wrote:
> David G. Johnston:
> > A quick tally of the thread so far:
> >
> > No Defaults needed: David J., Mark?, Tom?
> > Defaults needed - attached to role directly: Robert
> > Defaults needed - defined within Default Privileges: Walther?
>
> s/Walther/Wolfgang

Sorry 'bout that, I was just reading the To: line in my email reply.

> Personally, I'm in the No Defaults needed camp, too.

I kinda thought so from your final comments, thanks for clarifying.

David J.
Re: fixing CREATEROLE
Robert Haas:
> > In my proposal, the "object" is not the GRANT of that role. It's the role itself. So the default privileges express what should happen when the role is created. The default privileges would NOT affect a regular GRANT role TO role_spec command. They only run that command when a role is created.
>
> I agree that this is what you are proposing, but it is not what your proposed syntax says. Your proposed syntax simply says ALTER DEFAULT PRIVILEGES .. GRANT. Users who read that are going to think it controls the default behavior for all grants, because that's what the syntax says. If the proposed syntax mentioned CREATE ROLE someplace, maybe that would have some potential. A proposal to make a command that controls CREATE ROLE and only CREATE ROLE and mentions neither CREATE nor ROLE anywhere in the syntax is never going to be acceptable.

Yes, I agree - the abbreviated GRANT syntax is confusing/misleading in that case. Consistent with the other syntaxes, but easily confused nonetheless.

> It's no. Well, OK, you can do it, but it doesn't revoke anything, because you can only revoke your own grant, not the bootstrap superuser's grant.

Ah, I see. I didn't get that difference regarding the bootstrap superuser so far. So in that sense, the ADP GRANT would be an additional GRANT, issued by the user that created the role, in addition to the bootstrap superuser's grant.

You can't revoke the bootstrap superuser's grant - but you can't modify it either. And there is no need to add SET or INHERIT to the bootstrap superuser's grant, because you can grant the role yourself again, with those options. I think it would be very strange to have a default for that bootstrap superuser's grant. Or rather: a different default than the minimum required - and that's just ADMIN, not SET, not INHERIT. When you have the minimum, you can always choose to grant SET and INHERIT later on yourself - and revoke it, too!
But when the SET and INHERIT are on the bootstrap superuser's grant, there is no way for you to revoke SET or INHERIT on that grant later. Why should the superuser, who gave you CREATEROLE, insist on you having SET or INHERIT forever and disallow revoking it from yourself?

Best, Wolfgang
Re: Failed Assert in pgstat_assoc_relation
Andres Freund writes:
> On 2022-11-28 13:37:16 -0500, Tom Lane wrote:
>> As far as HEAD is concerned, maybe it's time to nuke the whole
>> convert-table-to-view kluge entirely? Only pg_dump older than
>> 9.4 will emit such code, so we're really about out of reasons
>> to keep on maintaining it.

> Sounds good to me.

Here's a draft patch for that. If we apply this to HEAD then we only need that klugery in relcache for v15.

regards, tom lane

diff --git a/doc/src/sgml/rules.sgml b/doc/src/sgml/rules.sgml
index be3d17c852..3c9459b648 100644
--- a/doc/src/sgml/rules.sgml
+++ b/doc/src/sgml/rules.sgml
@@ -280,14 +280,16 @@
 Views in PostgreSQL are implemented
-using the rule system. In fact, there is essentially no difference
-between:
+using the rule system. A view is basically an empty table (having no
+actual storage) with an ON SELECT DO INSTEAD rule.
+Conventionally, that rule is named _RETURN.
+So a view like

 CREATE VIEW myview AS SELECT * FROM mytab;

-compared against the two commands:
+is very nearly the same thing as

 CREATE TABLE myview (same column list as mytab);
@@ -295,13 +297,17 @@
 CREATE RULE "_RETURN" AS ON SELECT TO myview DO INSTEAD
 SELECT * FROM mytab;

-because this is exactly what the CREATE VIEW
-command does internally. This has some side effects. One of them
-is that the information about a view in the
-PostgreSQL system catalogs is exactly
-the same as it is for a table. So for the parser, there is
-absolutely no difference between a table and a view. They are the
-same thing: relations.
+although you can't actually write that, because tables are not
+allowed to have ON SELECT rules.
+
+
+A view can also have other kinds of DO INSTEAD
+rules, allowing INSERT, UPDATE,
+or DELETE commands to be performed on the view
+despite its lack of underlying storage.
+This is discussed further below, in
+.
@@ -,7 +1117,7 @@
 SELECT word FROM words ORDER BY word <-> 'caterpiler' LIMIT 10;

 Rules that are defined on INSERT,
 UPDATE, and DELETE are
 significantly different from the view rules
-described in the previous section. First, their CREATE
+described in the previous sections. First, their CREATE
 RULE command allows more:

diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index db45d8a08b..8ac2c81b06 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -239,7 +239,6 @@ DefineQueryRewrite(const char *rulename,
 	Relation	event_relation;
 	ListCell   *l;
 	Query	   *query;
-	bool		RelisBecomingView = false;
 	Oid			ruleId = InvalidOid;
 	ObjectAddress address;
@@ -311,7 +310,18 @@ DefineQueryRewrite(const char *rulename,
 	/*
 	 * Rules ON SELECT are restricted to view definitions
 	 *
-	 * So there cannot be INSTEAD NOTHING, ...
+	 * So this had better be a view, ...
+	 */
+	if (event_relation->rd_rel->relkind != RELKIND_VIEW &&
+		event_relation->rd_rel->relkind != RELKIND_MATVIEW)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("relation \"%s\" cannot have ON SELECT rules",
+						RelationGetRelationName(event_relation)),
+				 errdetail_relkind_not_supported(event_relation->rd_rel->relkind)));
+
+	/*
+	 * ... there cannot be INSTEAD NOTHING, ...
 	 */
 	if (action == NIL)
 		ereport(ERROR,
@@ -407,93 +417,6 @@ DefineQueryRewrite(const char *rulename,
 						ViewSelectRuleName)));
 		rulename = pstrdup(ViewSelectRuleName);
 	}
-
-	/*
-	 * Are we converting a relation to a view?
-	 *
-	 * If so, check that the relation is empty because the storage for the
-	 * relation is going to be deleted.  Also insist that the rel not be
-	 * involved in partitioning, nor have any triggers, indexes, child or
-	 * parent tables, RLS policies, or RLS enabled.  (Note: some of these
-	 * tests are too strict, because they will reject relations that once
-	 * had such but don't anymore.
-	 * But we don't really care, because this
-	 * whole business of converting relations to views is just an obsolete
-	 * kluge to allow dump/reload of views that participate in circular
-	 * dependencies.)
-	 */
-	if (event_relation->rd_rel->relkind != RELKIND_VIEW &&
-		event_relation->rd_rel->relkind != RELKIND_MATVIEW)
-	{
-		TableScanDesc scanDesc;
-		Snapshot	snapshot;
-		TupleTableSlot *slot;
-
-		if (event_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-					 errmsg("cannot convert partitioned table \"%s\" to a view",
-							RelationGetRelationName(event_relation))));
-
-		/* only case left: */
-		Assert(event_relation->rd_rel->relkind == RELKIND_RELATION);
-
-		if (event_relation->rd_rel->relispartition)
-			ereport(ERROR,
-					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-					 errmsg("cannot convert partition \"%s\"
Re: Bug in wait time when waiting on nested subtransaction
Robert Haas writes:
> That's not what "running" means to me. Running means it's started and hasn't yet committed or rolled back.

A subxact definitely can't be considered committed until its topmost parent commits. However, it could be known to be rolled back before its parent. IIUC, the current situation is that we don't take advantage of the latter case but just wait for the topmost parent.

One thing we need to be pretty careful of here is to not break the promise of atomic commit. At topmost commit, all subxacts must appear committed simultaneously. It's not quite clear to me whether we need a similar guarantee in the rollback case. It seems like we shouldn't, but maybe I'm missing something, in which case maybe the current behavior is correct?

regards, tom lane
Re: fixing CREATEROLE
> On Nov 28, 2022, at 11:34 AM, David G. Johnston wrote:
>
> No Defaults needed: David J., Mark?, Tom?

As Robert has the patch organized, I think defaults are needed, but I see that as a strike against the patch.

> Defaults needed - attached to role directly: Robert
> Defaults needed - defined within Default Privileges: Walther?
>
> The capability itself seems orthogonal to the rest of the patch to track these details better. I think we can "Fix CREATEROLE" without any feature regarding optional default behaviors and would suggest this patch be so limited and that another thread be started for discussion of (assuming a default specifying mechanism is wanted overall) how it should look. Let's not let a usability debate distract us from fixing a real problem.

In Robert's initial email, he wrote, "It seems to me that the root of any fix in this area must be to change the rule that CREATEROLE can administer any role whatsoever." The obvious way to fix that is to revoke that rule and instead automatically grant ADMIN OPTION to a creator over any role they create. That's problematic, though, because as things stand, ADMIN OPTION is granted to somebody by granting them membership in the administered role WITH ADMIN OPTION, so membership in the role and administration of the role are conflated.

Robert's patch tries to deal with the (possibly unwanted) role membership by setting up defaults to mitigate the effects, but that is more confusing to me than just de-conflating role membership from role administration, and giving role creators administration over roles they create, without in so doing giving them role membership. I don't recall enough details about how hard it is to de-conflate role membership from role administration, and maybe that's a non-starter for reasons I don't recall at the moment. I expect Robert has already contemplated that idea and instead proposed this patch for good reasons. Robert?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: fixing CREATEROLE
Mark Dilger:
> Robert's patch tries to deal with the (possibly unwanted) role membership by setting up defaults to mitigate the effects, but that is more confusing to me than just de-conflating role membership from role administration, and giving role creators administration over roles they create, without in so doing giving them role membership. I don't recall enough details about how hard it is to de-conflate role membership from role administration, and maybe that's a non-starter for reasons I don't recall at the moment.

Isn't this just GRANT .. WITH SET FALSE, INHERIT FALSE, ADMIN TRUE? That should allow role administration, without actually granting membership in that role, yet, right?

Best, Wolfgang
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 3:02 PM Mark Dilger wrote:
> Robert's patch tries to deal with the (possibly unwanted) role membership by setting up defaults to mitigate the effects, but that is more confusing to me than just de-conflating role membership from role administration, and giving role creators administration over roles they create, without in so doing giving them role membership. I don't recall enough details about how hard it is to de-conflate role membership from role administration, and maybe that's a non-starter for reasons I don't recall at the moment. I expect Robert has already contemplated that idea and instead proposed this patch for good reasons. Robert?

"De-conflating role membership from role administration" isn't really a specific proposal that someone can go out and implement. You have to make some decision about *how* you are going to separate those concepts. And that's what I did: I made INHERIT and SET into grant-level options. That allows you to give someone access to the privileges of a role without the ability to administer it (at least one of INHERIT and SET true, and ADMIN false), or the ability to administer a role without having any direct access to its privileges (INHERIT FALSE, SET FALSE, ADMIN TRUE). I don't see that we can, or need to, separate things any more than that.

You can argue that a grant with INHERIT FALSE, SET FALSE, ADMIN TRUE still grants membership, and I think formally that's true, but I also think it's just picking something to bicker about. The need isn't to separate membership per se from administration. It's to separate privilege inheritance and the ability to SET ROLE from role administration. And I've done that.

I strongly disagree with the idea that the ability for users to control defaults here isn't needed. You can set a default tablespace for your database, and a default tablespace for your session, and a default tablespace for new partitions of an existing partitioned table.
You can set default privileges for every type of object you can create, and a default search path to find objects in the database. You can set defaults for all of your connection parameters to the database using environment variables, and the default data directory for commands that need one. You can set defaults for all of your psql settings in ~/.psqlrc. You can set defaults for the character sets, locales and collations of new databases. You can set the default version of an extension in the control file, so that the user doesn't have to specify a version. And so on and so on.

There's absolutely scads of things for which it is useful to be able to set defaults and for which we give people the ability to set defaults, and I don't think anyone is making a real argument for why that isn't also true here. The argument that has been made is essentially that you could get by without it, but that's true of *every* default. Yet we keep adding the ability to set defaults for new things, and to set the defaults for existing things in new ways, and there's a very good reason for that: it's extremely convenient. And that's true here, too.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: fixing CREATEROLE
> On Nov 28, 2022, at 12:08 PM, walt...@technowledgy.de wrote:
>
> Isn't this just GRANT .. WITH SET FALSE, INHERIT FALSE, ADMIN TRUE? That should allow role administration, without actually granting membership in that role, yet, right?

Can you clarify what you mean here? Are you inventing a new syntax?

+GRANT bob TO alice WITH SET FALSE, INHERIT FALSE, ADMIN TRUE;
+ERROR:  syntax error at or near "SET"
+LINE 1: GRANT bob TO alice WITH SET FALSE, INHERIT FALSE, ADMIN TRUE...

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi,

On 2022-11-28 18:34:20 +0300, Nikita Malakhov wrote:
> While working on Pluggable TOAST we've detected a defective behavior on tables with large amounts of TOASTed data - queries freeze and DB stalls. Further investigation led us to the loop with GetNewOidWithIndex function call - when all available Oid already exist in the related TOAST table this loop continues infinitely. Data type used for value ID is the UINT32, which is unsigned int and has a maximum value of *4294967295* which allows maximum 4294967295 records in the TOAST table. It is not a very big amount for modern databases and is a major problem for production systems.

I don't think the absolute number is the main issue - by default external toasting will happen only for bigger datums. 4 billion external datums typically use a lot of space.

If you hit this easily with your patch, then you likely broke the conditions under which external toasting happens.

IMO the big issue is the global oid counter making it much easier to hit oid wraparound. Due to that we end up assigning oids that conflict with existing toast oids much sooner than 4 billion toasted datums.

I think the first step to improve the situation is to not use a global oid counter for toasted values. One way to do that would be to use the sequence code to do oid assignment, but we likely can find a more efficient representation.

Eventually we should do the obvious thing and make toast ids 64bit wide - to combat the space usage we likely should switch to representing the ids as variable width integers or such, otherwise the space increase would likely be prohibitive.

> Quick fix for this problem is limiting GetNewOidWithIndex loops to some reasonable amount defined by a related macro and returning an error if there is still no available Oid. Please check the attached patch; any feedback is appreciated.

This feels like the wrong spot to tackle the issue.
For one, most of the looping will be in GetNewOidWithIndex(), so limiting looping in toast_save_datum() won't help much. For another, if the limiting were in the right place, it'd break currently working cases: due to oid wraparound it's pretty easy to hit "ranges" of allocated oids without even getting close to 2^32 toasted datums.

Greetings,

Andres Freund
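To make the space argument concrete, here is a minimal sketch of one way "variable width integers" could work — an unsigned LEB128-style encoding with 7 payload bits per byte and the high bit as a continuation flag. This is only an illustration of the general technique (in Python rather than C, and not a proposal for an actual on-disk format):

```python
# Hedged sketch of a variable-width encoding for 64-bit ids: small ids
# stay small on disk, while the full 64-bit range remains representable.

def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer, 7 bits per byte, LSB group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode the encoding produced by encode_varint()."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

# An id below 2^28 costs at most 4 bytes -- no wider than today's 4-byte
# Oid -- while ids up to 2^64 - 1 fit in at most 10 bytes:
assert len(encode_varint(100)) == 1
assert len(encode_varint(2**28 - 1)) == 4
assert decode_varint(encode_varint(2**63)) == 2**63
```

Under this kind of scheme the common case (modest per-table id counters) pays little or no space penalty relative to a fixed 4-byte id, which is the trade-off being alluded to above.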
Re: fixing CREATEROLE
> On Nov 28, 2022, at 12:33 PM, Mark Dilger wrote:
>
>> Isn't this just GRANT .. WITH SET FALSE, INHERIT FALSE, ADMIN TRUE? That should allow role administration, without actually granting membership in that role, yet, right?
>
> Can you clarify what you mean here? Are you inventing a new syntax?

Nevermind. After reading Robert's email, it's clear enough what you mean here.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi!

We've already encountered this issue on large production databases, and 4 billion rows is not so much for modern databases, so this issue already arises from time to time and would arise more and more often. I agree that the global oid counter is the main issue, and a better solution would be local counters with a larger datatype for the value id. This is the right way to solve this issue, although it would take some time. As I understand it, the global counter was chosen because it looked like the fastest way of getting a unique ID. Ok, I'll prepare a patch with it.

> Due to that we end up assigning oids that conflict with existing toast oids much sooner than 4 billion toasted datums.

Just a note: the global oid is checked against the related TOAST table only, so equal oids in different TOAST tables would not collide.

> Eventually we should do the obvious thing and make toast ids 64bit wide - to combat the space usage we likely should switch to representing the ids as variable width integers or such, otherwise the space increase would likely be prohibitive.

I'm already working on it, but I thought that a 64-bit value ID won't be easily accepted by the community. I'd be very thankful for any advice on this.

On Mon, Nov 28, 2022 at 11:36 PM Andres Freund wrote:
> Hi,
>
> On 2022-11-28 18:34:20 +0300, Nikita Malakhov wrote:
> > While working on Pluggable TOAST we've detected a defective behavior on tables with large amounts of TOASTed data - queries freeze and DB stalls. Further investigation led us to the loop with GetNewOidWithIndex function call - when all available Oid already exist in the related TOAST table this loop continues infinitely. Data type used for value ID is the UINT32, which is unsigned int and has a maximum value of *4294967295* which allows maximum 4294967295 records in the TOAST table. It is not a very big amount for modern databases and is the major problem for productive systems.
>
> I don't think the absolute number is the main issue - by default external toasting will happen only for bigger datums. 4 billion external datums typically use a lot of space.
>
> If you hit this easily with your patch, then you likely broke the conditions under which external toasting happens.
>
> IMO the big issue is the global oid counter making it much easier to hit oid wraparound. Due to that we end up assigning oids that conflict with existing toast oids much sooner than 4 billion toasted datums.
>
> I think the first step to improve the situation is to not use a global oid counter for toasted values. One way to do that would be to use the sequence code to do oid assignment, but we likely can find a more efficient representation.
>
> Eventually we should do the obvious thing and make toast ids 64bit wide - to combat the space usage we likely should switch to representing the ids as variable width integers or such, otherwise the space increase would likely be prohibitive.
>
> > Quick fix for this problem is limiting GetNewOidWithIndex loops to some reasonable amount defined by related macro and returning error if there is still no available Oid. Please check attached patch, any feedback is appreciated.
>
> This feels like the wrong spot to tackle the issue. For one, most of the looping will be in GetNewOidWithIndex(), so limiting looping in toast_save_datum() won't help much. For another, if the limiting were in the right place, it'd break currently working cases. Due to oid wraparound it's pretty easy to hit "ranges" of allocated oids, without even getting close to 2^32 toasted datums.
>
> Greetings,
>
> Andres Freund

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
On Sun, Nov 27, 2022 at 11:26:50AM -0800, Andrey Borodin wrote: > Some funny stuff. If a user tries to cancel a non-replicated transaction > Azure Postgres will answer: "user requested cancel while waiting for > synchronous replication ack. The COMMIT record has already flushed to > WAL locally and might not have been replicatead to the standby. We > must wait here." > AWS RDS will answer: "ignoring request to cancel wait for synchronous > replication" > Yandex Managed Postgres will answer: "canceling wait for synchronous > replication due requested, but cancelation is not allowed. The > transaction has already committed locally and might not have been > replicated to the standby. We must wait here." > > So, for many services providing Postgres as a service it's only a > matter of wording. Wow, you are telling me all three cloud vendors changed how query cancel behaves on an unresponsive synchronous replica? That is certainly a testament that the community needs to change or at least review our behavior. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
On Mon, Nov 28, 2022 at 12:03:06PM +0530, Bharath Rupireddy wrote: > Thanks for verifying the behaviour. And many thanks for an off-list chat. > > FWIW, I'm planning to prepare a patch as per the below idea, which is > something similar to the initial proposal in this thread. Meanwhile, > thoughts are welcome. > > 1. Disable query cancel/CTRL+C/SIGINT when a backend is waiting for > sync replication acknowledgement. > 2. Process proc die immediately when a backend is waiting for sync > replication acknowledgement, as it does today; however, upon restart, > don't open up for business (don't accept read-only connections) > unless the sync standbys have caught up. You can prepare a patch, but it is unlikely to get much interest until you get agreement on what the behavior should be. The optimal order of developer actions is: Desirability -> Design -> Implement -> Test -> Review -> Commit https://wiki.postgresql.org/wiki/Todo#Development_Process Telling us what other cloud vendors do is not sufficient. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, On 2022-11-28 23:54:53 +0300, Nikita Malakhov wrote: > We've already encountered this issue on large production databases, and > 4 billion rows is not so much for modern bases, so this issue already arises > from time to time and would arise more and more often. Was the issue that you exceeded 4 billion toasted datums, or that assignment took a long time? How many toast datums did you actually have? Was this due to very wide rows leading to even small datums getting toasted? Greetings, Andres Freund
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Andres Freund writes: > I think the first step to improve the situation is to not use a global oid > counter for toasted values. One way to do that would be to use the sequence > code to do oid assignment, but we likely can find a more efficient > representation. I don't particularly buy that, because the only real fix is this: > Eventually we should do the obvious thing and make toast ids 64bit wide and if we do that there'll be no need to worry about multiple counters. > - to > combat the space usage we likely should switch to representing the ids as > variable width integers or such, otherwise the space increase would likely be > prohibitive. And I don't buy that either. An extra 4 bytes with a 2K payload is not "prohibitive", it's more like "down in the noise". I think if we switch to int8 keys and widen the global OID counter to 8 bytes (using just the low 4 bytes for other purposes), we'll have a perfectly fine solution. There is still plenty of work to be done under that plan, because of the need to maintain backward compatibility for existing TOAST tables --- and maybe people would want an option to keep on using them, for non-enormous tables? If we add additional work on top of that, it'll just mean that it will take longer to have any solution. >> Quick fix for this problem is limiting GetNewOidWithIndex loops to some >> reasonable amount defined by related macro and returning error if there is >> still no available Oid. Please check attached patch, any feedback is >> appreciated. > This feels like the wrong spot to tackle the issue. Yeah, that is completely horrid. It does not remove the existing failure mode, just changes it to have worse consequences. regards, tom lane
Re: Failed Assert in pgstat_assoc_relation
Hi, On 2022-11-28 10:50:13 -0800, Andres Freund wrote: > On 2022-11-28 13:37:16 -0500, Tom Lane wrote: > > Uh-huh. I've not bothered to trace this in detail, but presumably > > what is happening is that the first CREATE RULE converts the table > > to a view, and then the ROLLBACK undoes that so far as the catalogs > > are concerned, but probably doesn't undo related pg_stats state > > changes fully. Then we're in a bad state that will cause problems. > > (It still crashes if you replace the second CREATE RULE with > > "select * from t".) > > Yea. I haven't yet fully traced through this, but presumably relcache inval > doesn't fix this because we don't want to lose pending stats after DDL. > > Perhaps we need to add a rule about not swapping pgstat* in > RelationClearRelation() when relkind changes? Something like the attached. Still needs a bit of polish, e.g. adding the test case from above. I'm a bit uncomfortable adding a function call below * Perform swapping of the relcache entry contents. Within this * process the old entry is momentarily invalid, so there *must* be no * possibility of CHECK_FOR_INTERRUPTS within this sequence. Do it in * all-in-line code for safety. but it's not the first, see MemoryContextSetParent().
Greetings, Andres Freund

diff --git i/src/backend/utils/activity/pgstat_relation.c w/src/backend/utils/activity/pgstat_relation.c
index f92e16e7af8..1158e09c24f 100644
--- i/src/backend/utils/activity/pgstat_relation.c
+++ w/src/backend/utils/activity/pgstat_relation.c
@@ -158,7 +158,7 @@ pgstat_unlink_relation(Relation rel)
 		return;
 
 	/* link sanity check */
-	Assert(rel->pgstat_info->relation == rel);
+	Assert(rel->pgstat_info->relation->rd_rel->oid == rel->rd_rel->oid);
 	rel->pgstat_info->relation = NULL;
 	rel->pgstat_info = NULL;
 }
diff --git i/src/backend/utils/cache/relcache.c w/src/backend/utils/cache/relcache.c
index bd6cd4e47b5..69237025c20 100644
--- i/src/backend/utils/cache/relcache.c
+++ w/src/backend/utils/cache/relcache.c
@@ -2626,6 +2626,7 @@ RelationClearRelation(Relation relation, bool rebuild)
 		bool		keep_rules;
 		bool		keep_policies;
 		bool		keep_partkey;
+		bool		keep_pgstats;
 
 		/* Build temporary entry, but don't link it into hashtable */
 		newrel = RelationBuildDesc(save_relid, false);
@@ -2666,6 +2667,8 @@
 		keep_policies = equalRSDesc(relation->rd_rsdesc, newrel->rd_rsdesc);
 		/* partkey is immutable once set up, so we can always keep it */
 		keep_partkey = (relation->rd_partkey != NULL);
+		/* keep stats, except when converting tables into views etc */
+		keep_pgstats = relation->rd_rel->relkind == newrel->rd_rel->relkind;
 
 		/*
 		 * Perform swapping of the relcache entry contents.  Within this
@@ -2720,9 +2723,16 @@ RelationClearRelation(Relation relation, bool rebuild)
 		SWAPFIELD(RowSecurityDesc *, rd_rsdesc);
 		/* toast OID override must be preserved */
 		SWAPFIELD(Oid, rd_toastoid);
+
 		/* pgstat_info / enabled must be preserved */
-		SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
-		SWAPFIELD(bool, pgstat_enabled);
+		if (keep_pgstats)
+		{
+			SWAPFIELD(struct PgStat_TableStatus *, pgstat_info);
+			SWAPFIELD(bool, pgstat_enabled);
+		}
+		else
+			pgstat_unlink_relation(relation);
+
 		/* preserve old partition key if we have one */
 		if (keep_partkey)
 		{
Re: Add 64-bit XIDs into PostgreSQL 15
On Mon, Nov 28, 2022 at 8:53 AM Robert Haas wrote: > It is true that if the table is progressively bloating, it is likely > to be more bloated by the time you are 8 billion XIDs behind than it > was when you were 800 million XIDs behind. I don't see that as a very > good reason not to adopt this patch, because you can bloat the table > by an arbitrarily large amount while consuming only a small number of > XiDs, even just 1 XID. Protecting against bloat is good, but shutting > down the database when the XID age reaches a certain value is not a > particularly effective way of doing that, so saying that we'll be > hurting people by not shutting down the database at the point where we > do so today doesn't ring true to me. I can't speak for Chris, but I think that almost everybody will agree on this much, without really having to think about it. It's easy to see that having more XID space is, in general, strictly a good thing. If there was a low risk way of getting that benefit, then I'd be in favor of it. Here's the problem that I see with this patch: I don't think that the risks are commensurate with the benefits. I can imagine being in favor of an even more invasive patch that (say) totally removes the concept of freezing, but that would have to be a very different sort of design. > Philosophically, I disagree with the idea of shutting down the > database completely in any situation in which a reasonable alternative > exists. Losing read and write availability is really bad, and I don't > think it's what users want. At a certain point it may make more sense to activate XidStopLimit protections (which will only prevent new XID allocations) instead of getting further behind on freezing, even in a world where we're never strictly obligated to activate XidStopLimit. It may in fact be the lesser evil, even with 64-bit XIDs -- because we still have to freeze, and the factors that drive when and how we freeze mostly aren't changed. 
Fundamentally, when we're falling behind on freezing, at a certain point we can expect to keep falling behind -- unless some kind of major shift happens. That's just how freezing works, with or without 64-bit XIDs/MXIDs. If VACUUM isn't keeping up with the allocation of transactions, then the system is probably misconfigured in some way. We should do our best to signal this as early and as frequently as possible, and we should mitigate specific hazards (e.g. old temp tables) if at all possible. We should activate the failsafe (which, incidentally, the patch just removes) when things really start to look dicey. These mitigations may be very effective, but in the final analysis they don't address the fundamental weakness in freezing. Granted, the specifics of the current XidStopLimit mechanism are unlikely to directly carry over to 64-bit XIDs. XidStopLimit is structured in a way that doesn't actually consider freeze debt in units like unfrozen pages. Like Chris, I just don't see why the patch obviates the need for something like XidStopLimit, since the patch doesn't remove freezing. An improved XidStopLimit mechanism might even end up kicking in *before* the oldest relfrozenxid reached 2 billion XIDs, depending on the specifics. Removing the failsafe mechanism seems misguided to me for similar reasons. I recently learned that Amazon RDS has set a *lower* vacuum_failsafe_age default than the standard default (lowering it from the standard 1.6 billion to only 1.2 billion on RDS). This decision predates my joining AWS. It seems as if practical experience has shown that allowing any table's age(relfrozenxid) to get too far past a billion is not a good idea. At least it's not a good idea on modern Postgres versions, which have the freeze map. We really shouldn't have to rely on having billions of XIDs available in the first place -- XID space isn't really a fungible commodity.
It's much more important to recognize that something (some specific thing) just isn't working as designed, which in general could be pretty far removed from freezing. For example, index corruption could do it (at least without the failsafe). Some kind of autovacuum starvation could do it. It's almost always more complicated than "not enough available XID space". -- Peter Geoghegan
Re: Bug in wait time when waiting on nested subtransaction
On Mon, Nov 28, 2022 at 3:01 PM Tom Lane wrote: > One thing we need to be pretty careful of here is to not break the > promise of atomic commit. At topmost commit, all subxacts must > appear committed simultaneously. It's not quite clear to me whether > we need a similar guarantee in the rollback case. It seems like > we shouldn't, but maybe I'm missing something, in which case maybe > the current behavior is correct? AFAICS, the behavior that Simon is complaining about here is an exception to the way things work in general, and therefore his complaint is well-founded. AbortSubTransaction() arranges to release resources, whereas CommitSubTransaction() arranges to have them transferred to the parent transaction. This isn't necessarily obvious from looking at those functions, but if you drill down into the AtEO(Sub)Xact functions that they call, that's what happens in nearly all cases. Given that subtransaction abort releases resources immediately, it seems pretty fair to wonder what the value is in waiting for its parent or the topmost transaction. I don't see how that can be necessary for correctness. The commit to which Simon points (3c27944fb2141) seems to have inadvertently changed the behavior while trying to fix a bug and improve performance. I remember being a bit skeptical about that fix at the time. Back in the day, you couldn't XactLockTableWait() unless you knew that the transaction had already started. That commit tried to make it so that you could XactLockTableWait() earlier, because that's apparently something that logical replication needs to do. But that is a major redefinition of the charter of that function, and I am wondering whether it was a mistake to fold together the thing that we need in normal cases (which is to wait for a transaction we know has started and may not have finished) with the thing we need in the logical decoding case (which apparently has different requirements).
Maybe we should have solved that problem by finding a way to wait for the transaction to start, and then afterwards wait for it to end. Or maybe we should have altogether different entrypoints for the two requirements. Or maybe using one function is fine but we just need it to be more correct. I'm not really sure. In short, I think Simon is right that there's a problem and right about which commit caused it, but I'm not sure what I think we ought to do about it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 1:28 PM Robert Haas wrote: > On Mon, Nov 28, 2022 at 3:02 PM Mark Dilger > wrote: > > You can argue that a grant with INHERIT FALSE, SET FALSE, ADMIN TRUE > still grants membership, and I think formally that's true, but I also > think it's just picking something to bicker about. The need isn't to > separate membership per se from administration. It's to separate > privilege inheritance and the ability to SET ROLE from role > administration. And I've done that. > We seem to now be in agreement on this design choice, and the related bit about the bootstrap superuser being granted admin on roles newly created by the createrole user. This seems like a patch in its own right. It still leaves open the default membership behavior as well as whether we want to rework the attributes into predefined roles. > I strongly disagree with the idea that the ability for users to > control defaults here isn't needed. That's fine, but are you saying it is infeasible (or simply undesirable) to separate the parts about handling defaults from the parts that define how the system works with a given set of permissions, and from the one implementation detail of having the bootstrap superuser automatically granted admin on any roles a createrole user creates? If you and others feel strongly about defaults, I'm sure that the suggested other thread focused on them will get attention and be committed in a timely manner. But the system will work, and not be broken, if that gets stalled, and defaults could be added in later. David J.
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, On 2022-11-28 16:04:12 -0500, Tom Lane wrote: > Andres Freund writes: > > - to > > combat the space usage we likely should switch to representing the ids as > > variable width integers or such, otherwise the space increase would likely > > be > > prohibitive. > > And I don't buy that either. An extra 4 bytes with a 2K payload is not > "prohibitive", it's more like "down in the noise". The space usage for the toast relation + index itself is indeed irrelevant. Where it's not "down in the noise" is in struct varatt_external, i.e. references to external toast datums. The size of that already is an issue. Greetings, Andres Freund
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, Andres Freund writes: >Was the issue that you exceeded 4 billion toasted datums, or that assignment >took a long time? How many toast datums did you actually have? Was this due to >very wide rows leading to even small datums getting toasted? Yep, we had 4 billion toasted datums. Note that currently a relation has a single TOAST table for all toastable attributes, so it is not that difficult to get to 4 billion toasted values. >I think if we switch to int8 keys and widen the global OID counter to 8 >bytes (using just the low 4 bytes for other purposes), we'll have a >perfectly fine solution. There is still plenty of work to be done under >that plan, because of the need to maintain backward compatibility for >existing TOAST tables --- and maybe people would want an option to keep on >using them, for non-enormous tables? If we add additional work on top of >that, it'll just mean that it will take longer to have any solution. I agree, but: 1) The global OID counter is used not only for TOAST, so there would be a lot of places where the short counter (the low part of the 64-bit OID, if we go with that) is used; 2) Upgrading to a 64-bit ID would require re-toasting old TOAST tables. Or is there some way to distinguish old tables from new ones? But I don't see any reason to keep an old short ID as an option. ... >Yeah, that is completely horrid. It does not remove the existing failure >mode, just changes it to have worse consequences. On Tue, Nov 29, 2022 at 12:04 AM Tom Lane wrote: > Andres Freund writes: > > I think the first step to improve the situation is to not use a global > oid > > counter for toasted values. One way to do that would be to use the > sequence > > code to do oid assignment, but we likely can find a more efficient > > representation. > > I don't particularly buy that, because the only real fix is this: > > > Eventually we should do the obvious thing and make toast ids 64bit wide > > and if we do that there'll be no need to worry about multiple counters.
> > > - to > > combat the space usage we likely should switch to representing the ids as > > variable width integers or such, otherwise the space increase would > likely be > > prohibitive. > > And I don't buy that either. An extra 4 bytes with a 2K payload is not > "prohibitive", it's more like "down in the noise". > > I think if we switch to int8 keys and widen the global OID counter to 8 > bytes (using just the low 4 bytes for other purposes), we'll have a > perfectly fine solution. There is still plenty of work to be done under > that plan, because of the need to maintain backward compatibility for > existing TOAST tables --- and maybe people would want an option to keep on > using them, for non-enormous tables? If we add additional work on top of > that, it'll just mean that it will take longer to have any solution. > > >> Quick fix for this problem is limiting GetNewOidWithIndex loops to some > >> reasonable amount defined by related macro and returning error if there > is > >> still no available Oid. Please check attached patch, any feedback is > >> appreciated. > > > This feels like the wrong spot to tackle the issue. > > Yeah, that is completely horrid. It does not remove the existing failure > mode, just changes it to have worse consequences. > > regards, tom lane > -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
Re: Patch: Global Unique Index
On Fri, Nov 25, 2022 at 05:03:06PM -0800, David Zhang wrote: > Hi Bruce, > > Thank you for helping review the patches in such detail. > > On 2022-11-25 9:48 a.m., Bruce Momjian wrote: > > Looking at the patch, I am unclear how the patch prevents concurrent > duplicate value insertion during the partitioned index checking. I am > actually not sure how that can be done without locking all indexes or > inserting placeholder entries in all indexes. (Yeah, that sounds bad, > unless I am missing something.) > > For the uniqueness check across all partitions, we tried to follow the > implementation of the uniqueness check on a single partition, and added a loop > to check uniqueness on other partitions after the index tuple has been inserted > into the current index partition but before this index tuple has been made > visible. The uniqueness check will wait in `XactLockTableWait` if there is a valid > transaction in progress, and performs the uniqueness check again after the > in-progress transaction has finished. I can't see why this wouldn't work, but I also can't think of any cases where we do this in our code already, so it will need careful consideration. We kind of do this for UPDATE and unique key conflicts, but only for a single index entry, where we peek and sleep on pending changes, but not across indexes. > I am not quite sure if this is a proper way to deal with a deadlock in this > case. We would be grateful if someone could help provide some cases/methods > to verify this cross-partition uniqueness. I assume you are using our existing deadlock detection code, and just sleeping in various indexes and expecting deadlock detection to happen. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Slow standby snapshot
Michail Nikolaev writes: >> * when idle - since we have time to do it when that happens, which >> happens often since most workloads are bursty > I have added getting of ProcArrayLock for this case. That seems like a fairly bad idea: it will add extra contention on ProcArrayLock, and I see no real strong argument that the path can't get traversed often enough for that to matter. It would likely be better for KnownAssignedXidsCompress to obtain the lock for itself, only after it knows there is something worth doing. (This ties back to previous discussion: the comment claiming it's safe to read head/tail because we have the lock is misguided. It's safe because we're the only process that changes them. So we can make the heuristic decision before acquiring lock.) While you could use the "reason" code to decide whether you need to take the lock, it might be better to add a separate boolean argument specifying whether the caller already has the lock. Beyond that, I don't see any issues except cosmetic ones. > Also, I think while we still in the context, it is good to add: > * Simon's optimization (1) for KnownAssignedXidsRemoveTree (it is > simple and effective for some situations without any seen drawbacks) > * Maybe my patch (2) for replacing known_assigned_xids_lck with memory > barrier? Doesn't seem like we have any hard evidence in favor of either of those being worth doing. We especially haven't any evidence that they'd still be worth doing after this patch. I'd be willing to make the memory barrier change anyway, because that seems like a simple change that can't hurt. I'm less enthused about the patch at (1) because of the complexity it adds. regards, tom lane
Re: Add 64-bit XIDs into PostgreSQL 15
On Mon, Nov 28, 2022 at 4:09 PM Peter Geoghegan wrote: > Granted, the specifics of the current XidStopLimit mechanism are > unlikely to directly carry over to 64-bit XIDs. XidStopLimit is > structured in a way that doesn't actually consider freeze debt in > units like unfrozen pages. Like Chris, I just don't see why the patch > obviates the need for something like XidStopLimit, since the patch > doesn't remove freezing. An improved XidStopLimit mechanism might even > end up kicking in *before* the oldest relfrozenxid reached 2 billion > XIDs, depending on the specifics. What is the purpose of using 64-bit XIDs, if not to avoid having to stop the world when we run short of XIDs? I'd say that if this patch, or any patch with broadly similar goals, fails to remove xidStopLimit, it might as well not exist. xidStopLimit is not a general defense against falling too far behind on freezing, and in general, there is no reason to think that we need such a defense. xidStopLimit is a defense against reusing the same XID and thus causing irreversible database corruption. When that possibility no longer exists, it has outlived its usefulness and we should be absolutely delighted to bid it adieu. It seems like you and Chris are proposing the moral equivalent of paying off your home loan but still sending a check to the mortgage company every month just to be sure they don't get mad. -- Robert Haas EDB: http://www.enterprisedb.com
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
On Mon, Nov 28, 2022 at 12:59 PM Bruce Momjian wrote: > > You can prepare a patch, but it unlikely to get much interest until you > get agreement on what the behavior should be. We discussed the approach on 2020's Unconference [0]. And there kind of was an agreement. Then I made a presentation on FOSDEM with all the details [1]. The patch had been on commitfest since 2019 [2]. There were reviewers in the CF entry, and we kind of had an agreement. Jeff Davis proposed a similar patch [3]. And we certainly agreed about cancels. And now Bharath is proposing the same. We have the interest and agreement. Best regards, Andrey Borodin. [0] https://wiki.postgresql.org/wiki/PgCon_2020_Developer_Unconference/Edge_cases_of_synchronous_replication_in_HA_solutions [1] https://archive.fosdem.org/2021/schedule/event/postgresql_caveats_of_replication/attachments/slides/4365/export/events/attachments/postgresql_caveats_of_replication/slides/4365/sides.pdf [2] https://commitfest.postgresql.org/26/2402/ [3] https://www.postgresql.org/message-id/flat/6a052e81060824a8286148b1165bafedbd7c86cd.camel%40j-davis.com#415dc2f7d41b8a251b419256407bb64d
Re: Failed Assert in pgstat_assoc_relation
Andres Freund writes: > Something like the attached. Still needs a bit of polish, e.g. adding the test > case from above. > I'm a bit uncomfortable adding a function call below > * Perform swapping of the relcache entry contents. Within this > * process the old entry is momentarily invalid, so there *must* be no > * possibility of CHECK_FOR_INTERRUPTS within this sequence. Do it in > * all-in-line code for safety. Ugh. I don't know what pgstat_unlink_relation does, but assuming that it can never throw an error seems like a pretty bad idea, especially when you aren't adding that to its API spec (contrast the comments for MemoryContextSetParent). Can't that part be done outside the critical section? regards, tom lane
Re: CF 2022-11: entries "Ready for Committer" with recent activity
On Sun, Nov 27, 2022 at 01:29:18PM +0900, Ian Lawrence Barwick wrote: > Transaction Management docs (2022-11-23) > > > - https://commitfest.postgresql.org/40/3899/ > - > https://www.postgresql.org/message-id/flat/canbhv-e_iy9fmrerxrch8tztyenpfo72hf_xd2hldppva4d...@mail.gmail.com > > Seems committable. I plan to commit this in a few hours. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Andres Freund writes: > On 2022-11-28 16:04:12 -0500, Tom Lane wrote: >> And I don't buy that either. An extra 4 bytes with a 2K payload is not >> "prohibitive", it's more like "down in the noise". > The space usage for the the the toast relation + index itself is indeed > irrelevant. Where it's not "down in the noise" is in struct varatt_external, > i.e. references to external toast datums. Ah, gotcha. Yes, the size of varatt_external is a problem. regards, tom lane
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, On 2022-11-29 00:24:49 +0300, Nikita Malakhov wrote: > 2) Upgrading to 64-bit id would require re-toasting old TOAST tables. Or > there is some way to distinguish old tables from new ones? The catalog / relcache entry should suffice to differentiate between the two. Greetings, Andres Freund
Re: Add 64-bit XIDs into PostgreSQL 15
On Mon, Nov 28, 2022 at 04:30:22PM -0500, Robert Haas wrote: > What is the purpose of using 64-bit XIDs, if not to avoid having to > stop the world when we run short of XIDs? > > I'd say that if this patch, or any patch with broadly similar goals, > fails to remove xidStopLimit, it might as well not exist. > > xidStopLimit is not a general defense against falling too far behind > on freezing, and in general, there is no reason to think that we need > such a defense. xidStopLimit is a defense against reusing the same XID > and thus causing irreversible database corruption. When that > possibility no longer exists, it has outlived its usefulness and we > should be absolutely delighted to bid it adieu. > > It seems like you and Chris are proposing the moral equivalent of > paying off your home loan but still sending a check to the mortgage > company every month just to be sure they don't get mad. I think the problem is that we still have bloat with 64-bit XIDs, specifically pg_xact and pg_multixact files. Yes, that bloat is less serious, but it is still an issue worth reporting in the server logs, though not serious enough to stop the server from accepting write queries. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Add 64-bit XIDs into PostgreSQL 15
On Mon, Nov 28, 2022 at 1:30 PM Robert Haas wrote: > What is the purpose of using 64-bit XIDs, if not to avoid having to > stop the world when we run short of XIDs? I agree that the xidStopLimit mechanism was designed with the specific goal of preventing "true" physical XID wraparound that results in wrong answers to queries. It was not written with the intention of limiting the accumulation of freeze debt, which would have to use units like unfrozen heap pages to make any sense. That is a historic fact -- no question. I think that it is nevertheless quite possible that just refusing to allocate any more XIDs could make sense with 64-bit XIDs, where we don't strictly have to in order to keep the system fully functional. That might still be the lesser evil, in that other world. The cutoff itself would depend on many workload details, I suppose. Imagine if we actually had 64-bit XIDs -- let's assume for a moment that it's a done deal. This raises a somewhat awkward question: do you just let the system get further and further behind on freezing, forever? We can all agree that 2 billion XIDs is very likely the wrong time to start refusing new XIDs -- because it just isn't designed with debt in mind. But what's the right time, if any? How much debt is too much? At the very least these seem like questions that deserve serious consideration. > I'd say that if this patch, or any patch with broadly similar goals, > fails to remove xidStopLimit, it might as well not exist. Well, it could in principle be driven by lots of different kinds of information, and make better decisions by actually targeting freeze debt in some way. An "enhanced version of xidStopLimit with 64-bit XIDs" could kick in far, far later than it currently would. Obviously that has some value. I'm not claiming to know how to build this "better xidStopLimit mechanism", by the way. I'm not seriously proposing it. Mostly I'm just saying that the question "where do you draw the line if not at 2 billion XIDs?"
is a very pertinent question. It is not a question that is made any less valid by the fact that we already know that 2 billion XIDs is pretty far from optimal in almost all cases. Having some limit based on something is likely to be more effective than having no limit based on nothing. Admittedly this argument works a lot better with the failsafe than it does with xidStopLimit. Both are removed by the patch. -- Peter Geoghegan
Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
On Mon, Nov 28, 2022 at 01:31:39PM -0800, Andrey Borodin wrote: > On Mon, Nov 28, 2022 at 12:59 PM Bruce Momjian wrote: > > > > You can prepare a patch, but it unlikely to get much interest until you > > get agreement on what the behavior should be. > > We discussed the approach on 2020's Unconference [0]. And there kind > of was an agreement. > Then I made a presentation on FOSDEM with all the details [1]. > The patch had been on commitfest since 2019 [2]. There were reviewers > in the CF entry, and we kind of had an agreement. > Jeff Davis proposed a similar patch [3]. And we certainly agreed about > cancels. > And now Bharath is proposing the same. > > We have the interest and agreement. Okay, I was not aware we had such broad agreement. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 4:19 PM David G. Johnston wrote: > That's fine, but are you saying this patch is incapable (or simply > undesirable) of having the parts about handling defaults separated out from > the parts that define how the system works with a given set of permissions; > and the one implementation detail of having the bootstrap superuser > automatically grant admin to any roles a createuser role creates? If you and > others feel strongly about defaults I'm sure that the suggested other thread > focused on that will get attention and be committed in a timely manner. But > the system will work, and not be broken, if that got stalled, and it could be > added in later. The topics are so closely intertwined that I don't believe that trying to have separate discussions will be useful or productive. There's no hope of anybody understanding 0004 or having an educated opinion about it without first understanding the earlier patches, and there's no requirement that someone has to review 0004, or like it, just because they review or like 0001-0003. But so far nobody has actually reviewed anything, and all that's happened is people have complained about 0004 for reasons which in my opinion are pretty nebulous and largely ignore the factors that caused it to exist in the first place. We had about 400 emails during the last release cycle arguing about a whole bunch of topics related to user management, and it became absolutely crystal clear in that discussion that Stephen Frost and David Steele wanted to have roles that could create other roles but not immediately be able to access their privileges. Mark and I, on the other hand, wanted to have roles that could create other roles WITH immediate access to their privileges. That argument was probably the main thing that derailed that entire patch set, which represented months of work by Mark. 
Now, I have come up with a competing patch set that for the price of 100 lines of code and a couple of slightly ugly option names can do either thing. So Stephen and David and any like-minded users can have what they want, and Mark and I and any like-minded users can have what we want. And the result is that I've got like five people, some of whom participated in those discussions, showing up to say "hey, we don't need the ability to set defaults." Well, if that's the case, then why did we have hundreds and hundreds of emails within the last 12 months arguing about which way it should work? -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Andres Freund writes: > On 2022-11-29 00:24:49 +0300, Nikita Malakhov wrote: >> 2) Upgrading to 64-bit id would require re-toasting old TOAST tables. Or >> there is some way to distinguish old tables from new ones? > The catalog / relcache entry should suffice to differentiate between the two. Yeah, you could easily look at the datatype of the first attribute (in either the TOAST table or its index) to determine what to do. As I said before, I think there's a decent argument that some people will want the option to stay with 4-byte TOAST OIDs indefinitely, at least for smaller tables. So even without the fact that forced conversions would be horridly expensive, we'll need to continue support for both forms of TOAST table. regards, tom lane
Re: Add 64-bit XIDs into PostgreSQL 15
On Mon, Nov 28, 2022 at 1:52 PM Bruce Momjian wrote: > I think the problem is that we still have bloat with 64-bit XIDs, > specifically pg_xact and pg_multixact files. Yes, that bloat is less > serious, but it is still an issue worth reporting in the server logs, > though not serious enough to stop the server from write queries. That's definitely a big part of it. Again, I don't believe that the idea is fundamentally without merit. Just that it's not worth it, given that having more XID space is very much not something that I think fixes most of the problems. And given the real risk of serious bugs with something this invasive. I believe that it would be more useful to focus on just not getting into trouble in the first place, as well as on mitigating specific problems that lead to the system reaching xidStopLimit in practice. I don't think that there is any good reason to allow datfrozenxid to go past about a billion. When it does, the interesting questions are questions about what went wrong, and how that specific failure can be mitigated in a fairly direct way. We've already used way too much "XID space runway", so why should using even more help? It might, I suppose, but it almost seems like a distraction to me, as somebody that wants to make things better for users in general. As long as the system continues to misbehave (in whatever way it happens to be misbehaving), why should any amount of XID space ever be enough? I think that we'll be able to get rid of freezing in a few years time. But as long as we have freezing, we have these problems. -- Peter Geoghegan
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, On 2022-11-28 16:57:53 -0500, Tom Lane wrote: > As I said before, I think there's a decent argument that some people > will want the option to stay with 4-byte TOAST OIDs indefinitely, > at least for smaller tables. I think we'll need to do something about the width of varatt_external to make the conversion to 64bit toast oids viable - and if we do, I don't think there's a decent argument for staying with 4 byte toast OIDs. I think the varatt_external equivalent would end up being smaller in just about all cases. And as you said earlier, the increased overhead inside the toast table / index is not relevant compared to the size of toasted datums. Greetings, Andres Freund
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi, I'll check that tomorrow. If it is so then there won't be a problem keeping old tables without re-toasting. On Tue, Nov 29, 2022 at 1:10 AM Andres Freund wrote: > Hi, > > On 2022-11-28 16:57:53 -0500, Tom Lane wrote: > > As I said before, I think there's a decent argument that some people > > will want the option to stay with 4-byte TOAST OIDs indefinitely, > > at least for smaller tables. > > I think we'll need to do something about the width of varatt_external to > make > the conversion to 64bit toast oids viable - and if we do, I don't think > there's a decent argument for staying with 4 byte toast OIDs. I think the > varatt_external equivalent would end up being smaller in just about all > cases. > And as you said earlier, the increased overhead inside the toast table / > index > is not relevant compared to the size of toasted datums. > > Greetings, > > Andres Freund > -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Andres Freund writes: > On 2022-11-28 16:57:53 -0500, Tom Lane wrote: >> As I said before, I think there's a decent argument that some people >> will want the option to stay with 4-byte TOAST OIDs indefinitely, >> at least for smaller tables. > And as you said earlier, the increased overhead inside the toast table / index > is not relevant compared to the size of toasted datums. Perhaps not. > I think we'll need to do something about the width of varatt_external to make > the conversion to 64bit toast oids viable - and if we do, I don't think > there's a decent argument for staying with 4 byte toast OIDs. I think the > varatt_external equivalent would end up being smaller in just about all cases. I agree that we can't simply widen varatt_external to use 8 bytes for the toast ID in all cases. Also, I now get the point about avoiding use of globally assigned OIDs here: if the counter starts from zero for each table, then a variable-width varatt_external could actually be smaller than currently for many cases. However, that bit is somewhat orthogonal, and it's certainly not required for fixing the basic problem. So it seems like the plan of attack ought to be: 1. Invent a new form or forms of varatt_external to allow different widths of the toast ID. Use the narrowest width possible for any given ID value. 2. Allow TOAST tables/indexes to store either 4-byte or 8-byte IDs. (Conversion could be done as a side effect of table-rewrite operations, perhaps.) 3. Localize ID selection so that tables can have small toast IDs even when other tables have many IDs. (Optional, could do later.) regards, tom lane
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
I've missed the earlier point that "> 4 billion external datums > typically use a lot of space". Not quite so: it's 8 TB at the minimal size of toasted data (about 2 KB). In my practice, tables with more than 5 billion rows are nothing out of the ordinary (highly loaded databases with large amounts of data in use). On Tue, Nov 29, 2022 at 1:12 AM Nikita Malakhov wrote: > Hi, > > I'll check that tomorrow. If it is so then there won't be a problem keeping > old tables without re-toasting. > > On Tue, Nov 29, 2022 at 1:10 AM Andres Freund wrote: > >> Hi, >> >> On 2022-11-28 16:57:53 -0500, Tom Lane wrote: >> > As I said before, I think there's a decent argument that some people >> > will want the option to stay with 4-byte TOAST OIDs indefinitely, >> > at least for smaller tables. >> >> I think we'll need to do something about the width of varatt_external to >> make >> the conversion to 64bit toast oids viable - and if we do, I don't think >> there's a decent argument for staying with 4 byte toast OIDs. I think the >> varatt_external equivalent would end up being smaller in just about all >> cases. >> And as you said earlier, the increased overhead inside the toast table / >> index >> is not relevant compared to the size of toasted datums. >> >> Greetings, >> >> Andres Freund >> > > > -- > Regards, > Nikita Malakhov > Postgres Professional > https://postgrespro.ru/ > -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
Re: Preventing indirection for IndexPageGetOpaque for known-size page special areas
Bruce Momjian writes: > Uh, XTS doesn't require a nonce, so why are talking about nonces in this > thread? Because some other proposals do, or could, require a per-page nonce. After looking through this thread, I side with Robert: we should reject the remainder of this patch. It gives up page layout flexibility that we might want back someday. Moreover, I didn't see any hard evidence offered that there's any actual performance gain, let alone such a compelling gain that we should give up that flexibility for it. regards, tom lane
Re: Add 64-bit XIDs into PostgreSQL 15
On Mon, Nov 28, 2022 at 1:52 PM Peter Geoghegan wrote: > I'm not claiming to know how to build this "better xidStopLimit > mechanism", by the way. I'm not seriously proposing it. Mostly I'm > just saying that the question "where do you draw the line if not at 2 > billion XIDs?" is a very pertinent question. It is not a question that > is made any less valid by the fact that we already know that 2 billion > XIDs is pretty far from optimal in almost all cases. Having some limit > based on something is likely to be more effective than having no limit > based on nothing. > > Admittedly this argument works a lot better with the failsafe than it > does with xidStopLimit. Both are removed by the patch. Come to think of it, if you were going to do something like this it would probably work by throttling XID allocations, with a gradual ramp-up. It would rarely get to the point that the system refused to allocate XIDs completely. It's not fundamentally unreasonable to force the application to live within its means by throttling. Feedback that slows down the rate of writes is much more common in the LSM tree world, within systems like MyRocks [1], where the risk of the system being destabilized by debt is more pressing. As I said, I don't think that this is a particularly promising way of addressing problems with Postgres XID space exhaustion, since I believe that the underlying issue isn't usually a simple lack of "XID space slack capacity". But if you assume that I'm wrong here (if you assume that we very often don't have the ability to freeze lazily enough), then ISTM that throttling or feedback to stall new writes is a very reasonable option. In fact, it's practically mandatory. [1] https://docs.google.com/presentation/d/1WgP-SlKay5AnSoVDSvOIzmu7edMmtYhdywoa0oAR4JQ/edit#slide=id.g8839c9d71b_0_79 -- Peter Geoghegan
Re: Reducing power consumption on idle servers
I found some more comments and some documentation to word-smith very lightly, and pushed. The comments were stray references to the trigger file. It's a little confusing because the remaining mechanism also uses a file, but it uses a signal first, so it seems better to refer to promotion requests rather than files. /me wonders how many idle servers there are in the world
Re: Report roles in pg_upgrade pg_ prefix check
On Mon, Nov 28, 2022 at 09:58:46AM +0100, Daniel Gustafsson wrote: > We are a bit inconsistent in how much details we include in the report > textfiles, so could do that without breaking any consistency in reporting. > Looking at other checks, the below format would match what we already do > fairly > well: > > (oid=) > > Done in the attached. WFM. Thanks! -- Michael
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 2:55 PM Robert Haas wrote: > On Mon, Nov 28, 2022 at 4:19 PM David G. Johnston > wrote: > > That's fine, but are you saying this patch is incapable (or simply > undesirable) of having the parts about handling defaults separated out from > the parts that define how the system works with a given set of permissions; > and the one implementation detail of having the bootstrap superuser > automatically grant admin to any roles a createuser role creates? If you > and others feel strongly about defaults I'm sure that the suggested other > thread focused on that will get attention and be committed in a timely > manner. But the system will work, and not be broken, if that got stalled, > and it could be added in later. > > The topics are so closely intertwined that I don't believe that trying > to have separate discussions will be useful or productive. There's no > hope of anybody understanding 0004 or having an educated opinion about > it without first understanding the earlier patches, and there's no > requirement that someone has to review 0004, or like it, just because > they review or like 0001-0003. > > But so far nobody has actually reviewed anything > Well, if that's the case, then why > did we have hundreds and hundreds of emails within the last 12 months > arguing about which way it should work? > > When y'all come to some final conclusion on how you want the defaults to look, come tell the rest of us. You already have 4 people debating the matter; I don't really see the point of adding more voices to that cacophony. As you noted, voicing an opinion about 0004 is optional. I'll reiterate my review from before, with a bit more confidence this time. 0001-0003 implements a desirable behavior change. In order for someone to make some other role a member in some third role, that someone must have admin privileges on both other roles. CREATEROLE is not exempt from this rule. 
A user with CREATEROLE will, upon creating a new role, be granted admin privilege on that role by the bootstrap superuser. The consequence of 0001-0003 in the current environment is that since the newly created CREATEROLE user will not have admin rights on any existing roles in the cluster, while they can create new roles in the system they are unable to grant those new roles membership in any other roles not also created by them. The ability to assign attributes to newly created roles is unaffected. As a unit of work, those are "ready-to-commit" for me. I'll leave it to you and others to judge the technical quality of the patch and finishing up the FIXMEs that have been noted. Desirable follow-on patches include: 1) Automatically install an additional membership grant, with the CREATEROLE user as the grantor, specifying INHERIT OR SET as TRUE (I personally favor attaching these to ALTER ROLE, modifiable only by oneself) 2) Convert Attributes into default roles David J.
Re: O(n) tasks cause lengthy startups and checkpoints
Okay, here is a new patch set. 0004 adds logic to prevent custodian tasks from delaying shutdown. I haven't added any logging for long-running tasks yet. Tasks might ordinarily take a while, so such logs wouldn't necessarily indicate something is wrong. Perhaps we could add a GUC for the amount of time to wait before logging. This feature would be off by default. Another option could be to create a log_custodian GUC that causes tasks to be logged when completed, similar to log_checkpoints. Thoughts? On Mon, Nov 28, 2022 at 01:37:01PM -0500, Robert Haas wrote: > On Mon, Nov 28, 2022 at 1:31 PM Andres Freund wrote: >> On 2022-11-28 13:08:57 +, Simon Riggs wrote: >> > On Sun, 27 Nov 2022 at 23:34, Nathan Bossart >> > wrote: >> > > > Rather than explicitly use DEBUG1 everywhere I would have an >> > > > #define CUSTODIAN_LOG_LEVEL LOG >> > > > so we can run with it in LOG mode and then set it to DEBUG1 with a one >> > > > line change in a later phase of Beta >> > > >> > > I can create a separate patch for this, but I don't think I've ever seen >> > > this sort of thing before. >> > >> > Much of recovery is coded that way, for the same reason. >> >> I think that's not a good thing to copy without a lot more justification than >> "some old code also does it that way". It's sometimes justified, but also >> makes code harder to read (one doesn't know what it does without looking up >> the #define, line length). > > Yeah. If people need some of the log messages at a higher level during > development, they can patch their own copies. > > I think there might be some argument for having a facility that lets > you pick subsystems or even individual messages that you want to trace > and pump up the log level for just those call sites. But I don't know > exactly what that would look like, and I don't think inventing one-off > mechanisms for particular cases is a good idea. Given this discussion, I haven't made any changes to the logging in the new patch set. 
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 7fa5c047781dddedb1f9c5a4e96622a23c0c0835 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 + Subject: [PATCH v15 1/4] Introduce custodian. The custodian process is a new auxiliary process that is intended to help offload tasks could otherwise delay startup and checkpointing. This commit simply adds the new process; it does not yet do anything useful. --- src/backend/postmaster/Makefile | 1 + src/backend/postmaster/auxprocess.c | 8 + src/backend/postmaster/custodian.c | 382 src/backend/postmaster/meson.build | 1 + src/backend/postmaster/postmaster.c | 38 ++- src/backend/storage/ipc/ipci.c | 3 + src/backend/storage/lmgr/proc.c | 1 + src/backend/utils/activity/wait_event.c | 3 + src/backend/utils/init/miscinit.c | 3 + src/include/miscadmin.h | 3 + src/include/postmaster/custodian.h | 32 ++ src/include/storage/proc.h | 11 +- src/include/utils/wait_event.h | 1 + 13 files changed, 482 insertions(+), 5 deletions(-) create mode 100644 src/backend/postmaster/custodian.c create mode 100644 src/include/postmaster/custodian.h diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile index 3a794e54d6..e1e1d1123f 100644 --- a/src/backend/postmaster/Makefile +++ b/src/backend/postmaster/Makefile @@ -18,6 +18,7 @@ OBJS = \ bgworker.o \ bgwriter.o \ checkpointer.o \ + custodian.o \ fork_process.o \ interrupt.o \ pgarch.o \ diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c index 7765d1c83d..c275271c95 100644 --- a/src/backend/postmaster/auxprocess.c +++ b/src/backend/postmaster/auxprocess.c @@ -20,6 +20,7 @@ #include "pgstat.h" #include "postmaster/auxprocess.h" #include "postmaster/bgwriter.h" +#include "postmaster/custodian.h" #include "postmaster/startup.h" #include "postmaster/walwriter.h" #include "replication/walreceiver.h" @@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype) case CheckpointerProcess: MyBackendType = 
B_CHECKPOINTER; break; + case CustodianProcess: + MyBackendType = B_CUSTODIAN; + break; case WalWriterProcess: MyBackendType = B_WAL_WRITER; break; @@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype) CheckpointerMain(); proc_exit(1); + case CustodianProcess: + CustodianMain(); + proc_exit(1); + case WalWriterProcess: WalWriterMain(); proc_exit(1); diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c new file mode 100644 index 00..a94381bc21 --- /dev/null +++ b/src/backend/postmaster/custodian.c @@ -0,0 +1,382 @@ +/*- + * + * custodian.c + * + * The custodian process handles a variety of non-critical tasks that might + * otherwise delay s
Re: fixing CREATEROLE
On Mon, Nov 28, 2022 at 4:55 PM Robert Haas wrote: > But so far nobody has actually reviewed anything, ... Actually this isn't true. Mark did review. Thanks, Mark. -- Robert Haas EDB: http://www.enterprisedb.com