Re: proposal: possibility to read dumped table's name from file

2022-11-13 Thread Julien Rouhaud
On Sat, Nov 12, 2022 at 09:35:59PM +0100, Pavel Stehule wrote:

Thanks for the updated patch.  Apart from the function comment it looks good to
me.

Justin, did you have any other comment on the patch?

> > I don't fully understand the part about subpatterns, but is that necessary
> > to
> > describe it?  Simply saying that any valid and possibly-quoted identifier
> > can
> > be parsed should make it clear that identifiers containing \n characters
> > should
> > work too.  Maybe also just mention that whitespaces are removed and special
> > care is taken to output routines in exactly the same way calling code will
> > expect it (that is comma-and-single-space type delimiter).
> >
>
> In this case I hit the limits of my English language skills.
>
> I rewrote this comment, but it needs more care. Please, can you look at it?

I'm also not a native English speaker, so I'm far from writing perfect comments
myself :)

Maybe something like

/*
 * read_pattern - reads an object pattern from input
 *
 * This function will parse any valid identifier (quoted or not, qualified or
 * not), which can also include the full signature for routines.
 * Note that this function takes special care to sanitize the detected
 * identifier (removing extraneous whitespace or other unnecessary
 * characters).  This is necessary as most backup/restore filtering functions
 * only recognize identifiers if they are written exactly the way they are
 * regenerated.
 * Returns a pointer to the next character after the found identifier, or
 * NULL on error.
 */




Re: libpq support for NegotiateProtocolVersion

2022-11-13 Thread Peter Eisentraut

On 11.11.22 23:28, Jacob Champion wrote:

Consider the case where the server sends a
NegotiateProtocolVersion with a reasonable length, but then runs over
its own message (either by sending an unterminated string as one of the
extension names, or by sending a huge extension number). When I test
that against a client on my machine, it churns CPU and memory waiting
for the end of a message that will never come, even though it had
already decided that the maximum length of the message should have been
less than 2K.

Put another way, why do we loop around and poll for more data when we
hit the end of the connection buffer, if we've already checked at this
point that we should have the entire message buffered locally?
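
For illustration, a minimal sketch (not libpq's actual code; the name and
layout are hypothetical) of the kind of bounds check being described: once
the declared message length has been validated, string parsing should fail
rather than wait for more input when it would run past the declared end of
the message.

/* Hypothetical helper: parse a NUL-terminated string from a buffered,
 * length-prefixed protocol message without reading past its declared end. */
static int
read_msg_string(const char *buf, int msglen, int *cursor, const char **out)
{
	int			start = *cursor;

	while (*cursor < msglen && buf[*cursor] != '\0')
		(*cursor)++;

	if (*cursor >= msglen)
		return -1;				/* unterminated string: report a protocol error */

	*out = buf + start;
	(*cursor)++;				/* skip the terminator */
	return 0;
}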


Isn't that the same behavior for other message types?  I don't see 
anything in the handling of the early 'E' and 'R' messages that would 
handle this.  If we want to address this, maybe this should be handled 
in the polling loop before we pass off the input buffer to the 
per-message-type handlers.






Re: refactor ownercheck and aclcheck functions

2022-11-13 Thread Peter Eisentraut

On 09.11.22 19:12, Corey Huinker wrote:

After considering this again, I decided to brute-force this and get rid
of all the trivial wrapper functions and also several of the special
cases.  That way, there is less confusion at the call sites about why
this or that style is used in a particular case.  Also, it now makes
sure you can't accidentally use the generic functions when a particular
one should be used.


+1


committed





Re: libpq error message refactoring

2022-11-13 Thread Peter Eisentraut

On 09.11.22 13:29, Alvaro Herrera wrote:

+/*
+ * Append a formatted string to the given buffer, after translation.  A
+ * newline is automatically appended; the format should not end with a
+ * newline.
+ */


I find the "after translation" bit unclear -- does it mean that the
caller should have already translated, or is it the other way around?  I
would say "Append a formatted string to the given buffer, after
translating it", which (to me) conveys more clearly that translation
occurs here.


ok


+   /* Loop in case we have to retry after enlarging the buffer. */
+   do
+   {
+   errno = save_errno;
+   va_start(args, fmt);
+   done = appendPQExpBufferVA(errorMessage, libpq_gettext(fmt), args);


I wonder if it makes sense to do libpq_gettext() just once, instead of
redoing it on each iteration.


I wonder whether that would expose us to potential compiler warnings 
about the format string not being constant.  As long as the compiler can 
trace that the string comes from gettext, it knows what's going on.


Also, most error strings in practice don't need the loop, so maybe it's 
not a big issue.



+/*
+ * appendPQExpBufferVA
+ * Shared guts of printfPQExpBuffer/appendPQExpBuffer.
+ * Attempt to format data and append it to str.  Returns true if done
+ * (either successful or hard failure), false if need to retry.
+ *
+ * Caution: callers must be sure to preserve their entry-time errno
+ * when looping, in case the fmt contains "%m".
+ */
+extern bool appendPQExpBufferVA(PQExpBuffer str, const char *fmt, va_list args) pg_attribute_printf(2, 0);


As an API user, I don't care that this is shared guts for something
else, I just care about what it does.  I think deleting that line is a
sufficient fix.


ok


@@ -420,7 +418,8 @@ pqsecure_raw_write(PGconn *conn, const void *ptr, size_t len)
 			snprintf(msgbuf, sizeof(msgbuf),
 					 libpq_gettext("server closed the connection unexpectedly\n"
 								   "\tThis probably means the server terminated abnormally\n"
-								   "\tbefore or while processing the request.\n"));
+								   "\tbefore or while processing the request."));
+			strlcat(msgbuf, "\n", sizeof(msgbuf));
 			conn->write_err_msg = strdup(msgbuf);
 			/* Now claim the write succeeded */
 			n = len;


In these two places, we're writing the error message manually to a
separate variable, so the extra \n is necessary.  It looks a bit odd to
do it with strlcat() after the fact, but AFAICT it's necessary as it
keeps the \n out of the translation catalog, which is good.  This is
nonobvious, so perhaps add a comment about it.


ok





Re: New docs chapter on Transaction Management and related changes

2022-11-13 Thread Laurenz Albe
On Thu, 2022-11-10 at 12:17 +0100, Alvaro Herrera wrote:
> On 2022-Nov-10, Laurenz Albe wrote:
> > On Wed, 2022-11-09 at 09:16 -0500, Robert Treat wrote:
> > > > > -  If AND CHAIN is specified, a new transaction is
> > > > > +  If AND CHAIN is specified, a new unaborted transaction is
> > > > >    immediately started with the same transaction characteristics (see <xref
> > > > >    linkend="sql-set-transaction"/>) as the just finished one.  Otherwise,
> > > > >    no new transaction is started.
> > 
> > A new transaction is never aborted in my understanding.  Being aborted
> > is not a characteristic of a transaction, but a state.
> 
> I agree, but maybe it's good to make the point explicit, because it
> doesn't seem obvious.  Perhaps something like
> 
> "If X is specified, a new transaction (never in aborted state) is
> immediately started with the same transaction characteristics (see X) as
> the just finished one.  Otherwise ..."
> 
> Getting the wording of that parenthical comment right is tricky, though.
> What I propose above is not great, but I don't know how to make it
> better.  Other ideas that seem slightly worse but may inspire someone:
> 
>   ... a new transaction (which is never in aborted state) is ...
>   ... a new transaction (not in aborted state) is ...
>   ... a new transaction (never aborted, even if the previous is) is ...
>   ... a new (not-aborted) transaction is ...
>   ... a new (never aborted) transaction is ...
>   ... a new (never aborted, even if the previous is) transaction is ...
>   ... a new (never aborted, regardless of the status of the previous one) 
> transaction is ...
> 
> 
> Maybe there's a way to reword the entire phrase that leads to a better
> formulation of the idea.

Any of your suggestions is better than "unaborted".

How about:

  If AND CHAIN is specified, a new transaction is
  immediately started with the same transaction characteristics
  (see <xref linkend="sql-set-transaction"/>) as the just finished one.
  This new transaction won't be in the aborted state, even
  if the old transaction was aborted.

Yours,
Laurenz Albe




Re: Making Bitmapsets be valid Nodes

2022-11-13 Thread Peter Eisentraut

On 11.11.22 21:05, Tom Lane wrote:

Per the discussion at [1], it seems like it'd be a good idea to make
Bitmapsets into full-fledged, tagged Nodes, so that we could do things
like print or copy lists of them without special-case logic.  The
extra space for the NodeTag is basically free due to alignment
considerations, at least on 64-bit hardware.
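
As a rough illustration of the alignment point (a sketch, assuming the usual
Bitmapset layout with 64-bit bitmapwords): the struct header is padded to
8 bytes whether or not a 4-byte NodeTag precedes the 4-byte nwords field, so
adding the tag does not grow sizeof(Bitmapset).

typedef struct Bitmapset
{
	NodeTag		type;			/* 4 bytes */
	int			nwords;			/* 4 bytes; header is 8 bytes either way */
	bitmapword	words[FLEXIBLE_ARRAY_MEMBER];	/* 8-byte aligned on 64-bit */
} Bitmapset;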

Attached is a cleaned-up version of Amit's patch v24-0003 at [2].
I fixed the problems with not always tagging Bitmapsets, and changed
the outfuncs/readfuncs logic so that Bitmapsets still print exactly
as they did before (thus, this doesn't require a catversion bump).


This looks good to me.





Re: Schema variables - new implementation for Postgres 15

2022-11-13 Thread Pavel Stehule
On Fri, Nov 4, 2022 at 3:28 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Fri, Nov 04, 2022 at 03:17:18PM +0100, Pavel Stehule wrote:
> > > I've stumbled upon something that looks weird to me (inspired by the
> > > example from tests):
> > >
> > > =# create variable v2 as int;
> > > =# let v2 = 3;
> > > =# create view vv2 as select coalesce(v2, 0) + 1000 as result
> > >
> > > =# select * from vv2;
> > >  result
> > > --------
> > >    1003
> > >
> > > =# set force_parallel_mode to on;
> > > =# select * from vv2;
> > >  result
> > > --------
> > >    1000
> > >
> > > In the second select the actual work is done from a worker backend.
> > > Since values of session variables are stored in the backend local
> > > memory, it's not being shared with the worker and the value is not
> found
> > > in the hash map. Does this suppose to be like that?
> > >
> >
> > It looks like a bug, but parallel queries should be supported.
> >
> > The value of the variable is passed as a parameter to the worker backend.
> > But probably somewhere the original reference was not replaced by the
> > parameter.
> >
> > On Fri, Nov 04, 2022 at 10:17:13PM +0800, Julien Rouhaud wrote:
> > Hi,
> >
> > There's code to serialize and restore all used variables for parallel
> workers
> > (see code about PARAM_VARIABLE and queryDesc->num_session_variables /
> > queryDesc->plannedstmt->sessionVariables).  I haven't reviewed that part
> yet,
> > but it's supposed to be working.  Blind guess would be that it's missing
> > something in expression walker.
>
> I see, thanks. I'll take a deeper look, my initial assumption was due to
> the fact that in the worker case create_sessionvars_hashtables is
> getting called for every query.
>

It should be fixed in today's patch.

The problem was that the hasSessionVariables flag was not pushed up to the
upper query in pull_up_simple_subquery.

--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1275,6 +1275,9 @@ pull_up_simple_subquery(PlannerInfo *root, Node
*jtnode, RangeTblEntry *rte,
/* If subquery had any RLS conditions, now main query does too */
parse->hasRowSecurity |= subquery->hasRowSecurity;

+   /* If subquery had session variables, now main query does too */
+   parse->hasSessionVariables |= subquery->hasSessionVariables;
+

Thank you for the check and bug report. Your example was added to the
regression tests.

Regards

Pavel


Re: Typo about subxip in comments

2022-11-13 Thread Japin Li


On Sat, 12 Nov 2022 at 12:12, Amit Kapila  wrote:
> On Fri, Nov 11, 2022 at 2:45 PM Japin Li  wrote:
>> Maybe a wrong plural in XidInMvccSnapshot().
>>
>>  * Make a quick range check to eliminate most XIDs without looking at the
>>  * xip arrays.
>>
>> I think we should use "xip array" instead of "xip arrays".
>>
>
> I think here the comment is referring to both xip and subxip array, so
> it looks okay to me.
>

Yeah, it means xip in the normal case, and subxip in the recovery case.

>> Furthermore, if the snapshot is taken during recovery, the xip array is
>> empty, and we should check subxip array. How about changing "xip arrays"
>> to "xip or subxip array"?
>>
>
> I don't know if that is an improvement. I think we should stick to
> your initial proposed change.

Agreed.  Let's focus on the initial proposed change.

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Making Bitmapsets be valid Nodes

2022-11-13 Thread Tom Lane
Peter Eisentraut  writes:
> On 11.11.22 21:05, Tom Lane wrote:
>> Attached is a cleaned-up version of Amit's patch v24-0003 at [2].
>> I fixed the problems with not always tagging Bitmapsets, and changed
>> the outfuncs/readfuncs logic so that Bitmapsets still print exactly
>> as they did before (thus, this doesn't require a catversion bump).

> This looks good to me.

Pushed, thanks for looking.

regards, tom lane




Re: Reducing power consumption on idle servers

2022-11-13 Thread Simon Riggs
On Thu, 24 Mar 2022 at 16:02, Simon Riggs  wrote:
>
> On Thu, 24 Mar 2022 at 15:39, Robert Haas  wrote:
> >
> > On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
> >  wrote:
> > > The proposals of this patch are the following, each of which can be
> > > independently accepted/rejected:
> > > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > > (directly affects powersave)
> > > 2. deprecate promote_trigger_file, which then allows us to fix the
> > > sleep for startup process (directly affects powersave)
> > > 3. treat hibernation in all procs the same, for simplicity, and to
> > > make sure we don't forget one later
> > > 4. provide a design pattern for background worker extensions to
> > > follow, so as to encourage powersaving
> >
> > Unfortunately, the patch isn't split in a way that corresponds to this
> > division. I think I'm +1 on change #2 -- deprecating
> > promote_trigger_file seems like a good idea to me independently of any
> > power-saving considerations. But I'm not sure that I am on board with
> > any of the other changes.
>
> OK, so change (2) is good. Separate patch attached.

Thanks to Ian for prompting me to pick up this thread again; apologies
for getting distracted.

The attached patch is a reduced version of the original. It covers only:
* deprecation of the promote_trigger_file - there are no tests that
use it, which is why there is no test coverage for the patch
* changing the sleep time of the startup process to 60s
* docs and comments

Other points will be discussed in another branch of this thread.

-- 
Simon Riggs                http://www.EnterpriseDB.com/


hibernate_startup.v6.patch
Description: Binary data


Re: Data is copied twice when specifying both child and parent table in publication

2022-11-13 Thread vignesh C
On Fri, 11 Nov 2022 at 11:13, wangw.f...@fujitsu.com wrote:
>
> On Fri, Oct 21, 2022 at 17:02 PM Peter Smith  wrote:
> >
>
> Thanks for your comments. Sorry for not replying in time.
>
> > On Mon, Oct 17, 2022 at 4:49 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > On Wed, Oct 5, 2022 at 11:08 AM Peter Smith 
> > wrote:
> > > > Hi Wang-san. Here are my review comments for HEAD_v12-0001 patch.
> > >
> > ...
> > > >
> > > > 3. QUESTION - pg_get_publication_tables / fetch_table_list
> > > >
> > > > When the same table is published by different publications (but there
> > > > are other differences like row-filters/column-lists in each
> > > > publication) the result tuple of this function does not include the
> > > > pubid. Maybe the SQL of pg_publication_tables/fetch_table_list() is OK
> > > > as-is but how does it manage to associate each table with the correct
> > > > tuple?
> > > >
> > > > I know it apparently all seems to work, but I'm not sure how that
> > > > happens. Can you explain why a puboid is not needed for the result
> > > > tuple of this function?
> > >
> > > Sorry, I am not sure I understand your question.
> > > I try to answer your question by explaining the two functions you 
> > > mentioned:
> > >
> > > First, the function pg_get_publication_tables gets the list (see 
> > > table_infos)
> > > that included published table and the corresponding publication. Then 
> > > based
> > > on this list, the function pg_get_publication_tables returns information
> > > (scheme, relname, row filter and column list) about the published tables 
> > > in the
> > > publications list. It just doesn't return pubid.
> > >
> > > Then, the SQL in the function fetch_table_list will get the columns in the
> > > column list from pg_attribute. (This is to return all columns when the 
> > > column
> > > list is not specified)
> > >
> >
> > I meant, for example, if the different publications specified
> > different col-lists for the same table then IIUC the
> > fetch_table_lists() is going to return 2 list elements
> > (schema,rel_name,row_filter,col_list). But when the schema/rel_name
> > are the same for 2 elements then (without also a pubid) how are you
> > going to know where the list element came from, and how come that is
> > not important?
> >
> > > > ~~
> > > >
> > > > test_pub=# create table t1(a int, b int, c int);
> > > > CREATE TABLE
> > > > test_pub=# create publication pub1 for table t1(a) where (a > 99);
> > > > CREATE PUBLICATION
> > > > test_pub=# create publication pub2 for table t1(a,b) where (b < 33);
> > > > CREATE PUBLICATION
> > > >
> > > > Following seems OK when I swap orders of publication names...
> > > >
> > > > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > > > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > > > ARRAY['pub2','pub1']) gpt(relid, attrs, qual);
> > > >  relid | attrs | rowfilter
> > > > ---+---+---
> > > >  16385 | 1 2   | (b < 33)
> > > >  16385 | 1 | (a > 99)
> > > > (2 rows)
> > > >
> > > > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > > > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > > > ARRAY['pub1','pub2']) gpt(relid, attrs, qual);
> > > >  relid | attrs | rowfilter
> > > > ---+---+---
> > > >  16385 | 1 | (a > 99)
> > > >  16385 | 1 2   | (b < 33)
> > > > (2 rows)
> > > >
> > > > But what about this (this is similar to the SQL fragment from
> > > > fetch_table_list); I swapped the pub names but the results are the
> > > > same...
> > > >
> > > > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > > > array_agg(p.pubname)) from pg_publication p where pubname
> > > > IN('pub2','pub1');
> > > >
> > > >  pg_get_publication_tables
> > > >
> > > > -
> > 
> > > > -
> > > > -
> > 
> > > > -
> > > > ---
> > > >  (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > > > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > > > :vartype 23 :vartypmod -1 :var
> > > > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > > > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > > > :constbyval true :constisnull false :
> > > > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> > > >  (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > > > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > > > :varattno 2 :vartype 23 :vartypmod -1 :v
> > > > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > > > {CONST :consttype 23 :constty

Re: proposal: possibility to read dumped table's name from file

2022-11-13 Thread Pavel Stehule
On Sun, Nov 13, 2022 at 9:58 AM Julien Rouhaud wrote:

> On Sat, Nov 12, 2022 at 09:35:59PM +0100, Pavel Stehule wrote:
>
> Thanks for the updated patch.  Apart from the function comment it looks
> good to
> me.
>
> Justin, did you have any other comment on the patch?
>
> > > I don't fully understand the part about subpatterns, but is that
> necessary
> > > to
> > > describe it?  Simply saying that any valid and possibly-quoted
> identifier
> > > can
> > > be parsed should make it clear that identifiers containing \n
> characters
> > > should
> > > work too.  Maybe also just mention that whitespaces are removed and
> special
> > > care is taken to output routines in exactly the same way calling code
> will
> > > expect it (that is comma-and-single-space type delimiter).
> > >
> >
> > In this case I hit the limits of my English language skills.
> >
> > I rewrote this comment, but it needs more care. Please, can you look at
> it?
>
> I'm also not a native English speaker, so I'm far from writing perfect
> comments myself :)
>

far better than mine :)

Thank you very much

updated patch attached

Regards

Pavel


>
> Maybe something like
>
> /*
>  * read_pattern - reads an object pattern from input
>  *
>  * This function will parse any valid identifier (quoted or not, qualified
>  * or not), which can also include the full signature for routines.
>  * Note that this function takes special care to sanitize the detected
>  * identifier (removing extraneous whitespace or other unnecessary
>  * characters).  This is necessary as most backup/restore filtering
>  * functions only recognize identifiers if they are written exactly the
>  * way they are regenerated.
>  * Returns a pointer to the next character after the found identifier, or
>  * NULL on error.
>  */
>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 8b9d9f4cad..d5a6e2c7ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -779,6 +779,80 @@ PostgreSQL documentation
   
  
 
+ 
+  --filter=filename
+  
+   
+Specify a filename from which to read patterns for objects to include
+or exclude from the dump. The patterns are interpreted according to the
+same rules as the corresponding options:
+-t/--table for tables,
+-n/--schema for schemas,
+--include-foreign-data for data on foreign servers and
+--exclude-table-data for table data.
+To read from STDIN, use - as the
+filename.  The --filter option can be specified in
+conjunction with the above listed options for including or excluding
+objects, and can also be specified more than once for multiple filter
+files.
+   
+
+   
+The file lists one object pattern per row, with the following format:
+
+{ include | exclude } { table | schema | foreign_data | table_data } PATTERN
+
+   
+
+   
+The first keyword specifies whether the objects matched by the pattern
+are to be included or excluded. The second keyword specifies the type
+of object to be filtered using the pattern:
+
+ 
+  
+   table: tables, works like
+   -t/--table
+  
+ 
+ 
+  
+   schema: schemas, works like
+   -n/--schema
+  
+ 
+ 
+  
+   foreign_data: data on foreign servers, works like
+   --include-foreign-data. This keyword can only be
+   used with the include keyword.
+  
+ 
+ 
+  
+   table_data: table data, works like
+   --exclude-table-data. This keyword can only be
+   used with the exclude keyword.
+  
+ 
+
+   
+
+   
+Lines starting with # are considered comments and
+ignored. Comments can be placed after filter as well. Blank lines
+are also ignored. See  for how to
+perform quoting in patterns.
+   
+
+   
+Example files are listed below in the 
+section.
+   
+
+  
+ 
+
  
   --if-exists
   
@@ -1119,6 +1193,7 @@ PostgreSQL documentation
 schema (-n/--schema) and
 table (-t/--table) qualifier
 match at least one extension/schema/table in the database to be dumped.
+This also applies to filters used with --filter.
 Note that if none of the extension/schema/table qualifiers find
 matches, pg_dump will generate an error
 even without --strict-names.
@@ -1528,6 +1603,19 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 
 
 $ pg_dump -t "\"MixedCaseName\"" mydb > mytab.sql
+
+
+  
+   To dump all tables with names starting with mytable, except for table
+   mytable2, specify a filter file
+   filter.txt like:
+
+include table mytable*
+exclude table mytable2
+
+
+
+$ pg_dump --filter=filter.txt mydb > db.sql
 
 
  
diff --git a/doc/src

Re: Reducing power consumption on idle servers

2022-11-13 Thread Thomas Munro
On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
 wrote:
> The attached patch is a reduced version of the original. It covers only:
> * deprecation of the promote_trigger_file - there are no tests that
> use it, which is why there is no test coverage for the patch
> * changing the sleep time of the startup process to 60s
> * docs and comments

LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
completely idle recovery process looks like:

kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)

Presumably it would have no timeout at all in the next release.

[1] 
https://www.postgresql.org/message-id/calj2acuiyn+zmpguvmgeoy1u7ino2qsvqrnufk8swpvk3a8...@mail.gmail.com




Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Tom Lane
Thomas Munro  writes:
> And with that change and a pgindent, pushed.

There is something very seriously wrong with this patch.

On my machine, running "make -j10 check-world" (with compilation
already done) has been taking right about 2 minutes for some time.
Since this patch, it's taking around 2:45 --- I did a bisect run
to confirm that this patch is where it changed.

The buildfarm is showing a hit, too.  Comparing the step runtimes at

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2022-11-08%2005%3A29%3A28
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2022-11-08%2007%3A49%3A31

it'd seem that most tests involving walreceivers got much slower:
pg_basebackup-check went from 00:29 to 00:39,
pg_rewind-check went from 00:56 to 01:26,
and recovery-check went from 03:56 to 04:45.
Curiously, subscription-check only went from 03:26 to 03:29.

I've not dug into it further than that, but my bet is that some
required wakeup condition was not accounted for.

regards, tom lane




Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Thomas Munro
On Mon, Nov 14, 2022 at 11:08 AM Tom Lane  wrote:
> There is something very seriously wrong with this patch.
>
> On my machine, running "make -j10 check-world" (with compilation
> already done) has been taking right about 2 minutes for some time.
> Since this patch, it's taking around 2:45 --- I did a bisect run
> to confirm that this patch is where it changed.
>
> The buildfarm is showing a hit, too.  Comparing the step runtimes at
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2022-11-08%2005%3A29%3A28
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2022-11-08%2007%3A49%3A31
>
> it'd seem that most tests involving walreceivers got much slower:
> pg_basebackup-check went from 00:29 to 00:39,
> pg_rewind-check went from 00:56 to 01:26,
> and recovery-check went from 03:56 to 04:45.
> Curiously, subscription-check only went from 03:26 to 03:29.
>
> I've not dug into it further than that, but my bet is that some
> required wakeup condition was not accounted for.

Looking...




Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Nathan Bossart
On Sun, Nov 13, 2022 at 05:08:04PM -0500, Tom Lane wrote:
> There is something very seriously wrong with this patch.
> 
> On my machine, running "make -j10 check-world" (with compilation
> already done) has been taking right about 2 minutes for some time.
> Since this patch, it's taking around 2:45 --- I did a bisect run
> to confirm that this patch is where it changed.

I've been looking into this.  I wrote a similar patch for logical/worker.c
before noticing that check-world was taking much longer.  The problem in
that case seems to be that process_syncing_tables() isn't called as often.
It wouldn't surprise me if there's also something in walreceiver.c that
depends upon the frequent wakeups.  I suspect this will require a revert.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Reducing power consumption on idle servers

2022-11-13 Thread Simon Riggs
On Sun, 13 Nov 2022 at 21:28, Thomas Munro  wrote:
>
> On Mon, Nov 14, 2022 at 5:52 AM Simon Riggs
>  wrote:
> > The attached patch is a reduced version of the original. It covers only:
> > * deprecation of the promote_trigger_file - there are no tests that
> > use it, which is why there is no test coverage for the patch
> > * changing the sleep time of the startup process to 60s
> > * docs and comments
>
> LPGTM.  If we also fix the bogus SIGALRM wakeups[1], then finally a
> completely idle recovery process looks like:
>
> kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
> kevent(8,0x0,0,{ },1,{ 60.0 })   = 0 (0x0)
>
> Presumably it would have no timeout at all in the next release.
>
> [1] 
> https://www.postgresql.org/message-id/calj2acuiyn+zmpguvmgeoy1u7ino2qsvqrnufk8swpvk3a8...@mail.gmail.com

Clearly, I haven't been watching Hackers! Thanks for the nudge.

See if this does the trick?

-- 
Simon Riggs                http://www.EnterpriseDB.com/


hibernate_startup.v7.patch
Description: Binary data


Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Thomas Munro
On Mon, Nov 14, 2022 at 11:26 AM Nathan Bossart
 wrote:
> On Sun, Nov 13, 2022 at 05:08:04PM -0500, Tom Lane wrote:
> > There is something very seriously wrong with this patch.
> >
> > On my machine, running "make -j10 check-world" (with compilation
> > already done) has been taking right about 2 minutes for some time.
> > Since this patch, it's taking around 2:45 --- I did a bisect run
> > to confirm that this patch is where it changed.
>
> I've been looking into this.  I wrote a similar patch for logical/worker.c
> before noticing that check-world was taking much longer.  The problem in
> that case seems to be that process_syncing_tables() isn't called as often.
> It wouldn't surprise me if there's also something in walreceiver.c that
> depends upon the frequent wakeups.  I suspect this will require a revert.

In the case of "meson test pg_basebackup/020_pg_receivewal" it looks
like wait_for_catchup hangs around for 10 seconds waiting for HS
feedback.  I'm wondering if we need to go back to high frequency
wakeups until it's caught up, or (probably better) arrange for a
proper event for progress.  Digging...




Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Thomas Munro
On Mon, Nov 14, 2022 at 12:11 PM Thomas Munro  wrote:
> On Mon, Nov 14, 2022 at 11:26 AM Nathan Bossart
>  wrote:
> > On Sun, Nov 13, 2022 at 05:08:04PM -0500, Tom Lane wrote:
> > > There is something very seriously wrong with this patch.
> > >
> > > On my machine, running "make -j10 check-world" (with compilation
> > > already done) has been taking right about 2 minutes for some time.
> > > Since this patch, it's taking around 2:45 --- I did a bisect run
> > > to confirm that this patch is where it changed.
> >
> > I've been looking into this.  I wrote a similar patch for logical/worker.c
> > before noticing that check-world was taking much longer.  The problem in
> > that case seems to be that process_syncing_tables() isn't called as often.
> > It wouldn't surprise me if there's also something in walreceiver.c that
> > depends upon the frequent wakeups.  I suspect this will require a revert.
>
> In the case of "meson test pg_basebackup/020_pg_receivewal" it looks
> like wait_for_catchup hangs around for 10 seconds waiting for HS
> feedback.  I'm wondering if we need to go back to high frequency
> wakeups until it's caught up, or (probably better) arrange for a
> proper event for progress.  Digging...

Maybe there is a better way to code this (I mean, who likes global
variables?) and I need to test some more, but I suspect the attached
is approximately what we missed.
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 8bd2ba37dd..fed2cc6e6f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1080,6 +1080,9 @@ XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli)
 	recvFile = -1;
 }
 
+static XLogRecPtr writePtr = 0;
+static XLogRecPtr flushPtr = 0;
+
 /*
  * Send reply message to primary, indicating our current WAL locations, oldest
  * xmin and the current time.
@@ -1096,8 +1099,6 @@ XLogWalRcvClose(XLogRecPtr recptr, TimeLineID tli)
 static void
 XLogWalRcvSendReply(bool force, bool requestReply)
 {
-	static XLogRecPtr writePtr = 0;
-	static XLogRecPtr flushPtr = 0;
 	XLogRecPtr	applyPtr;
 	TimestampTz now;
 
@@ -1334,6 +1335,9 @@ WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now)
 		case WALRCV_WAKEUP_REPLY:
 			if (wal_receiver_status_interval <= 0)
 wakeup[reason] = PG_INT64_MAX;
+			else if (writePtr != LogstreamResult.Write ||
+	 flushPtr != LogstreamResult.Flush)
+wakeup[reason] = now + 10;	/* frequent replies, not yet caught up */
 			else
 wakeup[reason] = now + wal_receiver_status_interval * INT64CONST(100);
 			break;


Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Thomas Munro
On Mon, Nov 14, 2022 at 12:35 PM Thomas Munro  wrote:
> Maybe there is a better way to code this (I mean, who likes global
> variables?) and I need to test some more, but I suspect the attached
> is approximately what we missed.

Erm, ENOCOFFEE, sorry that's not quite right, it needs the apply LSN,
but it demonstrates what the problem is...




Re: CI and test improvements

2022-11-13 Thread Justin Pryzby
On Fri, Nov 04, 2022 at 06:59:46PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2022-11-04 18:54:12 -0500, Justin Pryzby wrote:
> > Subject: [PATCH 1/8] meson: PROVE is not required
> > Subject: [PATCH 3/8] meson: rename 'main' tasks to 'regress' and 'isolation'
> 
> Pushed, thanks for the patches.

Thanks.

> > diff --git a/.cirrus.yml b/.cirrus.yml
> > index 9f2282471a9..99ac09dc679 100644
> > --- a/.cirrus.yml
> > +++ b/.cirrus.yml
> > @@ -451,12 +451,20 @@ task:
> >
> >build_script: |
> >  vcvarsall x64
> > -ninja -C build
> > +ninja -C build |tee build/meson-logs/build.txt
> > +REM Since pipes lose exit status of the preceding command, rerun compilation,
> > +REM without the pipe exiting now if it fails, rather than trying to run checks
> > +ninja -C build > nul
> 
> This seems mighty grotty :(. but I guess it's quick enough not to worry about,
> and I can't come up with a better plan.
> 
> It doesn't seem quite right to redirect into meson-logs/ to me, my
> interpretation is that that's "meson's namespace".  Why not just store it in
> build/?

I put it there so it'd be included with the build artifacts.
Maybe it's worth adding a separate line to artifacts for stuff like
this, and ccache log ?

> > From e85fe83fd8a4b4c79a96d2bf66cd6a5e1bdfcd1e Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby 
> > Date: Sat, 26 Feb 2022 19:34:35 -0600
> > Subject: [PATCH 5/8] cirrus: build docs as a separate task..

> > +  # Exercise HTML and other docs:
> > +  ninja_docs_build_script: |
> > +mkdir build.ninja
> > +cd build.ninja
> 
> Perhaps build-ninja instead? build.ninja is the filename for ninja's build
> instructions, so it seems a bit confusing.

Sure.

Do you think building docs with both autoconf and meson is what's
desirable here ?

I'm not sure if this ought to be combined with/before/after your "move
compilerwarnings task to meson" patch?  (Regarding that patch: I
mentioned that it shouldn't use ccache -C, and it should use
meson_log_artifacts.)

> > From: Justin Pryzby 
> > Date: Tue, 26 Jul 2022 20:30:02 -0500
> > Subject: [PATCH 6/8] cirrus/ccache: add explicit cache keys..
> >
> > Since otherwise, building with ci-os-only will probably fail to use the
> > normal cache, since the cache key is computed using both the task name
> > and its *index* in the list of caches (internal/executor/cache.go:184).
> 
> Seems like this would potentially better addressed by reporting a bug to the
> cirrus folks?

You said that before, but I don't think so - since they wrote code to do
that, it's odd to file a bug that says that the behavior is wrong.  I am
curious why, but it seems deliberate.

https://www.postgresql.org/message-id/20220828171029.GO2342%40telsasoft.com

> There's enough copies of this to make it worth deduplicating. If we
> use something like fingerprint_script: echo
> ccache/$CIRRUS_BRANCH/$CIRRUS_OS we can use a yaml ref?

I'll look into it...

> I think you experimented with creating a 'base' ccache dir (e.g. on the master
> branch) and then using branch specific secondar caches?

I have to revisit that sometime.

That's a new feature in ccache 4.4, which is currently only in macOS.
This is another thing that it'd be easier to test by having cfbot
clobber the cirrus.yml rather than committing to postgres repo.
(Technically, it should probably only use the in-testing cirrus.yml
if the patch it's testing doesn't itself modify .cirrus.yml.)

> How did that turn out?  I think cfbot's caches constantly get removed
> due to overrunning the global space.

For cfbot, I don't know if there's much hope that any patch-specific
build artifacts will be cached from the previous run, typically ~24h
prior.

One idea I have, for the "Warnings" task (and maybe linux too), is to
defer pruning until after all the compilations, to avoid LRU pruning
during early tasks causing bad hit ratios for later tasks.

Another thing that probably happens is that task1 starts compiling
patch1, and then another instance of task1 starts compiling patch2.  A
bit later, the first instance will upload its ccache result for patch1,
which will be summarily overwritten by the second instance's compilation
result, which doesn't include anything from the first instance.

Also, whenever ccache hits its MAXSIZE threshold, it prunes the cache
down to 80% of the configured size, which probably wipes away everything
from all but the most recent ~20 builds.

I also thought about having separate caches for each compilation in the
warnings task - but that requires too much repeated yaml just for that..

> > From: Justin Pryzby 
> > Date: Sun, 3 Apr 2022 00:10:20 -0500
> > Subject: [PATCH 7/8] cirrus/ccache: disable compression and show stats
> >
> > linux/debian/bullseye has 4.2; 99MB cirrus cache; cache_siz

Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Nathan Bossart
On Mon, Nov 14, 2022 at 12:51:14PM +1300, Thomas Munro wrote:
> On Mon, Nov 14, 2022 at 12:35 PM Thomas Munro  wrote:
>> Maybe there is a better way to code this (I mean, who likes global
>> variables?) and I need to test some more, but I suspect the attached
>> is approximately what we missed.
> 
> Erm, ENOCOFFEE, sorry that's not quite right, it needs the apply LSN,
> but it demonstrates what the problem is...

Yeah, I'm able to sort-of reproduce the problem by sending fewer replies,
but it's not clear to me why that's the problem.  AFAICT all of the code
paths that write/flush are careful to send a reply shortly afterward, which
should keep writePtr/flushPtr updated.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




RE: logical replication restrictions

2022-11-13 Thread Takamichi Osumi (Fujitsu)
On Tuesday, November 8, 2022 2:27 PM Kuroda, Hayato/黒田 隼人 wrote:
> If you could not allocate time to discuss this problem because of other
> important tasks or events, we would like to take over the thread and modify
> your patch.
> 
> We've planned that we will start to address comments and reported bugs if
> you would not respond by the end of this week.
Hi,


I've simply rebased the patch to make it applicable on top of HEAD
and make the tests pass. Note there are still open pending comments
and I'm going to start to address those.

I've listed Euler as the original author in the commit message
to give him credit.



Best Regards,
Takamichi Osumi



v8-0001-Time-delayed-logical-replication-subscriber.patch
Description: v8-0001-Time-delayed-logical-replication-subscriber.patch


Re: Bitmapsets as Nodes

2022-11-13 Thread Amit Langote
On Mon, Oct 17, 2022 at 6:30 PM Amit Langote  wrote:
> For a couple of patches that I am working on ([1], [2]), I have needed
> to put Bitmapsets into a List that is in turn part of a Plan tree or a
> Node tree that may be written (using outNode) and read (using
> nodeRead).  Bitmapsets not being a Node themselves causes the
> write/read of such Plan/Node trees (containing Bitmapsets in a List)
> to not work properly.
>
> So, I included a patch in both of those threads to add minimal support
> for Bitmapsets to be added into a Plan/Node tree without facing the
> aforementioned problem, though Peter E suggested [3] that it would be
> a good idea to discuss it more generally in a separate thread, so this
> email.  Attached a patch to make the infrastructure changes necessary
> to allow adding Bitmapsets as Nodes, though I have not changed any of
> the existing Bitmapset that are added either to Query or to
> PlannedStmt to use that infrastructure.  That is, by setting their
> NodeTag and changing gen_node_support.pl to emit
> WRITE/READ_NODE_FIELD() instead of WRITE/READ_BITMAPSET_FIELD() for
> any Bitmapsets encountered in a Node tree.  One thing I am not quite
> sure about is who would be setting the NodeTag, the existing routines
> in bitmapset.c, or if we should add wrappers that do.
>
> Actually, Tom had posted about exactly the same thing last year [4],
> though trying to make Bitmapset Nodes became unnecessary after he
> resolved the problem that required making Bitmapsets Nodes by other
> means -- by getting rid of the field that was a List of Bitmapset
> altogether.  Maybe I should try to do the same in the case of both [1]
> and [2]?  In fact, I have tried getting rid of the need for List of
> Bitmapset for [1], and I like the alternative better in that case, but
> for [2], it still seems that a List of Bitmapset may be better than
> List of some-new-Node-containing-the-Bitmapset.

FTR, this has been taken care of in 5e1f3b9ebf6e5.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Suppressing useless wakeups in walreceiver

2022-11-13 Thread Thomas Munro
On Mon, Nov 14, 2022 at 1:05 PM Nathan Bossart  wrote:
> Yeah, I'm able to sort-of reproduce the problem by sending fewer replies,
> but it's not clear to me why that's the problem.  AFAICT all of the code
> paths that write/flush are careful to send a reply shortly afterward, which
> should keep writePtr/flushPtr updated.

Ahh, I think I might have it.  In the old coding, sendTime starts out
as 0, which I think caused it to send its first reply message after
the first 100ms sleep, and only after that fall into a cadence of
wal_receiver_status_interval (10s) while idle.  Our new coding won't
send its first reply until start time + wal_receiver_status_interval.
If I have that right, I think we can get back to the previous behaviour
by explicitly setting the first message time, like:

@@ -433,6 +433,9 @@ WalReceiverMain(void)
for (int i = 0; i < NUM_WALRCV_WAKEUPS; ++i)
WalRcvComputeNextWakeup(i, now);

+   /* XXX start with a reply after 100ms */
+   wakeup[WALRCV_WAKEUP_REPLY] = now + 10;
+
/* Loop until end-of-streaming or error */

Obviously that's bogus and racy (it races with wait_for_catchup, and
it's slow, actually both sides are not great and really should be
event-driven, in later work)...




Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment

2022-11-13 Thread John Naylor
On Tue, Nov 8, 2022 at 8:57 AM David Rowley  wrote:

> On Tue, 8 Nov 2022 at 05:24, Andres Freund  wrote:
> > I doubt there's ever a need to realloc such a pointer? Perhaps we could
> > just elog(ERROR)?
>
> Are you maybe locked into just thinking about your current planned use
> case that we want to allocate BLCKSZ bytes in each case? It does not
> seem impossible to me that someone will want something more than an
> 8-byte alignment and also might want to enlarge the allocation at some
> point. I thought it might be more dangerous not to implement repalloc.
> It might not be clear to someone using palloc_aligned() that there's
> no code path that can call repalloc on the returned pointer.

I can imagine a use case for arrays of cacheline-sized objects.

> TYPEALIGN() will not work correctly unless the alignment is a power of
> 2. We could modify it to, but that would require doing some modular
> maths instead of bitmasking. That seems pretty horrible when the macro
> is given a value that's not constant at compile time as we'd end up
> with a (slow) divide in the code path.  I think the restriction is a
> good idea. I imagine there will never be any need to align to anything
> that's not a power of 2.

+1
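
For illustration, the power-of-2 trick being relied on (a sketch; PostgreSQL's
TYPEALIGN macro does essentially this on uintptr_t values): rounding up to a
power-of-2 boundary needs only an add and a bitmask, whereas an arbitrary
alignment would need a modulo.

#include <stdint.h>

/* Round len up to the next multiple of alignval; alignval must be a power of 2. */
static inline uintptr_t
align_up_pow2(uintptr_t len, uintptr_t alignval)
{
	return (len + alignval - 1) & ~(alignval - 1);
}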

> > Should we handle the case where we get a suitably aligned pointer from
> > MemoryContextAllocExtended() differently?
>
> Maybe it would be worth the extra check. I'm trying to imagine future
> use cases.  Maybe if someone wanted to ensure that we're aligned to
> CPU cache line boundaries then the chances of the pointer already
> being aligned to 64 bytes are decent enough.  The problem is that
> it's too late to save any memory, it just saves a bit of boxing and
> unboxing of the redirect headers.

To my mind the main point of detecting this case is to save memory, so if
that's not possible/convenient, special-casing doesn't seem worth it.

- Assert((char *) chunk > (char *) block);
+ Assert((char *) chunk >= (char *) block);

Is this related or independent?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Add test module for Custom WAL Resource Manager feature

2022-11-13 Thread Bharath Rupireddy
On Sat, Nov 12, 2022 at 4:40 AM Jeff Davis  wrote:
>
> On Fri, 2022-11-11 at 17:01 +0530, Bharath Rupireddy wrote:
> > Hi,
> >
> > Inspired by recent commits 9fcdf2c, e813e0e and many small test
> > modules/extensions under src/test/modules, I would like to propose
> > one
> > such test module for Custom WAL Resource Manager feature introduced
> > by
> > commit 5c279a6. It not only covers the code a bit, but it also
> > demonstrates usage of the feature.
> >
> > I'm attaching a patch herewith. Thoughts?
>
> Good idea.

Thanks.

> Can we take it a little further to exercise the decoding
> path, as well? For instance, we can do something like a previous proposal[1], 
> except
> it can now be done as an extension. If it's useful, we could even put
> it in contrib with a real RMGR ID.
>
> [1]
> https://www.postgresql.org/message-id/20ee0b0ae6958804a88fe9580157587720faf664.ca...@j-davis.com

We have tests/modules defined for testing logical decoding, no? If the
intention is to define rm_redo in this test module, I think it's not
required.

> Though I'm also fine just adding a test module to start with.

Thanks. I would like to keep it simple.

I've added some more comments and attached v2 patch herewith. Please review.

I've also added a CF entry - https://commitfest.postgresql.org/41/4009/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v2-0001-Add-test-module-for-Custom-WAL-Resource-Manager-f.patch
Description: Binary data


RE: [PATCH] Support % wildcard in extension upgrade filenames

2022-11-13 Thread Regina Obe
> Re: Sandro Santilli
> > I'm attaching an updated version of the patch. This time the patch is
> > tested. Nothing changes unless the .control file for the subject
> > extension doesn't have a "wildcard_upgrades = true" statement.
> >
> > When wildcard upgrades are enabled, a file with a "%" symbol as the
> > "source" part of the upgrade path will match any version and
> 
> Fwiw I believe wildcard_upgrades isn't necessary in the .control file.
> If there are no % files, none would be used anyway, and if there are,
> it's clear it's meant as a wildcard since % won't appear in any remotely
> sane version number.
> 
> Christoph

I also like the idea of skipping the wildcard_upgrades syntax.
Then there is no need to have a conditional control file for PG 16 vs. older
versions.

Thanks,
Regina







Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-13 Thread Michael Paquier
On Sat, Nov 12, 2022 at 04:13:53PM +0800, Julien Rouhaud wrote:
> It's looks good to me.  I agree that file name and line number should be 
> enough
> to diagnose any unexpected error.

Thanks for checking.  I have looked at 0001 and 0002 again with a
fresh mind, and applied both of them this morning.

This makes the remaining bits of the patch much easier to follow in
hba.c.  Here are more comments after a closer review of the whole for
the C logic.

-MemoryContext
-tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
-  int elevel, int depth)
+static void
+tokenize_file_with_context(MemoryContext linecxt, const char *filename,

I really tend to prefer having one routine rather than two for the
tokenization entry point.  Switching to the line context after setting
up the callback is better, and tokenize_file_with_context() does so.
Anyway, what do you think about having one API that gains a
"MemoryContext *" argument, as in the following:
void tokenize_auth_file(const char *filename, FILE *file,
List **tok_lines,
int depth, int elevel, MemoryContext *linectx)

If the caller passes NULL for *linectx as the initial line context,
just create it as we do now.  If *linectx is not NULL, just reuse it.
That may be cleaner than returning the created MemoryContext as
returned result from tokenize_auth_file().
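
A minimal sketch of that calling convention (illustrative only, not the
committed code; the context name is made up): the caller passes a
MemoryContext pointer that starts out NULL, the first call creates the line
context, and recursive calls for included files reuse it.

static void
tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
				   int depth, int elevel, MemoryContext *linectx)
{
	if (*linectx == NULL)
		*linectx = AllocSetContextCreate(CurrentMemoryContext,
										 "tokenize_auth_file",
										 ALLOCSET_SMALL_SIZES);

	/* ... tokenize each line, recursing with the same *linectx for includes ... */
}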

+/* Cumulate errors if any. */
+if (err_msg)
+{
+if (err_buf.len > 0)
+appendStringInfoChar(&err_buf, '\n');
+appendStringInfoString(&err_buf, err_msg);
+}

This aggregates all the error messages for all the files included in a
given repository.  As the patch stands, it seems to me that we would
get errors related to an include_dir clause for two cases:
- The specified path does not exist, in which case we have only one
err_msg to consume and report back.
- Multiple failures in opening and/or reading included files.
In the second case, aggregating the reports would provide a full set
of information, but that's not something a user would be able to act
on directly as this is system-related.  Or there is a case to know a
full list of files in the case of multiple files that cannot be read
because of permission issues?  We may be fine with just the first
system error here.  Note that in the case where files can be read and
opened, these would have their own TokenizedAuthLines for each line
parsed, meaning one line in the SQL views once translated to an
HbaLine or an IdentLine.

This line of thought brings up an interesting point, actually: there is
an inconsistency between "include_if_exists" and "include" compared to
the GUC processing.  As of the patch, if we use "include" on a file
that does not exist, the tokenization logic would jump over it and
continue processing the follow-up entries anyway.  This is a different
policy than the GUCs, where we would immediately stop looking at
parameters after an "include" if it fails because its file does not
exist, working as an immediate stop in the processing.  The difference
that the patch brings between "include_if_exists" and "include" is
that we report an error in one case but not the other, still skip the
files in both cases and move on with the rest.  Hence my question,
shouldn't we do like the GUC processing for the hba and ident files,
aka stop immediately when we fail to find a file on an "include"
clause?  This would be equivalent to doing a "break" in
tokenize_file_with_context() after failing an include file.
--
Michael


signature.asc
Description: PGP signature


Time delayed LR (WAS Re: logical replication restrictions)

2022-11-13 Thread Amit Kapila
Hi,

The thread title doesn't really convey the topic under discussion, so I
changed it. IIRC, this has been mentioned by others as well in the
thread.

On Sat, Nov 12, 2022 at 7:21 PM vignesh C  wrote:
>
> Few comments:
> 1) I feel if the user has specified a long delay there is a chance
> that replication may not continue if the replication slot falls behind
> the current LSN by more than max_slot_wal_keep_size. I feel we should
> add this reference in the documentation of min_apply_delay as the
> replication will not continue in this case.
>

This makes sense to me.

> 2) I also noticed that if we have to shut down the publisher server
> with a long min_apply_delay configuration, the publisher server cannot
> be stopped as the walsender waits for the data to be replicated. Is
> this behavior ok for the server to wait in this case? If this behavior
> is ok, we could add a log message as it is not very evident from the
> log files why the server could not be shut down.
>

I think for this case, the behavior should be the same as for physical
replication. Can you please check what the behavior is for the case you
are worried about in physical replication? Note, we already have a
similar parameter for recovery_min_apply_delay for physical
replication.

-- 
With Regards,
Amit Kapila.




Re: Schema variables - new implementation for Postgres 15

2022-11-13 Thread Sergey Shinderuk

On 13.11.2022 20:59, Pavel Stehule wrote:

fresh rebase


Hello,

Sorry, I haven't been following this thread, but I'd like to report a 
memory management bug. I couldn't apply the latest patches, so I tested 
with v20221104-1-* patches applied atop commit b0284bfb1db.



postgres=# create variable s text default 'abc';

create function f() returns text as $$
begin
return g(s);
end;
$$ language plpgsql;

create function g(t text) returns text as $$
begin
let s = 'BOOM!';
return t;
end;
$$ language plpgsql;

select f();
CREATE VARIABLE
CREATE FUNCTION
CREATE FUNCTION
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

LOG:  server process (PID 55307) was terminated by signal 11: 
Segmentation fault

DETAIL:  Failed process was running: select f();


I believe it's a use-after-free error, triggered by assigning a new 
value to s in g(), thus making t a dangling pointer.
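
For illustration, a plain-C analogue of the reported hazard (a sketch only;
the names are made up and this is not the session-variable code itself): the
callee receives a pointer into storage owned by the variable, that storage is
then replaced, and the earlier pointer is left dangling.

#include <stdlib.h>
#include <string.h>

static char *var_value;			/* stands in for the session variable's storage */

static void
let_var(const char *newval)
{
	free(var_value);			/* old value freed here... */
	var_value = strdup(newval);
}

static const char *
g(const char *t)
{
	let_var("BOOM!");			/* ...while t still points into it */
	return t;					/* dangling pointer */
}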


After reconnecting I get a scary error:

postgres=# select f();
ERROR:  compressed pglz data is corrupt


Best regards,

--
Sergey Shinderuk           https://postgrespro.com/





Re: Time delayed LR (WAS Re: logical replication restrictions)

2022-11-13 Thread Amit Kapila
On Mon, Nov 14, 2022 at 12:14 PM Amit Kapila  wrote:
>
> On Sat, Nov 12, 2022 at 7:21 PM vignesh C  wrote:
> >
> > Few comments:
> > 1) I feel if the user has specified a long delay there is a chance
> > that replication may not continue if the replication slot falls behind
> > the current LSN by more than max_slot_wal_keep_size. I feel we should
> > add this reference in the documentation of min_apply_delay as the
> > replication will not continue in this case.
> >
>
> This makes sense to me.
>
> > 2) I also noticed that if we have to shut down the publisher server
> > with a long min_apply_delay configuration, the publisher server cannot
> > be stopped as the walsender waits for the data to be replicated. Is
> > this behavior ok for the server to wait in this case? If this behavior
> > is ok, we could add a log message as it is not very evident from the
> > log files why the server could not be shut down.
> >
>
> I think for this case, the behavior should be the same as for physical
> replication. Can you please check what the behavior is for the case you
> are worried about in physical replication? Note, we already have a
> similar parameter for recovery_min_apply_delay for physical
> replication.
>

I don't understand the reason for the below change in the patch:

+ /*
+ * If this subscription has been disabled and it has an apply
+ * delay set, wake up the logical replication worker to finish
+ * it as soon as possible.
+ */
+ if (!opts.enabled && sub->applydelay > 0)
+ logicalrep_worker_wakeup(sub->oid, InvalidOid);
+

It seems to me Kuroda-San proposed this change [1] to fix the test,
but it is not clear to me why such a change is required. Why isn't
CHECK_FOR_INTERRUPTS() after waiting, followed by the existing code
below [2] in LogicalRepApplyLoop(), sufficient to handle parameter
updates?

[2]
if (!in_remote_transaction && !in_streamed_transaction)
{
/*
* If we didn't get any transactions for a while there might be
* unconsumed invalidation messages in the queue, consume them
* now.
*/
AcceptInvalidationMessages();
maybe_reread_subscription();
...


[1] - 
https://www.postgresql.org/message-id/TYAPR01MB5866F9716A18DA0C68A2CDB3F5469%40TYAPR01MB5866.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: Non-decimal integer literals

2022-11-13 Thread John Naylor
On Mon, Oct 10, 2022 at 9:17 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> Taking another look around ecpg to see how this interacts with C-syntax
> integer literals.  I'm not aware of any particular issues, but it's
> understandably tricky.

I can find no discussion in the archives about the commit that added "xch":

commit 6fb3c3f78fbb2296894424f6e3183d339915eac7
Author: Michael Meskes 
Date:   Fri Oct 15 19:02:08 1999 +

*** empty log message ***

...and I can't think of why bounds checking a C literal was done like this.

Regarding the patch, it looks good overall. My only suggestion would be to
add a regression test for just below and just above overflow, at least for
int2.

Minor nits:

- * Process {integer}.  Note this will also do the right thing with
{decimal},
+ * Process {*integer}.  Note this will also do the right thing with
{numeric},

I scratched my head for a while, thinking this was Flex syntax, until I
realized my brain was supposed to do shell-globbing first, at which point I
could see it was referring to multiple Flex rules. I'd try to rephrase.

+T661 Non-decimal integer literals YES SQL:202x draft

Is there an ETA yet?

Also, it's not this patch's job to do it, but there are at least a half
dozen places that open-code turning a hex char into a number. It might be a
good easy "todo item" to unify that.
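
For instance, something along these lines could replace the open-coded
versions (just a sketch of what such a shared helper might look like, not
code that exists today):

	/*
	 * Sketch of a shared helper: convert one hex digit to its value,
	 * returning -1 for a character that is not a hex digit.
	 */
	static inline int
	hex_digit_value(char c)
	{
		if (c >= '0' && c <= '9')
			return c - '0';
		if (c >= 'a' && c <= 'f')
			return c - 'a' + 10;
		if (c >= 'A' && c <= 'F')
			return c - 'A' + 10;
		return -1;
	}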

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Making Bitmapsets be valid Nodes

2022-11-13 Thread Amit Langote
On Mon, Nov 14, 2022 at 12:23 AM Tom Lane  wrote:
> Peter Eisentraut  writes:
> > On 11.11.22 21:05, Tom Lane wrote:
> >> Attached is a cleaned-up version of Amit's patch v24-0003 at [2].
> >> I fixed the problems with not always tagging Bitmapsets, and changed
> >> the outfuncs/readfuncs logic so that Bitmapsets still print exactly
> >> as they did before (thus, this doesn't require a catversion bump).
>
> > This looks good to me.
>
> Pushed, thanks for looking.

Thanks a lot for this.

I agree that it may not be worthwhile to add an extra function call by
changing the COPY_BITMAPSET_FIELD, etc. that gen_node_support.pl
currently emits for Bitmapset * / Relids struct members to
COPY_NODE_FIELD, etc.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: ExecRTCheckPerms() and many prunable partitions

2022-11-13 Thread Amit Langote
On Sat, Nov 12, 2022 at 1:46 AM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > On 2022-Oct-06, Amit Langote wrote:
> >> Actually, List of Bitmapsets turned out to be something that doesn't
> >> just-work with our Node infrastructure, which I found out thanks to
> >> -DWRITE_READ_PARSE_PLAN_TREES.  So, I had to go ahead and add
> >> first-class support for copy/equal/write/read support for Bitmapsets,
>
> > Hmm, is this related to what Tom posted as part of his 0004 in
> > https://postgr.es/m/2901865.1667685...@sss.pgh.pa.us
>
> It could be.  For some reason I thought that Amit had withdrawn
> his proposal to make Bitmapsets be Nodes.

I think you are referring to [1], which I had forgotten to link to here.
I did reimplement a data structure in my patch on the "Re: generic
plans and initial pruning" thread to stop using a List of Bitmapsets,
so the Bitmapsets-as-Nodes functionality became unnecessary there,
though I still need it for the proposal here to move
extraUpdatedColumns (patch 0004) into the ModifyTable node.

> The code I was using that for would rather have fixed-size arrays
> of Bitmapsets than variable-size Lists, mainly because it always
> knows ab initio what the max length of the array will be.  But
> I don't think that the preference is so strong that it justifies
> a private data structure.
>
> The main thing I was wondering about in connection with that
> was whether to assume that there could be other future applications
> of the logic to perform multi-bitmapset union, intersection,
> etc.  If so, then I'd be inclined to choose different naming and
> put those functions in or near to bitmapset.c.  It doesn't look
> like Amit's code needs anything like that, but maybe somebody
> has an idea about other applications?

Yes, simple storage of multiple Bitmapsets in a List somewhere in a
parse/plan tree seemed to have wide enough use to justify adding proper
node support for it.  Assuming you mean trying to generalize
VarAttnoSet in your patch 0004 posted at [2], I wonder whether you want
to make its indexability by varno / RT index a part of the interface of
the generic code you're envisioning for it?  For example, the
varattnoset_*_members collection of routines in that patch seems to
assume that the Bitmapsets at a given index in the provided pair of
VarAttnoSets are somehow related -- referring to the same base relation
in this case.  That does not sound very generalizable, but maybe that
is not what you are thinking at all.
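
To make the question concrete, the kind of generic helper I could imagine
living near bitmapset.c would be something like this (purely a sketch;
multi_bms_union and the equal-length List-of-Bitmapsets convention are
assumptions, not existing code):

	/*
	 * Sketch: element-wise union of two Lists of Bitmapsets that use the
	 * same index (e.g. RT index) to identify corresponding members.  The
	 * two lists are assumed to have the same length.
	 */
	static List *
	multi_bms_union(List *a, List *b)
	{
		List	   *result = NIL;
		ListCell   *lca,
				   *lcb;

		forboth(lca, a, lcb, b)
			result = lappend(result,
							 bms_union((Bitmapset *) lfirst(lca),
									   (Bitmapset *) lfirst(lcb)));
		return result;
	}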

> Anyway, I concur with Peter's upthread comment that making
> Bitmapsets be Nodes is probably justifiable all by itself.
> The lack of a Node tag in them now is just because in a 32-bit
> world it seemed like unnecessary bloat.  But on 64-bit machines
> it's free, and we aren't optimizing for 32-bit any more.
>
> I do not like the details of v24-0003 at all though, because
> it seems to envision that a "node Bitmapset" is a different
> thing from a raw Bitmapset.  That can only lead to bugs ---
> why would we not make it the case that every Bitmapset is
> properly labeled with the node tag?

Yeah, I just didn't think hard enough to realize that having
bitmapset.c itself set the node tag is essentially free, and it looks
like a better design anyway than one where callers get to choose
whether the bitmapset they are manipulating is a Node or not.
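
In other words (just sketching the idea here, not quoting the committed
code; bms_alloc is a hypothetical internal allocator), the allocation path
inside bitmapset.c can simply stamp the tag itself:

	/*
	 * Sketch: every Bitmapset carries a NodeTag, set by bitmapset.c at
	 * allocation time, so callers never have to opt in.
	 */
	typedef struct Bitmapset
	{
		NodeTag		type;		/* always T_Bitmapset */
		int			nwords;		/* number of words in array */
		bitmapword	words[FLEXIBLE_ARRAY_MEMBER];
	} Bitmapset;

	static Bitmapset *
	bms_alloc(int nwords)		/* hypothetical internal allocator */
	{
		Bitmapset  *result = (Bitmapset *) palloc0(BITMAPSET_SIZE(nwords));

		result->type = T_Bitmapset;
		result->nwords = nwords;
		return result;
	}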

> Also, although I'm on board with making Bitmapsets be Nodes,
> I don't think I'm on board with changing their dump format.
> Planner node dumps would get enormously bulkier and less
> readable if we changed things like
>
>:relids (b 1 2)
>
> to
>
>:relids
>   {BITMAPSET
>(b 1 2)
>   }
>
> or whatever the output would look like as the patch stands.
> So that needs a bit more effort, but it's surely manageable.

Agreed on leaving the dump format unchanged rather than bloating it.

Thanks a lot for 5e1f3b9ebf6e5.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

[1] 
https://www.postgresql.org/message-id/CA+HiwqG8L3DVoZauJi1-eorLnnoM6VcfJCCauQX8=ofi-qm...@mail.gmail.com
[2] https://www.postgresql.org/message-id/2901865.1667685211%40sss.pgh.pa.us




Re: Allow file inclusion in pg_hba and pg_ident files

2022-11-13 Thread Julien Rouhaud
Hi,

On Mon, Nov 14, 2022 at 02:40:37PM +0900, Michael Paquier wrote:
> On Sat, Nov 12, 2022 at 04:13:53PM +0800, Julien Rouhaud wrote:
> > It looks good to me.  I agree that file name and line number should be
> > enough to diagnose any unexpected error.
>
> Thanks for checking.  I have looked at 0001 and 0002 again with a
> fresh mind, and applied both of them this morning.

Thanks a lot!

> This makes the remaining bits of the patch much easier to follow in
> hba.c.  Here are more comments after a closer review of the whole for
> the C logic.

Agreed.

> -MemoryContext
> -tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
> -  int elevel, int depth)
> +static void
> +tokenize_file_with_context(MemoryContext linecxt, const char *filename,
>
> I really tend to prefer having one routine rather than two for the
> tokenization entry point.  Switching to the line context after setting
> up the callback is better, and tokenize_file_with_context() does so.
> Anyway, what do you think about having one API that gains a
> "MemoryContext *" argument, as of the following:
> void tokenize_auth_file(const char *filename, FILE *file,
> List **tok_lines,
> int depth, int elevel, MemoryContext *linectx)
>
> If the caller passes NULL for *linectx as the initial line context,
> just create it as we do now.  If *linectx is not NULL, just reuse it.
> That may be cleaner than returning the created MemoryContext as the
> result of tokenize_auth_file().

I originally used two functions as it's a common pattern (e.g. ReadBuffer /
ReadBufferExtended, or all the *_internal versions of functions) and to avoid
an unnecessary branch, but I agree that an extra branch here is unlikely to
make any measurable difference.  It would only matter with -DEXEC_BACKEND /
win32, and in that case the cost of the already expensive new-backend
mechanism dwarfs it anyway, so +1 to keeping things simple here.
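
So the entry point would end up with roughly this shape (a rough sketch of
the suggestion above, not the final code; tokenize_line() stands in for the
existing per-line logic):

	static void
	tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
					   int depth, int elevel, MemoryContext *linectx)
	{
		MemoryContext oldcxt;

		/* Create the line context on the first (non-recursive) call only. */
		if (*linectx == NULL)
			*linectx = AllocSetContextCreate(CurrentMemoryContext,
											 "tokenize_auth_file",
											 ALLOCSET_SMALL_SIZES);

		oldcxt = MemoryContextSwitchTo(*linectx);

		/* tokenize_line() is a stand-in; included files recurse with *linectx */
		while (tokenize_line(filename, file, tok_lines, depth, elevel, linectx))
			;

		MemoryContextSwitchTo(oldcxt);
	}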

> +/* Cumulate errors if any. */
> +if (err_msg)
> +{
> +if (err_buf.len > 0)
> +appendStringInfoChar(&err_buf, '\n');
> +appendStringInfoString(&err_buf, err_msg);
> +}
>
> This aggregates all the error messages for all the files included in a
> given directory.  As the patch stands, it seems to me that we would
> get errors related to an include_dir clause for two cases:
> - The specified path does not exist, in which case we have only one
> err_msg to consume and report back.
> - Multiple failures in opening and/or reading included files.
> In the second case, aggregating the reports would provide a full set
> of information, but that's not something a user would be able to act
> on directly, as this is system-related.  Or is there a case for knowing
> the full list of files when multiple files cannot be read because of
> permission issues?  We may be fine with just the first
> system error here.  Note that in the case where files can be read and
> opened, these would have their own TokenizedAuthLines for each line
> parsed, meaning one line in the SQL views once translated to an
> HbaLine or an IdentLine.

If you have an include_dir directive and multiple files have wrong permissions
(or maybe a broken symlink or something like that), you will get multiple errors
when trying to process that single directive.  I think it's friendlier to
report as much detail as we can, so users can make sure they fix everything
at once rather than fixing one error at a time.  That's especially helpful if
the fix is done in some external tooling (puppet or whatever) rather than
directly on the server.
>
> This line of thought brings up an interesting point, actually: there is
> an inconsistency between "include_if_exists" and "include" compared to
> the GUC processing.  As of the patch, if we use "include" on a file
> that does not exist, the tokenization logic would jump over it and
> continue processing the follow-up entries anyway.  This is a different
> policy than the GUCs, where we would immediately stop looking at
> parameters after an "include" if it fails because its file does not
> exist, working as an immediate stop in the processing.  The difference
> that the patch brings between "include_if_exists" and "include" is
> that we report an error in one case but not the other, still skip the
> files in both cases and move on with the rest.  Hence my question,
> shouldn't we do like the GUC processing for the hba and ident files,
> aka stop immediately when we fail to find a file on an "include"
> clause?  This would be equivalent to doing a "break" in
> tokenize_file_with_context() after failing an include file.

I think the problem is that we have the same interface for processing the
files at startup/reload and for filling the views, so if we want the views to
be helpful and report all errors, we have to also allow a bogus "include" to
continue in the reload case too.  The same problem doe