Re: Reducing power consumption on idle servers

2022-03-24 Thread Kyotaro Horiguchi
At Thu, 10 Mar 2022 11:45:10 -0800, Andres Freund  wrote in 
> Hi,
> 
> On 2022-03-10 17:50:47 +, Simon Riggs wrote:
> > On Wed, 9 Mar 2022 at 01:16, Zheng Li  wrote:
> > 
> > > > 1. Standardize the hibernation time at 60s, using a #define
> > > > HIBERNATE_DELAY_SEC 60
> > >
> > > I notice in patch 3 HIBERNATE_DELAY_SEC has been increased to 300
> > > seconds, what’s the reasoning behind it? Is longer hibernation delay
> > > better? If so can we set it to INT_MAX (the max timeout allowed by
> > > WaitLatch()) in which case a worker in hibernation only relies on
> > > wakeup? I think it would be nice to run experiments to verify that the
> > > patch reduces power consumption while varying the value of
> > > HIBERNATE_DELAY_SEC.
> > 
> > Setting it to INT_MAX would be the same as not allowing a timeout,
> > which changes a lot of current behavior and makes it less robust.
> 
> Most of these timeouts are a bad idea and should not exist. We repeatedly have
> had bugs where we were missing wakeups etc but those bugs were harder to

I basically agree with this.

> notice because of timeouts.  I'm against entrenching this stuff further.

For example, pgarch.c theoretically doesn't need to hibernate, since it
has an apparent trigger event and a terminal condition. What we need
to do about the archiver is to set a timeout only when an iteration
didn't reach the latest finished segment. (This might need additional
shared memory use, though.)
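
To illustrate the idea (a rough sketch only; reached_latest_segment is
an invented flag standing in for the extra shared-memory state mentioned
above, not something that exists today):

    long    timeout = -1L;      /* sleep forever, rely on latch wakeups */
    int     events = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;

    if (!reached_latest_segment)
    {
        /* more segments may be pending; retry on a timer as today */
        timeout = PGARCH_AUTOWAKE_INTERVAL * 1000L;
        events |= WL_TIMEOUT;
    }

    (void) WaitLatch(MyLatch, events, timeout, WAIT_EVENT_ARCHIVER_MAIN);
    ResetLatch(MyLatch);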

I'm not sure about bgwriter, walwriter and logical replication stuff...

About walreceiver, 

-   if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
+   if ((now - startTime) > wal_receiver_timeout)

This is simply a misuse of the timeout. WALRCV_STARTUP_TIMEOUT is not
used for a sleep and is irrelevant to receiving data from the peer
walsender.  wal_receiver_timeout's default is 60s, but we don't assume
that walreceiver takes that long to start up. (I think Thomas is
working on the walreceiver timeout stuff.)
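
For reference, what WALRCV_STARTUP_TIMEOUT actually guards is a
dead-man check on a walreceiver stuck in WALRCV_STARTING, roughly the
following (paraphrased from WalRcvRunning() in walreceiverfuncs.c,
locking elided):

    if (state == WALRCV_STARTING)
    {
        pg_time_t   now = (pg_time_t) time(NULL);

        if ((now - startTime) > WALRCV_STARTUP_TIMEOUT)
        {
            /* the walreceiver never came up; treat it as stopped */
            walrcv->walRcvState = WALRCV_STOPPED;
            state = WALRCV_STOPPED;
        }
    }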

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Summary Sort workers Stats in EXPLAIN ANALYZE

2022-03-24 Thread Jian Guo
For a simple demo, with this explain statement:

-- Test sort stats summary
set force_parallel_mode=on;
select explain_filter('explain (analyze, summary off, timing off, costs off, 
format json) select * from tenk1 order by unique1');

Before this patch, we got plan like this:


   "Node Type": "Sort",+
   "Parent Relationship": "Outer", +
   "Parallel Aware": false,+
   "Async Capable": false, +
   "Actual Rows": 1,   +
   "Actual Loops": 1,  +
   "Sort Key": ["unique1"],+
   "Workers": [+
 { +
   "Worker Number": 0, +
   "Sort Method": "external merge",+
   "Sort Space Used": 2496,+
   "Sort Space Type": "Disk"   +
 } +
   ],  +



After this patch, the affected plan looks like this:

   "Node Type": "Sort",   +
   "Parent Relationship": "Outer",+
   "Parallel Aware": false,   +
   "Async Capable": false,+
   "Actual Rows": N,  +
   "Actual Loops": N, +
   "Sort Key": ["unique1"],   +
   "Workers planned": N,  +
   "Sort Method": "external merge",   +
   "Average Sort Space Used": N,  +
   "Peak Sort Space Used": N, +
   "Sort Space Type": "Disk", +

From: Jian Guo 
Sent: Monday, March 21, 2022 15:50
To: pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Re: Summary Sort workers Stats in EXPLAIN ANALYZE

There was a problem with the last patch; I have removed the 
`ExplainOpenWorker` call to fix it.

Also, I have added a test case in explain.sql according to the code change.

From: Jian Guo 
Sent: Monday, March 21, 2022 11:36
To: pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Summary Sort workers Stats in EXPLAIN ANALYZE


In the current EXPLAIN ANALYZE implementation, the Sort node stats from each 
worker are not summarized: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2762

When the number of workers is large, it prints out a huge amount of per-node 
detail in the plan. I have created this patch to summarize the tuplesort stats 
as AverageSpaceUsed / PeakSpaceUsed, making it behave just like 
`show_incremental_sort_group_info()`: 
https://github.com/postgres/postgres/blob/d4ba8b51c76300f06cc23f4d8a41d9f7210c4866/src/backend/commands/explain.c#L2890
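
The aggregation itself is simple; roughly the following (illustrative
variable names, not the exact patch code):

    int64   totalSpace = 0;
    int64   peakSpace = 0;
    int64   avgSpace;
    int     nsorted = 0;

    for (int n = 0; n < shared_info->num_workers; n++)
    {
        TuplesortInstrumentation *sinstrument = &shared_info->sinstrument[n];

        if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
            continue;           /* this worker never finished a sort */

        totalSpace += sinstrument->spaceUsed;
        peakSpace = Max(peakSpace, sinstrument->spaceUsed);
        nsorted++;
    }

    /* feeds "Average Sort Space Used" and "Peak Sort Space Used" */
    avgSpace = (nsorted > 0) ? totalSpace / nsorted : 0;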



Re: Allow file inclusion in pg_hba and pg_ident files

2022-03-24 Thread Michael Paquier
On Wed, Mar 23, 2022 at 01:14:02PM +0900, Michael Paquier wrote:
> On Wed, Mar 23, 2022 at 10:16:34AM +0800, Julien Rouhaud wrote:
>> Yeah, I thought about it but didn't rename it given your concerns about git
>> history.  I'm fine either way.
> 
> Oh, OK.  The amount of diffs created by 0001 is still fine to grab
> even with the struct rename, so let's stick with it.

And so, this first part has been applied as of d4781d8.  I'll try to
look at 0002 shortly.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal - parameter to limit amount of FPW because of hint bits per second

2022-03-24 Thread Michail Nikolaev
Hello, Peter.

>> * Add code to _bt_killitems() that detects if it has generated an FPI,
>> just to set some LP_DEAD bits.
>> * Instead of avoiding the FPI when this happens, proactively call
>> _bt_simpledel_pass() just before _bt_killitems() returns. Accept the
>> immediate cost of setting an LP_DEAD bit, just like today, but avoid
>> repeated FPIs.

> Yes, I am almost sure proactively calling _bt_simpledel_pass() will
> positively impact the system on many workloads. But I am also almost
> sure it will not change the behavior of the incident I mentioned -
> because it is not related to multiple checkpoints.

I just realized that this seems to be a dangerous approach because of
the locking mechanism.
Currently _bt_killitems requires only a read lock, but _bt_simpledel_pass
requires a write lock (it ends with _bt_delitems_delete).
We would need to escalate the lock mode in order to call _bt_simpledel_pass.

Such a change may negatively affect many workloads because of the write
lock taken during scanning - and it is really hard to prove the absence
of regression (I have no idea how).
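
To make the mismatch concrete (paraphrased from nbtutils.c and
nbtpage.c, not a proposed change):

    /* _bt_killitems: LP_DEAD hints are set under a shared lock only */
    LockBuffer(so->currPos.buf, BT_READ);
    /* ... ItemIdMarkDead() on the matching items ... */

    /* _bt_delitems_delete: physical deletion needs an exclusive lock */
    LockBuffer(buf, BT_WRITE);
    /* ... PageIndexMultiDelete() rewrites the page ... */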

Thanks,
Michail.




Re: Allow file inclusion in pg_hba and pg_ident files

2022-03-24 Thread Julien Rouhaud
On Thu, Mar 24, 2022 at 04:50:31PM +0900, Michael Paquier wrote:
> On Wed, Mar 23, 2022 at 01:14:02PM +0900, Michael Paquier wrote:
> > On Wed, Mar 23, 2022 at 10:16:34AM +0800, Julien Rouhaud wrote:
> >> Yeah, I thought about it but didn't rename it given your concerns about git
> >> history.  I'm fine either way.
> > 
> > Oh, OK.  The amount of diffs created by 0001 is still fine to grab
> > even with the struct rename, so let's stick with it.
> 
> And so, this first part has been applied as of d4781d8.  I'll try to
> look at 0002 shortly.

Thanks!




Re: [PATCH] Accept IP addresses in server certificate SANs

2022-03-24 Thread Kyotaro Horiguchi
At Wed, 23 Mar 2022 23:52:06 +, Jacob Champion  wrote 
in 
> On Wed, 2022-03-23 at 14:20 +0900, Kyotaro Horiguchi wrote:
> > I tried to write out the doc part.  What do you think about it?
> 
> I like it, thanks! I've applied that in v10, with a tweak to two
> iPAddress spellings and a short expansion of the condition in the Note,
> and I've added you as a co-author to 0002.

I'm fine with it. Thanks. I marked it as Ready for Committer.

Note for the patch set:

0001 is a preliminary patch to move inet_pton out of the src/backend tree.

0002 is the main patch of this patch set.

0003 is optional; it introduces pg_inet_pton(), which works only for IPv6
 addresses. 0002 gets the same effect by the following use of
 pg_inet_net_pton():

  > if (!strchr(host, '/')
  > && pg_inet_net_pton(PGSQL_AF_INET6, host, addr, -1) == 128)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Assert in pageinspect with NULL pages

2022-03-24 Thread Julien Rouhaud
Hi,

On Wed, Mar 23, 2022 at 05:16:41PM +0900, Michael Paquier wrote:
> On Fri, Mar 18, 2022 at 06:43:39AM +0300, Alexander Lakhin wrote:
> > Hello Michael,
> > No, just x86_64, Ubuntu 20.04, gcc 11, valgrind 3.15. I just put that query
> > in page.sql and see the server abort.
> 
> Bah, of course, I have missed the valgrind part of the problem.
> 
> The problem is that we attempt to verify a heap page here but its
> pd_special is BLCKSZ.  This causes BrinPageType() to grab back a
> position of the area at the exact end of the page, via
> PageGetSpecialPointer(), hence the failure in reading two bytes
> outside the page.  The result here is that the set of defenses in
> verify_brin_page() is poor: we should at least make sure that the
> special area is available for a read.  As far as I can see, this is
> also possible in bt_page_items_bytea(), gist_page_opaque_info(),
> gin_metapage_info() and gin_page_opaque_info().  All those code paths
> should be protected with some checks on PageGetSpecialSize(), I
> guess, before attempting to read the special area of the page.  Hash
> indexes are protected by checking the expected size of the special
> area, and one code path of btree relies on the opened relation to be a
> btree index.

I worked on a patch to fix the problem.  The functions you mention were indeed
missing some checks, but after digging a bit more I found that other problems
existed.  For instance, feeding a btree page to gist_page_items_bytea() (both
pages have the same special size) can be quite problematic too, as you can end
up accessing bogus data when retrieving the items.  I also added additional
sanity checks with what is available (gist_page_id for gist, the zero level for
btree leaf pages and so on), and tried to cover everything with tests.

I'm a bit worried about the stability of the btree tests.  I avoid emitting the
level found to help with that, but it still depends on what other AMs will put
in their special page.
From fa2841b83a57daf8de95e8b154afc2bf5dd60dda Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 24 Mar 2022 15:51:21 +0800
Subject: [PATCH v1] pageinspect: add more sanity checks when accessing pages.

As seen in bug #16527, feeding an incorrect page can lead to out-of-bounds
reads when trying to access the page special area.  To fix, check that the
special area has the expected size before trying to access it in the functions
that weren't already doing so.

While at it, add other checks when possible to validate that the given page is
from the expected format to avoid other possible invalid access.

Author: Julien Rouhaud
Reported-By: Alexander Lakhin
Discussion: https://postgr.es/m/16527-ef7606186f0610a1%40postgresql.org
Discussion: https://postgr.es/m/150535d6-65ac-16ee-b28a-851965fc485b%40gmail.com
---
 contrib/pageinspect/brinfuncs.c        | 18 ++++++++++++++++++
 contrib/pageinspect/btreefuncs.c       | 16 ++++++++++++++++
 contrib/pageinspect/expected/brin.out  | 12 ++++++++++++
 contrib/pageinspect/expected/btree.out | 23 +++++++++++++++++++++--
 contrib/pageinspect/expected/gin.out   |  9 +++++++++
 contrib/pageinspect/expected/gist.out  | 11 +++++++++++
 contrib/pageinspect/expected/hash.out  | 14 ++++++++++++++
 contrib/pageinspect/ginfuncs.c         | 16 ++++++++++++++++
 contrib/pageinspect/gistfuncs.c        | 35 ++++++++++++++++++++++++++++++++++
 contrib/pageinspect/sql/brin.sql       |  5 +++++
 contrib/pageinspect/sql/btree.sql      | 13 +++++++++++--
 contrib/pageinspect/sql/gin.sql        |  3 +++
 contrib/pageinspect/sql/gist.sql       |  4 ++++
 contrib/pageinspect/sql/hash.sql       | 11 +++++++++++
 14 files changed, 186 insertions(+), 4 deletions(-)

diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index bf12901ac3..d4de80eee5 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -58,6 +58,15 @@ brin_page_type(PG_FUNCTION_ARGS)
 
 	page = get_page_from_raw(raw_page);
 
+	/* verify the special space has the expected size */
+	if (PageGetSpecialSize(page) != MAXALIGN(sizeof(BrinSpecialSpace)))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("input page is not a valid brin page"),
+				 errdetail("Special size %d, expected %d",
+						   (int) PageGetSpecialSize(page),
+						   (int) MAXALIGN(sizeof(BrinSpecialSpace)))));
+
 	switch (BrinPageType(page))
 	{
 		case BRIN_PAGETYPE_META:
@@ -86,6 +95,15 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
 {
 	Page		page = get_page_from_raw(raw_page);
 
+	/* verify the special space has the expected size */
+	if (PageGetSpecialSize(page) != sizeof(BrinSpecialSpace))
+		ereport(ERROR,

Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-24 Thread Thomas Munro
On Tue, Mar 22, 2022 at 12:01 PM David Christensen
 wrote:
> Enclosed is v6, incorporating these fixes and docs tweaks.

Thanks!

I made a couple of minor changes in the docs, to wit: fixed
copy/paste-o "-F block" -> "-F fork", fork names didn't have initial
caps elsewhere, tablespace is better represented by
tblspc than tbl,
some minor wording changes to avoid constructions with "filter" where
it seemed to me a little ambiguous whether that means something is
included or excluded, and some other wording changes for consistency
with nearby paragraphs.

And... pushed.




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Daniel Gustafsson
> On 24 Mar 2022, at 06:17, Kyotaro Horiguchi  wrote:

> The comment and errmsg_plural don't seem to be consistent.  When the
> code was added by c4f2a0458d, it had only singular form and already
> had the comment.  After that 8032d76b5 turned it to errmsg_plural
> ignoring the comment.  It seems like a thinko of 8032d76b5.

Following the bouncing ball, that seems like a reasonable conclusion, and
removing the plural form should be fine to reduce translator work.

--
Daniel Gustafsson   https://vmware.com/





Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-24 Thread Bharath Rupireddy
On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
 wrote:
> +void
> +XLogRecGetBlockRefInfo(XLogReaderState *record, char *delimiter,
> +                       uint32 *fpi_len, bool detailed_format,
> +                       StringInfo buf)
> ...
> +   if (detailed_format && delimiter)
> +   appendStringInfoChar(buf, '\n');
>
> It is odd that the variable "delimiter" is used as a bool in the
> function, though it is a "char *" - I meant that it should be used as
> a delimiter string (assuming that you might want to insert ", "
> between two blkref descriptions).

I'm passing NULL if the delimiter isn't required (for pg_walinspect)
and I wanted to check if it's passed, so I was using the delimiter in
the condition. However, I now changed it to delimiter != NULL.

> +get_forkname(ForkNumber num)
>
> forkNames[] is public and used in reinit.c.  I think we don't need
> this function.

Yes. I removed it.

> +#define MAX_XLINFO_TYPES 16
> ...
> +   XLogRecStatsrmgr_stats[RM_NEXT_ID];
> +   XLogRecStatsrecord_stats[RM_NEXT_ID][MAX_XLINFO_TYPES];
> +} XLogStats;
> +
>
> This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> a part of xlogreader, the xlogstats looks like a better place.

I'm not sure it's worth adding new xlogstats.h/.c files just for 2
structures, 1 macro, and 2 functions, with no plan to add new stats
structures or functions. Since xlogreader is the one that reads the
WAL, and is included by both the backend and other modules (tools
and extensions), IMO it's the right place. However, I can note in
xlogreader that if new stats-related structures or functions are
ever added, they should be moved into a new header and .c file.

Thoughts?

> +#define XLOG_GET_STATS_PERCENTAGE(n_pct, rec_len_pct, fpi_len_pct, \
> +                                  tot_len_pct, total_count, \
>
> It doesn't need to be a macro. However in the first place I don't
> think it is useful to have.  Rather it may be harmful since it doesn't
> reduce complexity much but instead just hides details.  If we want to
> avoid tedious repetitions of the same statements, a macro like the
> following may work.
>
> #define CALC_PCT(num, denom) ((denom) == 0 ? 0.0 : 100.0 * (num) / (denom))
> ...
> >   n_pct = CALC_PCT(n, total_count);
> >   rec_len_pct = CALC_PCT(rec_len, total_rec_len);
> >   fpi_len_pct = CALC_PCT(fpi_len, total_fpi_len);
> >   tot_len_pct = CALC_PCT(tot_len, total_len);
>
> But it is not seem that different if we directly write out the detail.
>
> >   n_pct = (total_count == 0 ? 0 : 100.0 * n / total_count);
> >   rec_len_pct = (total_rec_len == 0 ? 0 : 100.0 * rec_len / total_rec_len);
> >   fpi_len_pct = (total_fpi_len == 0 ? 0 : 100.0 * fpi_len / total_fpi_len);
> >   tot_len_pct = (total_len == 0 ? 0 : 100.0 * tot_len / total_len);

I removed the XLOG_GET_STATS_PERCENTAGE macro.

Attaching the v14 patch set herewith. It has a bunch of other changes
along with the above:

1) Used STRICT for all the functions and introduced _till_end_of_wal
versions for pg_get_wal_records_info and pg_get_wal_stats.
2) Most of the code is reused between pg_walinspect and pg_waldump and
also within pg_walinspect.
3) Added read_local_xlog_page_no_wait without duplicating the code, so
that the pg_walinspect functions don't wait even while finding the
first valid WAL record (see the sketch after this list).
4) No function waits for future WAL lsn even to find the first valid WAL record.
5) Addressed the review comments raised upthread by Andres.
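
For (3), the shape is roughly the following (a sketch, not the exact
patch code):

    int
    read_local_xlog_page_no_wait(XLogReaderState *state,
                                 XLogRecPtr targetPagePtr, int reqLen,
                                 XLogRecPtr targetRecPtr, char *cur_page)
    {
        /* don't chase WAL that hasn't been flushed yet */
        XLogRecPtr  read_upto = GetFlushRecPtr();

        if (targetPagePtr + reqLen > read_upto)
            return -1;          /* caller treats this as end-of-WAL */

        return read_local_xlog_page(state, targetPagePtr, reqLen,
                                    targetRecPtr, cur_page);
    }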

I hope this version makes the patch cleaner, please review it further.

Regards,
Bharath Rupireddy.


v14-0001-Refactor-pg_waldump-code.patch
Description: Binary data


v14-0002-pg_walinspect.patch
Description: Binary data


v14-0003-pg_walinspect-tests.patch
Description: Binary data


v14-0004-pg_walinspect-docs.patch
Description: Binary data


Re: add checkpoint stats of snapshot and mapping files of pg_logical dir

2022-03-24 Thread Bharath Rupireddy
On Thu, Mar 24, 2022 at 8:58 AM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 23, 2022 at 10:16 AM Bharath Rupireddy
>  wrote:
> >
> > On Tue, Mar 22, 2022 at 8:12 PM Andres Freund  wrote:
> > > > Do you mean like this?
> > > > ereport(LOG,
> > > > /* translator: the placeholders show checkpoint options */
> > > > (errmsg("%s starting:%s%s%s%s%s%s%s%s",
> > > > restartpoint ? _("restartpoint") : _("checkpoint"),
> > > > (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
> > > > (flags & CHECKPOINT_END_OF_RECOVERY) ? "
> > > > end-of-recovery" : "",
> > > > (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
> > > > (flags & CHECKPOINT_FORCE) ? " force" : "",
> > > > (flags & CHECKPOINT_WAIT) ? " wait" : "",
> > > > (flags & CHECKPOINT_CAUSE_XLOG) ? " wal" : "",
> > > > (flags & CHECKPOINT_CAUSE_TIME) ? " time" : "",
> > > > (flags & CHECKPOINT_FLUSH_ALL) ? " flush-all" : 
> > > > "")));
> > >
> > > Yes.
> >
> > Done that way, see
> > v7-0001-Deduplicate-checkpoint-restartpoint-starting-comp.patch.
> >
> > > > I think the reason in this case might be that some flag names with 
> > > > hyphens
> > > > and spaces before words may not have the right/matching words in all
> > > > languages. What happens if we choose to translate/not translate the 
> > > > entire
> > > > message?
> > >
> > > If individual words aren't translated the "original" word would be used.
> >
> > Interestingly, the translated message for "checkpoint/restart
> > complete" is empty. Maybe because it has untranslatable strings?
> >
> > #: access/transam/xlog.c:8752
> > #, c-format
> > msgid "restartpoint complete: wrote %d buffers (%.1f%%); %d WAL
> > file(s) added, %d removed, %d recycled; write=%ld.%03d s,
> > sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, longest=%ld.%03d s,
> > average=%ld.%03d s; distance=%d kB, estimate=%d kB"
> > msgstr ""
> >
> > > > > Both seem still very long. I still am doubtful this level of detail is
> > > > > appropriate. Seems more like a thing for a tracepoint or such. How 
> > > > > about just
> > > > > printing the time for the logical decoding operations in aggregate, 
> > > > > without
> > > > > breaking down into files, adding LSNs etc?
> > > >
> > > > The distinction that the patch makes right now is for snapshot and
> > > > rewrite mapping files and it makes sense to have them separately.
> > >
> > > -1. The line also needs to be readable...
> >
> > IMHO, that's subjective. Even now, the existing
> > "checkpoint/restartpoint complete" message has a good amount of info
> > which makes it unreadable for some.
> >
> > The number of logical decoding files (snapshot and mapping) that the
> > checkpoint processed is a good metric to have in server logs, along
> > with the time it took to remove/sync them. Thoughts?
>
> Realized that the CF bot is applying patches in alphabetical order
> (I've sent out both v7 patches as v7-0001). Attaching the v8 patch set
> with v8-0001 and v8-0002 names. Apart from this, no change in v8.

The CF bot is complaining:
https://cirrus-ci.com/task/6458067389775872?logs=mingw_cross_warning#L399.

Attaching the v9 patch set to fix it.

Regards,
Bharath Rupireddy.


v9-0001-Deduplicate-checkpoint-restartpoint-starting-comp.patch
Description: Binary data


v9-0002-Add-checkpoint-stats-of-snapshot-and-mapping-file.patch
Description: Binary data


Re: Support logical replication of DDLs

2022-03-24 Thread Dilip Kumar
On Mon, Mar 21, 2022 at 1:43 PM Dilip Kumar  wrote:
>
> On Thu, Mar 17, 2022 at 2:47 AM Zheng Li  wrote:
> >
> > Hi,
> >
> > >If you don't mind, would you like to share the POC or the branch for this 
> > >work?
> >
> > The POC patch is attached. It currently supports the following 
> > functionalities:
>
> Thanks for sharing, I will look into it.
>
> > >In such cases why don't we just log the table creation WAL for DDL
> > >instead of a complete statement which creates the table and inserts
> > >the tuple? Because we are already WAL logging individual inserts and
> > >once you make sure of replicating the table creation I think the exact
> > >data insertion on the subscriber side will be taken care of by the
> > >insert WALs no?
> >
> > The table creation WAL and table insert WAL are available. The tricky
> > part is how do we break down this command into two parts (a normal
> > CREATE TABLE followed by insertions) either from the parsetree or the
> > WALs. I’ll have to dig more on this.
>
> I agree that this is a bit tricky, anyway I will also put more thoughts on 
> this.

I have put some more thought into this. Basically, during CTAS we
generate the CreateStmt inside "create_ctas_internal" and execute it
before inserting the tuples, so can't we generate independent SQL just
for creating the table, maybe using deparsing or something?
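
For context, create_ctas_internal() already builds the CREATE TABLE
part as a free-standing statement, roughly (paraphrased from
createas.c):

    CreateStmt *create = makeNode(CreateStmt);

    create->relation = into->rel;
    create->tableElts = attrList;   /* derived from the query's targetlist */
    create->options = into->options;
    create->oncommit = into->onCommit;
    create->tablespacename = into->tableSpaceName;

    /* the table is created before any tuples are inserted */
    intoRelationAddr = DefineRelation(create, relkind, InvalidOid, NULL, NULL);

So deparsing just that CreateStmt back into SQL seems like a plausible
starting point.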

Apart from that I have one more question: if you are directly logging
the SQL query, then how are you identifying under which schema you need
to create that table? Are you changing the SQL and generating a
schema-qualified name?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: proposal - psql - use pager for \watch command

2022-03-24 Thread Thomas Munro
On Wed, Jul 14, 2021 at 6:06 AM Pavel Stehule  wrote:
> út 13. 7. 2021 v 19:50 odesílatel Tom Lane  napsal:
>> After playing with this along the way to fixing the sigwait issues,
>> I have a gripe/suggestion.  If I hit control-C while the thing
>> is waiting between queries, eg
>>
>> regression=# select now() \watch
>> Tue Jul 13 13:44:44 2021 (every 2s)
>>
>>   now
>> ---
>>  2021-07-13 13:44:44.396565-04
>> (1 row)
>>
>> Tue Jul 13 13:44:46 2021 (every 2s)
>>
>>   now
>> ---
>>  2021-07-13 13:44:46.396572-04
>> (1 row)
>>
>> ^Cregression=#
>>
>> then as you can see I get nothing but the "^C" echo before the next
>> psql prompt.  The problem with this is that now libreadline is
>> misinformed about the cursor position, messing up any editing I
>> might try to do on the next line of input.  So I think it would
>> be a good idea to have some explicit final output when the \watch
>> command terminates, along the line of
>>
>> ...
>> Tue Jul 13 13:44:46 2021 (every 2s)
>>
>>   now
>> ---
>>  2021-07-13 13:44:46.396572-04
>> (1 row)
>>
>> ^C\watch cancelled
>> regression=#
>>
>> This strikes me as a usability improvement even without the
>> readline-confusion angle.
>
> I'll look at this issue.

Hi Pavel,

Do you have a patch for this?




turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Alvaro Herrera
This patch should silence some recent Coverity (false positive)
complaints about assertions contained in these macros.

Portability testing at:
https://cirrus-ci.com/github/alvherre/postgres/macros-to-inlinefuncs

Intend to push later today, unless something ugly happens.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
From 215eb05b368c202fce71efe242b58f0fe7025b38 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 24 Mar 2022 10:40:21 +0100
Subject: [PATCH] Change fastgetattr and heap_getattr to inline functions

They were macros previously, but recent callsite additions made Coverity
complain about one of the assertions being always true.  This change
could have been made a long time ago, but the Coverity complaint broke
the inertia.
---
 src/backend/access/heap/heapam.c  |  46 -----------
 src/include/access/htup_details.h | 151 +++++++++-----------
 2 files changed, 69 insertions(+), 128 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3746336a09..74ad445e59 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1131,52 +1131,6 @@ heapgettup_pagemode(HeapScanDesc scan,
 }
 
 
-#if defined(DISABLE_COMPLEX_MACRO)
-/*
- * This is formatted so oddly so that the correspondence to the macro
- * definition in access/htup_details.h is maintained.
- */
-Datum
-fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
-			bool *isnull)
-{
-	return (
-			(attnum) > 0 ?
-			(
-			 (*(isnull) = false),
-			 HeapTupleNoNulls(tup) ?
-			 (
-			  TupleDescAttr((tupleDesc), (attnum) - 1)->attcacheoff >= 0 ?
-			  (
-			   fetchatt(TupleDescAttr((tupleDesc), (attnum) - 1),
-		(char *) (tup)->t_data + (tup)->t_data->t_hoff +
-		TupleDescAttr((tupleDesc), (attnum) - 1)->attcacheoff)
-			   )
-			  :
-			  nocachegetattr((tup), (attnum), (tupleDesc))
-			  )
-			 :
-			 (
-			  att_isnull((attnum) - 1, (tup)->t_data->t_bits) ?
-			  (
-			   (*(isnull) = true),
-			   (Datum) NULL
-			   )
-			  :
-			  (
-			   nocachegetattr((tup), (attnum), (tupleDesc))
-			   )
-			  )
-			 )
-			:
-			(
-			 (Datum) NULL
-			 )
-		);
-}
-#endif			/* defined(DISABLE_COMPLEX_MACRO) */
-
-
 /* 
  *	 heap access method interface
  * 
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index b2d52ed16c..de0b91e1fa 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -690,88 +690,6 @@ struct MinimalTupleData
 #define HeapTupleClearHeapOnly(tuple) \
 		HeapTupleHeaderClearHeapOnly((tuple)->t_data)
 
-
-/* 
- *		fastgetattr
- *
- *		Fetch a user attribute's value as a Datum (might be either a
- *		value, or a pointer into the data area of the tuple).
- *
- *		This must not be used when a system attribute might be requested.
- *		Furthermore, the passed attnum MUST be valid.  Use heap_getattr()
- *		instead, if in doubt.
- *
- *		This gets called many times, so we macro the cacheable and NULL
- *		lookups, and call nocachegetattr() for the rest.
- * 
- */
-
-#if !defined(DISABLE_COMPLEX_MACRO)
-
-#define fastgetattr(tup, attnum, tupleDesc, isnull)	\
-(	\
-	AssertMacro((attnum) > 0),		\
-	(*(isnull) = false),			\
-	HeapTupleNoNulls(tup) ?			\
-	(\
-		TupleDescAttr((tupleDesc), (attnum)-1)->attcacheoff >= 0 ?	\
-		(			\
-			fetchatt(TupleDescAttr((tupleDesc), (attnum)-1),		\
-(char *) (tup)->t_data + (tup)->t_data->t_hoff +	\
-TupleDescAttr((tupleDesc), (attnum)-1)->attcacheoff)\
-		)			\
-		:			\
-			nocachegetattr((tup), (attnum), (tupleDesc))			\
-	)\
-	:\
-	(\
-		att_isnull((attnum)-1, (tup)->t_data->t_bits) ?\
-		(			\
-			(*(isnull) = true),		\
-			(Datum)NULL\
-		)			\
-		:			\
-		(			\
-			nocachegetattr((tup), (attnum), (tupleDesc))			\
-		)			\
-	)\
-)
-#else			/* defined(DISABLE_COMPLEX_MACRO) */
-
-extern Datum fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
-		 bool *isnull);
-#endif			/* defined(DISABLE_COMPLEX_MACRO) */
-
-
-/* 
- *		heap_getattr
- *
- *		Extract an attribute of a heap tuple and return it as a Datum.
- *		This works for either system or user attributes.  The given attnum
- *		is properly range-checked.
- *
- *		If the field in question has a NULL value, we return a zero Datum
- *		and set *isnull == true.  Otherwise, we set *isnull == false.
- *
- *		<tup> is the pointer to the heap tuple.  <attnum> is the attribute
- *		number of the column (field) caller wants.  <tupleDesc> is a
- *		pointer to the structure describing the row and all its fields.
- * 
- */
-#d

Re: proposal - psql - use pager for \watch command

2022-03-24 Thread Pavel Stehule
čt 24. 3. 2022 v 11:05 odesílatel Thomas Munro 
napsal:

> On Wed, Jul 14, 2021 at 6:06 AM Pavel Stehule 
> wrote:
> > út 13. 7. 2021 v 19:50 odesílatel Tom Lane  napsal:
> >> After playing with this along the way to fixing the sigwait issues,
> >> I have a gripe/suggestion.  If I hit control-C while the thing
> >> is waiting between queries, eg
> >>
> >> regression=# select now() \watch
> >> Tue Jul 13 13:44:44 2021 (every 2s)
> >>
> >>   now
> >> ---
> >>  2021-07-13 13:44:44.396565-04
> >> (1 row)
> >>
> >> Tue Jul 13 13:44:46 2021 (every 2s)
> >>
> >>   now
> >> ---
> >>  2021-07-13 13:44:46.396572-04
> >> (1 row)
> >>
> >> ^Cregression=#
> >>
> >> then as you can see I get nothing but the "^C" echo before the next
> >> psql prompt.  The problem with this is that now libreadline is
> >> misinformed about the cursor position, messing up any editing I
> >> might try to do on the next line of input.  So I think it would
> >> be a good idea to have some explicit final output when the \watch
> >> command terminates, along the line of
> >>
> >> ...
> >> Tue Jul 13 13:44:46 2021 (every 2s)
> >>
> >>   now
> >> ---
> >>  2021-07-13 13:44:46.396572-04
> >> (1 row)
> >>
> >> ^C\watch cancelled
> >> regression=#
> >>
> >> This strikes me as a usability improvement even without the
> >> readline-confusion angle.
> >
> > I'll look at this issue.
>
> Hi Pavel,
>
> Do you have a patch for this?
>

Not yet. I forgot about this issue.

Regards

Pavel


Re: Parameter for planner estimate of recursive queries

2022-03-24 Thread Simon Riggs
On Wed, 23 Mar 2022 at 20:25, Tom Lane  wrote:
>
> Simon Riggs  writes:
> >> [New patch version pending]
>
> Do you have any objection if I rename the GUC to
> recursive_worktable_factor?  That seems a bit clearer as to what
> it does, and it leaves more room for other knobs in the same area
> if we decide we need any.

None, I think your proposal is a better name.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Logical replication timeout problem

2022-03-24 Thread Amit Kapila
On Tue, Mar 22, 2022 at 7:25 AM wangw.f...@fujitsu.com
 wrote:
>
> Attach the new patch.
>

It seems that by mistake you removed the changes from the
pgoutput_message and pgoutput_truncate functions; I have added those back.
Additionally, I made a few other changes: (a) moved the function
UpdateProgress to pgoutput.c as it is not used outside it, (b) change
the new parameter in plugin API from 'send_keep_alive' to 'last_write'
to make it look similar to WalSndPrepareWrite and WalSndWriteData, (c)
made a number of changes in WalSndUpdateProgress API, it is better to
move keep-alive code after lag track code because we do process
replies at that time and there it will compute the lag; (d)
changed/added comments in the code.
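
To be concrete about (b) and (c), the API now has roughly this shape (a
sketch; details may differ in the attached patch):

    static void
    WalSndUpdateProgress(LogicalDecodingContext *ctx, XLogRecPtr lsn,
                         TransactionId xid, bool last_write)
    {
        TimestampTz now = GetCurrentTimestamp();

        /* track lag first: processing replies is what computes the lag */
        LagTrackerWrite(lsn, now);

        /* then consider a keep-alive, unless the caller just wrote data */
        if (!last_write)
            WalSndKeepaliveIfNecessary();
    }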

Do let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.


v5-0001-Fix-the-logical-replication-timeout-during-large-.patch
Description: Binary data


Re: Documenting when to retry on serialization failure

2022-03-24 Thread Simon Riggs
On Wed, 23 Mar 2022 at 19:50, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > I've tried to sum up the various points from everybody into this doc
> > patch. Thanks all for replies.
>
> This seemed rather badly in need of copy-editing.  How do you
> like the attached text?

Seems clear and does the job.

The unique violation thing is worryingly general. Do we know enough to
say that this is thought to occur only with a) multiple unique
constraints, b) exclusion constraints?

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-24 Thread Peter Eisentraut

On 23.03.22 23:54, Thomas Munro wrote:

That's because ForkNum is a signed type.  You will probably succeed if
you use "%d" instead.


Erm, is that really OK?  C says "Each enumerated type shall be
compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,
but shall be capable of representing the values of all the members of
the enumeration."  It could even legally vary from enum to enum,
though in practice most compilers probably just use ints all the time
unless you use weird pragma pack incantation.  Therefore I think you
need an intermediate variable with the size and signedness matching the
format string, if you're going to scanf directly into it, which
David's V6 did.


An intermediate variable is probably the best way to avoid thinking 
about this much more. ;-)  But note that the committed patch uses a %u 
format whereas the ForkNum enum is signed.
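
A minimal sketch of the intermediate-variable approach, for reference
(not the committed code):

    int     rawfork;

    if (sscanf(optarg, "%d", &rawfork) != 1 ||
        rawfork < 0 || rawfork > MAX_FORKNUM)
    {
        pg_log_error("could not parse valid fork number (0..%d) \"%s\"",
                     MAX_FORKNUM, optarg);
        goto bad_argument;
    }
    config.filter_by_relation_forknum = (ForkNumber) rawfork;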


Btw., why the sscanf() instead of just strtol/strtoul?




Re: Reducing power consumption on idle servers

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 07:16, Kyotaro Horiguchi  wrote:

> > Most of these timeouts are a bad idea and should not exist. We repeatedly 
> > have
> > had bugs where we were missing wakeups etc but those bugs were harder to
>
> I basically agree with this.

As a general point, maybe. But we have a lot of procs doing a lot of
polling and we are unlikely to change that anytime soon.

> > notice because of timeouts.  I'm against entrenching this stuff further.
>
> For example, pgarch.c theoretically doesn't need to hibernate, since it
> has an apparent trigger event and a terminal condition. What we need
> to do about the archiver is to set a timeout only when an iteration
> didn't reach the latest finished segment. (This might need additional
> shared memory use, though.)

The archiver is not the important thing here. If there is an objection to
that section of code, we can remove it.

> I'm not sure about bgwriter, walwriter and logical replication stuff...

I am sure. bgwriter, walwriter and logical worker need action from us
to allow them to hibernate in a sensible way that allows powersaving.

> I think Thomas is working on the walreceiver timeout stuff.

Yes, he is. I have no problem going with what he advocates, if others agree.

However, that fix does not solve the whole problem, since many other
procs also need fixing.


The proposals of this patch are the following, each of which can be
independently accepted/rejected:
1. fix the sleep pattern of bgwriter, walwriter and logical worker
(directly affects powersave)
2. deprecate promote_trigger_file, which then allows us to fix the
sleep for startup process (directly affects powersave)
3. treat hibernation in all procs the same, for simplicity, and to
make sure we don't forget one later
4. provide a design pattern for background worker extensions to
follow, so as to encourage powersaving
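
For (4), the pattern could be as simple as the following sketch
(DoOneWorkCycle, NORMAL_DELAY_MS and HIBERNATE_DELAY_SEC are
illustrative names, not from the patch):

    for (;;)
    {
        /* do one round of work, noting whether anything happened */
        bool    did_work = DoOneWorkCycle();

        /* back off to the long hibernation sleep once we appear idle */
        long    timeout = did_work ? NORMAL_DELAY_MS
                                   : HIBERNATE_DELAY_SEC * 1000L;

        (void) WaitLatch(MyLatch,
                         WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                         timeout,
                         PG_WAIT_EXTENSION);
        ResetLatch(MyLatch);
    }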

Please don't reject the whole patch because you disagree with part
(3), it is not that important.

Thanks

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Thomas Munro
On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
 wrote:
> The unique violation thing is worryingly general. Do we know enough to
> say that this is thought to occur only with a) multiple unique
> constraints, b) exclusion constraints?

I'm aware of 3 cases.  The two you mentioned, which I think we can fix
(as described in the threads I posted upthread), and then there is a
third case that I'm still confused about, in the last line of
read-write-unique-4.spec.




Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-24 Thread Peter Eisentraut

On 24.03.22 11:57, Peter Eisentraut wrote:

On 23.03.22 23:54, Thomas Munro wrote:

That's because ForkNum is a signed type.  You will probably succeed if
you use "%d" instead.


Erm, is that really OK?  C says "Each enumerated type shall be
compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,
but shall be capable of representing the values of all the members of
the enumeration."  It could even legally vary from enum to enum,
though in practice most compilers probably just use ints all the time
unless you use weird pragma pack incantation.  Therefore I think you
need an intermediate variable with the size and signedness matching the
format string, if you're going to scanf directly into it, which
David's V6 did.


An intermediate variable is probably the best way to avoid thinking 
about this much more. ;-)  But note that the committed patch uses a %u 
format whereas the ForkNum enum is signed.


Btw., why the sscanf() instead of just strtol/strtoul?


Or even:  Why are we exposing fork *numbers* in the user interface? 
Even low-level tools such as pageinspect use fork *names* in their 
interface.





Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-24 Thread Thomas Munro
On Fri, Mar 25, 2022 at 12:01 AM Peter Eisentraut
 wrote:
> Or even:  Why are we exposing fork *numbers* in the user interface?
> Even low-level tools such as pageinspect use fork *names* in their
> interface.

I wondered about that but thought it seemed OK for such a low level
tool.  It's a fair point though, especially if other low level tools
are doing that.  Here's a patch to change it.
From 79060adc6db3e38c1935426145115c9eed39d85f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 25 Mar 2022 00:22:01 +1300
Subject: [PATCH] Use fork names, not numbers, in pg_waldump option.

Improvement for commit 127aea2a.

Suggested-by: Peter Eisentraut 
Discussion: https://postgr.es/m/3a4c2e93-7976-2320-fc0a-32097fe148a7%40enterprisedb.com
---
 doc/src/sgml/ref/pg_waldump.sgml |  8 ++++----
 src/bin/pg_waldump/pg_waldump.c  | 20 +++++++++++++-------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 981d3c9038..9e1b91683d 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -118,10 +118,10 @@ PostgreSQL documentation
   

 If provided, only display records that modify blocks in the given fork.
-The valid values are 0 for the main fork,
-1 for the free space map,
-2 for the visibility map,
-and 3 for the init fork.
+The valid values are main for the main fork,
+fsm for the free space map,
+vm for the visibility map,
+and init for the init fork.

   
  
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 92238f30c9..e878c90803 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -828,8 +828,8 @@ usage(void)
 	printf(_("  -e, --end=RECPTR   stop reading at WAL location RECPTR\n"));
 	printf(_("  -f, --follow   keep retrying after reaching end of WAL\n"));
 	printf(_("  -k, --block=N  with --relation, only show records matching this block\n"));
-	printf(_("  -F, --fork=N   only show records matching a specific fork number\n"
-			 " (defaults to showing all)\n"));
+	printf(_("  -F, --fork=FORKonly show records matching a specific fork;\n"
+			 " valid fork names are main, fsm, vm, init\n"));
 	printf(_("  -l, --relation=N/N/N   only show records that affect a specific relation\n"));
 	printf(_("  -n, --limit=N  number of records to display\n"));
 	printf(_("  -p, --path=PATHdirectory in which to find log segment files or a\n"
@@ -968,13 +968,19 @@ main(int argc, char **argv)
 break;
 			case 'F':
 {
-	unsigned int forknum;
+	int			forknum = InvalidForkNumber;
 
-	if (sscanf(optarg, "%u", &forknum) != 1 ||
-		forknum > MAX_FORKNUM)
+	for (int i = 0; i <= MAX_FORKNUM; ++i)
 	{
-		pg_log_error("could not parse valid fork number (0..%d) \"%s\"",
-	 MAX_FORKNUM, optarg);
+		if (strcmp(optarg, forkNames[i]) == 0)
+		{
+			forknum = i;
+			break;
+		}
+	}
+	if (forknum == InvalidForkNumber)
+	{
+		pg_log_error("could not parse fork \"%s\"", optarg);
 		goto bad_argument;
 	}
 	config.filter_by_relation_forknum = (ForkNumber) forknum;
-- 
2.30.2



Re: MDAM techniques and Index Skip Scan patch

2022-03-24 Thread Jesper Pedersen

On 3/23/22 18:22, Dmitry Dolgov wrote:


The CF item could be set to RwF, what would you say, Jesper?



We want to thank the community for the feedback that we have received 
over the years for this feature. Hopefully a future implementation can 
use Tom's suggestions to get closer to a committable solution.


Here is the last CommitFest entry [1] for the archives.

RwF

[1] https://commitfest.postgresql.org/37/1741/

Best regards,
 Dmitry & Jesper





Re: Support logical replication of DDLs

2022-03-24 Thread Dilip Kumar
On Thu, Mar 24, 2022 at 3:32 PM Dilip Kumar  wrote:
>
> On Mon, Mar 21, 2022 at 1:43 PM Dilip Kumar  wrote:
> >
> > On Thu, Mar 17, 2022 at 2:47 AM Zheng Li  wrote:
> > >
> > > Hi,
> > >
> > > >If you don't mind, would you like to share the POC or the branch for 
> > > >this work?
> > >
> > > The POC patch is attached. It currently supports the following 
> > > functionalities:
> >
> > Thanks for sharing, I will look into it.
> >
> > > >In such cases why don't we just log the table creation WAL for DDL
> > > >instead of a complete statement which creates the table and inserts
> > > >the tuple? Because we are already WAL logging individual inserts and
> > > >once you make sure of replicating the table creation I think the exact
> > > >data insertion on the subscriber side will be taken care of by the
> > > >insert WALs no?
> > >
> > > The table creation WAL and table insert WAL are available. The tricky
> > > part is how do we break down this command into two parts (a normal
> > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > WALs. I’ll have to dig more on this.
> >
> > I agree that this is a bit tricky, anyway I will also put more thoughts on 
> > this.
>
> I have put some more thought into this. Basically, during CTAS we
> generate the CreateStmt inside "create_ctas_internal" and execute it
> before inserting the tuples, so can't we generate independent SQL just
> for creating the table, maybe using deparsing or something?
>
> Apart from that I have one more question: if you are directly logging
> the SQL query, then how are you identifying under which schema you need
> to create that table? Are you changing the SQL and generating a
> schema-qualified name?

I was going through the patch, and it seems you are logging the search
path as well along with the query, so I think this will probably work.
I have got one more question while looking at the code.  In the below
code snippet you are logging the DDL command only if it is a top-level
query, but there are no comments explaining what sort of queries we
don't want to log.  Suppose I am executing a DDL statement inside a PL
function: that will not be a top-level statement, so is your intention
to block that as well, or is that an unintentional side effect?

+	/*
+	 * Consider logging the DDL command if logical logging is enabled and
+	 * this is a top-level query.
+	 */
+	if (XLogLogicalInfoActive() && isTopLevel)
+		LogLogicalDDLCommand(parsetree, queryString);
+


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Index Skip Scan (new UniqueKeys)

2022-03-24 Thread Jesper Pedersen

On 6/9/20 06:22, Dmitry Dolgov wrote:

Here is a new version of index skip scan patch, based on v8 patch for
UniqueKeys implementation from [1]. I want to start a new thread to
simplify navigation, hopefully I didn't forget anyone who actively
participated in the discussion.



This CommitFest entry has been closed with RwF at [1].

Thanks for all the feedback given !

[1] 
https://www.postgresql.org/message-id/ab8636e7-182f-886a-3a39-f3fc279ca45d%40redhat.com


Best regards,
 Dmitry & Jesper





Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-24 Thread Thomas Munro
On Fri, Mar 25, 2022 at 12:26 AM Thomas Munro  wrote:
> On Fri, Mar 25, 2022 at 12:01 AM Peter Eisentraut
>  wrote:
> > Or even:  Why are we exposing fork *numbers* in the user interface?
> > Even low-level tools such as pageinspect use fork *names* in their
> > interface.
>
> I wondered about that but thought it seemed OK for such a low level
> tool.  It's a fair point though, especially if other low level tools
> are doing that.  Here's a patch to change it.

Oh, and there's already a name lookup function to use for this.
From e276c28414645022ead3f33e6a6bc11ae0479496 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 25 Mar 2022 00:22:01 +1300
Subject: [PATCH v2] Use fork names, not numbers, in pg_waldump option.

Improvement for commit 127aea2a.

Suggested-by: Peter Eisentraut 
Discussion: https://postgr.es/m/3a4c2e93-7976-2320-fc0a-32097fe148a7%40enterprisedb.com
---
 doc/src/sgml/ref/pg_waldump.sgml |  8 ++++----
 src/bin/pg_waldump/pg_waldump.c  | 15 +++++++--------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 981d3c9038..9e1b91683d 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -118,10 +118,10 @@ PostgreSQL documentation
   

 If provided, only display records that modify blocks in the given fork.
-The valid values are 0 for the main fork,
-1 for the free space map,
-2 for the visibility map,
-and 3 for the init fork.
+The valid values are main for the main fork,
+fsm for the free space map,
+vm for the visibility map,
+and init for the init fork.

   
  
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 92238f30c9..bb6b7576fd 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -828,8 +828,8 @@ usage(void)
 	printf(_("  -e, --end=RECPTR   stop reading at WAL location RECPTR\n"));
 	printf(_("  -f, --follow   keep retrying after reaching end of WAL\n"));
 	printf(_("  -k, --block=N  with --relation, only show records matching this block\n"));
-	printf(_("  -F, --fork=N   only show records matching a specific fork number\n"
-			 " (defaults to showing all)\n"));
+	printf(_("  -F, --fork=FORKonly show records matching a specific fork;\n"
+			 " valid fork names are main, fsm, vm, init\n"));
 	printf(_("  -l, --relation=N/N/N   only show records that affect a specific relation\n"));
 	printf(_("  -n, --limit=N  number of records to display\n"));
 	printf(_("  -p, --path=PATHdirectory in which to find log segment files or a\n"
@@ -968,16 +968,15 @@ main(int argc, char **argv)
 break;
 			case 'F':
 {
-	unsigned int forknum;
+	ForkNumber	forknum;
 
-	if (sscanf(optarg, "%u", &forknum) != 1 ||
-		forknum > MAX_FORKNUM)
+	forknum = forkname_to_number(optarg);
+	if (forknum == InvalidForkNumber)
 	{
-		pg_log_error("could not parse valid fork number (0..%d) \"%s\"",
-	 MAX_FORKNUM, optarg);
+		pg_log_error("could not parse fork \"%s\"", optarg);
 		goto bad_argument;
 	}
-	config.filter_by_relation_forknum = (ForkNumber) forknum;
+	config.filter_by_relation_forknum = forknum;
 	config.filter_by_extended = true;
 }
 break;
-- 
2.30.2



Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Michael Paquier
On Thu, Mar 24, 2022 at 11:21:07AM +0100, Alvaro Herrera wrote:
> This patch should silence some recent Coverity (false positive)
> complaints about assertions contained in these macros.

The logic looks fine.  Good idea to get rid of DISABLE_COMPLEX_MACRO.

> Portability testing at:
> https://cirrus-ci.com/github/alvherre/postgres/macros-to-inlinefuncs
> 
> Intend to push later today, unless something ugly happens.

Hmm.  I think that you'd better add a return at the end of each
function?  Some compilers are not smart enough to detect that all code 
paths return (cf. recent d0083c1) and could generate warnings, even if
things are coded to return all the time, as in your patch.
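
Something along these lines, assuming the function bodies from your
patch:

    static inline Datum
    heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
    {
        if (attnum > 0)
        {
            if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
                return getmissingattr(tupleDesc, attnum, isnull);

            return fastgetattr(tup, attnum, tupleDesc, isnull);
        }

        /* unconditional return at the end keeps such compilers quiet */
        return heap_getsysattr(tup, attnum, tupleDesc, isnull);
    }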
--
Michael


signature.asc
Description: PGP signature


Re: Estimating HugePages Requirements?

2022-03-24 Thread Michael Paquier
On Wed, Mar 23, 2022 at 03:25:48PM +0900, Michael Paquier wrote:
> My solution for the docs is perhaps too confusing for the end-user,
> and we are talking about a Linux-only thing here anyway.  So, at the
> end, I am tempted to just add the "2> /dev/null" as suggested upthread
> by Nathan and call it a day.

This still sounds like the best way to go for now, so done this way as
of bbd4951.
--
Michael


signature.asc
Description: PGP signature


Re: pg_stat_reset_single_*_counters vs pg_stat_database.stats_reset

2022-03-24 Thread Tomas Vondra
On 3/24/22 01:59, Andres Freund wrote:
> Hi,
> 
> On 2022-03-23 17:55:16 -0700, Andres Freund wrote:
>> Maybe I just don't understand what these reset functions are intended for?
>> Their introduction [3] didn't explain much either. To me the behaviour of
>> resetting pg_stat_database.stats_reset but nothing else in pg_stat_database
>> makes them kind of dangerous.
> 
> Forgot to add: At the very least we should document that weird behaviour,
> because it's certainly not obvious.  But imo we should either remove the
> behaviour or drop the functions.
> 

I agree it should have been documented, but I still stand behind the
current behavior. I'm not willing to die on this hill, but I think the
reasoning was/is sound.

Firstly, calculating transactions/second or reads/second just by
looking at pg_stat_database data (counters and stats_reset) is nonsense.
It might work for short time periods, but for anything longer it's bound
to give you bogus results - you don't even know if the system was
running at all, and so on.

Secondly, to do anything really meaningful you need to calculate deltas,
and be able to detect whether some of the stats were reset during the
particular interval. The stats_reset timestamp was designed to be a
simple way to detect that (instead of having to inspect all the
individual timestamps).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 11:01, Thomas Munro  wrote:
>
> On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
>  wrote:
> > The unique violation thing is worryingly general. Do we know enough to
> > say that this is thought to occur only with a) multiple unique
> > constraints, b) exclusion constraints?
>
> I'm aware of 3 cases.  The two you mentioned, which I think we can fix
> (as described in the threads I posted upthread), and then there is a
> third case that I'm still confused about, in the last line of
> read-write-unique-4.spec.

I don't see any confusion - it is clearly a serialization error. What
is more, I see this as a confusing bug that we should fix.

If we were updating the row rather than inserting it, we would get
"ERROR: could not serialize access due to concurrent update", as
documented. The type of command shouldn't affect whether it is a
serialization error or not. (The attached patch proves it does throw a
serialization error for UPDATE.)

Solving this requires us to alter the index AM API to pass down a
snapshot, allowing us to test whether the concurrent insert is visible
or not. The test is shown in the attached patch, but the patch doesn't
attempt the major task of tweaking the APIs to allow this check to be
made.
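
The check itself would be something like this (illustrative only:
heapRel and conflictslot are stand-ins, and as noted the hard part is
getting the snapshot down to this point through the index AM API):

    /*
     * On a uniqueness conflict, test whether the conflicting tuple is
     * visible to our snapshot; if not, report a serialization failure
     * rather than a unique violation.
     */
    if (IsolationIsSerializable() &&
        !table_tuple_satisfies_snapshot(heapRel, conflictslot,
                                        GetActiveSnapshot()))
        ereport(ERROR,
                (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                 errmsg("could not serialize access due to concurrent insert")));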

--
Simon Riggs                http://www.EnterpriseDB.com/


partial_patch_for_read-write-unique-4_serialization_error.v1.patch
Description: Binary data


isotest_read-write-unique-5.v1.patch
Description: Binary data


Re: [PATCH] add relation and block-level filtering to pg_waldump

2022-03-24 Thread David Christensen


> On Mar 24, 2022, at 6:43 AM, Thomas Munro  wrote:
> 
> On Fri, Mar 25, 2022 at 12:26 AM Thomas Munro  wrote:
>>> On Fri, Mar 25, 2022 at 12:01 AM Peter Eisentraut
>>>  wrote:
>>> Or even:  Why are we exposing fork *numbers* in the user interface?
>>> Even low-level tools such as pageinspect use fork *names* in their
>>> interface.
>> 
>> I wondered about that but thought it seemed OK for such a low level
>> tool.  It's a fair point though, especially if other low level tools
>> are doing that.  Here's a patch to change it.
> 
> Oh, and there's already a name lookup function to use for this.

+1 on the semantic names. 

David



Re: TAP output format in pg_regress

2022-03-24 Thread Daniel Gustafsson
> On 22 Mar 2022, at 00:49, Andres Freund  wrote:

> This is failing with segmentation faults on cfbot:
> https://cirrus-ci.com/task/4879227009892352?logs=test_world#L21
> 
> Looks like something around isolationtester is broken?

It could be triggered by the plpgsql tests as well, and was (as usual) a silly
mistake, easily fixed once found.  The attached survives repeated check-world
for me.

> Seems like we might not have energy to push this forward in the next two
> weeks, so maybe the CF entry should be marked returned or moved for now?

Since there is little use for this without the Meson branch, it should target
the same version as that patch. I'll move the patch to the next CF for now.

As we discussed off-list, I extended this patchset with an attempt to minimize
noise as per 20220221164736.rq3ornzjdkmwk...@alap3.anarazel.de, but it's not
yet done.  The attached has a rough draft of adding a --verbose option which
gives the current output (and potentially more for debugging), but which is
off by default, producing minimal noise.

--
Daniel Gustafsson   https://vmware.com/



v4-0002-Reduce-noise-in-pg_regress-default-output.patch
Description: Binary data


v4-0001-pg_regress-TAP-output-format.patch
Description: Binary data


Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Bharath Rupireddy
On Thu, Mar 24, 2022 at 2:34 PM Daniel Gustafsson  wrote:
>
> > On 24 Mar 2022, at 06:17, Kyotaro Horiguchi  wrote:
>
> > The comment and errmsg_plural don't seem to be consistent.  When the
> > code was added by c4f2a0458d, it had only singular form and already
> > had the comment.  After that 8032d76b5 turned it to errmsg_plural
> > ignoring the comment.  It seems like a thinko of 8032d76b5.
>
> Following the bouncing ball, that seems like a reasonable conclusion, and
> removing the plural form should be fine to reduce translator work.

Yes, the singular version of the message isn't required at all, as
numReportedClient > 1. Hence I proposed removing errmsg_plural and the
singular version.

Regards,
Bharath Rupireddy.




Re: multithreaded zstd backup compression for client and server

2022-03-24 Thread Robert Haas
On Wed, Mar 23, 2022 at 7:07 PM Justin Pryzby  wrote:
> Did you try this on windows at all ?  It's probably no surprise that zstd
> implements threading differently there.

I did not. I haven't had a properly functioning Windows development
environment in about a decade.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Peter Eisentraut



On 24.03.22 13:48, Bharath Rupireddy wrote:

Yes, the singular version of the message isn't required at all as
numReportedClient > 1. Hence I proposed to remove errmsg_plural and
singular version.


The issue is that n == 1 and n != 1 are not the only cases that 
errmsg_plural() handles.  Some languages have different forms for n == 
1, n == 2, and n >= 5, for example.  So while it is true that in


errmsg_plural("drop cascades to %d other object",
  "drop cascades to %d other objects",

the English singular string will never be used, you have to keep the 
errmsg_plural() call so that it can handle variants like the above for 
other languages.


You could write

errmsg_plural("DUMMY NOT USED %d",
  "drop cascades to %d other objects",

but I don't think that is better.




Re: Estimating HugePages Requirements?

2022-03-24 Thread Magnus Hagander
On Wed, Mar 23, 2022 at 7:25 AM Michael Paquier  wrote:
>
> On Tue, Mar 15, 2022 at 03:44:39PM -0700, Nathan Bossart wrote:
> > A simple approach could be to just set log_min_messages to PANIC before
> > exiting.  I've attached a patch for this.  With this patch, we'll still see
> > a FATAL if we try to use 'postgres -C' for a runtime-computed GUC on a
> > running server, and there will be no extra output as long as the user sets
> > log_min_messages to INFO or higher (i.e., not a DEBUG* value).  For
> > comparison, 'postgres -C' for a non-runtime-computed GUC does not emit
> > extra output as long as the user sets log_min_messages to DEBUG2 or higher.
>
> >   puts(config_val ? config_val : "");
> > +
> > + /* don't emit shutdown messages */
> > + SetConfigOption("log_min_messages", "PANIC", PGC_INTERNAL, 
> > PGC_S_OVERRIDE);
> > +
> >   ExitPostmaster(0);
>
> That's fancy, but I don't like that much.  And this would not protect
> either against any messages generated before this code path, either,

But neither would the suggestion of redirecting stderr to /dev/null.
Doing the redirect will *also* throw away any FATAL that happens.
Moreover, using the 2>/dev/null method, we *also* remove the
message that says there's another postmaster running in this
directory, which is strictly worse than the override of
log_min_messages.

That said, the redirect can be removed without recompiling postgres,
so it is probably still the better choice as a temporary workaround.
But we should really look into getting a better solution in place once
we start on 16.



> My solution for the docs is perhaps too confusing for the end-user,
> and we are talking about a Linux-only thing here anyway.  So, at the
> end, I am tempted to just add the "2> /dev/null" as suggested upthread
> by Nathan and call it a day.  Does that sound fine?

What would be a linux only thing?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Peter Eisentraut

On 24.03.22 06:17, Kyotaro Horiguchi wrote:

The comment and errmsg_plural don't seem to be consistent.  When the
code was added by c4f2a0458d, it had only singular form and already
had the comment.  After that 8032d76b5 turned it to errmsg_plural
ignoring the comment.  It seems like a thinko of 8032d76b5.


I have removed the comment.




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Daniel Gustafsson
> On 24 Mar 2022, at 14:07, Peter Eisentraut 
>  wrote:
> 
> On 24.03.22 06:17, Kyotaro Horiguchi wrote:
>> The comment and errmsg_plural don't seem to be consistent.  When the
>> code was added by c4f2a0458d, it had only singular form and already
>> had the comment.  After that 8032d76b5 turned it to errmsg_plural
>> ignoring the comment.  It seems like a thinko of 8032d76b5.
> 
> I have removed the comment.

I was just typing a reply to your upthread answer that we should just remove
the comment then, so a retroactive +1 on this =)

--
Daniel Gustafsson   https://vmware.com/





Re: unlogged sequences

2022-03-24 Thread Peter Eisentraut
Here is an updated patch that now also includes SET LOGGED/UNLOGGED 
support.  So this version addresses all known issues and open problems.



On 28.02.22 10:56, Peter Eisentraut wrote:

rebased patch, no functional changes

On 11.02.22 10:12, Peter Eisentraut wrote:

On 25.06.19 20:37, Andres Freund wrote:

I.e. I think it'd be better if we just added a fork argument to
fill_seq_with_data(), and then do something like

smgrcreate(srel, INIT_FORKNUM, false);
log_smgrcreate(&rel->rd_node, INIT_FORKNUM);
fill_seq_with_data(rel, tuple, INIT_FORKNUM);

and add a FlushBuffer() to the end of fill_seq_with_data() if writing
INIT_FORKNUM. The if (RelationNeedsWAL(rel)) would need an || forkNum ==
INIT_FORKNUM.


Now that logical replication of sequences is nearing completion, I 
figured it would be suitable to dust off this old discussion on 
unlogged sequences, mainly so that sequences attached to unlogged 
tables can be excluded from replication.


Attached is a new patch that incorporates the above suggestions, with 
some slight refactoring.  The only thing I didn't/couldn't do was to 
call FlushBuffer(), since that is not an exported function.  So this 
still calls FlushRelationBuffers(), which was previously not liked. 
Ideas welcome.


I have also re-tested the crash reported by Michael Paquier in the old 
discussion and added test cases that catch it.


The rest of the patch is just documentation, DDL support, client 
support, etc.


What is not done yet is support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED.  This is a bit of a problem because:


1. The new behavior is that a serial/identity sequence of a new 
unlogged table is now also unlogged.
2. There is also a new restriction that changing a table to logged is 
not allowed if it is linked to an unlogged sequence.  (This is IMO 
similar to the existing restriction on linking mixed logged/unlogged 
tables via foreign keys.)
3. Thus, currently, you can't create an unlogged table with a 
serial/identity column and then change it to logged.  This is 
reflected in some of the test changes I had to make in alter_table.sql 
to work around this.  These should eventually go away.


Interestingly, there is grammar support for ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED because there is this:


 |   ALTER SEQUENCE qualified_name alter_table_cmds
 {
 AlterTableStmt *n = makeNode(AlterTableStmt);
 n->relation = $3;
 n->cmds = $4;
 n->objtype = OBJECT_SEQUENCE;
 n->missing_ok = false;
 $$ = (Node *)n;
 }

But it is rejected later in tablecmds.c.  In fact, it appears that 
this piece of grammar is currently useless because there are no 
alter_table_cmds that actually work for sequences.  (This used to be 
different because things like OWNER TO also went through here.)


I tried to make tablecmds.c handle sequences as well, but that became 
messy.  So I'm thinking about making ALTER SEQUENCE ... SET 
LOGGED/UNLOGGED an entirely separate code path and rip out the above 
grammar, but that needs some further pondering.


But all that is a bit of a separate effort, so in the meantime some 
review of the changes in and around fill_seq_with_data() would be useful.
From 42584b9f0b2d34c6ac33fe9a978c22b37231e779 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 24 Mar 2022 13:54:15 +0100
Subject: [PATCH v4] Unlogged sequences

Add support for unlogged sequences.  Unlike for unlogged tables, this
is not a performance feature.  It allows sequences associated with
unlogged tables to be excluded from replication.

An identity/serial sequence is automatically made unlogged when its
owning table is.  (See also discussion in bug #15631.)

A new subcommand ALTER SEQUENCE ... SET LOGGED/UNLOGGED is added.
Also, identity/serial sequences automatically follow persistence
changes of their associated tables.

Discussion: 
https://www.postgresql.org/message-id/flat/04e12818-2f98-257c-b926-2845d74ed04f%402ndquadrant.com
---
 doc/src/sgml/ref/alter_sequence.sgml   |  12 ++
 doc/src/sgml/ref/create_sequence.sgml  |  23 +++-
 doc/src/sgml/ref/pg_dump.sgml  |   6 +-
 src/backend/commands/sequence.c|  58 --
 src/backend/commands/tablecmds.c   |  32 +-
 src/backend/parser/parse_utilcmd.c |   1 +
 src/bin/pg_dump/pg_dump.c  |   4 +-
 src/bin/psql/describe.c|   8 +-
 src/bin/psql/tab-complete.c|   5 +-
 src/include/commands/sequence.h|   1 +
 src/test/recovery/t/014_unlogged_reinit.pl |  49 +++-
 src/test/regress/expected/alter_table.out  | 125 +
 src/test/regress/expected/sequence.out |  20 +++-
 src/test/regress/sql/alter_table.sql   |  37 +++---
 src/test/regress/sql/sequence.sql  |  10 +-
 15 files changed, 297 insertions(+), 94 deletions(-)

diff --git a/doc/src/sgml

Re: multithreaded zstd backup compression for client and server

2022-03-24 Thread Dagfinn Ilmari Mannsåker
Hi Robert,

I haven't reviewed the meat of the patch in detail, but I noticed
something in the tests:

Robert Haas  writes:
> diff --git a/src/bin/pg_verifybackup/t/009_extract.pl 
> b/src/bin/pg_verifybackup/t/009_extract.pl
> index 9f9cc7540b..e17e7cad51 100644
> --- a/src/bin/pg_verifybackup/t/009_extract.pl
> +++ b/src/bin/pg_verifybackup/t/009_extract.pl
[…]
> + if ($backup_stdout ne '')
> + {
> + print "# standard output was:\n$backup_stdout";
> + }
> + if ($backup_stderr ne '')
> + {
> + print "# standard error was:\n$backup_stderr";
> + }
[…]
> diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl 
> b/src/bin/pg_verifybackup/t/010_client_untar.pl
> index 487e30e826..5f6a4b9963 100644
> --- a/src/bin/pg_verifybackup/t/010_client_untar.pl
> +++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
[…]
> + if ($backup_stdout ne '')
> + {
> + print "# standard output was:\n$backup_stdout";
> + }
> + if ($backup_stderr ne '')
> + {
> + print "# standard error was:\n$backup_stderr";
> + }

Per the TAP protocol, every line of non-test-result output should be
prefixed by "# ". The note() function does this for you, see
https://metacpan.org/pod/Test::More#Diagnostics for details.

- ilmari




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Alvaro Herrera
On 2022-Mar-24, Michael Paquier wrote:

> Hmm.  I think that you'd better add a return at the end of each
> function?  Some compilers are dumb in detecting that all the code 
> paths return (aka recent d0083c1) and could generate warnings, even if
> things are coded to return all the time, like in your patch.

Hmm, OK to do something about that.  I added pg_unreachable(): looking
at LWLockAttemptLock(), it looks like that should be sufficient.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
From bd535ae2c8044de0c1eb6f4e17dd0f73c50e1622 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 24 Mar 2022 10:40:21 +0100
Subject: [PATCH v2] Change fastgetattr and heap_getattr to inline functions

They were macros previously, but recent callsite additions made Coverity
complain about one of the assertions being always true.  This change
could have been made a long time ago, but the Coverity complain broke
the inertia.
---
 src/backend/access/heap/heapam.c  |  46 -
 src/include/access/htup_details.h | 155 ++
 2 files changed, 73 insertions(+), 128 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3746336a09..74ad445e59 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1131,52 +1131,6 @@ heapgettup_pagemode(HeapScanDesc scan,
 }
 
 
-#if defined(DISABLE_COMPLEX_MACRO)
-/*
- * This is formatted so oddly so that the correspondence to the macro
- * definition in access/htup_details.h is maintained.
- */
-Datum
-fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
-			bool *isnull)
-{
-	return (
-			(attnum) > 0 ?
-			(
-			 (*(isnull) = false),
-			 HeapTupleNoNulls(tup) ?
-			 (
-			  TupleDescAttr((tupleDesc), (attnum) - 1)->attcacheoff >= 0 ?
-			  (
-			   fetchatt(TupleDescAttr((tupleDesc), (attnum) - 1),
-		(char *) (tup)->t_data + (tup)->t_data->t_hoff +
-		TupleDescAttr((tupleDesc), (attnum) - 1)->attcacheoff)
-			   )
-			  :
-			  nocachegetattr((tup), (attnum), (tupleDesc))
-			  )
-			 :
-			 (
-			  att_isnull((attnum) - 1, (tup)->t_data->t_bits) ?
-			  (
-			   (*(isnull) = true),
-			   (Datum) NULL
-			   )
-			  :
-			  (
-			   nocachegetattr((tup), (attnum), (tupleDesc))
-			   )
-			  )
-			 )
-			:
-			(
-			 (Datum) NULL
-			 )
-		);
-}
-#endif			/* defined(DISABLE_COMPLEX_MACRO) */
-
-
 /* 
  *	 heap access method interface
  * 
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index b2d52ed16c..0c5c7b352f 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -690,88 +690,6 @@ struct MinimalTupleData
 #define HeapTupleClearHeapOnly(tuple) \
 		HeapTupleHeaderClearHeapOnly((tuple)->t_data)
 
-
-/* 
- *		fastgetattr
- *
- *		Fetch a user attribute's value as a Datum (might be either a
- *		value, or a pointer into the data area of the tuple).
- *
- *		This must not be used when a system attribute might be requested.
- *		Furthermore, the passed attnum MUST be valid.  Use heap_getattr()
- *		instead, if in doubt.
- *
- *		This gets called many times, so we macro the cacheable and NULL
- *		lookups, and call nocachegetattr() for the rest.
- * 
- */
-
-#if !defined(DISABLE_COMPLEX_MACRO)
-
-#define fastgetattr(tup, attnum, tupleDesc, isnull)	\
-(	\
-	AssertMacro((attnum) > 0),		\
-	(*(isnull) = false),			\
-	HeapTupleNoNulls(tup) ?			\
-	(\
-		TupleDescAttr((tupleDesc), (attnum)-1)->attcacheoff >= 0 ?	\
-		(			\
-			fetchatt(TupleDescAttr((tupleDesc), (attnum)-1),		\
-(char *) (tup)->t_data + (tup)->t_data->t_hoff +	\
-TupleDescAttr((tupleDesc), (attnum)-1)->attcacheoff)\
-		)			\
-		:			\
-			nocachegetattr((tup), (attnum), (tupleDesc))			\
-	)\
-	:\
-	(\
-		att_isnull((attnum)-1, (tup)->t_data->t_bits) ?\
-		(			\
-			(*(isnull) = true),		\
-			(Datum)NULL\
-		)			\
-		:			\
-		(			\
-			nocachegetattr((tup), (attnum), (tupleDesc))			\
-		)			\
-	)\
-)
-#else			/* defined(DISABLE_COMPLEX_MACRO) */
-
-extern Datum fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
-		 bool *isnull);
-#endif			/* defined(DISABLE_COMPLEX_MACRO) */
-
-
-/* 
- *		heap_getattr
- *
- *		Extract an attribute of a heap tuple and return it as a Datum.
- *		This works for either system or user attributes.  The given attnum
- *		is properly range-checked.
- *
- *		If the field in question has a NULL value, we return a zero Datum
- *		and set *isnull == true.  Otherwise, we set *isnull == false.
- *
- *		 is the pointer to the heap t

Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Bharath Rupireddy
On Thu, Mar 24, 2022 at 6:35 PM Peter Eisentraut
 wrote:
>
> On 24.03.22 13:48, Bharath Rupireddy wrote:
> > Yes, the singular version of the message isn't required at all, as
> > numReportedClient > 1. Hence I proposed to remove errmsg_plural and the
> > singular version.
>
> The issue is that n == 1 and n != 1 are not the only cases that
> errmsg_plural() handles.  Some languages have different forms for n ==
> 1, n == 2, and n >= 5, for example.  So while it is true that in
>
>  errmsg_plural("drop cascades to %d other object",
>"drop cascades to %d other objects",

Thanks. I think I get the point - is it dngettext doing things
differently for different languages?

#define EVALUATE_MESSAGE_PLURAL(domain, targetfield, appendval)  \
{ \
const char *fmt; \
StringInfoData  buf; \
/* Internationalize the error format string */ \
if (!in_error_recursion_trouble()) \
fmt = dngettext((domain), fmt_singular, fmt_plural, n); \
else \
fmt = (n == 1 ? fmt_singular : fmt_plural); \
initStringInfo(&buf); \

Regards,
Bharath Rupireddy.




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Japin Li


On Thu, 24 Mar 2022 at 21:26, Alvaro Herrera  wrote:
> On 2022-Mar-24, Michael Paquier wrote:
>
>> Hmm.  I think that you'd better add a return at the end of each
>> function?  Some compilers are dumb in detecting that all the code
>> paths return (aka recent d0083c1) and could generate warnings, even if
>> things are coded to return all the time, like in your patch.
>
> Hmm, OK to do something about that.  I added pg_unreachable(): looking
> at LWLockAttemptLock(), it looks like that should be sufficient.

Hi,

I want to know why we do not use the following style?

+static inline Datum
+heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
+{
+   if (attnum > 0)
+   {
+   if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
+   return getmissingattr(tupleDesc, attnum, isnull);
+   else
+   return fastgetattr(tup, attnum, tupleDesc, isnull);
+   }
+
+   return heap_getsysattr(tup, attnum, tupleDesc, isnull);
+}


--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: multithreaded zstd backup compression for client and server

2022-03-24 Thread Robert Haas
On Wed, Mar 23, 2022 at 7:31 PM Andres Freund  wrote:
> I found this the following section in the manual [1]:
>
> ZSTD_c_nbWorkers=400,/* Select how many threads will be spawned to 
> compress in parallel.
>   * When nbWorkers >= 1, triggers asynchronous 
> mode when invoking ZSTD_compressStream*() :
>   * ZSTD_compressStream*() consumes input and 
> flush output if possible, but immediately gives back control to caller,
>   * while compression is performed in parallel, 
> within worker thread(s).
>   * (note : a strong exception to this rule is 
> when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
>   *  in which case, ZSTD_compressStream2() 
> delegates to ZSTD_compress2(), which is always a blocking call).
>   * More workers improve speed, but also increase 
> memory usage.
>   * Default value is `0`, aka "single-threaded 
> mode" : no worker is spawned,
>   * compression is performed inside Caller's 
> thread, and all invocations are blocking */
>
> "ZSTD_compressStream*() consumes input ... immediately gives back control"
> pretty much confirms that.

I saw that too, but I didn't consider it conclusive. It would be nice
if their documentation had a bit more detail on what's really
happening.
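
For reference, the asynchronous mode in question is enabled by setting
the worker count on the compression context.  A minimal sketch, assuming
a libzstd built with threading support (the level and worker count are
arbitrary example values):

#include <zstd.h>

static ZSTD_CCtx *
make_threaded_cctx(void)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();

    /* nbWorkers >= 1 makes ZSTD_compressStream2() hand input to
     * worker threads and return to the caller quickly */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);
    return cctx;
}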

> Do we care about zstd's memory usage here? I think it's OK to mostly ignore
> work_mem/maintenance_work_mem here, but I could also see limiting concurrency
> so that estimated memory usage would fit into work_mem/maintenance_work_mem.

I think it's possible that we want to do nothing and possible that we
want to do something, but I think it's very unlikely that the thing we
want to do is related to maintenance_work_mem. Say we soft-cap the
compression level to the one which we think will fit within
maintenance_work_mem. I think the most likely outcome is that people
will not get the compression level they request and be confused about
why that has happened. It also seems possible that we'll be wrong
about how much memory will be used - say, because somebody changes the
library behavior in a new release - and will limit it to the wrong
level. If we're going to do anything here, I think it should be to
limit based on the compression level itself and not based how much
memory we think that level will use.

But that leaves the question of whether we should even try to impose
some kind of limit, and there I'm not sure. It feels like it might be
overengineered, because we're only talking about users who have
replication privileges, and if those accounts are subverted there are
big problems anyway. I think if we imposed a governance system here it
would get very little use. On the other hand, I think that the higher
zstd compression levels of 20+ can actually use a ton of memory, so we
might want to limit access to those somehow. Apparently on the command
line you have to say --ultra -- not sure if there's a corresponding
API call or if that's a guard that's built specifically into the CLI.

> Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
> one invocation? I recall that there's some differences between basebackup
> client / serverside around buffer sizes - but that's before all the recent-ish
> changes...

That thought occurred to me too but I haven't investigated yet.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
>  wrote:
>> The unique violation thing is worryingly general. Do we know enough to
>> say that this is thought to occur only with a) multiple unique
>> constraints, b) exclusion constraints?

> I'm aware of 3 cases.  The two you mentioned, which I think we can fix
> (as described in the threads I posted upthread), and then there is a
> third case that I'm still confused about, in the last line of
> read-write-unique-4.spec.

That test is modeling the case where the application does an INSERT
with values based on some data it read earlier.  There is no way for
the server to know that there's any connection, so I think if you
try to throw a serialization error rather than a uniqueness error,
you're basically lying to the client by claiming something you do not
know to be true.  And the lie is not without consequences: if the
application believes it, it might iterate forever vainly trying to
commit a transaction that will never succeed.

regards, tom lane




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Tom Lane
Bharath Rupireddy  writes:
> Thanks. I think I get the point - is it dngettext doing things
> differently for different languages?

Yeah.  To be concrete, have a look in ru.po:

#: catalog/dependency.c:1208
#, c-format
msgid "drop cascades to %d other object"
msgid_plural "drop cascades to %d other objects"
msgstr[0] "удаление распространяется на ещё %d объект"
msgstr[1] "удаление распространяется на ещё %d объекта"
msgstr[2] "удаление распространяется на ещё %d объектов"

I don't know Russian, so I don't know exactly what's going
on there, but there's evidently three different forms in
that language.  Probably one of them is not reachable given
that n > 1, but I doubt we're saving the translator any time
with that info.  Besides, gettext might require all three
forms to be provided anyway in order to work correctly.

regards, tom lane




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Alvaro Herrera
On 2022-Mar-24, Bharath Rupireddy wrote:

> Thanks. I think I get the point - is it dngettext doing things
> differently for different languages?

Yes.  The dngettext() rules are embedded in each translation's catalog
file:

$ git grep 'Plural-Forms' src/backend/po/*.po
de.po:"Plural-Forms: nplurals=2; plural=(n != 1);\n"
es.po:"Plural-Forms: nplurals=2; plural=n != 1;\n"
fr.po:"Plural-Forms: nplurals=2; plural=(n > 1);\n"
id.po:"Plural-Forms: nplurals=2; plural=(n > 1);\n"
it.po:"Plural-Forms: nplurals=2; plural=n != 1;\n"
ja.po:"Plural-Forms: nplurals=2; plural=n != 1;\n"
ko.po:"Plural-Forms: nplurals=1; plural=0;\n"
pl.po:"Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && 
(n%100<10 "
pt_BR.po:"Plural-Forms: nplurals=2; plural=(n>1);\n"
ru.po:"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && 
n"
sv.po:"Plural-Forms: nplurals=2; plural=(n != 1);\n"
tr.po:"Plural-Forms: nplurals=2; plural=(n != 1);\n"
uk.po:"Plural-Forms: nplurals=4; plural=((n%10==1 && n%100!=11) ? 0 : ((n%10 >= 
2 && n%10 <=4 && (n%100 < 12 || n%100 > 14)) ? 1 : ((n%10 == 0 || (n%10 >= 5 && 
n%10 <=9)) || (n%100 >= 11 && n%100 <= 14)) ? 2 : 3));\n"
zh_CN.po:"Plural-Forms: nplurals=1; plural=0;\n"

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 14:05, Tom Lane  wrote:
>
> Thomas Munro  writes:
> > On Thu, Mar 24, 2022 at 11:44 PM Simon Riggs
> >  wrote:
> >> The unique violation thing is worryingly general. Do we know enough to
> >> say that this is thought to occur only with a) multiple unique
> >> constraints, b) exclusion constraints?
>
> > I'm aware of 3 cases.  The two you mentioned, which I think we can fix
> > (as described in the threads I posted upthread), and then there is a
> > third case that I'm still confused about, in the last line of
> > read-write-unique-4.spec.
>
> That test is modeling the case where the application does an INSERT
> with values based on some data it read earlier.  There is no way for
> the server to know that there's any connection, so I think if you
> try to throw a serialization error rather than a uniqueness error,
> you're basically lying to the client by claiming something you do not
> know to be true.  And the lie is not without consequences: if the
> application believes it, it might iterate forever vainly trying to
> commit a transaction that will never succeed.

OK, I see what you mean. There are two types of transaction: one that
reads the relevant data inside the transaction, and one that decides
what value to use some other way.

So now we have two cases, both of which generate uniqueness violations,
but only one of which might succeed if retried. The patch does cover
this, I guess, by saying be careful, but I would be happier if we could
also add

"this is thought to occur only with multiple unique constraints and/or
an exclusion constraint"

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Alvaro Herrera
On 2022-Mar-24, Japin Li wrote:

> I want to know why we do not use the following style?
> 
> +static inline Datum
> +heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
> +{
> + if (attnum > 0)
> + {
> + if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
> + return getmissingattr(tupleDesc, attnum, isnull);
> + else
> + return fastgetattr(tup, attnum, tupleDesc, isnull);
> + }
> +
> + return heap_getsysattr(tup, attnum, tupleDesc, isnull);
> +}

That was the first thing I wrote, but I can't get myself to like it.
For this one function the code flow is obvious enough; but if you apply
the same idea to fastgetattr(), the result is not nice at all.

If there are enough votes for doing it this way, I can do that.

I guess we could do something like this instead, which seems somewhat
less bad:

if (attnum <= 0)
return heap_getsysattr(...)
if (likely(attnum <= HeapTupleHeaderGetNattrs(...)))
return fastgetattr(...)

return getmissingattr(...)

but I still prefer the one in the v2 patch I posted.

It's annoying that case 0 (InvalidAttrNumber) is not well handled
anywhere.
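
For comparison, the "less bad" variant above, spelled out with the
argument lists from the v2 patch, would look like this (a sketch only,
not what the patch does):

static inline Datum
heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
{
    if (attnum <= 0)
        return heap_getsysattr(tup, attnum, tupleDesc, isnull);
    if (likely(attnum <= (int) HeapTupleHeaderGetNatts(tup->t_data)))
        return fastgetattr(tup, attnum, tupleDesc, isnull);

    return getmissingattr(tupleDesc, attnum, isnull);
}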

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Tom Lane
Alvaro Herrera  writes:
> $ git grep 'Plural-Forms' src/backend/po/*.po
> ru.po:"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 
> && n"

Oh, interesting: if I'm reading that right, all three Russian
forms are reachable, even with the knowledge that n > 1.
(But isn't the last "&& n" test redundant?)

regards, tom lane




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Tom Lane
Simon Riggs  writes:
> OK, I see what you mean. There are 2 types of transaction, one that
> reads inside the transaction, one that decides what value to use some
> other way.

> So now we have 2 cases, both of which generate uniqueness violations,
> but only one of which might succeed if retried. The patch does cover
> this, I guess, by saying be careful, but I would be happier if we can
> also add

> "this is thought to occur only with multiple unique constraints and/or
> an exclusion constraints"

Um, what's that got to do with it?  The example in 
read-write-unique-4.spec involves only a single pkey constraint.

We could add something trying to explain that if the application inserts a
value into a constrained column based on data it read earlier, then any
resulting constraint violation might be effectively a serialization
failure.

regards, tom lane




Re: Remove an unnecessary errmsg_plural in dependency.c

2022-03-24 Thread Alvaro Herrera
On 2022-Mar-24, Tom Lane wrote:

> Alvaro Herrera  writes:
> > $ git grep 'Plural-Forms' src/backend/po/*.po
> > ru.po:"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 
> > && n"
> 
> Oh, interesting: if I'm reading that right, all three Russian
> forms are reachable, even with the knowledge that n > 1.
> (But isn't the last "&& n" test redundant?)

I wondered about that trailing 'n' and it turns out that the grep was
too simplistic, so it's incomplete.  The full rule is:

"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n"
"%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);\n"

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Use -fvisibility=hidden for shared libraries

2022-03-24 Thread Peter Eisentraut

On 11.01.22 03:53, Andres Freund wrote:

Ugh. In the case of worker_spi it's because -fvisibility=hidden isn't applied
to worker_spi.c at all. Which in turn is because Makefile.shlib isn't used at
all for MODULES= style modules, just for MODULE_big= ones :(.

Once that's "fixed" it fails as expected...

I'm not sure what the best way to deal with this is. Just now I copied the
logic from Makefile.shlib to pgxs.mk (by the existing CFLAGS_SL use), but that
doesn't scale - there's other direct uses of CFLAGS_SL.

Perhaps the best way would be to add the -fvisibility=hidden to CFLAGS_SL, and
work around the explicit exports issue in Makefile.shlib by adding an explicit
-fvisibility=default? Or perhaps CFLAGS_SL


The easiest solution would be to change worker_spi's Makefile to use 
MODULE_big.


There are already many cases where MODULE_big is used instead of MODULES 
for seemingly random reasons, so I wouldn't worry too much about it here 
if it helps you move forward.





Re: [RFC] building postgres with meson -v6

2022-03-24 Thread Peter Eisentraut

On 09.03.22 13:37, Peter Eisentraut wrote:

v6-0008-meson-prereq-Handle-DLSUFFIX-in-msvc-builds-simil.patch.gz

I think the right way here is actually to go the other way around:
Move DLSUFFIX into header files for all platforms.  Move the DLSUFFIX
assignment from src/makefiles/ to src/templates/, have configure read
it, and then substitute it into Makefile.global and pg_config.h.

Then we also don't have to patch the Windows build code a bunch of
times to add the DLSUFFIX define everywhere.


This patch should do it.
From 763943176a1e0a0c954414ba9da07742ad791656 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 24 Mar 2022 16:00:54 +0100
Subject: [PATCH] Refactor DLSUFFIX handling

Move DLSUFFIX into header files for all platforms.  Move the DLSUFFIX
assignment from src/makefiles/ to src/templates/, have configure read
it, and then substitute it into Makefile.global and pg_config.h.  This
avoids the need of all users to locally set CPPFLAGS.
---
 config/python.m4  |  4 +++-
 configure | 13 -
 configure.ac  |  5 +
 src/Makefile.global.in|  2 ++
 src/backend/jit/Makefile  |  2 --
 src/backend/utils/fmgr/Makefile   |  2 --
 src/backend/utils/fmgr/dfmgr.c|  5 -
 src/bin/pg_upgrade/Makefile   |  2 +-
 src/include/pg_config.h.in|  3 +++
 src/include/port/win32_port.h |  3 ---
 src/interfaces/ecpg/test/Makefile |  1 -
 src/makefiles/Makefile.aix|  1 -
 src/makefiles/Makefile.cygwin |  1 -
 src/makefiles/Makefile.darwin |  2 --
 src/makefiles/Makefile.freebsd|  2 --
 src/makefiles/Makefile.hpux   |  6 --
 src/makefiles/Makefile.linux  |  2 --
 src/makefiles/Makefile.netbsd |  2 --
 src/makefiles/Makefile.openbsd|  2 --
 src/makefiles/Makefile.solaris|  2 --
 src/makefiles/Makefile.win32  |  1 -
 src/template/cygwin   |  2 ++
 src/template/hpux |  7 +++
 src/template/win32|  2 ++
 src/test/regress/GNUmakefile  |  3 +--
 src/tools/msvc/Solution.pm|  1 +
 26 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/config/python.m4 b/config/python.m4
index 52f34070dd..e500873ff3 100644
--- a/config/python.m4
+++ b/config/python.m4
@@ -120,7 +120,9 @@ else
found_shlib=0
for d in "${python_libdir}" "${python_configdir}" /usr/lib64 /usr/lib
do
-   # We don't know the platform DLSUFFIX here, so check 'em all.
+   # Note: DLSUFFIX is for loadable modules, not shared
+   # libraries, so cannot be used here portably.  Just
+   # check all known possibilities.
for e in .so .dll .dylib .sl; do
if test -e "$d/lib${ldlibrary}$e"; then
python_libdir="$d"
diff --git a/configure b/configure
index e066cbe2c8..519cef76a9 100755
--- a/configure
+++ b/configure
@@ -743,6 +743,7 @@ BITCODE_CFLAGS
 CFLAGS_VECTORIZE
 CFLAGS_UNROLL_LOOPS
 PERMIT_DECLARATION_AFTER_STATEMENT
+DLSUFFIX
 LLVM_BINPATH
 LLVM_CXXFLAGS
 LLVM_CFLAGS
@@ -5219,11 +5220,19 @@ fi # fi
 unset CFLAGS
 unset CXXFLAGS
 
+DLSUFFIX=".so"
+
 #
 # Read the template
 #
 . "$srcdir/src/template/$template" || exit
 
+
+
+cat >>confdefs.h <<_ACEOF
+#define DLSUFFIX "$DLSUFFIX"
+_ACEOF
+
 # C[XX]FLAGS are selected so:
 # If the user specifies something in the environment, that is used.
 # else:  If the template file set something, that is used.
@@ -10560,7 +10569,9 @@ else
found_shlib=0
for d in "${python_libdir}" "${python_configdir}" /usr/lib64 /usr/lib
do
-   # We don't know the platform DLSUFFIX here, so check 'em all.
+   # Note: DLSUFFIX is for loadable modules, not shared
+   # libraries, so cannot be used here portably.  Just
+   # check all known possibilities.
for e in .so .dll .dylib .sl; do
if test -e "$d/lib${ldlibrary}$e"; then
python_libdir="$d"
diff --git a/configure.ac b/configure.ac
index 078381e568..79ae882133 100644
--- a/configure.ac
+++ b/configure.ac
@@ -397,11 +397,16 @@ AS_IF([test "$with_llvm" = yes], [
 unset CFLAGS
 unset CXXFLAGS
 
+DLSUFFIX=".so"
+
 #
 # Read the template
 #
 . "$srcdir/src/template/$template" || exit
 
+AC_SUBST(DLSUFFIX)
+AC_DEFINE_UNQUOTED([DLSUFFIX], ["$DLSUFFIX"], [Define to the file name 
extension of dynamically-loadable modules.])dnl
+
 # C[XX]FLAGS are selected so:
 # If the user specifies something in the environment, that is used.
 # else:  If the template file set something, that is used.
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index bbdc1c4bda..0726b2020f 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -545,6 +545,8 @@ WIN32_STACK_RLIMIT=4194304
 # Set if we have a working win32 crashdump header
 have_win32_dbghelp = @have_win32_dbghelp@
 
+DLSUFFIX = @DLSUFF

Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-24 Thread Pavel Borisov
Hi, Peter!
Thanks for your review!

About v25-0001-Use-unsigned-64-bit-numbering-of-SLRU-pages.patch:
>
> -static bool CLOGPagePrecedes(int page1, int page2);
> +static bool CLOGPagePrecedes(uint64 page1, uint64 page2);
>
> You are changing the API from signed to unsigned.  Is this intentional?
> It is not explained anywhere.  Are we sure that nothing uses, for
> example, negative values as error markers?
>
Initially, we made SLRU page numbers int64, then reworked them into uint64
as per Andres Freund's proposal. Unsignedness is not necessary for the 64xid
patchset, though, as the maximum page number is at least several (>2) times
smaller than the maximum 64-bit xid value. So we've returned them to signed
int64. I don't see a reason to make SLRU unsigned for introducing 64-bit xids.


>   #define SlruFileName(ctl, path, seg) \
> -   snprintf(path, MAXPGPATH, "%s/%04X", (ctl)->Dir, seg)
> +   snprintf(path, MAXPGPATH, "%s/%04X%08X", (ctl)->Dir, \
> +(uint32) ((seg) >> 32), (uint32) ((seg) &
> (uint64)0x))

What's going on here?  Some explanation?  Why not use something like
> %llX or whatever you need?
>
Of course, this should be simplified to %012llX, and we will do that at a
later stage (in the 0006 patch of the 64xid thread), as it should be done
together with the CLOG pg_upgrade changes. So we've returned this to the
initial state in 0001. Thanks for noticing!
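
In code terms, the simplification discussed there would be roughly (a
sketch of what the 0006 patch is expected to do, not its final form):

#define SlruFileName(ctl, path, seg) \
	snprintf(path, MAXPGPATH, "%s/%012llX", (ctl)->Dir, \
			 (unsigned long long) (seg))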

+   uint64  segno = pageno / SLRU_PAGES_PER_SEGMENT;
> +   uint64  rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
> [etc.]
>
> Not clear whether segno etc. need to be changed to 64 bits, or whether
> an increase of SLRU_PAGES_PER_SEGMENT should also be considered.
>
Yes, segno should be 64 bits: the number of xids per segment (the product
of SLRU_PAGES_PER_SEGMENT and CLOG_XACTS_PER_PAGE, and similarly for
commit_ts and mxact) is far less than 2^32, while the overall xid range of
clog/commit_ts/mxact is 64-bit, so segment numbers can exceed 32 bits.


> -   if ((len == 4 || len == 5 || len == 6) &&
> +   if ((len == 12 || len == 13 || len == 14) &&
>
> This was horrible before, but maybe we can take this opportunity now to
> add a comment?
>
This should also be introduced later together with SlruFileName changes. So
we've removed this change from 0001. Later in 0006 we'll add this with
comments.

-   SlruFileName(ctl, path, ftag->segno);
> +   SlruFileName(ctl, path, (uint64) ftag->segno);
>
> Maybe ftag->segno should be changed to 64 bits as well?  Not clear.
>
Same as previous.


> There is a code comment at the definition of SLRU_PAGES_PER_SEGMENT that
> has some detailed explanations of how the SLRU numbering, SLRU file
> names, and transaction IDs tie together.  This doesn't seem to apply
> anymore after this change.
>
Same as previous.

The reference page of pg_resetwal contains various pieces of information
> of how to map SLRU files back to transaction IDs.  Please check if that
> is still all up to date.
>
Same as previous.

About v25-0002-Use-64-bit-format-to-output-XIDs.patch:
>
> I don't see the point of applying this now.  It doesn't make PG15 any
> better.  It's just a patch part of which we might need later.
> Especially the issue of how we are handwaving past the epoch-enabled
> output sites disturbs me.  At least those should be omitted from this
> patch, since this patch makes these more wrong, not more right for the
> future.

 psprintf("xmax %u equals or exceeds next valid transaction ID %u:%u",
> -  xmax,
> + psprintf("xmax %llu equals or exceeds next valid transaction ID %u:%llu",
> +  (unsigned long long) xmax,
> EpochFromFullTransactionId(ctx->next_fxid),
> -  XidFromFullTransactionId(ctx->next_fxid)));
> +  (unsigned long long) XidFromFullTransactionId(ctx->
> next_fxid)));


> This %u:%u business is basically an existing workaround for the lack of
> 64-bit transaction identifiers.  Presumably, when those are available,
> all of this will be replaced by a single number display, possibly a
> single %llu.  So these sites you change here will have to be touched
> again, and so changing this now doesn't make sense.  At least that's my
> guess.  Maybe there needs to be a fuller explanation of how this is
> meant to be transitioned.

Fixed, thanks.

An alternative we might want to consider is that we use PRId64 as
> explained here:
> <
> https://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html>.
>
>   We'd have to check whether we still support any non-GNU gettext
> implementations and to what extent they support this.  But I think it's
> something to think about if we are going to embark on a journey of much
> more int64 printf output.


There were several other ways that have met opposition above in the thread.
I guess PRId64 will also be opposed as not portable enough. Personally, I
don't see much trouble in casting the value to be printed to a wider type,
and I consider this the best choice of all that was discussed. We just
stick to a portable way of printing, which was recommended by Tom and in
agreement with 1f8

Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2022-03-24 Thread Pavel Borisov
Just forgot to mention that the answers in the previous message describe
the changes that are in v26.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: [PATCH] Accept IP addresses in server certificate SANs

2022-03-24 Thread Jacob Champion
On Thu, 2022-03-24 at 17:10 +0900, Kyotaro Horiguchi wrote:
> I'm fine with it. Thanks. I marked it as Ready-for-Commiter.

Thank you for the reviews and feedback!

--Jacob


Re: Reducing power consumption on idle servers

2022-03-24 Thread Robert Haas
On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
 wrote:
> The proposals of this patch are the following, each of which can be
> independently accepted/rejected:
> 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> (directly affects powersave)
> 2. deprecate promote_trigger_file, which then allows us to fix the
> sleep for startup process (directly affects powersave)
> 3. treat hibernation in all procs the same, for simplicity, and to
> make sure we don't forget one later
> 4. provide a design pattern for background worker extensions to
> follow, so as to encourage powersaving

Unfortunately, the patch isn't split in a way that corresponds to this
division. I think I'm +1 on change #2 -- deprecating
promote_trigger_file seems like a good idea to me independently of any
power-saving considerations. But I'm not sure that I am on board with
any of the other changes. I do agree with the basic goal of trying to
reduce unnecessary wakeups, but I think the rest of the patch is
taking a bit of a one-size-fits-all approach where I think that we
might want to be more nuanced. I think there are a couple of different
kinds of cases here.

In some places, like DetermineSleepTime(), we're already willing to
sleep for pretty long periods of time, like a minute. I think in those
cases we could consider doing nothing, on the theory that one wakeup
per minute is already not very much. If we do want to do something, I
think it should be in the direction of convincing ourselves that we
don't need a timeout at all. Having a timeout is a bit like insurance:
it guarantees that if for some reason the event by which we expect to
be awoken doesn't actually wake us up even though something meaningful
has happened, we won't hang forever. But if we think a wakeup per
minute is meaningful and we don't think there's any possibility of a
missed wakeup, let's just wait forever. I don't think we'll avoid many
user complaints by recovering from a missed wakeup in just under an
hour.
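
Concretely, "wait forever" just means dropping WL_TIMEOUT from the event
mask, something like this (a sketch; the wait event name is a
placeholder):

(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                 -1L,
                 WAIT_EVENT_EXAMPLE);
ResetLatch(MyLatch);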

In other places, like WalWriterMain, we're basically polling. One
question in these kinds of cases is whether we can avoid polling in
favor of having some other process wake us up if the event that we
care about happens. This is unlikely to be practical in all cases -
e.g. we can't wait for a promotion trigger file to show up unless we
want to use inotify or something like that. However, we may be able to
avoid polling in some instances. When we can't, then I think it makes
sense to increase the wait time when the system appears to be idle. In
that subset of cases - we're polling and we can't avoid polling - this
kind of approach seems fairly reasonable.

I do have some concerns about the idea that the right strategy in
general, or even in particular cases, is to multiply by 50. This patch
hasn't invented that problem; it's already there. My concern is that
multiplying a small number by 50 seems very different than multiplying
a large number by 50. If we normally wait for 100ms before checking
for something to happen, and we wait for 5s, it's probably not going
to be a huge issue, because 5s is still a short amount of time for
human beings. If we normally wait for a minute before checking for
something to happen and we wait for 50 minutes, that's much more
likely to make a human being unhappy. Therefore, it's very unclear to
me that those cases should be treated the same way.
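
For reference, the existing multiply-by-50 pattern looks roughly like
this (paraphrased from bgwriter.c):

#define HIBERNATE_FACTOR 50

/* once the process decides it is idle, stretch the normal delay */
(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 BgWriterDelay * HIBERNATE_FACTOR,
                 WAIT_EVENT_BGWRITER_HIBERNATE);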

There's a more technical issue with this strategy, too: if we multiply
both short and long timeouts by 50, I think we are going to get pretty
much the same result as if we only multiply the short timeouts by 50.
Why even bother reducing the frequency of wakeups from minutes to
hours if we're elsewhere reducing the frequency from seconds to
minutes? If process A is still waking up every minute, putting process
B in the deep freeze isn't going to do a whole lot in terms of letting
the system go into any kind of deeper sleep.

All in all I feel that encouraging developers to adopt a
multiply-by-50 rule in all cases seems too simplistic to me. It may be
better than what we're doing right now, but it doesn't really seem
like the right policy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson -v6

2022-03-24 Thread Andres Freund
Hi,

On 2022-03-24 16:16:15 +0100, Peter Eisentraut wrote:
> On 09.03.22 13:37, Peter Eisentraut wrote:
> > v6-0008-meson-prereq-Handle-DLSUFFIX-in-msvc-builds-simil.patch.gz
> > 
> > I think the right way here is actually to go the other way around:
> > Move DLSUFFIX into header files for all platforms.  Move the DLSUFFIX
> > assignment from src/makefiles/ to src/templates/, have configure read
> > it, and then substitute it into Makefile.global and pg_config.h.
> > 
> > Then we also don't have to patch the Windows build code a bunch of
> > times to add the DLSUFFIX define everywhere.
> 
> This patch should do it.

> From 763943176a1e0a0c954414ba9da07742ad791656 Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Thu, 24 Mar 2022 16:00:54 +0100
> Subject: [PATCH] Refactor DLSUFFIX handling
> 
> Move DLSUFFIX into header files for all platforms.  Move the DLSUFFIX
> assignment from src/makefiles/ to src/templates/, have configure read
> it, and then substitute it into Makefile.global and pg_config.h.  This
> avoids the need of all users to locally set CPPFLAGS.

Reading through it, this looks good to me. Thanks!

Greetings,

Andres Freund




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Japin Li


On Thu, 24 Mar 2022 at 22:32, Alvaro Herrera  wrote:
> On 2022-Mar-24, Japin Li wrote:
>
>> I want to know why we do not use the following style?
>>
>> +static inline Datum
>> +heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
>> +{
>> +if (attnum > 0)
>> +{
>> +if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
>> +return getmissingattr(tupleDesc, attnum, isnull);
>> +else
>> +return fastgetattr(tup, attnum, tupleDesc, isnull);
>> +}
>> +
>> +return heap_getsysattr(tup, attnum, tupleDesc, isnull);
>> +}
>
> That was the first thing I wrote, but I can't get myself to like it.
> For this one function the code flow is obvious enough; but if you apply
> the same idea to fastgetattr(), the result is not nice at all.
>
> If there are enough votes for doing it this way, I can do that.
>
> I guess we could do something like this instead, which seems somewhat
> less bad:
>
> if (attnum <= 0)
>   return heap_getsysattr(...)
> if (likely(attnum <= HeapTupleHeaderGetNattrs(...)))
>   return fastgetattr(...)
>
> return getmissingattr(...)
>
> but I still prefer the one in the v2 patch I posted.
>
> It's annoying that case 0 (InvalidAttrNumber) is not well handled
> anywhere.

Thanks for your detailed explanation.  I find bottomup_sort_and_shrink_cmp()
has similar code

static int
bottomup_sort_and_shrink_cmp(const void *arg1, const void *arg2)
{
const IndexDeleteCounts *group1 = (const IndexDeleteCounts *) arg1;
const IndexDeleteCounts *group2 = (const IndexDeleteCounts *) arg2;

[...]

pg_unreachable();

return 0;
}

IIUC, the last statement is used to keep the compiler quiet.  However,
it doesn't exist in LWLockAttemptLock().  Why?

The difference between bottomup_sort_and_shrink_cmp() and LWLockAttemptLock()
is that LWLockAttemptLock() always returns before pg_unreachable() is reached,
whereas the compiler may think bottomup_sort_and_shrink_cmp() could return
after pg_unreachable(), which isn't expected.


--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 14:56, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > OK, I see what you mean. There are 2 types of transaction, one that
> > reads inside the transaction, one that decides what value to use some
> > other way.
>
> > So now we have 2 cases, both of which generate uniqueness violations,
> > but only one of which might succeed if retried. The patch does cover
> > this, I guess, by saying be careful, but I would be happier if we can
> > also add
>
> > "this is thought to occur only with multiple unique constraints and/or
> > an exclusion constraint"
>
> Um, what's that got to do with it?  The example in
> read-write-unique-4.spec involves only a single pkey constraint.

Yes, but as you explained, it's not actually a serialization case; it
just looks a bit like one.

That means we are not currently aware of any case where the situation
really is a serialization failure but the reported error is a uniqueness
violation, unless we have two or more unique constraints and/or an
exclusion constraint.

> We could add something trying to explain that if the application inserts a
> value into a constrained column based on data it read earlier, then any
> resulting constraint violation might be effectively a serialization
> failure.

We could do that as well.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Parameter for planner estimate of recursive queries

2022-03-24 Thread Tom Lane
Simon Riggs  writes:
> On Wed, 23 Mar 2022 at 20:25, Tom Lane  wrote:
>> Do you have any objection if I rename the GUC to
>> recursive_worktable_factor?  That seems a bit clearer as to what
>> it does, and it leaves more room for other knobs in the same area
>> if we decide we need any.

> None, I think your proposal is a better name.

Pushed that way.

regards, tom lane




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-24 Thread Robert Haas
On Thu, Mar 24, 2022 at 1:29 AM Dilip Kumar  wrote:
> In the latest version I have fixed this issue by using a non-conflicting
> name, because when it was compiled with ICU the foobar5 name was
> already used and we were seeing a failure.  Apart from this I have
> fixed the duplicate cleanup problem by passing an extra parameter to
> RelationCreateStorage, which decides whether to register the on-abort
> delete or not, and added comments for the same.  IMHO this is the
> cleanest way to do it; please check the patch and let me know
> your thoughts.

I think that might be an OK way to do it. I think if we were starting
from scratch we'd probably want to come up with some better system,
but that's true of a lot of things.

I went over your version and changed some comments. I also added
documentation for the new wait event. Here's a new version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v6-0001-Add-new-block-by-block-strategy-for-CREATE-DATABA.patch
Description: Binary data


Re: Reducing power consumption on idle servers

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 15:39, Robert Haas  wrote:
>
> On Thu, Mar 24, 2022 at 6:59 AM Simon Riggs
>  wrote:
> > The proposals of this patch are the following, each of which can be
> > independently accepted/rejected:
> > 1. fix the sleep pattern of bgwriter, walwriter and logical worker
> > (directly affects powersave)
> > 2. deprecate promote_trigger_file, which then allows us to fix the
> > sleep for startup process (directly affects powersave)
> > 3. treat hibernation in all procs the same, for simplicity, and to
> > make sure we don't forget one later
> > 4. provide a design pattern for background worker extensions to
> > follow, so as to encourage powersaving
>
> Unfortunately, the patch isn't split in a way that corresponds to this
> division. I think I'm +1 on change #2 -- deprecating
> promote_trigger_file seems like a good idea to me independently of any
> power-saving considerations. But I'm not sure that I am on board with
> any of the other changes.

OK, so change (2) is good. Separate patch attached.

> I do agree with the basic goal of trying to
> reduce unnecessary wakeups, but I think the rest of the patch is
> taking a bit of a one-size-fits-all approach where I think that we
> might want to be more nuanced. I think there are a couple of different
> kinds of cases here.

OK, so you disagree with (3) and probably (4). No problem.

What about (1)? That directly affects the powersave capability. I
didn't read anything specific to that.

If we don't fix (1) as well, the changes for startup and walreceiver
will be ineffective for powersaving.

What changes will be acceptable for bgwriter, walwriter and logical worker?

-- 
Simon Riggs    http://www.EnterpriseDB.com/


hibernate_startup.v5.patch
Description: Binary data


Re: Parameter for planner estimate of recursive queries

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 15:48, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Wed, 23 Mar 2022 at 20:25, Tom Lane  wrote:
> >> Do you have any objection if I rename the GUC to
> >> recursive_worktable_factor?  That seems a bit clearer as to what
> >> it does, and it leaves more room for other knobs in the same area
> >> if we decide we need any.
>
> > None, I think your proposal is a better name.
>
> Pushed that way.

Ok, thanks.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2022-03-24 Thread Dilip Kumar
On Thu, Mar 24, 2022 at 9:29 PM Robert Haas  wrote:
>
> On Thu, Mar 24, 2022 at 1:29 AM Dilip Kumar  wrote:
> > In the latest version I have fixed this issue by using a non-conflicting
> > name, because when it was compiled with ICU the foobar5 name was
> > already used and we were seeing a failure.  Apart from this I have
> > fixed the duplicate cleanup problem by passing an extra parameter to
> > RelationCreateStorage, which decides whether to register the on-abort
> > delete or not, and added comments for the same.  IMHO this is the
> > cleanest way to do it; please check the patch and let me know
> > your thoughts.
>
> I think that might be an OK way to do it. I think if we were starting
> from scratch we'd probably want to come up with some better system,
> but that's true of a lot of things.

Right.

> I went over your version and changed some comments. I also added
> documentation for the new wait event. Here's a new version.
>

Thanks, I have gone through your changes in comments and docs and those LGTM.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Reducing power consumption on idle servers

2022-03-24 Thread Robert Haas
On Thu, Mar 24, 2022 at 12:02 PM Simon Riggs
 wrote:
> What about (1)? That directly affects the powersave capability. I
> didn't read anything specific to that.
>
> If we don't fix (1) as well, the changes for startup and walreceiver
> will be ineffective for powersaving.
>
> What changes will be acceptable for bgwriter, walwriter and logical worker?

Hmm, I think it would be fine to introduce some kind of hibernation
mechanism for logical workers. bgwriter and wal writer already have a
hibernation mechanism, so I'm not sure what your concern is there
exactly. In your initial email you said you weren't proposing changes
there, but maybe that changed somewhere in the course of the
subsequent discussion. If you could catch me up on your present
thinking that would be helpful.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Tom Lane
Simon Riggs  writes:
> On Thu, 24 Mar 2022 at 14:56, Tom Lane  wrote:
>> Um, what's that got to do with it?  The example in
>> read-write-unique-4.spec involves only a single pkey constraint.

> Yes, but as you explained, its not actually a serializable case, it
> just looks a bit like one.

> That means we are not currently aware of any case where the situation
> is serializable but the error message is uniqueness violation, unless
> we have 2 or more unique constraints and/or an exclusion constraint.

Meh.  I'm disinclined to document it at that level of detail, both
because it's subject to change and because we're not sure that that
list is exhaustive.  I think a bit of handwaving is preferable.
How about the attached?  (Only the third new para is different.)
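
The retry discipline the attached text describes, sketched in
libpq-style C (run_whole_transaction() and report_error() are
hypothetical application helpers, not part of the patch):

for (;;)
{
    PGresult   *res = run_whole_transaction(conn);
    const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

    if (PQresultStatus(res) == PGRES_COMMAND_OK)
        break;                  /* committed */
    if (sqlstate && (strcmp(sqlstate, "40001") == 0 ||
                     strcmp(sqlstate, "40P01") == 0))
    {
        PQclear(res);
        continue;               /* retryable; rerun the whole transaction */
    }
    report_error(res);          /* persistent failure: do not retry */
    break;
}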

regards, tom lane

diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index da07f3f6c6..176325247d 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -588,7 +588,7 @@ ERROR:  could not serialize access due to concurrent update
 applications using this level must
 be prepared to retry transactions due to serialization failures.
 In fact, this isolation level works exactly the same as Repeatable
-Read except that it monitors for conditions which could make
+Read except that it also monitors for conditions which could make
 execution of a concurrent set of serializable transactions behave
 in a manner inconsistent with all possible serial (one at a time)
 executions of those transactions.  This monitoring does not
@@ -1720,6 +1720,71 @@ SELECT pg_advisory_lock(q.id) FROM

   
 
+  
+   Serialization Failure Handling
+
+   
+serialization failure
+   
+   
+retryable error
+   
+
+   
+Both Repeatable Read and Serializable isolation levels can produce
+errors that are designed to prevent serialization anomalies.  As
+previously stated, applications using these levels must be prepared to
+retry transactions that fail due to serialization errors.  Such an
+error's message text will vary according to the precise circumstances,
+but it will always have the SQLSTATE code 40001
+(serialization_failure).
+   
+
+   
+It may also be advisable to retry deadlock failures.
+These have the SQLSTATE code 40P01
+(deadlock_detected).
+   
+
+   
+In some cases it is also appropriate to retry unique-key failures,
+which have SQLSTATE code 23505
+(unique_violation), and exclusion constraint
+failures, which have SQLSTATE code 23P01
+(exclusion_violation).  For example, if the
+application selects a new value for a primary key column after
+inspecting the currently stored keys, it could get a unique-key
+failure because another application instance selected the same new key
+concurrently.  This is effectively a serialization failure, but the
+server will not detect it as such because it cannot see
+the connection between the inserted value and the previous reads.
+There are also some corner cases in which the server will issue a
+unique-key or exclusion constraint error even though in principle it
+has enough information to determine that a serialization problem
+exists.  While it's recommendable to just
+retry 40001 errors unconditionally, more care is
+needed when retrying these other error codes, since they might
+represent persistent error conditions rather than transient failures.
+   
+
+   
+It is important to retry the complete transaction, including all logic
+that decides which SQL to issue and/or which values to use.
+Therefore, PostgreSQL does not offer an
+automatic retry facility, since it cannot do so with any guarantee of
+correctness.
+   
+
+   
+Transaction retry does not guarantee that the retried transaction will
+complete; multiple retries may be needed.  In cases with very high
+contention, it is possible that completion of a transaction may take
+many attempts.  In cases involving a conflicting prepared transaction,
+it may not be possible to make progress until the prepared transaction
+commits or rolls back.
+   
+  
+
   
Caveats
 

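To make the retry guidance concrete, here is a minimal sketch of such a
retry loop using libpq. The connection string, table, and retry limit are
hypothetical, and a complete client would also treat commit-time
serialization failures as retryable:

/*
 * Minimal sketch of a client-side retry loop for SQLSTATE 40001
 * (serialization_failure).  Everything here -- connection string,
 * table, retry limit -- is illustration, not part of the patch above.
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");
	int			attempt;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	for (attempt = 0; attempt < 5; attempt++)
	{
		PGresult   *res;
		const char *sqlstate;
		int			retryable;

		PQclear(PQexec(conn, "BEGIN ISOLATION LEVEL SERIALIZABLE"));

		/* The complete transaction logic must re-run on every attempt. */
		res = PQexec(conn,
					 "UPDATE accounts SET balance = balance - 100 WHERE id = 1");

		if (PQresultStatus(res) == PGRES_COMMAND_OK)
		{
			PQclear(res);
			PQclear(PQexec(conn, "COMMIT"));
			break;				/* success */
		}

		/* Retry only serialization failures; anything else is permanent. */
		sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
		retryable = (sqlstate != NULL && strcmp(sqlstate, "40001") == 0);
		PQclear(res);
		PQclear(PQexec(conn, "ROLLBACK"));

		if (!retryable)
			break;
	}

	PQfinish(conn);
	return 0;
}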

Re: Column Filtering in Logical Replication

2022-03-24 Thread Peter Eisentraut


On 17.03.22 20:11, Tomas Vondra wrote:

But the comment describes the error for the whole block, which looks
like this:

-- error: replica identity "a" not included in the column list
ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (b, c);
UPDATE testpub_tbl5 SET a = 1;
ERROR:  cannot update table "testpub_tbl5"
DETAIL:  Column list used by the publication does not cover the replica
identity.

So IMHO the comment is correct.


Ok, that makes sense.  I read all the comments in the test file again. 
There were a couple that I think could use tweaking; see attached file. 
The ones with "???" didn't make sense to me: the first one is before a 
command that doesn't seem to change anything, and for the second one I 
didn't understand the meaning.  Please take a look.


(The patch is actually based on your 20220318c patch, but I'm adding it 
here since we have the discussion here.)
From 2e6352791e5418bb0726a051660d44311046fc28 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 24 Mar 2022 17:30:32 +0100
Subject: [PATCH] fixup! Allow specifying column lists for logical replication

---
 src/test/regress/sql/publication.sql | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/test/regress/sql/publication.sql b/src/test/regress/sql/publication.sql
index aeb1b572af..d50052ef9d 100644
--- a/src/test/regress/sql/publication.sql
+++ b/src/test/regress/sql/publication.sql
@@ -399,14 +399,14 @@ CREATE TABLE testpub_tbl5 (a int PRIMARY KEY, b text, c text,
 -- ok
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c);
 ALTER TABLE testpub_tbl5 DROP COLUMN c;-- no dice
--- ok: for insert-only publication, the column list is arbitrary
+-- ok: for insert-only publication, any column list is acceptable
 ALTER PUBLICATION testpub_fortable_insert ADD TABLE testpub_tbl5 (b, c);
 
 /* not all replica identities are good enough */
 CREATE UNIQUE INDEX testpub_tbl5_b_key ON testpub_tbl5 (b, c);
 ALTER TABLE testpub_tbl5 ALTER b SET NOT NULL, ALTER c SET NOT NULL;
 ALTER TABLE testpub_tbl5 REPLICA IDENTITY USING INDEX testpub_tbl5_b_key;
--- error: replica identity (b,c) is covered by column list (a, c)
+-- error: replica identity (b,c) is not covered by column list (a, c)
 UPDATE testpub_tbl5 SET a = 1;
 ALTER PUBLICATION testpub_fortable DROP TABLE testpub_tbl5;
 
@@ -423,7 +423,7 @@ CREATE PUBLICATION testpub_table_ins WITH (publish = 'insert, truncate');
 ALTER PUBLICATION testpub_table_ins ADD TABLE testpub_tbl5 (a);
-- ok
 \dRp+ testpub_table_ins
 
--- with REPLICA IDENTITY FULL, column lists are not allowed
+-- tests with REPLICA IDENTITY FULL
 CREATE TABLE testpub_tbl6 (a int, b text, c text);
 ALTER TABLE testpub_tbl6 REPLICA IDENTITY FULL;
 
@@ -434,11 +434,11 @@ CREATE TABLE testpub_tbl6 (a int, b text, c text);
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl6; -- ok
 UPDATE testpub_tbl6 SET a = 1;
 
--- make sure changing the column list is updated in SET TABLE
+-- make sure changing the column list is updated in SET TABLE ???
 CREATE TABLE testpub_tbl7 (a int primary key, b text, c text);
 ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl7 (a, b);
 \d+ testpub_tbl7
--- ok: we'll skip this table
+-- ok: we'll skip this table ???
 ALTER PUBLICATION testpub_fortable SET TABLE testpub_tbl7 (a, b);
 \d+ testpub_tbl7
 -- ok: update the column list
-- 
2.35.1



Re: Documenting when to retry on serialization failure

2022-03-24 Thread Simon Riggs
On Thu, 24 Mar 2022 at 16:29, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Thu, 24 Mar 2022 at 14:56, Tom Lane  wrote:
> >> Um, what's that got to do with it?  The example in
> >> read-write-unique-4.spec involves only a single pkey constraint.
>
> > Yes, but as you explained, it's not actually a serializable case, it
> > just looks a bit like one.
>
> > That means we are not currently aware of any case where the situation
> > is serializable but the error message is uniqueness violation, unless
> > we have 2 or more unique constraints and/or an exclusion constraint.
>
> Meh.  I'm disinclined to document it at that level of detail, both
> because it's subject to change and because we're not sure that that
> list is exhaustive.  I think a bit of handwaving is preferable.
> How about the attached?  (Only the third new para is different.)

It's much better, thanks.

-- 
Simon Riggs                http://www.EnterpriseDB.com/




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Peter Eisentraut

On 24.03.22 13:09, Michael Paquier wrote:

Hmm.  I think that you'd better add a return at the end of each
function?  Some compilers are dumb in detecting that all the code
paths return (aka recent d0083c1) and could generate warnings, even if
things are coded to return all the time, like in your patch.


That is a different case.  We know that not all compilers understand 
when elog/ereport return.  But no compiler is stupid enough not to 
understand that


foo()
{
if (something)
return this;
else
return that;
}

always reaches a return.




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Peter Eisentraut

On 24.03.22 15:32, Alvaro Herrera wrote:

+static inline Datum
+heap_getattr(HeapTuple tup, int attnum, TupleDesc tupleDesc, bool *isnull)
+{
+   if (attnum > 0)
+   {
+   if (attnum > (int) HeapTupleHeaderGetNatts(tup->t_data))
+   return getmissingattr(tupleDesc, attnum, isnull);
+   else
+   return fastgetattr(tup, attnum, tupleDesc, isnull);
+   }
+
+   return heap_getsysattr(tup, attnum, tupleDesc, isnull);
+}

That was the first thing I wrote, but I can't get myself to like it.
For this one function the code flow is obvious enough; but if you apply
the same idea to fastgetattr(), the result is not nice at all.


I like your first patch.  That is more of a functional style, whereas 
the above is more of a procedural style.






Re: role self-revocation

2022-03-24 Thread Robert Haas
On Fri, Mar 11, 2022 at 11:51 AM Robert Haas  wrote:
> On Fri, Mar 11, 2022 at 11:34 AM Tom Lane  wrote:
> > Note that either case would also involve making entries in pg_shdepend;
> > although for the case of roles owned by/granted to the bootstrap
> > superuser, we could omit those on the usual grounds that we don't need
> > to record dependencies on pinned objects.
>
> That makes sense to me, but it still doesn't solve the problem of
> agreeing on role ownership vs. WITH ADMIN OPTION vs. something else.

Notwithstanding the lack of agreement on that point, I believe that
what we should do for v15 is remove the session user
self-administration exception. We have pretty much established that it
was originally introduced in error. It later was found to be a
security vulnerability, and that resulted in the exception being
narrowed without removing it altogether. While there are differences
of opinion on what the larger plan here ought to be, nobody's proposed
plan involves retaining that exception. Neither has anyone offered a
plausible use case for the current behavior, so there's no reason to
think that removing it would break anything.

However, it might. And if it does, I think it would be best if
removing that exception were the *only* change in this area made by
that release. If for v16 or v17 or v23 we implement Plan Tom or Plan
Stephen or Plan Robert or something else, and along the way we remove
that self-administration exception, we're going to have a real fire
drill if it turns out that the self-administration exception was
important for some reason we're not seeing right now. If, on the other
hand, we remove that exception in v15, then if anything breaks, it'll
be a lot easier to deal with. Worst case scenario we just revert the
removal of that exception, which will be a very localized change if
nothing else has been done that depends heavily on its having been
removed.

So I propose to commit something like what I posted here:

http://postgr.es/m/ca+tgmobgek0jraowqvpqhsxcfbdfitxsomoebhmmmhmj4gl...@mail.gmail.com

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: turn fastgetattr and heap_getattr to inline functions

2022-03-24 Thread Alvaro Herrera
On 2022-Mar-24, Peter Eisentraut wrote:

> But no compiler is stupid enough not to understand that
> 
> foo()
> {
> if (something)
> return this;
> else
> return that;
> }
> 
> always reaches a return.

We have a number of examples of this pattern, so I guess it must be
true.  Pushed without the pg_unreachables, then.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes"   (Germán Poo)




Re: role self-revocation

2022-03-24 Thread Tom Lane
Robert Haas  writes:
> Notwithstanding the lack of agreement on that point, I believe that
> what we should do for v15 is remove the session user
> self-administration exception. We have pretty much established that it
> was originally introduced in error.

Agreed.

> However, it might. And if it does, I think it would be best if
> removing that exception were the *only* change in this area made by
> that release.

Good idea, especially since it's getting to be too late to consider
anything more invasive anyway.

> So I propose to commit something like what I posted here:
> http://postgr.es/m/ca+tgmobgek0jraowqvpqhsxcfbdfitxsomoebhmmmhmj4gl...@mail.gmail.com

+1, although the comments might need some more work.  In particular,
I'm not sure that this bit is well stated:

+* A role cannot have WITH ADMIN OPTION on itself, because that would
+* imply a membership loop.

We already do consider a role to be a member of itself:

regression=# create role r;
CREATE ROLE
regression=# grant r to r;
ERROR:  role "r" is a member of role "r"
regression=# grant r to r with admin option;
ERROR:  role "r" is a member of role "r"

It might be better to just say "By policy, a role cannot have WITH ADMIN
OPTION on itself".  But if you want to write a defense of that policy,
this isn't a very good one.

regards, tom lane




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-24 Thread Robert Haas
On Wed, Mar 23, 2022 at 6:28 PM Peter Geoghegan  wrote:
> It would be great if you could take a look v11-0002-*, Robert. Does it
> make sense to you?

You're probably not going to love hearing this, but I think you're
still explaining things here in ways that are too baroque and hard to
follow. I do think it's probably better. But, for example, in the
commit message for 0001, I think you could change the subject line to
"Allow non-aggressive vacuums to advance relfrozenxid" and it would be
clearer. And then I think you could eliminate about half of the first
paragraph, starting with "There is no fixed relationship", and all of
the third paragraph (which starts with "Later work..."), and I think
removing all that material would make it strictly more clear than it
is currently. I don't think it's the place of a commit message to
speculate too much on future directions or to wax eloquent on
theoretical points. If that belongs anywhere, it's in a mailing list
discussion.

It seems to me that 0002 mixes code movement with functional changes.
I'm completely on board with moving the code that decides how much to
skip into a function. That seems like a great idea, and probably
overdue. But it is not easy for me to see what has changed
functionally between the old and new code organization, and I bet it
would be possible to split this into two patches, one of which creates
a function, and the other of which fixes the problem, and I think that
would be a useful service to future readers of the code. I have a hard
time believing that if someone in the future bisects a problem back to
this commit, they're going to have an easy time finding the behavior
change in here. In fact I can't see it myself. I think the actual
functional change is to fix what is described in the second paragraph
of the commit message, but I haven't been able to figure out where the
logic is actually changing to address that. Note that I would be happy
with the behavior change happening either before or after the code
reorganization.

I also think that the commit message for 0002 is probably longer and
more complex than is really helpful, and that the subject line is too
vague, but since I don't yet understand exactly what's happening here,
I cannot comment on how I think it should be revised at this point,
except to say that the second paragraph of that commit message looks
like the most useful part.

I would also like to mention a few things that I do like about 0002.
One is that it seems to collapse two different pieces of logic for
page skipping into one. That seems good. As mentioned, it's especially
good because that logic is abstracted into a function. Also, it looks
like it is making a pretty localized change to one (1) aspect of what
VACUUM does -- and I definitely prefer patches that change only one
thing at a time.

Hope that's helpful.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: shared-memory based stats collector - v66

2022-03-24 Thread Melanie Plageman
On Thu, Mar 17, 2022 at 3:36 AM Andres Freund  wrote:
>
> The biggest todos are:
> - Address all the remaining AFIXMEs and XXXs

Attached is a patch that addresses three of the existing AFIXMEs.
From 2a975cdb5d10ec365ca2ced39b9f99a9385b6268 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 23 Mar 2022 19:43:12 -0400
Subject: [PATCH] Address 3 AFIXMEs

- reset timestamp callback
- schedule stat internal: warn and reset if exists
- pending delete callback
---
 src/backend/utils/activity/pgstat.c   | 117 +-
 src/backend/utils/activity/pgstat_database.c  |   7 ++
 src/backend/utils/activity/pgstat_relation.c  |  16 +++
 .../utils/activity/pgstat_subscription.c  |   7 ++
 src/include/utils/pgstat_internal.h   |  16 +++
 5 files changed, 105 insertions(+), 58 deletions(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 7f86bc29ee..9841447a8b 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -154,10 +154,8 @@ static void pgstat_shared_refs_release_all(void);
 static void pgstat_perform_drop(xl_xact_stats_item *drop);
 static bool pgstat_drop_stats_entry(dshash_seq_status *hstat);
 
-static void pgstat_shared_reset_timestamp(PgStatKind kind, PgStatShm_StatEntryHeader *header);
 static void pgstat_reset_all_stats(TimestampTz ts);
 
-static void pgstat_pending_delete(PgStatSharedRef *shared_ref);
 static bool pgstat_pending_flush_stats(bool nowait);
 
 static PgStatKind pgstat_kind_for(char *kind_str);
@@ -281,6 +279,8 @@ static const PgStatKindInfo pgstat_kind_infos[PGSTAT_KIND_LAST + 1] = {
 		.pending_size = sizeof(PgStat_StatDBEntry),
 
 		.flush_pending_cb = pgstat_flush_database,
+		.pending_delete_cb = pgstat_pending_delete,
+		.reset_timestamp_cb = pgstat_database_reset_timestamp,
 	},
 
 	[PGSTAT_KIND_TABLE] = {
@@ -294,6 +294,8 @@ static const PgStatKindInfo pgstat_kind_infos[PGSTAT_KIND_LAST + 1] = {
 		.pending_size = sizeof(PgStat_TableStatus),
 
 		.flush_pending_cb = pgstat_flush_relation,
+		.pending_delete_cb = pgstat_relation_pending_delete,
+		.reset_timestamp_cb = pgstat_shared_reset_timestamp_noop,
 	},
 
 	[PGSTAT_KIND_FUNCTION] = {
@@ -307,6 +309,8 @@ static const PgStatKindInfo pgstat_kind_infos[PGSTAT_KIND_LAST + 1] = {
 		.pending_size = sizeof(PgStat_BackendFunctionEntry),
 
 		.flush_pending_cb = pgstat_flush_function,
+		.pending_delete_cb = pgstat_pending_delete,
+		.reset_timestamp_cb = pgstat_shared_reset_timestamp_noop,
 	},
 
 	[PGSTAT_KIND_SUBSCRIPTION] = {
@@ -322,6 +326,8 @@ static const PgStatKindInfo pgstat_kind_infos[PGSTAT_KIND_LAST + 1] = {
 		.pending_size = sizeof(PgStat_BackendSubEntry),
 
 		.flush_pending_cb = pgstat_flush_subscription,
+		.pending_delete_cb = pgstat_pending_delete,
+		.reset_timestamp_cb = pgstat_subscription_reset_timestamp,
 	},
 
 
@@ -1040,12 +1046,18 @@ pgstat_schedule_stat_internal(PgStatKind kind, Oid dboid, Oid objoid, bool is_cr
 void
 pgstat_schedule_stat_create(PgStatKind kind, Oid dboid, Oid objoid)
 {
-	pgstat_schedule_stat_internal(kind, dboid, objoid, /* create */ true);
+	if (pgstat_shared_ref_get(kind, dboid, objoid, false, NULL))
+	{
+		Oid msg_oid = (kind == PGSTAT_KIND_DB) ? dboid : objoid;
 
-	/*
-	 * AFIXME: It would be a good idea to check if an object with that key
-	 * already exists. WARN if so, and reset the stats to 0.
-	 */
+		ereport(WARNING,
+errmsg("Resetting existing stats for %s with OID %d.",
+	(pgstat_kind_info_for(kind))->name, msg_oid));
+
+		pgstat_reset_one(kind, dboid, objoid);
+	}
+
+	pgstat_schedule_stat_internal(kind, dboid, objoid, /* create */ true);
 }
 
 /*
@@ -2178,6 +2190,28 @@ pgstat_shared_refs_release_all(void)
  * 
  */
 
+/*
+ * Delete pending stats for variable number stats in the general case. Some
+ * types of stats require additional steps and have dedicated callbacks.
+ */
+void
+pgstat_pending_delete(PgStatSharedRef *shared_ref)
+{
+	void	   *pending_data = shared_ref->pending;
+	PgStatKind kind = shared_ref->shared_entry->key.kind;
+
+	Assert(pending_data != NULL);
+	Assert(!pgstat_kind_info_for(kind)->fixed_amount);
+
+	/* PGSTAT_KIND_TABLE has its own callback */
+	Assert(kind != PGSTAT_KIND_TABLE);
+
+	pfree(pending_data);
+	shared_ref->pending = NULL;
+
+	dlist_delete(&shared_ref->pending_node);
+}
+
 /*
  * Returns the appropriate PgStatSharedRef, preparing it to receive pending
  * stats if not already done.
@@ -,38 +2256,6 @@ pgstat_pending_fetch(PgStatKind kind, Oid dboid, Oid objoid)
 	return shared_ref;
 }
 
-static void
-pgstat_pending_delete(PgStatSharedRef *shared_ref)
-{
-	void	   *pending_data = shared_ref->pending;
-	PgStatKind	kind;
-
-	Assert(pending_data != NULL);
-
-	/* AFIXME: Move into a PgStatKindInfo callback */
-	kind = shared_ref->shared_entry->key.kind;
-	switch (kind)
-	{
-		case PGSTAT_KIND_TABLE:
-			pgstat_relation_unlink(((PgStat_TableStatus *) pending_data)

Re: role self-revocation

2022-03-24 Thread Robert Haas
On Thu, Mar 24, 2022 at 1:10 PM Tom Lane  wrote:
> > However, it might. And if it does, I think it would be best if
> > removing that exception were the *only* change in this area made by
> > that release.
>
> Good idea, especially since it's getting to be too late to consider
> anything more invasive anyway.

I'd say it's definitely too late at this point.

> > So I propose to commit something like what I posted here:
> > http://postgr.es/m/ca+tgmobgek0jraowqvpqhsxcfbdfitxsomoebhmmmhmj4gl...@mail.gmail.com
>
> +1, although the comments might need some more work.  In particular,
> I'm not sure that this bit is well stated:
>
> +* A role cannot have WITH ADMIN OPTION on itself, because that would
> +* imply a membership loop.
>
> We already do consider a role to be a member of itself:
>
> regression=# create role r;
> CREATE ROLE
> regression=# grant r to r;
> ERROR:  role "r" is a member of role "r"
> regression=# grant r to r with admin option;
> ERROR:  role "r" is a member of role "r"
>
> It might be better to just say "By policy, a role cannot have WITH ADMIN
> OPTION on itself".  But if you want to write a defense of that policy,
> this isn't a very good one.

That sentence is present in the current code, along with a bunch of
other sentences, which the patch renders irrelevant. So I just deleted
all of the other stuff and kept the sentence that is still relevant to
the revised code. I think your proposed replacement is an improvement,
but let's be careful not to get sucked into too much of a wordsmithing
exercise in a patch that's here to make a functional change.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Documenting when to retry on serialization failure

2022-03-24 Thread Tom Lane
Simon Riggs  writes:
> On Thu, 24 Mar 2022 at 16:29, Tom Lane  wrote:
>> How about the attached?  (Only the third new para is different.)

> It's much better, thanks.

Pushed then.

regards, tom lane




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-24 Thread Jacob Champion
On Wed, 2022-03-23 at 16:54 -0700, Andres Freund wrote:
> On 2022-03-23 23:06:14 +, Jacob Champion wrote:
> > On Wed, 2022-03-23 at 19:00 -0400, Tom Lane wrote:
> > > Hm.  I was more envisioning getting the "sharable" info out of Port
> > > entirely, although I'm not quite sure where it should go instead.
> > 
> > If it helps, I can move the substruct out and up to a new global struct
> > (MyProcShared?). At this point I think it's mostly search-and-replace.
> 
> Perhaps alongside CurrentUserId etc in miscinit.c?  It would be nicer if all
> those were together in a struct, but oh well.

Next draft in v7. My naming choices probably make even less sense now. Any 
ideas for names for "a bag of stuff that we want parallel workers to have too"?

> Another option would be to make it a GUC. With a bit of care it could be
> automatically synced by the existing parallelism infrastructure...

Like a write-once, PGC_INTERNAL setting? I guess I don't have any
intuition on how that would compare to the separate-global-and-accessor 
approach. Is the primary advantage that you don't have to maintain the
serialization logic, or is there more to it?
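
For concreteness, a rough sketch of that alternative, written
extension-style (a core implementation would presumably add the string
GUC to guc.c instead; every name below is made up):

static char *session_authn_id = NULL;

static void
define_authn_id_guc(void)
{
	DefineCustomStringVariable("myext.session_authn_id",
							   "Authenticated identity of the session.",
							   NULL,
							   &session_authn_id,
							   "",
							   PGC_INTERNAL,	/* not settable by users */
							   0,
							   NULL, NULL, NULL);
}

/* The auth code would then record the identity exactly once: */
static void
record_authn_id(const char *id)
{
	SetConfigOption("myext.session_authn_id", id,
					PGC_INTERNAL, PGC_S_OVERRIDE);
}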

Thanks,
--Jacob
commit e6eca817f3cc359fff762600ad286d92046ba07d
Author: Jacob Champion 
Date:   Thu Mar 24 10:00:30 2022 -0700

squash! Allow parallel workers to use pg_session_authn_id()

Move SharedPort out of Port and over to miscinit.c.

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index dda2aab7b1..c88eab0933 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -478,79 +478,6 @@ InitializeParallelDSM(ParallelContext *pcxt)
MemoryContextSwitchTo(oldcontext);
 }
 
-/*
- * Calculate the space needed to serialize MyProcPort->shared.
- */
-Size
-EstimateSharedPortSpace(void)
-{
-   SharedPort *shared = &MyProcPort->shared;
-   Sizesize = 1;
-
-   if (shared->authn_id)
-   size = add_size(size, strlen(shared->authn_id) + 1);
-
-   return size;
-}
-
-/*
- * Serialize MyProcPort->shared for use by parallel workers.
- */
-void
-SerializeSharedPort(Size maxsize, char *start_address)
-{
-   SharedPort *shared = &MyProcPort->shared;
-
-   /*
-* First byte is an indication of whether or not authn_id has been set to
-* non-NULL, to differentiate that case from the empty string.
-*/
-   Assert(maxsize > 0);
-   start_address[0] = shared->authn_id ? 1 : 0;
-   start_address++;
-   maxsize--;
-
-   if (shared->authn_id)
-   {
-   Size len;
-
-   len = strlcpy(start_address, shared->authn_id, maxsize) + 1;
-   Assert(len <= maxsize);
-   maxsize -= len;
-   start_address += len;
-   }
-}
-
-/*
- * Restore MyProcPort->shared from its serialized representation, allocating
- * MyProcPort if necessary.
- */
-void
-RestoreSharedPort(char *sharedport)
-{
-   /* First make sure we have a place to put the information. */
-   if (!MyProcPort)
-   {
-   if (!(MyProcPort = calloc(1, sizeof(Port))))
-   ereport(FATAL,
-   (errcode(ERRCODE_OUT_OF_MEMORY),
-errmsg("out of memory")));
-   }
-
-   if (sharedport[0] == 0)
-   {
-   MyProcPort->shared.authn_id = NULL;
-   sharedport++;
-   }
-   else
-   {
-   sharedport++;
-   MyProcPort->shared.authn_id = MemoryContextStrdup(TopMemoryContext,
-                                                      sharedport);
-   sharedport += strlen(sharedport) + 1;
-   }
-}
-
 /*
  * Reinitialize the dynamic shared memory segment for a parallel context such
  * that we could launch workers for it again.
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 40384a31b0..bceda9755a 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -342,15 +342,15 @@ auth_failed(Port *port, int status, const char *logdetail)
  * authorization will fail later.
  *
  * The provided string will be copied into TopMemoryContext, to match the
- * lifetime of the Port, so it is safe to pass a string that is managed by an
- * external library.
+ * lifetime of MyProcShared, so it is safe to pass a string that is managed by
+ * an external library.
  */
 static void
 set_authn_id(Port *port, const char *id)
 {
Assert(id);
 
-   if (port->shared.authn_id)
+   if (MyProcShared.authn_id)
{
/*
 * An existing authn_id should never be overwritten; that means two
@@ -361,17 +361,17 @@ set_authn_id(Port *port, const char *id)
ereport(FATAL,
(errmsg("authentication identifier set more 
than once"),
 e

Re: Support logical replication of DDLs

2022-03-24 Thread Zheng Li
Hi Dilip,

Thanks for the feedback.

> > > > The table creation WAL and table insert WAL are available. The tricky
> > > > part is how do we break down this command into two parts (a normal
> > > > CREATE TABLE followed by insertions) either from the parsetree or the
> > > > WALs. I’ll have to dig more on this.

> > I had put some more thought about this, basically, during CTAS we are
> > generating the CreateStmt inside "create_ctas_internal" and executing
> > it first before inserting the tuple, so can't we generate the
> > independent sql just for creating the tuple maybe using deparsing or
> > something?

Yes, deparsing might help for edge cases like this. However I found
a simple solution for this specific case:

The idea is to skip any direct data population (which could otherwise
cause data inconsistency on the subscriber) in CREATE TABLE AS and
SELECT INTO commands on the subscriber, by forcing the skipData flag in
the intoClause of the parsetree after the logical replication worker
parses the command. The data sync will be taken care of by the DML
replication after the DDL replication finishes.

This is implemented in the latest commit:
https://github.com/zli236/postgres/commit/116c33451da8d44577b8d6fdb05c4b6998cd0167
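
For reference, the core of that approach might look roughly like this on
the apply side (a sketch; the wrapper function is hypothetical, but
CreateTableAsStmt, IntoClause and skipData are the existing parser nodes):

/*
 * Sketch: after the apply worker parses a replicated utility command,
 * suppress the data-populating step of CREATE TABLE AS / SELECT INTO
 * and let ordinary DML replication fill the table afterwards.
 */
static void
maybe_force_skipdata(RawStmt *parsetree)
{
	if (IsA(parsetree->stmt, CreateTableAsStmt))
	{
		CreateTableAsStmt *ctas = (CreateTableAsStmt *) parsetree->stmt;

		ctas->into->skipData = true;	/* replicate structure only */
	}
}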

> > Apart from that I have one more question, basically if you are
> > directly logging the sql query then how you are identifying under
> > which schema you need to create that table, are you changing the sql
> > and generating schema-qualified name?
>
> I was going through the patch and it seems you are logging the search
> path as well along with the query so I think this will probably work.

Yes, currently we log the search path as well as the user name. And we
enforce the same search path and user name when applying the DDL command
on the subscriber.

> I have got one more query while looking into the code.  In the below
> code snippet you are logging DDL command only if it is a top level
> query but there are no comments explaining what sort of queries we
> don't want to log.  Suppose I am executing a DDL statement inside a PL
> then that will not be a top level statement so is your intention to
> block that as well or that is an unintentional side effect?
>
> +/*
> + * Consider logging the DDL command if logical logging is
> enabled and this is
> + * a top level query.
> + */
> +if (XLogLogicalInfoActive() && isTopLevel)
> +LogLogicalDDLCommand(parsetree, queryString);

Good catch. The reason for having isTopLevel in the condition is
because I haven't decided if a DDL statement inside a PL should
be replicated from the user point of view. For example, if I execute a
plpgsql function or a stored procedure which creates a table under the hood,
does it always make sense to replicate the DDL without running the same
function or stored procedure on the subscriber? It probably depends on
the specific
use case. Maybe we can consider making this behavior configurable by the user.

Thanks,
Zheng




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-24 Thread Robert Haas
On Wed, Mar 2, 2022 at 4:27 PM Andres Freund  wrote:
> I don't think we should commit this without synchronizing the authn between
> worker / leader (in a separate commit). Too likely that some function that's
> marked parallel ok queries the authn_id, opening up a security/monitoring hole
> or such because of a bogus return value.

It is not free to copy data from the leader to the worker. I don't
think we should just adopt a policy of copying everything anyone
thinks of, because then most of the time we'll be copying a bunch of
stuff that really isn't needed.

My gut reaction is to think that this is way too marginal to be worth
making parallel-safe, but it is also possible that I just don't know
enough to understand its true value.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2022-03-24 Thread Tom Lane
Tatsuo Ishii  writes:
>> My hoary animal prairiedog doesn't like this [1]:

> My machine (Ubuntu 20) did not complain either. Maybe perl version
> difference?  Any way, the fix pushed. Let's see how prairiedog feels.

Still not happy.  After some digging in man pages, I believe the
problem is that its old version of Perl does not understand "\gN"
backreferences.  Is there a good reason to be using that rather
than the traditional "\N" backref notation?

regards, tom lane




Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-03-24 Thread Andres Freund
Hi,

On 2022-03-24 13:55:29 -0400, Robert Haas wrote:
> On Wed, Mar 2, 2022 at 4:27 PM Andres Freund  wrote:
> > I don't think we should commit this without synchronizing the authn between
> > worker / leader (in a separate commit). Too likely that some function that's
> > marked parallel ok queries the authn_id, opening up a security/monitoring 
> > hole
> > or such because of a bogus return value.
>
> It is not free to copy data from the leader to the worker. I don't
> think we should just adopt a policy of copying everything anyone
> thinks of, because then most of the time we'll be copying a bunch of
> stuff that really isn't needed.

I agree.


> My gut reaction is to think that this is way too marginal to be worth
> making parallel-safe, but it is also possible that I just don't know
> enough to understand its true value.

My problem with that is that as far as I can tell the only real use of the
field / function is for stuff like audit logging, RLS etc. Which seems
problematic for two reasons:

1) It's likely that the call to the function is nested into other functions,
   "hiding" the parallel safety. Then it'd return bogus data silently. At
   the very least we need to make it error out if called in a parallel worker.
2) If used for the purposes above, there's basically no parallelism possible
   anymore.

Greetings,

Andres Freund




Doc patch: replace 'salesmen' with 'salespeople'

2022-03-24 Thread Dagfinn Ilmari Mannsåker
Hi Hackers,

I just spotted an unnecessarily gendered example involving a 'salesmen'
table in the UPDATE docs. Here's a patch that changes that to
'salespeople'.

- ilmari

>From fde378ccd44c15f827a3c22630265f477d70d748 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 24 Mar 2022 18:21:48 +
Subject: [PATCH] doc: replace 'salesmen' with 'salespeople'

---
 doc/src/sgml/ref/update.sgml | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 3a0285df79..a1fc4bbb4a 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -387,21 +387,21 @@
 
   
Update contact names in an accounts table to match the currently assigned
-   salesmen:
+   salespeople:
 
 UPDATE accounts SET (contact_first_name, contact_last_name) =
-(SELECT first_name, last_name FROM salesmen
- WHERE salesmen.id = accounts.sales_id);
+(SELECT first_name, last_name FROM salespeople
+ WHERE salespeople.id = accounts.sales_id);
 
A similar result could be accomplished with a join:
 
 UPDATE accounts SET contact_first_name = first_name,
 contact_last_name = last_name
-  FROM salesmen WHERE salesmen.id = accounts.sales_id;
+  FROM salespeople WHERE salespeople.id = accounts.sales_id;
 
However, the second query may give unexpected results
-   if salesmen.id is not a unique key, whereas
-   the first query is guaranteed to raise an error if there are multiple
+   if salespeople.id is not a unique key,
+   whereas the first query is guaranteed to raise an error if there are multiple
id matches.  Also, if there is no match for a particular
accounts.sales_id entry, the first query
will set the corresponding name fields to NULL, whereas the second query
-- 
2.30.2



Re: pg_walinspect - a new extension to get raw WAL data and WAL stats

2022-03-24 Thread Andres Freund
Hi,

On 2022-03-24 15:02:29 +0530, Bharath Rupireddy wrote:
> On Thu, Mar 24, 2022 at 10:22 AM Kyotaro Horiguchi
> > This doesn't seem to be a part of xlogreader.  Couldn't we add a new
> > module "xlogstats"?  XLogRecGetBlockRefInfo also doesn't seem to me as
> > a part of xlogreader, the xlogstats looks like a better place.
> 
> I'm not sure if it's worth adding new files xlogstats.h/.c just for 2
> structures, 1 macro, and 2 functions with no plan to add new stats
> structures or functions. Since xlogreader is the one that reads the
> WAL, and is being included by both backend and other modules (tools
> and extensions) IMO it's the right place. However, I can specify in
> xlogreader that if at all the new stats related structures or
> functions are going to be added, it's good to move them into a new
> header and .c file.

I don't like that location for XLogRecGetBlockRefInfo(). How about putting it
in xlogdesc.c - that kind of fits?

And what do you think about creating src/backend/access/rmgrdesc/stats.c for
XLogRecStoreStats()? It's not a perfect location, but not too bad either.

XLogRecGetLen() would be ok in xlogreader, but stats.c also would work?

Greetings,

Andres Freund




Re: Race conditions in 019_replslot_limit.pl

2022-03-24 Thread Tom Lane
I just noticed something very interesting: in a couple of recent
buildfarm runs with this failure, the pg_stat_activity printout
no longer shows the extra walsender:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2022-03-24%2017%3A50%3A10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2022-03-23%2011%3A00%3A05

This is just two of the 33 such failures in the past ten days,
so maybe it's not surprising that we didn't see it already.
(I got bored before looking back further than that.)

What this suggests to me is that maybe the extra walsender is
indeed not blocked on anything, but is just taking its time
about exiting.  In these two runs, as well as in all the
non-failing runs, it had enough time to do so.

I suggest that we add a couple-of-seconds sleep in front of
the query that collects walsender PIDs, and maybe a couple more
seconds before the pg_stat_activity probe in the failure path,
and see if the behavior changes at all.  That should be enough
to confirm or disprove this idea pretty quickly.  If it is
right, a permanent fix could be to wait for the basebackup's
walsender to disappear from node_primary3's pg_stat_activity
before we start the one for node_standby3.

regards, tom lane




Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-24 Thread Andrew Dunstan


On 3/24/22 12:49, Mark Dilger wrote:
>
>> On Mar 17, 2022, at 8:41 AM, Andrew Dunstan  wrote:
>>
>> If we abandoned that for this form of GRANT/REVOKE I think we could
>> probably get away with
>>
>>
>> GRANT { SET | ALTER SYSTEM } ON setting_name ...
>>
>>
>> I haven't tried it, so I could be all wrong.
> Version 12 of the patch uses SET and ALTER SYSTEM as the names of the 
> privileges, and PARAMETER as the name of the thing on which the privilege is 
> granted.  The catalog table which tracks these grants is now named 
> pg_parameter_acl, and various other parts of the patch have been adjusted to 
> use a "parameter" based, rather than a "setting" based, naming scheme.  One 
> exception to this rule is the "setacl" column in pg_parameter_acl, which is 
> much more compact than the "parameteracl" name would be, so that remains 
> under the old name.


I can live with it I guess, but it seems perverse to me to have
pg_settings but pg_parameter_acl effectively referring to the same set
of things. If we're going to do this perhaps we should create a
pg_parameters view which is identical to pg_settings and deprecate
pg_settings. I don't want to hold up this patch; I think this can
probably be managed as a follow-up item.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-24 Thread Peter Geoghegan
On Thu, Mar 24, 2022 at 10:21 AM Robert Haas  wrote:
> You're probably not going to love hearing this, but I think you're
> still explaining things here in ways that are too baroque and hard to
> follow. I do think it's probably better.

There are a lot of dimensions to this work. It's hard to know which to
emphasize here.

> But, for example, in the
> commit message for 0001, I think you could change the subject line to
> "Allow non-aggressive vacuums to advance relfrozenxid" and it would be
> clearer.

But non-aggressive VACUUMs have always been able to do that.

How about: "Set relfrozenxid to oldest extant XID seen by VACUUM"

> And then I think you could eliminate about half of the first
> paragraph, starting with "There is no fixed relationship", and all of
> the third paragraph (which starts with "Later work..."), and I think
> removing all that material would make it strictly more clear than it
> is currently. I don't think it's the place of a commit message to
> speculate too much on future directions or to wax eloquent on
> theoretical points. If that belongs anywhere, it's in a mailing list
> discussion.

Okay, I'll do that.

> It seems to me that 0002 mixes code movement with functional changes.

Believe it or not, I avoided functional changes in 0002 -- at least in
one important sense. That's why you had difficulty spotting any. This
must sound peculiar, since the commit message very clearly says that
the commit avoids a problem seen only in the non-aggressive case. It's
really quite subtle.

You wrote this comment and code block (which I propose to remove in
0002), so clearly you already understand the race condition that I'm
concerned with here:

-   if (skipping_blocks && blkno < rel_pages - 1)
-   {
-   /*
-* Tricky, tricky.  If this is in aggressive vacuum, the page
-* must have been all-frozen at the time we checked whether it
-* was skippable, but it might not be any more.  We must be
-* careful to count it as a skipped all-frozen page in that
-* case, or else we'll think we can't update relfrozenxid and
-* relminmxid.  If it's not an aggressive vacuum, we don't
-* know whether it was initially all-frozen, so we have to
-* recheck.
-*/
-   if (vacrel->aggressive ||
-   VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
-   vacrel->frozenskipped_pages++;
-   continue;
-   }

What you're saying here boils down to this: it doesn't matter what the
visibility map would say right this microsecond (in the aggressive
case) were we to call VM_ALL_FROZEN(): we know for sure that the VM
said that this page was all-frozen *in the recent past*. That's good
enough; we will never fail to scan a page that might have an XID <
OldestXmin (ditto for XMIDs) this way, which is all that really
matters.

This is absolutely mandatory in the aggressive case, because otherwise
relfrozenxid advancement might be seen as unsafe. My observation is:
Why should we accept the same race in the non-aggressive case? Why not
do essentially the same thing in every VACUUM?

In 0002 we now track if each range that we actually chose to skip had
any all-visible (not all-frozen) pages -- if that happens then
relfrozenxid advancement becomes unsafe. The existing code uses
"vacrel->aggressive" as a proxy for the same condition -- the existing
code reasons based on what the visibility map must have said about the
page in the recent past. Which makes sense, but only works in the
aggressive case. The approach taken in 0002 also makes the code
simpler, which is what enabled putting the VM skipping code into its
own helper function, but that was just a bonus.

And so you could almost say that there is no behavioral change at
all. We're skipping pages in the same way, based on the same
information (from the visibility map) as before. We're just being a
bit more careful than before about how that information is tracked, to
avoid this race. A race that we always avoided in the aggressive case
is now consistently avoided.

> I'm completely on board with moving the code that decides how much to
> skip into a function. That seems like a great idea, and probably
> overdue. But it is not easy for me to see what has changed
> functionally between the old and new code organization, and I bet it
> would be possible to split this into two patches, one of which creates
> a function, and the other of which fixes the problem, and I think that
> would be a useful service to future readers of the code.

It seems kinda tricky to split up 0002 like that. It's possible, but
I'm not sure if it's possible to split it in a way that highlights the
issue that I just described. Because we already avoided the race in
the aggressive case.

> I also think that the commit message for 0002 is probably longer and
> more complex than is really helpful

Re: Corruption during WAL replay

2022-03-24 Thread Robert Haas
On Thu, Mar 17, 2022 at 9:21 PM Kyotaro Horiguchi
 wrote:
> Finally, no two of from 10 to 14 doesn't accept the same patch.
>
> As a cross-version check, I compared all combinations of the patches
> for two adjacent versions and confirmed that no hunks are lost.
>
> All versions pass check world.

Thanks, committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: ExecRTCheckPerms() and many prunable partitions

2022-03-24 Thread David Rowley
On Wed, 23 Mar 2022 at 20:03, Amit Langote  wrote:
>
> On Mon, Mar 14, 2022 at 4:36 PM Amit Langote  wrote:
> > Also needed fixes when rebasing.
>
> Needed another rebase.

I had a look at the v10-0001 patch. I agree that it seems to be a good
idea to separate out the required permission checks rather than having
the Bitmapset to index the interesting range table entries.

One thing that I could just not determine from looking at the patch
was if there's meant to be just 1 RelPermissionInfo per RTE or per rel
Oid. None of the comments helped me understand this, and
MergeRelPermissionInfos() appears to try to uniquify this to some
extent.  I just see no code that does so for a single query level.
I've provided more detail on this in
#5 below.

Here's my complete review of v10-0001:

1. ExecCheckPermisssions -> ExecCheckPermissions

2. I think you'll want to move the userid field away from below a
comment that claims the following fields are for foreign tables only.

  /* Information about foreign tables and foreign joins */
  Oid serverid; /* identifies server for the table or join */
- Oid userid; /* identifies user to check access as */
+ Oid userid; /* identifies user to check access as; set
+ * in non-foreign table relations too! */

3. This should use READ_OID_FIELD()

READ_INT_FIELD(checkAsUser);

and this one:

READ_INT_FIELD(relid);

4.  This looks wrong:

- rel->userid = rte->checkAsUser;
+ if (rte->rtekind == RTE_RELATION)
+ {
+ /* otherrels use the root parent's value. */
+ rel->userid = parent ? parent->userid :
+ GetRelPermissionInfo(root->parse->relpermlist,
+ rte, false)->checkAsUser;
+ }

If 'parent' is false then you'll assign the result of
GetRelPermissionInfo (a RelPermissionInfo *) to an Oid.

5. I'm not sure if there's a case that can break this one, but I'm not
very confident that there's not one:

I'm not sure I agree with how you've coded GetRelPermissionInfo().
You're searching for a RelPermissionInfo based on the table's Oid.  If
you have multiple RelPermissionInfos for the same Oid then this will
just find the first one and return it, but that might not be the one
for the RangeTblEntry in question.

Here's an example of the sort of thing that could have problems with this:

postgres=# create role bob;
CREATE ROLE
postgres=# create table ab (a int, b int);
CREATE TABLE
postgres=# grant all (a) on table ab to bob;
GRANT
postgres=# set session authorization bob;
SET
postgres=> update ab set a = (select b from ab);
ERROR:  permission denied for table ab

The patch does correctly ERROR out here on permission failure, but as
far as I can see, that's just due to the fact that we're checking the
permissions of all items in the PlannedStmt.relpermlist List.  If
there had been code that had tried to find the RelPermissionInfo based
on the relation's Oid then we'd have accidentally found that we only
need an UPDATE permission on (a).  SELECT on (b) wouldn't have been
checked.

As far as I can see, to fix that you'd either need to store the RTI of
the RelPermissionInfo and lookup based on that, or you'd have to
bms_union() all the columns and bitwise OR the required permissions
and ensure you only have 1 RelPermissionInfo per Oid.

The fact that I have two entries when I debug InitPlan() seems to
disagree with what the comment in AddRelPermissionInfo() is claiming
should happen:

/*
* To prevent duplicate entries for a given relation, check if already in
* the list.
*/

I'm not clear on if the list is meant to be unique on Oid or not.
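
To illustrate the second option, the merge would be roughly this,
assuming RelPermissionInfo carries the same permission fields as today's
RangeTblEntry (my assumption, not necessarily what the patch does):

/* Fold two RelPermissionInfos for the same table OID into one. */
dest_perminfo->requiredPerms |= src_perminfo->requiredPerms;
dest_perminfo->selectedCols = bms_union(dest_perminfo->selectedCols,
										src_perminfo->selectedCols);
dest_perminfo->insertedCols = bms_union(dest_perminfo->insertedCols,
										src_perminfo->insertedCols);
dest_perminfo->updatedCols = bms_union(dest_perminfo->updatedCols,
									   src_perminfo->updatedCols);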

6. acesss?

- * Set flags and access permissions.
+ * Set flags and initialize acesss permissions.

7. I was expecting to see an |= here:

+ /* "merge" proprties. */
+ dest_perminfo->inh = src_perminfo->inh;

Why is a plain assignment ok?

8. Some indentation problems here:

@@ -3170,6 +3148,8 @@ rewriteTargetView(Query *parsetree, Relation view)

  base_rt_index = rtr->rtindex;
  base_rte = rt_fetch(base_rt_index, viewquery->rtable);
+base_perminfo = GetRelPermissionInfo(viewquery->relpermlist, base_rte,
+ false);

9. You can use foreach_current_index(lc) + 1 in:

+ i = 0;
+ foreach(lc, relpermlist)
+ {
+ perminfo = (RelPermissionInfo *) lfirst(lc);
+ if (perminfo->relid == rte->relid)
+ {
+ /* And set the index in RTE. */
+ rte->perminfoindex = i + 1;
+ return perminfo;
+ }
+ i++;
+ }
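
For clarity, the suggested rewrite would look roughly like this (a
sketch reusing the names from the quoted patch):

foreach(lc, relpermlist)
{
	perminfo = (RelPermissionInfo *) lfirst(lc);
	if (perminfo->relid == rte->relid)
	{
		/* And set the index in RTE. */
		rte->perminfoindex = foreach_current_index(lc) + 1;
		return perminfo;
	}
}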

10. I think the double quote is not in the correct place in this comment:

  List*finalrtable; /* "flat" rangetable for executor */

+ List*finalrelpermlist; /* "flat list of RelPermissionInfo "*/


11. Looks like an accidental change:

+++ b/src/include/optimizer/planner.h
@@ -58,4 +58,5 @@ extern Path *get_cheapest_fractional_path(RelOptInfo *rel,

 extern Expr *preprocess_phv_expression(PlannerInfo *root, Expr *expr);

+

12. These need to be broken into multiple lines:

+extern RelPermissionInfo *AddRelPermissionInfo(List **relpermlist,
RangeTblEntry *rte);
+extern void MergeRelPermissionInfos(Query *dest_query, List *src_relpermlist);

Re: Granting SET and ALTER SYSTE privileges for GUCs

2022-03-24 Thread Mark Dilger



> On Mar 24, 2022, at 12:06 PM, Andrew Dunstan  wrote:
> 
> 
> On 3/24/22 12:49, Mark Dilger wrote:
>> 
>>> On Mar 17, 2022, at 8:41 AM, Andrew Dunstan  wrote:
>>> 
>>> If we abandoned that for this form of GRANT/REVOKE I think we could
>>> probably get away with
>>> 
>>> 
>>>GRANT { SET | ALTER SYSTEM } ON setting_name ...
>>> 
>>> 
>>> I haven't tried it, so I could be all wrong.
>> Version 12 of the patch uses SET and ALTER SYSTEM as the names of the 
>> privileges, and PARAMETER as the name of the thing on which the privilege is 
>> granted.  The catalog table which tracks these grants is now named 
>> pg_parameter_acl, and various other parts of the patch have been adjusted to 
>> use a "parameter" based, rather than a "setting" based, naming scheme.  One 
>> exception to this rule is the "setacl" column in pg_parameter_acl, which is 
>> much more compact than the "parameteracl" name would be, so that remains 
>> under the old name.
> 
> 
> I can live with it I guess, but it seems perverse to me to have
> pg_settings but pg_paramater_acl effectively referring to the same set
> of things. If we're going to do this perhaps we should create a
> pg_parameters view which is identical to pg_settings and deprecate
> pg_settings. I don;t want to hold up this patch, I think this can
> probably be managed as a follow up item.

Right, the version 12 patch was following Peter's and Tom's comments upthread:

> On Mar 17, 2022, at 7:47 AM, Tom Lane  wrote:
> 
> Well, I still say that "SET" is sufficient for the one privilege name
> (unless we really can't make Bison handle that, which I doubt).  But
> I'm willing to yield on using "ALTER SYSTEM" for the other.
> 
> If we go with s/SETTING/PARAMETER/ as per your other message, then
> that would be adequately consistent with the docs I think.  So it'd
> be
> 
> GRANT { SET | ALTER SYSTEM } ON PARAMETER foo TO ...
> 
> and the new catalog would be pg_parameter_acl, and so on.

We could debate that again, but it seems awfully late in the development cycle. 
 I'd rather just get this committed, barring any objections?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







identifying unrecognized node type errors

2022-03-24 Thread Andrew Dunstan

As I was tracking down some of these errors in the sql/json patches I
noticed that we have a whole lot of them in the code, so working out
which one has triggered an error is not as easy as it might be. ISTM we
could usefully prefix each such message with the name of the function in
which it occurs, along the lines of this fragment for nodeFuncs.c. Thoughts?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 6f6a1f4ffc..675789d104 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -270,7 +270,7 @@ exprType(const Node *expr)
 			type = exprType(((const JsonCoercion *) expr)->expr);
 			break;
 		default:
-			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
+			elog(ERROR, "exprType: unrecognized node type: %d", (int) nodeTag(expr));
 			type = InvalidOid;	/* keep compiler quiet */
 			break;
 	}
@@ -1017,7 +1017,7 @@ exprCollation(const Node *expr)
 			}
 			break;
 		default:
-			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
+			elog(ERROR, "exprCollation: unrecognized node type: %d", (int) nodeTag(expr));
 			coll = InvalidOid;	/* keep compiler quiet */
 			break;
 	}
@@ -1261,7 +1261,7 @@ exprSetCollation(Node *expr, Oid collation)
 			}
 			break;
 		default:
-			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(expr));
+			elog(ERROR, "exprSetCollation: unrecognized node type: %d", (int) nodeTag(expr));
 			break;
 	}
 }
@@ -2531,7 +2531,7 @@ expression_tree_walker(Node *node,
 			}
 			break;
 		default:
-			elog(ERROR, "unrecognized node type: %d",
+			elog(ERROR, "expression_tree_walker: unrecognized node type: %d",
  (int) nodeTag(node));
 			break;
 	}
@@ -3588,7 +3588,7 @@ expression_tree_mutator(Node *node,
 			}
 			break;
 		default:
-			elog(ERROR, "unrecognized node type: %d",
+			elog(ERROR, "expression_tree_mutator: unrecognized node type: %d",
  (int) nodeTag(node));
 			break;
 	}
@@ -4448,7 +4448,7 @@ raw_expression_tree_walker(Node *node,
 			}
 			break;
 		default:
-			elog(ERROR, "unrecognized node type: %d",
+			elog(ERROR, "raw_expression_tree_walker: unrecognized node type: %d",
  (int) nodeTag(node));
 			break;
 	}


Re: identifying unrecognized node type errors

2022-03-24 Thread David Rowley
On Fri, 25 Mar 2022 at 08:53, Andrew Dunstan  wrote:
> As I was tracking down some of these errors in the sql/json patches I
> noticed that we have a whole lot of them in the code, so working out
> which one has triggered an error is not as easy as it might be. ISTM we
> could usefully prefix each such message with the name of the function in
> which it occurs, along the lines of this fragment for nodeFuncs.c. Thoughts?

Can you not use \set VERBOSITY verbose ?

postgres=# \set VERBOSITY verbose
postgres=# select 1/0;
ERROR:  22012: division by zero
LOCATION:  int4div, int.c:846

David




Re: Small TAP tests cleanup for Windows and unused modules

2022-03-24 Thread Daniel Gustafsson
> On 18 Feb 2022, at 22:02, Daniel Gustafsson  wrote:

> .. let's drop the 0001 from this thread and just go ahead with 0002.

I applied the 0002 patch today, cleaning up the unused module imports.

--
Daniel Gustafsson   https://vmware.com/





Re: identifying unrecognized node type errors

2022-03-24 Thread Tom Lane
Andrew Dunstan  writes:
> As I was tracking down some of these errors in the sql/json patches I
> noticed that we have a whole lot of them in the code, so working out
> which one has triggered an error is not as easy as it might be. ISTM we
> could usefully prefix each such message with the name of the function in
> which it occurs, along the lines of this fragment for nodeFuncs.c. Thoughts?

-1.  You're reinventing the error location support that already exists
inside elog.  Just turn up log_error_verbosity instead.

regards, tom lane




Re: make MaxBackends available in _PG_init

2022-03-24 Thread Nathan Bossart
On Wed, Mar 23, 2022 at 09:03:18PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 23, 2022 at 08:32:39AM -0400, Robert Haas wrote:
>> Well, the conclusion upthread was that extensions might change the
>> values of those GUCs from _PG_init(). If that's a real thing, then
>> what you're asking for here is impossible, because the final value is
>> indeterminate until all such extensions have finished twiddling those
>> the GUCs. On the other hand, it's definitely intended that extensions
>> should RequestAddinShmemSpace() from _PG_init(), and wanting to size
>> that memory based on MaxBackends is totally reasonable. Do we need to
>> add another function, alongside _PG_init(), that gets called after
>> MaxBackends is determined and before it's too late to
>> RequestAddinShmemSpace()?
> 
> Yes, I don't see how we can support such extensions without an additional 
> hook,
> probably called right after InitializeMaxBackends() since at least
> InitializeShmemGUCs() will need to know about those extra memory needs.

Another possibility could be to add a hook that is called _before_
_PG_init() where libraries are permitted to adjust GUCs.  After the library
is loaded, we first call this _PG_change_GUCs() function, then we
initialize MaxBackends, and then we finally call _PG_init().  This way,
extensions would have access to MaxBackends within _PG_init(), and if an
extension really needed to alter GUCs, it could define this new function.
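
To sketch what that would look like from an extension's point of view
(the hook name and calling convention are purely hypothetical at this
point):

/* Hypothetical: called after load, before MaxBackends is computed. */
void
_PG_change_GUCs(void)
{
	SetConfigOption("max_worker_processes", "16",
					PGC_POSTMASTER, PGC_S_OVERRIDE);
}

void
_PG_init(void)
{
	/* Under the proposal, MaxBackends would already be valid here. */
	RequestAddinShmemSpace(mul_size(MaxBackends, sizeof(int)));
}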

> I'm not sure how to prevent third party code from messing with the gucs in it,
> but a clear indication in the hook comment should probably be enough.  It's 
> not
> like it's hard for third-party code author to break something anyway.

ERROR-ing in SetConfigOption() might be another way to dissuade folks from
messing with GUCs.  This is obviously not a perfect solution.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

2022-03-24 Thread Robert Haas
On Thu, Mar 24, 2022 at 3:28 PM Peter Geoghegan  wrote:
> But non-aggressive VACUUMs have always been able to do that.
>
> How about: "Set relfrozenxid to oldest extant XID seen by VACUUM"

Sure, that sounds nice.

> Believe it or not, I avoided functional changes in 0002 -- at least in
> one important sense. That's why you had difficulty spotting any. This
> must sound peculiar, since the commit message very clearly says that
> the commit avoids a problem seen only in the non-aggressive case. It's
> really quite subtle.

Well, I think the goal in revising the code is to be as un-subtle as
possible. Commits that people can't easily understand breed future
bugs.

> What you're saying here boils down to this: it doesn't matter what the
> visibility map would say right this microsecond (in the aggressive
> case) were we to call VM_ALL_FROZEN(): we know for sure that the VM
> said that this page was all-frozen *in the recent past*. That's good
> enough; we will never fail to scan a page that might have an XID <
> OldestXmin (ditto for XMIDs) this way, which is all that really
> matters.

Makes sense. So maybe the commit message should try to emphasize this
point e.g. "If a page is all-frozen at the time we check whether it
can be skipped, don't allow it to affect the relfrozenxmin and
relminmxid which we set for the relation. This was previously true for
aggressive vacuums, but not for non-aggressive vacuums, which was
inconsistent. (The reason this is a safe thing to do is that any new
XIDs or MXIDs that appear on the page after we initially observe it to
be frozen must be newer than any relfrozenxid or relminmxid the
current vacuum could possibly consider storing into pg_class.)"

> This is absolutely mandatory in the aggressive case, because otherwise
> relfrozenxid advancement might be seen as unsafe. My observation is:
> Why should we accept the same race in the non-aggressive case? Why not
> do essentially the same thing in every VACUUM?

Sure, that seems like a good idea. I think I basically agree with the
goals of the patch. My concern is just about making the changes
understandable to future readers. This area is notoriously subtle, and
people are going to introduce more bugs even if the comments and code
organization are fantastic.

> And so you could almost say that there is no behavioral change at
> all.

I vigorously object to this part, though. We should always err on the
side of saying that commits *do* have behavioral changes. We should go
out of our way to call out in the commit message any possible way that
someone might notice the difference between the post-commit situation
and the pre-commit situation. It is fine, even good, to also be clear
about how we're maintaining continuity and why we don't think it's a
problem, but the only commits that should be described as not having
any behavioral change are ones that do mechanical code movement, or
are just changing comments, or something like that.

> It seems kinda tricky to split up 0002 like that. It's possible, but
> I'm not sure if it's possible to split it in a way that highlights the
> issue that I just described. Because we already avoided the race in
> the aggressive case.

I do see that there are some difficulties there. I'm not sure what to
do about that. I think a sufficiently clear commit message could
possibly be enough, rather than trying to split the patch. But I also
think splitting the patch should be considered, if that can reasonably
be done.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: make MaxBackends available in _PG_init

2022-03-24 Thread Robert Haas
On Thu, Mar 24, 2022 at 4:20 PM Nathan Bossart  wrote:
> Another possibility could be to add a hook that is called _before_
> _PG_init() where libraries are permitted to adjust GUCs.  After the library
> is loaded, we first call this _PG_change_GUCs() function, then we
> initialize MaxBackends, and then we finally call _PG_init().  This way,
> extensions would have access to MaxBackends within _PG_init(), and if an
> extension really needed to alter GUCs, it could define this new function.

Yeah, I think this might be better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



