Re: alter table set TABLE ACCESS METHOD

2021-07-27 Thread Michael Paquier
On Thu, Jul 22, 2021 at 04:41:54AM -0500, Justin Pryzby wrote:
> It looks like one hunk was missing/uncommitted from the 0002 patch..

Okay, hearing nothing, I have looked again at 0001 and did some light
adjustments, mainly in the tests.  I did not spot any issues in the
patch, so that looks good to go for me.
--
Michael
From 19a658eaa3db26cc6fc47305740035d665648b68 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 27 Jul 2021 16:37:43 +0900
Subject: [PATCH] ALTER TABLE ... SET ACCESS METHOD

Adds support for changing the access method of a table with a
rewrite.

Author: Justin Pryzby, Jeff Davis
---
 src/include/commands/cluster.h  |  4 +-
 src/include/commands/event_trigger.h|  1 +
 src/include/nodes/parsenodes.h  |  1 +
 src/backend/commands/cluster.c  | 21 +---
 src/backend/commands/matview.c  |  5 +-
 src/backend/commands/tablecmds.c| 66 +++--
 src/backend/parser/gram.y   |  8 +++
 src/bin/psql/tab-complete.c | 11 -
 src/test/regress/expected/create_am.out | 34 +
 src/test/regress/sql/create_am.sql  | 17 +++
 doc/src/sgml/ref/alter_table.sgml   | 20 
 11 files changed, 173 insertions(+), 15 deletions(-)

diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index a941f2accd..b64d3bc204 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -35,8 +35,8 @@ extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
 	   bool recheck, LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
-extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
-		  LOCKMODE lockmode);
+extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
+		  char relpersistence, LOCKMODE lockmode);
 extern void finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 			 bool is_system_catalog,
 			 bool swap_toast_by_content,
diff --git a/src/include/commands/event_trigger.h b/src/include/commands/event_trigger.h
index c11bf2d781..e765e67fd1 100644
--- a/src/include/commands/event_trigger.h
+++ b/src/include/commands/event_trigger.h
@@ -32,6 +32,7 @@ typedef struct EventTriggerData
 #define AT_REWRITE_ALTER_PERSISTENCE	0x01
 #define AT_REWRITE_DEFAULT_VAL			0x02
 #define AT_REWRITE_COLUMN_REWRITE		0x04
+#define AT_REWRITE_ACCESS_METHOD		0x08
 
 /*
  * EventTriggerData is the node type that is passed as fmgr "context" info
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 947660a4b0..e28248af32 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1901,6 +1901,7 @@ typedef enum AlterTableType
 	AT_SetLogged,/* SET LOGGED */
 	AT_SetUnLogged,/* SET UNLOGGED */
 	AT_DropOids,/* SET WITHOUT OIDS */
+	AT_SetAccessMethod,			/* SET ACCESS METHOD */
 	AT_SetTableSpace,			/* SET TABLESPACE */
 	AT_SetRelOptions,			/* SET (...) -- AM specific parameters */
 	AT_ResetRelOptions,			/* RESET (...) -- AM specific parameters */
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 6487a9e3fc..b3d8b6deb0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -576,6 +576,7 @@ static void
 rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 {
 	Oid			tableOid = RelationGetRelid(OldHeap);
+	Oid			accessMethod = OldHeap->rd_rel->relam;
 	Oid			tableSpace = OldHeap->rd_rel->reltablespace;
 	Oid			OIDNewHeap;
 	char		relpersistence;
@@ -597,6 +598,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 
 	/* Create the transient table that will receive the re-ordered data */
 	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   accessMethod,
 			   relpersistence,
 			   AccessExclusiveLock);
 
@@ -618,16 +620,16 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 /*
  * Create the transient table that will be filled with new data during
  * CLUSTER, ALTER TABLE, and similar operations.  The transient table
- * duplicates the logical structure of the OldHeap, but is placed in
- * NewTableSpace which might be different from OldHeap's.  Also, it's built
- * with the specified persistence, which might differ from the original's.
+ * duplicates the logical structure of the OldHeap; but will have the
+ * specified physical storage properties NewTableSpace, NewAccessMethod, and
+ * relpersistence.
  *
  * After this, the caller should load the new heap with transferred/modified
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
-			  LOCKMODE lockmode)
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
+			  char relpersistence, LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
 	char		NewHeapName[NAMEDATALEN];
@@ -686,7 +688,7 @@ make_new_heap(Oid OIDOl

Re: .ready and .done files considered harmful

2021-07-27 Thread Dipesh Pandit
> Some minor suggestions:
Thanks for your comments. I have incorporated the changes
and updated a new patch. Please find the attached patch v4.

Thanks,
Dipesh

On Mon, Jul 26, 2021 at 9:44 PM Bossart, Nathan  wrote:

> On 7/26/21, 6:31 AM, "Robert Haas"  wrote:
> > In terms of immediate next steps, I think we should focus on
> > eliminating the O(n^2) problem and not get sucked into a bigger
> > redesign. The patch on the table aims to do just that much and I think
> > that's a good thing.
>
> I agree.  I'll leave further discussion about a redesign for another
> thread.
>
> Nathan
>
>
From c597411b08377ea64634d5198071060ec2b9c524 Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Wed, 30 Jun 2021 14:05:58 +0530
Subject: [PATCH] mitigate directory scan for WAL archiver

WAL archiver scans the status directory to identify the next WAL file
that needs to be archived. This directory scan can be minimised by
maintaining the log segment number of the current file being
archived and incrementing it by '1' to get the next WAL file.
Archiver can check the availability of the next file and, if the
file is not available, fall back to a directory scan to
get the oldest WAL file.

If there is a timeline switch then backend sends a notification to
archiver. Archiver registers the timeline switch and performs a full
directory scan to make sure that archiving history files takes
precedence over archiving WAL files.
---
 src/backend/access/transam/xlog.c |   8 +++
 src/backend/postmaster/pgarch.c   | 138 ++
 src/include/postmaster/pgarch.h   |   1 +
 3 files changed, 135 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3479402..2580ce8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -50,6 +50,7 @@
 #include "port/atomics.h"
 #include "port/pg_iovec.h"
 #include "postmaster/bgwriter.h"
+#include "postmaster/pgarch.h"
 #include "postmaster/startup.h"
 #include "postmaster/walwriter.h"
 #include "replication/basebackup.h"
@@ -8130,6 +8131,13 @@ StartupXLOG(void)
 	WalSndWakeup();
 
 	/*
+	 * If archiver is active, send notification that timeline has switched.
+	 */
+	if (XLogArchivingActive() && ArchiveRecoveryRequested &&
+		IsUnderPostmaster)
+		PgArchNotifyTLISwitch();
+
+	/*
 	 * If this was a promotion, request an (online) checkpoint now. This isn't
 	 * required for consistency, but the last restartpoint might be far back,
 	 * and in case of a crash, recovering from it might take a longer than is
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 74a7d7c..161a5b6 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -90,16 +90,18 @@ static PgArchData *PgArch = NULL;
  * Flags set by interrupt handlers for later service in the main loop.
  */
 static volatile sig_atomic_t ready_to_stop = false;
+static volatile sig_atomic_t timeline_switch = false;
 
 /* --
  * Local function forward declarations
  * --
  */
 static void pgarch_waken_stop(SIGNAL_ARGS);
+static void pgarch_timeline_switch(SIGNAL_ARGS);
 static void pgarch_MainLoop(void);
 static void pgarch_ArchiverCopyLoop(void);
 static bool pgarch_archiveXlog(char *xlog);
-static bool pgarch_readyXlog(char *xlog);
+static bool pgarch_readyXlog(char *xlog, bool *dirScan, XLogSegNo *nextLogSegNo);
 static void pgarch_archiveDone(char *xlog);
 static void pgarch_die(int code, Datum arg);
 static void HandlePgArchInterrupts(void);
@@ -169,10 +171,11 @@ PgArchiverMain(void)
 {
 	/*
 	 * Ignore all signals usually bound to some action in the postmaster,
-	 * except for SIGHUP, SIGTERM, SIGUSR1, SIGUSR2, and SIGQUIT.
+	 * except for SIGHUP, SIGINT, SIGTERM, SIGUSR1, SIGUSR2, and SIGQUIT.
 	 */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SIG_IGN);
+	/* Archiver is notified by backend if there is a timeline switch */
+	pqsignal(SIGINT, pgarch_timeline_switch);
 	pqsignal(SIGTERM, SignalHandlerForShutdownRequest);
 	/* SIGQUIT handler was already set up by InitPostmasterChild */
 	pqsignal(SIGALRM, SIG_IGN);
@@ -221,6 +224,23 @@ PgArchWakeup(void)
 		SetLatch(&ProcGlobal->allProcs[arch_pgprocno].procLatch);
 }
 
+/*
+ * Called by backend process to notify a timeline switch.
+ */
+void
+PgArchNotifyTLISwitch(void)
+{
+	int			arch_pgprocno = PgArch->pgprocno;
+
+	if (arch_pgprocno != INVALID_PGPROCNO)
+	{
+		int		archiver_pid = ProcGlobal->allProcs[arch_pgprocno].pid;
+
+		if (kill(archiver_pid, SIGINT) < 0)
+			elog(ERROR, "could not notify timeline change to archiver: %m");
+	}
+}
+
 
 /* SIGUSR2 signal handler for archiver process */
 static void
@@ -236,6 +256,19 @@ pgarch_waken_stop(SIGNAL_ARGS)
 }
 
 /*
+ * Interrupt handler for handling a timeline switch
+ *
+ * A timeline switch has been notified, mark this event so that the next
+ * iteration of pgarch_ArchiverCopyLoop() archives the history file.

Re: row filtering for logical replication

2021-07-27 Thread Peter Smith
FYI - v19 --> v20

(Only very minimal changes. Nothing functional)

Changes:

* The v19 patch was broken due to changes of commit [1] so I have
rebased so the cfbot is happy.

* I also renamed the TAP test 021_row_filter.pl ==> 023_row_filter.pl
because commit [2] already added another TAP test numbered 021.

--
[1] 
https://github.com/postgres/postgres/commit/2b00db4fb0c7f02f000276bfadaab65a14059168
[2] 
https://github.com/postgres/postgres/commit/a8fd13cab0ba815e9925dc9676e6309f699b5f72

Kind Regards,
Peter Smith.
Fujitsu Australia


v20-0001-Row-filter-for-logical-replication.patch
Description: Binary data


Re: Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread Fujii Masao




On 2021/07/27 2:38, 蔡梦娟(玊于) wrote:

Hi, all

Recently, I got a PANIC while restarts standby, which can be reproduced by the 
following steps, based on pg 11:
1. begin a transaction in primary node;
2. create a table in the transaction;
3. insert lots of data into the table;
4. do a checkpoint, and restart standby after checkpoint is done in primary 
node;
5. insert/update lots of data into the table again;
6. abort the transaction.


I could reproduce the issue by using the similar steps and
disabling full_page_writes, in the master branch.




after step 6, fast shutdown standby node, and then restart standby, you will 
get a PANIC log, and the backtrace is:
#0  0x7fc663e5a277 in raise () from /lib64/libc.so.6
#1  0x7fc663e5b968 in abort () from /lib64/libc.so.6
#2  0x00c89f01 in errfinish (dummy=0) at elog.c:707
#3  0x00c8cba3 in elog_finish (elevel=22, fmt=0xdccc18 "WAL contains 
references to invalid pages") at elog.c:1658
#4  0x005e476a in XLogCheckInvalidPages () at xlogutils.c:253
#5  0x005cbc1a in CheckRecoveryConsistency () at xlog.c:9477
#6  0x005ca5c5 in StartupXLOG () at xlog.c:8609
#7  0x00a025a5 in StartupProcessMain () at startup.c:274
#8  0x00643a5c in AuxiliaryProcessMain (argc=2, argv=0x7ffe4e4849a0) at 
bootstrap.c:485
#9  0x00a00620 in StartChildProcess (type=StartupProcess) at 
postmaster.c:6215
#10 0x009f92c6 in PostmasterMain (argc=3, argv=0x4126500) at 
postmaster.c:1506
#11 0x008eab64 in main (argc=3, argv=0x4126500) at main.c:232

I think the reason for the above error is as follows:
1. the transaction in primary node was aborted finally, the standby node also 
deleted the table files after replaying the xlog record, however, without 
updating minimum recovery point;
2. primary node did a checkpoint before abort, and then standby node is 
restarted, so standby node will recovery from a point where the table has 
already been created and data has been inserted into the table;
3. when standby node restarts after step 6, it will find the page needed during 
recovery doesn't exist, which has already been deleted by xact_redo_abort 
before, so standby node will treat this page as an invalid page;
4. xact_redo_abort drops relation files without updating minimum recovery point, 
before standby node replay the abort xlog record and forget invalid pages 
again, it will reach consistency because the abort xlogrecord lsn is greater 
than minrecoverypoint;
5. during checkRecoveryConsistency, it will check invalid pages, and find that 
there is invalid page, and the PANIC log will be generated.

So why don't update minimum recovery point in xact_redo_abort, just like 
XLogFlush in xact_redo_commit, in which way standby could reach consistency and 
check invalid pages after replayed the abort xlogrecord.


ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-07-27 Thread Amit Kapila
On Tue, Jul 27, 2021 at 11:28 AM Dilip Kumar  wrote:
>
> On Tue, Jul 27, 2021 at 10:44 AM Amit Kapila  wrote:
> >
> > On Mon, Jul 26, 2021 at 8:33 PM Robert Haas  wrote:
> > Consider below ways to allow the user to specify the parallel-safety option:
> >
> > (a)
> > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } 
> > ...
> > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> >
> > OR
> >
> > (b)
> > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
> >
> > The point was what should we do if the user specifies the option for a
> > non-partitioned table. Do we just ignore it or give an error that this
> > is not a valid syntax/option when used with non-partitioned tables? I
> > find it slightly odd that this option works for partitioned tables but
> > gives an error for non-partitioned tables but maybe we can document
> > it.
>
> IMHO, for a non-partitioned table, we should by default allow the
> parallel safety checking so that users don't have to set it for
> individual tables, OTOH, I don't think that there is any point in
> blocking the syntax for the non-partitioned table, So I think for the
> non-partitioned table if the user hasn't set it we should do automatic
> safety checking and if the user has defined the safety externally then
> we should respect that.  And for the partitioned table, we will never
> do the automatic safety checking and we should always respect what the
> user has set.
>

This is exactly what I am saying. BTW, do you have any preference for
the syntax among (a) or (b)?

-- 
With Regards,
Amit Kapila.




Re: Automatic notification of top transaction IDs

2021-07-27 Thread Neil Chen
Greetings,

I simply tested it and it works well. But I got a compilation warning,
should we move the definition of function FullTransactionIdToStr to the
"transam.h"?

-- 
There is no royal road to learning.
HighGo Software Co.


Re: ATTACH PARTITION locking documentation for DEFAULT partitions

2021-07-27 Thread Matthias van de Meent
On Tue, 27 Jul 2021 at 08:02, David Rowley  wrote:
> On Tue, 13 Jul 2021 at 02:30, Matthias van de Meent
>  wrote:
> > The algoritm as described in your patch implies that this recursive
> > locking is conditional on _only_ the check-constraints of the topmost
> > partition ("performed whilst holding ... and all of its
> > sub-partitions, if any"), whereas actually the locking on each
> > (sub-)partition is determined by the constraints of the hierarchy down
> > to that child partition. In actuality, this should not matter much,
> > but this is a meaningful distinction that I wanted to call out.
>
> I had in mind that was implied, but maybe it's better to be explicit about 
> that.
>
> I've adjusted the patch and attached what I came up with. Let me know
> what you think.

I like this improved wording. Thanks!

Kind regards,

Matthias van de Meent




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-07-27 Thread Gilles Darold
Le 26/07/2021 à 21:56, Tom Lane a écrit :
> Gilles Darold  writes:
>> [ v4-0001-regexp-foo-functions.patch ]
> I started to work through this and was distressed to realize that
> it's trying to redefine regexp_replace() in an incompatible way.
> We already have
>
> regression=# \df regexp_replace
>List of functions
>Schema   |  Name  | Result data type |  Argument data types   | 
> Type 
> ++--++--
>  pg_catalog | regexp_replace | text | text, text, text   | 
> func
>  pg_catalog | regexp_replace | text | text, text, text, text | 
> func
> (2 rows)
>
> The patch proposes to add (among other alternatives)
>
> +{ oid => '9608', descr => 'replace text using regexp',
> +  proname => 'regexp_replace', prorettype => 'text',
> +  proargtypes => 'text text text int4', prosrc => 
> 'textregexreplace_extended_no_occurrence' },
>
> which is going to be impossibly confusing for both humans and machines.
> I don't think we should go there.  Even if you managed to construct
> examples that didn't result in "ambiguous function" failures, that
> doesn't mean that ordinary mortals won't get bit that way.
>
> I'm inclined to just drop the regexp_replace additions.  I don't think
> that the extra parameters Oracle provides here are especially useful.
> They're definitely not useful enough to justify creating compatibility
> hazards for.


I would not say that being able to replace the Nth occurrence of a
pattern matching is not useful but i agree that this is not a common
case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
and I have thought that we cannot have compatibility issues because of
the different data type at the 4th parameter. Anyway, maybe we can just
rename the function even if I would prefer that regexp_replace() be
extended. For example:


    regexp_replace(source, pattern, replacement [, flags ]);

    regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);


of course with only 3 parameters the two functions are the same.


What do you think about the renaming proposal instead of simply drop the
extended form of the function?


Best regards,


[1] https://docs.oracle.com/database/121/SQLRF/functions163.htm#SQLRF06302

[2] https://www.ibm.com/docs/en/db2oc?topic=functions-regexp-replace


-- 
Gilles Darold
http://www.darold.net/






Re: pg_settings.pending_restart not set when line removed

2021-07-27 Thread Daniel Gustafsson
> On 27 Jul 2021, at 01:02, Alvaro Herrera  wrote:

> I tried the attached patch, which sets GUC_PENDING_RESTART if we're
> doing pg_file_settings().  Then any subsequent read of pg_settings will
> have the pending_restart flag set.  This seems to work correctly, and
> consistently with the case where you change a line (without removing it)
> in unpatched master.

LGTM after testing this with various changes and ways to reload, and +1 for
being consistent with changing a line.

> You could argue that this is *weird* (why does reading pg_file_settings
> set a flag in global state?) ... but that weirdness is not something
> this patch is introducing.

Agreed.

Another unrelated weird issue is that we claim that the config file "contains
errors" if the context is < PGC_SIGHUP for restart required settings.  It seems
a bit misleading to call pending_restart an error since it implies (in my
reading) there were syntax errors.  But, unrelated to this patch and report
(and it's been like that for a long time), just hadn't noticed that before.

--
Daniel Gustafsson   https://vmware.com/





Re: Some code cleanup for pgbench and pg_verifybackup

2021-07-27 Thread Fabien COELHO


Hello,

I do not understand your disagreement. Do you disagree about the 
expected semantics of fatal? Then why provide fatal if it should not 
be used? What is the expected usage of fatal?


I disagree about the fact that pgbench uses pg_log_fatal() in ways
that other binaries don't do.


Sure. Then what should be the expected usage of fatal? Doc says:

  * Severe errors that cause program termination.  (One-shot programs may
  * chose to label even fatal errors as merely "errors".  The distinction
  * is up to the program.)

pgbench is consistent with the doc. I prefer fatal for this purpose to 
distinguish these clearly from recoverable errors, i.e. the programs goes 
on despite the error, or at least for some time. I think it is good to 
have such a distinction, and pgbench has many errors and many fatals, 
although maybe some error should be fatal and some fatal should be error…


For example, other things use pg_log_error() followed by an exit(), but 
not this code.


Sure.


I am not going to fight hard on that, though.


Me neither.

That's a set of inconsistences I bumped into while plugging in 
option_parse_int()


Which is a very good thing! I have already been bitten by atoi.

--
Fabien.

Small typo in variable.c

2021-07-27 Thread Daniel Westermann (DWE)
Hi,

there is a typo in variable.c.
Attached a small fix for this.

Regards
Danieldiff --git a/src/bin/psql/variables.c b/src/bin/psql/variables.c
index 92a34f870f..538b83ddd9 100644
--- a/src/bin/psql/variables.c
+++ b/src/bin/psql/variables.c
@@ -361,7 +361,7 @@ SetVariableHooks(VariableSpace space, const char *name,
 }
 
 /*
- * Return true iff the named variable has substitute and/or assign hook
+ * Return true if the named variable has substitute and/or assign hook
  * functions.
  */
 bool


Re: Showing applied extended statistics in explain

2021-07-27 Thread Dmitry Dolgov
> On Sat, Mar 27, 2021 at 01:50:54AM +0100, Tomas Vondra wrote:
> Hi,
>
> With extended statistics it may not be immediately obvious if they were
> applied and to which clauses. If you have multiple extended statistics,
> we may also apply them in different order, etc. And with expressions,
> there's also the question of matching expressions to the statistics.
>
> So it seems useful to include this into in the explain plan - show which
> statistics were applied, in which order. Attached is an early PoC patch
> doing that in VERBOSE mode. I'll add it to the next CF.

Hi,

sounds like a useful improvement indeed, thanks for the patch. Do you
plan to invest more time in it?

> 1) The information is stashed in multiple lists added to a Plan. Maybe
> there's a better place, and maybe we need to invent a better way to
> track the info (a new node stashed in a single List).
>
> ...
>
> 3) It does not work for functional dependencies, because we effectively
> "merge" all functional dependencies and apply the entries. Not sure how
> to display this, but I think it should show the individual dependencies
> actually applied.
>
> ...
>
> 5) It includes just statistics name + clauses, but maybe we should
> include additional info (e.g estimate for that combination of clauses).

Yes, a new node would be nice to have. The other questions above are
somewhat related to what it should contain, and I guess it depends on
the use case this patch targets. E.g. for the case "help to figure out
if an extended statistics was applied" even "merged" functional
dependencies would be fine I believe. More advanced plan troubleshooting
may benefit from estimates. What exactly use case do you have in mind?

> 4) The info is collected always, but I guess we should do that only when
> in explain mode. Not sure how expensive it is.

Maybe it's in fact not that expensive to always collect the info? Adding
it as it is now do not increase number of cache lines Plan structure
occupies (although it comes close to the boundary), and a couple of
simple tests with CPU bounded load of various types doesn't show
significant slowdown. I haven't tried the worst case scenario with a lot
of extended stats involved, but in such situations I can imagine the
overhead also could be rather small in comparison with other expenses.

I've got few more questions after reading the patch:

* Is there any particular reason behind choosing only some scan nodes to
  display extended stats? E.g. BitmapHeapScan is missing.

* StatisticExtInfo should have a _copy etc. node functionality now,
  right? I've hit "unrecognized node type" with a prepared statement
  because it's missing.




Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-07-27 Thread Greg Nancarrow
On Tue, Jul 27, 2021 at 3:58 PM Dilip Kumar  wrote:
>
> IMHO, for a non-partitioned table, we should by default allow the
> parallel safety checking so that users don't have to set it for
> individual tables, OTOH, I don't think that there is any point in
> blocking the syntax for the non-partitioned table, So I think for the
> non-partitioned table if the user hasn't set it we should do automatic
> safety checking and if the user has defined the safety externally then
> we should respect that.  And for the partitioned table, we will never
> do the automatic safety checking and we should always respect what the
> user has set.
>

Provided it is possible to distinguish between the default
parallel-safety (unsafe) and that default being explicitly specified
by the user, it should be OK.
In the case of performing the automatic parallel-safety checking and
the table using something that is parallel-unsafe, there will be a
performance degradation compared to the current code (hopefully only
small). That can be avoided by the user explicitly specifying that
it's parallel-unsafe.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Small typo in variable.c

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 10:04:36AM +, Daniel Westermann (DWE) wrote:
> there is a typo in variable.c.
> Attached a small fix for this.

"iff" stands for "if and only if".
--
Michael


signature.asc
Description: PGP signature


Re: Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 05:26:05PM +0900, Fujii Masao wrote:
> ISTM that you're right. xact_redo_abort() should call XLogFlush() to
> update the minimum recovery point on truncation. This seems
> the oversight in commit 7bffc9b7bf.

Indeed.  It would be nice to see some refactoring of this code as
well?  Both share a lot of steps, so adding something to one path can
easily lead to the other path being forgotten.
--
Michael


signature.asc
Description: PGP signature


Re: Small typo in variable.c

2021-07-27 Thread Daniel Westermann (DWE)
>On Tue, Jul 27, 2021 at 10:04:36AM +, Daniel Westermann (DWE) wrote:
>> there is a typo in variable.c.
>> Attached a small fix for this.

>"iff" stands for "if and only if".

Ah, good to know. Thx

Regards
Daniel


Re: Some code cleanup for pgbench and pg_verifybackup

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 11:45:07AM +0200, Fabien COELHO wrote:
> Sure. Then what should be the expected usage of fatal? Doc says:
> 
>   * Severe errors that cause program termination.  (One-shot programs may
>   * chose to label even fatal errors as merely "errors".  The distinction
>   * is up to the program.)
>
> pgbench is consistent with the doc. I prefer fatal for this purpose to
> distinguish these clearly from recoverable errors, i.e. the programs goes on
> despite the error, or at least for some time. I think it is good to have
> such a distinction, and pgbench has many errors and many fatals, although
> maybe some error should be fatal and some fatal should be error.

Hm?  Incorrect option values are recoverable errors, no?  The root
cause of the problem is the user.  One can note that pg_log_fatal() vs
pg_log_error() results in a score of 54 vs 50 in src/bin/pgbench/, so
I am not quite sure your last statement is true.

>> That's a set of inconsistences I bumped into while plugging in
>> option_parse_int()
> 
> Which is a very good thing! I have already been bitten by atoi.

By the way, if you can write a patch that makes use of strtodouble()
for the float option parsing in pgbench with the way you see things,
I'd welcome that.  This is a local change as only pgbench needs to
care about that.
--
Michael


signature.asc
Description: PGP signature


Re: Some code cleanup for pgbench and pg_verifybackup

2021-07-27 Thread Daniel Gustafsson
> On 27 Jul 2021, at 01:53, Michael Paquier  wrote:

> ..and I also recall that this point has been discussed, where the conclusion
> was that the logging should never exit() directly.


That's a characteristic of the API which still holds IMO.  If we want
something, it's better to have an explicit exit function which takes a log
string than a log function that exits.

--
Daniel Gustafsson   https://vmware.com/





Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-07-27 Thread Amit Kapila
On Tue, Jul 27, 2021 at 4:00 PM Greg Nancarrow  wrote:
>
> On Tue, Jul 27, 2021 at 3:58 PM Dilip Kumar  wrote:
> >
> > IMHO, for a non-partitioned table, we should by default allow the
> > parallel safety checking so that users don't have to set it for
> > individual tables, OTOH, I don't think that there is any point in
> > blocking the syntax for the non-partitioned table, So I think for the
> > non-partitioned table if the user hasn't set it we should do automatic
> > safety checking and if the user has defined the safety externally then
> > we should respect that.  And for the partitioned table, we will never
> > do the automatic safety checking and we should always respect what the
> > user has set.
> >
>
> Provided it is possible to distinguish between the default
> parallel-safety (unsafe) and that default being explicitly specified
> by the user, it should be OK.
>

Offhand, I don't see any problem with this. Do you have something
specific in mind?

> In the case of performing the automatic parallel-safety checking and
> the table using something that is parallel-unsafe, there will be a
> performance degradation compared to the current code (hopefully only
> small). That can be avoided by the user explicitly specifying that
> it's parallel-unsafe.
>

True, but I guess this should be largely addressed by caching the
value of parallel safety at the relation level. Sure, there will be
some cost the first time we compute it but on consecutive accesses, it
should be quite cheap.

-- 
With Regards,
Amit Kapila.




Re: needless complexity in StartupXLOG

2021-07-27 Thread Robert Haas
On Mon, Jul 26, 2021 at 4:15 PM Stephen Frost  wrote:
> All I was really trying to point out above was that the comment might be
> improved upon, just so someone understands that we aren't doing a
> checkpoint at this particular place, but one will be done later due to
> the promotion.  Maybe I'm being a bit extra with that, but that was my
> thought when reading the code and the use of the promoted flag variable.

Yeah, I agree, it confused me too, at first.

> Yeah ... not to mention that it really is just incredibly dangerous to
> use such an approach for PITR.  For my 2c, I'd rather we figure out a
> way to prevent this than to imply that we support it when we have no way
> of knowing if we actually have replayed far enough to be consistent.
> That isn't to say that using snapshots for database backups isn't
> possible, but it should be done in-between pg_start/stop_backup calls
> which properly grab the returned info from those and store the backup
> label with the snapshot, etc.

My position on that is that I would not particularly recommend the
technique described here, but I would not choose to try to block it
either. That's an argument for another thread, though.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread David Rowley
On Thu, 15 Jul 2021 at 04:01, vignesh C  wrote:
> The patch does not apply on Head anymore, could you rebase and post a
> patch. I'm changing the status to "Waiting for Author".

I've rebased this patch and broken it down into 6 individual patches.

0001: Removes an include directory for dblink. This appears like it's
not needed. It was added in ee3b4188a (Jan 2010), but an earlier
commit, 320c7eb8c (June 2008) seems to have made it pointless. It's
still a mystery to me why ee3b4188a would have been required in the
first place.

0002: Parses -D in the CPPFLAGS of Makefiles and uses those in the
MSVC script.  It also adjusts the ltree contrib module so that we do
the same LOWER_NODE behaviour as we did before.  The MSVC scripts
appear to have mistakenly forgotten to define LOWER_NODE as it is in
the Makefiles.

0003: Is a tidy up patch to make the 'includes' field an array rather
than a string

0004: Adds code to check for duplicate references and libraries before
adding new ones of the same name to the project.

0005: Is mostly a tidy up so that we use AddFile consistently instead
of sometimes doing $self->{files}->{} = 1;

0006: I'm not so sure about. It attempts to do a bit more Makefile
parsing to get rid of contrib_extrasource and the majority of
contrib_uselibpgport and contrib_uselibpgcommon usages.

David


v9-0001-Remove-unneeded-include-directory-in-MSVC-scripts.patch
Description: Binary data


v9-0002-Adjust-MSVC-build-scripts-to-parse-Makefiles-for-.patch
Description: Binary data


v9-0003-Make-the-includes-field-an-array-in-MSVC-build-sc.patch
Description: Binary data


v9-0004-Don-t-duplicate-references-and-libraries-in-MSVC-.patch
Description: Binary data


v9-0005-Use-AddFile-consistently-in-MSVC-scripts.patch
Description: Binary data


v9-0006-Remove-some-special-cases-from-MSVC-build-scripts.patch
Description: Binary data


Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

2021-07-27 Thread Bruce Momjian
On Tue, Jul 27, 2021 at 09:25:22AM +0200, Drouvot, Bertrand wrote:
> Hi,
> 
> On 7/27/21 4:39 AM, Bruce Momjian wrote:
> > CAUTION: This email originated from outside of the organization. Do not 
> > click links or open attachments unless you can confirm the sender and know 
> > the content is safe.
> > 
> > 
> > 
> > This patch has been applied back to 9.6 and will appear in the next
> > minor release.
> 
> Thank you!

Thank you for the patch --- this was a tricky problem, and frankly, I am
disappointed that we (and I) took so long to address this.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: needless complexity in StartupXLOG

2021-07-27 Thread Kyotaro Horiguchi
At Mon, 26 Jul 2021 16:15:23 -0400, Stephen Frost  wrote in 
> Greetings,
> 
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Mon, Jul 26, 2021 at 1:32 PM Stephen Frost  wrote:
> > > Yeah, tend to agree with this too ... but something I find a bit curious
> > > is the comment:
> > >
> > > * Insert a special WAL record to mark the end of
> > > * recovery, since we aren't doing a checkpoint.
> > >
> > > ... immediately after setting promoted = true, and then at the end of
> > > StartupXLOG() having:
> > >
> > > if (promoted)
> > > RequestCheckpoint(CHECKPOINT_FORCE);
> > >
> > > maybe I'm missing something, but seems like that comment isn't being
> > > terribly clear.  Perhaps we aren't doing a full checkpoint *there*, but
> > > sure looks like we're going to do one moments later regardless of
> > > anything else since we've set promoted to true..?
> > 
> > Yep. So it's a question of whether we allow operations that might
> > write WAL in the meantime. When we write the checkpoint record right
> > here, there can't be any WAL from the new server lifetime until the
> > checkpoint completes. When we write an end-of-recovery record, there
> > can. And there could actually be quite a bit, because if we do the
> > checkpoint right in this section of code, it will be a fast
> > checkpoint, whereas in the code you quoted above, it's a spread
> > checkpoint, which takes a lot longer. So the question is whether it's
> > reasonable to give the checkpoint some time to complete or whether it
> > needs to be completed right now.
> 
> All I was really trying to point out above was that the comment might be
> improved upon, just so someone understands that we aren't doing a
> checkpoint at this particular place, but one will be done later due to
> the promotion.  Maybe I'm being a bit extra with that, but that was my
> thought when reading the code and the use of the promoted flag variable.

(I feel we don't need to check for last-checkpoint, too.)

FWIW, by the way, I complained that the variable name "promoted" is a
bit confusing.  Its old name was fast_promoted, which means that fast
promotion is being *requested* and ongoing.  On the other hand the
current name "promoted" still means "(fast=non-fallback) promotion is
ongoing" so there was a conversation as follows.

https://www.postgresql.org/message-id/9fdd994d-a531-a52b-7906-e1cc22701310%40oss.nttdata.com

>> How about changing it to fallback_promotion, or some names with more
>> behavior-specific name like immediate_checkpoint_needed?
> 
> I like doEndOfRecoveryCkpt or something, but I have no strong opinion
> about that flag naming. So I'm ok with immediate_checkpoint_needed
> if others also like that, too.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Slim down integer formatting

2021-07-27 Thread Alvaro Herrera
On 2021-Jul-26, David Fetter wrote:

> Folks,
> 
> Please find attached a patch to do $subject. It's down to a one table
> lookup and 3 instructions.

So how much faster is it than the original?


-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread Dagfinn Ilmari Mannsåker
David Rowley  writes:

> On Thu, 15 Jul 2021 at 04:01, vignesh C  wrote:
>> The patch does not apply on Head anymore, could you rebase and post a
>> patch. I'm changing the status to "Waiting for Author".
>
> I've rebased this patch and broken it down into 6 individual patches.

I don't know anything about the MSVC build process, but I figured I
could do a general Perl code review.

> --- a/src/tools/msvc/Mkvcbuild.pm
> +++ b/src/tools/msvc/Mkvcbuild.pm
[…]  
> + # Process custom compiler flags
> + if ($mf =~ /^PG_CPPFLAGS\s*=\s*(.*)$/mg || $mf =~ 
> /^override\s*CPPFLAGS\s*(?:[\+\:])?=\s*(.*)$/mg)

  ^^^
This is a very convoluted way of writing [+:]?

> --- a/src/tools/msvc/Project.pm
> +++ b/src/tools/msvc/Project.pm
> @@ -58,7 +58,7 @@ sub AddFiles
>  
>   while (my $f = shift)
>   {
> - $self->{files}->{ $dir . "/" . $f } = 1;
> + AddFile($self, $dir . "/" . $f, 1);

AddFile is a method, so should be called as $self->AddFile(…).

> --- a/src/tools/msvc/Mkvcbuild.pm
> +++ b/src/tools/msvc/Mkvcbuild.pm
> @@ -36,16 +36,12 @@ my @unlink_on_exit;
[…]
> + elsif ($flag =~ /^-I(.*)$/)
> + {
> + foreach my $proj (@projects)
> + {
> + if ($1 eq '$(libpq_srcdir)')
> + {
> + 
> $proj->AddIncludeDir('src\interfaces\libpq');
> + $proj->AddReference($libpq);
> + }
> + }
> + }

It would be better to do the if check outside the for loop.

> --- a/src/tools/msvc/Project.pm
> +++ b/src/tools/msvc/Project.pm
> @@ -51,6 +51,16 @@ sub AddFile
>   return;
>  }
>  
> +sub AddFileAndAdditionalFiles
> +{
> + my ($self, $filename) = @_;
> + if (FindAndAddAdditionalFiles($self, $filename) != 1)

Again, FindAndAddAdditionalFiles is a method and should be called as
$self->FindAndAddAdditionalFiles($filename).

> + {
> + $self->{files}->{$filename} = 1;
> + }
> + return;
> +}


- ilmari




Re: pg_settings.pending_restart not set when line removed

2021-07-27 Thread Tom Lane
Alvaro Herrera  writes:
> You could argue that this is *weird* (why does reading pg_file_settings
> set a flag in global state?) ... but that weirdness is not something
> this patch is introducing.

Ugh.  I think this patch is likely to create more problems than it fixes.
We should be looking to get rid of that flag, not make its behavior even
more complex.

regards, tom lane




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-27 Thread Robert Haas
On Mon, Jul 26, 2021 at 4:58 PM Tom Lane  wrote:
> After further thought, I can't poke a hole in that concept.
> We'd keep the rule that the trigger executes as the calling user.
> Therefore, the trigger cannot perform any action that the calling
> user couldn't do if she chose.  Conversely, since the trigger
> owner could become a member of that role and then do whatever the
> trigger intends to do, this scheme does not give the trigger owner
> any new abilities either.  All we've done is provide what some
> programming languages call an observer or annotation.
>
> I also like the fact that with this rule, superusers' ability to
> create event triggers that fire for everything is not a special case.

I think this has potential. In a managed services environment, you can
imagine the provider as the super-duper user, having the ability to do
anything - because they control the box, so there's really no stopping
it - but presumably very little interest in what happens within the
database. Then you have the tenant, who is a semi-super-user,
authorized by the provider to do anything internal to the database
that the provider doesn't think will cause them problems. With the
setup you're proposing here, I suppose what the provider needs to do
is have a role like 'tenant' and make all the other tenant role
members of that master role. Then the tenant can log in as 'tenant' as
set up event triggers that will apply to all of those users, but
there's no security compromise for the provider because the role (or
roles) that they use to log in are not members of 'tenant'.

I thought for a while there might be a problem with tenant users
creating event triggers and then altering the owner to 'tenant' but I
think now that was backwards thinking. 'tenant' is a member of all of
the tenant users but not the other way around, so they can't give
their event triggers away to 'tenant'.

Do I have that right?

I agree with you that it's really nice that this eliminates the
special case for superusers.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-27 Thread Isaac Morland
On Tue, 27 Jul 2021 at 10:19, Robert Haas  wrote:


> I think this has potential. In a managed services environment, you can
> imagine the provider as the super-duper user, having the ability to do
> anything - because they control the box, so there's really no stopping
> it - but presumably very little interest in what happens within the
> database. Then you have the tenant, who is a semi-super-user,
> authorized by the provider to do anything internal to the database
> that the provider doesn't think will cause them problems. With the
> setup you're proposing here, I suppose what the provider needs to do
> is have a role like 'tenant' and make all the other tenant role
> members of that master role. Then the tenant can log in as 'tenant' as
> set up event triggers that will apply to all of those users, but
> there's no security compromise for the provider because the role (or
> roles) that they use to log in are not members of 'tenant'.
>

Isn’t this backwards? If all those roles are members of "tenant" then they
can do anything "tenant" can do. The reverse might work - make "tenant" a
member of all the related roles - although I haven’t thought through in
detail.

The comparison is to making all roles members of "postgres" (disaster) vs.
making "postgres" a member of all roles (redundant, because of how
permissions work for superuser, but harmless).


Re: pg_settings.pending_restart not set when line removed

2021-07-27 Thread Alvaro Herrera
On 2021-Jul-27, Tom Lane wrote:

> Alvaro Herrera  writes:
> > You could argue that this is *weird* (why does reading pg_file_settings
> > set a flag in global state?) ... but that weirdness is not something
> > this patch is introducing.
> 
> Ugh.  I think this patch is likely to create more problems than it fixes.

I doubt that; as I said, the code already behaves in exactly that way
for closely related operations, so this patch isn't doing anything new.
Note that that loop this code is modifying only applies to lines that
are removed from the config file.

> We should be looking to get rid of that flag, not make its behavior even
> more complex.

Are you proposing to remove the pending_restart column from pg_settings?
That seems a step backwards.

What I know is that the people behind management interfaces need some
way to know if changes to the config need a system restart.  Now maybe 
we want that feature to be implemented in a different way than it
currently is.  I frankly don't care enough to do that myself.  I agree
that the current mechanism is weird, but it's going to take more than a
one-liner to fix it.  The one-liner is only intended to fix a very
specific problem.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread Tom Lane
David Rowley  writes:
> 0001: Removes an include directory for dblink. This appears like it's
> not needed. It was added in ee3b4188a (Jan 2010), but an earlier
> commit, 320c7eb8c (June 2008) seems to have made it pointless. It's
> still a mystery to me why ee3b4188a would have been required in the
> first place.

FWIW, I poked around in the mailing list archives around that date
and couldn't find any supporting discussion.  It does seem like it
shouldn't be needed, given that dblink's Makefile does no such thing.

I'd suggest just pushing your 0001 and seeing if the buildfarm
complains.

> 0002: Parses -D in the CPPFLAGS of Makefiles and uses those in the
> MSVC script.  It also adjusts the ltree contrib module so that we do
> the same LOWER_NODE behaviour as we did before.  The MSVC scripts
> appear to have mistakenly forgotten to define LOWER_NODE as it is in
> the Makefiles.

The LOWER_NODE situation seems like a mess, but I think the right fix
is to remove -DLOWER_NODE from the Makefile altogether and move the
responsibility into the C code.  You could have ltree.h do

#if !defined(_MSC_VER)
#define LOWER_NODE 1
#endif

and put the explanatory comment on that, not on the uses of the flag.

Haven't looked at the rest.

regards, tom lane




Re: pg_settings.pending_restart not set when line removed

2021-07-27 Thread Tom Lane
Alvaro Herrera  writes:
> On 2021-Jul-27, Tom Lane wrote:
>> Ugh.  I think this patch is likely to create more problems than it fixes.

> I doubt that; as I said, the code already behaves in exactly that way
> for closely related operations, so this patch isn't doing anything new.
> Note that that loop this code is modifying only applies to lines that
> are removed from the config file.

Ah ... what's wrong here is some combination of -ENOCAFFEINE and a
not-great explanation on your part.  I misread the patch as adding
"error = true" rather than the flag change.  I agree that setting
the GUC_PENDING_RESTART flag is fine, because set_config_option
would do so if we reached it.  Perhaps you should comment this
along that line?  Also, the cases inside set_config_option
uniformly set that flag *before* the ereport not after.
So maybe like

if (gconf->context < PGC_SIGHUP)
{
+   /* The removal can't be effective without a restart */
+   gconf->status |= GUC_PENDING_RESTART;
ereport(elevel,
(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),

One thing worth checking is whether the pending-restart flag
gets cleared again if the DBA undoes the removal and again
reloads.  I think the right thing will happen, but it'd be
worthwhile to check.

regards, tom lane




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread Alvaro Herrera
On 2021-Jul-28, David Rowley wrote:

> 0003: Is a tidy up patch to make the 'includes' field an array rather
> than a string

In this one, you can avoid turning one line into four with map,

-   $p->AddIncludeDir($pl_proj->{includes});
+   foreach my $inc (@{ $pl_proj->{includes} })
+   {
+   $p->AddIncludeDir($inc);
+   }

Instead of that you can do something like this:

+   map { $p->AddIncludeDir($_); } @{$pl_proj->{includes}};

> 0004: Adds code to check for duplicate references and libraries before
> adding new ones of the same name to the project.

I think using the return value of grep as a boolean is confusing.  It
seems more legible to compare to 0.  So instead of this:

+   if (! grep { $_ eq $ref} @{ $self->{references} })
+   {
+   push @{ $self->{references} }, $ref;
+   }

use something like:

+   if (grep { $_ eq $ref} @{ $self->{references} } == 0)


> 0006: I'm not so sure about. It attempts to do a bit more Makefile
> parsing to get rid of contrib_extrasource and the majority of
> contrib_uselibpgport and contrib_uselibpgcommon usages.

I wonder if we could fix up libpq_pipeline's Makefile somehow to get rid
of the remaining ones.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: needless complexity in StartupXLOG

2021-07-27 Thread Robert Haas
On Tue, Jul 27, 2021 at 9:18 AM Kyotaro Horiguchi
 wrote:
> FWIW, by the way, I complained that the variable name "promoted" is a
> bit confusing.  Its old name was fast_promoted, which means that fast
> promotion is being *requested* and ongoing.  On the other hand the
> current name "promoted" still means "(fast=non-fallback) promotion is
> ongoing" so there was a conversation as follows.
>
> https://www.postgresql.org/message-id/9fdd994d-a531-a52b-7906-e1cc22701310%40oss.nttdata.com

I agree - that variable name is also not great. I am open to making
improvements in that area and in others that have been suggested on
this thread, but my immediate goal is to figure out whether anyone
objects to me committing the posted patch. If nobody comes up with a
reason why it's a bad idea in the next few days, I'll plan to move
ahead with it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Documentation disagrees with behavior of ALTER EVENT TRIGGER

2021-07-27 Thread Mark Dilger
The documentation for ALTER EVENT TRIGGER claims "You must be superuser to 
alter an event trigger" which is manifestly false, as shown below:

+CREATE ROLE evt_first_owner SUPERUSER;
+CREATE ROLE evt_second_owner SUPERUSER;
+SET SESSION AUTHORIZATION evt_first_owner;
+CREATE OR REPLACE FUNCTION evt_test_func()
+RETURNS event_trigger AS $$
+BEGIN
+RAISE NOTICE 'event_trigger called with tag %', tg_tag;
+END;
+$$ LANGUAGE plpgsql;
+CREATE EVENT TRIGGER evt_test_trigger ON ddl_command_start
+   EXECUTE PROCEDURE evt_test_func();
+RESET SESSION AUTHORIZATION;
+ALTER ROLE evt_first_owner NOSUPERUSER;
+SET SESSION AUTHORIZATION evt_first_owner;
+ALTER EVENT TRIGGER evt_test_trigger DISABLE;
+ALTER EVENT TRIGGER evt_test_trigger ENABLE;
+ALTER EVENT TRIGGER evt_test_trigger ENABLE REPLICA;
+ALTER EVENT TRIGGER evt_test_trigger ENABLE ALWAYS;
+ALTER EVENT TRIGGER evt_test_trigger RENAME TO evt_new_name;
+RESET SESSION AUTHORIZATION;
+ALTER EVENT TRIGGER evt_new_name OWNER TO evt_second_owner;
+ALTER EVENT TRIGGER evt_new_name OWNER TO evt_first_owner;
+ERROR:  permission denied to change owner of event trigger "evt_new_name"
+HINT:  The owner of an event trigger must be a superuser.

Per the documentation, the five ALTER commands performed as evt_first_owner 
should have failed.  They did not.  At that time, evt_first_owner owned the 
event trigger despite not being a superuser.

The attempt later to assign ownership back to evt_first_owner fails claiming, 
"The owner of an event trigger must be a superuser", but that claim is not 
precisely true.  At best, "The owner of an event trigger must be a superuser at 
the time ownership is transferred."  There are similar oddities with some other 
object types which make a half-hearted attempt to require the owner to be a 
superuser, but I will start separate threads for those.

This behavior is weird enough that I don't know if it is the code or the 
documentation that is wrong.  I'd like to post patches to clean this up, but 
need community feedback on whether it is the documentation or the behavior that 
needs adjusting.

Over in [1], I am trying to create new privileged roles and assign them some of 
the powers currently reserved to superuser.  It is hard to make patches over 
there when the desired behavior of the system is not quite well defined.

[1] https://www.postgresql.org/message-id/214052.1627331086%40sss.pgh.pa.us

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Fix around conn_duration in pgbench

2021-07-27 Thread Fujii Masao




On 2021/07/27 11:02, Yugo NAGATA wrote:

Hello Fujii-san,

Thank you for looking at it.

On Tue, 27 Jul 2021 03:04:35 +0900
Fujii Masao  wrote:


case CSTATE_FINISHED:
+   /* per-thread last disconnection time is not 
measured */

Could you tell me why we don't need to do this measurement?


We don't need to do it because it is already done in CSTATE_END_TX state when
the transaction successfully finished. Also, we don't need it when the thread
is aborted (that is, in the CSTATE_ABORTED case) because we can't report complete
results anyway in such cases.


Understood.



I updated the comment.


Thanks!

+* Per-thread last disconnection time is not 
measured because it
+* is already done when the transaction 
successfully finished.
+* Also, we don't need it when the thread is 
aborted because we
+* can't report complete results anyway in such 
cases.

What about commenting a bit more explicitly like the following?


In CSTATE_FINISHED state, this disconnect_all() is no-op under -C/--connect 
because all the connections that this thread established should have already 
been closed at the end of transactions. So we don't need to measure the 
disconnection delays here.

In CSTATE_ABORTED state, the measurement is no longer necessary because we 
cannot report complete results anyways in this case.



  

-   /* no connection delay to record */
-   thread->conn_duration = 0;
+   /* connection delay is measured globally between the barriers */

This comment is really correct? I was thinking that the measurement is not 
necessary here because this is the case where -C option is not specified.


This comment means that, when -C is not specified, the connection delay is
measured between the barrier point where the benchmark starts

  /* READY */
  THREAD_BARRIER_WAIT(&barrier);

and the barrier point where all the thread finish making initial connections.

  /* GO */
  THREAD_BARRIER_WAIT(&barrier);


Ok, so you're commenting about the initial connection delay that's
measured when -C is not specified. But I'm not sure if this comment
here is really helpful. Seem rather confusing??






done:
start = pg_time_now();
disconnect_all(state, nstate);
thread->conn_duration += pg_time_now() - start;

We should measure the disconnection time here only when -C option specified 
(i.e., is_connect variable is true)? Though, I'm not sure how much this change 
is helpful to reduce the performance overhead


You are right. We are measuring the disconnection time only when -C option is
specified, but it is already done at the end of transaction (i.e., 
CSTATE_END_TX).
We need disconnection here only when we get an error.
Therefore, we don't need the measurement here.


Ok.

I found another disconnect_all().

/* XXX should this be connection time? */
disconnect_all(state, nclients);

The measurement is also not necessary here.
So the above comment should be removed or updated?

 

I attached the updated patch.


Thanks!

Regards.

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-27 Thread Robert Haas
On Tue, Jul 27, 2021 at 10:24 AM Isaac Morland  wrote:
> Isn’t this backwards? If all those roles are members of "tenant" then they 
> can do anything "tenant" can do. The reverse might work - make "tenant" a 
> member of all the related roles - although I haven’t thought through in 
> detail.

Dang it, yes. The tenant needs to be members of all the other users,
not the other way around. I spent a long time trying to not get that
backwards and still did.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread Andrew Dunstan


On 7/27/21 11:01 AM, Alvaro Herrera wrote:
> On 2021-Jul-28, David Rowley wrote:
>
>> 0003: Is a tidy up patch to make the 'includes' field an array rather
>> than a string
> In this one, you can avoid turning one line into four with map,
>
> - $p->AddIncludeDir($pl_proj->{includes});
> + foreach my $inc (@{ $pl_proj->{includes} })
> + {
> + $p->AddIncludeDir($inc);
> + }
>
> Instead of that you can do something like this:
>
> + map { $p->AddIncludeDir($_); } @{$pl_proj->{includes}};


using map() for a side effect like this is generally frowned upon. See



    do { $p->AddIncludeDir($_); } foreach @{$pl_proj->{includes}};


would be an ok one-liner.


>
>> 0004: Adds code to check for duplicate references and libraries before
>> adding new ones of the same name to the project.
> I think using the return value of grep as a boolean is confusing.  It
> seems more legible to compare to 0.  So instead of this:
>
> + if (! grep { $_ eq $ref} @{ $self->{references} })
> + {
> + push @{ $self->{references} }, $ref;
> + }
>
> use something like:
>
> + if (grep { $_ eq $ref} @{ $self->{references} } == 0)
>


But I believe that's a widely used idiom :-)



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)

2021-07-27 Thread Tom Lane
Robert Haas  writes:
> Dang it, yes. The tenant needs to be members of all the other users,
> not the other way around. I spent a long time trying to not get that
> backwards and still did.

The "membership" terminology is inherently confusing I fear.
Maybe better to say that all the roles-to-be-audited must
be GRANTed to the "tenant" role?

regards, tom lane




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread Dagfinn Ilmari Mannsåker
Alvaro Herrera  writes:

> On 2021-Jul-28, David Rowley wrote:
>
>> 0003: Is a tidy up patch to make the 'includes' field an array rather
>> than a string
>
> In this one, you can avoid turning one line into four with map,
>
> - $p->AddIncludeDir($pl_proj->{includes});
> + foreach my $inc (@{ $pl_proj->{includes} })
> + {
> + $p->AddIncludeDir($inc);
> + }
>
> Instead of that you can do something like this:
>
> + map { $p->AddIncludeDir($_); } @{$pl_proj->{includes}};

map (and grep) should never be used in void context for side effects.  Our
perlcritic policy doesn't currently forbid that, but it should (and
there is one violation of that in contrib/intarray).  I'll submit a
patch for that separately.

The acceptable one-liner version would be a postfix for loop:

$p->AddIncludeDir($_) for @{$pl_proj->{includes}};

>> 0004: Adds code to check for duplicate references and libraries before
>> adding new ones of the same name to the project.
>
> I think using the return value of grep as a boolean is confusing.  It
> seems more legible to compare to 0.  So instead of this:
>
> + if (! grep { $_ eq $ref} @{ $self->{references} })
> + {
> + push @{ $self->{references} }, $ref;
> + }
>
> use something like:
>
> + if (grep { $_ eq $ref} @{ $self->{references} } == 0)

I disagree.  Using grep in boolean context is perfectly idiomatic perl.
What would be more idiomatic is List::Util::any, but that's not available
without upgrading List::Util from CPAN on Perls older than 5.20, so we
can't use that.

- ilmari




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread Alvaro Herrera
On 2021-Jul-27, Dagfinn Ilmari Mannsåker wrote:

> Alvaro Herrera  writes:

> > +   if (grep { $_ eq $ref} @{ $self->{references} } == 0)
> 
> I disagree.  Using grep in boolean context is perfectly idiomatic perl.
> What would be more idiomatic is List::Util::any, but that's not available
> without upgrading List::Util from CPAN on Perls older than 5.20, so we
> can't use that.

I was wondering if instead of grepping the whole list for each addition
it would make sense to push always, and do a unique-ification step at
the end.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/




Re: truncating timestamps on arbitrary intervals

2021-07-27 Thread John Naylor
I wrote:

> On Thu, Jul 22, 2021 at 4:49 PM Bauyrzhan Sakhariyev <
baurzhansahar...@gmail.com> wrote:
> >
> > > No, the boundary is intentionally the earlier one:
> >
> > I found that commit in GitHub, thanks for pointing it out.
> > When I test locally origin_in_the_future case I get different results
for positive and negative intervals (see queries #1 and #2 from above, they
have same timestamp, origin and interval magnitude, difference is only in
interval sign) - can it be that the version I downloaded from
https://www.enterprisedb.com/postgresql-early-experience doesn't include
commit with that improvement?
>
> Sorry, I wasn't clear. The intention is that the boundary is on the lower
side, but query #1 doesn't follow that, so that's a bug in my view. I found
while developing the feature that the sign of the stride didn't seem to
matter, but evidently I didn't try with the origin in the future.
>
> > >  I wonder if we should just disallow negative intervals here.
> >
> > I cannot imagine somebody using negative as a constant argument but
users can pass another column as a first argument date or some function(ts)
- not likely but possible. A line in docs about the leftmost point of
interval as start of the bin could be helpful.
>
> In recent years there have been at least two attempts to add an absolute
value function for intervals, and both stalled over semantics, so I'd
rather just side-step the issue, especially as we're in beta.

Concretely, I propose to push the attached on master and v14. Since we're
in beta 2 and this thread might not get much attention, I've CC'd the RMT.

--
John Naylor
EDB: http://www.enterprisedb.com


0001-Disallow-negative-strides-in-date_bin.patch
Description: Binary data


perlcritic: prohibit map and grep in void context

2021-07-27 Thread Dagfinn Ilmari Mannsåker
Hi hackers,

In the patches for improving the MSVC build process, I noticed a use of
`map` in void context.  This is considered bad form, and has a
perlcritic policy forbidding it:
https://metacpan.org/pod/Perl::Critic::Policy::BuiltinFunctions::ProhibitVoidMap.

Attached is a patch that increases severity of that and the
corresponding `grep` policy to 5 to enable it in our perlcritic policy,
and fixes the one use that had already snuck in.

- ilmari

>From c82b08ce047ab47188753806984f1dc5be365556 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Tue, 27 Jul 2021 16:51:29 +0100
Subject: [PATCH] perlcritic: prohibit map and grep in void context

---
 contrib/intarray/bench/create_test.pl | 2 +-
 src/tools/perlcheck/perlcriticrc  | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/contrib/intarray/bench/create_test.pl b/contrib/intarray/bench/create_test.pl
index 993a4572f4..ae8d72bab0 100755
--- a/contrib/intarray/bench/create_test.pl
+++ b/contrib/intarray/bench/create_test.pl
@@ -51,7 +51,7 @@
 	else
 	{
 		print $msg "$i\t{" . join(',', @sect) . "}\n";
-		map { print $map "$i\t$_\n" } @sect;
+		print $map "$i\t$_\n" foreach @sect;
 	}
 }
 close $map;
diff --git a/src/tools/perlcheck/perlcriticrc b/src/tools/perlcheck/perlcriticrc
index e230111b23..9267fb43b2 100644
--- a/src/tools/perlcheck/perlcriticrc
+++ b/src/tools/perlcheck/perlcriticrc
@@ -22,3 +22,10 @@ verbose = %f: %m at line %l, column %c.  %e.  ([%p] Severity: %s)\n
 # insist on use of the warnings pragma
 [TestingAndDebugging::RequireUseWarnings]
 severity = 5
+
+# forbid grep and map in void context
+[BuiltinFunctions::ProhibitVoidGrep]
+severity = 5
+
+[BuiltinFunctions::ProhibitVoidMap]
+severity = 5
-- 
2.30.2



Re: a thinko in b676ac443b6

2021-07-27 Thread Tomas Vondra
On 7/27/21 4:28 AM, Amit Langote wrote:
> Hi,
> 
> I noticed $subject while rebasing my patch at [1] to enable batching
> for the inserts used in cross-partition UPDATEs.
> 
> b676ac443b6 did this:
> 
> -   resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
> -   MakeSingleTupleTableSlot(planSlot->tts_tupleDescriptor,
> -planSlot->tts_ops);
> ...
> +   {
> +   TupleDesc tdesc =
> CreateTupleDescCopy(slot->tts_tupleDescriptor);
> +
> +   resultRelInfo->ri_Slots[resultRelInfo->ri_NumSlots] =
> +   MakeSingleTupleTableSlot(tdesc, slot->tts_ops);
> ...
> +   resultRelInfo->ri_PlanSlots[resultRelInfo->ri_NumSlots] =
> +   MakeSingleTupleTableSlot(tdesc, planSlot->tts_ops);
> 
> I think it can be incorrect to use the same TupleDesc for both the
> slots in ri_Slots (for ready-to-be-inserted tuples) and ri_PlanSlots
> (for subplan output tuples).  Especially if you consider what we did
> in 86dc90056df that was committed into v14.  In that commit, we
> changed the way a subplan under ModifyTable produces its output for an
> UPDATE statement.  Previously, it would produce a tuple matching the
> target table's TupleDesc exactly (plus any junk columns), but now it
> produces only a partial tuple containing the values for the changed
> columns.
> 
> So it's better to revert to using planSlot->tts_tupleDescriptor for
> the slots in ri_PlanSlots.  Attached a patch to do so.
> 

Yeah, this seems like a clear mistake - thanks for noticing it! Clearly
no regression test triggered the issue, so I wonder what's the best way
to test it - any idea what would the test need to do?

I did some quick experiments with batched INSERTs with RETURNING clauses
and/or subplans, but I haven't succeeded in triggering the issue :-(


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Skip temporary table schema name from explain-verbose output.

2021-07-27 Thread Tom Lane
I wrote:
> I'm inclined to change this in HEAD but leave it alone in the back
> branches.  While it seems pretty bogus, it's not clear if anyone
> out there could be relying on the current behavior.

I've pushed both the 0001 v2 patch and the event trigger change,
and am going to mark the CF entry closed, because leaving it open
would confuse the cfbot.  I think there may still be room to do
something about pg_temp_NNN output in psql's \d commands as 0002
attempted to, but I don't have immediate ideas about how to do
that in a safe/clean way.

regards, tom lane




Re: truncating timestamps on arbitrary intervals

2021-07-27 Thread Tom Lane
John Naylor  writes:
> Concretely, I propose to push the attached on master and v14. Since we're
> in beta 2 and this thread might not get much attention, I've CC'd the RMT.

+1, we can figure out whether that has a use some other time.

regards, tom lane




Re: Inaccurate error message when set fdw batch_size to 0

2021-07-27 Thread Fujii Masao




On 2021/07/27 15:06, Bharath Rupireddy wrote:

Thanks for the v8 patch, it LGTM.


Pushed. Thanks!

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread Fujii Masao



On 2021/07/27 19:51, Michael Paquier wrote:

On Tue, Jul 27, 2021 at 05:26:05PM +0900, Fujii Masao wrote:

ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.


Indeed.  It would be nice to see some refactoring of this code as
well?  Both share a lot of steps, so adding something to one path can
easily lead to the other path being forgotten.


That's an idea, but as far as I read both functions, they seem not
so similar. So I'm not sure how much such refactoring would help.

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index 441445927e..387f80419a 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -5983,7 +5983,16 @@ xact_redo_abort(xl_xact_parsed_abort *parsed, 
TransactionId xid,
}
 
/* Make sure files supposed to be dropped are dropped */
-   DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+   if (parsed->nrels > 0)
+   {
+   /*
+* See comments about update of minimum recovery point on 
truncation,
+* in xact_redo_commit().
+*/
+   XLogFlush(lsn);
+
+   DropRelationFiles(parsed->xnodes, parsed->nrels, true);
+   }
 }
 
 void


Re: Removing "long int"-related limit on hash table sizes

2021-07-27 Thread Andres Freund
Hi,

On 2021-07-26 11:38:41 +0900, Michael Paquier wrote:
> On Sun, Jul 25, 2021 at 12:28:04PM -0400, Tom Lane wrote:
> > Andres Freund  writes:
> >> We really ought to just remove every single use of long.
> > 
> > I have no objection to that as a long-term goal.  But I'm not volunteering
> > to do all the work, and in any case it wouldn't be a back-patchable fix.
> > I feel that we do need to do something about this performance regression
> > in v13.
> 
> Another idea may be to be more aggressive in c.h?  A tweak there would
> be dirtier than marking long as deprecated, but that would be less
> invasive.  Any of that is not backpatchable, of course..

Hard to see how that could work - plenty system headers use long...

Greetings,

Andres Freund




回复:Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread 蔡梦娟(玊于)

Hi, Fujii

Thanks for your reply.
And I want to share a patch about the bug with you, I add XLogFlush() in 
xact_redo_abort() to update the minimum recovery point.

Best Regards,
Suyu



--
发件人:Fujii Masao 
发送时间:2021年7月27日(星期二) 16:26
收件人:蔡梦娟(玊于) ; pgsql-hackers 

主 题:Re: Why don't update minimum recovery point in xact_redo_abort



On 2021/07/27 2:38, 蔡梦娟(玊于) wrote:
> Hi, all
> 
> Recently, I got a PANIC while restarts standby, which can be reproduced by 
> the following steps, based on pg 11:
> 1. begin a transaction in primary node;
> 2. create a table in the transaction;
> 3. insert lots of data into the table;
> 4. do a checkpoint, and restart standby after checkpoint is done in primary 
> node;
> 5. insert/update lots of data into the table again;
> 6. abort the transaction.

I could reproduce the issue by using the similar steps and
disabling full_page_writes, in the master branch.


> 
> after step 6, fast shutdown standby node, and then restart standby, you will 
> get a PANIC log, and the backtrace is:
> #0  0x7fc663e5a277 in raise () from /lib64/libc.so.6
> #1  0x7fc663e5b968 in abort () from /lib64/libc.so.6
> #2  0x00c89f01 in errfinish (dummy=0) at elog.c:707
> #3  0x00c8cba3 in elog_finish (elevel=22, fmt=0xdccc18 "WAL contains 
> references to invalid pages") at elog.c:1658
> #4  0x005e476a in XLogCheckInvalidPages () at xlogutils.c:253
> #5  0x005cbc1a in CheckRecoveryConsistency () at xlog.c:9477
> #6  0x005ca5c5 in StartupXLOG () at xlog.c:8609
> #7  0x00a025a5 in StartupProcessMain () at startup.c:274
> #8  0x00643a5c in AuxiliaryProcessMain (argc=2, argv=0x7ffe4e4849a0) 
> at bootstrap.c:485
> #9  0x00a00620 in StartChildProcess (type=StartupProcess) at 
> postmaster.c:6215
> #10 0x009f92c6 in PostmasterMain (argc=3, argv=0x4126500) at 
> postmaster.c:1506
> #11 0x008eab64 in main (argc=3, argv=0x4126500) at main.c:232
> 
> I think the reason for the above error is as follows:
> 1. the transaction in primary node was aborted finally, the standby node also 
> deleted the table files after replayed the xlog record, however, without 
> updating minimum recovery point;
> 2. primary node did a checkpoint before abort, and then standby node is 
> restarted, so standby node will recovery from a point where the table has 
> already been created and data has been inserted into the table;
> 3. when standby node restarts after step 6, it will find the page needed 
> during recovery doesn't exist, which has already been deleted by 
> xact_redo_abort before, so standby node will treat this page as an invalid 
> page;
> 4. xact_redo_abort drop relation files without updating minumum recovery 
> point, before standby node replay the abort xlog record and forget invalid 
> pages again, it will reach consistency because the abort xlogrecord lsn is 
> greater than minrecoverypoint;
> 5. during checkRecoveryConsistency, it will check invalid pages, and find 
> that there is invalid page, and the PANIC log will be generated.
> 
> So why don't update minimum recovery point in xact_redo_abort, just like 
> XLogFlush in xact_redo_commit, in which way standby could reach consistency 
> and check invalid pages after replayed the abort xlogrecord.

ISTM that you're right. xact_redo_abort() should call XLogFlush() to
update the minimum recovery point on truncation. This seems
the oversight in commit 7bffc9b7bf.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

0001-Update-minimum-recovery-point-on-file-deletion-durin.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2021-07-27 Thread Andres Freund
Hi,

On 2021-07-27 09:23:48 +0200, Drouvot, Bertrand wrote:
> Thanks for the warning, rebase done and new v21 version attached.

Did you have a go at fixing the walsender race conditions I
(re-)discovered? Without fixing those I don't see this patch going in...

Greetings,

Andres Freund




Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

2021-07-27 Thread Bossart, Nathan
On 7/26/21, 10:34 PM, "Bharath Rupireddy" 
 wrote:
> PSA v3 patch.

LGTM.  The pre/post_auth_delay parameters seem to work as intended,
and they are responsive to postmaster crashes.  I didn't find any
examples of calling WaitLatch() without WL_LATCH_SET, but the function
appears to have support for that.  (In fact, it just sets the latch
variable to NULL in that case, so perhaps we should replace MyLatch
with NULL in the patch.)  I do see that WaitLatchOrSocket() is
sometimes called without WL_LATCH_SET, though.

I am marking this patch as ready-for-committer.

Nathan



Re: .ready and .done files considered harmful

2021-07-27 Thread Robert Haas
On Tue, Jul 27, 2021 at 3:43 AM Dipesh Pandit  wrote:
> and updated a new patch. Please find the attached patch v4.

Some review:

/*
+* If archiver is active, send notification that timeline has switched.
+*/
+   if (XLogArchivingActive() && ArchiveRecoveryRequested &&
+   IsUnderPostmaster)
+   PgArchNotifyTLISwitch();

There are a few other places in xlog.c that are conditional on
XLogArchivingActive(), but none of them test ArchiveRecoveryRequested
or IsUnderPostmaster. It appears to me that PgArchStartupAllowed()
controls whether the archiver runs, and that's not contingent on
ArchiveRecoveryRequested and indeed couldn't be, since it's running in
the postmaster where that variable wouldn't be initialized. So why do
we care about ArchiveRecoveryRequested here? This is not entirely a
rhetorical question; maybe there's some reason we should care. If so,
the comment ought to mention it. If not, the test should go away.

IsUnderPostmaster does make a difference, but I think that test could
be placed inside PgArchNotifyTLISwitch() rather than putting it here
in StartupXLOG(). In fact, I think the test could be removed entirely,
since if PgArchNotifyTLISwitch() is called in single-user mode, it
will presumably just discover that arch_pgprocno == INVALID_PGPROCNO,
so it will simply do nothing even without the special-case code.

+   pqsignal(SIGINT, pgarch_timeline_switch);

I don't think it's great that we're using up SIGINT for this purpose.
There aren't that many signals available at the O/S level that we can
use for our purposes, and we generally try to multiplex them at the
application layer, e.g. by setting a latch or a flag in shared memory,
rather than using a separate signal. Can we do something of that sort
here? Or maybe we don't even need a signal. ThisTimeLineID is already
visible in shared memory, so why not just have the archiver just check
and see whether it's changed, say via a new accessor function
GetCurrentTimeLineID()? I guess there could be a concern about the
expensive of that, because we'd probably be taking a spinlock or an
lwlock for every cycle, but I don't think it's probably that bad,
because I doubt we can archive much more than a double-digit number of
files per second even with a very fast archive_command, and contention
on a lock generally requires a five digit number of acquisitions per
second. It would be worth testing to see if we can see a problem here,
but I'm fairly hopeful that it's not an issue. If we do feel that it's
important to avoid repeatedly taking a lock, let's see if we can find
a way to do it without dedicating a signal to this purpose.

+*
+* "nextLogSegNo" identifies the next log file to be archived in a log
+* sequence and the flag "dirScan" specifies a full directory
scan to find
+* the next log file.
 */
-   while (pgarch_readyXlog(xlog))
+   while (pgarch_readyXlog(xlog, &dirScan, &nextLogSegNo))

I do not like this very much. dirScan and nextLogSegNo aren't clearly
owned either by pgarch_ArchiverCopyLoop() or by pgarch_readyXlog(),
since both functions modify both variables, in each case
conditionally, while also relying on the way that the other function
manipulates them. Essentially these are global variables in disguise.
There's a third, related variable too, which is handled differently:

+   static TimeLineID curFileTLI = 0;

This is really the same kind of thing as the other two, but because
pgarch_readyXlog() happens not to need this one, you just made it
static inside pgarch_readyXlog() instead of passing it back and forth.

The problem with all this is that you can't understand either function
in isolation. Unless you read them both together and look at all of
the ways these three variables are manipulated, you can't really
understand the logic. And there's really no reason why that needs to
be true. The job of cleaning timeline_switch and setting dirScan could
be done entirely within pgarch_readyXlog(), and so could the job of
incrementing nextLogSegNo, because we're not going to again call
pgarch_readyXlog() unless archiving succeeded.

Also note that the TLI which is stored in curFileTLI corresponds to
the segment number stored in nextLogSegNo, yet one of them has "cur"
for "current" in the name and the other has "next". It would be easier
to read the code if the names were chosen more consistently.

My tentative idea as to how to clean this up is: declare a new struct
with a name like readyXlogState and members lastTLI and lastSegNo.
Have pgarch_ArchiverCopyLoop() declare a variable of this type, zero
it, pass it as a parameter to pgarch_readyXlog(), and otherwise leave
it alone. Then let pgarch_readyXlog() do all of the manipulation of
the values stored therein.

+   /*
+* Fall-back to directory scan
+*
+* open xlog status directory and read through list of xlogs
that have the
+* .ready suffix, lo

Why does the owner of a publication need CREATE privileges on the database?

2021-07-27 Thread Mark Dilger
The documentation for ALTER PUBLICATION ... OWNER TO ... claims the new owner 
must have CREATE privilege on the database, though superuser can change the 
ownership in spite of this restriction.  No explanation is given for this 
requirement.  It seems to just mirror the requirement that many types of 
objects which exist within namespaces cannot be transferred to new owners who 
lack CREATE privilege on the namespace.  But is it rational to follow that 
pattern here?  I would expect it to follow more closely the behavior of objects 
which do not exist within namespaces, like AlterSchemaOwner or 
AlterForeignServerOwner which don't require this.  (There are other examples to 
look at, but those require the new owner to be superuser, so they provide no 
guidance.)

During the development of the feature, Peter E. says in [1], "I think ALTER 
PUBLICATION does not need to require CREATE privilege on the database."  Petr 
J. replies in [2], "Right, I removed the check." and the contents of the patch 
file 0002-Add-PUBLICATION-catalogs-and-DDL-v12.patch confirm this.  After the 
feature was first committed in 665d1fad99, Peter updated it in commit 
4cfc9484d4, but the reasoning for bringing back this requirement is not clear, 
as the commit message just says, "Previously, the new owner had to be a 
superuser.  The new rules are more refined similar to other objects."  The 
commit appears not to have had a commitfest entry, nor does it have any 
associated email discussion that I can find. 

To investigate, I edited all 22 scripts in src/test/subscription/t/ assigning 
ownership of all publications to nonsuperuser roles which lack CREATE before 
the rest of the test is run.  Nothing changes.  Either the tests are not 
checking the sort of thing this breaks, or this breaks nothing.  I also edited 
src/backend/commands/publicationcmds.c circa line 693 to only raise a warning 
when the assignee lacks CREATE rather than an error and then ran check-world 
with TAP tests enabled.  Everything passes.  So no help there in understanding 
why this requirement exists.

Assuming the requirement makes sense, I'd like the error message generated when 
the assignee lacks CREATE privilege to be less cryptic:

  ALTER PUBLICATION testpub OWNER TO second_pub_owner;
  ERROR:  permission denied for database regression

But since similarly cryptic messages are produced for other object types that 
follow this pattern, maybe that should be a separate thread.

[1] 
https://www.postgresql.org/message-id/acbc4035-5be6-9efd-fb37-1d61b8c35ea5%402ndquadrant.com

[2] 
https://www.postgresql.org/message-id/ed24d725-1b8c-ed25-19c6-61410e6b1ec6%402ndquadrant.com

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: RFC: Logging plan of the running query

2021-07-27 Thread Fujii Masao




On 2021/07/09 14:05, torikoshia wrote:

On 2021-07-02 23:21, Bharath Rupireddy wrote:

On Tue, Jun 22, 2021 at 8:00 AM torikoshia  wrote:

Updated the patch.


Thanks for the patch. Here are some comments on the v4 patch:


Thanks for your comments and suggestions!
I agree with you and updated the patch.

On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao  wrote:


 DO $$
 BEGIN
 PERFORM pg_sleep(100);
 END$$;

When I called pg_log_current_query_plan() to send the signal to
the backend executing the above query, I got the following log message.
I think that this is not expected message. I guess this issue happened
because the information about query text and plan is retrieved
from ActivePortal. If this understanding is right, ISTM that we should
implement new mechanism so that we can retrieve those information
even while nested query is being executed.


I'm now working on this comment.


One idea is to define new global pointer, e.g., "QueryDesc *ActiveQueryDesc;".
This global pointer is set to queryDesc in ExecutorRun()
(also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd() and
when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
get the plan of the currently running query from that global pointer
instead of ActivePortal, and log it. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: RFC: Logging plan of the running query

2021-07-27 Thread Pavel Stehule
út 27. 7. 2021 v 20:34 odesílatel Fujii Masao 
napsal:

>
>
> On 2021/07/09 14:05, torikoshia wrote:
> > On 2021-07-02 23:21, Bharath Rupireddy wrote:
> >> On Tue, Jun 22, 2021 at 8:00 AM torikoshia 
> wrote:
> >>> Updated the patch.
> >>
> >> Thanks for the patch. Here are some comments on the v4 patch:
> >
> > Thanks for your comments and suggestions!
> > I agree with you and updated the patch.
> >
> > On Thu, Jul 1, 2021 at 3:34 PM Fujii Masao 
> wrote:
> >
> >>  DO $$
> >>  BEGIN
> >>  PERFORM pg_sleep(100);
> >>  END$$;
> >>
> >> When I called pg_log_current_query_plan() to send the signal to
> >> the backend executing the above query, I got the following log message.
> >> I think that this is not expected message. I guess this issue happened
> >> because the information about query text and plan is retrieved
> >> from ActivePortal. If this understanding is right, ISTM that we should
> >> implement new mechanism so that we can retrieve those information
> >> even while nested query is being executed.
> >
> > I'm now working on this comment.
>
> One idea is to define new global pointer, e.g., "QueryDesc
> *ActiveQueryDesc;".
> This global pointer is set to queryDesc in ExecutorRun()
> (also maybe ExecutorStart()). And this is reset to NULL in ExecutorEnd()
> and
> when an error is thrown. Then ProcessLogCurrentPlanInterrupt() can
> get the plan of the currently running query from that global pointer
> instead of ActivePortal, and log it. Thought?
>

It cannot work - there can be a lot of nested queries, and at the end you
cannot reset to null, but you should return back pointer to outer query.

Regards

Pavel


> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>
>
>


Re: perlcritic: prohibit map and grep in void context

2021-07-27 Thread Daniel Gustafsson
> On 27 Jul 2021, at 18:06, Dagfinn Ilmari Mannsåker  wrote:

> Attached is a patch that increases severity of that and the
> corresponding `grep` policy to 5 to enable it in our perlcritic policy,
> and fixes the one use that had already snuck in.

+1, the use of foreach also improves readability a fair bit IMO.

--
Daniel Gustafsson   https://vmware.com/





Re: Have I found an interval arithmetic bug?

2021-07-27 Thread Bruce Momjian
On Fri, Apr  2, 2021 at 09:02:29PM -0400, Bruce Momjian wrote:
> On Fri, Apr  2, 2021 at 05:50:59PM -0700, Bryn Llewellyn wrote:
> > are the user’s parameterization. All are real numbers. Because non-integral
> > values for years, months, days, hours, and minutes are allowed when you 
> > specify
> > a value using the ::interval typecast, my reference doc must state the 
> > rules. I
> > would have struggled to express these rules in prose—especially given the 
> > use
> > both of trunc() and floor(). I would have struggled more to explain what
> > requirements these rules meet.
> 
> The fundamental issue is that while months, days, and seconds are
> consistent in their own units, when you have to cross from one unit to
> another, it is by definition imprecise, since the interval is not tied
> to a specific date, with its own days-of-the-month and leap days and
> daylight savings time changes.  It feels like it is going to be
> imprecise no matter what we do.
> 
> Adding to this is the fact that interval values are stored in C 'struct
> tm' defined in libc's ctime(), where months are integers, so carrying
> around non-integer month values until we get a final result would add a
> lot of complexity, and complexity to a system that is by definition
> imprecise, which doesn't seem worth it.

I went ahead and modified the interval multiplication/division functions
to use the same logic as fractional interval units:

SELECT interval '23 mons';
interval

 1 year 11 mons

SELECT interval '23 mons' / 2;
?column?
-
 11 mons 15 days

SELECT interval '23.5 mons';
interval

 1 year 11 mons 15 days

SELECT interval '23.5 mons' / 2;
 ?column?
--
 11 mons 22 days 12:00:00

I think the big issue is that the casting to interval into integer
mons/days/secs so we can no longer make the distinction of units >
months vs months.

Using Bryn's example, the master branch output is:

SELECT
   interval  '1.3443 years' as i1,
   interval '1 years' * 1.3443 as i2;
  i1   |   i2
---+-
 1 year 4 mons | 1 year 4 mons 3 days 22:45:07.2

and the attached patch output is:

SELECT
  interval  '1.3443 years' as i1,
  interval '1 years' * 1.3443 as i2;
  i1   |  i2
---+--
 1 year 4 mons | 1 year 4 mons 4 days

which looks like an improvement.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.



interval.diff.gz
Description: application/gzip


Re: Have I found an interval arithmetic bug?

2021-07-27 Thread Tom Lane
Bruce Momjian  writes:
> I went ahead and modified the interval multiplication/division functions
> to use the same logic as fractional interval units:

Wait. A. Minute.

What I think we have consensus on is that interval_in is doing the
wrong thing in a particular corner case.  I have heard nobody but
you suggesting that we should start undertaking behavioral changes
in other interval functions, and I don't believe that that's a good
road to start going down.  These behaviors have stood for many years.
Moreover, since the whole thing is by definition operating with
inadequate information, it is inevitable that for every case you
make better there will be another one you make worse.

I'm really not on board with changing anything except interval_in,
and even there, we had better be certain that everything we change
is a case that is certainly being made better.

BTW, please do not post patches as gzipped attachments, unless
they're enormous.  You're just adding another step making it
harder for people to look at them.

regards, tom lane




Re: pg_settings.pending_restart not set when line removed

2021-07-27 Thread Alvaro Herrera
On 2021-Jul-27, Tom Lane wrote:

> So maybe like
> 
> if (gconf->context < PGC_SIGHUP)
> {
> +   /* The removal can't be effective without a restart */
> +   gconf->status |= GUC_PENDING_RESTART;
> ereport(elevel,
> (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),

Thanks, done that way.

> One thing worth checking is whether the pending-restart flag
> gets cleared again if the DBA undoes the removal and again
> reloads.  I think the right thing will happen, but it'd be
> worthwhile to check.

I tested this -- it works correctly AFAICS.

Thanks!

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".




Re: Showing applied extended statistics in explain

2021-07-27 Thread Tomas Vondra
Hi,

On 7/27/21 12:21 PM, Dmitry Dolgov wrote:
>> On Sat, Mar 27, 2021 at 01:50:54AM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> With extended statistics it may not be immediately obvious if they were
>> applied and to which clauses. If you have multiple extended statistics,
>> we may also apply them in different order, etc. And with expressions,
>> there's also the question of matching expressions to the statistics.
>>
>> So it seems useful to include this into in the explain plan - show which
>> statistics were applied, in which order. Attached is an early PoC patch
>> doing that in VERBOSE mode. I'll add it to the next CF.
> 
> Hi,
> 
> sounds like a useful improvement indeed, thanks for the patch. Do you
> plan to invest more time in it?
> 

Yes. I think providing more insight into which statistics were applied,
in which order and to which clauses, is quite desirable.

>> 1) The information is stashed in multiple lists added to a Plan. Maybe
>> there's a better place, and maybe we need to invent a better way to
>> track the info (a new node stashed in a single List).
>>
>> ...
>>
>> 3) It does not work for functional dependencies, because we effectively
>> "merge" all functional dependencies and apply the entries. Not sure how
>> to display this, but I think it should show the individual dependencies
>> actually applied.
>>
>> ...
>>
>> 5) It includes just statistics name + clauses, but maybe we should
>> include additional info (e.g estimate for that combination of clauses).
> 
> Yes, a new node would be nice to have. The other questions above are
> somewhat related to what it should contain, and I guess it depends on
> the use case this patch targets. E.g. for the case "help to figure out
> if an extended statistics was applied" even "merged" functional
> dependencies would be fine I believe.

What do you mean by "merged" functional dependencies? I guess we could
say "these clauses were estimated using dependencies" without listing
which exact dependencies were applied.

> More advanced plan troubleshooting may benefit from estimates.

I'm sorry, I don't know what you mean by this. Can you elaborate?

> What exactly use case do you have in mind?
Well, my goal was to help users investigating the plan/estimates,
because once you create multiple "candidate" statistics objects it may
get quite tricky to determine which of them were applied, in what order,
etc. It's not all that straightforward, given the various heuristics we
use to pick the "best" statistics, apply dependencies last, etc. And I
don't quite want to teach the users those rules, because I consider them
mostly implementation details that we may want to tweak in the future.

>> 4) The info is collected always, but I guess we should do that only when
>> in explain mode. Not sure how expensive it is.
> 
> Maybe it's in fact not that expensive to always collect the info? Adding
> it as it is now does not increase the number of cache lines the Plan structure
> occupies (although it comes close to the boundary), and a couple of
> simple tests with CPU bounded load of various types doesn't show
> significant slowdown. I haven't tried the worst case scenario with a lot
> of extended stats involved, but in such situations I can imagine the
> overhead also could be rather small in comparison with other expenses.
> 

Yeah, once there are many statistics it's probably not an issue - the
overhead from processing them is likely way higher than copying this
extra info. Plus people tend to create statistics when there are issues
with planning the queries.

> I've got few more questions after reading the patch:
> 
> * Is there any particular reason behind choosing only some scan nodes to
>   display extended stats? E.g. BitmapHeapScan is missing.
> 

All nodes that may apply extended stats to estimate quals should include
this info. I guess BitmapHeapScan may do that for the filter, right?

> * StatisticExtInfo should have a _copy etc. node functionality now,
>   right? I've hit "unrecognized node type" with a prepared statement
>   because it's missing.
> 

Yeah, probably.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Showing applied extended statistics in explain

2021-07-27 Thread Tom Lane
Tomas Vondra  writes:
> On 7/27/21 12:21 PM, Dmitry Dolgov wrote:
>>> So it seems useful to include this in the explain plan - show which
>>> statistics were applied, in which order. Attached is an early PoC patch
>>> doing that in VERBOSE mode. I'll add it to the next CF.

> Yes. I think providing more insight into which statistics were applied,
> in which order and to which clauses, is quite desirable.

TBH I do not agree that this is a great idea.  I think it's inevitably
exposing a lot of unstable internal planner details.  I like even less
the aspect that this means a lot of information has to be added to the
finished plan in case it's needed for EXPLAIN.  Aside from the sheer
cost of copying that data around, what happens if for example somebody
drops a statistic object between the time of planning and the EXPLAIN?
Are we going to start keeping locks on those objects for the lifetime
of the plans?

regards, tom lane




Re: [Patch] ALTER SYSTEM READ ONLY

2021-07-27 Thread Robert Haas
On Fri, Jul 23, 2021 at 4:03 PM Robert Haas  wrote:
> My 0003 is where I see some lingering problems. It creates
> XLogAcceptWrites(), moves the appropriate stuff there, and doesn't
> need the xlogreader. But it doesn't really solve the problem of how
> checkpointer.c would be able to call this function with proper
> arguments. It is at least better in not needing two arguments to
> decide what to do, but how is checkpointer.c supposed to know what to
> pass for xlogaction? Worse yet, how is checkpointer.c supposed to know
> what to pass for EndOfLogTLI and EndOfLog?

On further study, I found another problem: the way my patch set leaves
things, XLogAcceptWrites() depends on ArchiveRecoveryRequested, which
will not be correctly initialized in any process other than the
startup process. So CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog)
would just be skipped. Your 0001 seems to have the same problem. You
added Assert(AmStartupProcess()) to the inside of the if
(ArchiveRecoveryRequested) block, but that doesn't fix anything.
Outside the startup process, ArchiveRecoveryRequested will always be
false, but the point is that the associated stuff should be done if
ArchiveRecoveryRequested would have been true in the startup process.
Both of our patch sets leave things in a state where that would never
happen, which is not good. Unless I'm missing something, it seems like
maybe you didn't test your patches to verify that, when the
XLogAcceptWrites() call comes from the checkpointer, all the same
things happen that would have happened had it been called from the
startup process. That would be a really good thing to have tested
before posting your patches.

As far as EndOfLogTLI is concerned, there are, somewhat annoyingly,
several TLIs stored in XLogCtl. None of them seem to be precisely the
same thing as EndOfLogTLI, but I am hoping that replayEndTLI is close
enough. I found out pretty quickly through testing that replayEndTLI
isn't always valid -- it ends up 0 if we don't enter recovery. That's
not really a problem, though, because we only need it to be valid if
ArchiveRecoveryRequested. The code that initializes and updates it
seems to run whenever InRecovery = true, and ArchiveRecoveryRequested
= true will force InRecovery = true. So it looks to me like
replayEndTLI will always be initialized in the cases where we need a
value. It's not yet entirely clear to me if it has to have the same
value as EndOfLogTLI. I find this code comment quite mysterious:

/*
 * EndOfLogTLI is the TLI in the filename of the XLOG segment containing
 * the end-of-log. It could be different from the timeline that EndOfLog
 * nominally belongs to, if there was a timeline switch in that segment,
 * and we were reading the old WAL from a segment belonging to a higher
 * timeline.
 */
EndOfLogTLI = xlogreader->seg.ws_tli;

The thing is, if we were reading old WAL from a segment belonging to a
higher timeline, wouldn't we have switched to that new timeline?
Suppose we want WAL segment 246 from TLI 1, but we don't have that
segment on TLI 1, only TLI 2. Well, as far as I know, for us to use
the TLI 2 version, we'd need to have TLI 2 in the history of the
recovery_target_timeline. And if that is the case, then we would have
to replay through the record where the timeline changes. And if we do
that, then the discrepancy postulated by the comment cannot still
exist by the time we reach this code, because this code is only
reached after we finish WAL redo. So I'm baffled as to how this can
happen, but considering how many cases there are in this code, I sure
can't promise that it doesn't. The fact that we have few tests for any
of this doesn't help either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Have I found an interval arithmetic bug?

2021-07-27 Thread Bruce Momjian
On Tue, Jul 27, 2021 at 04:01:54PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > I went ahead and modified the interval multiplication/division functions
> > to use the same logic as fractional interval units:
> 
> Wait. A. Minute.
> 
> What I think we have consensus on is that interval_in is doing the
> wrong thing in a particular corner case.  I have heard nobody but
> you suggesting that we should start undertaking behavioral changes
> in other interval functions, and I don't believe that that's a good
> road to start going down.  These behaviors have stood for many years.
> Moreover, since the whole thing is by definition operating with
> inadequate information, it is inevitable that for every case you
> make better there will be another one you make worse.

Bryn mentioned this so I thought I would see what the result looks like.
I am fine to skip them.

> I'm really not on board with changing anything except interval_in,
> and even there, we had better be certain that everything we change
> is a case that is certainly being made better.

Well, I think what I had before the multiply/divide changes were
acceptable to everyone except Bryn, who was looking for more
consistency.
 
> BTW, please do not post patches as gzipped attachments, unless
> they're enormous.  You're just adding another step making it
> harder for people to look at them.

OK, what is large for you?  100k bytes?  I was using 10k bytes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.





Re: Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread Heikki Linnakangas

On 27/07/2021 19:49, Fujii Masao wrote:

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.


Looks good to me, thanks! FWIW, I used the attached script to reproduce 
this.


- Heikki
# Repro for https://www.postgresql.org/message-id/b029fce3-4fac-4265-968e-16f36ff4d075.mengjuan.cmj%40alibaba-inc.com
#
# Sets up a primary ("data-master") and a streaming standby ("data-standby"),
# replays an aborted DELETE on the standby, then crash-restarts the standby.
# The point is to exercise xact_redo_abort()'s handling of the minimum
# recovery point (see the patch discussed upthread, which adds an XLogFlush()
# call there).
#
# NOTE(review): requires initdb/pg_ctl/pg_basebackup on PATH and the psycopg2
# driver; assumes the default port and 5433 are free -- confirm before running.

import psycopg2
import subprocess
import time

# Create and configure the primary cluster.
subprocess.run(["initdb", "-D", "data-master"])

# Disable full-page writes -- presumably needed so redo on the standby
# modifies pages without a preceding full-page image, making the missing
# flush observable; see the thread for details.
f = open('data-master/postgresql.conf', 'a')
f.write("full_page_writes='off'\n")
f.close();

subprocess.run(["pg_ctl", "-D", "data-master", "start"])

conn = psycopg2.connect("dbname=postgres host=localhost")
cur = conn.cursor()

cur.execute("create table foo (t text)")
cur.execute("insert into foo values ('foo')");

# Take the base backup for the standby.  Note: psycopg2 runs in a
# transaction by default and nothing has been committed yet -- TODO confirm
# the repro relies on this exact ordering rather than on a commit here.
subprocess.run(["pg_basebackup", "-D", "data-standby", "-R"])

# Give the standby its own port so it can run alongside the primary.
f = open('data-standby/postgresql.conf', 'a')
f.write("port='5433'\n")
f.close();

subprocess.run(["pg_ctl", "-D", "data-standby", "start"])

# Generate a ROLLBACK record: this aborted DELETE is what xact_redo_abort()
# replays on the standby.
cur.execute("delete from foo");
conn.rollback();

time.sleep(2) # wait for the ROLLBACK record to reach the standby

# Crash the standby ("immediate" mode skips the shutdown checkpoint) and
# restart it, forcing crash recovery past the replayed ROLLBACK record.
subprocess.run(["pg_ctl", "-D", "data-standby", "stop", "-m", "immediate"])
subprocess.run(["pg_ctl", "-D", "data-standby", "start"])

Re: Showing applied extended statistics in explain

2021-07-27 Thread Tomas Vondra


On 7/27/21 10:50 PM, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 7/27/21 12:21 PM, Dmitry Dolgov wrote:
 So it seems useful to include this in the explain plan - show which
 statistics were applied, in which order. Attached is an early PoC patch
 doing that in VERBOSE mode. I'll add it to the next CF.
> 
>> Yes. I think providing more insight into which statistics were applied,
>> in which order and to which clauses, is quite desirable.
> 
> TBH I do not agree that this is a great idea.  I think it's inevitably
> exposing a lot of unstable internal planner details.

Which unstable planner details? IMHO it's not all that different from
info about which indexes were picked for the query.

> I like even less the aspect that this means a lot of information has
> to be added to the finished plan in case it's needed for EXPLAIN.
Yes, that's true. I mentioned that in my initial post, and I suggested
we might collect it only when in explain mode.

> Aside from the sheer cost of copying that data around, what happens
> if for example somebody drops a statistic object between the time of
> planning and the EXPLAIN? Are we going to start keeping locks on
> those objects for the lifetime of the plans?
> 

Yes, that'd be an issue. I'm not sure what to do about it, short of
either locking the (applied) statistics objects, or maybe just simply
copying the bits we need for explain (pretty much just name).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-07-27 Thread Daniel Gustafsson
> On 4 Mar 2021, at 01:03, Jacob Champion  wrote:

> Andrew pointed out elsewhere [1] that it's pretty difficult to add new
> certificates to the test/ssl suite without blowing away the current
> state and starting over. I needed new cases for the NSS backend work,
> and ran into the same pain, so here is my attempt to improve the
> situation.

Thanks for working on this, I second the pain cited.  I've just started to look
at this, so only a few comments thus far.

> The unused server-ss certificate has been removed entirely.

Nice catch, this seems to have been unused since the original import of the SSL
test suite.  To cut down scope of the patch (even if only a small bit) I
propose to apply this separately first, as per the attached.

> - Serial number collisions are less likely, thanks to Andrew's idea to
> use the current clock time as the initial serial number in a series.

+my $serialno = `openssl x509 -serial -noout -in ssl/client.crt`;
+$serialno =~ s/^serial=//;
+$serialno = hex($serialno); # OpenSSL prints serial numbers in hexadecimal

Will that work on Windows?  We don't currently require the openssl binary to be
in PATH unless one wants to rebuild sslfiles (which it is quite likely to be
but there should at least be errorhandling covering when it's not).

> - I am making _heavy_ use of GNU Make-isms, which does not improve
> long-term maintainability.

GNU Make is already a requirement, I don't see this shifting the needle in any
direction.

--
Daniel Gustafsson   https://vmware.com/



ssl-remove-server-ss.patch
Description: Binary data


Re: Have I found an interval arithmetic bug?

2021-07-27 Thread Bryn Llewellyn
> On 27-Jul-2021, at 14:13, Bruce Momjian  wrote:
> 
> On Tue, Jul 27, 2021 at 04:01:54PM -0400, Tom Lane wrote:
>> Bruce Momjian  writes:
>>> I went ahead and modified the interval multiplication/division functions
>>> to use the same logic as fractional interval units:
>> 
>> Wait. A. Minute.
>> 
>> What I think we have consensus on is that interval_in is doing the
>> wrong thing in a particular corner case.  I have heard nobody but
>> you suggesting that we should start undertaking behavioral changes
>> in other interval functions, and I don't believe that that's a good
>> road to start going down.  These behaviors have stood for many years.
>> Moreover, since the whole thing is by definition operating with
>> inadequate information, it is inevitable that for every case you
>> make better there will be another one you make worse.
> 
> Bryn mentioned this so I thought I would see what the result looks like.
> I am fine to skip them.
> 
>> I'm really not on board with changing anything except interval_in,
>> and even there, we had better be certain that everything we change
>> is a case that is certainly being made better.
> 
> Well, I think what I had before the multiply/divide changes were
> acceptable to everyone except Bryn, who was looking for more
> consistency.
> 
>> BTW, please do not post patches as gzipped attachments, unless
>> they're enormous.  You're just adding another step making it
>> harder for people to look at them.
> 
> OK, what is large for you?  100k bytes?  I was using 10k bytes.

Before I say anything else, I’ll stress what I wrote recently (under the 
heading “summary”). I support Tom’s idea that the only appropriate change to 
make is to fix only the exactly self-evident bug that I reported at the start 
of this thread.

I fear that Bruce doesn’t understand my point about interval multiplication 
(which includes multiplying by a number whose absolute value lies between 0 and 
1). Here it is. I believe that the semantics are (and should be) defined like 
this:

[mm, dd, ss]*n == post_spilldown([mm*n, dd*n, ss*n])

where the function post_spilldown() applies the rules that are used when an 
interval literal that specifies only values for months, days, and seconds is 
converted to the internal [mm, dd, ss] representation—where mm and dd are 
4-byte integers and ss is an 8-byte integer that represents microseconds.

Here’s a simple test that’s consistent with that hypothesis:

with
  c1 as (
select
  '1 month 1 day 1 second'::interval as i1,
  '1.234 month 1.234 day 1.234 second'::interval as i3),

  c2 as (
select i1*1.234 as i2, i3 from c1)

select i2::text as i2_txt, i3::text from c2 as i3_txt;

Here’s the result:

  i2_txt   |i3 
---+---
 1 mon 8 days 06:05:46.834 | 1 mon 8 days 06:05:46.834

So I’m so far happy.

But, like I said, I’d forgotten an orthogonal quirk. This test shows it. It’s 
informed by the fact that 1.2345*12.0 is 14.8140.

select
  ('1.2345 years'  ::interval)::text as i1_txt,
  ('14.8140 months'::interval)::text as i2_txt;

Here’s the result:

i1_txt | i2_txt 
---+
 1 year 2 mons | 1 year 2 mons 24 days 10:04:48

It seems to me to be crazy behavior. I haven’t found any account of it in the 
PG docs. Others have argued that it’s a sensible result. Anyway, I don’t 
believe that I’ve ever argued that it’s a bug. I wanted only to know what 
rationale informed the design. I agree that changing the behavior here would be 
problematic for extant code.
 
This quirk explains the outcome of this test:

select
  ('1.2345 years'::interval)::text as i1_txt,
  ('14.8140 months'::interval)::text as i2_txt,
  (1.2345*('1 years'::interval))::text as i3_txt;

This is the result:

i1_txt | i2_txt | i3_txt
 
---++
 1 year 2 mons | 1 year 2 mons 24 days 10:04:48 | 1 year 2 mons 24 days 10:04:48

Notice that the same text is reported for i2_txt as for i3_txt.





Remove client-log from SSL test .gitignore

2021-07-27 Thread Daniel Gustafsson
The original import of the SSL tests saved the clientside log in /client-log,
which was later removed in 1caef31d9.  The test/ssl .gitignore didn't get the
memo though.

The attached trivial patch removes it from .gitignore, barring objections I'll
push that.

--
Daniel Gustafsson   https://vmware.com/



ssl-gitignore.patch
Description: Binary data


Out-of-memory error reports in libpq

2021-07-27 Thread Tom Lane
While cleaning out dead branches in my git repo, I came across an
early draft of what eventually became commit ffa2e4670 ("In libpq,
always append new error messages to conn->errorMessage").  I realized
that it contained a good idea that had gotten lost on the way to that
commit.  Namely, let's reduce all of the 60-or-so "out of memory"
reports in libpq to calls to a common subroutine, and then let's teach
the common subroutine a recovery strategy for the not-unlikely
possibility that it fails to append the "out of memory" string to
conn->errorMessage.  That recovery strategy of course is to reset the
errorMessage buffer to empty, hopefully regaining some space.  We lose
whatever we'd had in the buffer before, but we have a better chance of
the "out of memory" message making its way to the user.

The first half of that just saves a few hundred bytes of repetitive
coding.  However, I think that the addition of recovery logic is
important for robustness, because as things stand libpq may be
worse off than before for OOM handling.  Before ffa2e4670, almost
all of these call sites did printfPQExpBuffer(..., "out of memory").
That would automatically clear the message buffer to empty, and
thereby be sure to report the out-of-memory failure if at all
possible.  Now we might fail to report the thing that the user
really needs to know to make sense of what happened.

Therefore, I feel like this was an oversight in ffa2e4670,
and we ought to back-patch the attached into v14.

cc'ing the RMT in case they wish to object.

regards, tom lane

diff --git a/src/interfaces/libpq/fe-auth-scram.c b/src/interfaces/libpq/fe-auth-scram.c
index 4337e89ce9..48565343eb 100644
--- a/src/interfaces/libpq/fe-auth-scram.c
+++ b/src/interfaces/libpq/fe-auth-scram.c
@@ -378,8 +378,7 @@ build_client_first_message(fe_scram_state *state)
 	state->client_nonce = malloc(encoded_len + 1);
 	if (state->client_nonce == NULL)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-			 libpq_gettext("out of memory\n"));
+		pqReportOOM(conn);
 		return NULL;
 	}
 	encoded_len = pg_b64_encode(raw_nonce, SCRAM_RAW_NONCE_LEN,
@@ -453,8 +452,7 @@ build_client_first_message(fe_scram_state *state)
 
 oom_error:
 	termPQExpBuffer(&buf);
-	appendPQExpBufferStr(&conn->errorMessage,
-		 libpq_gettext("out of memory\n"));
+	pqReportOOM(conn);
 	return NULL;
 }
 
@@ -607,8 +605,7 @@ build_client_final_message(fe_scram_state *state)
 
 oom_error:
 	termPQExpBuffer(&buf);
-	appendPQExpBufferStr(&conn->errorMessage,
-		 libpq_gettext("out of memory\n"));
+	pqReportOOM(conn);
 	return NULL;
 }
 
@@ -628,8 +625,7 @@ read_server_first_message(fe_scram_state *state, char *input)
 	state->server_first_message = strdup(input);
 	if (state->server_first_message == NULL)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-			 libpq_gettext("out of memory\n"));
+		pqReportOOM(conn);
 		return false;
 	}
 
@@ -654,8 +650,7 @@ read_server_first_message(fe_scram_state *state, char *input)
 	state->nonce = strdup(nonce);
 	if (state->nonce == NULL)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-			 libpq_gettext("out of memory\n"));
+		pqReportOOM(conn);
 		return false;
 	}
 
@@ -669,8 +664,7 @@ read_server_first_message(fe_scram_state *state, char *input)
 	state->salt = malloc(decoded_salt_len);
 	if (state->salt == NULL)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-			 libpq_gettext("out of memory\n"));
+		pqReportOOM(conn);
 		return false;
 	}
 	state->saltlen = pg_b64_decode(encoded_salt,
@@ -719,8 +713,7 @@ read_server_final_message(fe_scram_state *state, char *input)
 	state->server_final_message = strdup(input);
 	if (!state->server_final_message)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-			 libpq_gettext("out of memory\n"));
+		pqReportOOM(conn);
 		return false;
 	}
 
@@ -758,8 +751,7 @@ read_server_final_message(fe_scram_state *state, char *input)
 	decoded_server_signature = malloc(server_signature_len);
 	if (!decoded_server_signature)
 	{
-		appendPQExpBufferStr(&conn->errorMessage,
-			 libpq_gettext("out of memory\n"));
+		pqReportOOM(conn);
 		return false;
 	}
 
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 3421ed4685..c190f5ced9 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -73,9 +73,7 @@ pg_GSS_continue(PGconn *conn, int payloadlen)
 		ginbuf.value = malloc(payloadlen);
 		if (!ginbuf.value)
 		{
-			appendPQExpBuffer(&conn->errorMessage,
-			  libpq_gettext("out of memory allocating GSSAPI buffer (%d)\n"),
-			  payloadlen);
+			pqReportOOM(conn);
 			return STATUS_ERROR;
 		}
 		if (pqGetnchar(ginbuf.value, payloadlen, conn))
@@ -227,9 +225,7 @@ pg_SSPI_continue(PGconn *conn, int payloadlen)
 		inputbuf = malloc(payloadlen);
 		if (!inputbuf)
 		{
-			appendPQExpBuffer(&conn->errorMessage,
-			  libpq_gettext("out of memory allocating SSPI buffer (%d)\n"),
-			  payloadlen);
+			pqReportOOM(conn);

Re: Have I found an interval arithmetic bug?

2021-07-27 Thread John W Higgins
On Tue, Jul 27, 2021 at 3:36 PM Bryn Llewellyn  wrote:

>
> with
>   c1 as (
> select
>   '1 month 1 day 1 second'::interval as i1,
>   '1.234 month 1.234 day 1.234 second'::interval as i3),
>
>   c2 as (
> select i1*1.234 as i2, i3 from c1)
>
> select i2::text as i2_txt, i3::text from c2 as i3_txt;
>
>
It's nice to envision all forms of fancy calculations. But the fact is that

'1.5 month'::interval * 2 != '3 month"::interval

with any of these patches - and if that doesn't work - the rest of the
strange numbers really seem to be irrelevant.

If there is a desire to handle fractional cases - then all pieces need to
be held as provided until they are transformed into something. In other
words - 1.5 month needs to be held as 1.5 month until we ask for it to be
reduced to 1 month and 15 days at some point. If the interval data type
immediately casts 1.5 months to 1 month 15 days then all subsequent
calculations are going to be wrong.

I appreciate there is generally no way to accomplish this right now - but
that means walking away from things like 1 month * 1.234 as being not
calculable as opposed to trying to piece something together that fails
pretty quickly.

John


Re: Out-of-memory error reports in libpq

2021-07-27 Thread Bossart, Nathan
On 7/27/21, 3:41 PM, "Tom Lane"  wrote:
> The first half of that just saves a few hundred bytes of repetitive
> coding.  However, I think that the addition of recovery logic is
> important for robustness, because as things stand libpq may be
> worse off than before for OOM handling.  Before ffa2e4670, almost
> all of these call sites did printfPQExpBuffer(..., "out of memory").
> That would automatically clear the message buffer to empty, and
> thereby be sure to report the out-of-memory failure if at all
> possible.  Now we might fail to report the thing that the user
> really needs to know to make sense of what happened.

IIUC, before ffa2e4670, callers mainly used printfPQExpBuffer(), which
always cleared the buffer before attempting to append the OOM message.
With ffa2e4670 applied, callers always attempt to append the OOM
message without resetting the buffer first.  With this new change,
callers will attempt to append the OOM message without resetting the
buffer first, but if that fails, we fall back to the original behavior
before ffa2e4670.

+   if (PQExpBufferBroken(errorMessage))
+   {
+   resetPQExpBuffer(errorMessage);
+   appendPQExpBufferStr(errorMessage, msg);
+   }

I see that appendPQExpBufferStr() checks whether the buffer is broken
by way of enlargePQExpBuffer(), so the fallback steps roughly match
the calls to printfPQExpBuffer() before ffa2e4670.

-   appendPQExpBuffer(&conn->errorMessage,
- libpq_gettext("out of 
memory allocating GSSAPI buffer (%d)\n"),
- payloadlen);
+   pqReportOOM(conn);

I see that some context is lost in a few places (e.g., the one above
points to a GSSAPI buffer).  Perhaps this extra context could be
useful to identify problematic areas, but it might be unlikely to help
much in these parts of libpq.  In any case, the vast majority of
existing callers don't provide any extra context.

Overall, the patch looks good to me.

> Therefore, I feel like this was an oversight in ffa2e4670,
> and we ought to back-patch the attached into v14.

Back-patching to v14 seems reasonable to me.

Nathan



Replace l337sp34k in comments.

2021-07-27 Thread Peter Smith
IMO the PG code comments are not an appropriate place for leetspeak creativity.

PSA a patch to replace a few examples that I recently noticed.

"up2date" --> "up-to-date"

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Replace-leetspeak-comments-with-English.patch
Description: Binary data


Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread David Rowley
On Wed, 28 Jul 2021 at 02:33, Tom Lane  wrote:
>
> David Rowley  writes:
> > 0001: Removes an include directory for dblink. This appears like it's
> > not needed. It was added in ee3b4188a (Jan 2010), but an earlier
> > commit, 320c7eb8c (June 2008) seems to have made it pointless. It's
> > still a mystery to me why ee3b4188a would have been required in the
> > first place.
>
> FWIW, I poked around in the mailing list archives around that date
> and couldn't find any supporting discussion.  It does seem like it
> shouldn't be needed, given that dblink's Makefile does no such thing.

I think the reason is that src/backend/utils/Makefile symlinks
fmgroids.h into src/include/utils.  The copy you added in 320c7eb8c
seems to be the MSVC build's equivalent of that.

David




Re: archive status ".ready" files may be created too early

2021-07-27 Thread Alvaro Herrera
On 2021-Feb-19, Bossart, Nathan wrote:

> 0002 adds logic for persisting the last notified segment through
> crashes.  This is needed because a poorly-timed crash could otherwise
> cause us to skip marking segments as ready-for-archival altogether.
> This file is only used for primary servers, as there exists a separate
> code path for marking segments as ready-for-archive for standbys.

I'm not sure I understand what's the reason not to store this value in
pg_control; I feel like I'm missing something.  Can you please explain?

There were some comments earlier in the thread about the maximum size of
a record.  As I recall, you can have records of arbitrary size if you
have COMMIT with a large number of relation invalidation messages being
included in the xlog record, or a large number of XIDs of
subtransactions in the transaction.  Spanning several segments is
possible, AFAIU.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)




Re: a thinko in b676ac443b6

2021-07-27 Thread Amit Langote
On Wed, Jul 28, 2021 at 1:07 AM Tomas Vondra
 wrote:
> On 7/27/21 4:28 AM, Amit Langote wrote:
> > I think it can be incorrect to use the same TupleDesc for both the
> > slots in ri_Slots (for ready-to-be-inserted tuples) and ri_PlanSlots
> > (for subplan output tuples).  Especially if you consider what we did
> > in 86dc90056df that was committed into v14.  In that commit, we
> > changed the way a subplan under ModifyTable produces its output for an
> > UPDATE statement.  Previously, it would produce a tuple matching the
> > target table's TupleDesc exactly (plus any junk columns), but now it
> > produces only a partial tuple containing the values for the changed
> > columns.
> >
> > So it's better to revert to using planSlot->tts_tupleDescriptor for
> > the slots in ri_PlanSlots.  Attached a patch to do so.
>
> Yeah, this seems like a clear mistake - thanks for noticing it! Clearly
> no regression test triggered the issue, so I wonder what's the best way
> to test it - any idea what would the test need to do?

Ah, I should've mentioned that this is only a problem if the original
query is an UPDATE.  With v14, only INSERTs can use batching and the
subplan does output a tuple matching the target table's TupleDesc in
their case, so the code seems to work fine.

As I said, I noticed a problem when rebasing my patch to allow
cross-partition UPDATEs to use batching for the inserts that are
performed internally to implement such UPDATEs.  The exact problem I
noticed is that the following Assert tts_virtual_copyslot() (via
ExecCopySlot called with an ri_PlanSlots[] entry) failed:

Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts);

srcdesc in this case is a slot in ri_PlanSlots[] initialized with the
target table's TupleDesc (the "thinko") and dstslot is the slot that
holds subplan's output tuple ('planSlot' passed to ExecInsert).  As I
described in my previous email, dstslot's TupleDesc can be narrower
than the target table's TupleDesc in the case of an UPDATE, so the
Assert can fail in theory.

> I did some quick experiments with batched INSERTs with RETURNING clauses
> and/or subplans, but I haven't succeeded in triggering the issue :-(

Yeah, no way to trigger this except UPDATEs.  It still seems like a
good idea to fix this in v14.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Slim down integer formatting

2021-07-27 Thread David Rowley
On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera  wrote:
> So how much faster is it than the original?

I only did some very quick tests.  They're a bit noisy. The results
indicate an average speedup of 1.7%, but the noise level is above
that, so unsure.

create table a (a int);
insert into a select a from generate_series(1,100)a;
vacuum freeze a;

bench.sql: copy a to '/dev/null';

master @ 93a0bf239
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
latency average = 153.815 ms
latency average = 152.955 ms
latency average = 147.491 ms

master + v2 patch
drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
latency average = 144.749 ms
latency average = 151.525 ms
latency average = 150.392 ms

David




Re: Replace l337sp34k in comments.

2021-07-27 Thread Michael Paquier
On Wed, Jul 28, 2021 at 09:39:02AM +1000, Peter Smith wrote:
> IMO the PG code comments are not an appropriate place for leetspeak 
> creativity.
> 
> PSA a patch to replace a few examples that I recently noticed.
> 
> "up2date" --> "up-to-date"

Agreed that this is a bit cleaner to read, so done.  Just note that
pgindent has been complaining about the format of some of the updated
comments.
--
Michael


signature.asc
Description: PGP signature


Re: archive status ".ready" files may be created too early

2021-07-27 Thread Bossart, Nathan
On 7/27/21, 6:05 PM, "Alvaro Herrera"  wrote:
> On 2021-Feb-19, Bossart, Nathan wrote:
>
>> 0002 adds logic for persisting the last notified segment through
>> crashes.  This is needed because a poorly-timed crash could otherwise
>> cause us to skip marking segments as ready-for-archival altogether.
>> This file is only used for primary servers, as there exists a separate
>> code path for marking segments as ready-for-archive for standbys.
>
> I'm not sure I understand what's the reason not to store this value in
> pg_control; I feel like I'm missing something.  Can you please explain?

Thanks for taking a look.

The only reason I can think of is that it could make back-patching
difficult.  I don't mind working on a version of the patch that uses
pg_control.  Back-patching this fix might be a stretch, anyway.

> There were some comments earlier in the thread about the maximum size of
> a record.  As I recall, you can have records of arbitrary size if you
> have COMMIT with a large number of relation invalidation messages being
> included in the xlog record, or a large number of XIDs of
> subtransactions in the transaction.  Spanning several segments is
> possible, AFAIU.

This is my understanding, too.

Nathan



Re: Autovacuum on partitioned table (autoanalyze)

2021-07-27 Thread Andres Freund
Hi,

On 2021-07-22 13:54:58 -0700, Andres Freund wrote:
> On 2021-04-08 01:20:14 -0400, Alvaro Herrera wrote:
> > On 2021-Apr-07, Alvaro Herrera wrote:
> >
> > > OK, I bit the bullet and re-did the logic in the way I had proposed
> > > earlier in the thread: do the propagation on the collector's side, by
> > > sending only the list of ancestors: the collector can read the tuple
> > > change count by itself, to add it to each ancestor.  This seems less
> > > wasteful.  Attached is v16 which does it that way and seems to work
> > > nicely under my testing.
> >
> > Pushed with this approach.  Thanks for persisting with this.
> 
> I'm looking at this in the context of rebasing & polishing the shared
> memory stats patch.
> 
> I have a few questions / concerns:

Another one, and I think this might warrant thinking about for v14:

Isn't this going to create a *lot* of redundant sampling?  Especially if you
have any sort of nested partition tree. In the most absurd case a partition
with n parents will get sampled n times, solely due to changes to itself.

Look at the following example:

BEGIN;
DROP TABLE if exists p;
CREATE TABLE p (i int) partition by range(i);
CREATE TABLE p_0 PARTITION OF p FOR VALUES FROM (   0) to (5000) partition 
by range(i);
CREATE TABLE p_0_0 PARTITION OF p_0 FOR VALUES FROM (   0) to (1000);
CREATE TABLE p_0_1 PARTITION OF p_0 FOR VALUES FROM (1000) to (2000);
CREATE TABLE p_0_2 PARTITION OF p_0 FOR VALUES FROM (2000) to (3000);
CREATE TABLE p_0_3 PARTITION OF p_0 FOR VALUES FROM (3000) to (4000);
CREATE TABLE p_0_4 PARTITION OF p_0 FOR VALUES FROM (4000) to (5000);
-- create some initial data
INSERT INTO p select generate_series(0, 5000 - 1) data FROM generate_series(1, 
100) reps;
COMMIT;

UPDATE p_0_4 SET i = i;


Whenever the update is executed, all partitions will be sampled at least twice
(once for p and once for p_0), with p_0_4 sampled three times.

Of course, this is an extreme example, but it's not hard to imagine cases
where v14 will cause the number of auto-analyzes to increase sufficiently to bog
down autovacuum to a problematic degree.


Additionally, while analyzing a partitioned table, all of its child partitions
are AccessShareLock'ed at once. If a partition hierarchy has more than one
level, it actually is likely that multiple autovacuum workers will end up
processing the ancestors separately.  This seems like it might contribute to
lock exhaustion issues with larger partition hierarchies?

Greetings,

Andres Freund




Re: Slim down integer formatting

2021-07-27 Thread David Fetter
On Wed, Jul 28, 2021 at 01:17:43PM +1200, David Rowley wrote:
> On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera  wrote:
> > So how much faster is it than the original?
> 
> I only did some very quick tests.  They're a bit noisy. The results
> indicate an average speedup of 1.7%, but the noise level is above
> that, so unsure.
> 
> create table a (a int);
> insert into a select a from generate_series(1,100)a;
> vacuum freeze a;
> 
> bench.sql: copy a to '/dev/null';
> 
> master @ 93a0bf239
> drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
> latency average = 153.815 ms
> latency average = 152.955 ms
> latency average = 147.491 ms
> 
> master + v2 patch
> drowley@amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
> latency average = 144.749 ms
> latency average = 151.525 ms
> latency average = 150.392 ms

Thanks for testing this!  I got a few promising results early on with
-O0, and the technique seemed like a neat way to do things.

I generated a million int4s intended to be uniformly distributed
across the range of int4, and similarly across int8.

int4:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  362.149 ms 359.933 ms
latency stddev:  3.44 ms3.40 ms

int8:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  434.944 ms 422.270 ms
latency stddev:  3.23 ms4.02 ms

when compiled with -O2:

int4:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average: 167.262 ms 148.673 ms
latency stddev:6.26  ms   1.28 ms

i.e. it was actually slower, at least over the 10 runs I did.

I assume that "uniform distribution across the range" is a bad case
scenario for ints, but I was a little surprised to measure worse
performance.  Interestingly, what I got for int8s generated to be
uniform across their range was

int8:
   patch   6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average: 171.737 ms 174.013 ms
latency stddev:1.94  ms   6.84 ms

which doesn't look like a difference to me.

Intuitively, I'd expect us to get things in the neighborhood of 1 a
lot more often than things in the neighborhood of 1 << (30 or 60).  Do
we have some idea of the distribution, or at least of the distribution
family, that we should expect for ints?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Out-of-memory error reports in libpq

2021-07-27 Thread Tom Lane
"Bossart, Nathan"  writes:
> - appendPQExpBuffer(&conn->errorMessage,
> -   libpq_gettext("out of 
> memory allocating GSSAPI buffer (%d)\n"),
> -   payloadlen);
> + pqReportOOM(conn);

> I see that some context is lost in a few places (e.g., the one above
> points to a GSSAPI buffer).  Perhaps this extra context could be
> useful to identify problematic areas, but it might be unlikely to help
> much in these parts of libpq.  In any case, the vast majority of
> existing callers don't provide any extra context.

Yeah, there are half a dozen places that currently print something
more specific than "out of memory".  I judged that the value of this
was not worth the complexity it'd add to support it in this scheme.
Different opinions welcome of course.

regards, tom lane




Re: Slim down integer formatting

2021-07-27 Thread David Rowley
On Wed, 28 Jul 2021 at 14:25, David Fetter  wrote:
> Intuitively, I'd expect us to get things in the neighborhood of 1 a
> lot more often than things in the neighborhood of 1 << (30 or 60).  Do
> we have some idea of the distribution, or at least of the distribution
> family, that we should expect for ints?

serial and bigserial are generally going to start with smaller
numbers. In larger and longer-lived databases those numbers could end up
on the larger side.  serial and bigserial should be a fairly large
portion of the use case for integer types, so anything that slows down
int4out and int8out for lower order numbers is not a good idea. I
think it would have to be a very small slowdown on the low order
numbers vs a large speedup for higher order numbers for us to even
consider it.

David




Re: Use WaitLatch for {pre, post}_auth_delay instead of pg_usleep

2021-07-27 Thread Kyotaro Horiguchi
Hello.

At Tue, 27 Jul 2021 11:04:09 +0530, Bharath Rupireddy 
 wrote in 
> On Mon, Jul 26, 2021 at 11:03 PM Bossart, Nathan  wrote:
> PSA v3 patch.

I have some comments.

- No harm, but it's pointless to feed MyLatch to WaitLatch when
 WL_LATCH_SET is not set (or rather misleading).

- It seems that the additional wait-event is effectively useless for
 most of the processes. Considering that this feature is for debugging
 purpose, it'd be better to use ps display instead (or additionally)
 if we want to see the wait event anywhere.

The events of autovacuum workers can be seen in pg_stat_activity properly.

For client-backends, that state cannot be seen in
pg_stat_activity. That seems inevitable since backends aren't
allocated a PGPROC entry yet at that time. (So the wait event is set
to local memory as a safety measure in this case.)  On the other hand,
I find it inconvenient that the ps display is shown as just "postgres"
while in that state.  I think we can show "postgres: preauth waiting"
or something.  (It is shown as "postgres: username dbname [conn]
initializing" while PostAuthDelay)

Background workers behave the same way to client backends for the same
reason to the above. We might be able to *fix* that but I'm not sure
it's worth doing that only for this specific case.

Autovacuum launcher is seen in pg_stat_activity but clients cannot
start connection before autovac launcher starts unless process
startup time is largely fluctuated.  So the status is effectively
useless in regard to the process.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-07-27 Thread houzj.f...@fujitsu.com
On July 27, 2021 1:14 PM Amit Kapila 
> On Mon, Jul 26, 2021 at 8:33 PM Robert Haas 
> wrote:
> >
> > On Sat, Jul 24, 2021 at 5:52 AM Amit Kapila 
> wrote:
> > > I think for the consistency argument how about allowing users to
> > > specify a parallel-safety option for both partitioned and
> > > non-partitioned relations but for non-partitioned relations if users
> > > didn't specify, it would be computed automatically? If the user has
> > > specified parallel-safety option for non-partitioned relation then we
> > > would consider that instead of computing the value by ourselves.
> >
> > Having the option for both partitioned and non-partitioned tables
> > doesn't seem like the worst idea ever, but I am also not entirely sure
> > that I understand the point.
> >
> 
> Consider below ways to allow the user to specify the parallel-safety option:
> 
> (a)
> CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ...
> ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> 
> OR
> 
> (b)
> CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)

Personally, I think the approach (a) might be better. Since it's similar to
ALTER FUNCTION PARALLEL XXX which user might be more familiar with.

Besides, I think we need a new default value about parallel dml safety. Maybe
'auto' or 'null' (different from safe/restricted/unsafe). Because the user is
likely to alter the safety to the default value to get the automatic safety
check, an independent default value can make it more clear.

Best regards,
Houzj



Re: ATTACH PARTITION locking documentation for DEFAULT partitions

2021-07-27 Thread David Rowley
On Tue, 27 Jul 2021 at 21:36, Matthias van de Meent
 wrote:
>
> > On Tue, 27 Jul 2021 at 08:02, David Rowley  wrote:
> > I've adjusted the patch and attached what I came up with. Let me know
> > what you think.
>
> I like this improved wording. Thanks!

I've pushed this with some very minor further wording adjustments.

Thanks for working on this.

David




Re: Some code cleanup for pgbench and pg_verifybackup

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 08:53:52AM +0900, Michael Paquier wrote:
> On Mon, Jul 26, 2021 at 03:35:29PM -0400, Alvaro Herrera wrote:
>> (In reality we cannot literally just exit(1) in pg_log_fatal(), because
>> there are quite a few places that do some other thing after the log
>> call and before exit(1), or terminate the program in some other way than
>> exit().)
> 
> Yes, that's obviously wrong.  I am wondering if there is more of
> that.

I have been looking at that.  Here are some findings:
- In pg_archivecleanup, CleanupPriorWALFiles() does not bother
exit()'ing with an error code.  Shouldn't we worry about reporting
that correctly?  It seems strange to not inform users about paths that
would be broken, as that could bloat the archives without one knowing
about it.
- In pg_basebackup.c, ReceiveTarAndUnpackCopyChunk() would not
exit() when failing to change permissions for non-WIN32.
- pg_recvlogical is missing a failure handling for fstat(), as of
5c0de38. 
- pg_verifybackup is in the wrong, as mentioned upthread.

Thoughts?  All that does not seem to enter into the category of things
worth back-patching, except for pg_verifybackup.
--
Michael
diff --git a/src/bin/pg_archivecleanup/pg_archivecleanup.c b/src/bin/pg_archivecleanup/pg_archivecleanup.c
index 12338e3bb2..6c3e7f4e01 100644
--- a/src/bin/pg_archivecleanup/pg_archivecleanup.c
+++ b/src/bin/pg_archivecleanup/pg_archivecleanup.c
@@ -151,21 +151,30 @@ CleanupPriorWALFiles(void)
 {
 	pg_log_error("could not remove file \"%s\": %m",
  WALFilePath);
-	break;
+	exit(1);
 }
 			}
 		}
 
 		if (errno)
+		{
 			pg_log_error("could not read archive location \"%s\": %m",
 		 archiveLocation);
+			exit(1);
+		}
 		if (closedir(xldir))
+		{
 			pg_log_error("could not close archive location \"%s\": %m",
 		 archiveLocation);
+			exit(1);
+		}
 	}
 	else
+	{
 		pg_log_error("could not open archive location \"%s\": %m",
 	 archiveLocation);
+		exit(1);
+	}
 }
 
 /*
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f69c57380..7296eb97d0 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1626,8 +1626,11 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
 }
 #ifndef WIN32
 if (chmod(state->filename, (mode_t) filemode))
+{
 	pg_log_error("could not set permissions on directory \"%s\": %m",
  state->filename);
+	exit(1);
+}
 #endif
 			}
 			else if (copybuf[156] == '2')
@@ -1676,8 +1679,11 @@ ReceiveTarAndUnpackCopyChunk(size_t r, char *copybuf, void *callback_data)
 
 #ifndef WIN32
 		if (chmod(state->filename, (mode_t) filemode))
+		{
 			pg_log_error("could not set permissions on file \"%s\": %m",
 		 state->filename);
+			exit(1);
+		}
 #endif
 
 		if (state->current_len_left == 0)
diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 1d59bf3744..ebeb12d497 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -341,7 +341,10 @@ StreamLogicalLog(void)
 			}
 
 			if (fstat(outfd, &statbuf) != 0)
+			{
 pg_log_error("could not stat file \"%s\": %m", outfile);
+goto error;
+			}
 
 			output_isfile = S_ISREG(statbuf.st_mode) && !isatty(outfd);
 		}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index f5ebd57a47..bb93b43093 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -304,6 +304,7 @@ main(int argc, char **argv)
 			 "but was not the same version as %s.\n"
 			 "Check your installation.",
 			 "pg_waldump", full_path, "pg_verifybackup");
+			exit(1);
 		}
 	}
 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c51ebb8e31..55d14604c0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -6469,7 +6469,10 @@ main(int argc, char **argv)
 
 	errno = THREAD_BARRIER_INIT(&barrier, nthreads);
 	if (errno != 0)
+	{
 		pg_log_fatal("could not initialize barrier: %m");
+		exit(1);
+	}
 
 #ifdef ENABLE_THREAD_SAFETY
 	/* start all threads but thread 0 which is executed directly later */


signature.asc
Description: PGP signature


Re: [bug?] Missed parallel safety checks, and wrong parallel safety

2021-07-27 Thread Greg Nancarrow
On Wed, Jul 28, 2021 at 12:52 PM houzj.f...@fujitsu.com
 wrote:
>
> > Consider below ways to allow the user to specify the parallel-safety option:
> >
> > (a)
> > CREATE TABLE table_name (...) PARALLEL DML { UNSAFE | RESTRICTED | SAFE } 
> > ...
> > ALTER TABLE table_name PARALLEL DML { UNSAFE | RESTRICTED | SAFE } ..
> >
> > OR
> >
> > (b)
> > CREATE TABLE table_name (...) WITH (parallel_dml_enabled = true)
> > ALTER TABLE table_name (...) WITH (parallel_dml_enabled = true)
>
> Personally, I think the approach (a) might be better. Since it's similar to
> ALTER FUNCTION PARALLEL XXX which user might be more familiar with.
>

I think so too.

> Besides, I think we need a new default value about parallel dml safety. Maybe
> 'auto' or 'null' (different from safe/restricted/unsafe). Because the user is
> likely to alter the safety to the default value to get the automatic safety
> check, an independent default value can make it more clear.
>

Yes, I was thinking something similar when I said "Provided it is
possible to distinguish between the default parallel-safety (unsafe)
and that default being explicitly specified by the user". If we don't
have a new default value, then we need to distinguish these cases, but
I'm not sure Postgres does something similar elsewhere (for example,
for function parallel-safety, it's not currently recorded whether
parallel-safety=unsafe is because of the default or because the user
specifically set it to what is the default value).
Opinions?


Regards,
Greg Nancarrow
Fujitsu Australia




Re: alter table set TABLE ACCESS METHOD

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 04:38:48PM +0900, Michael Paquier wrote:
> Okay, hearing nothing, I have looked again at 0001 and did some light
> adjustments, mainly in the tests.  I did not spot any issues in the
> patch, so that looks good to go for me.

And done as of b048326.
--
Michael


signature.asc
Description: PGP signature


Re: Out-of-memory error reports in libpq

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 10:31:25PM -0400, Tom Lane wrote:
> Yeah, there are half a dozen places that currently print something
> more specific than "out of memory".  I judged that the value of this
> was not worth the complexity it'd add to support it in this scheme.
> Different opinions welcome of course.

I don't mind either that this removes a bit of context.  For
unlikely-going-to-happen errors that's not worth the extra translation
cost.  No objections from me for an integration into 14 as that's
straight-forward, and that would minimize conflicts between HEAD and 
14 in the event of a back-patch.

+pqReportOOM(PGconn *conn)
+{
+   pqReportOOMBuffer(&conn->errorMessage);
+}
+
+/*
+ * As above, but work with a bare error-message-buffer pointer.
+ */
+void
+pqReportOOMBuffer(PQExpBuffer errorMessage)
+{
Not much a fan of having two routines to do this job though.  I would
vote for keeping the one named pqReportOOM() with PQExpBuffer as
argument.
--
Michael


signature.asc
Description: PGP signature


Re: alter table set TABLE ACCESS METHOD

2021-07-27 Thread Jeff Davis
On Wed, 2021-07-28 at 12:23 +0900, Michael Paquier wrote:
> On Tue, Jul 27, 2021 at 04:38:48PM +0900, Michael Paquier wrote:
> > Okay, hearing nothing, I have looked again at 0001 and did some
> > light
> > adjustments, mainly in the tests.  I did not spot any issues in the
> > patch, so that looks good to go for me.
> 
> And done as of b048326.

I just returned from vacation and I was about ready to commit this
myself, but I noticed that it doesn't seem to be calling
InvokeObjectPostAlterHook(). I was in the process of trying to be sure
of where to call it. It looks like it should be called after catalog
changes but before CommandCounterIncrement(), and it also looks like it
should be called even for no-op commands.

Also, I agree with Justin that it should fail when there are multiple
SET ACCESS METHOD subcommands consistently, regardless of whether one
is a no-op, and it should probably throw a syntax error to match SET
TABLESPACE.

Minor nit: in tab-complete.c, why does it say ""? Is that just a
typo or is there a reason it's different from everything else, which
uses ""? And what does "sth" mean anyway?

Regards,
Jeff Davis






Re: 回复:Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread Fujii Masao




On 2021/07/28 1:55, 蔡梦娟(玊于) wrote:


Hi, Fujii

Thanks for your reply.
And I want to share a patch about the bug with you, I add XLogFlush() in 
xact_redo_abort() to update the minimum recovery point.


Thanks for the patch! It looks almost the same as the patch I posted upthread.
One diff between them is that you copy-and-pasted the comments for update of
minRecoveryPoint, but instead I just added the comment "See comments ... in
xact_redo_commit()". IMO it's better to avoid putting the same (a bit long)
comments in multiple places so that we can more easily maintain the comments.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Why don't update minimum recovery point in xact_redo_abort

2021-07-27 Thread Fujii Masao




On 2021/07/28 6:25, Heikki Linnakangas wrote:

On 27/07/2021 19:49, Fujii Masao wrote:

Anyway I attached the patch that changes only xact_redo_abort()
so that it calls XLogFlush() to update min recovery point.


Looks good to me, thanks! FWIW, I used the attached script to reproduce this.


Thanks for the review!

Barring any objection, I will commit the patch and
back-patch it to all supported versions.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread David Rowley
On Wed, 28 Jul 2021 at 01:44, Dagfinn Ilmari Mannsåker
 wrote:
> I don't know anything about the MSVC build process, but I figured I
> could do a general Perl code review.

Thanks for looking at this. Perl review is very useful as it's
certainly not my native tongue, as you might have noticed.

> > --- a/src/tools/msvc/Mkvcbuild.pm
> > +++ b/src/tools/msvc/Mkvcbuild.pm
> […]
> > + # Process custom compiler flags
> > + if ($mf =~ /^PG_CPPFLAGS\s*=\s*(.*)$/mg || $mf =~ 
> > /^override\s*CPPFLAGS\s*(?:[\+\:])?=\s*(.*)$/mg)
>   
> ^^^
> This is a very convoluted way of writing [+:]?

I've replaced the (?:[\+\:])? with [+:]? It's a bit of a blind
adjustment. I see that the resulting vcxproj files have not changed as
a result of that.

> > --- a/src/tools/msvc/Project.pm
> > +++ b/src/tools/msvc/Project.pm
> > @@ -58,7 +58,7 @@ sub AddFiles
> >
> >   while (my $f = shift)
> >   {
> > - $self->{files}->{ $dir . "/" . $f } = 1;
> > + AddFile($self, $dir . "/" . $f, 1);
>
> AddFile is a method, so should be called as $self->AddFile(…).

Adjusted thanks.

> > --- a/src/tools/msvc/Mkvcbuild.pm
> > +++ b/src/tools/msvc/Mkvcbuild.pm
> > @@ -36,16 +36,12 @@ my @unlink_on_exit;
> […]
> > + elsif ($flag =~ /^-I(.*)$/)
> > + {
> > + foreach my $proj (@projects)
> > + {
> > + if ($1 eq '$(libpq_srcdir)')
> > + {
> > + 
> > $proj->AddIncludeDir('src\interfaces\libpq');
> > + $proj->AddReference($libpq);
> > + }
> > + }
> > + }
>
> It would be better to do the if check outside the for loop.

Agreed.

> > --- a/src/tools/msvc/Project.pm
> > +++ b/src/tools/msvc/Project.pm
> > @@ -51,6 +51,16 @@ sub AddFile
> >   return;
> >  }
> >
> > +sub AddFileAndAdditionalFiles
> > +{
> > + my ($self, $filename) = @_;
> > + if (FindAndAddAdditionalFiles($self, $filename) != 1)
>
> Again, FindAndAddAdditionalFiles is a method and should be called as
> $self->FindAndAddAdditionalFiles($filename).
>
> > + {
> > + $self->{files}->{$filename} = 1;
> > + }
> > + return;
> > +}

Adjusted.

I'll send updated patches once I look at the other reviews.

David




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread David Rowley
On Wed, 28 Jul 2021 at 03:52, Dagfinn Ilmari Mannsåker
 wrote:
>
> Alvaro Herrera  writes:
>
> > On 2021-Jul-28, David Rowley wrote:
> >
> >> 0003: Is a tidy up patch to make the 'includes' field an array rather
> >> than a string
> >
> > In this one, you can avoid turning one line into four with map,
> >
> > - $p->AddIncludeDir($pl_proj->{includes});
> > + foreach my $inc (@{ $pl_proj->{includes} })
> > + {
> > + $p->AddIncludeDir($inc);
> > + }
> >
> > Instead of that you can do something like this:
> >
> > + map { $p->AddIncludeDir($_); } @{$pl_proj->{includes}};
>
> map (and grep) should never be used void context for side effects.  Our
> perlcritic policy doesn't currently forbid that, but it should (and
> there is one violation of that in contrib/intarray).  I'll submit a
> patch for that separately.
>
> The acceptable one-liner version would be a postfix for loop:
>
> $p->AddIncludeDir($_) for @{$pl_proj->{includes}};

I'm not sure if this is all just getting overly smart about it.
There's already a loop next to this doing:

foreach my $type_lib (@{ $type_proj->{libraries} })
{
$p->AddLibrary($type_lib);
}

I don't object to changing mine, if that's what people think who are
more familiar with Perl than I am, but I do think consistency is a
good thing. TBH, I kinda prefer the multi-line loop. I think most
people that look at these scripts are going to be primarily C coders,
so assuming each of the variations do the same job, then I'd rather
see us stick to the most C like version.

In the meantime, I'll just change it to $p->AddIncludeDir($_) for
@{$pl_proj->{includes}};. I just wanted to note my thoughts.

David




Re: Reduce the number of special cases to build contrib modules on windows

2021-07-27 Thread David Rowley
On Wed, 28 Jul 2021 at 04:01, Alvaro Herrera  wrote:
>
> On 2021-Jul-27, Dagfinn Ilmari Mannsåker wrote:
>
> > Alvaro Herrera  writes:
>
> > > +   if (grep { $_ eq $ref} @{ $self->{references} } == 0)
> >
> > I disagree.  Using grep in boolean context is perfectly idiomatic perl.
> > What would be more idiomatic is List::Util::any, but that's not availble
> > without upgrading List::Util from CPAN on Perls older than 5.20, so we
> > can't use that.
>
> I was wondering if instead of grepping the whole list for each addition
> it would make sense to push always, and do a unique-ification step at
> the end.

I see [1] has some thoughts on this; I don't think performance will
matter much here though. I think the order of the final array is
likely more important.  I didn't test, but I imagine using one of
those hash solutions might end up having the array elements in some
hashtable like order.

I'm not quite sure if I can tell here if it's ok to leave the grep
as-is or if I should be changing it to:

 if (grep { $_ eq $ref} @{ $self->{references} } == 0)

David

[1] https://www.oreilly.com/library/view/perl-cookbook/1565922433/ch04s07.html




Re: truncating timestamps on arbitrary intervals

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 12:05:51PM -0400, John Naylor wrote:
> Concretely, I propose to push the attached on master and v14. Since we're
> in beta 2 and this thread might not get much attention, I've CC'd the RMT.

(It looks like gmail has messed up a bit the format of your last
message.)

Hmm.  The docs say also the following thing, but your patch does not
reflect that anymore:
"Negative intervals are allowed and are treated the same as positive
intervals."
So you may want to update that, at least.
--
Michael


signature.asc
Description: PGP signature


Re: proposal: possibility to read dumped table's name from file

2021-07-27 Thread Pavel Stehule
st 12. 5. 2021 v 8:22 odesílatel Pavel Stehule 
napsal:

> Hi
>
> ne 11. 4. 2021 v 9:48 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> út 16. 2. 2021 v 20:32 odesílatel Pavel Stehule 
>> napsal:
>>
>>>
>>> Hi
>>>
>>> rebase
>>>
>>
>>
> rebase
>
>
>
>> fresh rebase
>>
>
fresh rebase



>> Regards
>>
>> Pavel
>>
>>
>>> Regards
>>>
>>> Pavel
>>>
>>
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 7682226b99..94fc5aa6c2 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1006,6 +1006,42 @@ PostgreSQL documentation
   
  
 
+ 
+  --options-file=filename
+  
+   
+Read options from file (one option per line). Short or long options
+are supported. If you use "-" as a filename, the filters are read
+from stdin.
+   
+
+   
+With the following options file, the dump would include table
+mytable1 and data from foreign tables of
+some_foreign_server foreign server, but exclude data
+from table mytable2.
+
+-t mytable1
+--include-foreign-data=some_foreign_server
+--exclude-table-data=mytable2
+
+   
+
+   
+The text after symbol # is ignored. This can
+be used for comments, notes. Empty lines are ignored too.
+   
+
+   
+The option --options-file can be used more times,
+and the nesting is allowed. The options from options files are
+processed first, other options from command line later. Any option
+file is processed only one time. In next time the processing is
+ignored.
+   
+  
+ 
+
  
   --quote-all-identifiers
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 90ac445bcd..29f85ccfbf 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,10 +55,12 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/string.h"
 #include "dumputils.h"
 #include "fe_utils/option_utils.h"
 #include "fe_utils/string_utils.h"
 #include "getopt_long.h"
+#include "lib/stringinfo.h"
 #include "libpq/libpq-fs.h"
 #include "parallel.h"
 #include "pg_backup_db.h"
@@ -128,18 +130,34 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList optsfilenames_processed = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
 static bool have_extra_float_digits = false;
 static int	extra_float_digits;
 
+static const char *filename = NULL;
+static const char *format = "p";
+static bool g_verbose = false;
+static const char *dumpencoding = NULL;
+static const char *dumpsnapshot = NULL;
+static char *use_role = NULL;
+static int numWorkers = 1;
+static int compressLevel = -1;
+
 /*
  * The default number of rows per INSERT when
  * --inserts is specified without --rows-per-insert
  */
 #define DUMP_DEFAULT_ROWS_PER_INSERT 1
 
+/*
+ * Option's code of "options-file" option
+ */
+#define OPTIONS_FILE_OPTNUM 12
+
 /*
  * Macro for producing quoted, schema-qualified name of a dumpable object.
  */
@@ -308,14 +326,212 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
 static char *get_synchronized_snapshot(Archive *fout);
 static void setupDumpWorker(Archive *AHX);
 static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
+static bool read_options_from_file(const char *filename,
+   DumpOptions *dopt,
+   const char *optstring,
+   const struct option *longopts,
+   const char *progname);
+
+/*
+ * Assign the value of an option to the related DumpOptions field or to
+ * a global variable. Loading of options files is not handled here.
+ */
+static bool
+process_option(int opt,
+			   char *optargstr,
+			   DumpOptions *dopt,
+			   const char *progname)
+{
+	switch (opt)
+	{
+		case 'a':			/* Dump data only */
+			dopt->dataOnly = true;
+			break;
+
+		case 'b':			/* Dump blobs */
+			dopt->outputBlobs = true;
+			break;
+
+		case 'B':			/* Don't dump blobs */
+			dopt->dontOutputBlobs = true;
+			break;
+
+		case 'c':			/* clean (i.e., drop) schema prior to create */
+			dopt->outputClean = 1;
+			break;
+
+		case 'C':			/* Create DB */
+			dopt->outputCreateDB = 1;
+			break;
+
+		case 'd':			/* database name */
+			dopt->cparams.dbname = pg_strdup(optargstr);
+			break;
+
+		case 'e':			/* include extension(s) */
+			simple_string_list_append(&extension_include_patterns, optargstr);
+			dopt->include_everything = false;
+			break;
+
+		case 'E':			/* Dump encoding */
+			dumpencoding = pg_strdup(optargstr);
+			break;
+
+		case 'f':
+			filename = pg_strdup(optargstr);
+			break;
+
+		case 'F':
+			format = pg_strdup(optargstr);
+			break;
+
+		case 'h':			/* server host */
+			dopt->cparams.pghost = 

Re: perlcritic: prohibit map and grep in void conext

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 09:09:10PM +0200, Daniel Gustafsson wrote:
> On 27 Jul 2021, at 18:06, Dagfinn Ilmari Mannsåker  wrote:
>> Attached is a patch that increases severity of that and the
>> corresponding `grep` policy to 5 to enable it in our perlcritic policy,
>> and fixes the one use that had already snuck in.
> 
> +1, the use of foreach also improves readability a fair bit IMO.

Sounds interesting to avoid.  pgperlcritic does not complain here
after this patch.
--
Michael


signature.asc
Description: PGP signature


Re: alter table set TABLE ACCESS METHOD

2021-07-27 Thread Michael Paquier
On Tue, Jul 27, 2021 at 08:40:59PM -0700, Jeff Davis wrote:
> I just returned from vacation and I was about ready to commit this
> myself, but I noticed that it doesn't seem to be calling
> InvokeObjectPostAlterHook().

Arg, sorry about that!  I was unclear what the situation of the patch
was.

> I was in the process of trying to be sure
> of where to call it. It looks like it should be called after catalog
> changes but before CommandCounterIncrement(), and it also looks like it
> should be called even for no-op commands.

Right.  Isn't that an older issue though?  A rewrite involved after a
change of relpersistence does not call the hook either.  It looks to
me that this should be after finish_heap_swap() to match with
ATExecSetTableSpace() in ATRewriteTables().  The only known user of
object_access_hook in the core code is sepgsql, so this would
involve a change of behavior.  And I don't recall any backpatching
that added a post-alter hook.

> Also, I agree with Justin that it should fail when there are multiple
> SET ACCESS METHOD subcommands consistently, regardless of whether one
> is a no-op, and it should probably throw a syntax error to match SET
> TABLESPACE.

Hmm.  Okay.

> Minor nit: in tab-complete.c, why does it say ""? Is that just a
> typo or is there a reason it's different from everything else, which
> uses ""? And what does "sth" mean anyway?

"Something".  That should be "" to be consistent with the area.
--
Michael


signature.asc
Description: PGP signature


Re: CREATE SEQUENCE with RESTART option

2021-07-27 Thread Michael Paquier
On Mon, Jul 26, 2021 at 04:57:53PM +0900, Michael Paquier wrote:
> FWIW, like Ashutosh upthread, my vote would be to do nothing here in
> terms of behavior changes as this is just breaking a behavior for the
> sake of breaking it, so there are chances that this is going to piss
> some users that relied accidentally on the existing behavior.

In short, I would be tempted with something like the attached, that
documents RESTART in CREATE SEQUENCE, while describing its behavior
according to START.  In terms of regression tests, there is already a
lot in this area with ALTER SEQUENCE, but I think that having two
tests makes sense for CREATE SEQUENCE: one for RESTART without a
value and one with, where both explicitly set START.

Thoughts?
--
Michael
diff --git a/src/test/regress/expected/sequence.out b/src/test/regress/expected/sequence.out
index 71c2b0f1df..7f5835d52f 100644
--- a/src/test/regress/expected/sequence.out
+++ b/src/test/regress/expected/sequence.out
@@ -308,6 +308,23 @@ DROP SEQUENCE t1_f1_seq;
 ERROR:  sequence "t1_f1_seq" does not exist
 -- Now OK:
 DROP SEQUENCE myseq2;
+-- Interactions between START and RESTART at creation
+CREATE SEQUENCE test_seq2 START 150 RESTART 200;
+SELECT nextval('test_seq2'); -- 200, per RESTART
+ nextval 
+-
+ 200
+(1 row)
+
+DROP SEQUENCE test_seq2;
+CREATE SEQUENCE test_seq2 START 50 RESTART;
+SELECT nextval('test_seq2'); -- 50, per new START value
+ nextval 
+-
+  50
+(1 row)
+
+DROP SEQUENCE test_seq2;
 --
 -- Alter sequence
 --
diff --git a/src/test/regress/sql/sequence.sql b/src/test/regress/sql/sequence.sql
index 7928ee23ee..9d379a9d63 100644
--- a/src/test/regress/sql/sequence.sql
+++ b/src/test/regress/sql/sequence.sql
@@ -167,6 +167,14 @@ DROP SEQUENCE t1_f1_seq;
 -- Now OK:
 DROP SEQUENCE myseq2;
 
+-- Interactions between START and RESTART at creation
+CREATE SEQUENCE test_seq2 START 150 RESTART 200;
+SELECT nextval('test_seq2'); -- 200, per RESTART
+DROP SEQUENCE test_seq2;
+CREATE SEQUENCE test_seq2 START 50 RESTART;
+SELECT nextval('test_seq2'); -- 50, per new START value
+DROP SEQUENCE test_seq2;
+
 --
 -- Alter sequence
 --
diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml
index e4085804a4..1683e11d4c 100644
--- a/doc/src/sgml/ref/create_sequence.sgml
+++ b/doc/src/sgml/ref/create_sequence.sgml
@@ -25,7 +25,9 @@ CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] data_type ]
 [ INCREMENT [ BY ] increment ]
 [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
-[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
+[ START [ WITH ] start ]
+[ RESTART [ [ WITH ] restart ] ]
+[ CACHE cache ] [ [ NO ] CYCLE ]
 [ OWNED BY { table_name.column_name | NONE } ]
 
  
@@ -185,6 +187,22 @@ SELECT * FROM name;
 

 
+   
+restart
+
+ 
+  The optional clause RESTART [ WITH restart ] changes the
+  start value of the sequence.  When specified alongside
+  START, the value given with
+  RESTART WITH takes priority.  If
+  RESTART is specified without a value, the
+  start value of the sequence is the one defined by
+  START.
+ 
+
+   
+

 cache
 


signature.asc
Description: PGP signature


  1   2   >