Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread Fujii Masao




On 2020/04/02 10:41, movead...@highgo.ca wrote:


>  >So I'd like to propose the attached patch. The patch changes the message
> logged when a promotion is requested, based on whether the recovery is
> in paused state or not.
>
> It is a compromise,

Ok, so barring any objection, I will commit the patch.


> we should notice it in document I think.


There is the following explanation about the relationship between recovery
pause and promotion in the document. You may want to add more
descriptions to the docs?

--
If a promotion is triggered while recovery is paused, the paused
state ends and a promotion continues.
--

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-08 Thread davinder singh
On Tue, Apr 7, 2020 at 8:30 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> * The logic on "defined(_MSC_VER) && (_MSC_VER >= 1900)" is defined as
> "_WIN32_WINNT >= 0x0600" on other parts of the code. I would
> recommend using the later.
>
I think "_WIN32_WINNT >= 0x0600" represents Windows versions only and
doesn't include any information about Visual Studio versions, so I am
sticking to "defined(_MSC_VER) && (_MSC_VER >= 1900)".
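
For illustration, here is a minimal standalone sketch (mine, not part of the
patch) of how the two guards differ: _MSC_VER is defined only by the Microsoft
compiler, whereas _WIN32_WINNT describes the targeted Windows version and is
defined for MinGW builds as well.

/* Hypothetical example, not from the patch. */
#include <stdio.h>

int
main(void)
{
#if defined(_MSC_VER) && (_MSC_VER >= 1900)
	puts("built with Visual Studio 2015 or later");
#elif defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0600)
	puts("targeting Windows Vista or later, but not necessarily MSVC >= 2015");
#else
	puts("neither condition holds");
#endif
	return 0;
}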
I have resolved other comments. I have attached a new version of the patch.
-- 
Regards,
Davinder.


0001-PG-compilation-error-with-VS-2015-2017-2019.patch
Description: Binary data


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Amit Kapila
On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
 wrote:
>
> On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> >
> >
> > Thanks for the investigation.  I don't see we can do anything special
> > about this.  In an ideal world, this should be done once and not for
> > each worker but I guess it doesn't matter too much.  I am not sure if
> > it is worth adding a comment for this, what do you think?
> >
>
> I agree with you. If the differences were considerably large probably
> we would do something but I think we don't need to anything at this
> time.
>

Fair enough, can you once check this in back-branches as this needs to
be backpatched?  I will do that once by myself as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: WIP/PoC for parallel backup

2020-04-08 Thread Asif Rehman
Rebased and updated to current master (d025cf88ba). v12 is attached.

Also, changed the grammar for LIST_WAL_FILES and SEND_FILE to:

- LIST_WAL_FILES 'startptr' 'endptr'
- SEND_FILE 'FILE'  [NOVERIFY_CHECKSUMS]


On Wed, Apr 8, 2020 at 10:48 AM Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Asif,
>
> Thanks for new patches.
>
> Patches need to be rebased on head. Getting a failure while applying the
> 0003 patch.
> edb@localhost postgresql]$ git apply
> v11/0003-Parallel-Backup-Backend-Replication-commands.patch
> error: patch failed: src/backend/storage/ipc/ipci.c:147
> error: src/backend/storage/ipc/ipci.c: patch does not apply
>
> I have applied v11 patches on commit -
> 23ba3b5ee278847e4fad913b80950edb2838fd35 to test further.
>
> pg_basebackup has a new option "--no-estimate-size",  pg_basebackup
> crashes when using this option.
>
> [edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size
> --jobs=2
> Segmentation fault (core dumped)
>
> --stacktrace
> [edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
> Loaded symbols for /lib64/libselinux.so.1
> Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size
> --jobs=2'.
> Program terminated with signal 11, Segmentation fault.
> #0  strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
> 298  while (ISSPACE (*s))
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
> #1  0x003921233b30 in atoi (nptr=) at atoi.c:28
> #2  0x0040841e in main (argc=5, argv=0x7ffeaa6fb968) at
> pg_basebackup.c:2526
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
>
>
> On Tue, Apr 7, 2020 at 11:07 PM Robert Haas  wrote:
>
>> On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman 
>> wrote:
>> > I will, however parallel backup is already quite a large patch. So I
>> think we should first
>> > agree on the current work before adding a backup manifest and
>> progress-reporting support.
>>
>> It's going to be needed for commit, but it may make sense for us to do
>> more review of what you've got here before we worry about it.
>>
>> I'm gonna try to find some time for that as soon as I can.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>

-- 
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Julien Rouhaud
On Wed, Apr 8, 2020 at 8:23 AM Masahiko Sawada
 wrote:
>
> On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> >
> > On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada
> >  wrote:
> > >
> > > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> > > > >  wrote:
> > > > > >
> > > > > > Buffer usage statistics seem correct. The small differences would be
> > > > > > catalog lookups Peter mentioned.
> > > > > >
> > > > >
> > > > > Agreed, but can you check which part of code does that lookup?  I want
> > > > > to see if we can avoid that from buffer usage stats or at least write
> > > > > a comment about it, otherwise, we might have to face this question
> > > > > again and again.
> > > >
> > > > Okay, I'll check it.
> > > >
> > >
> > > I've checked the buffer usage differences when parallel btree index 
> > > creation.
> > >
> > > TL;DR;
> > >
> > > During tuple sorting individual parallel workers read blocks of
> > > pg_amproc and pg_amproc_fam_proc_index to get the sort support
> > > function. The call flow is like:
> > >
> > > ParallelWorkerMain()
> > >   _bt_parallel_scan_and_sort()
> > > tuplesort_begin_index_btree()
> > >   PrepareSortSupportFromIndexRel()
> > > FinishSortSupportFunction()
> > >   get_opfamily_proc()
> > >
> >
> > Thanks for the investigation.  I don't see we can do anything special
> > about this.  In an ideal world, this should be done once and not for
> > each worker but I guess it doesn't matter too much.  I am not sure if
> > it is worth adding a comment for this, what do you think?
> >
>
> I agree with you. If the differences were considerably large probably
> we would do something but I think we don't need to anything at this
> time.

+1




Re: segmentation fault using currtid and partitioned tables

2020-04-08 Thread Michael Paquier
On Sun, Apr 05, 2020 at 12:51:56PM -0400, Tom Lane wrote:
> I think it might be a good idea to make relations-without-storage
> set up rd_tableam as a vector of dummy functions that will throw
> some suitable complaint about "relation lacks storage".  NULL is
> a horrible default for this.

Yeah, that's not good, but I am not really comfortable with the
concept of implying that (pg_class.relam == InvalidOid) maps to a
dummy AM callback set instead of NULL for rd_tableam.  That feels less
natural.  As mentioned upthread, the error that we get in ~11 is
confusing as well when using a relation that has no storage:
ERROR:  58P01: could not open file "base/16384/16385": No such file or directory

I have been looking at the tree and the use of the table AM APIs, and
those TID lookups are really a particular case compared to the other
callers of the table AM callbacks.  Anyway, I have not spotted similar
problems, so I find it very tempting to just add some
RELKIND_HAS_STORAGE() checks to tid.c where it matters and call it a day.
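
As a rough sketch of that option (illustrative only, not a patch from this
thread; it assumes the usual backend headers and the helper name is invented
here), the check could look something like:

/* Sketch only: reject TID lookups on relations without storage up front,
 * instead of crashing on a NULL rd_tableam or failing later with a
 * confusing "could not open file" error. */
static void
sanity_check_relation_storage(Relation rel)
{
	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("cannot look up tuple identifiers in relation \"%s\"",
						RelationGetRelationName(rel)),
				 errdetail("Relation \"%s\" has no storage.",
						   RelationGetRelationName(rel))));
}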

Andres, what do you think?
--
Michael


signature.asc
Description: PGP signature


Re: doc review for parallel vacuum

2020-04-08 Thread Masahiko Sawada
On Tue, 7 Apr 2020 at 13:55, Justin Pryzby  wrote:
>
> On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> > On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila  
> > wrote:
> > > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby  
> > > wrote:
> > > >
> > > > Original, long thread
> > > > https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > > >
> > >
> > > I have comments/questions on the patches:
> > > doc changes
> > > -
> > > 1.
> > >   
> > > -  Perform vacuum index and cleanup index phases of 
> > > VACUUM
> > > +  Perform vacuum index and index cleanup phases of 
> > > VACUUM
> > >
> > > Why is the proposed text improvement over what is already there?
> >
> > I have kept the existing text as it is for this case.
>
> Probably it should say "index vacuum and index cleanup", which is also what 
> the
> comment in vacuumlazy.c says.
>
> > > 2.
> > > If the
> > > -  PARALLEL option is omitted, then
> > > -  VACUUM decides the number of workers based on 
> > > number
> > > -  of indexes that support parallel vacuum operation on the relation 
> > > which
> > > -  is further limited by  > > linkend="guc-max-parallel-workers-maintenance"/>.
> > > -  The index can participate in a parallel vacuum if and only if the 
> > > size
> > > +  PARALLEL option is omitted, then the number of 
> > > workers
> > > +  is determined based on the number of indexes that support parallel 
> > > vacuum
> > > +  operation on the relation, and is further limited by  > > +  linkend="guc-max-parallel-workers-maintenance"/>.
> > > +  An index can participate in parallel vacuum if and only if the size
> > >of the index is more than  > > linkend="guc-min-parallel-index-scan-size"/>.
> > >
> > > Here, it is not clear to me why the proposed text is better than what
> > > we already have?
> > Changed as per your suggestion.
>
> To answer your question:
>
> "VACUUM decides" doesn't sound consistent with docs.
>
> "based on {+the+} number"
> => here, you're veritably missing an article...
>
> "relation which" technically means that the *relation* is "is further limited
> by"...
>
> > > Patch changes
> > > -
> > > 1.
> > > - * and index cleanup with parallel workers unless the parallel vacuum is
> > > + * and index cleanup with parallel workers unless parallel vacuum is
> > >
> > > As per my understanding 'parallel vacuum' is a noun phrase, so we
> > > should have a determiner before it.
> >
> > Changed as per your suggestion.
>
> Thanks (I think you meant an "article").
>
> > > - * We allow each worker to update it as and when it has incurred any 
> > > cost and
> > > + * We allow each worker to update it AS AND WHEN it has incurred any 
> > > cost and
> > >
> > > I don't see why it is necessary to make this part bold?  We are using
> > > it at one other place in the code in the way it is used here.
> > >
> >
> > Kept the existing text as it is.
>
> I meant this as a question.  I'm not sure what "as and when means" ?  "If and
> when" ?
>
> > I have combined both of your patches.  Let me know if you have any
> > more suggestions, otherwise, I am planning to push this tomorrow.
>
> In the meantime, I found some more issues, so I rebased on top of your patch 
> so
> you can review it.
>

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BUG #16346: pg_upgrade fails on a trigger with a comment

2020-04-08 Thread Hamid Akhtar
I have tested the patch in a little more detail.
(1) Verified that it fixes the bug.
(2) Ran regression tests; all are passing.

To recap, the attached patch moves restoration of comments to
RESTORE_PASS_POST_ACL. This ensures that comments are restored in a pass
in which essentially all required objects have already been created,
including event triggers and materialized views (and any other db
objects).

This patch is good from my side.
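
For context, pg_restore picks a restore pass per TOC entry (see
_tocEntryRestorePass() in pg_backup_archiver.c).  Below is a simplified sketch
of the idea, with COMMENT entries routed to the last pass as described above;
the function name and exact rules here are illustrative, not the actual code.

/* Simplified sketch, assuming the RestorePass enum and RESTORE_PASS_*
 * values from pg_backup_archiver.h: entries in RESTORE_PASS_POST_ACL are
 * replayed last, after event triggers and matviews exist. */
static RestorePass
toc_entry_restore_pass_sketch(const char *desc)
{
	if (strcmp(desc, "ACL") == 0 ||
		strcmp(desc, "DEFAULT ACL") == 0)
		return RESTORE_PASS_ACL;

	if (strcmp(desc, "EVENT TRIGGER") == 0 ||
		strcmp(desc, "MATERIALIZED VIEW DATA") == 0 ||
		strcmp(desc, "COMMENT") == 0)	/* the change discussed here */
		return RESTORE_PASS_POST_ACL;

	return RESTORE_PASS_MAIN;
}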

On Wed, Apr 8, 2020 at 1:10 AM Hamid Akhtar  wrote:

> As you have mentioned, I have verified that indeed commit 4c40b27b broke
> this.
>
> This particular commit moves restoration of materialized views and
> event triggers to the last phase. Perhaps comments should also be moved to
> this phase, as there may be comments on either of these types of objects.
>
> Attached is a patch that resolves this issue. I've verified that it
> resolves the upgrade (and restore) issue introduced by 4c40b27b. I'll test
> this patch in a little more detail tomorrow.
>
> On Mon, Apr 6, 2020 at 8:26 PM PG Bug reporting form <
> nore...@postgresql.org> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:  16346
>> Logged by:  Alexander Lakhin
>> Email address:  exclus...@gmail.com
>> PostgreSQL version: 12.2
>> Operating system:   Ubuntu 18.04
>> Description:
>>
>> When using pg_upgrade on a database with the following contents:
>> CREATE FUNCTION public.test_event_trigger() RETURNS event_trigger
>> LANGUAGE plpgsql
>> AS $$
>> BEGIN
>> RAISE NOTICE 'test_event_trigger: % %', tg_event, tg_tag;
>> END
>> $$;
>>
>> CREATE EVENT TRIGGER regress_event_trigger3 ON ddl_command_start
>>EXECUTE PROCEDURE public.test_event_trigger();
>>
>> COMMENT ON EVENT TRIGGER regress_event_trigger3 IS 'test comment';
>>
>> I get:
>> Restoring global objects in the new cluster ok
>> Restoring database schemas in the new cluster
>>   postgres
>> *failure*
>>
>> Consult the last few lines of "pg_upgrade_dump_14174.log" for
>> the probable cause of the failure.
>> Failure, exiting
>>
>> pg_upgrade_dump_14174.log contains:
>> command: "/src/postgres/tmp_install/usr/local/pgsql/bin/pg_restore" --host
>> /src/postgres --port 50432 --username postgres --clean --create
>> --exit-on-error --verbose --dbname template1
>> "pg_upgrade_dump_14174.custom"
>> >> "pg_upgrade_dump_14174.log" 2>&1
>> pg_restore: connecting to database for restore
>> pg_restore: dropping DATABASE PROPERTIES postgres
>> pg_restore: dropping DATABASE postgres
>> pg_restore: creating DATABASE "postgres"
>> pg_restore: connecting to new database "postgres"
>> pg_restore: connecting to database "postgres" as user "postgres"
>> pg_restore: creating COMMENT "DATABASE "postgres""
>> pg_restore: creating DATABASE PROPERTIES "postgres"
>> pg_restore: connecting to new database "postgres"
>> pg_restore: connecting to database "postgres" as user "postgres"
>> pg_restore: creating pg_largeobject "pg_largeobject"
>> pg_restore: creating FUNCTION "public.test_event_trigger()"
>> pg_restore: creating COMMENT "EVENT TRIGGER "regress_event_trigger3""
>> pg_restore: while PROCESSING TOC:
>> pg_restore: from TOC entry 3705; 0 0 COMMENT EVENT TRIGGER
>> "regress_event_trigger3" postgres
>> pg_restore: error: could not execute query: ERROR:  event trigger
>> "regress_event_trigger3" does not exist
>> Command was: COMMENT ON EVENT TRIGGER "regress_event_trigger3" IS 'test
>> comment';
>>
>> It looks like the commit 4c40b27b broke this.
>>
>>
>
> --
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
> ADDR: 10318 WHALLEY BLVD, Surrey, BC
> CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
> SKYPE: engineeredvirus
>


-- 
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950  EMAIL: mailto:hamid.akh...@highgo.ca
SKYPE: engineeredvirus


Re: 2pc leaks fds

2020-04-08 Thread Antonin Houska
Antonin Houska  wrote:

> Andres Freund  wrote:
> > But I'm not sure it's quite the right idea. I'm not sure I fully
> > understand the design of 0dc8ead46, but it looks to me like it's
> > intended to allow users of the interface to have different ways of
> > opening files. If we just close() the fd that'd be a bit more limited.
> 
> It should have allowed users to have different ways to *locate the segment*
> file. The WALSegmentOpen callback could actually return file path instead of
> the file descriptor and let WALRead() perform the opening/closing, but then
> the WALRead function would need to be aware whether it is executing in backend
> or in frontend (so it can use the correct function to open/close the file).

Well, #ifdef FRONTEND can be used to distinguish the caller of
WALRead(). However, now that I have tried to adjust the API, I see that
pg_waldump.c:WALDumpOpenSegment uses specific logic to open the file. So if
the callback only returned the file name, there would be no suitable place for
the things that WALDumpOpenSegment does.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Kuntal Ghosh
Hi,

On Wed, Apr 8, 2020 at 7:07 AM Etsuro Fujita  wrote:
>
> Pushed after modifying some comments further, based on the suggestions
> of Ashutosh.
I'm getting the following warning during compilation.

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ [-Wunused-variable]
  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 ^
To fix this, we can declare inner_binfo as
PG_USED_FOR_ASSERTS_ONLY, as it is not used for any other purpose.
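
For reference, a minimal illustration (mine, not taken from the patch) of that
annotation: PG_USED_FOR_ASSERTS_ONLY, from c.h, marks a variable that is
referenced only inside Assert(), so non-assert builds do not warn about it
being unused.

/* Illustration only: silence -Wunused-variable for an assert-only variable. */
static void
check_merge_strategies(PartitionBoundInfo outer_binfo,
					   PartitionBoundInfo inner_binfo_arg)
{
	PartitionBoundInfo inner_binfo PG_USED_FOR_ASSERTS_ONLY = inner_binfo_arg;

	Assert(outer_binfo->strategy == inner_binfo->strategy);
}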

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread movead...@highgo.ca

>> we should notice it in document I think. 
>There is the following explanation about the relationship between recovery
>pause and promotion in the document. You may want to add more
>descriptions to the docs?
>--
>If a promotion is triggered while recovery is paused, the paused
>state ends and a promotion continues.
>--

For example, we could add these words:
The first time pg_wal_replay_pause() is called during a recovery in which
a promotion has been triggered, pg_wal_replay_pause() reports success but
does not actually pause the recovery.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


Re: [HACKERS] Restricting maximum keep segments by repslots

2020-04-08 Thread Kyotaro Horiguchi
At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> I saw another issue, the following sequence on the primary freezes
> when invalidation happens.
> 
> =# create table tt(); drop table tt; select pg_switch_wal();create table 
> tt(); drop table tt; select pg_switch_wal();create table tt(); drop table tt; 
> select pg_switch_wal(); checkpoint;
> 
> The last checkpoint command is waiting for CV on
> CheckpointerShmem->start_cv in RequestCheckpoint(), while Checkpointer
> is waiting for the next latch at the end of
> CheckpointerMain. new_started doesn't move but it is the same value
> with old_started.
> 
> That freeze didn't happen when I removed
> ConditionVariableSleep(&s->active_cv) in
> InvalidateObsoleteReplicationSlots.
> 
> I continue investigating it.

I understand how it happens.

The latch set by the checkpoint request from the CHECKPOINT command has
been absorbed by ConditionVariableSleep() in
InvalidateObsoleteReplicationSlots().  The attached patch allows the
checkpointer to use MyLatch for purposes other than checkpoint requests
while a checkpoint is running.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 6b877f11f557fc76f206e7a71ff7890952bf63d4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 8 Apr 2020 16:35:25 +0900
Subject: [PATCH] Allow MyLatch of checkpointer for other use.

MyLatch of checkpointer process was used only to request for a
checkpoint.  Checkpoint can miss a request if the latch is used for
other purposes during a checkpoint.  Allow MyLatch be used for other
purposes such as condition variables by recording pending checkpoint
requests.
---
 src/backend/postmaster/checkpointer.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index e354a78725..86c355f035 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -160,6 +160,12 @@ static double ckpt_cached_elapsed;
 static pg_time_t last_checkpoint_time;
 static pg_time_t last_xlog_switch_time;
 
+/*
+ * Record checkpoint requests.  Since MyLatch is used other than
+ * CheckpointerMain, we need to record pending checkpoint request here.
+ */
+static bool CheckpointRequestPending = false;
+
 /* Prototypes for private functions */
 
 static void HandleCheckpointerInterrupts(void);
@@ -335,6 +341,7 @@ CheckpointerMain(void)
 
 		/* Clear any already-pending wakeups */
 		ResetLatch(MyLatch);
+		CheckpointRequestPending = false;
 
 		/*
 		 * Process any requests or signals received recently.
@@ -494,6 +501,10 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Don't sleep if pending request exists */
+		if (CheckpointRequestPending)
+			continue;
+
 		/*
 		 * Sleep until we are signaled or it's time for another checkpoint or
 		 * xlog file switch.
@@ -817,6 +828,7 @@ ReqCheckpointHandler(SIGNAL_ARGS)
 	 */
 	SetLatch(MyLatch);
 
+	CheckpointRequestPending = true;
 	errno = save_errno;
 }
 
-- 
2.18.2



Re: 2pc leaks fds

2020-04-08 Thread Antonin Houska
Andres Freund  wrote:

> On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > Andres Freund  wrote:
> > It should have allowed users to have different ways to *locate the segment*
> > file. The WALSegmentOpen callback could actually return file path instead of
> > the file descriptor and let WALRead() perform the opening/closing, but then
> > the WALRead function would need to be aware whether it is executing in 
> > backend
> > or in frontend (so it can use the correct function to open/close the file).
> > 
> > I was aware of the problem that the correct function should be used to open
> > the file and that's why this comment was added (although "mandatory" would 
> > be
> > more suitable than "preferred"):
> > 
> >  * BasicOpenFile() is the preferred way to open the segment file in backend
> >  * code, whereas open(2) should be used in frontend.
> >  */
> > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext 
> > *segcxt,
> >TimeLineID *tli_p);
> 
> I don't think that BasicOpenFile() really solves anything here? If
> anything it *exascerbates* the problem, because it will trigger all of
> the "virtual file descriptors" for already opened Files to close() the
> underlying OS FDs.  So not even a fully cached table can be seqscanned,
> because that tries to check the file size...

Specifically for 2PC, isn't it better to close some OS-level FD of an
unrelated table scan and then succeed than to ERROR immediately? Anyway,
0dc8ead46 hasn't changed this.

I at least admit that the comment should not recommend a particular function,
and that WALRead() should call the appropriate function to close the file,
rather than always calling close().

Can we just pass the existing FD to the callback as an additional argument,
and let the callback close it?
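
As a purely hypothetical sketch of that idea (not an agreed API; the extra
prev_fd argument and its semantics are only illustrative):

/*
 * Hypothetical variant, not the actual xlogreader.h declaration: the
 * previously opened descriptor is handed to the callback, which closes it
 * with whatever function matches how it was opened and then opens the next
 * segment.  -1 would be passed on the first call.
 */
typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
							   TimeLineID *tli_p, int prev_fd);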

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: [HACKERS] Restricting maximum keep segments by repslots

2020-04-08 Thread Kyotaro Horiguchi
At Wed, 08 Apr 2020 16:46:05 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 08 Apr 2020 14:19:56 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> The latch triggered by checkpoint request by CHECKPOINT command has
> been absorbed by ConditionVariableSleep() in
> InvalidateObsoleteReplicationSlots.  The attached allows checkpointer
> use MyLatch for other than checkpoint request while a checkpoint is
> running.

A checkpoint request that arrives while we are waiting on the CV causes a
spurious wakeup, but that does no harm.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Proposal] Global temporary tables

2020-04-08 Thread 曾文旌


> On Apr 7, 2020, at 6:22 PM, Prabhat Sahu wrote:
> 
> Thanks for review.
> This parameter should support all types of writing of the bool type like 
> parameter autovacuum_enabled.
> So I fixed in global_temporary_table_v24-pg13.patch.
> 
> Thank you Wenjing for the new patch with the fix and the "VACUUM FULL GTT" 
> support.
> I have verified the above issue now its resolved.
> 
> Please check the below findings on VACUUM FULL.
> 
> postgres=# create global temporary table  gtt(c1 int) on commit preserve rows;
> CREATE TABLE
> postgres=# vacuum FULL ;
> WARNING:  global temp table oldest FrozenXid is far in the past
> HINT:  please truncate them or kill those sessions that use them.
> VACUUM

This is expected.
It means that the GTT's FrozenXid is the oldest in the entire db, and the DBA
should vacuum the GTT if they want to advance the db's datfrozenxid.
They can also use the function pg_list_gtt_relfrozenxids() to check which
sessions have "too old" data, and truncate them or kill those sessions.



> 
> -- 
> With Regards,
> Prabhat Kumar Sahu
> EnterpriseDB: http://www.enterprisedb.com 



smime.p7s
Description: S/MIME cryptographic signature


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Masahiko Sawada
On Wed, 8 Apr 2020 at 16:04, Amit Kapila  wrote:
>
> On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> > >
> > >
> > > Thanks for the investigation.  I don't see we can do anything special
> > > about this.  In an ideal world, this should be done once and not for
> > > each worker but I guess it doesn't matter too much.  I am not sure if
> > > it is worth adding a comment for this, what do you think?
> > >
> >
> > I agree with you. If the differences were considerably large probably
> > we would do something but I think we don't need to anything at this
> > time.
> >
>
> Fair enough, can you once check this in back-branches as this needs to
> be backpatched?  I will do that once by myself as well.

I've done the same test with HEAD of both REL_12_STABLE and
REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
parallel index creation was introduced. I've attached the patches
for PG12 and PG11 I used for this test for reference.

Here are the results:

* PG12

With no worker:
-[ RECORD 1 ]---+-
shared_blks_hit | 119
shared_blks_read| 44283
total_read_blks | 44402
shared_blks_dirtied | 44262
shared_blks_written | 24925

With 4 workers:
-[ RECORD 1 ]---+
shared_blks_hit | 128
shared_blks_read| 8844
total_read_blks | 8972
shared_blks_dirtied | 8822
shared_blks_written | 5393

With 4 workers after patching:
-[ RECORD 1 ]---+
shared_blks_hit | 140
shared_blks_read| 44284
total_read_blks | 44424
shared_blks_dirtied | 44262
shared_blks_written | 26574

* PG11

With no worker:
-[ RECORD 1 ]---+
shared_blks_hit | 124
shared_blks_read| 44284
total_read_blks | 44408
shared_blks_dirtied | 44263
shared_blks_written | 24908

With 4 workers:
-[ RECORD 1 ]---+-
shared_blks_hit | 132
shared_blks_read| 8910
total_read_blks | 9042
shared_blks_dirtied | 
shared_blks_written | 5370

With 4 workers after patching:
-[ RECORD 1 ]---+-
shared_blks_hit | 144
shared_blks_read| 44285
total_read_blks | 44429
shared_blks_dirtied | 44263
shared_blks_written | 26861


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index edc4a82b02..da5b39eb02 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -67,6 +67,7 @@
 #include "access/xloginsert.h"
 #include "catalog/index.h"
 #include "commands/progress.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/smgr.h"
@@ -81,6 +82,7 @@
 #define PARALLEL_KEY_TUPLESORT			UINT64CONST(0xA002)
 #define PARALLEL_KEY_TUPLESORT_SPOOL2	UINT64CONST(0xA003)
 #define PARALLEL_KEY_QUERY_TEXT			UINT64CONST(0xA004)
+#define PARALLEL_KEY_BUFFER_USAGE		UINT64CONST(0xA005)
 
 /*
  * DISABLE_LEADER_PARTICIPATION disables the leader's participation in
@@ -203,6 +205,7 @@ typedef struct BTLeader
 	Sharedsort *sharedsort;
 	Sharedsort *sharedsort2;
 	Snapshot	snapshot;
+	BufferUsage *bufferusage;
 } BTLeader;
 
 /*
@@ -1336,6 +1339,7 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	Sharedsort *sharedsort2;
 	BTSpool*btspool = buildstate->spool;
 	BTLeader   *btleader = (BTLeader *) palloc0(sizeof(BTLeader));
+	BufferUsage *bufferusage;
 	bool		leaderparticipates = true;
 	char	   *sharedquery;
 	int			querylen;
@@ -1388,6 +1392,17 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 		shm_toc_estimate_keys(&pcxt->estimator, 3);
 	}
 
+	/*
+	 * Estimate space for BufferUsage -- PARALLEL_KEY_BUFFER_USAGE.
+	 *
+	 * If there are no extensions loaded that care, we could skip this.  We
+	 * have no way of knowing whether anyone's looking at pgBufferUsage,
+	 * so do it unconditionally.
+	 */
+	shm_toc_estimate_chunk(&pcxt->estimator,
+		   mul_size(sizeof(BufferUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 	/* Finally, estimate PARALLEL_KEY_QUERY_TEXT space */
 	querylen = strlen(debug_query_string);
 	shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
@@ -1459,6 +1474,11 @@ _bt_begin_parallel(BTBuildState *buildstate, bool isconcurrent, int request)
 	memcpy(sharedquery, debug_query_string, querylen + 1);
 	shm_toc_insert(pcxt->toc, PARALLEL_KEY_QUERY_TEXT, sharedquery);
 
+	/* Allocate space for each worker's BufferUsage; no need to initialize */
+	bufferusage = shm_toc_allocate(pcxt->toc,
+   mul_size(sizeof(BufferUsage), pcxt->nworkers));
+	shm_toc_insert(pcxt->toc, PARALLEL_KEY_BUFFER_USAGE, bufferusage);
+
 	/* Launch workers, saving status for leader/caller */
 	LaunchParallelWorkers(pcxt);
 	btleader->pcxt = pcxt;
@@ -1469,6 +1489,7 @@ _bt_begin_paral

Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-08 Thread Fujii Masao



On 2020/04/03 16:26, Julien Rouhaud wrote:

On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote:

Fujii Masao-4 wrote

On 2020/04/01 18:19, Fujii Masao wrote:

Finally I pushed the patch!
Many thanks for all involved in this patch!

As a remaining TODO item, I'm thinking that the document would need to
be improved. For example, previously the query was not stored in pgss
when it failed. But, in v13, if pgss_planning is enabled, such a query is
stored because the planning succeeds. Without the explanation about
that behavior in the document, I'm afraid that users will get confused.
Thought?


Thank you all for this work and especially to Julian for its major
contribution !



Thanks a lot to everyone!  This was quite a long journey.



Regarding the TODO point: Yes I agree that it can be improved.
My proposal:

"Note that planning and execution statistics are updated only at their
respective end phase, and only for successfull operations.
For exemple executions counters of a long running SELECT query,
will be updated at the execution end, without showing any progress
report in the interval.
Other exemple, if the statement is successfully planned but fails in
the execution phase, only its planning statistics are stored.
This may give uncorrelated plans vs calls informations."


Thanks for the proposal!


There are numerous reasons for lack of correlation between number of planning
and number of execution, so I'm afraid that this will give users the false
impression that only failed execution can lead to that.

Here's some enhancement on your proposal:

"Note that planning and execution statistics are updated only at their
respective end phase, and only for successful operations.
For example the execution counters of a long running query
will only be updated at the execution end, without showing any progress
report before that.


Since this is probably not an example that explains the relationship between
planning and execution stats, it's better to explain this separately or just
drop it?

What about the attached patch based on your proposals?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/doc/src/sgml/pgstatstatements.sgml 
b/doc/src/sgml/pgstatstatements.sgml
index 3d26108649..5a962feb39 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -129,7 +129,7 @@
   calls
   bigint
   
-  Number of times executed
+  Number of times the statement was executed
  
 
  
@@ -398,6 +398,16 @@
reducing pg_stat_statements.max to prevent
recurrences.
   
+
+  
+   plans and calls aren't
+   always expected to match because planning and execution statistics are
+   updated at their respective end phase, and only for successful operations.
+   For example, if a statement is successfully planned but fails during
+   the execution phase, only its planning statistics will be updated.
+   If planning is skipped because a cached plan is used, only its execution
+   statistics will be updated.
+  
  
 
  


Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)

2020-04-08 Thread Amit Kapila
On Wed, Apr 8, 2020 at 1:49 PM Masahiko Sawada
 wrote:
>
> On Wed, 8 Apr 2020 at 16:04, Amit Kapila  wrote:
> >
> > On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
> >  wrote:
> > >
> > > On Wed, 8 Apr 2020 at 14:44, Amit Kapila  wrote:
> > > >
> > > >
> > > > Thanks for the investigation.  I don't see we can do anything special
> > > > about this.  In an ideal world, this should be done once and not for
> > > > each worker but I guess it doesn't matter too much.  I am not sure if
> > > > it is worth adding a comment for this, what do you think?
> > > >
> > >
> > > I agree with you. If the differences were considerably large probably
> > > we would do something but I think we don't need to anything at this
> > > time.
> > >
> >
> > Fair enough, can you once check this in back-branches as this needs to
> > be backpatched?  I will do that once by myself as well.
>
> I've done the same test with HEAD of both REL_12_STABLE and
> REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
> parallel index creation was introduced. I've attached the patches
> for PG12 and PG11 I used for this test for reference.
>

Thanks, I will once again verify and push this tomorrow if there are
no other comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[PATCH] RUM Postgres 13 patch

2020-04-08 Thread Pavel Borisov
Dear hackers!
Due to changes in PG13, the RUM extension has errors when compiling.
I propose a short patch to correct this.

Best regards,

Pavel Borisov


rum_pg13_patch.diff
Description: Binary data


Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread Fujii Masao




On 2020/04/08 16:41, movead...@highgo.ca wrote:


> >> we should notice it in document I think.
>
> > There is the following explanation about the relationship between recovery
> > pause and promotion in the document. You may want to add more
> > descriptions to the docs?
> >
> > --
> > If a promotion is triggered while recovery is paused, the paused
> > state ends and a promotion continues.
> > --
>
> For example, we could add these words:
> The first time pg_wal_replay_pause() is called during a recovery in which
> a promotion has been triggered, pg_wal_replay_pause() reports success but
> does not actually pause the recovery.


I think this is not true. In your case, pg_wal_replay_pause() succeeded
because the startup process had not detected the promotion request yet
at that moment.

To cover your case, what about adding the following description?

-
There can be a delay between a promotion request by users and the actual
triggering of a promotion in the server. Note that pg_wal_replay_pause()
succeeds during that delay, i.e., until a promotion is actually triggered.
-

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: adding partitioned tables to publications

2020-04-08 Thread Peter Eisentraut

On 2020-04-08 07:45, Amit Langote wrote:

On Wed, Apr 8, 2020 at 1:22 AM Amit Langote  wrote:

On Tue, Apr 7, 2020 at 6:01 PM Peter Eisentraut
 wrote:

The descriptions of the new fields in RelationSyncEntry don't seem to
match the code accurately, or at least it's confusing.
replicate_as_relid is always filled in with an ancestor, even if
pubviaroot is not set.


Given this confusion, I have changed how replicate_as_relid works so
that it's now always set -- if different from the relation's own OID,
the code for "publishing via root" kicks in in various places.


I think the pubviaroot field is actually not necessary.  We only need
replicate_as_relid.


Looking through the code, I agree.  I guess I only kept it around to
go with pubupdate, etc.


Think I broke truncate replication with this.  Fixed in the attached
updated patch.


All committed.

Thank you and everyone very much for working on this.  I'm very happy 
that these two features from PG10 have finally met. :)


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] RUM Postgres 13 patch

2020-04-08 Thread Alexander Korotkov
Hi, Pavel!

On Wed, Apr 8, 2020 at 12:14 PM Pavel Borisov  wrote:
> Due to changes in PG13 RUM extension had errors on compiling.
> I propose a short patch to correct this.

RUM is an extension managed by Postgres Pro and not discussed in
pgsql-hackers mailing lists.  Please, make a pull request on github.
https://github.com/postgrespro/rum

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-08 Thread Julien Rouhaud
On Wed, Apr 08, 2020 at 05:37:27PM +0900, Fujii Masao wrote:
> 
> 
> On 2020/04/03 16:26, Julien Rouhaud wrote:
> > On Thu, Apr 02, 2020 at 01:04:28PM -0700, legrand legrand wrote:
> > > Fujii Masao-4 wrote
> > > > On 2020/04/01 18:19, Fujii Masao wrote:
> > > > 
> > > > Finally I pushed the patch!
> > > > Many thanks for all involved in this patch!
> > > > 
> > > > As a remaining TODO item, I'm thinking that the document would need to
> > > > be improved. For example, previously the query was not stored in pgss
> > > > when it failed. But, in v13, if pgss_planning is enabled, such a query 
> > > > is
> > > > stored because the planning succeeds. Without the explanation about
> > > > that behavior in the document, I'm afraid that users will get confused.
> > > > Thought?
> > > 
> > > Thank you all for this work and especially to Julian for its major
> > > contribution !
> > 
> > 
> > Thanks a lot to everyone!  This was quite a long journey.
> > 
> > 
> > > Regarding the TODO point: Yes I agree that it can be improved.
> > > My proposal:
> > > 
> > > "Note that planning and execution statistics are updated only at their
> > > respective end phase, and only for successfull operations.
> > > For exemple executions counters of a long running SELECT query,
> > > will be updated at the execution end, without showing any progress
> > > report in the interval.
> > > Other exemple, if the statement is successfully planned but fails in
> > > the execution phase, only its planning statistics are stored.
> > > This may give uncorrelated plans vs calls informations."
> 
> Thanks for the proposal!
> 
> > There are numerous reasons for lack of correlation between number of 
> > planning
> > and number of execution, so I'm afraid that this will give users the false
> > impression that only failed execution can lead to that.
> > 
> > Here's some enhancement on your proposal:
> > 
> > "Note that planning and execution statistics are updated only at their
> > respective end phase, and only for successful operations.
> > For example the execution counters of a long running query
> > will only be updated at the execution end, without showing any progress
> > report before that.
> 
> Probably since this is not the example for explaining the relationship of
> planning and execution stats, it's better to explain this separately or just
> drop it?
> 
> What about the attached patch based on your proposals?
> 

Thanks Fujii-san, it looks perfect to me!




Re: recovery_target_action=pause with confusing hint

2020-04-08 Thread movead...@highgo.ca

>To cover your case, what about adding the following description?
 
>-
>There can be a delay between a promotion request by users and the actual
>triggering of a promotion in the server. Note that pg_wal_replay_pause()
>succeeds during that delay, i.e., until a promotion is actually triggered.
>-
Yes that's it.




Regards,
Highgo Software (Canada/China/Pakistan) 
URL : www.highgo.ca 
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca


[Patch] Use internal pthreads reimplementation only when building with MSVC

2020-04-08 Thread Sandro Mani

Hi

The following patch, which we added to build mingw-postgresql on Fedora,
makes the internal minimal pthreads reimplementation be used only when
building with MSVC, as on MinGW it causes symbol collisions with the
symbols provided by winpthreads.


Thanks
Sandro


diff -rupN postgresql-11.5/src/interfaces/ecpg/ecpglib/misc.c 
postgresql-11.5-new/src/interfaces/ecpg/ecpglib/misc.c
--- postgresql-11.5/src/interfaces/ecpg/ecpglib/misc.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/ecpg/ecpglib/misc.c 2020-04-08 
11:20:39.850738296 +0200

@@ -449,7 +449,7 @@ ECPGis_noind_null(enum ECPGttype type, c
 return false;
 }

-#ifdef WIN32
+#ifdef _MSC_VER
 #ifdef ENABLE_THREAD_SAFETY

 void
diff -rupN 
postgresql-11.5/src/interfaces/ecpg/include/ecpg-pthread-win32.h 
postgresql-11.5-new/src/interfaces/ecpg/include/ecpg-pthread-win32.h
--- postgresql-11.5/src/interfaces/ecpg/include/ecpg-pthread-win32.h 
2019-08-05 23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/ecpg/include/ecpg-pthread-win32.h 
2020-04-08 11:20:39.851738296 +0200

@@ -7,7 +7,7 @@

 #ifdef ENABLE_THREAD_SAFETY

-#ifndef WIN32
+#ifndef _MSC_VER

 #include 
 #else
diff -rupN postgresql-11.5/src/interfaces/libpq/fe-connect.c 
postgresql-11.5-new/src/interfaces/libpq/fe-connect.c
--- postgresql-11.5/src/interfaces/libpq/fe-connect.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/fe-connect.c 2020-04-08 
11:20:39.853738297 +0200

@@ -50,7 +50,7 @@
 #endif

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/fe-secure.c 
postgresql-11.5-new/src/interfaces/libpq/fe-secure.c
--- postgresql-11.5/src/interfaces/libpq/fe-secure.c    2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/fe-secure.c 2020-04-08 
11:20:39.854738297 +0200

@@ -48,7 +48,7 @@
 #include 

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/fe-secure-openssl.c 
postgresql-11.5-new/src/interfaces/libpq/fe-secure-openssl.c
--- postgresql-11.5/src/interfaces/libpq/fe-secure-openssl.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/fe-secure-openssl.c 
2020-04-08 11:20:39.855738298 +0200

@@ -47,7 +47,7 @@
 #include 

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/libpq-int.h 
postgresql-11.5-new/src/interfaces/libpq/libpq-int.h
--- postgresql-11.5/src/interfaces/libpq/libpq-int.h    2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/libpq-int.h 2020-04-08 
11:20:39.855738298 +0200

@@ -29,7 +29,7 @@
 #endif

 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
+#ifdef _MSC_VER
 #include "pthread-win32.h"
 #else
 #include 
diff -rupN postgresql-11.5/src/interfaces/libpq/pthread-win32.c 
postgresql-11.5-new/src/interfaces/libpq/pthread-win32.c
--- postgresql-11.5/src/interfaces/libpq/pthread-win32.c 2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/pthread-win32.c 2020-04-08 
11:21:51.674766968 +0200

@@ -10,10 +10,13 @@
 *-
 */

+#ifdef _MSC_VER
+
 #include "postgres_fe.h"

 #include "pthread-win32.h"

+
 DWORD
 pthread_self(void)
 {
@@ -58,3 +61,5 @@ pthread_mutex_unlock(pthread_mutex_t *mp
 LeaveCriticalSection(*mp);
 return 0;
 }
+
+#endif // _MSC_VER





[Patch] Add missing libraries to Libs.private of libpq.pc

2020-04-08 Thread Sandro Mani

Hello

The following patch, which we added to build mingw-postgresql on Fedora, 
adds some missing libraries to Libs.private of libpq.pc, discovered when 
attempting to statically link with libpq:


-lz: is required by -lcrypto
-liconv: is required by -lintl (though possibly depends on whether 
gettext was compiled with iconv support)


Thanks
Sandro


diff -rupN postgresql-11.5/src/interfaces/libpq/Makefile 
postgresql-11.5-new/src/interfaces/libpq/Makefile
--- postgresql-11.5/src/interfaces/libpq/Makefile    2019-08-05 
23:14:59.0 +0200
+++ postgresql-11.5-new/src/interfaces/libpq/Makefile 2020-04-07 
13:49:00.801203610 +0200

@@ -80,10 +80,10 @@ endif
 ifneq ($(PORTNAME), win32)
 SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto 
-lkrb5 -lgssapi_krb5 -lgss -lgssapi -lssl -lsocket -lnsl -lresolv 
-lintl, $(LIBS)) $(LDAP_LIBS_FE) $(PTHREAD_LIBS)

 else
-SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lk5crypto 
-lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl $(PTHREAD_LIBS), 
$(LIBS)) $(LDAP_LIBS_FE)
+SHLIB_LINK += $(filter -lcrypt -ldes -lcom_err -lcrypto -lz -lk5crypto 
-lkrb5 -lgssapi32 -lssl -lsocket -lnsl -lresolv -lintl $(PTHREAD_LIBS), 
$(LIBS)) $(LDAP_LIBS_FE)

 endif
 ifeq ($(PORTNAME), win32)
-SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 $(filter -leay32 -lssleay32 
-lcomerr32 -lkrb5_32, $(LIBS))
+SHLIB_LINK += -lshell32 -lws2_32 -lsecur32 -liconv $(filter -leay32 
-lssleay32 -lcomerr32 -lkrb5_32, $(LIBS))

 endif

 SHLIB_EXPORTS = exports.txt





Re: 2pc leaks fds

2020-04-08 Thread Ahsan Hadi
I have tested with and without the commit from Andres using the pgbench
script (below) provided in the initial email.

pgbench -n -s 500 -c 4 -j 4 -T 10 -P1 -f pgbench-write-2pc.sql

I am not getting the leak anymore, it seems to be holding up pretty well.


On Wed, Apr 8, 2020 at 12:59 PM Antonin Houska  wrote:

> Andres Freund  wrote:
>
> > On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > > Andres Freund  wrote:
> > > It should have allowed users to have different ways to *locate the
> segment*
> > > file. The WALSegmentOpen callback could actually return file path
> instead of
> > > the file descriptor and let WALRead() perform the opening/closing, but
> then
> > > the WALRead function would need to be aware whether it is executing in
> backend
> > > or in frontend (so it can use the correct function to open/close the
> file).
> > >
> > > I was aware of the problem that the correct function should be used to
> open
> > > the file and that's why this comment was added (although "mandatory"
> would be
> > > more suitable than "preferred"):
> > >
> > >  * BasicOpenFile() is the preferred way to open the segment file in
> backend
> > >  * code, whereas open(2) should be used in frontend.
> > >  */
> > > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext
> *segcxt,
> > >TimeLineID
> *tli_p);
> >
> > I don't think that BasicOpenFile() really solves anything here? If
> > anything it *exascerbates* the problem, because it will trigger all of
> > the "virtual file descriptors" for already opened Files to close() the
> > underlying OS FDs.  So not even a fully cached table can be seqscanned,
> > because that tries to check the file size...
>
> Specifically for 2PC, isn't it better to close some OS-level FD of an
> unrelated table scan and then succeed than to ERROR immediately? Anyway,
> 0dc8ead46 hasn't changed this.
>
> I at least admit that the comment should not recommend a particular function,
> and that WALRead() should call the appropriate function to close the file,
> rather than always calling close().
>
> Can we just pass the existing FD to the callback as an additional argument,
> and let the callback close it?
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
>
>

-- 
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.h...@highgo.ca


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Etsuro Fujita
Hi Kuntal,

On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh  wrote:
> I'm getting the following warning during compilation.
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
> [-Wunused-variable]
>   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>  ^
> For fixing the same, we can declare inner_binfo as
> PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.

I'd propose to remove an assertion causing this (and  the
outer_binfo/inner_binfo variables) from partition_bounds_merge(),
rather than doing so, because the assertion is redundant, as we have
the same assertion in merge_list_bounds() and merge_range_bounds().
Please find attached a patch.

Best regards,
Etsuro Fujita


fix-compiler-warning.patch
Description: Binary data


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Ashutosh Bapat
Thanks Kuntal for the report. Let me know if this patch works for you.

On Wed, 8 Apr 2020 at 13:00, Kuntal Ghosh 
wrote:

> Hi,
>
> On Wed, Apr 8, 2020 at 7:07 AM Etsuro Fujita 
> wrote:
> >
> > Pushed after modifying some comments further, based on the suggestions
> > of Ashutosh.
> I'm getting the following warning during compilation.
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
>   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>  ^
> For fixing the same, we can declare inner_binfo as
> PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>
> --
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
>


-- 
Best Wishes,
Ashutosh
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 7607501fe7..4681441dcc 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -1021,7 +1021,6 @@ partition_bounds_merge(int partnatts,
 	   List **outer_parts, List **inner_parts)
 {
 	PartitionBoundInfo outer_binfo = outer_rel->boundinfo;
-	PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
 
 	/*
 	 * Currently, this function is called only from try_partitionwise_join(),
@@ -1032,7 +1031,7 @@ partition_bounds_merge(int partnatts,
 		   jointype == JOIN_ANTI);
 
 	/* The partitioning strategies should be the same. */
-	Assert(outer_binfo->strategy == inner_binfo->strategy);
+	Assert(outer_binfo->strategy == inner_rel->boundinfo->strategy);
 
 	*outer_parts = *inner_parts = NIL;
 	switch (outer_binfo->strategy)


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Ashutosh Bapat
On Wed, 8 Apr 2020 at 15:42, Etsuro Fujita  wrote:

> Hi Kuntal,
>
> On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh 
> wrote:
> > I'm getting the following warning during compilation.
> >
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
> >   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
> >  ^
> > For fixing the same, we can declare inner_binfo as
> > PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>
> I'd propose to remove an assertion causing this (and  the
> outer_binfo/inner_binfo variables) from partition_bounds_merge(),
> rather than doing so, because the assertion is redundant, as we have
> the same assertion in merge_list_bounds() and merge_range_bounds().
> Please find attached a patch.
>

Oh, I didn't see this mail before sending my other mail.

I think it's better to have the assertion in all three places, and also
in merge_hash_bounds() whenever that comes along. The assertion in
merge_*_bounds() will be good to have in case those functions are called from
places other than partition_bounds_merge(). The assertion in
partition_bounds_merge() will make sure that when the individual
merge_*_bounds() functions are called based on one of the bounds, both of
the bounds have the same strategy.
-- 
Best Wishes,
Ashutosh


Re: [Proposal] Global temporary tables

2020-04-08 Thread tushar

On 4/7/20 2:27 PM, 曾文旌 wrote:
Vacuum full GTT, cluster GTT is already 
supported in global_temporary_table_v24-pg13.patch.

Please refer to the below scenario, where pg_upgrade is failing:
1) Server is up and running (./pg_ctl -D data status)
2) Stop the server (./pg_ctl -D data stop)
3) Connect to the server using single-user mode (./postgres --single -D data
postgres) and create a global temp table

[tushar@localhost bin]$ ./postgres --single -D data1233 postgres

PostgreSQL stand-alone backend 13devel
backend> create global temp table t(n int);

--Press Ctl+D to exit

4) Perform initdb (./initdb -D data123)
5) Run pg_upgrade
[tushar@localhost bin]$ ./pg_upgrade -d data -D data123 -b . -B .
--
--
--
Restoring database schemas in the new cluster
  postgres
*failure*
Consult the last few lines of "pg_upgrade_dump_13592.log" for
the probable cause of the failure.
Failure, exiting

Log file content:

[tushar@localhost bin]$ tail -20   pg_upgrade_dump_13592.log
pg_restore: error: could not execute query: ERROR:  pg_type array OID 
value not set when in binary upgrade mode

Command was:
-- For binary upgrade, must preserve pg_type oid
SELECT 
pg_catalog.binary_upgrade_set_next_pg_type_oid('13594'::pg_catalog.oid);



-- For binary upgrade, must preserve pg_class oids
SELECT 
pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('13593'::pg_catalog.oid);


CREATE GLOBAL TEMPORARY TABLE "public"."t" (
    "n" integer
)
WITH ("on_commit_delete_rows"='false');

-- For binary upgrade, set heap's relfrozenxid and relminmxid
UPDATE pg_catalog.pg_class
SET relfrozenxid = '0', relminmxid = '0'
WHERE oid = '"public"."t"'::pg_catalog.regclass;

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Global temporary tables

2020-04-08 Thread Prabhat Sahu
On Wed, Apr 8, 2020 at 1:48 PM 曾文旌  wrote:

>
>
> On Apr 7, 2020, at 6:22 PM, Prabhat Sahu wrote:
>
> Thanks for review.
>> This parameter should support all types of writing of the bool type like
>> parameter autovacuum_enabled.
>> So I fixed in global_temporary_table_v24-pg13.patch.
>>
>
> Thank you Wenjing for the new patch with the fix and the "VACUUM FULL GTT"
> support.
> I have verified the above issue now its resolved.
>
> Please check the below findings on VACUUM FULL.
>
> postgres=# create global temporary table  gtt(c1 int) on commit preserve
> rows;
> CREATE TABLE
> postgres=# vacuum FULL ;
> WARNING:  global temp table oldest FrozenXid is far in the past
> HINT:  please truncate them or kill those sessions that use them.
> VACUUM
>
>
> This is expected,
> This represents that the GTT FrozenXid is the oldest in the entire db, and
> dba should vacuum the GTT if he want to push the db datfrozenxid.
> Also he can use function pg_list_gtt_relfrozenxids() to check which
> session has "too old” data and truncate them or kill the sessions.
>

Again, regarding the HINT given ("HINT:  please truncate them or kill those
sessions that use them."):
there is only a single session, and if we try "TRUNCATE" and "VACUUM FULL"
the behavior is still the same, as below.

postgres=# truncate gtt ;
TRUNCATE TABLE
postgres=# vacuum full;
WARNING: global temp table oldest FrozenXid is far in the past
HINT: please truncate them or kill those sessions that use them.
VACUUM

I have one more finding related to "CLUSTER table USING index". Please
check the issue below.
postgres=# create global temporary table gtt(c1 int) on commit preserve
rows;
CREATE TABLE
postgres=# create index idx1 ON gtt (c1);
CREATE INDEX

-- exit and re-connect the psql prompt
postgres=# \q
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.

postgres=# cluster gtt using idx1;
WARNING:  relcache reference leak: relation "gtt" not closed
CLUSTER

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel copy

2020-04-08 Thread Amit Kapila
On Tue, Apr 7, 2020 at 7:08 PM Ants Aasma  wrote:
>
> On Tue, 7 Apr 2020 at 08:24, vignesh C  wrote:
> > Leader will create a circular queue
> > and share it across the workers. The circular queue will be present in
> > DSM. Leader will be using a fixed size queue to share the contents
> > between the leader and the workers. Currently we will have 100
> > elements present in the queue. This will be created before the workers
> > are started and shared with the workers. The data structures that are
> > required by the parallel workers will be initialized by the leader,
> > the size required in dsm will be calculated and the necessary keys
> > will be loaded in the DSM. The specified number of workers will then
> > be launched. Leader will read the table data from the file and copy
> > the contents to the queue element by element. Each element in the
> > queue will have 64K size DSA. This DSA will be used to store tuple
> > contents from the file. The leader will try to copy as much content as
> > possible within one 64K DSA queue element. We intend to store at least
> > one tuple in each queue element. There are some cases where the 64K
> > space may not be enough to store a single tuple. Mostly in cases where
> > the table has toast data present and the single tuple can be more than
> > 64K size. In these scenarios we will extend the DSA space accordingly.
> > We cannot change the size of the dsm once the workers are launched.
> > Whereas in case of DSA we can free the dsa pointer and reallocate the
> > dsa pointer based on the memory size required. This is the very reason
> > for choosing DSA over DSM for storing the data that must be inserted
> > into the relation.
>
> I think the element based approach and requirement that all tuples fit
> into the queue makes things unnecessarily complex. The approach I
> detailed earlier allows for tuples to be bigger than the buffer. In
> that case a worker will claim the long tuple from the ring queue of
> tuple start positions, and starts copying it into its local line_buf.
> This can wrap around the buffer multiple times until the next start
> position shows up. At that point this worker can proceed with
> inserting the tuple and the next worker will claim the next tuple.
>

IIUC, with the fixed-size buffer, parallelism might take a bit of a hit
because until the worker copies the data from the shared buffer to its
local buffer, the reader process won't be able to continue.  I think
somewhat more leader-worker coordination will be required with the
fixed buffer size. However, as you pointed out, we can't allow it to
increase to the maximum size possible for all tuples as that might
require a lot of memory.  One idea could be that we allow it for the
first such tuple, and then if any other element/chunk in the queue
requires more memory than the default 64KB, we always fall back to
using the memory we have allocated for the first chunk.  This will
allow us to not use more memory except for one tuple and won't hurt
parallelism much, as in many cases not all tuples will be so large.

I think in the proposed approach a queue element is nothing but a way to
divide the work among workers based on size rather than on the number of
tuples.  If we instead try to divide the work among workers based on
start offsets, it can be more tricky, because it could lead either to a
lot of contention if we choose, say, one offset per worker (basically
copy the data for one tuple, process it and then pick the next tuple),
or to an unequal division of work because some tuples can be smaller and
others bigger.  I guess division based on size would be a better idea.
OTOH, I see the advantage of your approach as well and I will think more
on it.
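
For illustration, the size-based division can be sketched as packing
variable-length tuples into fixed-size chunks that are handed out whole.
This is a standalone toy with made-up names (not the actual patch code),
and a tiny chunk size standing in for the 64KB DSA element:

/*
 * Pack tuples as (length, bytes) pairs into fixed-size chunks; a worker
 * claims a whole chunk, so the unit of distribution is an amount of data
 * rather than a number of tuples.
 */
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 32			/* stand-in for the 64KB DSA element */

typedef struct Chunk
{
	int		used;				/* bytes consumed so far */
	char	data[CHUNK_SIZE];
} Chunk;

/* Append one tuple; returns 0 when the chunk is full and must be handed out. */
static int
chunk_add_tuple(Chunk *c, const char *tup, int len)
{
	if (c->used + (int) sizeof(int) + len > CHUNK_SIZE)
		return 0;
	memcpy(c->data + c->used, &len, sizeof(int));
	memcpy(c->data + c->used + sizeof(int), tup, len);
	c->used += (int) sizeof(int) + len;
	return 1;
}

int
main(void)
{
	const char *tuples[] = {"1,foo", "2,bar", "3,a much longer line", "4,baz"};
	Chunk		chunk = {0};
	int			handed_out = 0;

	for (int i = 0; i < 4; i++)
	{
		if (!chunk_add_tuple(&chunk, tuples[i], (int) strlen(tuples[i])))
		{
			handed_out++;		/* give the chunk to a worker, start a new one */
			chunk.used = 0;
			/* an oversized tuple would need the chunk grown instead */
			chunk_add_tuple(&chunk, tuples[i], (int) strlen(tuples[i]));
		}
	}
	printf("chunks handed out: %d (plus one partial)\n", handed_out);
	return 0;
}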

>
> > We had a couple of options for the way in which queue elements can be 
> > stored.
> > Option 1:  Each element (DSA chunk) will contain tuples such that each
> > tuple will be preceded by the length of the tuple.  So the tuples will
> > be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
> > tuple-2),  Or Option 2: Each element (DSA chunk) will contain only
> > tuples (tuple-1), (tuple-2), .  And we will have a second
> > ring-buffer which contains a start-offset or length of each tuple. The
> > old design used to generate one tuple of data and process tuple by
> > tuple. In the new design, the server will generate multiple tuples of
> > data per queue element. The worker will then process data tuple by
> > tuple. As we are processing the data tuple by tuple, I felt both of
> > the options are almost the same. However Design1 was chosen over
> > Design 2 as we can save up on some space that was required by another
> > variable in each element of the queue.
>
> With option 1 it's not possible to read input data into shared memory
> and there needs to be an extra memcpy in the time critical sequential
> flow of the leader. With option 2 data could be read directly into the
> shared memory buffer. With future async io support, reading an

Re: adding partitioned tables to publications

2020-04-08 Thread Amit Langote
On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
 wrote:
> All committed.
>
> Thank you and everyone very much for working on this.  I'm very happy
> that these two features from PG10 have finally met. :)

Thanks a lot for reviewing and committing.

prion seems to have failed:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13

Also, still unsure why the coverage report for the pgoutput.c changes is not good:
https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html

Will check.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: WIP: WAL prefetch (another approach)

2020-04-08 Thread Thomas Munro
On Wed, Apr 8, 2020 at 12:52 PM Thomas Munro  wrote:
> * he gave some feedback on the read_local_xlog_page() modifications: I
> probably need to reconsider the change to logical.c that passes NULL
> instead of cxt to the read_page callback; and the switch statement in
> read_local_xlog_page() probably should have a case for the preexisting
> mode

So... logical.c wants to give its LogicalDecodingContext to any
XLogPageReadCB you give it, via "private_data"; that is, it really
only accepts XLogPageReadCB implementations that understand that (or
ignore it).  What I want to do is give every XLogPageReadCB the chance
to have its own state that it is in control of (to receive settings
specific to the implementation, or whatever), that you supply along
with it.  We can't do both kinds of things with private_data, so I
have added a second member read_page_data to XLogReaderState.  If you
pass in read_local_xlog_page as read_page, then you can optionally
install a pointer to XLogReadLocalOptions as reader->read_page_data,
to activate the new behaviours I added for prefetching purposes.
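
For illustration, the separation between the caller's state and the
callback's own state can be sketched with heavily simplified, hypothetical
types (this is not the real XLogReaderState definition):

#include <stdio.h>

typedef struct Reader Reader;
typedef int (*read_page_cb) (Reader *reader);

struct Reader
{
	read_page_cb read_page;
	void	   *private_data;	/* owned by the caller, e.g. a decoding context */
	void	   *read_page_data; /* owned by the read_page implementation */
};

typedef struct LocalReadOptions
{
	int			tli;			/* "read only from this timeline" */
} LocalReadOptions;

static int
read_local_page(Reader *reader)
{
	LocalReadOptions *opts = reader->read_page_data;

	printf("reading from TLI %d\n", opts ? opts->tli : -1);
	return 0;
}

int
main(void)
{
	LocalReadOptions opts = {1};
	Reader		r = {read_local_page, NULL /* caller's state */ , &opts};

	return r.read_page(&r);
}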

While working on that, I realised the readahead XLogReader was
breaking a rule expressed in XLogReadDetermineTimeLine().  Timelines
are really confusing and there were probably several subtle or not so
subtle bugs there.  So I added an option to skip all of that logic,
and just say "I command you to read only from TLI X".  It reads the
same TLI as recovery is reading, until it hits the end of readable
data and that causes prefetching to shut down.  Then the main recovery
loop resets the prefetching module when it sees a TLI switch, so then
it starts up again.  This seems to work reliably, but I've obviously
had limited time to test.  Does this scheme sound sane?

I think this is basically committable (though of course I wish I had
more time to test and review).  Ugh.  Feature freeze in half an hour.
From 8654ea7f2ed61de7ab3f0b305e37d190932ad97c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 17 Mar 2020 15:28:08 +1300
Subject: [PATCH v7 1/4] Rationalize GetWalRcv{Write,Flush}RecPtr().

GetWalRcvWriteRecPtr() previously reported the latest *flushed*
location.  Adopt the conventional terminology used elsewhere in the tree
by renaming it to GetWalRcvFlushRecPtr(), and likewise for some related
variables that used the term "received".

Add a new definition of GetWalRcvWriteRecPtr(), which returns the latest
*written* value.  This will allow later patches to use the value for
non-data-integrity purposes, without having to wait for the flush
pointer to advance.

Reviewed-by: Alvaro Herrera 
Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
---
 src/backend/access/transam/xlog.c  | 20 +-
 src/backend/access/transam/xlogfuncs.c |  2 +-
 src/backend/replication/README |  2 +-
 src/backend/replication/walreceiver.c  | 15 +-
 src/backend/replication/walreceiverfuncs.c | 24 --
 src/backend/replication/walsender.c|  2 +-
 src/include/replication/walreceiver.h  | 18 
 7 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8a4c1743e5..c60842ea03 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -209,8 +209,8 @@ HotStandbyState standbyState = STANDBY_DISABLED;
 
 static XLogRecPtr LastRec;
 
-/* Local copy of WalRcv->receivedUpto */
-static XLogRecPtr receivedUpto = 0;
+/* Local copy of WalRcv->flushedUpto */
+static XLogRecPtr flushedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
 /*
@@ -9376,7 +9376,7 @@ CreateRestartPoint(int flags)
 	 * Retreat _logSegNo using the current end of xlog replayed or received,
 	 * whichever is later.
 	 */
-	receivePtr = GetWalRcvWriteRecPtr(NULL, NULL);
+	receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 	replayPtr = GetXLogReplayRecPtr(&replayTLI);
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
 	KeepLogSeg(endptr, &_logSegNo);
@@ -11869,7 +11869,7 @@ retry:
 	/* See if we need to retrieve more data */
 	if (readFile < 0 ||
 		(readSource == XLOG_FROM_STREAM &&
-		 receivedUpto < targetPagePtr + reqLen))
+		 flushedUpto < targetPagePtr + reqLen))
 	{
 		if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
 		 private->randAccess,
@@ -11900,10 +11900,10 @@ retry:
 	 */
 	if (readSource == XLOG_FROM_STREAM)
 	{
-		if (((targetPagePtr) / XLOG_BLCKSZ) != (receivedUpto / XLOG_BLCKSZ))
+		if (((targetPagePtr) / XLOG_BLCKSZ) != (flushedUpto / XLOG_BLCKSZ))
 			readLen = XLOG_BLCKSZ;
 		else
-			readLen = XLogSegmentOffset(receivedUpto, wal_segment_size) -
+			readLen = XLogSegmentOffset(flushedUpto, wal_segment_size) -
 targetPageOff;
 	}
 	else
@@ -12318,7 +12318,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		RequestXLogStreaming(tli, ptr, PrimaryCo

Re: WIP: WAL prefetch (another approach)

2020-04-08 Thread Thomas Munro
On Wed, Apr 8, 2020 at 11:27 PM Thomas Munro  wrote:
> On Wed, Apr 8, 2020 at 12:52 PM Thomas Munro  wrote:
> > * he gave some feedback on the read_local_xlog_page() modifications: I
> > probably need to reconsider the change to logical.c that passes NULL
> > instead of cxt to the read_page callback; and the switch statement in
> > read_local_xlog_page() probably should have a case for the preexisting
> > mode
>
> So... logical.c wants to give its LogicalDecodingContext to any
> XLogPageReadCB you give it, via "private_data"; that is, it really
> only accepts XLogPageReadCB implementations that understand that (or
> ignore it).  What I want to do is give every XLogPageReadCB the chance
> to have its own state that it is control of (to receive settings
> specific to the implementation, or whatever), that you supply along
> with it.  We can't do both kinds of things with private_data, so I
> have added a second member read_page_data to XLogReaderState.  If you
> pass in read_local_xlog_page as read_page, then you can optionally
> install a pointer to XLogReadLocalOptions as reader->read_page_data,
> to activate the new behaviours I added for prefetching purposes.
>
> While working on that, I realised the readahead XLogReader was
> breaking a rule expressed in XLogReadDetermineTimeLine().  Timelines
> are really confusing and there were probably several subtle or not to
> subtle bugs there.  So I added an option to skip all of that logic,
> and just say "I command you to read only from TLI X".  It reads the
> same TLI as recovery is reading, until it hits the end of readable
> data and that causes prefetching to shut down.  Then the main recovery
> loop resets the prefetching module when it sees a TLI switch, so then
> it starts up again.  This seems to work reliably, but I've obviously
> had limited time to test.  Does this scheme sound sane?
>
> I think this is basically committable (though of course I wish I had
> more time to test and review).  Ugh.  Feature freeze in half an hour.

Ok, so the following parts of this work have been committed:

b09ff536:  Simplify the effective_io_concurrency setting.
fc34b0d9:  Introduce a maintenance_io_concurrency setting.
3985b600:  Support PrefetchBuffer() in recovery.
d140f2f3:  Rationalize GetWalRcv{Write,Flush}RecPtr().

However, I didn't want to push the main patch into the tree at
(literally) the last minute after doing such much work on it in the
last few days, without more review from recovery code experts and some
independent testing.  Judging by the comments made in this thread and
elsewhere, I think the feature is in demand so I hope there is a way
we could get it into 13 in the next couple of days, but I totally
accept the release management team's prerogative on that.




Re: adding partitioned tables to publications

2020-04-08 Thread Peter Eisentraut

On 2020-04-08 13:16, Amit Langote wrote:

On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
 wrote:

All committed.

Thank you and everyone very much for working on this.  I'm very happy
that these two features from PG10 have finally met. :)


Thanks a lot for reviewing and committing.

prion seems to have failed:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13


This comes from -DRELCACHE_FORCE_RELEASE.


Also, still unsure why the coverage report for pgoutput.c changes not good:
https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html


I think this is because the END { } section in PostgresNode.pm shuts 
down all running instances in immediate mode, which doesn't save 
coverage properly.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread tushar

Hi,

I just came across this scenario where vacuum o/p with the (full 1, 
parallel 0) option is not working


--working

postgres=# vacuum (parallel 1, full 0 ) foo;
VACUUM
postgres=#

--Not working

postgres=# vacuum (full 1, parallel 0 ) foo;
ERROR:  cannot specify both FULL and PARALLEL options

I think it should work.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company





Re: Thoughts on "killed tuples" index hint bits support on standby

2020-04-08 Thread Michail Nikolaev
Hello, Peter.

Thanks for your feedback.

> Attached is a very rough POC patch of my own, which makes item
> deletion occur "non-opportunistically" in unique indexes. The idea is
> that we exploit the uniqueness property of unique indexes to identify
> "version churn" from non-HOT updates. If any single value on a leaf
> page has several duplicates, then there is a good chance that we can
> safely delete some of them. It's worth going to the heap to check
> whether that's safe selectively, at the point where we'd usually have
> to split the page. We only have to free one or two items to avoid
> splitting the page. If we can avoid splitting the page immediately, we
> may well avoid splitting it indefinitely, or forever.

Yes, it is a brilliant idea to use uniqueness to avoid bloating the index. I am
not able to understand all the code now, but I’ll check the patch in more
detail later.

> This seems fairly relevant to what you're doing. It makes almost all
> index cleanup occur using the new delete infrastructure for some of
> the most interesting workloads where deletion takes place in client
> backends. In practice, a standby will almost be in the same position
> as the primary in a workload that this approach really helps with,
> since setting the LP_DEAD bit itself doesn't really need to happen (we
> can go straight to deleting the items in the new deletion path).

> This is probably
> not limited to the special unique index case that my patch focuses on
> -- we can probably push this general approach forward in a number of
> different ways. I just started with unique indexes because that seemed
> most promising. I have only worked on the project for a few days. I
> don't really know how it will evolve.

Yes, it is relevant, but I think it is «located in a different plane» and they
complement each other. Most of the indexes are not unique these days
and most of the standbys (and even primaries) have long snapshots (up to
minutes, hours) – so, multiple versions of index records are still required for
them. Even if we could avoid multiple versions somehow - it could lead to a very
high rate of query cancelations on standby.

> To address the questions you've asked: I don't really like the idea of
> introducing new rules around tuple visibility and WAL logging to set
> more LP_DEAD bits like this at all. It seems very complicated.

I don’t think it is too complicated. I have polished the idea a little and now
it even looks elegant to me :) I’ll try to explain the concept briefly (there
are no new visibility rules or changes to set more LP_DEAD bits than now –
everything is based on well-tested mechanics):

1) There is some kind of horizon of xmin values primary pushes to a standby
currently. All standby snapshots are required to satisfy this horizon to
access heap and indexes. This is done by *ResolveRecoveryConflictWithSnapshot*
and corresponding WAL records (for example -XLOG_BTREE_DELETE).

2) We could introduce a new WAL record (named XLOG_INDEX_HINT in the patch for
now) to define a horizon of xmin required for standby’s snapshot to use LP_DEAD
bits in the indexes.

3) Master sends XLOG_INDEX_HINT in case it sets LP_DEAD bit on the index page
(but before possible FPW caused by hints) by calling *LogIndexHintIfNeeded*. It
is required to send such a record only if the new xmin value is greater than
the one sent before. I made tests to estimate the amount of new WAL – it is really
small (especially compared to FPW writes done because of LP_DEAD bit set).

4) New XLOG_INDEX_HINT contains only a database id and value of
*latestIndexHintXid* (new horizon position). For simplicity, the primary could
just set the horizon to *RecentGlobalXmin*. But for now in the patch the
horizon value is extracted from the heap in *HeapTupleIsSurelyDead* to reduce
the number of XLOG_INDEX_HINT records even more.


5) There is a new field in PGPROC structure - *indexIgnoreKilledTuples*. If it
is set to true – standby queries are going to use LP_DEAD bits in index scans.
In such a case the snapshot is required to satisfy the new LP_DEAD horizon pushed by
XLOG_INDEX_HINT records. It is done by the same mechanism as used for heap -
*ResolveRecoveryConflictWithSnapshot*.

6) The major thing here – it is safe to set *indexIgnoreKilledTuples* to both
‘true’ and ‘false’ from the perspective of correctness. It is just some kind of
performance compromise – use LP_DEAD bits but be aware of XLOG_INDEX_HINT
horizon or vice versa.

7) How do we make the right decision about this compromise? It is pretty
simple – if hot_standby_feedback is on and the primary has confirmed our
feedback is received – then set *indexIgnoreKilledTuples* to ‘true’, since
while feedback is working as expected the query will never be canceled by the
XLOG_INDEX_HINT horizon!

8) To support cascading standby setups (with a possible break of feedback
chain) – additional byte added to the ‘keep-alive’ message of the feedback
protocol.

9) So, at the moment we are safe to use LP_D

Re: WIP: WAL prefetch (another approach)

2020-04-08 Thread David Steele

On 4/8/20 8:12 AM, Thomas Munro wrote:


Ok, so the following parts of this work have been committed:

b09ff536:  Simplify the effective_io_concurrency setting.
fc34b0d9:  Introduce a maintenance_io_concurrency setting.
3985b600:  Support PrefetchBuffer() in recovery.
d140f2f3:  Rationalize GetWalRcv{Write,Flush}RecPtr().

However, I didn't want to push the main patch into the tree at
(literally) the last minute after doing such much work on it in the
last few days, without more review from recovery code experts and some
independent testing.  


I definitely think that was the right call.


Judging by the comments made in this thread and
elsewhere, I think the feature is in demand so I hope there is a way
we could get it into 13 in the next couple of days, but I totally
accept the release management team's prerogative on that.


That's up to the RMT, of course, but we did already have an extra week. 
Might be best to just get this in at the beginning of the PG14 cycle. 
FWIW, I do think the feature is really valuable.


Looks like you'll need to rebase, so I'll move this to the next CF in 
WoA state.


Regards,
--
-David
da...@pgmasters.net




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 8:22 AM tushar  wrote:
> I just came across this scenario  where - vaccum o/p with (full 1,
> parallel 0) option not working
>
> --working
>
> postgres=# vacuum (parallel 1, full 0 ) foo;
> VACUUM
> postgres=#
>
> --Not working
>
> postgres=# vacuum (full 1, parallel 0 ) foo;
> ERROR:  cannot specify both FULL and PARALLEL options
>
> I think it should work.

Uh, why? There's a clear error message which matches what you tried to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Planning counters in pg_stat_statements (using pgss_store)

2020-04-08 Thread legrand legrand
Fujii Masao-4 wrote
> On 2020/04/03 16:26
> [...]
>> 
>> "Note that planning and execution statistics are updated only at their
>> respective end phase, and only for successful operations.
>> For example the execution counters of a long running query
>> will only be updated at the execution end, without showing any progress
>> report before that.
> 
> Probably since this is not the example for explaining the relationship of
> planning and execution stats, it's better to explain this separately or
> just
> drop it?
> 
> What about the attached patch based on your proposals?

+1
Your patch is perfect ;^>

Regards
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: [patch]socket_timeout in interfaces/libpq

2020-04-08 Thread David Steele

On 3/24/20 10:58 AM, David Steele wrote:

On 11/29/19 12:22 AM, nagaura.ryo...@fujitsu.com wrote:

>

I couldn't understand what you meant.
Do you say that we shouldn't change pqWait() behavior?
Or should I modify my patch to use pqDropConnection()?


This patch no longer applies: http://cfbot.cputube.org/patch_27_2175.log

CF entry has been updated to Waiting on Author.

More importantly it looks like there is still no consensus on this 
patch, which is an uncommitted part of a previous patch [1].


Unless somebody chimes in I'll mark this Returned with Feedback at the 
end of the CF.


Marked Returned with Feedback.

Regards,
--
-David
da...@pgmasters.net




Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2020-04-08 Thread David Steele

On 3/28/20 5:27 AM, Fabien COELHO wrote:


Hello Tom,

Thanks for your feedback,


I'd be rather unclear about what the actual feedback is, though. I'd
interpret it as "pg does not care much about code coverage". Most 
clients
are in the red on coverage.postgresql.org. I'd like pgbench at least 
to be

in the green, but it does not look that it will ever be the case.



The reason why the first iteration failed was that it was insufficiently
insensitive to timing.


This patch has been marked Returned with Feedback.

If the TAP tests could be made to work without the special exceptions 
added to pgbench.c I think this patch would have a better chance.


Regards,
--
-David
da...@pgmasters.net




Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Alexander Korotkov
On Wed, Apr 8, 2020 at 3:43 PM Andres Freund  wrote:
> Realistically it still 2-3 hours of proof-reading.
>
> This makes me sad :(

Can we ask RMT to extend feature freeze for this particular patchset?
I think it's reasonable assuming extreme importance of this patchset.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Robert Haas
On Tue, Apr 7, 2020 at 4:27 PM Andres Freund  wrote:
> The main reason is that we want to be able to cheaply check the current
> state of the variables (mostly when checking a backend's own state). We
> can't access the "dense" ones without holding a lock, but we e.g. don't
> want to make ProcArrayEndTransactionInternal() take a lock just to check
> if vacuumFlags is set.
>
> It turns out to also be good for performance to have the copy for
> another reason: The "dense" arrays share cachelines with other
> backends. That's worth it because it allows to make GetSnapshotData(),
> by far the most frequent operation, touch fewer cache lines. But it also
> means that it's more likely that a backend's "dense" array entry isn't
> in a local cpu cache (it'll be pulled out of there when modified in
> another backend). In many cases we don't need the shared entry at commit
> etc time though, we just need to check if it is set - and most of the
> time it won't be.  The local entry allows to do that cheaply.
>
> Basically it makes sense to access the PGPROC variable when checking a
> single backend's data, especially when we have to look at the PGPROC for
> other reasons already.  It makes sense to look at the "dense" arrays if
> we need to look at many / most entries, because we then benefit from the
> reduced indirection and better cross-process cacheability.

That's a good explanation. I think it should be in the comments or a
README somewhere.
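
For illustration, the two copies described above can be sketched like this
(made-up names, not the real PGPROC/ProcGlobal layout): each backend's flag
lives both in its own per-backend struct, for cheap lock-free reads of its
own state, and in a dense shared array, for cache-friendly scans over all
backends.

#include <stdio.h>

#define MAX_BACKENDS 4

typedef struct Backend
{
	int			index;			/* position in the dense array */
	int			vacuum_flags;	/* local copy: read without a lock */
} Backend;

typedef struct SharedState
{
	int			vacuum_flags[MAX_BACKENDS]; /* dense copy: scanned by everyone */
} SharedState;

static void
set_flags(SharedState *shared, Backend *me, int flags)
{
	/* in the real thing the dense copy would only change while holding a lock */
	me->vacuum_flags = flags;
	shared->vacuum_flags[me->index] = flags;
}

int
main(void)
{
	SharedState shared = {{0}};
	Backend		me = {.index = 2};

	set_flags(&shared, &me, 1);

	/* own-state check: no need to touch the shared array */
	printf("my flags: %d\n", me.vacuum_flags);

	/* snapshot-style scan: touches only the dense array */
	for (int i = 0; i < MAX_BACKENDS; i++)
		printf("backend %d flags: %d\n", i, shared.vacuum_flags[i]);
	return 0;
}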

> How about:
> /*
>  * If the current xactCompletionCount is still the same as it was at 
> the
>  * time the snapshot was built, we can be sure that rebuilding the
>  * contents of the snapshot the hard way would result in the same 
> snapshot
>  * contents:
>  *
>  * As explained in transam/README, the set of xids considered running 
> by
>  * GetSnapshotData() cannot change while ProcArrayLock is held. 
> Snapshot
>  * contents only depend on transactions with xids and 
> xactCompletionCount
>  * is incremented whenever a transaction with an xid finishes (while
>  * holding ProcArrayLock) exclusively). Thus the xactCompletionCount 
> check
>  * ensures we would detect if the snapshot would have changed.
>  *
>  * As the snapshot contents are the same as before, it is safe
>  * to re-enter the snapshot's xmin into the PGPROC array. None of the 
> rows
>  * visible under the snapshot could already have been removed (that'd
>  * require the set of running transactions to change) and it fulfills 
> the
>  * requirement that concurrent GetSnapshotData() calls yield the same
>  * xmin.
>  */

That's nice and clear.
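
For illustration, the reuse rule quoted above boils down to a
generation-counter check; a minimal standalone sketch, with made-up names
rather than the actual snapshot structures:

#include <stdint.h>
#include <stdio.h>

typedef struct Snapshot
{
	uint64_t	built_at;		/* counter value when this was built */
	int			xmin;			/* stand-in for the real contents */
} Snapshot;

/* bumped whenever a transaction with an xid commits or aborts */
static uint64_t shared_completion_count = 0;

static void
get_snapshot(Snapshot *snap)
{
	if (snap->built_at == shared_completion_count)
	{
		/* nothing with an xid finished since last time: contents unchanged */
		printf("reused cached snapshot\n");
		return;
	}
	/* ... the expensive rebuild would go here ... */
	snap->built_at = shared_completion_count;
	printf("rebuilt snapshot\n");
}

int
main(void)
{
	Snapshot	s = {.built_at = UINT64_MAX};	/* force the first build */

	get_snapshot(&s);			/* rebuilt */
	get_snapshot(&s);			/* reused  */
	shared_completion_count++;	/* some transaction with an xid completed */
	get_snapshot(&s);			/* rebuilt */
	return 0;
}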

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Allow auto_explain to log plans before queries are executed

2020-04-08 Thread David Steele

On 3/5/20 8:46 AM, Julien Rouhaud wrote:

On Thu, Feb 27, 2020 at 7:31 AM Julien Rouhaud  wrote:


On Thu, Feb 27, 2020 at 7:12 AM Pavel Stehule  wrote:


čt 27. 2. 2020 v 7:01 odesílatel Yugo NAGATA  napsal:

  I think "query debugger" feature you proposed is out of scope of
auto_explain module. I also think the feature to analyze running
query online is great, but we will need another discussion on a new
module or eature for it.



sure. My note was about using auto_explain like query_debugger. It has not too 
sense, and from this perspective, the original proposal to log plan before 
execution has more sense.

you can log every plan with higher cost than some constant.


Yes I thought about that too.  If you're not in an OLAP environment
(or with a specific user running few expensive queries), setup an
auto_explain.log_before_execution_min_cost.


There was some discussion but no clear consensus on what should really
be done.  I'm marking the patch as waiting on author which seems more
accurate.  Feel free to switch it back if that's a wrong move.


There does not seem to be any progress towards a consensus so I'm marking 
this Returned with Feedback.


Regards,
--
-David
da...@pgmasters.net




Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Jonathan S. Katz
On 4/8/20 8:59 AM, Alexander Korotkov wrote:
> On Wed, Apr 8, 2020 at 3:43 PM Andres Freund  wrote:
>> Realistically it still 2-3 hours of proof-reading.
>>
>> This makes me sad :(
> 
> Can we ask RMT to extend feature freeze for this particular patchset?
> I think it's reasonable assuming extreme importance of this patchset.

One of the features of RMT responsibilities[1] is to be "hands off" as
much as possible, so perhaps a reverse ask: how would people feel about
this patch going into PG13, knowing that the commit would come after the
feature freeze date?

My 2¢, with RMT hat off:

As mentioned earlier[2], we know that connection scalability is a major
pain point with PostgreSQL and any effort that can help alleviate that
is a huge win, even in incremental gains. Andres et al experimentation
show that this is more than incremental gains, and will certainly make a
huge difference in people's PostgreSQL experience. It is one of those
features where you can "plug in and win" -- you get a performance
benefit just by upgrading. That is not insignificant.

However, I also want to ensure that we are fair: in the past there have
also been other patches that have been "oh-so-close" to commit before
feature freeze but have not made it in (an example escapes me at the
moment). Therefore, we really need to have consensus among ourselves
that the right decision is to allow this to go in after feature freeze.

Did this come in (very) late into the development cycle? Yes, and I
think normally that's enough to give cause for pause. But I could also
argue that Andres is fixing a "bug" with PostgreSQL (probably several
bugs ;) with PostgreSQL -- and perhaps the fixes can't be backpatched
per se, but they do improve the overall stability and usability of
PostgreSQL and it'd be a shame if we have to wait on them.

Lastly, with the ongoing world events, perhaps time that could have been
dedicated to this and other patches likely affected their completion. I
know most things in my life take way longer than they used to (e.g.
taking out the trash/recycles has gone from a 15s to 240s routine). The
same could be said about other patches as well, but this one has a far
greater impact (a double-edged sword, of course) given it's a feature
that everyone uses in PostgreSQL ;)

So with my RMT hat off, I say +1 to allowing this post feature freeze,
though within a reasonable window.

My 2¢, with RMT hat on:

I believe in[2] I outlined a way a path for including the patch even at
this stage in the game. If it is indeed committed, I think we
immediately put it on the "Recheck a mid-Beta" list. I know it's not as
trivial to change as something like "Determine if jit="on" by default"
(not picking on Andres, I just remember that example from RMT 11), but
it at least provides a notable reminder that we need to ensure we test
this thoroughly, and point people to really hammer it during beta.

So with my RMT hat on, I say +0 but with a ;)

Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/Release_Management_Team#History
[2]
https://www.postgresql.org/message-id/6be8c321-68ea-a865-d8d0-50a3af616463%40postgresql.org



signature.asc
Description: OpenPGP digital signature


Re: NOT IN subquery optimization

2020-04-08 Thread David Steele

On 3/26/20 4:58 PM, Li, Zheng wrote:

 >BTW, so far as I can see, the only reason you're bothering with the whole
 thing is to compare the size of the subquery output with work_mem, because
 that's all that subplan_is_hashable does.  I wonder whether that
 consideration is even still necessary in the wake of 1f39bce02.  If it is,
 I wonder whether there isn't a cheaper way to figure it out.  (Note
 similar comment in make_subplan.)

 The comment in make_subplan says there is no cheaper way to figure out:
 /* At present, however, we can only check hashability after
  * we've made the subplan :-(.  (Determining whether it'll fit in work_mem
  * is the really hard part.)
  */

 I don't see why commit 1f39bce02 is related to this problem. Can you 
expand on this?
 
 >But can't you detect that case directly?  It seems like you'd need to
 figure out the NULL situation anyway to know whether the transformation
 to antijoin is valid in the first place.
 
 Yes, we do need to figure out the NULL situation, and there is always a valid
 transformation to antijoin; it's just that in the NULL case we need to stuff an
 additional clause into the anti join condition, and in these cases the
 transformation actually outperforms Subplan (non-hashed) but underperforms the
 hashed Subplan. The unmodified anti hash join has similar performance compared
 to the hashed Subplan.


There seem to be enough questions about this implementation that I think 
it makes sense to mark this patch Returned with Feedback.


Feel free to resubmit it to a future CF when there is more of a 
consensus on the implementation.


Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarrassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Improving connection scalability: GetSnapshotData()

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 9:27 AM Jonathan S. Katz  wrote:
> One of the features of RMT responsibilities[1] is to be "hands off" as
> much as possible, so perhaps a reverse ask: how would people feel about
> this patch going into PG13, knowing that the commit would come after the
> feature freeze date?

Letting something be committed after feature freeze, or at any other
time, is just a risk vs. reward trade-off. Every patch carries some
chance of breaking stuff or making things worse. And every patch has a
chance of making something better that people care about.

On general principle, I would categorize this as a moderate-risk
patch. It doesn't change SQL syntax like, e.g. MERGE, nor does it
touch the on-disk format, like, e.g. INSERT .. ON CONFLICT UPDATE. The
changes are relatively localized, unlike, e.g. parallel query. Those
are all things that reduce risk. On the other hand, it's a brand new
patch which has not been thoroughly reviewed by anyone. Moreover,
shakedown time will be minimal because we're so late in the release
cycle. If it has subtle synchronization problems or if it regresses
performance badly in some cases, we might not find out about any of
that until after release. While in theory we could revert it any time,
since no SQL syntax or on-disk format is affected, in practice it will
be difficult to do that if it's making life better for some people and
worse for others.

I don't know what the right thing to do is. I agree with everyone who
says this is a very important problem, and I have the highest respect
for Andres's technical ability. On the other hand, I have been around
here long enough to know that deciding whether to allow late commits
on the basis of how much we like the feature is a bad plan, because it
takes into account only the upside of a commit, and ignores the
possible downside risk. Typically, the commit is late because the
feature was rushed to completion at the last minute, which can have an
effect on quality. I can say, having read through the patches
yesterday, that they don't suck, but I can't say that they're fully
correct. That's not to say that we shouldn't decide to take them, but
it is a concern to be taken seriously. We have made mistakes before in
what we shipped that had serious implications for many users and for
the project; we should all be wary of making more such mistakes. I am
not trying to say that solving problems and making stuff better is NOT
important, just that every coin has two sides.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP/PoC for parallel backup

2020-04-08 Thread Kashif Zeeshan
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman  wrote:

> Hi,
>
> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>
> I have added the shared state as previously described. The new grammar
> changes
> are as follows:
>
> START_BACKUP [LABEL ''] [FAST] [MAX_RATE %d]
> - This will generate a unique backupid using pg_strong_random(16) and
> hex-encoded
>   it. which is then returned as the result set.
> - It will also create a shared state and add it to the hashtable. The
> hash table size is set
>   to BACKUP_HASH_SIZE=10, but since hashtable can expand dynamically,
> I think it's
>   sufficient initial size. max_wal_senders is not used, because it can
> be set to quite a
>   large values.
>
> JOIN_BACKUP 'backup_id'
> - finds 'backup_id' in hashtable and attaches it to server process.
>
>
> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
> - renamed SEND_FILES to SEND_FILE
> - removed START_WAL_LOCATION from this because 'startptr' is now
> accessible through
>   shared state.
>
> There is no change in other commands:
> STOP_BACKUP [NOWAIT]
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>
> The current patches (v11) have been rebased to the latest master. The
> backup manifest is enabled
> by default, so I have disabled it for parallel backup mode and have
> generated a warning so that
> user is aware of it and not expect it in the backup.
>
Hi Asif,

I have verified the bug fixes; one bug is fixed and now works as expected.

While verifying the other bug fixes I faced the following issues, please
have a look.


1) The bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added only a description, as the steps
for each bug I tried to verify were given in previous emails. A backtrace is
also added for each case, and it points to the same bug in both cases.

a) The backup failed with errors "error: could not connect to server: could
not look up local user ID 1000: Too many open files" when the
max_wal_senders was set to 2000.


[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D
 /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/228 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local
user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$


[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup
/tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from
/home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D
/home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47  if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0  pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1  0x0040904a in cleanup_workers () at pg_basebackup.c:2978
#2  0x00403806 in disconnect_atexit () at pg_basebackup.c:332
#3  0x7f2226f76a49 in __run_exit_handlers (status=1,
listp=0x7f22272f86c8 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true)
at exit.c:77
#4  0x7f2226f76a95 in __GI_exit (status=) at exit.c:99
#5  0x00408c54 in create_parallel_workers (backupinfo=0x952ca0) at
pg_basebackup.c:2811
#6  0x0040798f in BaseBackup () at pg_basebackup.c:2211
#7  0x00408b4d in main (argc=6, argv=0x7ffe3dabc718) at
pg_basebackup.c:2765
(gdb)




b) When executing two backups at the same time, getting FATAL error due to
max_wal_senders and instead of exit  Backup g

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> >On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >>hyrax is not too happy with this test:
> >>
> >>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
> >>
> >>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >>EXPLAIN output, but it evidently is.
> >>
> >
> >Thanks, I'll investigate. It's not clear to me either what might be
> >causing this, but I guess something must have gone wrong in
> >estimation/planning.
> >
>
> OK, I know what's going on - it's a rather embarassing issue in the
> regression test. There's no analyze on the test tables, so it uses
> default estimates for number of groups etc. But with clobber cache the
> test runs long enough for autoanalyze to kick in and collect stats, so
> we generate better estimates which changes the plan.
>
> I'll get this fixed - explicit analyze and tweaking the data a bit
> should do the trick.

Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.

James




Re: Resume vacuum and autovacuum from interruption and cancellation

2020-04-08 Thread David Steele

On 2/28/20 8:56 AM, Masahiko Sawada wrote:


According to those results, it's thought that the more we resume
vacuum from the tail of the table, the efficiency is good. Since the
table is being updated uniformly even during autovacuum it was more
efficient to restart autovacuum from last position rather than from
the beginning of the table. I think that results shows somewhat the
benefit of this patch but I'm concerned that it might be difficult for
users when to use this option. In practice the efficiency completely
depends on the dispersion of updated pages, and that test made pages
dirty uniformly, which is not a common situation. So probably if we
want this feature, I think we should automatically enable resuming
when we can basically be sure that resuming is better. For example, we
remember both the last vacuumed block and how many vacuum-able pages
seems to exist from there, and we decide to resume vacuum if we can
expect to process more many pages.


I have to say I'm a bit confused by the point of this patch. I get that 
starting in progress is faster but that's only true because the entire 
table is not being vacuumed?


If as you say:

> If we start to vacuum from not first block, we can update neither
> relfrozenxid nor relfrozenxmxid. And we might not be able to update
> even relation statistics.

Then we'll still need to vacuum the entire table before we can be sure 
the oldest xid has been removed/frozen. If we could do those updates on 
a resume then that would change my thoughts on the feature a lot.


What am I missing?

I'm marking this Returned with Feedback due to concerns expressed up-thread 
(and mine) and because the patch has been Waiting on Author for nearly 
the entire CF.


Regards,
--
-David
da...@pgmasters.net




Re: adding partitioned tables to publications

2020-04-08 Thread Amit Langote
On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
 wrote:
> On 2020-04-08 13:16, Amit Langote wrote:
> > On Wed, Apr 8, 2020 at 6:26 PM Peter Eisentraut
> >  wrote:
> >> All committed.
> >>
> >> Thank you and everyone very much for working on this.  I'm very happy
> >> that these two features from PG10 have finally met. :)
> >
> > Thanks a lot for reviewing and committing.
> >
> > prion seems to have failed:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2020-04-08%2009%3A53%3A13
>
> This comes from -DRELCACHE_FORCE_RELEASE.

I'm seeing some funny stuff on such a build locally too, although
haven't been able to make sense of it yet.

> > Also, still unsure why the coverage report for pgoutput.c changes not good:
> > https://coverage.postgresql.org/src/backend/replication/pgoutput/pgoutput.c.gcov.html
>
> I think this is because the END { } section in PostgresNode.pm shuts
> down all running instances in immediate mode, which doesn't save
> coverage properly.

Thanks for that tip.  Appending the following at the end of the test
file has fixed the coverage reporting for me.

I noticed the following coverage issues:

1. The previous commit f1ac27bfd missed a command that I had included
to cover the following blocks of apply_handle_tuple_routing():

1165    : else
1166    : {
1167  0 :     remoteslot = ExecCopySlot(remoteslot, remoteslot_part);
1168  0 :     slot_getallattrs(remoteslot);
1169    : }
...

1200  2 : if (map != NULL)
1201    : {
1202  0 :     remoteslot_part = execute_attr_map_slot(map->attrMap,
1203    :                                             remoteslot,
1204    :                                             remoteslot_part);
1205    : }

2. Now that I am able to see proper coverage for the
publish_via_partition_root related changes, I can see that a block in
pgoutput_change() is missing coverage.

The attached fixes these coverage issues.

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com


logicalrep-partition-test-coverage-fixes.patch
Description: Binary data


Re: PG compilation error with Visual Studio 2015/2017/2019

2020-04-08 Thread Juan José Santamaría Flecha
On Wed, Apr 8, 2020 at 9:03 AM davinder singh 
wrote:

> On Tue, Apr 7, 2020 at 8:30 PM Juan José Santamaría Flecha <
> juanjo.santama...@gmail.com> wrote:
>
>>
>> * The logic on "defined(_MSC_VER) && (_MSC_VER >= 1900)" is defined as
>> "_WIN32_WINNT >= 0x0600" on other parts of the code. I would
>> recommend using the later.
>>
> I think  "_WIN32_WINNT >= 0x0600" represents windows versions only and
> doesn't include any information about Visual Studio versions. So I am
> sticking to " defined(_MSC_VER) && (_MSC_VER >= 1900)".
>

Let me explain further: in pg_config_os.h you can check that the value of
_WIN32_WINNT is solely based on checking _MSC_VER. This patch should also
be meaningful for WIN32 builds using MinGW, or we might see this issue
reappear in those systems if we update the MIN_WINNT value to more current
OS versions. So, I still think _WIN32_WINNT is a better option.
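
For readers following along, the difference between the two guard styles in
plain C preprocessor terms is roughly this (illustrative only, not code from
the tree):

/*
 * _MSC_VER identifies the compiler (Visual Studio 2015 and later report
 * 1900 or higher), while _WIN32_WINNT identifies the minimum Windows
 * version a build targets (0x0600 is Vista/Server 2008) and is also set
 * for MinGW builds, not just MSVC ones.
 */
#if defined(_MSC_VER) && (_MSC_VER >= 1900)
#define HAVE_VS2015_OR_LATER 1		/* true only when building with MSVC >= 2015 */
#endif

#if defined(_WIN32_WINNT) && (_WIN32_WINNT >= 0x0600)
#define TARGETS_VISTA_OR_LATER 1	/* true for any compiler targeting Vista+ */
#endif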

I have resolved other comments. I have attached a new version of the patch.
>

I still see the same last lines in both #ifdef blocks, and pgindent might
change a couple of lines to:
+ MultiByteToWideChar(CP_ACP, 0, winlocname, -1, wc_locale_name,
+ LOCALE_NAME_MAX_LENGTH);
+
+ if ((GetLocaleInfoEx(wc_locale_name, LOCALE_SNAME,
+ (LPWSTR)&buffer, LOCALE_NAME_MAX_LENGTH)) > 0)
+ {

But that is pretty trivial stuff.

Please open an item in the commitfest for this patch.

Regards,

Juan José Santamaría Flecha


Re: adding partitioned tables to publications

2020-04-08 Thread Amit Langote
On Wed, Apr 8, 2020 at 11:07 PM Amit Langote  wrote:
> On Wed, Apr 8, 2020 at 9:21 PM Peter Eisentraut
>  wrote:
> > I think this is because the END { } section in PostgresNode.pm shuts
> > down all running instances in immediate mode, which doesn't save
> > coverage properly.
>
> Thanks for that tip.  Appending the following at the end of the test
> file has fixed the coverage reporting for me.

The patch posted in the previous email has it, but I meant this by
"the following":

+
+$node_publisher->stop('fast');
+$node_subscriber1->stop('fast');
+$node_subscriber2->stop('fast');

-- 

Amit Langote
EnterpriseDB: http://www.enterprisedb.com




Re: Online checksums patch - once again

2020-04-08 Thread David Steele

On 4/1/20 11:30 AM, David Steele wrote:

On 1/18/20 6:18 PM, Daniel Gustafsson wrote:


Attached is a v16 rebased on top of current master which addresses the 
above

commented points, and which I am basing the concurrency work on.


This patch no longer applies cleanly: 
http://cfbot.cputube.org/patch_27_2260.log


The CF entry has been updated to Waiting on Author.

Regards,


There has been review on this patch but no updates in some time. As far 
as I can see there's debate on how to mark relations as fully 
checksummed and/or how to resume.


I'm marking this patch Returned with Feedback. Please feel free to 
resubmit when it is again ready for review.


Regards,
--
-David
da...@pgmasters.net




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Mahendra Singh Thalor
On Wed, 8 Apr 2020 at 17:59, Robert Haas  wrote:
>
> On Wed, Apr 8, 2020 at 8:22 AM tushar  wrote:
> > I just came across this scenario  where - vaccum o/p with (full 1,
> > parallel 0) option not working
> >
> > --working
> >
> > postgres=# vacuum (parallel 1, full 0 ) foo;
> > VACUUM
> > postgres=#
> >
> > --Not working
> >
> > postgres=# vacuum (full 1, parallel 0 ) foo;
> > ERROR:  cannot specify both FULL and PARALLEL options
> >
> > I think it should work.
>
> Uh, why? There's a clear error message which matches what you tried to do.
>

I think Tushar's point is that we should either allow both
vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or throw an
error in both cases.
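
To make the distinction concrete, here is a minimal sketch of the behaviour
being asked for (hypothetical function and argument names, not the actual
vacuum.c code), where an explicit "parallel 0" simply means zero workers and
so would no longer conflict with FULL:

#include <stdbool.h>
#include <stdio.h>

static void
check_options(bool full, bool parallel_given, int nworkers)
{
	if (full && parallel_given && nworkers > 0)
		printf("ERROR:  cannot specify both FULL and PARALLEL options\n");
	else
		printf("VACUUM\n");
}

int
main(void)
{
	check_options(false, true, 1);	/* vacuum (parallel 1, full 0): accepted today */
	check_options(true, true, 0);	/* vacuum (full 1, parallel 0): accepted in this sketch */
	return 0;
}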

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Verify true root on replicas with amcheck

2020-04-08 Thread David Steele

On 1/16/20 7:40 PM, Peter Geoghegan wrote:

On Thu, Jan 9, 2020 at 12:55 AM godjan •  wrote:


I heard that amcheck has an invariant about locking no more than 1 page at a 
moment for avoiding deadlocks. Is there possible a deadlock situation?


This is a conservative principle that I came up with when I wrote the
original version of amcheck. It's not strictly necessary, but it
seemed like a good idea. It should be safe to "couple" buffer locks in
a way that matches the B-Tree code -- as long as it is thought through
very carefully. I am probably going to relax the rule for one specific
case soon -- see:

https://postgr.es/m/f7527087-6e95-4077-b964-d2cafef62...@yandex-team.ru

Your patch looks like it gets it right (it won't deadlock with other
sessions that access the metapage), but I hesitate to commit it
without a strong justification. Acquiring multiple buffer locks
concurrently is worth avoiding wherever possible.


I have marked this patch Returned with Feedback since it has been 
sitting for a while with no response from the author.


Regards,
--
-David
da...@pgmasters.net




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-04-08 Thread Julien Rouhaud
On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
 wrote:
>
> Hi Julien,
>
> On 2020/04/02 22:25, Julien Rouhaud wrote:
> > New conflict, rebased v9 attached.
>
> I tested the patch on the head (c7654f6a3) and
> the result was fine. See below:
>
> $ make installcheck-world
> =
>   All 1 tests passed.
> =

Thanks Yamada-san!  Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2020-04-08 Thread Kuntal Ghosh
Hello Ashutosh, Fujita,

On Wed, Apr 8, 2020 at 3:49 PM Ashutosh Bapat
 wrote:
> On Wed, 8 Apr 2020 at 15:42, Etsuro Fujita  wrote:
>> On Wed, Apr 8, 2020 at 4:30 PM Kuntal Ghosh  
>> wrote:
>> > I'm getting the following warning during compilation.
>> >
>> > partbounds.c: In function ‘partition_bounds_merge’:
>> > partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
>> > [-Wunused-variable]
>> >   PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>> >  ^
>> > For fixing the same, we can declare inner_binfo as
>> > PG_USED_FOR_ASSERTS_ONLY as it is not used for any other purpose.
>>
>> I'd propose to remove an assertion causing this (and  the
>> outer_binfo/inner_binfo variables) from partition_bounds_merge(),
>> rather than doing so, because the assertion is redundant, as we have
>> the same assertion in merge_list_bounds() and merge_range_bounds().
>> Please find attached a patch.
>
>
> I think it's better to have the assertion in all the three places and also in 
> merge_hash_bounds() whenever that comes along. The assertion in 
> merge_*_bounds() will be good to in case those functions are called from 
> places other than partition_bounds_merge(). The assertion in 
> partition_bounds_merge() will make sure that when the individual 
> merge_*_bounds() functions are called based on one of the bounds both of the 
> bounds have same strategy.

Both of your patches fix the problem. I don't have much exposure in
this area to comment on whether we should keep/remove the assertion
from the code. But, here is my opinion:

The code structure looks like following:
Assert(condition A);
if (Condition B)
merge_*_bounds();

Inside merge_*_bounds(), you have both the above assert and the if
condition as another assert:
Assert(condition A and Condition B);

And, merge_*_bounds() are called from only one place. So, something is
redundant here and I'm inclined towards removal of the assert
condition. Another thing I noticed:

/* The partitioning strategies should be the same. */
Assert(outer_binfo->strategy == inner_binfo->strategy);

The comment just reads the assertion aloud which looks unnecessary.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:

On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:

hyrax is not too happy with this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15

It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
EXPLAIN output, but it evidently is.



Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.



OK, I know what's going on - it's a rather embarrassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.



I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/src/test/regress/expected/incremental_sort.out 
b/src/test/regress/expected/incremental_sort.out
index 3072d95643..fb4ab95922 100644
--- a/src/test/regress/expected/incremental_sort.out
+++ b/src/test/regress/expected/incremental_sort.out
@@ -141,7 +141,8 @@ begin
 end;
 $$;
 -- A single large group tested around each mode transition point.
-insert into t(a, b) select 1, i from generate_series(1, 100) n(i);
+insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 31;
QUERY PLAN
 -
@@ -456,7 +457,8 @@ select * from (select * from t order by a) s order by a, b 
limit 66;
 
 delete from t;
 -- An initial large group followed by a small group.
-insert into t(a, b) select (case when i < 50 then 1 else 2 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select i/50 + 1, i + 1 from generate_series(0, 999) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 55;
QUERY PLAN
 -
@@ -521,7 +523,7 @@ select * from (select * from t order by a) s order by a, b 
limit 55;
  1 | 47
  1 | 48
  1 | 49
- 2 | 50
+ 1 | 50
  2 | 51
  2 | 52
  2 | 53
@@ -538,10 +540,10 @@ select explain_analyze_without_memory('select * from 
(select * from t order by a
  Sort Key: t.a, t.b
  Presorted Key: t.a
  Full-sort Groups: 2 Sort Methods: top-N heapsort, quicksort Memory: 
avg=NNkB peak=NNkB
- ->  Sort (actual rows=100 loops=1)
+ ->  Sort (actual rows=101 loops=1)
Sort Key: t.a
Sort Method: quicksort  Memory: NNkB
-   ->  Seq Scan on t (actual rows=100 loops=1)
+   ->  Seq Scan on t (actual rows=1000 loops=1)
 (9 rows)
 
 select jsonb_pretty(explain_analyze_inc_sort_nodes_without_memory('select * 
from (select * from t order by a) s order by a, b limit 55'));
@@ -584,7 +586,8 @@ select 
explain_analyze_inc_sort_nodes_verify_invariants('select * from (select *
 
 delete from t;
 -- An initial small group followed by a large group.
-insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 100) n(i);
+insert into t(a, b) select (case when i < 5 then i else 9 end), i from 
generate_series(1, 1000) n(i);
+analyze;
 explain (costs off) select * from (select * from t order by a) s order by a, b 
limit 70;
QUERY PLAN
 -
@@ -705,17 +708,17 @@ select * from t left join (select * from (select * from t 
order by a) v order by
 rollback;
 -- Test EXPLAIN ANALYZE with both fullsort and presorted groups.
 select explain_analyze_without_memory('select * from (select * from t order by 
a) s order by a, b limit 70');
-   
explain_analyze_without_memory  
  
--

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:
>
> On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
> >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
> >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
> >> wrote:
> >>>
> >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
> On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
> >hyrax is not too happy with this test:
> >
> >https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
> >
> >It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
> >EXPLAIN output, but it evidently is.
> >
> 
> Thanks, I'll investigate. It's not clear to me either what might be
> causing this, but I guess something must have gone wrong in
> estimation/planning.
> 
> >>>
> >>>OK, I know what's going on - it's a rather embarrassing issue in the
> >>>regression test. There's no analyze on the test tables, so it uses
> >>>default estimates for number of groups etc. But with clobber cache the
> >>>test runs long enough for autoanalyze to kick in and collect stats, so
> >>>we generate better estimates which changes the plan.
> >>>
> >>>I'll get this fixed - explicit analyze and tweaking the data a bit
> >>>should do the trick.
> >>
> >>Looking at the tests that failed, I think we should consider just adding:
> >>set enable_sort = off;
> >>because several of those tests have very specific amounts of data to
> >>ensure we test the transition points around the different modes in the
> >>incremental sort node.
> >>
> >
> >Maybe, but I'd much rather tweak the data so that we test both the
> >costing and execution part.
> >
>
> I do think this does the trick by increasing the number of rows a bit
> (from 100 to 1000) to make the Sort more expensive than Incremental
> Sort, while still testing the transition points.
>
> James, can you verify it that's still true?

Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".

I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...

Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?

James




Re: Conflict handling for COPY FROM

2020-04-08 Thread David Steele
On Fri, Mar 27, 2020 at 3:27 PM Tom Lane  wrote:


I took a quick look at this patch, since it was marked "ready for
committer", but I don't see how it can possibly be considered
committable.


Based on Tom's review I've marked this patch Returned with Feedback.

If there's an acceptable implementation, it seems that it would be very 
different from what we have here.


Regards,
--
-David
da...@pgmasters.net




DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Justin Pryzby
This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH ROW"
was first allowed on partition tables (86f575948).

I thought this would work like partitioned indexes (8b08f7d48), where detaching
a partition makes its index non-inherited, and attaching a partition marks a
pre-existing, matching partition as inherited rather than creating a new one.

DROP TABLE t, t1;
CREATE TABLE t(i int)PARTITION BY RANGE(i);
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM(1)TO(2);
CREATE OR REPLACE FUNCTION trigf() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN 
END $$;
CREATE TRIGGER trig AFTER INSERT ON t FOR EACH ROW EXECUTE FUNCTION trigf();
SELECT tgrelid::regclass, * FROM pg_trigger WHERE tgrelid='t1'::regclass;
ALTER TABLE t DETACH PARTITION t1;
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1)TO(2);
ERROR:  trigger "trig" for relation "t1" already exists

DROP TRIGGER trig ON t1;
ERROR:  cannot drop trigger trig on table t1 because trigger trig on table t 
requires it
HINT:  You can drop trigger trig on table t instead.

I remember these, but they don't seem to be relevant to this issue, which seems
to be independent.

1fa846f1c9 Fix cloning of row triggers to sub-partitions
b9b408c487 Record parents of triggers

The commit for partitioned indexes talks about using an pre-existing index on
the child as a "convenience gadget", puts indexes into pg_inherit, and
introduces "ALTER INDEX..ATTACH PARTITION" and "CREATE INDEX..ON ONLY".

It's probably rare for a duplicate index to be useful (unless rebuilding to be
more optimal, which is probably not reasonably interspersed with altering
inheritance).  But I don't know if that's equally true for triggers.  So I'm
not sure what the intended behavior is, and I've stopped after implementing
a partial fix.
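
For comparison, the partitioned-index behavior I'm referring to looks
roughly like this (illustrative names, not part of the patch):

CREATE TABLE pt(i int) PARTITION BY RANGE(i);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) TO (2);
CREATE INDEX pt_i_idx ON pt(i);      -- cascades to pt1
ALTER TABLE pt DETACH PARTITION pt1; -- pt1 keeps its index, now independent
ALTER TABLE pt ATTACH PARTITION pt1 FOR VALUES FROM (1) TO (2);
                                     -- pre-existing matching index is reused
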
>From 2c31cac22178d904ee108b77f316886d1e2f6288 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 3 Apr 2020 22:43:26 -0500
Subject: [PATCH] WIP: fix detaching tables with inherited triggers

---
 src/backend/commands/tablecmds.c | 33 
 1 file changed, 33 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 037d457c3d..10a60e158f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16797,6 +16797,39 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
 	}
 	table_close(classRel, RowExclusiveLock);
 
+	/* detach triggers too */
+	{
+		/* XXX: relcache.c */
+		ScanKeyData skey;
+		SysScanDesc	scan;
+		HeapTuple	trigtup;
+		Relation	tgrel = table_open(TriggerRelationId, RowExclusiveLock);
+
+		ScanKeyInit(&skey, Anum_pg_trigger_tgrelid, BTEqualStrategyNumber,
+F_OIDEQ, ObjectIdGetDatum(RelationGetRelid(partRel)));
+
+		scan = systable_beginscan(tgrel, TriggerRelidNameIndexId,
+true, NULL, 1, &skey);
+
+		while (HeapTupleIsValid(trigtup = systable_getnext(scan)))
+		{
+			Form_pg_trigger pg_trigger;
+			trigtup = heap_copytuple(trigtup);  /* need a modifiable copy */
+			pg_trigger = (Form_pg_trigger) GETSTRUCT(trigtup);
+			/* Set the trigger's parent to Invalid */
+			if (!OidIsValid(pg_trigger->tgparentid))
+continue;
+			if (!pg_trigger->tgisinternal)
+continue;
+			pg_trigger->tgparentid = InvalidOid;
+			pg_trigger->tgisinternal = false;
+			CatalogTupleUpdate(tgrel, &trigtup->t_self, trigtup);
+			heap_freetuple(trigtup);
+		}
+		systable_endscan(scan);
+		table_close(tgrel, RowExclusiveLock);
+	}
+
 	/*
 	 * Detach any foreign keys that are inherited.  This includes creating
 	 * additional action triggers.
-- 
2.17.0



Re: [PATCH] Incremental sort

2020-04-08 Thread David Steele

On 4/8/20 11:13 AM, James Coleman wrote:


James, can you verify it that's still true?


I marked this entry as committed in the 2020-03 CF but it's not clear to 
me if that's entirely true. I'll leave it up to you (all) to move it to 
the 2020-07 CF if there is remaining work (other than making the build 
farm happy).


Regards,
--
-David
da...@pgmasters.net




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tom Lane
James Coleman  writes:
> Should we change `analyze` to `analyze t` to avoid unnecessarily
> re-analyzing all other tables in the regression db?

Yes, a global analyze here is a remarkably horrid idea.

regards, tom lane




Re: [PATCH] Incremental sort

2020-04-08 Thread James Coleman
On Wed, Apr 8, 2020 at 11:29 AM David Steele  wrote:
>
> On 4/8/20 11:13 AM, James Coleman wrote:
> >>
> >> James, can you verify it that's still true?
>
> I marked this entry as committed in the 2020-03 CF but it's not clear to
> me if that's entirely true. I'll leave it up to you (all) to move it to
> the 2020-07 CF if there is remaining work (other than making the build
> farm happy).

Thanks.

I think it's true enough. The vast majority is committed, and the
small amount that isn't we'll leave as future improvements and
separate threads.

James




Re: Function to track shmem reinit time

2020-04-08 Thread David Steele

On 2/24/20 10:57 PM, Robert Haas wrote:

On Sat, Feb 22, 2020 at 10:31 PM Tom Lane  wrote:

I'm still going to object to it, on the grounds that

(1) it's exposing an implementation detail that clients should not be
concerned with, and that we might change in future.  The name isn't
even well chosen --- if the argument is that it's useful to monitor
server uptime, why isn't the name "pg_server_uptime"?

(2) if your server is crashing often enough that postmaster start
time isn't an adequate substitute, you have way worse problems than
whether you can measure it.  I'd rather see us put effort into
fixing whatever the underlying problem is.


I don't accept the first argument, because I think the chances that
we'll change our basic approach to this problem are indistinguishable
from zero.  A few years back, you criticized some idea of mine as
tantamount to rm -rf src/backend/optimizer, which you were not ready
to endorse. That proposal was surely vastly less ambitious than some
proposal which would remove the idea of shared memory reinitialization
after an unsafe backend exit.

I don't accept the second argument either, because it's a false
dichotomy. Adding this function won't reduce the amount of energy that
gets spent fixing crashes. There are always more crashes.

I realize that several other people voted against this proposal so I
don't think it's OK to commit a patch in this area unless some more
positive votes turn up, but I'm still +1 on the idea. Granted, we
don't want an infinite amount of clutter in the system catalogs, but I
have a hard time believing that this proposed function would get any
less use than the hyperbolic trig functions.


Marking this Returned with Feedback again since we are still at an impasse.

Regards,
--
-David
da...@pgmasters.net




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
 wrote:
> I think Tushar's point is that either we should allow both
> vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in
> both cases we should throw an error.

Oh, yeah, good point. Somebody must not've been careful enough with
the options-checking code.
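
To make the asymmetry concrete (t is just a placeholder table; the error
text is the one the current check raises):

VACUUM (FULL 1, PARALLEL 0) t;  -- rejected: "cannot specify both FULL and PARALLEL options"
VACUUM (FULL 0, PARALLEL 1) t;  -- accepted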

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
>On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
>>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
>> wrote:
>>>
>>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>hyrax is not too happy with this test:
>
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
>
>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>EXPLAIN output, but it evidently is.
>

Thanks, I'll investigate. It's not clear to me either what might be
causing this, but I guess something must have gone wrong in
estimation/planning.

>>>
> >>>OK, I know what's going on - it's a rather embarrassing issue in the
>>>regression test. There's no analyze on the test tables, so it uses
>>>default estimates for number of groups etc. But with clobber cache the
>>>test runs long enough for autoanalyze to kick in and collect stats, so
>>>we generate better estimates which changes the plan.
>>>
>>>I'll get this fixed - explicit analyze and tweaking the data a bit
>>>should do the trick.
>>
>>Looking at the tests that failed, I think we should consider just adding:
>>set enable_sort = off;
>>because several of those tests have very specific amounts of data to
>>ensure we test the transition points around the different modes in the
>>incremental sort node.
>>
>
>Maybe, but I'd much rather tweak the data so that we test both the
>costing and execution part.
>

I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.

James, can you verify it that's still true?


Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".



OK, good. I'll push the fix.


I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...



Yeah, but with 1000 rows we'd need a more complex CASE statement (I
don't think simply having two groups - small+large would work).
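
Spelling that out: the i/100 expression is standing in for what would
otherwise be a ten-branch CASE. A sketch (not the committed test):

insert into t(a, b) select i/100 + 1, i + 1 from generate_series(0, 999) n(i);
-- versus something like
insert into t(a, b)
  select case when i < 100 then 1 when i < 200 then 2 /* ... */ else 10 end,
         i + 1
  from generate_series(0, 999) n(i);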


Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?



Ah, definitely. That was a mistake. Thanks for noticing.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Incremental sort

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 11:42:12AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 11:29 AM David Steele  wrote:


On 4/8/20 11:13 AM, James Coleman wrote:
>>
>> James, can you verify it that's still true?

I marked this entry as committed in the 2020-03 CF but it's not clear to
me if that's entirely true. I'll leave it up to you (all) to move it to
the 2020-07 CF if there is remaining work (other than making the build
farm happy).


Thanks.

I think it's true enough. The vast majority is committed, and the
small amount that isn't we'll leave as future improvements and
separate threads.



Right. The remaining bit is a generic issue not entirely specific to
incremental sort, so we better not hide it in this CF entry.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Alvaro Herrera
On 2020-Apr-08, Justin Pryzby wrote:

> This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH 
> ROW"
> was first allowed on partition tables (86f575948).
> 
> I thought this would work like partitioned indexes (8b08f7d48), where 
> detaching
> a partition makes its index non-inherited, and attaching a partition marks a
> pre-existing, matching partition as inherited rather than creating a new one.

Hmm.  Let's agree to what behavior we want, and then we implement that.
It seems to me there are two choices:

1. on detach, keep the trigger but make it independent of the trigger on
parent.  (This requires that the trigger is made dependent on the
trigger on parent, if the table is attached as partition again;
otherwise you'd end up with multiple copies of the trigger if you
detach/attach multiple times).

2. on detach, remove the trigger from the partition.

I think (2) is easier to implement, but (1) is the more convenient
behavior.

(The current behavior is obviously a bug.)
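
In terms of the example upthread, the two options would behave roughly like
this (a sketch of the intended behavior, not of an implementation):

ALTER TABLE t DETACH PARTITION t1;
-- option 1: trigger "trig" stays on t1 as an independent trigger;
--           a later ATTACH links it back to the parent's trigger
-- option 2: trigger "trig" is removed from t1;
--           a later ATTACH clones it again from the parent
ALTER TABLE t ATTACH PARTITION t1 FOR VALUES FROM (1) TO (2);
-- either way, this should no longer fail with "trigger ... already exists"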

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




warning in partioning code

2020-04-08 Thread Erik Rijkers

Hi,

Compiling master on debian stretch, gcc 9.3.0 complains:

partbounds.c: In function ‘partition_bounds_merge’:
partbounds.c:1024:21: warning: unused variable ‘inner_binfo’ 
[-Wunused-variable]

 1024 |  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
  | ^~~

Maybe it can be fixed.

thanks,

Erik Rijkers




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-04-08 Thread Tomas Vondra

On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:

On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
 wrote:


On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>>hyrax is not too happy with this test:
>>
>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
>>
>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>>EXPLAIN output, but it evidently is.
>>
>
>Thanks, I'll investigate. It's not clear to me either what might be
>causing this, but I guess something must have gone wrong in
>estimation/planning.
>

OK, I know what's going on - it's a rather embarrassing issue in the
regression test. There's no analyze on the test tables, so it uses
default estimates for number of groups etc. But with clobber cache the
test runs long enough for autoanalyze to kick in and collect stats, so
we generate better estimates which changes the plan.

I'll get this fixed - explicit analyze and tweaking the data a bit
should do the trick.


Looking at the tests that failed, I think we should consider just adding:
set enable_sort = off;
because several of those tests have very specific amounts of data to
ensure we test the transition points around the different modes in the
incremental sort node.



Maybe, but I'd much rather tweak the data so that we test both the
costing and execution part.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Let people set host(no)ssl settings from initdb

2020-04-08 Thread David Fetter
On Mon, Apr 06, 2020 at 10:12:16PM +, Cary Huang wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
> 
> Hi 
> I applied the patch 
> "v3-0001-Enable-setting-pg_hba.conf-permissions-from-initd.patch" and did 
> some verification with it. The intended feature works overall and I think it 
> is quite useful to support default auth methods for ssl and gss host types. I 
> have also found some minor things in the patch and would like to share as 
> below:
> 
> > +"# CAUTION: Configuring the system for \"trust\" authentication\n" \
> > +"# allows any user who can reach the databse on the route specified\n" \
> > +"# to connect as any PostgreSQL user, including the database\n" \
> > +"# superuser.  If you do not trust all the users who could\n" \
> > +"# reach the database on the route specified, use a more restrictive\n" \
> > +"# authentication method.\n"
> 
> Found a typo: should be 'database' instead of 'databse'

Fixed.

> >  * a sort of poor man's grep -v
> >  */
> > -#ifndef HAVE_UNIX_SOCKETS
> >  static char **
> >  filter_lines_with_token(char **lines, const char *token)
> >  {
> > @@ -461,7 +466,6 @@ filter_lines_with_token(char **lines, const char *token)
> >  
> > return result;
> > }
> > -#endif
> 
> I see that you have removed "#ifndef HAVE_UNIX_SOCKETS" around the
> filter_lines_with_token() function definition so it would be always
> available, which is used to remove the @tokens@ in case user does
> not specify a default auth method for the new hostssl, hostgss
> options. I think you should also remove the "#ifndef
> HAVE_UNIX_SOCKETS" around its declaration as well so both function
> definition and declaration would make sense.

Fixed.

Thanks very much for the review!

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From 3b49afec7f0b708285eadc10a097b4dc23d23927 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Wed, 11 Dec 2019 18:55:03 -0800
Subject: [PATCH v4] Enable setting pg_hba.conf permissions from initdb
To: hackers
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="2.25.2"

This is a multi-part message in MIME format.
--2.25.2
Content-Type: text/plain; charset=UTF-8; format=fixed
Content-Transfer-Encoding: 8bit


The new optional arguments are --auth-hostssl, --auth-hostnossl,
--auth-hostgssenc, and --auth-hostnogssenc.

diff --git doc/src/sgml/ref/initdb.sgml doc/src/sgml/ref/initdb.sgml
index a04a180165..ee65cf925d 100644
--- doc/src/sgml/ref/initdb.sgml
+++ doc/src/sgml/ref/initdb.sgml
@@ -154,6 +154,50 @@ PostgreSQL documentation
   
  
 
+ 
+  --auth-hostssl=authmethod
+  
+   
+This option specifies the authentication method for users via
+TLS remote connections used in pg_hba.conf
+(hostssl lines).
+   
+  
+ 
+
+ 
+  --auth-hostnossl=authmethod
+  
+   
+This option specifies the authentication method for users via
+non-TLS remote connections used in pg_hba.conf
+(hostnossl lines).
+   
+  
+ 
+
+ 
+  --auth-hostgssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+encrypted GSSAPI remote connections used in pg_hba.conf
+(hostgssenc lines).
+   
+  
+ 
+
+ 
+  --auth-hostnogssenc=authmethod
+  
+   
+This option specifies the authentication method for users via
+remote connections not using encrypted GSSAPI in pg_hba.conf
+(hostnogssenc lines).
+   
+  
+ 
+
  
   --auth-local=authmethod
   
diff --git src/backend/libpq/pg_hba.conf.sample src/backend/libpq/pg_hba.conf.sample
index c853e36232..7d5189dd8f 100644
--- src/backend/libpq/pg_hba.conf.sample
+++ src/backend/libpq/pg_hba.conf.sample
@@ -87,3 +87,15 @@ hostall all ::1/128 @authmethodhost@
 @remove-line-for-nolocal@local   replication all @authmethodlocal@
 hostreplication all 127.0.0.1/32@authmethodhost@
 hostreplication all ::1/128 @authmethodhost@
+
+# hostnossl all   all 0.0.0.0/0   @authmethodhostnossl@
+# hostnossl all   all ::/0@authmethodhostnossl@
+
+# hostssl all all 0.0.0.0/0   @authmethodhostssl@
+# hostssl all all ::/0@authmethodhostssl@
+
+# hostnogssenc all   all 0.0.0.0/0   @authmethodhostnogssenc@
+# hostnogssenc all   all ::/0 

Re: warning in partioning code

2020-04-08 Thread Etsuro Fujita
Hi Erik,

On Thu, Apr 9, 2020 at 1:07 AM Erik Rijkers  wrote:
> Compiling master on debian stretch, gcc 9.3.0 complains:
>
> partbounds.c: In function ‘partition_bounds_merge’:
> partbounds.c:1024:21: warning: unused variable ‘inner_binfo’
> [-Wunused-variable]
>   1024 |  PartitionBoundInfo inner_binfo = inner_rel->boundinfo;
>| ^~~
>
> Maybe it can be fixed.

Yeah, that is a known issue [1].  I'll work on that tomorrow.  (I'm
too tired today.)  Thanks for the report!

Best regards,
Etsuro Fujita

[1] 
https://www.postgresql.org/message-id/CAGz5QCJ1gBhg6upU0%2BpkYmHZsj%2BOaMgXCAf2GBVEm_k6%2BUr0zQ%40mail.gmail.com




Re: Commitfest 2020-03 Now in Progress

2020-04-08 Thread David Steele

On 4/1/20 10:09 AM, David Steele wrote:

On 3/17/20 8:10 AM, David Steele wrote:

On 3/1/20 4:10 PM, David Steele wrote:

The last Commitfest for v13 is now in progress!

Current stats for the Commitfest are:

Needs review: 192
Waiting on Author: 19
Ready for Committer: 4
Total: 215


Halfway through, here's where we stand.  Note that a CF entry was 
added after the stats above were recorded so the total has increased.


Needs review: 151 (-41)
Waiting on Author: 20 (+1)
Ready for Committer: 9 (+5)
Committed: 25
Moved to next CF: 1
Withdrawn: 4
Returned with Feedback: 6
Total: 216


After regulation time here's where we stand:

Needs review: 111 (-40)
Waiting on Author: 26 (+6)
Ready for Committer: 11 (+2)
Committed: 51 (+11)
Moved to next CF: 2 (+1)
Returned with Feedback: 10 (+4)
Rejected: 1
Withdrawn: 5 (+1)
Total: 217

We have one more patch so it appears that one in a completed state 
(committed, etc.) at the beginning of the CF has been moved to an 
uncompleted state. Or perhaps my math is just bad.


The RMT has determined that the CF will be extended for one week so I'll 
hold off on moving and marking patches until April 8.


The 2020-03 Commitfest is officially closed.

Final stats are (for entire CF, not just from March 1 this time):

Committed: 90.
Moved to next CF: 115.
Withdrawn: 8.
Rejected: 1.
Returned with Feedback: 23.
Total: 237.

Good job everyone!
--
-David
da...@pgmasters.net




More efficient RI checks - take 2

2020-04-08 Thread Antonin Houska
After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

Comments are welcome.
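
To give an idea of the difference (this is only a sketch, not the exact
queries the trigger functions build; row locking is omitted in the batched
form):

-- per-row check, roughly what runs once per queued event today
SELECT 1 FROM ONLY p x WHERE x.i = $1 FOR KEY SHARE OF x;

-- batched check, one query for a whole chunk of queued events
SELECT k.i
  FROM unnest(ARRAY[1, 2, 3]) AS k(i)
 WHERE NOT EXISTS (SELECT 1 FROM p WHERE p.i = k.i);
-- any row returned corresponds to a violated reference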

Setup
=

CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);


Insert many rows into the FK table
==

master:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
   QUERY PLAN

 Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual 
time=32.741..32.741 rows=0 loops=1)
   ->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384 
width=4) (actual time=2.403..4.802 rows=16384 loops=1)
 Planning Time: 0.050 ms
 Trigger for constraint f_i_fkey: time=448.986 calls=16384
 Execution Time: 485.444 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
   QUERY PLAN

 Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual 
time=34.053..34.053 rows=0 loops=1)
   ->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384 
width=4) (actual time=2.223..4.448 rows=16384 loops=1)
 Planning Time: 0.047 ms
 Trigger for constraint f_i_fkey: time=105.164 calls=8
 Execution Time: 141.201 ms


Insert a single row into the FK table
=

master:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
--
 Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060 rows=0 
loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 
rows=1 loops=1)
 Planning Time: 0.026 ms
 Trigger for constraint f_i_fkey: time=0.435 calls=1
 Execution Time: 0.517 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
--
 Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066 rows=0 
loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 
rows=1 loops=1)
 Planning Time: 0.025 ms
 Trigger for constraint f_i_fkey: time=0.578 calls=1
 Execution Time: 0.670 ms


Check if FK row exists during deletion from the PK
==

master:

DELETE FROM p WHERE i=16384;
ERROR:  update or delete on table "p" violates foreign key constraint 
"f_i_fkey" on table "f"
DETAIL:  Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms

patched:

DELETE FROM p WHERE i=16384;
ERROR:  update or delete on table "p" violates foreign key constraint 
"f_i_fkey" on table "f"
DETAIL:  Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms


Cascaded DELETE --- many PK rows


DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
---
 Delete on p  (cost=0.00..236.84 rows=16384 width=6) (actual 
time=38.334..38.334 rows=0 loops=1)
   ->  Seq Scan on p  (cost=0.00..236.84 rows=16384 width=6) (actual 
time=0.019..3.925 rows=16384 loops=1)
 Planning Time: 0.049 ms
 Trigger for constraint f_i_fkey: time=31348.756 calls=16384
 Execution Time: 31390.784 ms

patched:

EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
---
 Delete on p  (cost=0.00..236.84 rows=16384 width=6) (actual 
time=33.360..33.360 rows=0 loops=1)
   

Re: Commitfest 2020-03 Now in Progress

2020-04-08 Thread Tom Lane
David Steele  writes:
> The 2020-03 Commitfest is officially closed.

> Final stats are (for entire CF, not just from March 1 this time):

> Committed: 90.
> Moved to next CF: 115.
> Withdrawn: 8.
> Rejected: 1.
> Returned with Feedback: 23.
> Total: 237.

> Good job everyone!

Thanks for running it!  I know it's a tedious responsibility.

regards, tom lane




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Justin Pryzby
On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
>  wrote:
> > I think Tushar's point is that either we should allow both
> > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in
> > both cases we should throw an error.
> 
> Oh, yeah, good point. Somebody must not've been careful enough with
> the options-checking code.

Actually I think someone was too careful.

>From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 8 Apr 2020 11:38:36 -0500
Subject: [PATCH v1] parallel vacuum: options check to use same test as in
 vacuumlazy.c

---
 src/backend/commands/vacuum.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 351d5215a9..660c854d49 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 	bool		freeze = false;
 	bool		full = false;
 	bool		disable_page_skipping = false;
-	bool		parallel_option = false;
 	ListCell   *lc;
 
 	/* Set default value */
@@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 			params.truncate = get_vacopt_ternary_value(opt);
 		else if (strcmp(opt->defname, "parallel") == 0)
 		{
-			parallel_option = true;
 			if (opt->arg == NULL)
 			{
 ereport(ERROR,
@@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
 		   !(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
 	Assert(!(params.options & VACOPT_SKIPTOAST));
 
-	if ((params.options & VACOPT_FULL) && parallel_option)
+	if ((params.options & VACOPT_FULL) && params.nworkers > 0)
 		ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  errmsg("cannot specify both FULL and PARALLEL options")));
-- 
2.17.0



Re: Conflict handling for COPY FROM

2020-04-08 Thread David G. Johnston
On Fri, Mar 27, 2020 at 3:27 PM Tom Lane  wrote:

> Surafel Temesgen  writes:
> > [ conflict-handling-copy-from-v16.patch ]
>
> I took a quick look at this patch, since it was marked "ready for
> committer", but I don't see how it can possibly be considered committable.
>

[...]


>
> Looking at this patch in terms of whether the functionality is
> available in that approach, it seems like you might want two parts
> of it:
>
> 1. A COPY option to be flexible about the number of columns in the
> input, say by filling omitted columns with NULLs.
>
> 2. INSERT ... ON CONFLICT can be used to transfer data to permanent
> tables with rejection of duplicate keys, but it doesn't have much
> flexibility about just what to do with duplicates.  Maybe we could
> add more ON CONFLICT actions?  Sending conflicted rows to some other
> table, or updating the source table to show which rows were copied
> and which not, might be useful things to think about.
>

tl/dr: patch 1: change the option to something like "processing_mode = row
{default = file}" and relay existing errors across all rows in the table in
the error detail message.

tl/dr patch 2: add an "IGNORE_CONFLICTS = {-1, 0 default, 1+}" option that
converts the errors that appear for speculative insertions into warnings
(in the error detail message) and either aborts the copy (cleanly, no
inserted rows) if the count exceeds the user-specified value, or commits if
it is able (i.e., no errors in the error message detail - which would come
from other problems besides conflicts).  I don't really get why a number is
needed here though - it's not like ON CONFLICT DO NOTHING has that ability
and all this seems to be wanting to do is enable a subset of ON CONFLICT DO
NOTHING for COPY - in which case no warning or error would appear in the
first place.
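
For comparison, the existing route to that subset is something along these
lines (staging table made up for illustration):

CREATE TEMP TABLE x_stage (LIKE x);
COPY x_stage FROM stdin;
INSERT INTO x SELECT * FROM x_stage ON CONFLICT DO NOTHING;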

Sorry for the rambling below, a decent chunk of the material is stream of
consciousness but it all generally relates to the behaviors that this patch
is touching.

My desired take-away from this would be to have COPY's error response
include one line for each input line that fails to be inserted with the
line number and the underlying error message passed along verbatim.
Importing, fixing one error, trying again, fixing another error, repeat is
a first level goal for me.  Including at most the first 30 characters of
the problematic line would facilitate the common case where the first field
is an identifier, which is more useful than a pure line number.  But
including the entire line doesn't seem worth the risk.  Though it should be
considered whether to include the error detail of the underlying error
message - probably as a subsequent line.

After that, consider having a facility for telling COPY that you are OK if
it ignores the problem rows and inserts the ones it is able - and
converting the final exit code to be a warning instead of an error
(whatever that means, if anything, in this context).

The speculative insertion stuff is outside my experience but are we trying
to just make it work, give the COPY user control of whether its used or
not, have an independent facility just for copy, or something else?
Teaching copy to use speculative insertion infrastructure that already
exists is a patch in its own right and seems to be fixing a current POLA
violation - this other stuff about reporting and ignoring errors is besides
the point.  The basic inter-operability fix seems like it could be made to
work under today's copy error handling and report framework just fine.

If there is value in someone augmenting the error handling and response
framework to work in a tabular format instead of a free-form error message
text body that too could be presented and discussed as its own commit.
Granted it may be the case that such a verbose error message from COPY as
described above is undesirable but would be palatable in some other
presentation format.  I'm inclined to believe, however, that even if COPY
is made an exception to the general error message rules (and the detail
information would be part of the errdetail, not the main message, it could
just include a summary count) that the added functionality would make the
exception acceptable.

To be honest I haven't dived into the assumptions baked into the regression
tests to know whether the tabular stuff that was discussed and that is
shown in the regression expected outputs is normal or not.  I assume the
part in question is:

+COPY x from stdin WITH(ERROR_LIMIT 5);
+WARNING:  skipping "70001 22 32" --- missing data for column "d"
+WARNING:  skipping "70002 23 33 43 53 54" --- extra data after last
expected column
+WARNING:  skipping "70003 24 34 44" --- missing data for column "e"
+
+   a   | b  | c  | d  |  e
+---++++--
+ 70005 | 27 | 37 | 47 | before trigger fired
+(1 row)

Right now COPY would just fail.  What would be nice is for it to say:

ERROR: Line 1?  missing data for column "d"
ERROR: Line 2?  --- extra data after last expe

Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Justin Pryzby
On Wed, Apr 08, 2020 at 12:02:39PM -0400, Alvaro Herrera wrote:
> On 2020-Apr-08, Justin Pryzby wrote:
> 
> > This seems to be a bug in master, v12, and (probably) v11, where "FOR EACH 
> > ROW"
> > was first allowed on partition tables (86f575948).
> > 
> > I thought this would work like partitioned indexes (8b08f7d48), where 
> > detaching
> > a partition makes its index non-inherited, and attaching a partition marks a
> > pre-existing, matching partition as inherited rather than creating a new 
> > one.
> 
> Hmm.  Let's agree to what behavior we want, and then we implement that.
> It seems to me there are two choices:
> 
> 1. on detach, keep the trigger but make it independent of the trigger on
> parent.  (This requires that the trigger is made dependent on the
> trigger on parent, if the table is attached as partition again;
> otherwise you'd end up with multiple copies of the trigger if you
> detach/attach multiple times).
> 
> 2. on detach, remove the trigger from the partition.
> 
> I think (2) is easier to implement, but (1) is the more convenient
> behavior.

At telsasoft, we don't care (we uninherit tables before ALTERing parents to
avoid disruptive locking and to avoid worst-case disk use).

(1) is consistent with the behavior for indexes, which is a slight advantage
for users' ability to understand and keep track of the behavior.  But adding
triggers is pretty different so I'm not sure it's a totally compelling
parallel.

-- 
Justin




Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm.  Let's agree to what behavior we want, and then we implement that.
> It seems to me there are two choices:

> 1. on detach, keep the trigger but make it independent of the trigger on
> parent.  (This requires that the trigger is made dependent on the
> trigger on parent, if the table is attached as partition again;
> otherwise you'd end up with multiple copies of the trigger if you
> detach/attach multiple times).

> 2. on detach, remove the trigger from the partition.

> I think (2) is easier to implement, but (1) is the more convenient
> behavior.

I think that #1 would soon lead to needing all the same infrastructure
as we have for inherited columns and constraints, ie triggers would need
equivalents of attislocal and attinhcount.  I don't really want to go
there, so I'd vote for #2.
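
For anyone unfamiliar with that bookkeeping, it is what you can already see
for inherited columns (just an illustration):

CREATE TABLE parent (a int);
CREATE TABLE child () INHERITS (parent);
SELECT attname, attislocal, attinhcount
  FROM pg_attribute
 WHERE attrelid = 'child'::regclass AND attnum > 0;
-- returns  a | f | 1 : the column exists only because of the parent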

regards, tom lane




doc review for v13

2020-04-08 Thread Justin Pryzby
I reviewed docs for v13, like:
git log --cherry-pick origin/master...origin/REL_12_STABLE -p doc

I did something similar for v12 [0].  I've included portions of that here which
still seem lacking 12 months later (but I'm not intending to continue defending
each individual patch hunk).

I previously mailed separately about a few individual patches, some of which
have separate, ongoing discussion and aren't included here (incr sort, parallel
vacuum).

Justin

[0] 
https://www.postgresql.org/message-id/flat/20190709161256.GH22387%40telsasoft.com#56889b868e5886e36b90e9f5a1165186
>From 482b590355cd7df327602dd36e91721b827f9c37 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:31:04 -0500
Subject: [PATCH v1 01/19] docs: pg_statistic_ext.stxstattarget

commit c31132d87c6315bbbe4b4aa383705aaae2348c0e
Author: Tomas Vondra 
Date:   Wed Mar 18 16:48:12 2020 +0100
---
 doc/src/sgml/catalogs.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 386c6d7bd1..ce33df9e58 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6472,7 +6472,7 @@ SCRAM-SHA-256$:&l
.
A zero value indicates that no statistics should be collected.
A negative value says to use the system default statistics target.
-   Positive values stxstattarget
+   Positive values of stxstattarget
determine the target number of most common values
to collect.
   
-- 
2.17.0

>From 2a3a4d7028b02070447fafd37e66e72da59966bf Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:33:44 -0500
Subject: [PATCH v1 02/19] docs: reg* functions

commit 8408e3a557ad26a7e88f867a425b2b9a86c4fa04
Author: Peter Eisentraut 
Date:   Wed Mar 18 14:51:37 2020 +0100
---
 doc/src/sgml/func.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a38387b8c6..fd0f5d64b3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18796,7 +18796,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
to_regnamespace, to_regoper,
to_regoperator, to_regrole,
to_regproc, to_regprocedure, and
-   to_regtype, functions translate relation, collation, schema,
+   to_regtype translate relation, collation, schema,
operator, role, function, and type names (given as text) to
objects of the corresponding reg* type (see  about the types).  These functions differ from a
-- 
2.17.0

>From 6864ced0a9eaeab4c010d1f090b26b337f125742 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:43:42 -0500
Subject: [PATCH v1 03/19] Minus one

See also
ac862376037727e744f25030bd8b6090c707247b
---
 doc/src/sgml/config.sgml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a0da4aabac..ea2749535d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6110,7 +6110,7 @@ local0.*/var/log/postgresql
  unoptimized queries in your applications.
  If this value is specified without units, it is taken as milliseconds.
  Setting this to zero prints all statement durations.
- Minus-one (the default) disables logging statement durations.
+ -1 (the default) disables logging statement durations.
  Only superusers can change this setting.
 
 
@@ -6162,7 +6162,7 @@ local0.*/var/log/postgresql
  traffic is too high to log all queries.
  If this value is specified without units, it is taken as milliseconds.
  Setting this to zero samples all statement durations.
- Minus-one (the default) disables sampling statement durations.
+ -1 (the default) disables sampling statement durations.
  Only superusers can change this setting.
 
 
-- 
2.17.0

>From bfb8439eb5618db3a36ca2794dbcc35489d98c27 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 29 Mar 2020 19:51:32 -0500
Subject: [PATCH v1 04/19] doc: psql opclass/opfamily

commit b0b5e20cd8d1a58a8782d5dc806a5232db116e2f
Author: Alexander Korotkov 

ALSO, should we rename the "Purpose" column?  I see we have pg_amop.amoppurpose
so maybe it's fine ?
---
 doc/src/sgml/ref/psql-ref.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 0595d1c04b..cdd24fad98 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1244,7 +1244,7 @@ testdb=>
 (see ).
 If access-method-patttern
 is specified, only operator classes associated with access methods whose
-names match pattern are listed.
+names match the pattern are listed.
 If input-type-pattern
 is specified, only operator classes associated with input types whose
 names match the pattern are listed.
@@ -1267,7 +1267,7 @@ testdb=>

Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Apr 6, 2020 at 2:23 PM Stephen Frost  wrote:
> > So, instead of talking about 'bzip2 > %f.bz2', and then writing into our
> > documentation that that's how this feature can be used, what about
> > proposing something that would actually work reliably with this
> > interface?  Which properly fsync's everything, has good retry logic for
> > when failures happen, is able to actually detect when a failure
> > happened, how to restore from a backup taken this way, and it'd probably
> > be good to show how pg_verifybackup could be used to make sure the
> > backup is actually correct and valid too.
> 
> I don't really understand the problem here.  Suppose I do:
> 
> mkdir ~/my-brand-new-empty-directory
> cd ~/my-brand-new-empty-directory
> pg_basebackup -Ft --pipe-output 'bzip2 > %f.bz2'
> initdb -S --dont-expect-that-this-is-a-data-directory . # because
> right now it would complain about pg_wal and pg_tblspc being missing
> 
> I think if all that works, my backup should be good and durably on
> disk. If it's not, then either pg_basebackup or bzip2 or initdb didn't
> report errors that they should have reported. If you're worried about
> that, say because you suspect those programs are buggy or because you
> think the kernel may not be reporting errors properly, you can use tar
> -jxvf + pg_validatebackup to check.
> 
> What *exactly* do you think can go wrong here?

What if %f.bz2 already exists?  How about if %f has a space in it?  What
about if I'd like to verify that the backup looks reasonably valid
without having to find space to store it entirely decompressed?

Also, this argument feels disingenuous to me.  That isn't the only thing
you're promoting this feature be used for, as you say below.  If the
only thing this feature is *actually* intended for is to add bzip2
support, then we should just add bzip2 support directly and call it a
day, but what you're really talking about here is a generic interface
that you'll want to push users to for things like "how do I back up to
s3" or "how do I back up to GCS" and so we should be thinking about
those cases and not just a relatively simple use case.

This is the same kind of slippery slope that our archive command is
built on- sure, if everything "works" then it's "fine", even with our
documented example, but we know that not everything works in the real
world, and just throwing an 'initdb -S' in there isn't a general
solution because users want to do things like send WAL to s3 or GCS or
such.

I don't think there's any doubt that there'll be no shortage of shell
scripts and various other things that'll be used with this that, yes,
will be provided by our users and therefore we can blame them for doing
it wrong, but then they'll complain on our lists and we'll spend time
educating them as to how to write proper software to be used, or
pointing them to a solution that someone writes specifically for this.
I don't view that as, ultimately, a good solution.

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Apr 6, 2020 at 1:32 PM Magnus Hagander  wrote:
> > Now, if we were just talking about compression, it would actually be
> > interesting to implement some sort of "postgres compression API" if
> > you will, that is implemented by a shared library. This library could
> > then be used from pg_basebackup or from anything else that needs
> > compression. And anybody who wants could then do a "
> > for PostgreSQL" module, removing the need for us to carry such code
> > upstream.
> 
> I think it could be more general than a compression library. It could
> be a store-my-stuff-and-give-it-back-to-me library, which might do
> compression or encryption or cloud storage or any combination of the
> three, and probably other stuff too. Imagine that you first call an
> init function with a namespace that is basically a string provided by
> the user. Then you open a file either for read or for write (but not
> both). Then you read or write a series of chunks (depending on the
> file mode). Then you close the file. Then you can do the same with
> more files. Finally at the end you close the namespace. You don't
> really need to care where or how the functions you are calling store
> the data. You just need them to return proper error indicators if by
> chance they fail.

Yes, having a storage layer makes a lot of sense here, with features
that are understood by the core system and which each driver
understands, and then having a filter system which is also pluggable and
can support things like compression and hashing for this would also be
great.

I can point you to examples of all of the above, already implemented, in
C, all OSS.  Sure seems like a pretty good and reasonable approach to
take when other folks are doing it.

> As compared with my previous proposal, this would work much better for
> pg_basebackup -Fp, because you wouldn't launch a new bzip2 process for
> every file. You'd just bzopen(), which

Re: More efficient RI checks - take 2

2020-04-08 Thread Pavel Stehule
On Wed, Apr 8, 2020 at 18:36 Antonin Houska  wrote:

> After having reviewed [1] more than a year ago (the problem I found was
> that
> the transient table is not available for deferred constraints), I've tried
> to
> implement the same in an alternative way. The RI triggers still work as row
> level triggers, but if multiple events of the same kind appear in the
> queue,
> they are all passed to the trigger function at once. Thus the check query
> does
> not have to be executed that frequently.
>
> Some performance comparisons are below. (Besides the execution time, please
> note the difference in the number of trigger function executions.) In
> general,
> the checks are significantly faster if there are many rows to process, and
> a
> bit slower when we only need to check a single row. However I'm not sure
> about
> the accuracy if only a single row is measured (if a single row check is
> performed several times, the execution time appears to fluctuate).
>

It is a hard task to choose a good strategy for immediate constraints, but for
deferred constraints you know how many rows should be checked, and then you
can choose a better strategy.

Is it possible to use an estimate for choosing the method of RI checks?



> Comments are welcome.
>
> Setup
> =
>
> CREATE TABLE p(i int primary key);
> INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
> CREATE TABLE f(i int REFERENCES p);
>
>
> Insert many rows into the FK table
> ==
>
> master:
>
> EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
>QUERY PLAN
>
> 
>  Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual
> time=32.741..32.741 rows=0 loops=1)
>->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384
> width=4) (actual time=2.403..4.802 rows=16384 loops=1)
>  Planning Time: 0.050 ms
>  Trigger for constraint f_i_fkey: time=448.986 calls=16384
>  Execution Time: 485.444 ms
> (5 rows)
>
> patched:
>
> EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
>QUERY PLAN
>
> 
>  Insert on f  (cost=0.00..163.84 rows=16384 width=4) (actual
> time=34.053..34.053 rows=0 loops=1)
>->  Function Scan on generate_series g  (cost=0.00..163.84 rows=16384
> width=4) (actual time=2.223..4.448 rows=16384 loops=1)
>  Planning Time: 0.047 ms
>  Trigger for constraint f_i_fkey: time=105.164 calls=8
>  Execution Time: 141.201 ms
>
>
> Insert a single row into the FK table
> =
>
> master:
>
> EXPLAIN ANALYZE INSERT INTO f VALUES (1);
> QUERY PLAN
>
> --
>  Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060
> rows=0 loops=1)
>->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
> rows=1 loops=1)
>  Planning Time: 0.026 ms
>  Trigger for constraint f_i_fkey: time=0.435 calls=1
>  Execution Time: 0.517 ms
> (5 rows)
>
> patched:
>
> EXPLAIN ANALYZE INSERT INTO f VALUES (1);
> QUERY PLAN
>
> --
>  Insert on f  (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066
> rows=0 loops=1)
>->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002
> rows=1 loops=1)
>  Planning Time: 0.025 ms
>  Trigger for constraint f_i_fkey: time=0.578 calls=1
>  Execution Time: 0.670 ms
>
>
> Check if FK row exists during deletion from the PK
> ==
>
> master:
>
> DELETE FROM p WHERE i=16384;
> ERROR:  update or delete on table "p" violates foreign key constraint
> "f_i_fkey" on table "f"
> DETAIL:  Key (i)=(16384) is still referenced from table "f".
> Time: 3.381 ms
>
> patched:
>
> DELETE FROM p WHERE i=16384;
> ERROR:  update or delete on table "p" violates foreign key constraint
> "f_i_fkey" on table "f"
> DETAIL:  Key (i)=(16384) is still referenced from table "f".
> Time: 5.561 ms
>
>
> Cascaded DELETE --- many PK rows
> 
>
> DROP TABLE f;
> CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
> INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
>
> master:
>
> EXPLAIN ANALYZE DELETE FROM p;
> QUERY PLAN
>
> ---
>  Delete on p  (cost=0.00..236.84 rows=16384 width=6) (actual
> time=38.334..38.334 rows=0 loops=1)
> 

Re: A problem about partitionwise join

2020-04-08 Thread Tom Lane
Richard Guo  writes:
> On Sun, Apr 5, 2020 at 4:38 AM Tom Lane  wrote:
>> There is already something in equivclass.c that would almost do what
>> we want here: exprs_known_equal() would tell us whether the partkeys
>> can be found in the same eclass, without having to generate data
>> structures along the way.  The current implementation is not watertight
>> because it doesn't check opclass semantics, but that consideration
>> can be bolted on readily enough.  So that leads me to something like
>> the attached.

> I have some concern about we only check non-nullable partexprs. Is it
> possible that two nullable partexprs come from the same EC? I tried to
> give an example but failed.

Currently the EC infrastructure doesn't really cope with outer join
equijoins.  They are not treated as producing true equivalences,
so I think that the case you're worried about can't occur (which is why
I didn't code for it).  I have hopes of being able to incorporate outer
joins into the EC logic in a less squishy way in the future, by making
the representation of Vars distinguish explicitly between
value-before-outer-join and value-after-outer-join, after which we could
make bulletproof assertions about what is equal to what, even with outer
joins in the mix.  If that works out it might produce a cleaner answer
in this area too.

TBH, now that I have had some exposure to the partitionwise join
matching logic I don't much like any of it.  I feel like it's doing
about the same job as ECs, but in an unprincipled and not very
efficient manner.  Right now is no time to redesign it, of course,
but maybe at some point we could do that.  (I did experiment with
removing all the rest of have_partkey_equi_join() and having it
*only* ask exprs_known_equal() about equivalences, which is more or
less what I'm envisioning here.  That caused some of the existing
regression tests to fail, so there's something that the idea isn't
covering.  I didn't dig any further at the time, and in particular
failed to check whether the problems were specifically about outer
joins, which'd be unsurprising given the above.)

Anyway, this work has missed the window for v13, so we've got plenty
of time to think about it.

regards, tom lane




Re: where should I stick that backup?

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost  wrote:
> What if %f.bz2 already exists?

That cannot occur in the scenario I described.

> How about if %f has a space in it?

For a tar-format backup I don't think that can happen, because the
file names will be base.tar and ${tablespace_oid}.tar. For a plain
format backup it's a potential issue.

> What
> about if I'd like to verify that the backup looks reasonably valid
> without having to find space to store it entirely decompressed?

Then we need to make pg_validatebackup better.

> Also, this argument feels disingenuous to me.
> [ lots more stuff ]

This all just sounds like fearmongering to me. "archive_command
doesn't work very well, so maybe your thing won't either." Maybe it
won't, but the fact that archive_command doesn't isn't a reason.

> Yes, having a storage layer makes a lot of sense here, with features
> that are understood by the core system and which each driver
> understands, and then having a filter system which is also pluggable and
> can support things like compression and hashing for this would also be
> great.

It's good to know that you prefer a C interface to one based on shell
scripting. I hope that we will also get some other opinions on that
question, as my own feelings are somewhat divided (but with some bias
toward trying to making the shell scripting thing work, because I
believe it gives a lot more practical flexibility).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests

2020-04-08 Thread Robert Haas
On Wed, Apr 8, 2020 at 1:15 AM Fujii Masao  wrote:
> When there is a backup_manifest in the database cluster, it's included in
> the backup even when --no-manifest is specified. ISTM that this is problematic
> because the backup_manifest is obviously not valid for the backup.
> So, isn't it better to always exclude the *existing* backup_manifest in the
> cluster from the backup, like backup_label/tablespace_map? Patch attached.
>
> Also I found the typo in the document. Patch attached.

Both patches look good. The second one is definitely a mistake on my
part, and the first one seems like a totally reasonable change.
Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Andrew Dunstan

On 4/7/20 9:42 AM, Andrew Dunstan wrote:
> On Tue, Apr 7, 2020 at 12:37 AM Tom Lane  wrote:
>> Robert Haas  writes:
>>> Taking stock of the situation this morning, most of the buildfarm is
>>> now green. There are three failures, on eelpout (6 hours ago),
>>> fairywren (17 hours ago), and hyrax (3 days, 7 hours ago).
>> fairywren has now done this twice in the pg_validatebackupCheck step:
>>
>> exec failed: Bad address at 
>> /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm 
>> line 340.
>>  at 
>> /home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/perl/TestLib.pm 
>> line 340.
>>
>> I'm a tad suspicious that it needs another perl2host()
>> somewhere, but the log isn't very clear as to where.
>>
>> More generally, I wonder if we ought to be trying to
>> centralize those perl2host() calls instead of sticking
>> them into individual test cases.
>>
>>
>
> Not sure about that. I'll see if I can run it by hand and get some
> more info. What's quite odd is that jacana (a very similar setup) is
> passing this happily.
>


OK, tricky, but here's what I did to get this working on fairywren.


First, on Msys2 there is a problem with name mangling. We've had to fix
this before by telling it to ignore certain argument prefixes.


Second, once that was fixed rmdir was failing on the tablespace. On
Windows this is a junction, so unlink is the correct thing to do, I
believe, just as it is on Unix where it's a symlink.


cheers


andrew



-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/bin/pg_validatebackup/t/003_corruption.pl b/src/bin/pg_validatebackup/t/003_corruption.pl
index 7a09d02e6c..fe717dfc73 100644
--- a/src/bin/pg_validatebackup/t/003_corruption.pl
+++ b/src/bin/pg_validatebackup/t/003_corruption.pl
@@ -16,6 +16,8 @@ $master->start;
 # Include a user-defined tablespace in the hopes of detecting problems in that
 # area.
 my $source_ts_path = TestLib::perl2host(TestLib::tempdir_short());
+my $source_ts_prefix = $source_ts_path;
+$source_ts_prefix =~ s!([^A-Z]:/[^/]*)/.*!$1!;
 $master->safe_psql('postgres', command_ok(['pg_basebackup', '-D', $backup_path, '--no-sync',
 			'-T', "${source_ts_path}=${backup_ts_path}"],
 			"base backup ok");
@@ -177,14 +180,7 @@ sub mutilate_missing_tablespace
 	my ($tsoid) = grep { $_ ne '.' && $_ ne '..' }
 		 slurp_dir("$backup_path/pg_tblspc");
 	my $pathname = "$backup_path/pg_tblspc/$tsoid";
-	if ($windows_os)
-	{
-		rmdir($pathname) || die "$pathname: $!";
-	}
-	else
-	{
-		unlink($pathname) || die "$pathname: $!";
-	}
+	unlink($pathname) || die "$pathname: $!";
 	return;
 }
 


Re: More efficient RI checks - take 2

2020-04-08 Thread Corey Huinker
On Wed, Apr 8, 2020 at 1:06 PM Pavel Stehule 
wrote:

>
>
> st 8. 4. 2020 v 18:36 odesílatel Antonin Houska  napsal:
>
>> After having reviewed [1] more than a year ago (the problem I found was
>> that
>> the transient table is not available for deferred constraints), I've
>> tried to
>> implement the same in an alternative way. The RI triggers still work as
>> row
>> level triggers, but if multiple events of the same kind appear in the
>> queue,
>> they are all passed to the trigger function at once. Thus the check query
>> does
>> not have to be executed that frequently.
>>
>
I'm excited that you picked this up!


>
>> Some performance comparisons are below. (Besides the execution time,
>> please
>> note the difference in the number of trigger function executions.) In
>> general,
>> the checks are significantly faster if there are many rows to process,
>> and a
>> bit slower when we only need to check a single row. However I'm not sure
>> about
>> the accuracy if only a single row is measured (if a single row check is
>> performed several times, the execution time appears to fluctuate).
>>
>
These numbers are very promising, and much more in line with my initial
expectations. Obviously the impact on single-row DML is of major concern,
though.

> It is a hard task to choose a good strategy for immediate constraints, but
> for deferred constraints you know how many rows should be checked, and then
> you can choose a better strategy.
>

> Is it possible to use an estimate to choose the method of RI checks?
>

In doing my initial attempt, the feedback I was getting was that the people
who truly understood the RI checks fell into the following groups:
1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers

While #3 is clearly beyond the scope for an endeavor like this, #1 seems
like it would nearly eliminate the 1-row penalty (we'd still have the
TupleStore init penalty, but it would just be a handy queue structure, and
maybe that cost would be offset by removing the SPI overhead), and once
that is done, we could see about step #2.
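
For reference, the per-row check the triggers issue today is roughly the
first query below, and the batching could amount to folding many queued keys
into one statement shaped like the second (illustrative SQL only, using the
p table from the example upthread; this is not the patch's actual query
text):

-- today: roughly one query per queued FK row
PREPARE ri_one(int) AS
  SELECT 1 FROM ONLY p x WHERE x.i = $1 FOR KEY SHARE OF x;

-- batched: one query for a whole set of queued keys
PREPARE ri_many(int[]) AS
  SELECT k.i
    FROM unnest($1) AS k(i)
   WHERE NOT EXISTS (SELECT 1 FROM ONLY p x WHERE x.i = k.i);

EXECUTE ri_one(1);
EXECUTE ri_many(ARRAY[1, 2, 3]);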


Re: backup manifests and contemporaneous buildfarm failures

2020-04-08 Thread Tom Lane
Andrew Dunstan  writes:
> OK, tricky, but here's what I did to get this working on fairywren.
> First, on Msys2 there is a problem with name mangling. We've had to fix
> this before by telling it to ignore certain argument prefixes.
> Second, once that was fixed rmdir was failing on the tablespace. On
> Windows this is a junction, so unlink is the correct thing to do, I
> believe, just as it is on Unix where it's a symlink.

Hmm, no opinion about the name mangling business, but the other part
seems like it might break jacana and/or bowerbird, which are currently
happy with this test?  (AFAICS we only have four Windows animals
running the TAP tests, and the fourth (drongo) hasn't reported in
for a while.)

I guess we could commit it and find out.  I'm all for the simpler
coding if it works.

regards, tom lane




Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Alvaro Herrera
On 2020-Apr-08, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Hmm.  Let's agree to what behavior we want, and then we implement that.
> > It seems to me there are two choices:
> 
> > 1. on detach, keep the trigger but make it independent of the trigger on
> > parent.  (This requires that the trigger is made dependent on the
> > trigger on parent, if the table is attached as partition again;
> > otherwise you'd end up with multiple copies of the trigger if you
> > detach/attach multiple times).
> 
> > 2. on detach, remove the trigger from the partition.
> 
> > I think (2) is easier to implement, but (1) is the more convenient
> > behavior.
> 
> I think that #1 would soon lead to needing all the same infrastructure
> as we have for inherited columns and constraints, ie triggers would need
> equivalents of attislocal and attinhcount.  I don't really want to go
> there, so I'd vote for #2.

Hmm.  Those things are used for the legacy inheritance case supporting
multiple inheritance, where we need to figure out which parent the table
is being detached (disinherited) from.  But for partitioning we know
which parent it is, since there can only be one.  So I don't think that
argument applies.
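
To make the two options concrete, here is a minimal sketch (invented names,
not from any posted patch):

CREATE TABLE parted (a int) PARTITION BY LIST (a);
CREATE TABLE part1 PARTITION OF parted FOR VALUES IN (1);
CREATE FUNCTION noop_tg() RETURNS trigger LANGUAGE plpgsql
  AS $$ BEGIN RETURN NEW; END $$;

-- a row-level trigger on the partitioned table is cloned onto part1
CREATE TRIGGER t BEFORE INSERT ON parted
  FOR EACH ROW EXECUTE FUNCTION noop_tg();

ALTER TABLE parted DETACH PARTITION part1;
-- option 1: the clone of "t" stays on part1 as an independent trigger
-- option 2: the clone of "t" is dropped from part1 at detach time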

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: where should I stick that backup?

2020-04-08 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Apr 8, 2020 at 1:05 PM Stephen Frost  wrote:
> > What if %f.bz2 already exists?
> 
> That cannot occur in the scenario I described.

Of course it can.

> > How about if %f has a space in it?
> 
> For a tar-format backup I don't think that can happen, because the
> file names will be base.tar and ${tablespace_oid}.tar. For a plain
> format backup it's a potential issue.

I agree that it might not be an issue for tar-format.

> > What
> > about if I'd like to verify that the backup looks reasonably valid
> > without having to find space to store it entirely decompressed?
> 
> Then we need to make pg_validatebackup better.

Sure, but shouldn't the design be contemplating how these various tools
will work together?

> > Also, this argument feels disingenuous to me.
> > [ lots more stuff ]
> 
> This all just sounds like fearmongering to me. "archive_command
> doesn't work very well, so maybe your thing won't either." Maybe it
> won't, but the fact that archive_command doesn't isn't a reason.

I was trying to explain that we have literally gone down exactly this
path before and it's not been a good result, hence we should be really
careful before going down it again.  I don't consider that to be
fearmongering, nor that we should be dismissing that concern out of
hand.

> > Yes, having a storage layer makes a lot of sense here, with features
> > that are understood by the core system and which each driver
> > understands, and then having a filter system which is also pluggable and
> > can support things like compression and hashing for this would also be
> > great.
> 
> It's good to know that you prefer a C interface to one based on shell
> scripting. I hope that we will also get some other opinions on that
> question, as my own feelings are somewhat divided (but with some bias
> toward trying to making the shell scripting thing work, because I
> believe it gives a lot more practical flexibility).

Yes, I do prefer a C interface.  One might even say "been there, done
that."  Hopefully sharing such experience is still useful to do on these
lists.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: DETACH PARTITION and FOR EACH ROW triggers on partitioned tables

2020-04-08 Thread Tom Lane
Alvaro Herrera  writes:
> On 2020-Apr-08, Tom Lane wrote:
>> I think that #1 would soon lead to needing all the same infrastructure
>> as we have for inherited columns and constraints, ie triggers would need
>> equivalents of attislocal and attinhcount.  I don't really want to go
>> there, so I'd vote for #2.

> Hmm.  Those things are used for the legacy inheritance case supporting
> multiple inheritance, where we need to figure out which parent the table
> is being detached (disinherited) from.  But for partitioning we know
> which parent it is, since there can only be one.  So I don't think that
> argument applies.

My point is that so long as you only allow the case of exactly one parent,
you can just delete the child trigger, because it must belong to that
parent.  As soon as there's any flexibility, you are going to end up
reinventing all the stuff we had to invent to manage
maybe-or-maybe-not-inherited columns.  So I think the "detach" idea
is the first step on that road, and I counsel not taking that step.

(This implies that when creating a child trigger, we should error out,
*not* allow the case, if there's already a trigger by that name.  Not
sure if that's what happens today, but again I'd say that's what we
should do to avoid complicated cases.)
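
Concretely, the collision case looks something like this (a hypothetical
sketch with invented names; as said above, I'm not sure what happens today):

CREATE TABLE parent (a int) PARTITION BY LIST (a);
CREATE TABLE standalone (a int);
CREATE FUNCTION tgfn() RETURNS trigger LANGUAGE plpgsql
  AS $$ BEGIN RETURN NEW; END $$;

CREATE TRIGGER audit BEFORE INSERT ON parent
  FOR EACH ROW EXECUTE FUNCTION tgfn();
CREATE TRIGGER audit BEFORE INSERT ON standalone
  FOR EACH ROW EXECUTE FUNCTION tgfn();

-- attaching asks the parent to clone "audit" onto a table that already has
-- its own trigger of that name; the suggestion is to error out here
ALTER TABLE parent ATTACH PARTITION standalone FOR VALUES IN (1);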

regards, tom lane




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Mahendra Singh Thalor
On Wed, 8 Apr 2020 at 22:11, Justin Pryzby  wrote:
>
> On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> > On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
> >  wrote:
> > > I think Tushar's point is that either we should allow both
> > > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in both
> > > cases we should throw an error.
> >
> > Oh, yeah, good point. Somebody must not've been careful enough with
> > the options-checking code.
>
> Actually I think someone was too careful.
>
> From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
> From: Justin Pryzby 
> Date: Wed, 8 Apr 2020 11:38:36 -0500
> Subject: [PATCH v1] parallel vacuum: options check to use same test as in
>  vacuumlazy.c
>
> ---
>  src/backend/commands/vacuum.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> index 351d5215a9..660c854d49 100644
> --- a/src/backend/commands/vacuum.c
> +++ b/src/backend/commands/vacuum.c
> @@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
> boolfreeze = false;
> boolfull = false;
> booldisable_page_skipping = false;
> -   boolparallel_option = false;
> ListCell   *lc;
>
> /* Set default value */
> @@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
> params.truncate = get_vacopt_ternary_value(opt);
> else if (strcmp(opt->defname, "parallel") == 0)
> {
> -   parallel_option = true;
> if (opt->arg == NULL)
> {
> ereport(ERROR,
> @@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool 
> isTopLevel)
>!(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
> Assert(!(params.options & VACOPT_SKIPTOAST));
>
> -   if ((params.options & VACOPT_FULL) && parallel_option)
> +   if ((params.options & VACOPT_FULL) && params.nworkers > 0)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>  errmsg("cannot specify both FULL and 
> PARALLEL options")));
> --
> 2.17.0
>

Thanks Justin for the patch.

The patch looks fine to me and it fixes the issue. After applying this
patch, vacuum will work as follows:

1) vacuum (parallel 1, full 0);
-- vacuuming will be done with 1 parallel worker.
2) vacuum (parallel 0, full 1);
-- full vacuuming will be done.
3) vacuum (parallel 1, full 1);
-- will give error :ERROR:  cannot specify both FULL and PARALLEL options

The 3rd example tells us that we can't give both the FULL and PARALLEL
options, but in the 1st and 2nd we are giving both options and we get no
error. I think that, irrespective of the values of the FULL and PARALLEL
options, we should raise an error in all three of the above cases.

Thoughts?

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com




Re: Vacuum o/p with (full 1, parallel 0) option throwing an error

2020-04-08 Thread Justin Pryzby
On Thu, Apr 09, 2020 at 12:06:04AM +0530, Mahendra Singh Thalor wrote:
> On Wed, 8 Apr 2020 at 22:11, Justin Pryzby  wrote:
> >
> > On Wed, Apr 08, 2020 at 11:57:08AM -0400, Robert Haas wrote:
> > > On Wed, Apr 8, 2020 at 10:25 AM Mahendra Singh Thalor
> > >  wrote:
> > > > I think Tushar's point is that either we should allow both
> > > > vacuum(parallel 0, full 1) and vacuum(parallel 1, full 0), or in both
> > > > cases we should throw an error.
> > >
> > > Oh, yeah, good point. Somebody must not've been careful enough with
> > > the options-checking code.
> >
> > Actually I think someone was too careful.
> >
> > From 9256cdb0a77fb33194727e265a346407921055ef Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby 
> > Date: Wed, 8 Apr 2020 11:38:36 -0500
> > Subject: [PATCH v1] parallel vacuum: options check to use same test as in
> >  vacuumlazy.c
> >
> > ---
> >  src/backend/commands/vacuum.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
> > index 351d5215a9..660c854d49 100644
> > --- a/src/backend/commands/vacuum.c
> > +++ b/src/backend/commands/vacuum.c
> > @@ -104,7 +104,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> > boolfreeze = false;
> > boolfull = false;
> > booldisable_page_skipping = false;
> > -   boolparallel_option = false;
> > ListCell   *lc;
> >
> > /* Set default value */
> > @@ -145,7 +144,6 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> > params.truncate = get_vacopt_ternary_value(opt);
> > else if (strcmp(opt->defname, "parallel") == 0)
> > {
> > -   parallel_option = true;
> > if (opt->arg == NULL)
> > {
> > ereport(ERROR,
> > @@ -199,7 +197,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, 
> > bool isTopLevel)
> >!(params.options & (VACOPT_FULL | VACOPT_FREEZE)));
> > Assert(!(params.options & VACOPT_SKIPTOAST));
> >
> > -   if ((params.options & VACOPT_FULL) && parallel_option)
> > +   if ((params.options & VACOPT_FULL) && params.nworkers > 0)
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >  errmsg("cannot specify both FULL and 
> > PARALLEL options")));
> > --
> > 2.17.0
> >
> 
> Thanks Justin for the patch.
> 
> The patch looks fine to me and it fixes the issue. After applying this
> patch, vacuum will work as follows:
> 
> 1) vacuum (parallel 1, full 0);
> -- vacuuming will be done with 1 parallel worker.
> 2) vacuum (parallel 0, full 1);
> -- full vacuuming will be done.
> 3) vacuum (parallel 1, full 1);
> -- will give error :ERROR:  cannot specify both FULL and PARALLEL options
> 
> The 3rd example tells us that we can't give both the FULL and PARALLEL
> options, but in the 1st and 2nd we are giving both options and we get no
> error. I think that, irrespective of the values of the FULL and PARALLEL
> options, we should raise an error in all three of the above cases.

I think the behavior is correct, but the error message could be improved, like:
|ERROR:  cannot specify FULL with PARALLEL jobs
or similar.

-- 
Justin



