Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-06 Thread Kyotaro HORIGUCHI
Thank you for the comment.

At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada  
wrote in 
> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
> Thank you for updating! Here is the review comment for v8 patch.
> 
> +/*
> + * This slot still has all required segments. Calculate how many
> + * LSN bytes the slot has until it loses restart_lsn.
> + */
> +fragbytes = wal_segment_size - (currLSN % wal_segment_size);
> +XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg,
> fragbytes,
> +wal_segment_size, *restBytes);
> 
> For the calculation of fragbytes, I think we should calculate the
> fragment bytes of restartLSN instead. The formula "restartSeg +
> limitSegs - currSeg" means the # of segments between restartLSN and the
> limit set by the new parameter. I don't think that the summation of it and
> the fragment bytes of currLSN is correct. As the following result
> (max_slot_wal_keep_size is 128MB) shows, the remain column shows the
> actual remains + 16MB (get_bytes function returns the value of
> max_slot_wal_keep_size in bytes).

Since the oldest segment is removed after the current LSN moves to
the next segment, the current LSN naturally determines the fragment
bytes. Maybe you're concerned that the number of segments looks
too large by one segment.

One arguable point of the feature is how max_slot_wal_keep_size
works exactly. I assume that even though the parameter is named
with "max_", we actually expect that "at least that many bytes are
kept". So, for example, with 16MB of segment size and 50MB of
max_s_w_k_s, I designed this so that the size of preserved WAL
doesn't go below 50MB; actually, 50MB is rounded up to a multiple
of 16MB (64MB), and the oldest segment is lost when WAL reaches
64MB + 16MB = 80MB, as you saw.

# I believe that the difference is not so significant since we
# have around a hundred or several hundred segments in common
# cases.
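
To make the arithmetic concrete, here is a minimal standalone sketch
(illustrative names and values, not the patch's actual code), for 16MB
segments and max_slot_wal_keep_size = 50MB:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t seg_size = 16u * 1024 * 1024;  /* 16MB WAL segments */
    uint64_t keep_size = 50u * 1024 * 1024; /* max_slot_wal_keep_size */

    /* round the setting up to whole segments: ceil(50MB / 16MB) = 4 */
    uint64_t limit_segs = (keep_size + seg_size - 1) / seg_size;

    printf("segments kept: %llu (%llu MB)\n",
           (unsigned long long) limit_segs,
           (unsigned long long) (limit_segs * seg_size >> 20));
    /* the oldest segment goes away only once WAL grows one more segment */
    printf("removal threshold: %llu MB\n",
           (unsigned long long) ((limit_segs + 1) * seg_size >> 20));
    return 0;
}

This prints 4 segments (64MB) and an 80MB removal threshold, matching
the numbers above.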

Do you mean that we should define the GUC parameter literally as
"we won't keep more than that many bytes of WAL segments"? That is,
we have at most 48MB of preserved WAL records for a 50MB
max_s_w_k_s setting. This is the same as how max_wal_size is
counted, but I don't think max_slot_wal_keep_size will be regarded
in the same way.

Another design would be to remove the oldest segment when WAL
reaches 64MB, reducing it to 48MB after deletion.

> ---
> For the cosmetic stuff there are code where need the line break.
> 
>  static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
> +static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr
> minSlotPtr, XLogRecPtr restartLSN, uint64 *restBytes);
>  static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
>  static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
> 
> and
> 
>  +static XLogSegNo
> +GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
> XLogRecPtr restartLSN, uint64 *restBytes)
> +{

Thanks, I folded the parameter list in my working repository.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: A strange GiST error message or fillfactor of GiST build

2018-09-06 Thread Kyotaro HORIGUCHI

Hello.

> Just my 2 cents: that was a hacky use case for development reasons.

I know. So I intended to preserve the parameter with a default of 100%
if no one strongly objected to keeping it. Then I abandoned that idea. :p


> I think that removing fillfactor is good idea and your latest patch
> looks good from my POV.

Thanks.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Reopen logfile on SIGHUP

2018-09-06 Thread Kyotaro HORIGUCHI
At Sat, 1 Sep 2018 19:52:16 +0300, Alexander Korotkov 
 wrote in 

> On Thu, Aug 30, 2018 at 2:44 PM Kyotaro HORIGUCHI
>  wrote:
> > At Thu, 30 Aug 2018 13:42:42 +0300, Alexander Korotkov 
> >  wrote in 
> > 
> > > It seems that http://commitfest.cputube.org/ runs only "make check" on
> > > Windows.  But my Postgres Pro colleagues checked that tests passed on
> > > 32-bit and 64-bit versions of Windows Server 2008.  Also I made some
> > > minor beautifications on code and documentation.
> > >
> > > This patch seems to have good shape and generally being quite
> > > harmless.  Do we have any objections to committing this?
> >
> > I checked it on my Win7 box and it worked. Of course I have no
> > objection.
> 
> So, pushed.  Thank you.

Thanks!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: remove ancient pre-dlopen dynloader code

2018-09-06 Thread Peter Eisentraut
On 01/09/2018 06:51, Peter Eisentraut wrote:
>> How about this: We only have two nonstandard dlopen() implementations
>> left: Windows and (old) HP-UX.  We move those into src/port/dlopen.c and
>> treat it like a regular libpgport member.  That gets rid of all those
>> duplicative empty per-platform files.
> Updated patch.  It needed some adjustments for Windows, per Appveyor,

I'm going to use this thread for a moment to work out some details with
the cfbot.

The v2 patch I sent previously was created using git format-patch with
default settings.  This detected a rename:

 rename src/{backend/port/dynloader/win32.c => port/dlopen.c} (51%)

which is fair enough.  However, whatever method the cfbot uses to apply
patches fails to handle that.  The tree that is sent for testing by
Appveyor still contains a full src/backend/port/dynloader/win32.c and no
src/port/dlopen.c, which expectedly makes the build fail.  (It also
still contains src/backend/port/dynloader/otherplatform.c, but empty, so
the patching doesn't remove empty files, which is another but minor
problem.)

The v3 patch attached here was made with git format-patch --no-renames.
Let's see how that works out.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 297f4f6ac3ae46b3678554964a5985d71e5908b9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 6 Sep 2018 10:07:24 +0200
Subject: [PATCH v3] Refactor dlopen() support

Nowadays, all platforms except Windows and older HP-UX have standard
dlopen() support.  So having a separate implementation per platform
under src/backend/port/dynloader/ is a bit excessive.  Instead, treat
dlopen() like other library functions that happen to be missing
sometimes and put a replacement implementation under src/port/.
---
 configure|  43 +++-
 configure.in |   8 +-
 src/backend/Makefile |   2 +-
 src/backend/port/.gitignore  |   1 -
 src/backend/port/Makefile|   2 +-
 src/backend/port/dynloader/aix.c |   7 --
 src/backend/port/dynloader/aix.h |  39 ---
 src/backend/port/dynloader/cygwin.c  |   3 -
 src/backend/port/dynloader/cygwin.h  |  36 ---
 src/backend/port/dynloader/darwin.c  |  35 ---
 src/backend/port/dynloader/darwin.h  |   8 --
 src/backend/port/dynloader/freebsd.c |   7 --
 src/backend/port/dynloader/freebsd.h |  38 ---
 src/backend/port/dynloader/hpux.c|  68 -
 src/backend/port/dynloader/hpux.h|  25 -
 src/backend/port/dynloader/linux.c   |   7 --
 src/backend/port/dynloader/linux.h   |  38 ---
 src/backend/port/dynloader/netbsd.c  |   7 --
 src/backend/port/dynloader/netbsd.h  |  38 ---
 src/backend/port/dynloader/openbsd.c |   7 --
 src/backend/port/dynloader/openbsd.h |  38 ---
 src/backend/port/dynloader/solaris.c |   7 --
 src/backend/port/dynloader/solaris.h |  38 ---
 src/backend/port/dynloader/win32.c   |  85 
 src/backend/port/dynloader/win32.h   |  19 
 src/backend/postmaster/postmaster.c  |   1 -
 src/backend/utils/fmgr/dfmgr.c   |  31 +++---
 src/include/.gitignore   |   1 -
 src/include/Makefile |   4 +-
 src/include/pg_config.h.in   |   8 ++
 src/include/pg_config.h.win32|   8 ++
 src/include/port.h   |  23 +
 src/include/utils/dynamic_loader.h   |  25 -
 src/port/dlopen.c| 145 +++
 src/tools/msvc/Install.pm|   5 +-
 src/tools/msvc/Mkvcbuild.pm  |   5 +-
 src/tools/msvc/Solution.pm   |   7 --
 src/tools/msvc/clean.bat |   1 -
 38 files changed, 251 insertions(+), 619 deletions(-)
 delete mode 100644 src/backend/port/dynloader/aix.c
 delete mode 100644 src/backend/port/dynloader/aix.h
 delete mode 100644 src/backend/port/dynloader/cygwin.c
 delete mode 100644 src/backend/port/dynloader/cygwin.h
 delete mode 100644 src/backend/port/dynloader/darwin.c
 delete mode 100644 src/backend/port/dynloader/darwin.h
 delete mode 100644 src/backend/port/dynloader/freebsd.c
 delete mode 100644 src/backend/port/dynloader/freebsd.h
 delete mode 100644 src/backend/port/dynloader/hpux.c
 delete mode 100644 src/backend/port/dynloader/hpux.h
 delete mode 100644 src/backend/port/dynloader/linux.c
 delete mode 100644 src/backend/port/dynloader/linux.h
 delete mode 100644 src/backend/port/dynloader/netbsd.c
 delete mode 100644 src/backend/port/dynloader/netbsd.h
 delete mode 100644 src/backend/port/dynloader/openbsd.c
 delete mode 100644 src/backend/port/dynloader/openbsd.h
 delete mode 100644 src/backend/port/dynloader/solaris.c
 delete mode 100644 src/backend/port/dynloader/solaris.h
 delete mode 100644 src/backend/port/dynloader/win32.c
 delete mode 100644 src/backend/port/dynloader/win32.h
 delete mode 100644 src/include/utils/dynamic_loader.h
 create mode 100644 src/port/dlopen.c

diff --git 

Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
Ok, so here's my current patch (work in progress).  I have not run tests
yet, and finding a way to add a test case is virtually impossible though I
expect we will find ways of using gdb to confirm local fix later.

After careful code review, I settled on the following approach which seemed
to be the least intrusive.

1.  Change the retry logic so that it does not retry if QueryCancelPending
or ProcDiePending is set.  In these cases EINTR is passed back to the
caller.
2.  Change the error handling logic of the caller so that EINTR is handled
by the next CHECK_FOR_INTERRUPTS() after cleanup.

This means that the file descriptor is now already closed and that we
handle this the same way we would an ENOSPC.  The reason for this is
that there are many places in the code where it is not clear what degree of
safety is present for throwing errors using ereport, and the patch needs to
be as unobtrusive as possible to facilitate back porting.
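
Here is a minimal self-contained sketch of the retry behavior described
in point 1 (the pending-interrupt flag names mirror PostgreSQL's, but
the syscall and function are only stand-ins, not the actual patch):

#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* stand-ins for PostgreSQL's pending-interrupt flags */
static volatile bool QueryCancelPending = false;
static volatile bool ProcDiePending = false;

/*
 * Retry a syscall on EINTR, but stop retrying once an interrupt is
 * pending; EINTR is then handed back so the caller can clean up and
 * reach the next CHECK_FOR_INTERRUPTS().
 */
static int
retry_syscall(int fd, off_t size)
{
    int rc;

    do
    {
        rc = ftruncate(fd, size);
    } while (rc < 0 && errno == EINTR &&
             !QueryCancelPending && !ProcDiePending);

    return rc; /* rc < 0 with errno == EINTR when an interrupt is pending */
}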

At this point the patch is for discussion.  I have not even compiled it or
tested it, but maybe there is something wrong with my approach, so I figured I
would send it out early.

The major assumptions are:
1.  close() will never take longer than the interrupt interval that caused
the loop to break.
2.  close() does not get interrupted in a way that will not cause cleanup
problems later.
3. We will reach the next interrupt check at a reasonable interval and safe
spot.

Any initial arguments against?

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecondition.patch
Description: Binary data


Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-06 Thread Alexander Korotkov
On Thu, Sep 6, 2018 at 12:53 AM R, Siva  wrote:
> On Tue, Sep 5, 2018 at 08:55 PM, Alexander Korotkov 
>  wrote:
> > Finally I managed to reproduce the bug.  The scenario is the following.
> > The underlying idea is that when insertion of multiple tuples goes to the
> > beginning of the page and this insertion succeeds only thanks to the
> > collapse of some short segments together, then this insertion wouldn't
> > fit on the page if given alone.
>
> Sorry for the late reply.
> Thank you so much for working on this and reproducing the issue!
> Like you mentioned, the WAL record where we detected this problem
> has future segments deleted due to compaction and written out
> as an insert segment.
>
> > alter index test_idx set (fastupdate = on);
> Just curious, why does this help with the repro? This is related only to
> using the GIN pending list vs the posting tree.

With (fastupdate = on) GIN performs bulk update of posting lists,
inserting multiple tuples at once if possible.  With (fastupdate =
off) GIN always inserts tuples one-by-one.  It might still be possible
to reproduce the issue with (fastupdate = off), but it seems even
harder.

BTW, I've tried the patch you've posted.  On my test case it fails
with the following assertion.
TRAP: FailedAssertion("!(a_action == 2)", File: "ginxlog.c", Line: 243)

I thought about fixing this issue more, and I decided we can fix it in
a less invasive way.  Once modification is started, we can copy the tail
of the page into a separately allocated chunk of memory, and then use it
as the source of the original segments.  See the patch attached.
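
In sketch form, the idea is (illustrative names only; see the attached
patch for the actual ginRedoRecompress() change):

#include <stdlib.h>
#include <string.h>

/*
 * Before modifying the page in place, copy the not-yet-processed tail
 * into separately allocated memory and read the original segments from
 * that copy, so in-place writes cannot overrun data we still need.
 */
static char *
copy_page_tail(const char *page, size_t tail_off, size_t tail_len)
{
    char *tail = malloc(tail_len);

    if (tail != NULL)
        memcpy(tail, page + tail_off, tail_len);
    return tail; /* caller reads original segments from this copy */
}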

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


ginRedoRecompress-tail-copy-v1.patch
Description: Binary data


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2018-09-06 Thread Kyotaro HORIGUCHI
At Sun, 2 Sep 2018 07:04:19 +1200, Thomas Munro  
wrote in 
> > > # Is it intentional that the patch doesn't touch pgstat.c?
> >
> > Yes.  pgstat.c still uses WL_POSTMASTER_DEATH because it does
> > something special: it calls pgstat_write_statsfiles() before it exits.

Mmm. Exactly.

> Rebased.

Thank you for the new version.

===
In syslogger.c, cur_flags is set but no longer used.

===
In latch.c,

- The parentheses around the symbols don't seem to be needed.
|   (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
|   (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

- The following assertion seems to contradict the comment.
|/* Postmaster-managed callers must handle postmaster death somehow. */
|Assert(!IsUnderPostmaster ||
|   (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
|   (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

  (Maybe Assert(IsUnderPost && (wakeEv & (WL_EXI | WL_PO)) != 0);)

  And don't we need a description of this restriction in the
  function comment?

- I think it may be better if the following lines had parentheses
  around the '&' expressions (as shown below).

|if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster)
|if (wakeEvents & WL_EXIT_ON_PM_DEATH && IsUnderPostmaster)
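
  (That is, written with explicit parentheses -- equivalent, since '&'
  binds tighter than '&&' in C, but easier to read:)

|if ((wakeEvents & WL_POSTMASTER_DEATH) && IsUnderPostmaster)
|if ((wakeEvents & WL_EXIT_ON_PM_DEATH) && IsUnderPostmaster)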


===
All the call sites of WaitLatch, WaitLatchOrSocket and
WaitEventSetWait are covered by this patch, and all of them look
fine.


bgworker.c, pgstat.c, be-secure-openssl.c, be-secure.c:
  Not modified on purpose.  WL_EXIT_ON_PM_DEATH is not proper
  to use there.

pgarch.c, syncrep.c, walsender.c:
  Removed redundant check of postmaster death.

syslogger.c:
  Left alone since it intentionally doesn't exit on postmaster death.
  Uses a reusable wait event set.

walreceiver.c:
  Adds a new bailing-out point in WalRcvWaitForStartPosition, which
  seems reasonable.

shm_mq.c:
  Adds a postmaster-death bailout in shm_mq_send/receive_bytes and
  shm_mq_wait_internal.  It looks fine.

postgres_fdw/connection.c:
  Adds a postmaster-death bailout while getting a result.  This seems
  to be a bug fix.  I'm fine with it being included in this patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-09-06 Thread Andrey Lepikhov

And the background worker patch is attached.

05.09.2018 15:25, Andrey Lepikhov wrote:

Hi,

I prepared the next version of the background worker (cleaner) based on the 
retail indextuple deletion patch.
This version shows stable behavior on regression tests and pgbench 
workloads.


In this version:
1. Only AccessShareLock is acquired on cleanup of heap and index 
relations.
2. A somewhat 'aggressive' cleanup strategy is introduced: conditional 
cleanup locks are not used.

3. Only in-memory blocks are cleaned up.
4. The Cleaner calls heap_page_prune() before cleaning up a block.

Benchmarks
-

Two factors were evaluated: performance (tps) and relation bloat.

Before each test, some rarefaction of pgbench_accounts was modeled by 
deleting 10% of the tuples in each block.
Also, I tested uniform (pgbench default) and Gaussian distributions of 
queries on the pgbench_accounts relation.

Autovacuum uses default settings.

Script:
pgbench -i -s 10
psql -c $"DELETE FROM pgbench_accounts WHERE (random() < 0.1);"
psql -c $"VACUUM;"
psql -c $"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts 
USING btree (abalance);" &&

pgbench -T 3600 -c 32 -j 8 -M prepared -P 600

UNIFORM (default) distribution:
average tps = 1045 (cleaner); = 1077 (autovacuum)

Relations size at the end of test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_branches: 0.1 (cleaner); 2.1 (autovacuum)
pgbench_tellers: 0.4 (cleaner); 2.8 (autovacuum)
pgbench_accounts_pkey: 21 (cleaner); 43 (autovacuum)
pgbench_accounts_ext: 48 (cleaner); 56 (autovacuum)

Gaussian distribution:
average tps = 213 (cleaner); = 213 (autovacuum)

Relations size at the end of test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_accounts_ext: 22 (cleaner); 29 (autovacuum)

Conclusions
---
1. For retail indextuple deletion purposes I replaced ItemIdSetDead() with 
ItemIdMarkDead() in the heap_page_prune_execute() operation. As a result, 
in the case of 100% filling of each relation block we get heap and index 
bloat, more or less (see the sketch after this list). When the blocks 
already have free space, the cleaner can delay bloating the heap and index 
without a vacuum.
2. The cleaner works fine when access frequency is skewed across 
relation blocks.

3. The cleaner does not cause a decrease in performance.
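
For reference, the difference between the two macros can be sketched as
follows (a simplified stand-in for PostgreSQL's ItemIdData line pointer;
the real macros are in src/include/storage/itemid.h):

#define LP_DEAD 3 /* dead, may or may not have storage */

typedef struct
{
    unsigned lp_off;   /* offset to tuple */
    unsigned lp_flags; /* state of line pointer */
    unsigned lp_len;   /* byte length of tuple */
} LinePointerSketch;

/* like ItemIdSetDead(): mark dead and forget the tuple's storage */
static void
sketch_set_dead(LinePointerSketch *it)
{
    it->lp_flags = LP_DEAD;
    it->lp_off = 0;
    it->lp_len = 0;
}

/* like ItemIdMarkDead(): mark dead but keep offset/length, so the
 * storage stays in place until a later cleanup reclaims it */
static void
sketch_mark_dead(LinePointerSketch *it)
{
    it->lp_flags = LP_DEAD;
}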



--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

>From b092dfd95b8673d692730ac27a1d6fdb76b66601 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 6 Sep 2018 11:05:42 +0300
Subject: [PATCH 5/5] Heap-and-Index-cleaner

---
 .../postgres_fdw/expected/postgres_fdw.out|   30 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql |4 +-
 src/backend/access/heap/pruneheap.c   |6 +-
 src/backend/access/nbtree/nbtree.c|   15 +-
 src/backend/access/transam/xact.c |4 +
 src/backend/catalog/system_views.sql  |   11 +
 src/backend/commands/vacuumlazy.c |   44 +-
 src/backend/postmaster/Makefile   |2 +-
 src/backend/postmaster/bgheap.c   | 1990 +
 src/backend/postmaster/pgstat.c   |   36 +
 src/backend/postmaster/postmaster.c   |  160 +-
 src/backend/storage/buffer/bufmgr.c   |   60 +-
 src/backend/storage/buffer/localbuf.c |   24 +
 src/backend/storage/ipc/ipci.c|3 +
 src/backend/storage/lmgr/lwlocknames.txt  |1 +
 src/backend/storage/lmgr/proc.c   |5 +-
 src/backend/tcop/postgres.c   |   12 +
 src/backend/utils/adt/pgstatfuncs.c   |2 +
 src/backend/utils/hash/Makefile   |2 +-
 src/backend/utils/hash/shash.c|  265 +++
 src/backend/utils/init/miscinit.c |3 +-
 src/backend/utils/init/postinit.c |   11 +-
 src/include/commands/progress.h   |   12 +
 src/include/commands/vacuum.h |3 +
 src/include/pgstat.h  |   14 +-
 src/include/postmaster/bgheap.h   |   36 +
 src/include/storage/buf_internals.h   |1 +
 src/include/storage/bufmgr.h  |5 +-
 src/include/storage/pmsignal.h|2 +
 src/include/utils/shash.h |   54 +
 src/test/regress/expected/rules.out   |   18 +-
 src/test/regress/expected/triggers.out|2 +-
 src/test/regress/input/constraints.source |   12 +-
 src/test/regress/output/constraints.source|   34 +-
 src/test/regress/sql/rules.sql|2 +-
 src/test/regress/sql/triggers.sql |2 +-
 36 files changed, 2803 insertions(+), 84 deletions(-)
 create mode 100644 src/backend/postmaster/bgheap.c
 create mode 100644 src/backend/utils/hash/shash.c
 create mode 100644 src/include/postmaster/bgheap.h
 create mode 100644 src/include/utils/shash.h

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f5498c62bd..622bea2a7d 100644
--- a/contrib/pos

Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-06 Thread Masahiko Sawada
On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI
 wrote:
> Thank you for the comment.
>
> At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada  
> wrote in 
>> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
>> Thank you for updating! Here is the review comment for v8 patch.
>>
>> +/*
>> + * This slot still has all required segments. Calculate how many
>> + * LSN bytes the slot has until it loses restart_lsn.
>> + */
>> +fragbytes = wal_segment_size - (currLSN % wal_segment_size);
>> +XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg,
>> fragbytes,
>> +wal_segment_size, *restBytes);
>>
>> For the calculation of fragbytes, I think we should calculate the
>> fragment bytes of restartLSN instead. The formula "restartSeg +
>> limitSegs - currSeg" means the # of segments between restartLSN and the
>> limit set by the new parameter. I don't think that the summation of it and
>> the fragment bytes of currLSN is correct. As the following result
>> (max_slot_wal_keep_size is 128MB) shows, the remain column shows the
>> actual remains + 16MB (get_bytes function returns the value of
>> max_slot_wal_keep_size in bytes).
>
> Since the oldest segment is removed after the current LSN moves to
> the next segment, the current LSN naturally determines the fragment
> bytes. Maybe you're concerned that the number of segments looks
> too large by one segment.
>
> One arguable point of the feature is how max_slot_wal_keep_size
> works exactly. I assume that even though the parameter is named
> with "max_", we actually expect that "at least that many bytes are
> kept". So, for example, with 16MB of segment size and 50MB of
> max_s_w_k_s, I designed this so that the size of preserved WAL
> doesn't go below 50MB; actually, 50MB is rounded up to a multiple
> of 16MB (64MB), and the oldest segment is lost when WAL reaches
> 64MB + 16MB = 80MB, as you saw.
>
> # I believe that the difference is not so significant since we
> # have around a hundred or several hundred segments in common
> # cases.
>
> Do you mean that we should define the GUC parameter literally as
> "we won't keep more than that many bytes of WAL segments"? That is,
> we have at most 48MB of preserved WAL records for a 50MB
> max_s_w_k_s setting. This is the same as how max_wal_size is
> counted, but I don't think max_slot_wal_keep_size will be regarded
> in the same way.

I might be missing something, but what I'm expecting from this feature is
to restrict how much WAL we can keep at a maximum for replication
slots. In other words, the distance between the current LSN and the
minimum restart_lsn of replication slots doesn't exceed the value of
max_slot_wal_keep_size. It's similar to wal_keep_segments, except
that this feature affects only replication slots. And
wal_keep_segments cannot restrict the WAL that replication slots are
holding. For example, with 16MB of segment size and 50MB of
max_slot_wal_keep_size, we can keep at most 50MB of WAL for replication
slots. However, once we consume more than 50MB of WAL while not
advancing any restart_lsn, the required WAL might be lost by the next
checkpoint, which depends on min_wal_size. On the other hand, if we
can mostly advance restart_lsn to approximately the current LSN, the
size of preserved WAL for replication slots can go below 50MB.
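
A tiny standalone sketch of that expectation (hypothetical numbers and
names, not patch code):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t curr_lsn = 200u * 1024 * 1024;    /* hypothetical current LSN */
    uint64_t restart_lsn = 140u * 1024 * 1024; /* hypothetical restart_lsn */
    uint64_t keep_size = 50u * 1024 * 1024;    /* max_slot_wal_keep_size */

    uint64_t held = curr_lsn - restart_lsn;    /* WAL the slot is holding */

    printf("held: %llu MB, limit: %llu MB -> %s\n",
           (unsigned long long) (held >> 20),
           (unsigned long long) (keep_size >> 20),
           held <= keep_size ? "kept" : "may be removed");
    return 0;
}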

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2018-09-06 Thread Kyotaro HORIGUCHI
Ugrrr! PLEASE ignore this! It's not wrong at all.

2018年9月6日(木) 18:58 Kyotaro HORIGUCHI :

> - The following assertion looks contradicting to the comment.
> |/* Postmaster-managed callers must handle postmaster death somehow. */
> |Assert(!IsUnderPostmaster ||
> |   (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
> |   (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);
>

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
On Thu, Sep 6, 2018 at 11:08 AM Chris Travers 
wrote:

> Ok, so here's my current patch (work in progress).  I have not run tests
> yet, and finding a way to add a test case is virtually impossible though I
> expect we will find ways of using gdb to confirm local fix later.
>
> After careful code review, I settled on the following approach which
> seemed to be the least intrusive.
>
> 1.  Change the retry logic so that it does not retry if QueryCancelPending
> or ProcDiePending is set.  In these cases EINTR is passed back to the
> caller.
> 2.  Change the error handling logic of the caller so that EINTR is handled
> by the next CHECK_FOR_INTERRUPTS() after cleanup.
>
> This means that the file descriptor is now already closed and that we
> handle this the same way we would with a ENOSPC.  The reason for this is
> that there are many places in the code where it is not clear what degree of
> safety is present for throwing errors using ereport, and the patch needs to
> be as unobtrusive as possible to facilitate back porting.
>
> At this point the patch is for discussion.  I have not even compiled it or
> tested it, but maybe there is something wrong with my approach, so I figured I
> would send it out early.
>
> The major assumptions are:
> 1.  close() will never take longer than the interrupt interval that caused
> the loop to break.
> 2.  close() does not get interrupted in a way that will not cause cleanup
> problems later.
> 3. We will reach the next interrupt check at a reasonable interval and
> safe spot.
>
> Any initial arguments against?
>

The previous patch had two issues found on internal code review here.  I am
sending a new patch.

This patch compiles, and make check passes.
make check-world fails, but the errors are the same as found on master and
involve ecpg.

We will be doing further internal review and verification and then will run
through pg_indent before final submission.

Changes in this patch:
1.  The previous patch had a backwards check for EINTR in the calling
function.  This was fixed.
2.  In cases where the ERROR elevel was passed in, the function would
return instead of erroring out in case of error.

So in this case we check whether we expect to error out on error and, if so,
check for interrupts.  If not, we go through the normal error handling/logging
path, which might result in a warning that the shared memory segment could not
be allocated, followed by an actual meaningful error.  I could put that in
an else-if, if that is seen as a good idea, but figured I would raise it as
a discussion point.


>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecondition.patch
Description: Binary data


Re: [HACKERS] Bug in to_timestamp().

2018-09-06 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 7:28 PM amul sul  wrote:
> On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov  
> wrote:
>> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
>> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
>> >  wrote:
>> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
>> > >  wrote:
>> > > > From those results the question is how important is it to force the 
>> > > > following breakage on our users (i.e., introduce FX exact symbol 
>> > > > matching):
>> > > >
>> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
>> > > > - to_timestamp
>> > > > ---
>> > > > - Sun Feb 16 00:00:00 1997 PST
>> > > > -(1 row)
>> > > > -
>> > > > +ERROR:  unexpected character "/", expected character ":"
>> > > > +HINT:  In FX mode, punctuation in the input string must exactly match 
>> > > > the format string.
>> > > >
>> > > > There seemed to be some implicit approvals of this breakage some 30 
>> > > > emails and 10 months ago but given that this is the only change from a 
>> > > > correct result to a failure I'd like to officially put it out there 
>> > > > for opinion/vote gathering.   Mine is a -1; though keeping the 
>> > > > distinction between space and non-alphanumeric characters is expected.
>> > >
>> > > Do I understand correctly that you're -1 to changes to FX mode, but no
>> > > objection to changes in non-FX mode?
>> > >
>> > Ditto.
>>
>> So, if no objections for non-FX mode changes, then I'll extract that
>> part and commit it separately.
>
>
> Yeah, that makes sense to me, thank you.

OK!  I've removed FX changes from the patch.  The result is attached.
I'm going to commit this if no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-to-timestamp-format-checking-v18.patch
Description: Binary data


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
On Thu, Sep 6, 2018 at 1:31 PM Chris Travers 
wrote:

>
>
> On Thu, Sep 6, 2018 at 11:08 AM Chris Travers 
> wrote:
>
>> Ok, so here's my current patch (work in progress).  I have not run tests
>> yet, and finding a way to add a test case is virtually impossible though I
>> expect we will find ways of using gdb to confirm local fix later.
>>
>> After careful code review, I settled on the following approach which
>> seemed to be the least intrusive.
>>
>> 1.  Change the retry logic so that it does not retry if
>> QueryCancelPending or ProcDiePending is set.  In these cases EINTR is
>> passed back to the caller.
>> 2.  Change the error handling logic of the caller so that EINTR is
>> handled by the next CHECK_FOR_INTERRUPTS() after cleanup.
>>
>> This means that the file descriptor is now already closed and that we
>> handle this the same way we would an ENOSPC.  The reason for this is
>> that there are many places in the code where it is not clear what degree of
>> safety is present for throwing errors using ereport, and the patch needs to
>> be as unobtrusive as possible to facilitate back porting.
>>
>> At this point the patch is for discussion.  I have not even compiled it
>> or tested it, but maybe there is something wrong with my approach, so I figured
>> I would send it out early.
>>
>> The major assumptions are:
>> 1.  close() will never take longer than the interrupt interval that
>> caused the loop to break.
>> 2.  close() does not get interrupted in a way that will not cause cleanup
>> problems later.
>> 3. We will reach the next interrupt check at a reasonable interval and
>> safe spot.
>>
>> Any initial arguments against?
>>
>
> The previous patch had two issues found on internal code review here.  I
> am sending a new patch.
>
> This patch compiles, and make check passes.
> make check-world fails, but the errors are the same as found on master and
> involve ecpg.
>
> We will be doing further internal review and verification and then will
> run through pg_indent before final submission.
>
> Changes in this patch:
> 1.  The previous patch had a backwards check for EINTR in the calling
> function.  This was fixed.
> 2.  In cases where the ERROR elevel was passed in, the function would
> return instead of erroring out in case of error.
>
> So in this case we check whether we expect to error out on error and, if
> so, check for interrupts.  If not, we go through the normal error
> handling/logging path, which might result in a warning that the shared
> memory segment could not be allocated, followed by an actual meaningful
> error.  I could put that in an else-if, if that is seen as a good idea, but
> figured I would raise it as a discussion point.
>

Attaching correct patch

>
>
>>
>> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


racecondition.patch
Description: Binary data


Re: Optimze usage of immutable functions as relation

2018-09-06 Thread Aleksandr Parfenov
On Tue, 10 Jul 2018 17:34:20 -0400
Tom Lane  wrote:

>Heikki Linnakangas  writes:
>> But stepping back a bit, it's a bit weird that we're handling this 
>> differently from VALUES and other subqueries. The planner knows how
>> to do this trick for simple subqueries:  
>
>> postgres=# explain select * from tenk1, (select abs(100)) as a (a)
>> where unique1 < a;
>>  QUERY PLAN
>> ---
>>   Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
>> Filter: (unique1 < 100)
>> (2 rows)  
>
>> Note that it not only evaluated the function into a constant, but
>> also got rid of the join. For a function RTE, however, it can't do
>> that:  
>
>> postgres=# explain select * from tenk1, abs(100) as a (a) where
>> unique1 < a; QUERY PLAN
>> ---
>>   Nested Loop  (cost=0.00..583.01 rows= width=248)
>> Join Filter: (tenk1.unique1 < a.a)  
>> ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
>> ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=1 width=244)  
>> (4 rows)  
>
>> Could we handle this in pull_up_subqueries(), similar to the 
>> RTE_SUBQUERY and RTE_VALUES cases?  
>
>Perhaps.  You could only do it for non-set-returning functions, which
>isn't the typical use of function RTEs, which is probably why we've not
>thought hard about it before.  I'm not sure what would need to happen
>for lateral functions.  Also to be considered, if it's not foldable to
>a constant, is whether we're risking evaluating it more times than
>before.
>
>   regards, tom lane

I reworked the patch and implemented processing of FuncScan in
pull_up_subqueries() in a way similar to VALUES processing. In order to
prevent folding of non-foldable functions, it checks the function's
provolatile, whether its arguments are constants, and the return type
(to prevent folding of SRFs).

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index c3f46a26c3..25539bbfae 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -23,7 +23,9 @@
  */
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "catalog/pg_type.h"
+#include "catalog/pg_proc.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -35,6 +37,8 @@
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
 
 
 typedef struct pullup_replace_vars_context
@@ -86,6 +90,8 @@ static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte,
    bool deletion_ok);
 static Node *pull_up_simple_values(PlannerInfo *root, Node *jtnode,
 	  RangeTblEntry *rte);
+static Node *pull_up_simple_function(PlannerInfo *root, Node *jtnode,
+	  RangeTblEntry *rte);
 static bool is_simple_values(PlannerInfo *root, RangeTblEntry *rte,
  bool deletion_ok);
 static bool is_simple_union_all(Query *subquery);
@@ -595,6 +601,54 @@ inline_set_returning_functions(PlannerInfo *root)
 	}
 }
 
+static bool
+is_simple_stable_function(RangeTblEntry *rte)
+{
+	Form_pg_type type_form;
+	RangeTblFunction *tblFunction = linitial_node(RangeTblFunction, rte->functions);
+	FuncExpr   *expr = (FuncExpr *) tblFunction->funcexpr;
+	HeapTuple tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(expr->funcresulttype));
+	bool		result = false;
+
+	if (!HeapTupleIsValid(tuple))
+		elog(ERROR, "cache lookup failed for type %u", expr->funcresulttype);
+
+	type_form = (Form_pg_type) GETSTRUCT(tuple);
+
+	if (type_form->typtype == TYPTYPE_BASE &&
+		!type_is_array(expr->funcresulttype))
+	{
+		Form_pg_proc func_form;
+		ListCell   *arg;
+		bool		has_nonconst_input = false;
+		bool		has_null_input = false;
+
+		ReleaseSysCache(tuple);
+		tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(expr->funcid));
+		if (!HeapTupleIsValid(tuple))
+			elog(ERROR, "cache lookup failed for function %u", expr->funcid);
+		func_form = (Form_pg_proc) GETSTRUCT(tuple);
+
+		foreach(arg, expr->args)
+		{
+			if (IsA(lfirst(arg), Const))
+				has_null_input |= ((Const *) lfirst(arg))->constisnull;
+			else
+				has_nonconst_input = true;
+		}
+
+		result = func_form->prorettype != RECORDOID &&
+			func_form->prokind == PROKIND_FUNCTION &&
+			!func_form->proretset &&
+			func_form->provolatile == PROVOLATILE_IMMUTABLE &&
+			!has_null_input &&
+			!has_nonconst_input;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
+
 /*
  * pull_up_subqueries
  *		Look for subqueries in the rangetable that can be pulled up into
@@ -725,6 +779,11 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
 			is_simple_values(root, rte, deletion_ok))
 			return pull_up_simple_values(root, jtnode, rte);
 
+		if (rte->rtekind 

Re: remove ancient pre-dlopen dynloader code

2018-09-06 Thread Peter Eisentraut
On 06/09/2018 10:16, Peter Eisentraut wrote:
> The v3 patch attached here was made with git format-patch --no-renames.
> Let's see how that works out.

That worked, and the patch has been committed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



[no subject]

2018-09-06 Thread call_ji...@126.com
Why can't those shared catalog tables take locks via the fast path if
eligible? User tables and local catalog tables can take locks via the fast path.

Re: [HACKERS] Bug in to_timestamp().

2018-09-06 Thread Alexander Korotkov
On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
 wrote:
>
> On Wed, Sep 5, 2018 at 7:28 PM amul sul  wrote:
> > On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov  
> > wrote:
> >> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> >> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
> >> >  wrote:
> >> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> >> > >  wrote:
> >> > > > From those results the question is how important is it to force the 
> >> > > > following breakage on our users (i.e., introduce FX exact symbol 
> >> > > > matching):
> >> > > >
> >> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> >> > > > - to_timestamp
> >> > > > ---
> >> > > > - Sun Feb 16 00:00:00 1997 PST
> >> > > > -(1 row)
> >> > > > -
> >> > > > +ERROR:  unexpected character "/", expected character ":"
> >> > > > +HINT:  In FX mode, punctuation in the input string must exactly 
> >> > > > match the format string.
> >> > > >
> >> > > > There seemed to be some implicit approvals of this breakage some 30 
> >> > > > emails and 10 months ago but given that this is the only change from 
> >> > > > a correct result to a failure I'd like to officially put it out 
> >> > > > there for opinion/vote gathering.   Mine is a -1; though keeping the 
> >> > > > distinction between space and non-alphanumeric characters is 
> >> > > > expected.
> >> > >
> >> > > Do I understand correctly that you're -1 to changes to FX mode, but no
> >> > > objection to changes in non-FX mode?
> >> > >
> >> > Ditto.
> >>
> >> So, if no objections for non-FX mode changes, then I'll extract that
> >> part and commit it separately.
> >
> >
> > Yeah, that makes sense to me, thank you.
>
> OK!  I've removed FX changes from the patch.  The result is attached.
> I'm going to commit this if no objections.

The attached revision fixes the use of two consecutive spaces in the documentation.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-to-timestamp-format-checking-v19.patch
Description: Binary data


Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-09-06 Thread Surafel Temesgen
On Tue, Jul 3, 2018 at 1:11 AM, Carter Thaxton 
wrote:

> The whole reason for the colon in the --where option is to indicate which
> table the WHERE clause should refer to, so that one can dump less than all
> of the rows.
> The --table option is totally different.  It specifies which tables to
> dump at all.
>

Thank you for explaining,

I just have one comment.  I found the error message generated for an incorrect
WHERE clause specification strange for pg_dump.  I think a check of the query
result status is needed to handle it and generate a friendlier error message.


regards

Surafel


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-06 Thread Chris Travers
As a status note, the above patch has not been run through pg_indent, and
while I have run make check-world on Linux (passed) and a non-Linux system
(which failed both with and without my patch), I will be making one more
small revision before final submission unless anyone thinks I need to
change approaches.

Currently, all code paths that hit the resize call the relevant functions
with an ERROR elevel.  So running CHECK_FOR_INTERRUPTS after the initial
cleanup is currently safe, but the surrounding function doesn't guarantee
safety, so I don't think it is a good idea to raise exceptions there.

In the above patch I test whether the level is ERROR and check for
interrupts when that is true.  However, as a colleague pointed out to me, if
anyone ever calls this with FATAL or PANIC as the desired error level, that
would not be safe, so I will instead check whether the level is greater than
or equal to ERROR.
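
In sketch form (the numeric values are only illustrative of the elog
severity ordering, where WARNING < ERROR < FATAL < PANIC):

#include <stdbool.h>

enum
{
    WARNING = 19, /* illustrative values only */
    ERROR = 20,
    FATAL = 21,
    PANIC = 22
};

/* any level at or above ERROR will not return to the caller, which is
 * what makes it safe to service pending interrupts after cleanup */
static bool
elevel_will_not_return(int elevel)
{
    return elevel >= ERROR;
}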

We will also go through a manual test using a debugger to try to test the
behavior in these cases and make sure the patch actually resolves our
problem.

Assuming no objections here and that the manual test works, you can expect
a formal submission in the next couple of days.

>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-06 Thread Tom Lane
Andrew Gierth  writes:
> "Michael" == Michael Paquier  writes:
>  Michael> prairiedog is unhappy with this commit:

> What version of GNU Make is on there, do you know? Tom? I don't see it
> mentioned in the output anywhere.

$ make -v
GNU Make 3.80
Copyright (C) 2002  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

regards, tom lane



Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-06 Thread Tom Lane
Andrew Gierth  writes:
> Consulting old manuals suggests it may have version 3.80 (c. 2002),
> which lacks the 'else if..' syntax introduced in 3.81 (c. 2006).

Right; I haven't updated it because that's specifically called out to
be the minimum supported version in our docs (cf installation.sgml).

regards, tom lane



Re: pread() and pwrite()

2018-09-06 Thread Jesper Pedersen

Hi,

On 09/05/2018 02:42 PM, Jesper Pedersen wrote:

On 07/26/2018 10:04 PM, Thomas Munro wrote:

Done.  Rebased.



This needs a rebase again.



Would it be of benefit to update these call sites

* slru.c
  - SlruPhysicalReadPage
  - SlruPhysicalWritePage
* xlogutils.c
  - XLogRead
* pg_receivewal.c
  - FindStreamingStart
* rewriteheap.c
  - heap_xlog_logical_rewrite
* walreceiver.c
  - XLogWalRcvWrite

too?

Best regards,
 Jesper




Re: remove ancient pre-dlopen dynloader code

2018-09-06 Thread Tom Lane
Peter Eisentraut  writes:
> On 06/09/2018 10:16, Peter Eisentraut wrote:
>> The v3 patch attached here was made with git format-patch --no-renames.
>> Let's see how that works out.

> That worked, and the patch has been committed.

Sure enough, gaur's not happy.  I'll take a look in a bit.

regards, tom lane



Re: remove ancient pre-dlopen dynloader code

2018-09-06 Thread Thomas Munro
On Thu, Sep 6, 2018 at 1:16 AM Peter Eisentraut
 wrote:
> I'm going to use this thread for a moment to work out some details with
> the cfbot.
>
> The v2 patch I sent previously was created using git format-patch with
> default settings.  This detected a rename:
>
>  rename src/{backend/port/dynloader/win32.c => port/dlopen.c} (51%)
>
> which is fair enough.  However, whatever method the cfbot uses to apply
> patches fails to handle that.

Interesting.  Its version of "patch" doesn't understand that.  I am
not sure if other versions of patch do.  Currently cfbot doesn't use
git am because not everyone is posting patches created with
format-patch, and I thought good old patch could handle basically
anything.  I wasn't aware of this quirk.  I'll see if there is some
way I can convince patch to respect renames, or whether I should try to
apply with git first and then fall back to patch only if that fails.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-09-06 Thread Jeremy Finzel
On Mon, May 21, 2018 at 6:34 AM Carter Thaxton 
wrote:

> Many times I've wanted to export a subset of a database, using some sort
> of row filter condition on some of the large tables.  E.g. copying a
> production database to a staging environment, but with some time series
> data only from the past month.
>
> We have the existing options:
>   --include-table=table(and its -t synonym)
>   --exclude-table=table
>   --exclude-table-data=table
>
> I propose a new option:
>   --include-table-data-where=table:filter_clause
>
> One would use this option as follows:
>
>   pg_dump --include-table-data-where=largetable:"created_at >=
> '2018-05-01'" database_name
>
> The filter_clause is used as the contents of a WHERE clause when querying
> the data to generate the COPY statement produced by pg_dump.
>
> I've prepared a proposed patch for this, which is attached.  The code
> changes are rather straightforward.  I did have to add the ability to carry
> around an extra pointer-sized object to the simple_list implementation, in
> order to allow the filter clause to be associated to the matching oids of
> the table pattern.  It seemed the best way to augment the existing
> simple_list implementation, but change as little as possible elsewhere in
> the codebase.  (Note that SimpleOidList is actually only used by pg_dump).
>
> Feel free to review and propose any amendments.
>
>
Why not simply use \copy (select * from largetable where created_at >=
'2018-05-01') to stdout? That is what I’ve always done when I need
something like this and have not found it particularly bothersome but
rather quite powerful. And here you have tons of flexibility because you
can do joins and whatever else.

FWIW. Thanks,
Jeremy


Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-09-06 Thread David G. Johnston
On Thu, Sep 6, 2018 at 8:40 AM, Jeremy Finzel  wrote:

> Why not simply use \copy (select * from largetable where created_at >=
>> '2018-05-01') to stdout? That is what I’ve always done when I need
>> something like this and have not found it particularly bothersome but
>> rather quite powerful. And here you have tons of flexibility because you
>> can do joins and whatever else.
>>
>
Just skimming the thread, but I'd have to say being able to leverage
pg_dump's dependency resolution is a major reason for adding features to it
instead of sticking to writing psql scripts.  This feature in a multi-tenant
situation is something with, I suspect, reasonably wide appeal.

David J.


Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data

2018-09-06 Thread Jeremy Finzel
>
> Just skimming the thread, but I'd have to say being able to leverage
> pg_dump's dependency resolution is a major reason for adding features to it
> instead of sticking to writing psql scripts.  This feature in a multi-tenant
> situation is something with, I suspect, reasonably wide appeal.
>

That I would agree with if in fact it's true people want that, but that
wasn't how the problem being solved was communicated.  From what I
read in the initial examples given, just using psql is more than sufficient
in those cases.

I will grant that copying the structure and data at the same time would be
much easier, however.  Because using psql, you need pg_dump to create your
structure then a separate psql script to copy the data.

But again, with the --data-only examples given, it's so easy to do that with
psql's copy that I just don't understand the value of the feature unless you
really are saying you require the dependency resolution.

I agree with some of the hesitation about complicating the syntax and allowing
too much customization for what pg_dump is designed for.  Really, if you
need more customization, copy gives you that.  So I don't personally
consider it a missing feature, because both tools have different uses and I
haven't found that any of this disrupts my workflow.  FWIW...

Thanks,
Jeremy


Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Andrew Gierth  writes:
 >> Consulting old manuals suggests it may have version 3.80 (c. 2002),
 >> which lacks the 'else if..' syntax introduced in 3.81 (c. 2006).

 Tom> Right; I haven't updated it because that's specifically called out to
 Tom> be the minimum supported version in our docs (cf installation.sgml).

Yeah; I checked the other constructs I used for what version they were
added in, but that one slipped through. Fix coming shortly.

-- 
Andrew (irc:RhodiumToad)



Re: test_pg_dump missing cleanup actions

2018-09-06 Thread Michael Paquier
Hi Stephen,

On Tue, Sep 04, 2018 at 04:14:15PM -0700, Michael Paquier wrote:
> On Tue, Sep 04, 2018 at 06:02:51PM -0400, Stephen Frost wrote:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> I'm confused.  Isn't the point of that script exactly to create a modified
>>> extension for testing pg_dump with?
> 
> Not really, the regression tests run for pg_regress and the TAP test
> suite are two completely isolated things and share no dependencies.
> e54f757 has actually changed test_pg_dump.sql so that it adds regression
> tests for pg_init_privs for ALTER EXTENSION ADD/DROP, and visibly those
> have been added to test_pg_dump because they were easier to add there,
> and this has no interaction with pg_dump.  What I think should have
> been done initially is to add those new tests in test_extensions
> instead.

I am now able to come back to this thread, and I still can't find from where
sql/test_pg_dump.sql is called.  Stephen, if the test suite is aiming at
checking that pg_init_privs is correctly set up with ALTER EXTENSION
ADD/DROP, shouldn't it also query the catalog to make sure that the
correct entries are added and removed after running said command?
At least that's one way I could see to perform the necessary sanity
checks without running pg_dump directly.  Perhaps I am missing
something?

>>> What I'd do is leave the final state as-is and add a "drop role if exists"
>>> at the start, similar to what some of the core regression tests do.
>> 
>> I've not followed this thread but based on Tom's response, I agree with
>> his suggestion of what to do here.
> 
> Not as far as I can see..  Please note that using installcheck on the
> main regression test suite does not leave around any extra roles.  I can
> understand the role of having a DROP ROLE IF EXISTS though: if you get a
> crash while testing, then the beginning of the tests are repeatable, so
> independently of the root issue Tom's suggestion makes sense to me.

Attached is a patch with more comments about the intent of the test
suite, and the separate issue pointed out by Tom fixed.  It seems to me
that actually checking the contents of pg_init_privs would improve the
justification for why the test exists.  I would welcome input about this
last point.
--
Michael
diff --git a/src/test/modules/test_pg_dump/expected/test_pg_dump.out b/src/test/modules/test_pg_dump/expected/test_pg_dump.out
index c9bc6f7625..ad730f18fd 100644
--- a/src/test/modules/test_pg_dump/expected/test_pg_dump.out
+++ b/src/test/modules/test_pg_dump/expected/test_pg_dump.out
@@ -1,3 +1,11 @@
+--
+-- Set of regression tests to test that ALTER EXTENSION ADD/DROP
+-- handles correctly pg_init_privs.
+--
+-- Clean up in case a prior regression run failed
+SET client_min_messages TO 'warning';
+DROP ROLE IF EXISTS regress_dump_test_role;
+RESET client_min_messages;
 CREATE ROLE regress_dump_test_role;
 CREATE EXTENSION test_pg_dump;
 ALTER EXTENSION test_pg_dump ADD DATABASE postgres; -- error
@@ -92,3 +100,12 @@ ALTER EXTENSION test_pg_dump DROP SERVER s0;
 ALTER EXTENSION test_pg_dump DROP TABLE test_pg_dump_t1;
 ALTER EXTENSION test_pg_dump DROP TYPE test_pg_dump_e1;
 ALTER EXTENSION test_pg_dump DROP VIEW test_pg_dump_v1;
+-- Clean up
+DROP EXTENSION test_pg_dump;
+DROP SCHEMA test_pg_dump_s1;
+DROP MATERIALIZED VIEW test_pg_dump_mv1;
+DROP FOREIGN TABLE ft1;
+DROP SERVER s0;
+DROP TYPE test_pg_dump_e1;
+DROP FUNCTION test_pg_dump(integer);
+DROP ROLE regress_dump_test_role;
diff --git a/src/test/modules/test_pg_dump/sql/test_pg_dump.sql b/src/test/modules/test_pg_dump/sql/test_pg_dump.sql
index e463dec404..1becc2b609 100644
--- a/src/test/modules/test_pg_dump/sql/test_pg_dump.sql
+++ b/src/test/modules/test_pg_dump/sql/test_pg_dump.sql
@@ -1,3 +1,13 @@
+--
+-- Set of regression tests to test that ALTER EXTENSION ADD/DROP
+-- handles correctly pg_init_privs.
+--
+
+-- Clean up in case a prior regression run failed
+SET client_min_messages TO 'warning';
+DROP ROLE IF EXISTS regress_dump_test_role;
+RESET client_min_messages;
+
 CREATE ROLE regress_dump_test_role;
 CREATE EXTENSION test_pg_dump;
 
@@ -105,3 +115,13 @@ ALTER EXTENSION test_pg_dump DROP SERVER s0;
 ALTER EXTENSION test_pg_dump DROP TABLE test_pg_dump_t1;
 ALTER EXTENSION test_pg_dump DROP TYPE test_pg_dump_e1;
 ALTER EXTENSION test_pg_dump DROP VIEW test_pg_dump_v1;
+
+-- Clean up
+DROP EXTENSION test_pg_dump;
+DROP SCHEMA test_pg_dump_s1;
+DROP MATERIALIZED VIEW test_pg_dump_mv1;
+DROP FOREIGN TABLE ft1;
+DROP SERVER s0;
+DROP TYPE test_pg_dump_e1;
+DROP FUNCTION test_pg_dump(integer);
+DROP ROLE regress_dump_test_role;


signature.asc
Description: PGP signature


Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-06 Thread Tom Lane
I wrote:
> Chapman Flack  writes:
>> Another alternative might be to have SPI_connect save them and
>> SPI_finish put them back, which leaves you just responsible for
>> reasoning about your own code. You'd still be expected to save them
>> across your own uses of other SPI calls, but no longer exposed to
>> spooky action at a distance from nested uses of SPI in stuff you call.

> Hmm.  I'd thought about that briefly and concluded that it didn't offer
> a full fix, but on reflection it's not clear why it'd be any less of
> a fix than the macroized-SPI_tuptable approach.  You end up with
> per-SPI-stack-level storage either way, and while that isn't perfect
> it does go a long way, as you say.  More, this has the huge advantage
> of being back-patchable, because there'd be no API/ABI change.

Here's a proposed patch along that line.  I included SPI_result (SPI's
equivalent of errno) in the set of saved-and-restored variables,
so that all the exposed SPI globals behave the same.

Also, in further service of providing insulation between SPI nesting
levels, I thought it'd be a good idea for SPI_connect() to reset the
globals to zero/NULL after saving them.  This ensures that a nesting
level can't accidentally be affected by the state of an outer level.
It's barely possible that there's somebody out there who's *intentionally*
accessing state from a calling SPI level, but that seems like it'd be
pretty dangerous programming practice.  Still, maybe there's an argument
for omitting that hunk in released branches.

Comments?

regards, tom lane

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 5756365..11ca800 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -36,10 +36,16 @@
 #include "utils/typcache.h"
 
 
+/*
+ * These global variables are part of the API for various SPI functions
+ * (a horrible API choice, but it's too late now).  To reduce the risk of
+ * interference between different SPI callers, we save and restore them
+ * when entering/exiting a SPI nesting level.
+ */
 uint64		SPI_processed = 0;
 Oid			SPI_lastoid = InvalidOid;
 SPITupleTable *SPI_tuptable = NULL;
-int			SPI_result;
+int			SPI_result = 0;
 
 static _SPI_connection *_SPI_stack = NULL;
 static _SPI_connection *_SPI_current = NULL;
@@ -132,6 +138,10 @@ SPI_connect_ext(int options)
 	_SPI_current->queryEnv = NULL;
 	_SPI_current->atomic = (options & SPI_OPT_NONATOMIC ? false : true);
 	_SPI_current->internal_xact = false;
+	_SPI_current->outer_processed = SPI_processed;
+	_SPI_current->outer_lastoid = SPI_lastoid;
+	_SPI_current->outer_tuptable = SPI_tuptable;
+	_SPI_current->outer_result = SPI_result;
 
 	/*
 	 * Create memory contexts for this procedure
@@ -154,6 +164,15 @@ SPI_connect_ext(int options)
 	/* ... and switch to procedure's context */
 	_SPI_current->savedcxt = MemoryContextSwitchTo(_SPI_current->procCxt);
 
+	/*
+	 * Reset API global variables so that current caller cannot accidentally
+	 * depend on state of an outer caller.
+	 */
+	SPI_processed = 0;
+	SPI_lastoid = InvalidOid;
+	SPI_tuptable = NULL;
+	SPI_result = 0;
+
 	return SPI_OK_CONNECT;
 }
 
@@ -176,12 +195,13 @@ SPI_finish(void)
 	_SPI_current->procCxt = NULL;
 
 	/*
-	 * Reset result variables, especially SPI_tuptable which is probably
+	 * Restore outer API variables, especially SPI_tuptable which is probably
 	 * pointing at a just-deleted tuptable
 	 */
-	SPI_processed = 0;
-	SPI_lastoid = InvalidOid;
-	SPI_tuptable = NULL;
+	SPI_processed = _SPI_current->outer_processed;
+	SPI_lastoid = _SPI_current->outer_lastoid;
+	SPI_tuptable = _SPI_current->outer_tuptable;
+	SPI_result = _SPI_current->outer_result;
 
 	/* Exit stack level */
 	_SPI_connected--;
@@ -274,9 +294,11 @@ SPICleanup(void)
 {
 	_SPI_current = NULL;
 	_SPI_connected = -1;
+	/* Reset API global variables, too */
 	SPI_processed = 0;
 	SPI_lastoid = InvalidOid;
 	SPI_tuptable = NULL;
+	SPI_result = 0;
 }
 
 /*
@@ -336,18 +358,20 @@ AtEOSubXact_SPI(bool isCommit, SubTransactionId mySubid)
 		}
 
 		/*
-		 * Pop the stack entry and reset global variables.  Unlike
+		 * Restore outer global variables and pop the stack entry.  Unlike
 		 * SPI_finish(), we don't risk switching to memory contexts that might
 		 * be already gone.
 		 */
+		SPI_processed = connection->outer_processed;
+		SPI_lastoid = connection->outer_lastoid;
+		SPI_tuptable = connection->outer_tuptable;
+		SPI_result = connection->outer_result;
+
 		_SPI_connected--;
 		if (_SPI_connected < 0)
 			_SPI_current = NULL;
 		else
 			_SPI_current = &(_SPI_stack[_SPI_connected]);
-		SPI_processed = 0;
-		SPI_lastoid = InvalidOid;
-		SPI_tuptable = NULL;
 	}
 
 	if (found && isCommit)
diff --git a/src/include/executor/spi_priv.h b/src/include/executor/spi_priv.h
index 401fd99..0da3a41 100644
--- a/src/include/executor/spi_priv.h
+++ b/src/include/executor/spi_priv.h
@@ -42,6 +42,12 @@ typedef struct
 								 * transactions */
 	bool		internal_xact;	/* SPI-managed transaction boundary */
+
+	/* saved values of the API global variables for the outer SPI level */
+	uint64		outer_processed;
+	Oid			outer_lastoid;
+	SPITupleTable *outer_tuptable;
+	int			outer_result;
 } _SPI_connection;

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2018-09-06 Thread James Coleman
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

A fairly common planning problem for us is what we call "most recent first"
queries; i.e., "the 50 most recent <object> rows for a <parent>".

Here's a basic setup:

-- created_at has very high cardinality
create table foo(pk serial primary key, owner_fk integer, created_at timestamp);
create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at);

-- technically this data guarantees unique created_at values,
-- but there's no reason it couldn't be modified to have a few
-- random non-unique values to prove the point
insert into foo(owner_fk, created_at)
  select i % 100, now() - (i::text || ' minutes')::interval
  from generate_series(1, 100) t(i);


And here's the naive query to get the results we want:

select *
from foo
where owner_fk = 23
-- pk is only here to disambiguate/guarantee a stable sort
-- in the rare case that there are collisions in the other
-- sort field(s)
order by created_at desc, pk desc
limit 50;


On stock Postgres this ends up being pretty terrible for cases where the fk 
filter represents a large number of rows, because the planner generates a sort 
node under the limit node and therefore fetches all matches, sorts them, and 
then applies the limit. Here's the plan:

 Limit  (cost=61386.12..61391.95 rows=50 width=16) (actual 
time=187.814..191.653 rows=50 loops=1)
   ->  Gather Merge  (cost=61386.12..70979.59 rows=82224 width=16) (actual 
time=187.813..191.647 rows=50 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort  (cost=60386.10..60488.88 rows=41112 width=16) (actual 
time=185.639..185.642 rows=42 loops=3)
   Sort Key: created_at DESC, pk DESC
   Sort Method: top-N heapsort  Memory: 27kB
   Worker 0:  Sort Method: top-N heapsort  Memory: 27kB
   Worker 1:  Sort Method: top-N heapsort  Memory: 27kB
   ->  Parallel Bitmap Heap Scan on foo  (cost=3345.24..59020.38 
rows=41112 width=16) (actual time=25.150..181.804 rows=3 loops=3)
 Recheck Cond: (owner_fk = 23)
 Heap Blocks: exact=18014
 ->  Bitmap Index Scan on idx_foo_on_owner_and_created_at  
(cost=0.00..3320.57 rows=98668 width=0) (actual time=16.992..16.992 rows=10 
loops=1)
   Index Cond: (owner_fk = 23)
 Planning Time: 0.384 ms
 Execution Time: 191.704 ms


I have a recursive CTE that implements the algorithm:
- Find the first n+1 results.
- If row n+1's created_at value differs from row n's value, return the
first n values.
- If they're equal, gather more results until a new created_at value is
encountered.
- Sort all gathered results by created_at and a tie-breaker (e.g., pk) and
return the first n values.
But nobody wants to write that by hand (it's quite complex).
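
For illustration, a non-recursive sketch of the same idea (not the actual
CTE; it reuses the names from the setup above):

-- Find the created_at value of the 50th row, then take everything at or
-- above that boundary and re-sort with the pk tie-breaker.  The LEFT JOIN
-- keeps the query correct when fewer than 50 rows exist.
with boundary as (
  select created_at
  from foo
  where owner_fk = 23
  order by created_at desc
  offset 49 limit 1
)
select f.*
from foo f
left join boundary b on true
where f.owner_fk = 23
  and (b.created_at is null or f.created_at >= b.created_at)
order by f.created_at desc, f.pk desc
limit 50;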

This patch solves the problem presented; here's the plan:

 Limit  (cost=2.70..2.76 rows=50 width=16) (actual time=0.233..0.367 rows=50 
loops=1)
   ->  Incremental Sort  (cost=2.70..111.72 rows=98668 width=16) (actual 
time=0.232..0.362 rows=50 loops=1)
 Sort Key: created_at DESC, pk DESC
 Presorted Key: created_at
 Sort Method: quicksort  Memory: 26kB
 Sort Groups: 2
 ->  Index Scan Backward using idx_foo_on_owner_and_created_at on foo  
(cost=0.56..210640.79 rows=98668 width=16) (actual time=0.054..0.299 rows=65 
loops=1)
   Index Cond: (owner_fk = 23)
 Planning Time: 0.428 ms
 Execution Time: 0.393 ms


While make check-world fails, the only failure appears to be a plan output
change in test/isolation/expected/drop-index-concurrently-1.out that just
needs to be updated (incremental sort is now used in that plan); I don't
see any functionality breakage.

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2018-09-06 Thread Tom Lane
James Coleman  writes:
> A fairly common planning problem for us is what we call "most recent first" 
> queries; i.e., "the 50 most recent <object> rows for a <parent>".

> Here's a basic setup:

> -- created_at has very high cardinality
> create table foo(pk serial primary key, owner_fk integer, created_at 
> timestamp);
> create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at);

> -- technically this data guarantees unique created_at values,
> -- but there's no reason it couldn't be modified to have a few
> -- random non-unique values to prove the point
> insert into foo(owner_fk, created_at)
>   select i % 100, now() - (i::text || ' minutes')::interval
>   from generate_series(1, 100) t(i);

> And here's the naive query to get the results we want:

> select *
> from foo
> where owner_fk = 23
> -- pk is only here to disambiguate/guarantee a stable sort
> -- in the rare case that there are collisions in the other
> -- sort field(s)
> order by created_at desc, pk desc
> limit 50;

If you're concerned about the performance of this case, why don't you make
an index that actually matches the query?

regression=# create index on foo (owner_fk, created_at, pk); 
CREATE INDEX
regression=# explain analyze select * from foo where owner_fk = 23 order by 
created_at desc, pk desc limit 50;
   
QUERY PLAN  
  
-
 Limit  (cost=0.42..110.92 rows=50 width=16) (actual time=0.151..0.280 rows=50 
loops=1)
   ->  Index Only Scan Backward using foo_owner_fk_created_at_pk_idx on foo  
(cost=0.42..20110.94 rows=9100 width=16) (actual time=0.146..0.255 rows=50 
loops=1)
 Index Cond: (owner_fk = 23)
 Heap Fetches: 50
 Planning Time: 0.290 ms
 Execution Time: 0.361 ms
(6 rows)

There may be use-cases for Alexander's patch, but I don't find this
one to be terribly convincing.

regards, tom lane



Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2018-09-06 Thread James Coleman
I disagree because it's not ideal to basically have to append pk to every
index in the system just to get the ability to tie-break when there's
actually a very low likelihood of ties anyway.

A similar use case is trying to batch through a result set, in which case
you need a stable sort as well.

The benefit is retaining the generality of indexes (and saving space in
them etc.) while still allowing using them for more specific sorts. Any
time you paginate or batch this way you benefit from this, which in many
applications applies to a very high percentage of indexes.
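
For example, batching with a keyset cursor looks like this (a sketch
reusing the setup from upthread; the cursor values are invented for
illustration):

-- Resume strictly after the last (created_at, pk) pair seen on the
-- previous page; the pk tie-breaker is what makes this stable.
select *
from foo
where owner_fk = 23
  and (created_at, pk) < ('2018-09-06 12:00:00'::timestamp, 1234)
order by created_at desc, pk desc
limit 50;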

On Thu, Sep 6, 2018 at 10:39 AM Tom Lane  wrote:

> James Coleman  writes:
> > A fairly common planning problem for us is what we call "most recent
> first" queries; i.e., "the 50 most recent  rows for a ".
>
> > Here's a basic setup:
>
> > -- created_at has very high cardinality
> > create table foo(pk serial primary key, owner_fk integer, created_at
> timestamp);
> > create index idx_foo_on_owner_and_created_at on foo(owner_fk,
> created_at);
>
> > -- technically this data guarantees unique created_at values,
> > -- but there's no reason it couldn't be modified to have a few
> > -- random non-unique values to prove the point
> > insert into foo(owner_fk, created_at)
> >   select i % 100, now() - (i::text || ' minutes')::interval
> >   from generate_series(1, 100) t(i);
>
> > And here's the naive query to get the results we want:
>
> > select *
> > from foo
> > where owner_fk = 23
> > -- pk is only here to disambiguate/guarantee a stable sort
> > -- in the rare case that there are collisions in the other
> > -- sort field(s)
> > order by created_at desc, pk desc
> > limit 50;
>
> If you're concerned about the performance of this case, why don't you make
> an index that actually matches the query?
>
> regression=# create index on foo (owner_fk, created_at, pk);
> CREATE INDEX
> regression=# explain analyze select * from foo where owner_fk = 23 order
> by created_at desc, pk desc limit 50;
>
>  QUERY PLAN
>
>
> -
>  Limit  (cost=0.42..110.92 rows=50 width=16) (actual time=0.151..0.280
> rows=50 loops=1)
>->  Index Only Scan Backward using foo_owner_fk_created_at_pk_idx on
> foo  (cost=0.42..20110.94 rows=9100 width=16) (actual time=0.146..0.255
> rows=50 loops=1)
>  Index Cond: (owner_fk = 23)
>  Heap Fetches: 50
>  Planning Time: 0.290 ms
>  Execution Time: 0.361 ms
> (6 rows)
>
> There may be use-cases for Alexander's patch, but I don't find this
> one to be terribly convincing.
>
> regards, tom lane
>


Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-06 Thread Michael Paquier
On Wed, Sep 05, 2018 at 12:14:41AM -0700, Jimmy Yih wrote:
> Attached the isolation spec file.  I originally was only going to fix the
> simple CREATE TYPE scenario but decided to look up other objects that can
> reside in namespaces and ended up finding a handful of others.  I tested
> each one manually before and after adding the AccessShareLock acquire on
> the schema.

(You should avoid top-posting, this is not the style of the lists.)

Thanks.  One problem I have with what you have here is that you only test
the locking path, as the session dropping the schema would just
block on the first concurrent object it finds.  If you don't mind I am
just stealing it, and extending it a bit ;)
--
Michael




Re: Collation versioning

2018-09-06 Thread Peter Eisentraut
On 05/09/2018 23:18, Thomas Munro wrote:
> On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg  wrote:
>>> So, it's not ideal but perhaps worth considering on the grounds that
>>> it's better than nothing?
>>
>> Ack.
> 
> Ok, here's a little patch like that.
> 
> postgres=# select collname, collcollate, collversion from pg_collation
> where collname = 'en_NZ';
>  collname | collcollate | collversion
> --+-+-
>  en_NZ| en_NZ.utf8  | 2.24
> (1 row)

But wouldn't that also have the effect that glibc updates that don't
change anything about the locales would trigger the version
incompatibility warning?

Also, note that this mechanism only applies to collation objects, not to
database-global locales.  So most users wouldn't be helped by this approach.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Allow parallelism for deferrable serializable txns

2018-09-06 Thread Andres Freund
Hi,

The code currently says:

 * We can't use parallelism in serializable mode because the predicate
 * locking code is not parallel-aware.  It's not catastrophic if someone
 * tries to run a parallel plan in serializable mode; it just won't get
 * any workers and will run serially.  But it seems like a good heuristic
 * to assume that the same serialization level will be in effect at plan
 * time and execution time, so don't generate a parallel plan if we're in
 * serializable mode.
 */
if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
    IsUnderPostmaster &&
    parse->commandType == CMD_SELECT &&
    !parse->hasModifyingCTE &&
    max_parallel_workers_per_gather > 0 &&
    !IsParallelWorker() &&
    !IsolationIsSerializable())
{
    /* all the cheap tests pass, so scan the query tree */
    glob->maxParallelHazard = max_parallel_hazard(parse);
    glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
}

afaict, for READ ONLY DEFERRABLE transactions we could
trivially allow parallelism?  Am I missing something?
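
For reference, the mode in question is entered like this (standard syntax;
shown only to pin down the case):

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
-- waits for a safe snapshot, after which no serialization failure is
-- possible, so predicate locking should be moot for this transaction
SELECT count(*) FROM pg_class;  -- could, in principle, use workers
COMMIT;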

Greetings,

Andres Freund



Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-06 Thread Michael Paquier
On Wed, Sep 05, 2018 at 09:23:37AM -0700, Andres Freund wrote:
> Well, we kinda do, during some of their own DDL. CF
> AcquireDeletionLock(), RangeVarGetAndCheckCreationNamespace(), and other
> LockDatabaseObject() callers.  The
> RangeVarGetAndCheckCreationNamespace() even locks the schema an object
> is created in, which is pretty much what we're discussing here.
> 
> I think the problem with the current logic is more that the
> findDependentObjects() scan doesn't use a "dirty" scan, so it doesn't
> ever get to seeing conflicting operations.

Well, it seems to me that we don't necessarily want to go down that path
on back-branches.  What's actually striking me as a very bad
thing is the inconsistent behavior when we need to get a namespace OID
after calling QualifiedNameGetCreationNamespace using a list of defnames,
which does not lock the schema the DDL is working on.  And this happens
for basically all the object types Jimmy has mentioned.

For example, when creating a composite type, the namespace lock is taken
through RangeVarGetAndCheckCreationNamespace.  If a user tries to create
a shell type, then we simply don't lock it, which leads to an
inconsistency of behavior between composite types and shell types.  In my
opinion the inconsistency of behavior for the same DDL is not logical.

So I have been looking at that, and finished with the attached, which
actually fixes the set of problems reported.  Thoughts?
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 5d13e6a3d7..49bd61aff6 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3011,6 +3011,14 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p)
 	 errmsg("no schema has been selected to create in")));
 	}
 
+	Assert(OidIsValid(namespaceId));
+
+	/*
+	 * Lock the creation namespace to protect against concurrent namespace
+	 * drop.
+	 */
+	LockDatabaseObject(NamespaceRelationId, namespaceId, 0, AccessShareLock);
+
 	return namespaceId;
 }
 
diff --git a/src/test/isolation/expected/schema-drop.out b/src/test/isolation/expected/schema-drop.out
new file mode 100644
index 00..5ea14abb0b
--- /dev/null
+++ b/src/test/isolation/expected/schema-drop.out
@@ -0,0 +1,43 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_create_type s2_drop_schema s1_commit
+step s1_begin: BEGIN;
+step s1_create_type: CREATE TYPE testschema.testtype;
+step s2_drop_schema: DROP SCHEMA testschema CASCADE; <waiting ...>
+step s1_commit: COMMIT;
+step s2_drop_schema: <... completed>
+
+starting permutation: s1_begin s1_create_agg s2_drop_schema s1_commit
+step s1_begin: BEGIN;
+step s1_create_agg: CREATE AGGREGATE testschema.a1(int4) (SFUNC=int4pl, STYPE=int4);
+step s2_drop_schema: DROP SCHEMA testschema CASCADE; <waiting ...>
+step s1_commit: COMMIT;
+step s2_drop_schema: <... completed>
+
+starting permutation: s1_begin s1_create_func s2_drop_schema s1_commit
+step s1_begin: BEGIN;
+step s1_create_func: CREATE FUNCTION testschema.f1() RETURNS bool LANGUAGE SQL AS 'SELECT true';
+step s2_drop_schema: DROP SCHEMA testschema CASCADE; <waiting ...>
+step s1_commit: COMMIT;
+step s2_drop_schema: <... completed>
+
+starting permutation: s1_begin s1_create_op s2_drop_schema s1_commit
+step s1_begin: BEGIN;
+step s1_create_op: CREATE OPERATOR testschema.@+@ (LEFTARG=int4, RIGHTARG=int4, PROCEDURE=int4pl);
+step s2_drop_schema: DROP SCHEMA testschema CASCADE; <waiting ...>
+step s1_commit: COMMIT;
+step s2_drop_schema: <... completed>
+
+starting permutation: s1_begin s1_create_opfamily s2_drop_schema s1_commit
+step s1_begin: BEGIN;
+step s1_create_opfamily: CREATE OPERATOR FAMILY testschema.opfam1 USING btree;
+step s2_drop_schema: DROP SCHEMA testschema CASCADE; <waiting ...>
+step s1_commit: COMMIT;
+step s2_drop_schema: <... completed>
+
+starting permutation: s1_begin s1_create_opclass s2_drop_schema s1_commit
+step s1_begin: BEGIN;
+step s1_create_opclass: CREATE OPERATOR CLASS testschema.opclass1 FOR TYPE uuid USING hash AS STORAGE uuid;
+step s2_drop_schema: DROP SCHEMA testschema CASCADE; <waiting ...>
+step s1_commit: COMMIT;
+step s2_drop_schema: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index c23b401225..12b7a96d7e 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -78,3 +78,4 @@ test: partition-key-update-3
 test: partition-key-update-4
 test: plpgsql-toast
 test: truncate-conflict
+test: schema-drop
diff --git a/src/test/isolation/specs/schema-drop.spec b/src/test/isolation/specs/schema-drop.spec
new file mode 100644
index 00..d40703d3a6
--- /dev/null
+++ b/src/test/isolation/specs/schema-drop.spec
@@ -0,0 +1,32 @@
+# Tests for schema drop with concurrently-created objects
+#
+# When an empty namespace is being initially populated with the below
+# objects, it is possible to DROP SCHEMA without a CASCADE before the
+# objects are committed.  DROP SCHEMA should wait for the transaction
+# creating them to commit.

Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-06 Thread Fabrízio de Royes Mello
On Thu, Sep 6, 2018 at 4:22 PM Michael Paquier  wrote:
>
> On Wed, Sep 05, 2018 at 09:23:37AM -0700, Andres Freund wrote:
> > Well, we kinda do, during some of their own DDL. CF
> > AcquireDeletionLock(), RangeVarGetAndCheckCreationNamespace(), and other
> > LockDatabaseObject() callers.  The
> > RangeVarGetAndCheckCreationNamespace() even locks the schema an object
> > is created in, which is pretty much what we're discussing here.
> >
> > I think the problem with the current logic is more that the
> > findDependentObjects() scan doesn't use a "dirty" scan, so it doesn't
> > ever get to seeing conflicting operations.
>
> Well, it seems to me that we don't necessarily want to go down that path
> on back-branches.  What's actually striking me as a very bad
> thing is the inconsistent behavior when we need to get a namespace OID
> after calling QualifiedNameGetCreationNamespace using a list of defnames,
> which does not lock the schema the DDL is working on.  And this happens
> for basically all the object types Jimmy has mentioned.
>
> For example, when creating a composite type, the namespace lock is taken
> through RangeVarGetAndCheckCreationNamespace.  If a user tries to create
> a shell type, then we simply don't lock it, which leads to an
> inconsistency of behavior between composite types and shell types.  In my
> opinion the inconsistency of behavior for the same DDL is not logical.
>
> So I have been looking at that, and finished with the attached, which
> actually fixes the set of problems reported.  Thoughts?

Hi,

I applied it to current master, ran "make check-world", and everything is
ok...

I also ran some tests similar to those Jimmy pointed out, using PL/pgSQL
to execute the DDLs, and the new consistent behavior is ok. I also ran one
session issuing DROP SCHEMA at the end, and after COMMIT the second
session reports 'ERROR:  schema "testschema" does not exist'; so, setting
aside concerns about lock overhead, the proposed patch looks ok.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-06 Thread Peter Eisentraut
On 06/09/2018 18:25, Tom Lane wrote:
> Here's a proposed patch along that line.  I included SPI_result (SPI's
> equivalent of errno) in the set of saved-and-restored variables,
> so that all the exposed SPI globals behave the same.
> 
> Also, in further service of providing insulation between SPI nesting
> levels, I thought it'd be a good idea for SPI_connect() to reset the
> globals to zero/NULL after saving them.  This ensures that a nesting
> level can't accidentally be affected by the state of an outer level.
> It's barely possible that there's somebody out there who's *intentionally*
> accessing state from a calling SPI level, but that seems like it'd be
> pretty dangerous programming practice.  Still, maybe there's an argument
> for omitting that hunk in released branches.

These look like good changes.  I'm not sure about backpatching them, but
as you say the chances that someone wants to do this intentionally are low.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Restricting maximum keep segments by repslots

2018-09-06 Thread Peter Eisentraut
This documentation

+   
+    Specify the maximum size of WAL files that replication slots are
+    allowed to retain in the pg_wal directory at checkpoint time.
+    If max_slot_wal_keep_size is zero (the default),
+    replication slots retain unlimited size of WAL files.
+   

doesn't say anything about what happens when the limit is exceeded.
Does the system halt until the WAL is fetched from the slots?  Do the
slots get invalidated?

Also, I don't think 0 is a good value for the default behavior.  0 would
mean that a slot is not allowed to retain any more WAL than already
exists anyway.  Maybe we don't want to support that directly, but it's a
valid configuration.  So maybe use -1 for infinity.
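
If -1 were adopted as "no limit", usage would presumably look like any
other size GUC (a hypothetical sketch of the patch's proposed parameter,
not of anything in a released server):

ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';  -- cap WAL kept for slots
SELECT pg_reload_conf();
ALTER SYSTEM SET max_slot_wal_keep_size = -1;      -- no limit (the default)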

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2018-09-06 Thread Tomas Vondra
On 09/06/2018 08:04 PM, James Coleman wrote:
> I disagree because it's not ideal to basically have to append pk to
> every index in the system just to get the ability to tie-break when
> there's actually a very low likelihood of ties anyway.
> 
> A similar use case is trying to batch through a result set, in which
> case you need a stable sort as well.
> 
> The benefit is retaining the generality of indexes (and saving space in
> them etc.) while still allowing using them for more specific sorts. Any
> time you paginate or batch this way you benefit from this, which in many
> applications applies to a very high percentage of indexes.
> 

I 100% agree with this.

I see incremental sort as a way to run queries with fewer indexes that
are less query-specific, while still benefiting from them. Which means
lower overhead when writing data, lower disk space usage, and so on.
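
The space argument is easy to quantify, for example with the two indexes
from upthread:

-- compare the general-purpose index with the query-specific one that
-- appends pk as a tie-breaker column
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_foo_on_owner_and_created_at',
                  'foo_owner_fk_created_at_pk_idx');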

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-06 Thread Tom Lane
Peter Eisentraut  writes:
> On 06/09/2018 18:25, Tom Lane wrote:
>> Also, in further service of providing insulation between SPI nesting
>> levels, I thought it'd be a good idea for SPI_connect() to reset the
>> globals to zero/NULL after saving them.  This ensures that a nesting
>> level can't accidentally be affected by the state of an outer level.
>> It's barely possible that there's somebody out there who's *intentionally*
>> accessing state from a calling SPI level, but that seems like it'd be
>> pretty dangerous programming practice.  Still, maybe there's an argument
>> for omitting that hunk in released branches.

> These look like good changes.  I'm not sure about backpatching them, but
> as you say the chances that someone wants to do this intentionally are low.

Yeah.  I'd put a higher probability on the idea that this will fix
somebody's latent bug in code that's never been tested inside an
outer level of SPI call.  It would, for example, work to rely on
SPI_tuptable being initially NULL, as long as you hadn't tried it
inside an outer call.

regards, tom lane



Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-06 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Also, in further service of providing insulation between SPI
 Tom> nesting levels, I thought it'd be a good idea for SPI_connect() to
 Tom> reset the globals to zero/NULL after saving them. This ensures
 Tom> that a nesting level can't accidentally be affected by the state
 Tom> of an outer level. It's barely possible that there's somebody out
 Tom> there who's *intentionally* accessing state from a calling SPI
 Tom> level, but that seems like it'd be pretty dangerous programming
 Tom> practice. Still, maybe there's an argument for omitting that hunk
 Tom> in released branches.

 Tom> Comments?

+1 to backpatching it all, including the initialization in SPI_connect.

-- 
Andrew (irc:RhodiumToad)



Commit fest 2018-09

2018-09-06 Thread Michael Paquier
Hi all,

We are already in September, hence it is time to move on with the 2nd
commit fest for v12.  As usual, there are many patches waiting for
review and integration:
https://commitfest.postgresql.org/19/
With a couple of days of delay, I switched the CF app to in-progress on
Tuesday AOE.  Please note that I am fine with acting as CFM and helping
with the patch classification.  If somebody else wishes to act as such
and is familiar with the community process, feel free to reply to this
thread and volunteer.

If you are a patch author, please note that the commit fest relies on
the balance between patches submitted and patch reviews done.  If you
are not reviewing one patch for each patch submitted, please make sure
to look at something of equal difficulty.  Doing more reviews than patch
authoring is not bad either, quite the contrary :)

If you are a patch reviewer and are registered as such, it is equally
important to make progress so that things can get integrated into the
tree.  Please make sure to discuss things to the extent of what you
know, and do not hesitate to discover new areas of the code and the
knowledge around them.  There are many patches waiting for your review.

If you are a committer, there are currently 11 patches waiting for a
commit.  So if you are already registered, let's move on with the patch
integration.  There are also a couple of patches which have not been
looked at yet.

Thanks, and happy hacking.
--
Michael




Re: Commit fest 2018-09

2018-09-06 Thread Dmitry Dolgov
> On Thu, 6 Sep 2018 at 22:50, Michael Paquier  wrote:
>
> If somebody else wishes to act as such and is familiar with the community
> process, feel free to reply to this thread and volunteer.

I have always wondered what the reaction to a first-time CFM would be. If
it's ok I would love to try it out, but for the next commitfest (and this
time I'll watch CFM activity more closely than usual).



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-06 Thread Tom Lane
I wrote:
> So where are we on this?  Should I proceed with my patch, or are we
> going to do further investigation?  Does anyone want to do an actual
> patch review?

[ crickets... ]

So I took that as license to proceed, but while doing a final round of
testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails,
because now that's an infinite recursion.  On reflection it's a bit
surprising that it wasn't so all along.  What I'm inclined to do about
it is to adjust AcceptInvalidationMessages so that there's a finite
recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there
already is in the CLOBBER_CACHE_ALWAYS case.  Maybe 3 or so levels
would be enough.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-06 Thread Andres Freund
On 2018-09-06 17:38:38 -0400, Tom Lane wrote:
> I wrote:
> > So where are we on this?  Should I proceed with my patch, or are we
> > going to do further investigation?  Does anyone want to do an actual
> > patch review?
> 
> [ crickets... ]

Sorry, I have been a bit busy with PostgresOpen, and with a few people
being in town due to that.

Could you attach the current version of the patch, or were there no
meaningful changes?

> So I took that as license to proceed, but while doing a final round of
> testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails,
> because now that's an infinite recursion.  On reflection it's a bit
> surprising that it wasn't so all along.  What I'm inclined to do about
> it is to adjust AcceptInvalidationMessages so that there's a finite
> recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there
> already is in the CLOBBER_CACHE_ALWAYS case.  Maybe 3 or so levels
> would be enough.

Hm, but wouldn't that pretty much mean that the risk exposed in this
thread would still be present?  The reason your approach fixes things is
that if we're processing invalidation event N, which depends on
invalidation changes at N+y being processed, the recursion leads to N+y
being processed before the invalidation processing of N finishes.  Now
you can argue that your restriction would only apply to
CLOBBER_CACHE_RECURSIVELY, and thus not to normal builds, but that still
makes me uncomfortable.

Greetings,

Andres Freund



Re: Commit fest 2018-09

2018-09-06 Thread Michael Paquier
On Thu, Sep 06, 2018 at 11:25:01PM +0200, Dmitry Dolgov wrote:
> I always wondered what would be the reaction on a first-time CFM?

There is a first time for everything.
--
Michael




Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-06 Thread Tom Lane
Andres Freund  writes:
> Could you attach the current version of the patch, or were there no
> meaningful changes?

No changes.

>> So I took that as license to proceed, but while doing a final round of
>> testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails,
>> because now that's an infinite recursion.  On reflection it's a bit
>> surprising that it wasn't so all along.  What I'm inclined to do about
>> it is to adjust AcceptInvalidationMessages so that there's a finite
>> recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there
>> already is in the CLOBBER_CACHE_ALWAYS case.  Maybe 3 or so levels
>> would be enough.

> Hm, but wouldn't that pretty much mean that the risk exposed in this
> thread would still be present?  The reason your approach fixes things is
> that if we're processing invalidation event N, which depends on
> invalidation changes at N+y being processed, is that the recursion leads
> to N+y being processed before the invalidation processing of N finishes,
> due to the recursion.

Right, which makes the maximum recursion depth equal to basically whatever
you think the "backlog" of relevant catalog inval messages could be.  It's
finite, because we do after all have lock on one more catalog at each
level, and eventually we'd hold lock on every system catalog or index
that could be referenced inside AcceptInvalidationMessages.  In practice
I doubt the worst-case nesting level is more than two or three, but
it's not clear how to establish just what it is.

CLOBBER_CACHE_RECURSIVELY breaks this because it recurses whether or not
any relevant inval activity occurred.  I think that's good for verifying
that recursion of AcceptInvalidationMessages works at all, but it's
not relevant to whether infinite recursion could happen without that.

regards, tom lane



Re: Problem while setting the fpw with SIGHUP

2018-09-06 Thread Michael Paquier
On Tue, Aug 28, 2018 at 07:34:36PM +0900, Kyotaro HORIGUCHI wrote:
> Thanks for prompting. The difference is in a comment and I'm fine
> with the change.

/*
 * Properly accept or ignore signals the postmaster might send us.
 */
-   pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
+   pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */

I am finally coming back to this patch set, and that's one of the first
things I am going to help move along in this CF.  And this bit from the
last patch series is not acceptable, as there are some parameters used
by the startup process which can be reloaded.  One of them is
wal_retrieve_retry_interval, for tuning how WAL fetching is retried
during recovery.
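
For example, pg_settings shows that this parameter is reload-sensitive:

-- context = 'sighup' means the value can be changed by a config reload
SELECT name, setting, context
FROM pg_settings
WHERE name = 'wal_retrieve_retry_interval';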
--
Michael




Re: Collation versioning

2018-09-06 Thread Thomas Munro
On Thu, Sep 6, 2018 at 12:01 PM Peter Eisentraut
 wrote:
> On 05/09/2018 23:18, Thomas Munro wrote:
> > On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg  wrote:
> >>> So, it's not ideal but perhaps worth considering on the grounds that
> >>> it's better than nothing?
> >>
> >> Ack.
> >
> > Ok, here's a little patch like that.
> >
> > postgres=# select collname, collcollate, collversion from pg_collation
> > where collname = 'en_NZ';
> >  collname | collcollate | collversion
> > --+-+-
> >  en_NZ| en_NZ.utf8  | 2.24
> > (1 row)
>
> But wouldn't that also have the effect that glibc updates that don't
> change anything about the locales would trigger the version
> incompatibility warning?

Right.  And likewise, a glibc update that does change some locales but
not the locales that you are actually using will trigger false alarm
warnings. The same goes for the ICU provider, which appears to return
the same collversion for every collation, even though presumably some
don't change from one ICU version to the next.

I wonder if someone here knows how many "locales" packages have been
released over the lifetime of (say) the current Debian stable distro,
whether any LC_COLLATE files changed over those releases, and whether
libc6 had the same MAJOR.MINOR for the whole lifetime.  That is, even
though they might have been through 2.19-17+blah, 2.19-18+blah, ...
did they all report "2.19" and were the collations actually stable?
If that's the case, I think it'd be quite good: we'd only raise the
alarm after a big dist-upgrade Debian 8->9, or when doing streaming
replication from a Debian 8 box to a Debian 9 box.

> Also, note that this mechanism only applies to collation objects, not to
> database-global locales.  So most users wouldn't be helped by this approach.

Yeah, right, that would have to work for this to be useful.  I will
look into that.
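
For context, the per-database settings in question live in pg_database,
which today carries no version column:

-- database-global locale settings, not covered by pg_collation.collversion
SELECT datname, datcollate, datctype
FROM pg_database;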

--
Thomas Munro
http://www.enterprisedb.com



Re: pgsql: Refactor dlopen() support

2018-09-06 Thread Tom Lane
Peter Eisentraut  writes:
> Refactor dlopen() support

Buildfarm member locust doesn't like this much.  I've been able to
reproduce the problem on an old Mac laptop running the same macOS release,
viz 10.5.8.  (Note that we're not seeing it on earlier or later releases,
which is odd in itself.)  According to my machine, the crash is happening
here:

#0  _PG_init () at plpy_main.c:98
98  *plpython_version_bitmask_ptr |= (1 << PY_MAJOR_VERSION);

and the reason is that the rendezvous variable sometimes contains garbage.
Most sessions correctly see it as initially zero, but sometimes it
contains

(gdb) p plpython_version_bitmask_ptr
$1 = (int *) 0x1d

and I've also seen

(gdb) p plpython_version_bitmask_ptr
$1 = (int *) 0x7f7f7f7f

It's mostly repeatable but not completely so: the 0x1d case seems
to come up every time through the plpython_do test, but I don't
always see the 0x7f7f7f7f case.  (Maybe that's a timing artifact?
It takes a variable amount of time to recover from the first crash
in plpython_do, so the rest of the plpython test run isn't exactly
operating in uniform conditions.)

No idea what's going on here, and I'm about out of steam for tonight.

regards, tom lane