Re: [HACKERS] Restricting maximum keep segments by repslots
Thank you for the comment. At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada wrote in > On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI > Thank you for updating! Here is the review comment for v8 patch. > > +/* > + * This slot still has all required segments. Calculate how many > + * LSN bytes the slot has until it loses restart_lsn. > + */ > +fragbytes = wal_segment_size - (currLSN % wal_segment_size); > +XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, > fragbytes, > +wal_segment_size, *restBytes); > > For the calculation of fragbytes, I think we should calculate the > fragment bytes of restartLSN instead. The the formula "restartSeg + > limitSegs - currSeg" means the # of segment between restartLSN and the > limit by the new parameter. I don't think that the summation of it and > fragment bytes of currenLSN is correct. As the following result > (max_slot_wal_keep_size is 128MB) shows, the remain column shows the > actual remains + 16MB (get_bytes function returns the value of > max_slot_wal_keep_size in bytes). Since the oldest segment is removed after the current LSN moves to the next segment, the current LSN naturally determines the fragment bytes. Maybe your concern is that the number of segments looks too large by one segment. One arguable point of the feature is how max_slot_wal_keep_size works exactly. I assume that even though the parameter is named "max_", we actually expect that "at least that many bytes are kept". So, for example, with a 16MB segment size and 50MB of max_s_w_k_s, I designed this so that the size of preserved WAL doesn't go below 50MB, actually 64MB (rounding 50MB up to a multiple of 16MB), and the oldest segment is lost when the total reaches 64MB + 16MB = 80MB, as you saw. # I believe that the difference is not so significant since we # have around a hundred or several hundred segments in common # cases. Do you mean that we should define the GUC parameter literally, as "we won't keep more than that many bytes of WAL segments"? That is, we have at most 48MB of preserved WAL records for a 50MB max_s_w_k_s setting. This is the same as how max_wal_size is counted, but I don't think max_slot_wal_keep_size will be regarded in the same way. Another design would be to remove the oldest segment when WAL reaches 64MB, which reduces it to 48MB after deletion. > --- > For the cosmetic stuff there are code where need the line break. > > static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags); > +static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr > minSlotPtr, XLogRecPtr restartLSN, uint64 *restBytes); > static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo); > static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void); > > and > > +static XLogSegNo > +GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN, > XLogRecPtr restartLSN, uint64 *restBytes) > +{ Thanks, I folded the parameter list in my working repository. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A strange GiST error message or fillfactor of GiST build
Hello. > Just my 2 cents: that was a hacky use case for development reasons. I know. So I intended to preserve the parameter with a default of 100% if no one strongly objected to preserving it. Then I abandoned that since I had :p > I think that removing fillfactor is good idea and your latest patch > looks good from my POV. Thanks. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Reopen logfile on SIGHUP
At Sat, 1 Sep 2018 19:52:16 +0300, Alexander Korotkov wrote in > On Thu, Aug 30, 2018 at 2:44 PM Kyotaro HORIGUCHI > wrote: > > At Thu, 30 Aug 2018 13:42:42 +0300, Alexander Korotkov > > wrote in > > > > > It seems that http://commitfest.cputube.org/ runs only "make check" on > > > Windows. But my Postgres Pro colleagues checked that tests passed on > > > 32-bit and 64-bit versions of Windows Server 2008. Also I made some > > > minor beautifications on code and documentation. > > > > > > This patch seems to have good shape and generally being quite > > > harmless. Do we have any objections to committing this? > > > > I checked that on my Win7 box and worked. Of course I have no > > objection. > > So, pushed. Thank you. Thanks! regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: remove ancient pre-dlopen dynloader code
On 01/09/2018 06:51, Peter Eisentraut wrote: >> How about this: We only have two nonstandard dlopen() implementations >> left: Windows and (old) HP-UX. We move those into src/port/dlopen.c and >> treat it like a regular libpgport member. That gets rid of all those >> duplicative empty per-platform files. > Updated patch. It needed some adjustments for Windows, per Appveyor, I'm going to use this thread for a moment to work out some details with the cfbot. The v2 patch I sent previously was created using git format-patch with default settings. This detected a rename: rename src/{backend/port/dynloader/win32.c => port/dlopen.c} (51%) which is fair enough. However, whatever method the cfbot uses to apply patches fails to handle that. The tree that is sent for testing by Appveyor still contains a full src/backend/port/dynloader/win32.c and no src/port/dlopen.c, which expectedly makes the build fail. (It also still contains src/backend/port/dynloader/otherplatform.c, but empty, so the patching doesn't remove empty files, which is another but minor problem.) The v3 patch attached here was made with git format-patch --no-renames. Let's see how that works out. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From 297f4f6ac3ae46b3678554964a5985d71e5908b9 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Thu, 6 Sep 2018 10:07:24 +0200 Subject: [PATCH v3] Refactor dlopen() support Nowadays, all platforms except Windows and older HP-UX have standard dlopen() support. So having a separate implementation per platform under src/backend/port/dynloader/ is a bit excessive. Instead, treat dlopen() like other library functions that happen to be missing sometimes and put a replacement implementation under src/port/. 
--- configure| 43 +++- configure.in | 8 +- src/backend/Makefile | 2 +- src/backend/port/.gitignore | 1 - src/backend/port/Makefile| 2 +- src/backend/port/dynloader/aix.c | 7 -- src/backend/port/dynloader/aix.h | 39 --- src/backend/port/dynloader/cygwin.c | 3 - src/backend/port/dynloader/cygwin.h | 36 --- src/backend/port/dynloader/darwin.c | 35 --- src/backend/port/dynloader/darwin.h | 8 -- src/backend/port/dynloader/freebsd.c | 7 -- src/backend/port/dynloader/freebsd.h | 38 --- src/backend/port/dynloader/hpux.c| 68 - src/backend/port/dynloader/hpux.h| 25 - src/backend/port/dynloader/linux.c | 7 -- src/backend/port/dynloader/linux.h | 38 --- src/backend/port/dynloader/netbsd.c | 7 -- src/backend/port/dynloader/netbsd.h | 38 --- src/backend/port/dynloader/openbsd.c | 7 -- src/backend/port/dynloader/openbsd.h | 38 --- src/backend/port/dynloader/solaris.c | 7 -- src/backend/port/dynloader/solaris.h | 38 --- src/backend/port/dynloader/win32.c | 85 src/backend/port/dynloader/win32.h | 19 src/backend/postmaster/postmaster.c | 1 - src/backend/utils/fmgr/dfmgr.c | 31 +++--- src/include/.gitignore | 1 - src/include/Makefile | 4 +- src/include/pg_config.h.in | 8 ++ src/include/pg_config.h.win32| 8 ++ src/include/port.h | 23 + src/include/utils/dynamic_loader.h | 25 - src/port/dlopen.c| 145 +++ src/tools/msvc/Install.pm| 5 +- src/tools/msvc/Mkvcbuild.pm | 5 +- src/tools/msvc/Solution.pm | 7 -- src/tools/msvc/clean.bat | 1 - 38 files changed, 251 insertions(+), 619 deletions(-) delete mode 100644 src/backend/port/dynloader/aix.c delete mode 100644 src/backend/port/dynloader/aix.h delete mode 100644 src/backend/port/dynloader/cygwin.c delete mode 100644 src/backend/port/dynloader/cygwin.h delete mode 100644 src/backend/port/dynloader/darwin.c delete mode 100644 src/backend/port/dynloader/darwin.h delete mode 100644 src/backend/port/dynloader/freebsd.c delete mode 100644 src/backend/port/dynloader/freebsd.h delete mode 100644 src/backend/port/dynloader/hpux.c delete mode 100644 src/backend/port/dynloader/hpux.h delete mode 100644 src/backend/port/dynloader/linux.c delete mode 100644 src/backend/port/dynloader/linux.h delete mode 100644 src/backend/port/dynloader/netbsd.c delete mode 100644 src/backend/port/dynloader/netbsd.h delete mode 100644 src/backend/port/dynloader/openbsd.c delete mode 100644 src/backend/port/dynloader/openbsd.h delete mode 100644 src/backend/port/dynloader/solaris.c delete mode 100644 src/backend/port/dynloader/solaris.h delete mode 100644 src/backend/port/dynloader/win32.c delete mode 100644 src/backend/port/dynloader/win32.h delete mode 100644 src/include/utils/dynamic_loader.h create mode 100644 src/port/dlopen.c diff --git
Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
Ok, so here's my current patch (work in progress). I have not run tests yet, and finding a way to add a test case is virtually impossible, though I expect we will find ways of using gdb to confirm the fix locally later. After careful code review, I settled on the following approach, which seemed to be the least intrusive. 1. Change the retry logic so that it does not retry if QueryCancelPending or ProcDiePending are set. In these cases EINTR is passed back to the caller. 2. Change the error handling logic of the caller so that EINTR is handled by the next CHECK_FOR_INTERRUPTS() after cleanup. This means that the file descriptor is now already closed and that we handle this the same way we would with an ENOSPC. The reason for this is that there are many places in the code where it is not clear what degree of safety is present for throwing errors using ereport, and the patch needs to be as unobtrusive as possible to facilitate back-porting. At this point the patch is for discussion. I have not even compiled it or tested it, but maybe there is something wrong with my approach, so I figured I would send it out early. The major assumptions are: 1. close() will never take longer than the interrupt interval that caused the loop to break. 2. close() does not get interrupted in a way that will cause cleanup problems later. 3. We will reach the next interrupt check at a reasonable interval and safe spot. Any initial arguments against? -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin racecondition.patch Description: Binary data
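A minimal sketch of the retry shape described above (assuming the loop in question wraps posix_fallocate(); the function name is hypothetical, and this is not the attached patch):

    #include "postgres.h"
    #include "miscadmin.h"

    #include <errno.h>
    #include <fcntl.h>

    /*
     * Retry on EINTR, but stop retrying once a cancel/die interrupt is
     * pending, so the caller can clean up and let the next
     * CHECK_FOR_INTERRUPTS() report the failure.  Returns 0 on success or
     * an errno value such as EINTR.
     */
    static int
    resize_segment_with_retry(int fd, off_t size)
    {
        int     rc;

        do
        {
            rc = posix_fallocate(fd, 0, size);
        } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

        return rc;
    }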
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun
On Thu, Sep 6, 2018 at 12:53 AM R, Siva wrote: > On Tue, Sep 5, 2018 at 08:55 PM, Alexander Korotkov > wrote: > > Finally I managed to reproduce the bug. The scenario is following. > > Underlying idea is that when insertion of multiple tuples goes to the > > beginning of the page and this insertion succeed only thanks to > > collapse of some short segments together, then this insertion wouldn't > > fit to the page if given alone. > > Sorry for the late reply. > Thank you so much for working on this and reproducing the issue! > Like you mentioned, the WAL record where we detected this problem > has future segments deleted due to compaction and written out > as an insert segment. > > > alter index test_idx set (fastupdate = on); > Just curious why does this help with the repro? This is related to only > using the Gin pending list vs the posting tree. With (fastupdate = on) GIN performs bulk updates of posting lists, inserting multiple tuples at once if possible. With (fastupdate = off) GIN always inserts tuples one-by-one. It might still be possible to reproduce the issue with (fastupdate = off), but it seems even harder. BTW, I've tried the patch you've posted. On my test case it fails with the following assertion. TRAP: FailedAssertion("!(a_action == 2)", File: "ginxlog.c", Line: 243) I thought about fixing this issue more, and I decided we can fix it in a less invasive way. Once modification is started, we can copy the tail of the page into a separately allocated chunk of memory, and then use it as the source of the original segments. See the patch attached. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company ginRedoRecompress-tail-copy-v1.patch Description: Binary data
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
At Sun, 2 Sep 2018 07:04:19 +1200, Thomas Munro wrote in > > > # Is it intentional that the patch doesn't touch pgstat.c? > > > > Yes. pgstat.c still uses WL_POSTMASTER_DEATH because it does > > something special: it calls pgstat_write_statsfiles() before it exits. Mmm. Exactly.. > Rebased. Thank you for the new version. === In syslogger.c, cur_flags is (just set but) no longer used. === In latch.c, - The parentheses around the symbols don't seem to be needed. | (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 || | (wakeEvents & (WL_POSTMASTER_DEATH)) != 0); - The following assertion looks contradictory to the comment. |/* Postmaster-managed callers must handle postmaster death somehow. */ |Assert(!IsUnderPostmaster || | (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 || | (wakeEvents & (WL_POSTMASTER_DEATH)) != 0); (Maybe Assert(IsUnderPost && (wakeEv & (WL_EXI | WL_PO)) != 0);) And don't we need a description of this restriction in the function comment? - I think it may be better if the following had parentheses around the '&' expressions. |if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster) |if (wakeEvents & WL_EXIT_ON_PM_DEATH && IsUnderPostmaster) === All the call sites of WaitLatch, WaitLatchOrSocket and WaitEventSetWait are covered by this patch and all of them look fine. bgworker.c, pgstat.c, be-secure-openssl.c, be-secure.c: Not modified on purpose. WL_EXIT_ON_PM_DEATH is not proper to use there. pgarch.c, syncrep.c, walsender.c: Removed redundant checks of postmaster death. syslogger.c: Left as-is since it intentionally doesn't exit at postmaster death. Uses a reusable wait event set. walreceiver.c: Adds a new bailing-out point in WalRcvWaitForStartPosition, and it seems reasonable. shm_mq.c: Adds a PM-death bailout in shm_mq_send/receive_bytes, wait_internal. It looks fine. postgres_fdw/connection.c: Adds a PM-death bailout while getting a result. This seems to be a bug fix. I'm fine with it being included in this patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [WIP] [B-Tree] Retail IndexTuple deletion
And the background worker patch is in the attachment. On 05.09.2018 15:25, Andrey Lepikhov wrote: Hi, I prepared the next version of the background worker (cleaner) based on the retail indextuple deletion patch. This version shows stable behavior on regression tests and pgbench workloads. In this version: 1. Only AccessShareLock is acquired on cleanup of heap and index relations. 2. A somewhat 'aggressive' cleanup strategy is introduced: conditional cleanup locks are not used. 3. Only in-memory blocks are cleaned up. 4. The cleaner calls heap_page_prune() before cleaning up a block. Benchmarks - Two factors were evaluated: performance (tps) and relation bloat. Before each test some rarefaction of pgbench_accounts was modeled by deleting 10% of tuples in each block. Also, I tested normal and Gaussian distributions of queries on the pgbench_accounts relation. Autovacuum uses default settings. Script: pgbench -i -s 10 psql -c $"DELETE FROM pgbench_accounts WHERE (random() < 0.1);" psql -c $"VACUUM;" psql -c $"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts USING btree (abalance);" && pgbench -T 3600 -c 32 -j 8 -M prepared -P 600 NORMAL distribution: average tps = 1045 (cleaner); = 1077 (autovacuum) Relation sizes at the end of the test, MB: pgbench_accounts: 128 (cleaner); 128 (autovacuum) pgbench_branches: 0.1 (cleaner); 2.1 (autovacuum) pgbench_tellers: 0.4 (cleaner); 2.8 (autovacuum) pgbench_accounts_pkey: 21 (cleaner); 43 (autovacuum) pgbench_accounts_ext: 48 (cleaner); 56 (autovacuum) Gaussian distribution: average tps = 213 (cleaner); = 213 (autovacuum) Relation sizes at the end of the test, MB: pgbench_accounts: 128 (cleaner); 128 (autovacuum) pgbench_accounts_ext: 22 (cleaner); 29 (autovacuum) Conclusions --- 1. For retail indextuple deletion purposes I replaced ItemIdSetDead() with ItemIdMarkDead() in the heap_page_prune_execute() operation. As a result, in the case of 100% filling of each relation block we get heap and index bloat, more or less. When the blocks already have free space, the cleaner can delay bloat of the heap and index without a vacuum. 2. The cleaner works fine when access frequency is skewed across relation blocks. 3. The cleaner does not cause a decrease in performance. -- Andrey Lepikhov Postgres Professional https://postgrespro.com The Russian Postgres Company >From b092dfd95b8673d692730ac27a1d6fdb76b66601 Mon Sep 17 00:00:00 2001 From: "Andrey V. 
Lepikhov" Date: Thu, 6 Sep 2018 11:05:42 +0300 Subject: [PATCH 5/5] Heap-and-Index-cleaner --- .../postgres_fdw/expected/postgres_fdw.out| 30 +- contrib/postgres_fdw/sql/postgres_fdw.sql |4 +- src/backend/access/heap/pruneheap.c |6 +- src/backend/access/nbtree/nbtree.c| 15 +- src/backend/access/transam/xact.c |4 + src/backend/catalog/system_views.sql | 11 + src/backend/commands/vacuumlazy.c | 44 +- src/backend/postmaster/Makefile |2 +- src/backend/postmaster/bgheap.c | 1990 + src/backend/postmaster/pgstat.c | 36 + src/backend/postmaster/postmaster.c | 160 +- src/backend/storage/buffer/bufmgr.c | 60 +- src/backend/storage/buffer/localbuf.c | 24 + src/backend/storage/ipc/ipci.c|3 + src/backend/storage/lmgr/lwlocknames.txt |1 + src/backend/storage/lmgr/proc.c |5 +- src/backend/tcop/postgres.c | 12 + src/backend/utils/adt/pgstatfuncs.c |2 + src/backend/utils/hash/Makefile |2 +- src/backend/utils/hash/shash.c| 265 +++ src/backend/utils/init/miscinit.c |3 +- src/backend/utils/init/postinit.c | 11 +- src/include/commands/progress.h | 12 + src/include/commands/vacuum.h |3 + src/include/pgstat.h | 14 +- src/include/postmaster/bgheap.h | 36 + src/include/storage/buf_internals.h |1 + src/include/storage/bufmgr.h |5 +- src/include/storage/pmsignal.h|2 + src/include/utils/shash.h | 54 + src/test/regress/expected/rules.out | 18 +- src/test/regress/expected/triggers.out|2 +- src/test/regress/input/constraints.source | 12 +- src/test/regress/output/constraints.source| 34 +- src/test/regress/sql/rules.sql|2 +- src/test/regress/sql/triggers.sql |2 +- 36 files changed, 2803 insertions(+), 84 deletions(-) create mode 100644 src/backend/postmaster/bgheap.c create mode 100644 src/backend/utils/hash/shash.c create mode 100644 src/include/postmaster/bgheap.h create mode 100644 src/include/utils/shash.h diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index f5498c62bd..622bea2a7d 100644 --- a/contrib/pos
Re: [HACKERS] Restricting maximum keep segments by repslots
On Thu, Sep 6, 2018 at 4:10 PM, Kyotaro HORIGUCHI wrote: > Thank you for the comment. > > At Wed, 5 Sep 2018 14:31:10 +0900, Masahiko Sawada > wrote in >> On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI >> Thank you for updating! Here is the review comment for v8 patch. >> >> +/* >> + * This slot still has all required segments. Calculate how many >> + * LSN bytes the slot has until it loses restart_lsn. >> + */ >> +fragbytes = wal_segment_size - (currLSN % wal_segment_size); >> +XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg, >> fragbytes, >> +wal_segment_size, *restBytes); >> >> For the calculation of fragbytes, I think we should calculate the >> fragment bytes of restartLSN instead. The the formula "restartSeg + >> limitSegs - currSeg" means the # of segment between restartLSN and the >> limit by the new parameter. I don't think that the summation of it and >> fragment bytes of currenLSN is correct. As the following result >> (max_slot_wal_keep_size is 128MB) shows, the remain column shows the >> actual remains + 16MB (get_bytes function returns the value of >> max_slot_wal_keep_size in bytes). > > Since a oldest segment is removed after the current LSN moves to > the next segmen, current LSN naturally determines the fragment > bytes. Maybe you're concerning that the number of segments looks > too much by one segment. > > One arguable point of the feature is how max_slot_wal_keep_size > works exactly. I assume that even though the name is named as > "max_", we actually expect that "at least that bytes are > kept". So, for example, with 16MB of segment size and 50MB of > max_s_w_k_s, I designed this so that the size of preserved WAL > doesn't go below 50MB, actually (rounding up to multples of 16MB > of 50MB), and loses the oldest segment when it reaches 64MB + > 16MB = 80MB as you saw. > > # I believe that the difference is not so significant since we > # have around hunderd or several hundreds of segments in common > # cases. > > Do you mean that we should define the GUC parameter literally as > "we won't have exactly that many bytes of WAL segmetns"? That is, > we have at most 48MB preserved WAL records for 50MB of > max_s_w_k_s setting. This is the same to how max_wal_size is > counted but I don't think max_slot_wal_keep_size will be regarded > in the same way. I might be missing something, but what I expect from this feature is to restrict how much WAL we can keep at most for replication slots. In other words, the distance between the current LSN and the minimum restart_lsn of replication slots doesn't exceed the value of max_slot_wal_keep_size. It's similar to wal_keep_segments except that this feature affects only replication slots, and wal_keep_segments cannot restrict WAL that replication slots are holding. For example, with a 16MB segment size and 50MB of max_slot_wal_keep_size, we can keep at most 50MB of WAL for replication slots. However, once we consume more than 50MB of WAL without advancing any restart_lsn, the required WAL might be lost at the next checkpoint, depending on min_wal_size. On the other hand, if we can mostly advance restart_lsn to approximately the current LSN, the size of preserved WAL for replication slots can go below 50MB. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
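For context, a hypothetical monitoring query (not part of the patch) that shows the quantity under discussion, i.e. how far the current LSN has advanced past each slot's restart_lsn:

    SELECT slot_name,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
    FROM pg_replication_slots;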
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Ugrrr! PLEASE ignore this! It's not wrong at all. On Thu, Sep 6, 2018 at 18:58, Kyotaro HORIGUCHI wrote: > - The following assertion looks contradicting to the comment. > |/* Postmaster-managed callers must handle postmaster death somehow. */ > |Assert(!IsUnderPostmaster || > | (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 || > | (wakeEvents & (WL_POSTMASTER_DEATH)) != 0); > -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
On Thu, Sep 6, 2018 at 11:08 AM Chris Travers wrote: > Ok, so here's my current patch (work in progress). I have not run tests > yet, and finding a way to add a test case is virtually impossible though I > expect we will find ways of using gdb to confirm local fix later. > > After careful code review, I settled on the following approach which > seemed to be the least intrusive. > > 1. Change the retry logic so that it does not retry of QueryCancelPending > or ProcDiePending are set. In these cases EINTR is passed back to the > caller > 2. Change the error handling logic of the caller so that EINTR is handled > by the next CHECK_FOR_INTERRUPTS() after cleanup. > > This means that the file descriptor is now already closed and that we > handle this the same way we would with a ENOSPC. The reason for this is > that there are many places in the code where it is not clear what degree of > safety is present for throwing errors using ereport, and the patch needs to > be as unobtrusive as possible to facilitate back porting. > > At this point the patch is for discussion. I have not even compiled it or > tested it but maybe there is something wrong with my approach so figured I > would send it out early. > > The major assumptions are: > 1. close() will never take longer than the interrupt interval that caused > the loop to break. > 2. close() does not get interrupted in a way that will not cause cleanup > problems later. > 3. We will reach the next interrupt check at a reasonable interval and > safe spot. > > Any initial arguments against? > The previous patch had two issues found on internal code review here. I am sending a new patch. This patch compiles, and make check passes. make check-world fails, but the errors are the same as found on master and involve ecpg. We will be doing further internal review and verification and then will run through pgindent before final submission. Changes in this patch: 1. The previous patch had a backwards check for EINTR in the calling function. This was fixed. 2. In cases where an ERROR elevel was passed in, the function would return instead of erroring out on failure. So in this case we check whether we are expected to raise an error on failure and, if so, check for interrupts. If not, we go through the normal error handling/logging path, which might result in a warning that the shared memory segment could not be allocated followed by an actual meaningful error. I could put that in an else if, if that is seen as a good idea, but I figured I would raise it as a discussion point. > > >> >> -- >> Best Regards, >> Chris Travers >> Head of Database >> >> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com >> Saarbrücker Straße 37a, 10405 Berlin >> >> > > -- > Best Regards, > Chris Travers > Head of Database > > Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com > Saarbrücker Straße 37a, 10405 Berlin > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin racecondition.patch Description: Binary data
Re: [HACKERS] Bug in to_timestamp().
On Wed, Sep 5, 2018 at 7:28 PM amul sul wrote: > On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov > wrote: >> On Wed, Sep 5, 2018 at 3:10 PM amul sul wrote: >> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov >> > wrote: >> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston >> > > wrote: >> > > > From those results the question is how important is it to force the >> > > > following breakage on our users (i.e., introduce FX exact symbol >> > > > matching): >> > > > >> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); >> > > > - to_timestamp >> > > > --- >> > > > - Sun Feb 16 00:00:00 1997 PST >> > > > -(1 row) >> > > > - >> > > > +ERROR: unexpected character "/", expected character ":" >> > > > +HINT: In FX mode, punctuation in the input string must exactly match >> > > > the format string. >> > > > >> > > > There seemed to be some implicit approvals of this breakage some 30 >> > > > emails and 10 months ago but given that this is the only change from a >> > > > correct result to a failure I'd like to officially put it out there >> > > > for opinion/vote gathering. Mine is a -1; though keeping the >> > > > distinction between space and non-alphanumeric characters is expected. >> > > >> > > Do I understand correctly that you're -1 to changes to FX mode, but no >> > > objection to changes in non-FX mode? >> > > >> > Ditto. >> >> So, if no objections for non-FX mode changes, then I'll extract that >> part and commit it separately. > > > Yeah, that make sense to me, thank you. OK! I've removed FX changes from the patch. The result is attached. I'm going to commit this if no objections. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-to-timestamp-format-checking-v18.patch Description: Binary data
Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
On Thu, Sep 6, 2018 at 1:31 PM Chris Travers wrote: > > > On Thu, Sep 6, 2018 at 11:08 AM Chris Travers > wrote: > >> Ok, so here's my current patch (work in progress). I have not run tests >> yet, and finding a way to add a test case is virtually impossible though I >> expect we will find ways of using gdb to confirm local fix later. >> >> After careful code review, I settled on the following approach which >> seemed to be the least intrusive. >> >> 1. Change the retry logic so that it does not retry of >> QueryCancelPending or ProcDiePending are set. In these cases EINTR is >> passed back to the caller >> 2. Change the error handling logic of the caller so that EINTR is >> handled by the next CHECK_FOR_INTERRUPTS() after cleanup. >> >> This means that the file descriptor is now already closed and that we >> handle this the same way we would with a ENOSPC. The reason for this is >> that there are many places in the code where it is not clear what degree of >> safety is present for throwing errors using ereport, and the patch needs to >> be as unobtrusive as possible to facilitate back porting. >> >> At this point the patch is for discussion. I have not even compiled it >> or tested it but maybe there is something wrong with my approach so figured >> I would send it out early. >> >> The major assumptions are: >> 1. close() will never take longer than the interrupt interval that >> caused the loop to break. >> 2. close() does not get interrupted in a way that will not cause cleanup >> problems later. >> 3. We will reach the next interrupt check at a reasonable interval and >> safe spot. >> >> Any initial arguments against? >> > > The previous patch had two issues found on internal code review here. I > am sending a new patch. > > This patch compiles. make check passes > make check-world fails but the errors are the same as found on master and > involve ecpg. > > We will be doing further internal review and verification and then will > run through pg_indent before final submission. > > Changes in this patch: > 1. Previous patch had backwards check for EINTR in calling function. > This was fixed. > 2. In cases where ERROR elevel was passed in, function would return > instead of error out on case of error. > > So in this case we check if we expect to error on error and if so, check > for interrupts. If not, we go through the normal error handling/logging > path which might result in an warning that shared memory segment could not > be allocated followed by an actual meaningful error. I could put that in > an else if, if that is seen as a good idea, but figured I would raise it as > a discussion point. > Attaching correct patch > > >> >> -- >> Best Regards, >> Chris Travers >> Head of Database >> >> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com >> Saarbrücker Straße 37a, 10405 Berlin >> >> > > -- > Best Regards, > Chris Travers > Head of Database > > Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com > Saarbrücker Straße 37a, 10405 Berlin > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin racecondition.patch Description: Binary data
Re: Optimze usage of immutable functions as relation
On Tue, 10 Jul 2018 17:34:20 -0400 Tom Lane wrote: >Heikki Linnakangas writes: >> But stepping back a bit, it's a bit weird that we're handling this >> differently from VALUES and other subqueries. The planner knows how >> to do this trick for simple subqueries: > >> postgres=# explain select * from tenk1, (select abs(100)) as a (a) >> where unique1 < a; >> QUERY PLAN >> --- >> Seq Scan on tenk1 (cost=0.00..483.00 rows=100 width=248) >> Filter: (unique1 < 100) >> (2 rows) > >> Note that it not only evaluated the function into a constant, but >> also got rid of the join. For a function RTE, however, it can't do >> that: > >> postgres=# explain select * from tenk1, abs(100) as a (a) where >> unique1 < a; QUERY PLAN >> --- >> Nested Loop (cost=0.00..583.01 rows= width=248) >> Join Filter: (tenk1.unique1 < a.a) >> -> Function Scan on a (cost=0.00..0.01 rows=1 width=4) >> -> Seq Scan on tenk1 (cost=0.00..458.00 rows=1 width=244) >> (4 rows) > >> Could we handle this in pull_up_subqueries(), similar to the >> RTE_SUBQUERY and RTE_VALUES cases? > >Perhaps. You could only do it for non-set-returning functions, which >isn't the typical use of function RTEs, which is probably why we've not >thought hard about it before. I'm not sure what would need to happen >for lateral functions. Also to be considered, if it's not foldable to >a constant, is whether we're risking evaluating it more times than >before. > > regards, tom lane I reworked the patch and implemented processing of FuncScan in pull_up_subqueries() in a way similar to VALUES processing. In order to prevent folding of non-foldable functions it checks provolatile of the function and are arguments const or not and return type to prevent folding of SRF. -- Aleksandr Parfenov Postgres Professional: http://www.postgrespro.com Russian Postgres Company diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index c3f46a26c3..25539bbfae 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -23,7 +23,9 @@ */ #include "postgres.h" +#include "access/htup_details.h" #include "catalog/pg_type.h" +#include "catalog/pg_proc.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" @@ -35,6 +37,8 @@ #include "parser/parse_relation.h" #include "parser/parsetree.h" #include "rewrite/rewriteManip.h" +#include "utils/syscache.h" +#include "utils/lsyscache.h" typedef struct pullup_replace_vars_context @@ -86,6 +90,8 @@ static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte, bool deletion_ok); static Node *pull_up_simple_values(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte); +static Node *pull_up_simple_function(PlannerInfo *root, Node *jtnode, + RangeTblEntry *rte); static bool is_simple_values(PlannerInfo *root, RangeTblEntry *rte, bool deletion_ok); static bool is_simple_union_all(Query *subquery); @@ -595,6 +601,54 @@ inline_set_returning_functions(PlannerInfo *root) } } +static bool +is_simple_stable_function(RangeTblEntry *rte) +{ + Form_pg_type type_form; + RangeTblFunction *tblFunction = linitial_node(RangeTblFunction, rte->functions); + FuncExpr *expr = (FuncExpr *) tblFunction->funcexpr; + HeapTuple tuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(expr->funcresulttype)); + bool result = false; + + if (!HeapTupleIsValid(tuple)) + elog(ERROR, "cache lookup failed for type %u", expr->funcresulttype); + + type_form = (Form_pg_type) GETSTRUCT(tuple); + + if (type_form->typtype == TYPTYPE_BASE && + 
!type_is_array(expr->funcresulttype)) + { + Form_pg_proc func_form; + ListCell *arg; + bool has_nonconst_input = false; + bool has_null_input = false; + + ReleaseSysCache(tuple); + tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(expr->funcid)); + if (!HeapTupleIsValid(tuple)) + elog(ERROR, "cache lookup failed for function %u", expr->funcid); + func_form = (Form_pg_proc) GETSTRUCT(tuple); + + foreach(arg, expr->args) + { + if (IsA(lfirst(arg), Const)) +has_null_input |= ((Const *) lfirst(arg))->constisnull; + else +has_nonconst_input = true; + } + + result = func_form->prorettype != RECORDOID && + func_form->prokind == PROKIND_FUNCTION && + !func_form->proretset && + func_form->provolatile == PROVOLATILE_IMMUTABLE && + !has_null_input && + !has_nonconst_input; + } + + ReleaseSysCache(tuple); + return result; +} + /* * pull_up_subqueries * Look for subqueries in the rangetable that can be pulled up into @@ -725,6 +779,11 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode, is_simple_values(root, rte, deletion_ok)) return pull_up_simple_values(root, jtnode, rte); + if (rte->rtekind
Re: remove ancient pre-dlopen dynloader code
On 06/09/2018 10:16, Peter Eisentraut wrote: > The v3 patch attached here was made with git format-patch --no-renames. > Let's see how that works out. That worked, and the patch has been committed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[no subject]
Why can those shared catalog tables not take locks via the fast path if they are eligible? User tables and local catalog tables can take locks via the fast path.
Re: [HACKERS] Bug in to_timestamp().
On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov wrote: > > On Wed, Sep 5, 2018 at 7:28 PM amul sul wrote: > > On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov > > wrote: > >> On Wed, Sep 5, 2018 at 3:10 PM amul sul wrote: > >> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov > >> > wrote: > >> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston > >> > > wrote: > >> > > > From those results the question is how important is it to force the > >> > > > following breakage on our users (i.e., introduce FX exact symbol > >> > > > matching): > >> > > > > >> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD'); > >> > > > - to_timestamp > >> > > > --- > >> > > > - Sun Feb 16 00:00:00 1997 PST > >> > > > -(1 row) > >> > > > - > >> > > > +ERROR: unexpected character "/", expected character ":" > >> > > > +HINT: In FX mode, punctuation in the input string must exactly > >> > > > match the format string. > >> > > > > >> > > > There seemed to be some implicit approvals of this breakage some 30 > >> > > > emails and 10 months ago but given that this is the only change from > >> > > > a correct result to a failure I'd like to officially put it out > >> > > > there for opinion/vote gathering. Mine is a -1; though keeping the > >> > > > distinction between space and non-alphanumeric characters is > >> > > > expected. > >> > > > >> > > Do I understand correctly that you're -1 to changes to FX mode, but no > >> > > objection to changes in non-FX mode? > >> > > > >> > Ditto. > >> > >> So, if no objections for non-FX mode changes, then I'll extract that > >> part and commit it separately. > > > > > > Yeah, that make sense to me, thank you. > > OK! I've removed FX changes from the patch. The result is attached. > I'm going to commit this if no objections. Attached revision fixes usage of two subsequent spaces in the documentation. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company 0001-to-timestamp-format-checking-v19.patch Description: Binary data
Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data
On Tue, Jul 3, 2018 at 1:11 AM, Carter Thaxton wrote: > The whole reason for the colon in the --where option is to indicate which > table the WHERE clause should refer to, so that one can dump less than all > of the rows. > The --table option is totally different. It specifies which tables to > dump at all. > Thank you for explaining; I just have one comment. I found the error message that pg_dump generates for an incorrect WHERE clause specification strange. I think a check of the query result status is needed to handle it and generate a friendlier error message. regards Surafel
Re: Funny hang on PostgreSQL 10 during parallel index scan on slave
As a status note, the above patch has not been run through pgindent, and while I have run make check-world on Linux (passed) and on a non-Linux system (which failed both with and without my patch), I will be making one more small revision before final submission unless anyone thinks I need to change approaches. Currently, all code paths that hit the resize call the relevant functions with an ERROR elevel. So running CHECK_FOR_INTERRUPTS after the initial cleanup is currently safe, but the surrounding function doesn't guarantee safety, so I don't think it is a good idea to raise exceptions there. In the above patch I test whether the level is ERROR and check for interrupts when that is true. However, as a colleague pointed out to me, if anyone ever calls this with FATAL or PANIC as the desired error level, that would not be safe, so I will be checking whether the level is equal to or greater than ERROR. We will also go through a manual test using a debugger to try to test the behavior in these cases and make sure the patch actually resolves our problem. Assuming no objections here and that the manual test works, you can expect a formal submission in the next couple of days. > > -- > Best Regards, > Chris Travers > Head of Database > > Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com > Saarbrücker Straße 37a, 10405 Berlin > > -- Best Regards, Chris Travers Head of Database Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com Saarbrücker Straße 37a, 10405 Berlin
Re: pgsql: Allow extensions to install built as well as unbuilt headers.
Andrew Gierth writes: > "Michael" == Michael Paquier writes: > Michael> prairiedog is unhappy with this commit: > What version of GNU Make is on there, do you know? Tom? I don't see it > mentioned in the output anywhere. $ make -v GNU Make 3.80 Copyright (C) 2002 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. regards, tom lane
Re: pgsql: Allow extensions to install built as well as unbuilt headers.
Andrew Gierth writes: > Consulting old manuals suggests it may have version 3.80 (c. 2002), > which lacks the 'else if..' syntax introduced in 3.81 (c. 2006). Right; I haven't updated it because that's specifically called out to be the minimum supported version in our docs (cf installation.sgml). regards, tom lane
Re: pread() and pwrite()
Hi, On 09/05/2018 02:42 PM, Jesper Pedersen wrote: On 07/26/2018 10:04 PM, Thomas Munro wrote: Done. Rebased. This needs a rebase again. Would it be of benefit to update these call sites * slru.c - SlruPhysicalReadPage - SlruPhysicalWritePage * xlogutils.c - XLogRead * pg_receivewal.c - FindStreamingStart * rewriteheap.c - heap_xlog_logical_rewrite * walreceiver.c - XLogWalRcvWrite too? Best regards, Jesper
Re: remove ancient pre-dlopen dynloader code
Peter Eisentraut writes: > On 06/09/2018 10:16, Peter Eisentraut wrote: >> The v3 patch attached here was made with git format-patch --no-renames. >> Let's see how that works out. > That worked, and the patch has been committed. Sure enough, gaur's not happy. I'll take a look in a bit. regards, tom lane
Re: remove ancient pre-dlopen dynloader code
On Thu, Sep 6, 2018 at 1:16 AM Peter Eisentraut wrote: > I'm going to use this thread for a moment to work out some details with > the cfbot. > > The v2 patch I sent previously was created using git format-patch with > default settings. This detected a rename: > > rename src/{backend/port/dynloader/win32.c => port/dlopen.c} (51%) > > which is fair enough. However, whatever method the cfbot uses to apply > patches fails to handle that. Interesting. Its version of "patch" doesn't understand that. I am not sure if other versions of patch do. Currently cfbot doesn't use git am because not everyone is posting patches created with format-patch, and I thought good old patch could handle basically anything. I wasn't aware of this quirk. I'll see if there is some way I can convince patch to respect renames, or I should try to apply with git first and then fall back to patch only if that fails. -- Thomas Munro http://www.enterprisedb.com
Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data
On Mon, May 21, 2018 at 6:34 AM Carter Thaxton wrote: > Many times I've wanted to export a subset of a database, using some sort > of row filter condition on some of the large tables. E.g. copying a > production database to a staging environment, but with some time series > data only from the past month. > > We have the existing options: > --include-table=table(and its -t synonym) > --exclude-table=table > --exclude-table-data=table > > I propose a new option: > --include-table-data-where=table:filter_clause > > One would use this option as follows: > > pg_dump --include-table-data-where=largetable:"created_at >= > '2018-05-01'" database_name > > The filter_clause is used as the contents of a WHERE clause when querying > the data to generate the COPY statement produced by pg_dump. > > I've prepared a proposed patch for this, which is attached. The code > changes are rather straightforward. I did have to add the ability to carry > around an extra pointer-sized object to the simple_list implementation, in > order to allow the filter clause to be associated to the matching oids of > the table pattern. It seemed the best way to augment the existing > simple_list implementation, but change as little as possible elsewhere in > the codebase. (Note that SimpleOidList is actually only used by pg_dump). > > Feel free to review and propose any amendments. > > Why not simply use \copy (select * from largetable where created_at >= '2018-05-01') to stdout? That is what I’ve always done when I need something like this and have not found it particularly bothersome but rather quite powerful. And here you have tons of flexibility because you can do joins and whatever else. FWIW. Thanks, Jeremy
Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data
On Thu, Sep 6, 2018 at 8:40 AM, Jeremy Finzel wrote: > Why not simply use \copy (select * from largetable where created_at >= >> '2018-05-01') to stdout? That is what I’ve always done when I need >> something like this and have not found it particularly bothersome but >> rather quite powerful. And here you have tons of flexibility because you >> can do joins and whatever else. >> > Just skimming the thread, but I'd have to say that being able to leverage pg_dump's dependency resolution is a major reason for adding features to it instead of sticking to writing psql scripts. This feature in a multi-tenant situation is something with, I suspect, reasonably wide appeal. David J.
Re: Add --include-table-data-where option to pg_dump, to export only a subset of table data
> Just skimming the thread but I'd have to say being able to leverage > pg_dump's dependency resolution is a major reason for adding features to it > instead sticking to writing psql scripts. This feature in a multi-tenant > situation is something with, I suspect, reasonably wide appeal. > I would agree with that if it's in fact true that people want it, but that wasn't how the problem being solved was communicated. From what I read in the initial examples given, just using psql is more than sufficient in those cases. I will grant that copying the structure and data at the same time would be much easier, however. Because with psql, you need pg_dump to create your structure and then a separate psql script to copy the data. But again, with the --data-only examples given, it's so easy to do that with psql's copy that I just don't understand the value of the feature unless you really are saying you require the dependency resolution. I agree with some of the hesitation about complicating the syntax and allowing too much customization beyond what pg_dump is designed for. Really, if you need more customization, copy gives you that. So I don't personally consider it a missing feature, because both tools have different uses and I haven't found that any of this disrupts my workflow. FWIW... Thanks, Jeremy
Re: pgsql: Allow extensions to install built as well as unbuilt headers.
> "Tom" == Tom Lane writes: > Andrew Gierth writes: >> Consulting old manuals suggests it may have version 3.80 (c. 2002), >> which lacks the 'else if..' syntax introduced in 3.81 (c. 2006). Tom> Right; I haven't updated it because that's specifically called out to Tom> be the minimum supported version in our docs (cf installation.sgml). Yeah; I checked the other constructs I used for what version they were added in, but that one slipped through. Fix coming shortly. -- Andrew (irc:RhodiumToad)
Re: test_pg_dump missing cleanup actions
Hi Stephen, On Tue, Sep 04, 2018 at 04:14:15PM -0700, Michael Paquier wrote: > On Tue, Sep 04, 2018 at 06:02:51PM -0400, Stephen Frost wrote: >> * Tom Lane (t...@sss.pgh.pa.us) wrote: >>> I'm confused. Isn't the point of that script exactly to create a modified >>> extension for testing pg_dump with? > > Not really, the regression tests run for pg_regress and the TAP test > suite are two completely isolated things and share no dependencies. > e54f757 has actually changed test_pg_dump.sql so as it adds regression > tests for pg_init_privs for ALTER EXTENSION ADD/DROP, and visibly those > have been added to test_pg_dump because they were easier to add there, > and this has no interactions with pg_dump. What I think should have > been done initially is to add those new tests in test_extensions > instead. I am now able to come back to this thread, and I still cannot find by grepping where sql/test_pg_dump.sql is called from. Stephen, if the test suite is aiming at tracking that pg_init_privs is correctly set up with ALTER EXTENSION ADD/DROP, shouldn't it also query the catalog to make sure that the correct entries are added and removed after running said command? At least that's one way I could see to perform the necessary sanity checks without running pg_dump directly. Perhaps I am missing something? >>> What I'd do is leave the final state as-is and add a "drop role if exists" >>> at the start, similar to what some of the core regression tests do. >> >> I've not followed this thread but based on Tom's response, I agree with >> his suggestion of what to do here. > > Not as far as I can see.. Please note that using installcheck on the > main regression test suite does not leave around any extra roles. I can > understand the role of having a DROP ROLE IF EXISTS though: if you get a > crash while testing, then the beginning of the tests are repeatable, so > independently of the root issue Tom's suggestion makes sense to me. Attached is a patch with more comments about the intent of the test suite, and with the separate issue pointed out by Tom fixed. It seems to me that actually checking the contents of pg_init_privs would better serve the purpose for which the test exists. I would welcome input on this last point. -- Michael diff --git a/src/test/modules/test_pg_dump/expected/test_pg_dump.out b/src/test/modules/test_pg_dump/expected/test_pg_dump.out index c9bc6f7625..ad730f18fd 100644 --- a/src/test/modules/test_pg_dump/expected/test_pg_dump.out +++ b/src/test/modules/test_pg_dump/expected/test_pg_dump.out @@ -1,3 +1,11 @@ +-- +-- Set of regression tests to test that ALTER EXTENSION ADD/DROP +-- handles correctly pg_init_privs. 
+-- +-- Clean up in case a prior regression run failed +SET client_min_messages TO 'warning'; +DROP ROLE IF EXISTS regress_dump_test_role; +RESET client_min_messages; CREATE ROLE regress_dump_test_role; CREATE EXTENSION test_pg_dump; ALTER EXTENSION test_pg_dump ADD DATABASE postgres; -- error @@ -92,3 +100,12 @@ ALTER EXTENSION test_pg_dump DROP SERVER s0; ALTER EXTENSION test_pg_dump DROP TABLE test_pg_dump_t1; ALTER EXTENSION test_pg_dump DROP TYPE test_pg_dump_e1; ALTER EXTENSION test_pg_dump DROP VIEW test_pg_dump_v1; +-- Clean up +DROP EXTENSION test_pg_dump; +DROP SCHEMA test_pg_dump_s1; +DROP MATERIALIZED VIEW test_pg_dump_mv1; +DROP FOREIGN TABLE ft1; +DROP SERVER s0; +DROP TYPE test_pg_dump_e1; +DROP FUNCTION test_pg_dump(integer); +DROP ROLE regress_dump_test_role; diff --git a/src/test/modules/test_pg_dump/sql/test_pg_dump.sql b/src/test/modules/test_pg_dump/sql/test_pg_dump.sql index e463dec404..1becc2b609 100644 --- a/src/test/modules/test_pg_dump/sql/test_pg_dump.sql +++ b/src/test/modules/test_pg_dump/sql/test_pg_dump.sql @@ -1,3 +1,13 @@ +-- +-- Set of regression tests to test that ALTER EXTENSION ADD/DROP +-- handles correctly pg_init_privs. +-- + +-- Clean up in case a prior regression run failed +SET client_min_messages TO 'warning'; +DROP ROLE IF EXISTS regress_dump_test_role; +RESET client_min_messages; + CREATE ROLE regress_dump_test_role; CREATE EXTENSION test_pg_dump; @@ -105,3 +115,13 @@ ALTER EXTENSION test_pg_dump DROP SERVER s0; ALTER EXTENSION test_pg_dump DROP TABLE test_pg_dump_t1; ALTER EXTENSION test_pg_dump DROP TYPE test_pg_dump_e1; ALTER EXTENSION test_pg_dump DROP VIEW test_pg_dump_v1; + +-- Clean up +DROP EXTENSION test_pg_dump; +DROP SCHEMA test_pg_dump_s1; +DROP MATERIALIZED VIEW test_pg_dump_mv1; +DROP FOREIGN TABLE ft1; +DROP SERVER s0; +DROP TYPE test_pg_dump_e1; +DROP FUNCTION test_pg_dump(integer); +DROP ROLE regress_dump_test_role; signature.asc Description: PGP signature
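As one possible form of the sanity check suggested above (a sketch only, not part of the attached patch), the test could inspect pg_init_privs directly around the ALTER EXTENSION ADD/DROP commands, for example:

    SELECT classoid::regclass AS catalog, objoid, privtype, initprivs
    FROM pg_init_privs
    WHERE objoid = 'test_pg_dump_t1'::regclass;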
Re: *_to_xml() should copy SPI_processed/SPI_tuptable
I wrote: > Chapman Flack writes: >> Another alternative might be to have SPI_connect save them and >> SPI_finish put them back, which leaves you just responsible for >> reasoning about your own code. You'd still be expected to save them >> across your own uses of other SPI calls, but no longer exposed to >> spooky action at a distance from nested uses of SPI in stuff you call. > Hmm. I'd thought about that briefly and concluded that it didn't offer > a full fix, but on reflection it's not clear why it'd be any less of > a fix than the macroized-SPI_tuptable approach. You end up with > per-SPI-stack-level storage either way, and while that isn't perfect > it does go a long way, as you say. More, this has the huge advantage > of being back-patchable, because there'd be no API/ABI change. Here's a proposed patch along that line. I included SPI_result (SPI's equivalent of errno) in the set of saved-and-restored variables, so that all the exposed SPI globals behave the same. Also, in further service of providing insulation between SPI nesting levels, I thought it'd be a good idea for SPI_connect() to reset the globals to zero/NULL after saving them. This ensures that a nesting level can't accidentally be affected by the state of an outer level. It's barely possible that there's somebody out there who's *intentionally* accessing state from a calling SPI level, but that seems like it'd be pretty dangerous programming practice. Still, maybe there's an argument for omitting that hunk in released branches. Comments? regards, tom lane diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index 5756365..11ca800 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -36,10 +36,16 @@ #include "utils/typcache.h" +/* + * These global variables are part of the API for various SPI functions + * (a horrible API choice, but it's too late now). To reduce the risk of + * interference between different SPI callers, we save and restore them + * when entering/exiting a SPI nesting level. + */ uint64 SPI_processed = 0; Oid SPI_lastoid = InvalidOid; SPITupleTable *SPI_tuptable = NULL; -int SPI_result; +int SPI_result = 0; static _SPI_connection *_SPI_stack = NULL; static _SPI_connection *_SPI_current = NULL; @@ -132,6 +138,10 @@ SPI_connect_ext(int options) _SPI_current->queryEnv = NULL; _SPI_current->atomic = (options & SPI_OPT_NONATOMIC ? false : true); _SPI_current->internal_xact = false; + _SPI_current->outer_processed = SPI_processed; + _SPI_current->outer_lastoid = SPI_lastoid; + _SPI_current->outer_tuptable = SPI_tuptable; + _SPI_current->outer_result = SPI_result; /* * Create memory contexts for this procedure @@ -154,6 +164,15 @@ SPI_connect_ext(int options) /* ... and switch to procedure's context */ _SPI_current->savedcxt = MemoryContextSwitchTo(_SPI_current->procCxt); + /* + * Reset API global variables so that current caller cannot accidentally + * depend on state of an outer caller. 
+ */ + SPI_processed = 0; + SPI_lastoid = InvalidOid; + SPI_tuptable = NULL; + SPI_result = 0; + return SPI_OK_CONNECT; } @@ -176,12 +195,13 @@ SPI_finish(void) _SPI_current->procCxt = NULL; /* - * Reset result variables, especially SPI_tuptable which is probably + * Restore outer API variables, especially SPI_tuptable which is probably * pointing at a just-deleted tuptable */ - SPI_processed = 0; - SPI_lastoid = InvalidOid; - SPI_tuptable = NULL; + SPI_processed = _SPI_current->outer_processed; + SPI_lastoid = _SPI_current->outer_lastoid; + SPI_tuptable = _SPI_current->outer_tuptable; + SPI_result = _SPI_current->outer_result; /* Exit stack level */ _SPI_connected--; @@ -274,9 +294,11 @@ SPICleanup(void) { _SPI_current = NULL; _SPI_connected = -1; + /* Reset API global variables, too */ SPI_processed = 0; SPI_lastoid = InvalidOid; SPI_tuptable = NULL; + SPI_result = 0; } /* @@ -336,18 +358,20 @@ AtEOSubXact_SPI(bool isCommit, SubTransactionId mySubid) } /* - * Pop the stack entry and reset global variables. Unlike + * Restore outer global variables and pop the stack entry. Unlike * SPI_finish(), we don't risk switching to memory contexts that might * be already gone. */ + SPI_processed = connection->outer_processed; + SPI_lastoid = connection->outer_lastoid; + SPI_tuptable = connection->outer_tuptable; + SPI_result = connection->outer_result; + _SPI_connected--; if (_SPI_connected < 0) _SPI_current = NULL; else _SPI_current = &(_SPI_stack[_SPI_connected]); - SPI_processed = 0; - SPI_lastoid = InvalidOid; - SPI_tuptable = NULL; } if (found && isCommit) diff --git a/src/include/executor/spi_priv.h b/src/include/executor/spi_priv.h index 401fd99..0da3a41 100644 --- a/src/include/executor/spi_priv.h +++ b/src/include/executor/spi_priv.h @@ -42,6 +42,12 @@ typedef struct * transactions */ bool internal_xact; /* SPI-managed t
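To illustrate the hazard being addressed (a hypothetical example with a made-up table name, not code from the patch): before this change, a callee's use of SPI clobbered the caller's SPI_processed/SPI_tuptable, because SPI_finish() simply reset them.

    #include "postgres.h"
    #include "executor/spi.h"

    static void
    callee_uses_spi(void)
    {
        SPI_connect();
        SPI_execute("SELECT 1", true, 0);
        SPI_finish();       /* previously reset SPI_processed/SPI_tuptable */
    }

    static void
    caller_uses_spi(void)
    {
        SPI_connect();
        SPI_execute("SELECT * FROM some_table", true, 0);
        callee_uses_spi();

        /* Without save/restore this no longer reflects the caller's query;
         * with the patch it does. */
        elog(LOG, "caller sees SPI_processed = " UINT64_FORMAT, SPI_processed);

        SPI_finish();
    }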
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, passed Spec compliant: not tested Documentation:not tested A fairly common planning problem for us is what we call "most recent first" queries; i.e., "the 50 most recent rows for a ". Here's a basic setup: -- created_at has very high cardinality create table foo(pk serial primary key, owner_fk integer, created_at timestamp); create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at); -- technically this data guarantees unique created_at values, -- but there's no reason it couldn't be modified to have a few -- random non-unique values to prove the point insert into foo(owner_fk, created_at) select i % 100, now() - (i::text || ' minutes')::interval from generate_series(1, 100) t(i); And here's the naive query to get the results we want: select * from foo where owner_fk = 23 -- pk is only here to disambiguate/guarantee a stable sort -- in the rare case that there are collisions in the other -- sort field(s) order by created_at desc, pk desc limit 50; On stock Postgres this ends up being pretty terrible for cases where the fk filter represents a large number of rows, because the planner generates a sort node under the limit node and therefore fetches all matches, sorts them, and then applies the limit. Here's the plan: Limit (cost=61386.12..61391.95 rows=50 width=16) (actual time=187.814..191.653 rows=50 loops=1) -> Gather Merge (cost=61386.12..70979.59 rows=82224 width=16) (actual time=187.813..191.647 rows=50 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=60386.10..60488.88 rows=41112 width=16) (actual time=185.639..185.642 rows=42 loops=3) Sort Key: created_at DESC, pk DESC Sort Method: top-N heapsort Memory: 27kB Worker 0: Sort Method: top-N heapsort Memory: 27kB Worker 1: Sort Method: top-N heapsort Memory: 27kB -> Parallel Bitmap Heap Scan on foo (cost=3345.24..59020.38 rows=41112 width=16) (actual time=25.150..181.804 rows=3 loops=3) Recheck Cond: (owner_fk = 23) Heap Blocks: exact=18014 -> Bitmap Index Scan on idx_foo_on_owner_and_created_at (cost=0.00..3320.57 rows=98668 width=0) (actual time=16.992..16.992 rows=10 loops=1) Index Cond: (owner_fk = 23) Planning Time: 0.384 ms Execution Time: 191.704 ms I have a recursive CTE that implements the algorithm: - Find first n+1 results - If result at n+1’s created_at value differs from the n’s value, return first n values. - If those equal, gather more results until a new created_at value is encountered. - Sort all results by created_at and a tie-breaker (e.g., pk) and return the first n values. But nobody wants to use/write that normally (it's quite complex). 
This patch solves the problem presented; here's the plan: Limit (cost=2.70..2.76 rows=50 width=16) (actual time=0.233..0.367 rows=50 loops=1) -> Incremental Sort (cost=2.70..111.72 rows=98668 width=16) (actual time=0.232..0.362 rows=50 loops=1) Sort Key: created_at DESC, pk DESC Presorted Key: created_at Sort Method: quicksort Memory: 26kB Sort Groups: 2 -> Index Scan Backward using idx_foo_on_owner_and_created_at on foo (cost=0.56..210640.79 rows=98668 width=16) (actual time=0.054..0.299 rows=65 loops=1) Index Cond: (owner_fk = 23) Planning Time: 0.428 ms Execution Time: 0.393 ms While check world fails, the only failure appears to be a plan output change in test/isolation/expected/drop-index-concurrently-1.out that just needs to be updated (incremental sort is now used in this plan); I don't see any functionality breakage.
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
James Coleman writes: > A fairly common planning problem for us is what we call "most recent first" > queries; i.e., "the 50 most recent rows for a ". > Here's a basic setup: > -- created_at has very high cardinality > create table foo(pk serial primary key, owner_fk integer, created_at > timestamp); > create index idx_foo_on_owner_and_created_at on foo(owner_fk, created_at); > -- technically this data guarantees unique created_at values, > -- but there's no reason it couldn't be modified to have a few > -- random non-unique values to prove the point > insert into foo(owner_fk, created_at) > select i % 100, now() - (i::text || ' minutes')::interval > from generate_series(1, 100) t(i); > And here's the naive query to get the results we want: > select * > from foo > where owner_fk = 23 > -- pk is only here to disambiguate/guarantee a stable sort > -- in the rare case that there are collisions in the other > -- sort field(s) > order by created_at desc, pk desc > limit 50; If you're concerned about the performance of this case, why don't you make an index that actually matches the query? regression=# create index on foo (owner_fk, created_at, pk); CREATE INDEX regression=# explain analyze select * from foo where owner_fk = 23 order by created_at desc, pk desc limit 50; QUERY PLAN - Limit (cost=0.42..110.92 rows=50 width=16) (actual time=0.151..0.280 rows=50 loops=1) -> Index Only Scan Backward using foo_owner_fk_created_at_pk_idx on foo (cost=0.42..20110.94 rows=9100 width=16) (actual time=0.146..0.255 rows=50 loops=1) Index Cond: (owner_fk = 23) Heap Fetches: 50 Planning Time: 0.290 ms Execution Time: 0.361 ms (6 rows) There may be use-cases for Alexander's patch, but I don't find this one to be terribly convincing. regards, tom lane
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
I disagree because it's not ideal to basically have to append pk to every index in the system just to get the ability to tie-break when there's actually very low likelihood of ties anyway. A similar use case is trying to batch through a result set, in which case you need a stable sort as well. The benefit is retaining the generality of indexes (and saving space in them etc.) while still allowing using them for more specific sorts. Any time you paginate or batch this way you benefit from this, which in many applications applies to a very high percentage of indexes. On Thu, Sep 6, 2018 at 10:39 AM Tom Lane wrote: > James Coleman writes: > > A fairly common planning problem for us is what we call "most recent > first" queries; i.e., "the 50 most recent rows for a ". > > > Here's a basic setup: > > > -- created_at has very high cardinality > > create table foo(pk serial primary key, owner_fk integer, created_at > timestamp); > > create index idx_foo_on_owner_and_created_at on foo(owner_fk, > created_at); > > > -- technically this data guarantees unique created_at values, > > -- but there's no reason it couldn't be modified to have a few > > -- random non-unique values to prove the point > > insert into foo(owner_fk, created_at) > > select i % 100, now() - (i::text || ' minutes')::interval > > from generate_series(1, 100) t(i); > > > And here's the naive query to get the results we want: > > > select * > > from foo > > where owner_fk = 23 > > -- pk is only here to disambiguate/guarantee a stable sort > > -- in the rare case that there are collisions in the other > > -- sort field(s) > > order by created_at desc, pk desc > > limit 50; > > If you're concerned about the performance of this case, why don't you make > an index that actually matches the query? > > regression=# create index on foo (owner_fk, created_at, pk); > CREATE INDEX > regression=# explain analyze select * from foo where owner_fk = 23 order > by created_at desc, pk desc limit 50; > > QUERY PLAN > > > - > Limit (cost=0.42..110.92 rows=50 width=16) (actual time=0.151..0.280 > rows=50 loops=1) >-> Index Only Scan Backward using foo_owner_fk_created_at_pk_idx on > foo (cost=0.42..20110.94 rows=9100 width=16) (actual time=0.146..0.255 > rows=50 loops=1) > Index Cond: (owner_fk = 23) > Heap Fetches: 50 > Planning Time: 0.290 ms > Execution Time: 0.361 ms > (6 rows) > > There may be use-cases for Alexander's patch, but I don't find this > one to be terribly convincing. > > regards, tom lane >
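To make the batching/pagination point concrete, here is a hedged SQL sketch against the example foo table from upthread; the literal timestamp and pk value are placeholders standing in for the last row of the previous batch:

-- First batch: same ordering as the original query.
select *
from foo
where owner_fk = 23
order by created_at desc, pk desc
limit 50;

-- Next batch: resume strictly after the last row already seen.  The row
-- comparison uses the same (created_at, pk) ordering, so ties on
-- created_at are broken deterministically by pk.
select *
from foo
where owner_fk = 23
  and (created_at, pk) < ('2018-09-06 10:00:00'::timestamp, 123456)
order by created_at desc, pk desc
limit 50;

The stable tie-break is what makes the batching safe to resume; the question in this subthread is whether every such index needs pk appended to it, or whether incremental sort lets a plain (owner_fk, created_at) index serve.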
Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace
On Wed, Sep 05, 2018 at 12:14:41AM -0700, Jimmy Yih wrote: > Attached the isolation spec file. I originally was only going to fix the > simple CREATE TYPE scenario but decided to look up other objects that can > reside in namespaces and ended up finding a handful of others. I tested > each one manually before and after adding the AccessShareLock acquire on > the schema. (You should avoid top-posting, this is not the style of the lists.) Thanks. One problem I have with what you have here is that you only test the locking path, as the session dropping the schema would just block on the first concurrent object it finds. If you don't mind, I am just stealing it and extending it a bit ;) -- Michael
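For readers following along, the simplest shape of the race under discussion is roughly the two-session SQL sequence below; the schema and type names are the ones from the report, and the behavior in the comments reflects what the thread describes rather than something re-verified here:

-- Session 1
CREATE SCHEMA testschema;
BEGIN;
CREATE TYPE testschema.testtype;   -- shell type; no lock is taken on the schema

-- Session 2, while session 1 is still in its transaction
DROP SCHEMA testschema CASCADE;    -- unpatched: not blocked by session 1;
                                   -- with the proposed AccessShareLock, this
                                   -- waits until session 1 commits or aborts

-- Session 1
COMMIT;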
Re: Collation versioning
On 05/09/2018 23:18, Thomas Munro wrote: > On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg wrote: >>> So, it's not ideal but perhaps worth considering on the grounds that >>> it's better than nothing? >> >> Ack. > > Ok, here's a little patch like that. > > postgres=# select collname, collcollate, collversion from pg_collation > where collname = 'en_NZ'; > collname | collcollate | collversion > --+-+- > en_NZ| en_NZ.utf8 | 2.24 > (1 row) But wouldn't that also have the effect that glibc updates that don't change anything about the locales would trigger the version incompatibility warning? Also, note that this mechanism only applies to collation objects, not to database-global locales. So most users wouldn't be helped by this approach. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Allow parallelism for deferrable serializable txns
Hi, The code currently says: * We can't use parallelism in serializable mode because the predicate * locking code is not parallel-aware. It's not catastrophic if someone * tries to run a parallel plan in serializable mode; it just won't get * any workers and will run serially. But it seems like a good heuristic * to assume that the same serialization level will be in effect at plan * time and execution time, so don't generate a parallel plan if we're in * serializable mode. */ if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 && IsUnderPostmaster && parse->commandType == CMD_SELECT && !parse->hasModifyingCTE && max_parallel_workers_per_gather > 0 && !IsParallelWorker() && !IsolationIsSerializable()) { /* all the cheap tests pass, so scan the query tree */ glob->maxParallelHazard = max_parallel_hazard(parse); glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE); } afaict for deferrable READ ONLY DEFERRABLE transactions we could trivially allow parallelism? Am I missing something? Greetings, Andres Freund
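For concreteness, the relaxation being asked about might look like the sketch below (not a tested patch): keep the existing cheap tests, but let a serializable transaction through when it is READ ONLY DEFERRABLE, on the assumption that such a transaction runs on a safe snapshot and never takes predicate locks. Whether testing XactReadOnly && XactDeferrable alone is sufficient, for example whether the safe snapshot has actually been obtained by plan time, is exactly the open question.

	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
		IsUnderPostmaster &&
		parse->commandType == CMD_SELECT &&
		!parse->hasModifyingCTE &&
		max_parallel_workers_per_gather > 0 &&
		!IsParallelWorker() &&
		(!IsolationIsSerializable() ||
		 (XactReadOnly && XactDeferrable)))		/* sketch: allow SERIALIZABLE
												 * READ ONLY DEFERRABLE */
	{
		/* all the cheap tests pass, so scan the query tree */
		glob->maxParallelHazard = max_parallel_hazard(parse);
		glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
	}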
Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace
On Wed, Sep 05, 2018 at 09:23:37AM -0700, Andres Freund wrote: > Well, we kinda do, during some of their own DDL. CF > AcquireDeletionLock(), RangeVarGetAndCheckCreationNamespace(), and other > LockDatabaseObject() callers. The > RangeVarGetAndCheckCreationNamespace() even locks the schema an object > is created in , which is pretty much what we're discussing here. > > I think the problem with the current logic is more that the > findDependentObjects() scan doesn't use a "dirty" scan, so it doesn't > ever get to seeing conflicting operations. Well, it seems to me that we don't necessarily want to go down to that for sure on back-branches. What's actually striking me as a very bad thing is the inconsistent behavior when we need to get a namespace OID after calling QualifiedNameGetCreationNamespace using a list of defnames which does not lock the schema the DDL is working on. And this happens for basically all the object types Jimmy has mentioned. For example, when creating a composite type, the namespace lock is taken through RangeVarGetAndCheckCreationNamespace. If a user tries to create a root type, then we simply don't lock it, which leads to an inconsistency of behavior between composite types and root types. In my opinion the inconsistency of behavior for the same DDL is not logic. So I have been looking at that, and finished with the attached, which actually fixes the set of problems reported. Thoughts? -- Michael diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c index 5d13e6a3d7..49bd61aff6 100644 --- a/src/backend/catalog/namespace.c +++ b/src/backend/catalog/namespace.c @@ -3011,6 +3011,14 @@ QualifiedNameGetCreationNamespace(List *names, char **objname_p) errmsg("no schema has been selected to create in"))); } + Assert(OidIsValid(namespaceId)); + + /* + * Lock the creation namespace to protect against concurrent namespace + * drop. + */ + LockDatabaseObject(NamespaceRelationId, namespaceId, 0, AccessShareLock); + return namespaceId; } diff --git a/src/test/isolation/expected/schema-drop.out b/src/test/isolation/expected/schema-drop.out new file mode 100644 index 00..5ea14abb0b --- /dev/null +++ b/src/test/isolation/expected/schema-drop.out @@ -0,0 +1,43 @@ +Parsed test spec with 2 sessions + +starting permutation: s1_begin s1_create_type s2_drop_schema s1_commit +step s1_begin: BEGIN; +step s1_create_type: CREATE TYPE testschema.testtype; +step s2_drop_schema: DROP SCHEMA testschema CASCADE; +step s1_commit: COMMIT; +step s2_drop_schema: <... completed> + +starting permutation: s1_begin s1_create_agg s2_drop_schema s1_commit +step s1_begin: BEGIN; +step s1_create_agg: CREATE AGGREGATE testschema.a1(int4) (SFUNC=int4pl, STYPE=int4); +step s2_drop_schema: DROP SCHEMA testschema CASCADE; +step s1_commit: COMMIT; +step s2_drop_schema: <... completed> + +starting permutation: s1_begin s1_create_func s2_drop_schema s1_commit +step s1_begin: BEGIN; +step s1_create_func: CREATE FUNCTION testschema.f1() RETURNS bool LANGUAGE SQL AS 'SELECT true'; +step s2_drop_schema: DROP SCHEMA testschema CASCADE; +step s1_commit: COMMIT; +step s2_drop_schema: <... completed> + +starting permutation: s1_begin s1_create_op s2_drop_schema s1_commit +step s1_begin: BEGIN; +step s1_create_op: CREATE OPERATOR testschema.@+@ (LEFTARG=int4, RIGHTARG=int4, PROCEDURE=int4pl); +step s2_drop_schema: DROP SCHEMA testschema CASCADE; +step s1_commit: COMMIT; +step s2_drop_schema: <... 
completed> + +starting permutation: s1_begin s1_create_opfamily s2_drop_schema s1_commit +step s1_begin: BEGIN; +step s1_create_opfamily: CREATE OPERATOR FAMILY testschema.opfam1 USING btree; +step s2_drop_schema: DROP SCHEMA testschema CASCADE; +step s1_commit: COMMIT; +step s2_drop_schema: <... completed> + +starting permutation: s1_begin s1_create_opclass s2_drop_schema s1_commit +step s1_begin: BEGIN; +step s1_create_opclass: CREATE OPERATOR CLASS testschema.opclass1 FOR TYPE uuid USING hash AS STORAGE uuid; +step s2_drop_schema: DROP SCHEMA testschema CASCADE; +step s1_commit: COMMIT; +step s2_drop_schema: <... completed> diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule index c23b401225..12b7a96d7e 100644 --- a/src/test/isolation/isolation_schedule +++ b/src/test/isolation/isolation_schedule @@ -78,3 +78,4 @@ test: partition-key-update-3 test: partition-key-update-4 test: plpgsql-toast test: truncate-conflict +test: schema-drop diff --git a/src/test/isolation/specs/schema-drop.spec b/src/test/isolation/specs/schema-drop.spec new file mode 100644 index 00..d40703d3a6 --- /dev/null +++ b/src/test/isolation/specs/schema-drop.spec @@ -0,0 +1,32 @@ +# Tests for schema drop with concurrently-created objects +# +# When an empty namespace is being initially populated with the below +# objects, it is possible to DROP SCHEMA without a CASCADE before the +# objects are committed. DROP SCHEMA should wait for the transaction +# creating t
Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace
On Thu, Sep 6, 2018 at 4:22 PM Michael Paquier wrote: > > On Wed, Sep 05, 2018 at 09:23:37AM -0700, Andres Freund wrote: > > Well, we kinda do, during some of their own DDL. CF > > AcquireDeletionLock(), RangeVarGetAndCheckCreationNamespace(), and other > > LockDatabaseObject() callers. The > > RangeVarGetAndCheckCreationNamespace() even locks the schema an object > > is created in , which is pretty much what we're discussing here. > > > > I think the problem with the current logic is more that the > > findDependentObjects() scan doesn't use a "dirty" scan, so it doesn't > > ever get to seeing conflicting operations. > > Well, it seems to me that we don't necessarily want to go down to that > for sure on back-branches. What's actually striking me as a very bad > thing is the inconsistent behavior when we need to get a namespace OID > after calling QualifiedNameGetCreationNamespace using a list of defnames > which does not lock the schema the DDL is working on. And this happens > for basically all the object types Jimmy has mentioned. > > For example, when creating a composite type, the namespace lock is taken > through RangeVarGetAndCheckCreationNamespace. If a user tries to create > a root type, then we simply don't lock it, which leads to an > inconsistency of behavior between composite types and root types. In my > opinion the inconsistency of behavior for the same DDL is not logic. > > So I have been looking at that, and finished with the attached, which > actually fixes the set of problems reported. Thoughts? Hi, I applied to current master and run "make check-world" and everything is ok... I also run some similar tests as Jimmy pointed and using PLpgSQL to execute DDLs and the new consistent behavior is ok. Also I run one session using DROP SCHEMA at end and after COMMIT the session 2 report 'ERROR: schema "testschema" does not exist', so avoiding concerns about lock overhead seems the proposed patch is ok. Regards, -- Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: *_to_xml() should copy SPI_processed/SPI_tuptable
On 06/09/2018 18:25, Tom Lane wrote: > Here's a proposed patch along that line. I included SPI_result (SPI's > equivalent of errno) in the set of saved-and-restored variables, > so that all the exposed SPI globals behave the same. > > Also, in further service of providing insulation between SPI nesting > levels, I thought it'd be a good idea for SPI_connect() to reset the > globals to zero/NULL after saving them. This ensures that a nesting > level can't accidentally be affected by the state of an outer level. > It's barely possible that there's somebody out there who's *intentionally* > accessing state from a calling SPI level, but that seems like it'd be > pretty dangerous programming practice. Still, maybe there's an argument > for omitting that hunk in released branches. These look like good changes. I'm not sure about backpatching them, but as you say the chances that someone wants to do this intentionally are low. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Restricting maximum keep segments by repslots
This documentation + +Specify the maximum size of WAL files +that replication +slots are allowed to retain in the pg_wal +directory at checkpoint time. +If max_slot_wal_keep_size is zero (the default), +replication slots retain unlimited size of WAL files. + doesn't say anything about what happens when the limit is exceeded. Does the system halt until the WAL is fetched from the slots? Do the slots get invalidated? Also, I don't think 0 is a good value for the default behavior. 0 would mean that a slot is not allowed to retain any more WAL than already exists anyway. Maybe we don't want to support that directly, but it's a valid configuration. So maybe use -1 for infinity. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
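As an illustration of the suggested semantics (hypothetical, since max_slot_wal_keep_size is still a patch under review and currently uses 0 rather than -1 for "no limit"), usage would look roughly like this:

-- Cap the WAL retained on behalf of replication slots at 1GB;
-- under the suggested semantics, -1 would mean "retain without limit".
ALTER SYSTEM SET max_slot_wal_keep_size = '1GB';
SELECT pg_reload_conf();

-- Rough check of how much WAL each slot is currently holding back.
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;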
Re: [PATCH] Incremental sort (was: PoC: Partial sort)
On 09/06/2018 08:04 PM, James Coleman wrote: > I disagree because it's not ideal to basically have to append pk to > every index in the system just to get the ability to tie-break when > there's actually very low likelihood of ties anyway. > > A similar use case is trying to batch through a result set, in which > case you need a stable sort as well. > > The benefit is retaining the generality of indexes (and saving space in > them etc.) while still allowing using them for more specific sorts. Any > time you paginate or batch this way you benefit from this, which in many > applications applies to a very high percentage of indexes. > I 100% agree with this. I see incremental sort as a way to run queries with fewer indexes that are less query-specific, while still benefiting from them, which means lower overhead when writing data, lower disk space usage, and so on. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: *_to_xml() should copy SPI_processed/SPI_tuptable
Peter Eisentraut writes: > On 06/09/2018 18:25, Tom Lane wrote: >> Also, in further service of providing insulation between SPI nesting >> levels, I thought it'd be a good idea for SPI_connect() to reset the >> globals to zero/NULL after saving them. This ensures that a nesting >> level can't accidentally be affected by the state of an outer level. >> It's barely possible that there's somebody out there who's *intentionally* >> accessing state from a calling SPI level, but that seems like it'd be >> pretty dangerous programming practice. Still, maybe there's an argument >> for omitting that hunk in released branches. > These look like good changes. I'm not sure about backpatching them, but > as you say the chances that someone wants to do this intentionally are low. Yeah. I'd put a higher probability on the idea that this will fix somebody's latent bug in code that's never been tested inside an outer level of SPI call. It would, for example, work to rely on SPI_tuptable being initially NULL, as long as you hadn't tried it inside an outer call. regards, tom lane
Re: *_to_xml() should copy SPI_processed/SPI_tuptable
> "Tom" == Tom Lane writes: Tom> Also, in further service of providing insulation between SPI Tom> nesting levels, I thought it'd be a good idea for SPI_connect() to Tom> reset the globals to zero/NULL after saving them. This ensures Tom> that a nesting level can't accidentally be affected by the state Tom> of an outer level. It's barely possible that there's somebody out Tom> there who's *intentionally* accessing state from a calling SPI Tom> level, but that seems like it'd be pretty dangerous programming Tom> practice. Still, maybe there's an argument for omitting that hunk Tom> in released branches. Tom> Comments? +1 to backpatching it all, including the initialization in SPI_connect. -- Andrew (irc:RhodiumToad)
Commit fest 2018-09
Hi all, We are already in September, so it is time to move on with the 2nd commit fest for v12. As usual, there are many patches waiting for review and integration: https://commitfest.postgresql.org/19/ With a couple of days of delay, I switched the CF app to in-progress on Tuesday AOE. Please note that I am fine acting as CFM and helping with patch classification. If somebody wishes to act as such and is familiar with the community process, feel free to reply to this thread to manifest yourself. If you are a patch author, please note that the commit fest relies on the balance between patches submitted and patch reviews done. If you are not reviewing a patch for each patch you submitted, please make sure to look at something of equal difficulty. Doing more reviews than patch authoring is not bad either, quite the contrary :) If you are a patch reviewer and are registered as such, it is equally important to make progress so that something can get integrated into the tree. Please make sure to discuss things to the extent of what you know, and do not hesitate to discover new areas of the code and the knowledge around them. There are many patches waiting for your review. If you are a committer, there are currently 11 patches waiting for a commit, so if you are already registered, let's move on with patch integration. There are also a couple of patches which have not been looked at yet. Thanks, and happy hacking. -- Michael
Re: Commit fest 2018-09
> On Thu, 6 Sep 2018 at 22:50, Michael Paquier wrote: > > If somebody wishes to act as such and is familiar with the community process, > feel free to reply to this thread to manifest yourself. I have always wondered what the reaction to a first-time CFM would be. If it's ok, I would love to try it out, but for the next commitfest (and this time I'll watch the CFM's activity more closely than usual).
Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
I wrote: > So where are we on this? Should I proceed with my patch, or are we > going to do further investigation? Does anyone want to do an actual > patch review? [ crickets... ] So I took that as license to proceed, but while doing a final round of testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails, because now that's an infinite recursion. On reflection it's a bit surprising that it wasn't so all along. What I'm inclined to do about it is to adjust AcceptInvalidationMessages so that there's a finite recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there already is in the CLOBBER_CACHE_ALWAYS case. Maybe 3 or so levels would be enough. regards, tom lane
Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
On 2018-09-06 17:38:38 -0400, Tom Lane wrote: > I wrote: > > So where are we on this? Should I proceed with my patch, or are we > > going to do further investigation? Does anyone want to do an actual > > patch review? > > [ crickets... ] Sorry, bit busy with postgres open, and a few people being in town due to that. Could you attach the current version of the patch, or were there no meaningful changes? > So I took that as license to proceed, but while doing a final round of > testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails, > because now that's an infinite recursion. On reflection it's a bit > surprising that it wasn't so all along. What I'm inclined to do about > it is to adjust AcceptInvalidationMessages so that there's a finite > recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there > already is in the CLOBBER_CACHE_ALWAYS case. Maybe 3 or so levels > would be enough. Hm, but wouldn't that pretty much mean that the risk exposed in this thread would still be present? The reason your approach fixes things is that if we're processing invalidation event N, which depends on invalidation changes at N+y being processed, is that the recursion leads to N+y being processed before the invalidation processing of N finishes, due to the recursion. Now you can argue that your restriction would only apply to CLOBBER_CACHE_RECURSIVELY, and thus not in normal builds, but that still makes me uncomfortable. Greetings, Andres Freund
Re: Commit fest 2018-09
On Thu, Sep 06, 2018 at 11:25:01PM +0200, Dmitry Dolgov wrote: > I have always wondered what the reaction to a first-time CFM would be. There is a first time for everything. -- Michael
Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes
Andres Freund writes: > Could you attach the current version of the patch, or were there no > meaningful changes? No changes. >> So I took that as license to proceed, but while doing a final round of >> testing I found out that a CLOBBER_CACHE_RECURSIVELY build fails, >> because now that's an infinite recursion. On reflection it's a bit >> surprising that it wasn't so all along. What I'm inclined to do about >> it is to adjust AcceptInvalidationMessages so that there's a finite >> recursion depth limit in the CLOBBER_CACHE_RECURSIVELY case, as there >> already is in the CLOBBER_CACHE_ALWAYS case. Maybe 3 or so levels >> would be enough. > Hm, but wouldn't that pretty much mean that the risk exposed in this > thread would still be present? The reason your approach fixes things is > that if we're processing invalidation event N, which depends on > invalidation changes at N+y being processed, is that the recursion leads > to N+y being processed before the invalidation processing of N finishes, > due to the recursion. Right, which makes the maximum recursion depth equal to basically whatever you think the "backlog" of relevant catalog inval messages could be. It's finite, because we do after all have lock on one more catalog at each level, and eventually we'd hold lock on every system catalog or index that could be referenced inside AcceptInvalidationMessages. In practice I doubt the worst-case nesting level is more than two or three, but it's not clear how to establish just what it is. CLOBBER_CACHE_RECURSIVELY breaks this because it recurses whether or not any relevant inval activity occurred. I think that's good for verifying that recursion of AcceptInvalidationMessages works at all, but it's not relevant to whether infinite recursion could happen without that. regards, tom lane
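A sketch of the kind of guard being described, to be placed where AcceptInvalidationMessages() currently recurses unconditionally in CLOBBER_CACHE_RECURSIVELY builds; the variable name and the depth limit of 3 are illustrative, not final:

#ifdef CLOBBER_CACHE_RECURSIVELY
	{
		static int	recursion_depth = 0;

		/*
		 * Exercise nested invalidation processing, but bound the recursion
		 * so the debug build terminates even though we recurse whether or
		 * not any invalidation activity actually occurred.
		 */
		if (recursion_depth < 3)
		{
			recursion_depth++;
			InvalidateSystemCaches();
			recursion_depth--;
		}
	}
#endif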
Re: Problem while setting the fpw with SIGHUP
On Tue, Aug 28, 2018 at 07:34:36PM +0900, Kyotaro HORIGUCHI wrote: > Thanks for prompting. The difference is in a comment and I'm fine > with the change. /* * Properly accept or ignore signals the postmaster might send us. */ - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ I am finally coming back to this patch set, and that's one of the first things I am going to help move forward in this CF. And this bit from the last patch series is not acceptable, as there are some parameters used by the startup process which can be reloaded. One of them is wal_retrieve_retry_interval, which controls how long to wait before retrying WAL retrieval during recovery. -- Michael
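For context, the pattern being defended here is the usual one: the startup process keeps a real SIGHUP handler that only sets a flag, and the main loop reloads the configuration at a safe point, which is what makes settings like wal_retrieve_retry_interval reloadable. A rough sketch of that shape (not the actual startup.c code):

static volatile sig_atomic_t got_SIGHUP = false;

/* SIGHUP: set a flag so the config file is re-read at a convenient time */
static void
StartupProcSigHupHandler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_SIGHUP = true;
	WakeupRecovery();

	errno = save_errno;
}

/* ... then, at a safe point in the startup process's main loop ... */
	if (got_SIGHUP)
	{
		got_SIGHUP = false;
		ProcessConfigFile(PGC_SIGHUP);
		/* reloaded values, e.g. wal_retrieve_retry_interval, apply from here */
	}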
Re: Collation versioning
On Thu, Sep 6, 2018 at 12:01 PM Peter Eisentraut wrote: > On 05/09/2018 23:18, Thomas Munro wrote: > > On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg wrote: > >>> So, it's not ideal but perhaps worth considering on the grounds that > >>> it's better than nothing? > >> > >> Ack. > > > > Ok, here's a little patch like that. > > > > postgres=# select collname, collcollate, collversion from pg_collation > > where collname = 'en_NZ'; > > collname | collcollate | collversion > > --+-+- > > en_NZ| en_NZ.utf8 | 2.24 > > (1 row) > > But wouldn't that also have the effect that glibc updates that don't > change anything about the locales would trigger the version > incompatibility warning? Right. And likewise, a glibc update that does change some locales but not the locales that you are actually using will trigger false alarm warnings. The same goes for the ICU provider, which appears to return the same collversion for every collation, even though presumably some don't change from one ICU version to the next. I wonder if someone here knows how many "locales" packages have been released over the lifetime of (say) the current Debian stable distro, whether any LC_COLLATE files changed over those releases, and whether libc6 had the same MAJOR.MINOR for the whole lifetime. That is, even though they might have been through 2.19-17+blah, 2.19-18+blah, ... did they all report "2.19" and were the collations actually stable? If that's the case, I think it'd be quite good: we'd only raise the alarm after a big dist-upgrade Debian 8->9, or when doing streaming replication from a Debian 8 box to a Debian 9 box. > Also, note that this mechanism only applies to collation objects, not to > database-global locales. So most users wouldn't be helped by this approach. Yeah, right, that would have to work for this to be useful. I will look into that. -- Thomas Munro http://www.enterprisedb.com
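To make the intended workflow concrete, a post-upgrade check could look something like the SQL below. pg_collation_actual_version() already exists; what the patch changes is whether it (and collversion) report anything useful for libc-provided collations rather than only for ICU, and, as noted, per-database default locales are not covered by this:

-- Flag collations whose recorded version no longer matches what the
-- provider currently reports (libc 'c' or ICU 'i' providers).
SELECT collname,
       collversion AS recorded_version,
       pg_collation_actual_version(oid) AS current_version
FROM pg_collation
WHERE collprovider IN ('c', 'i')
  AND collversion IS DISTINCT FROM pg_collation_actual_version(oid);

-- After reindexing anything that depends on an affected collation,
-- record the new version.
ALTER COLLATION "en_NZ" REFRESH VERSION;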
Re: pgsql: Refactor dlopen() support
Peter Eisentraut writes: > Refactor dlopen() support Buildfarm member locust doesn't like this much. I've been able to reproduce the problem on an old Mac laptop running the same macOS release, viz 10.5.8. (Note that we're not seeing it on earlier or later releases, which is odd in itself.) According to my machine, the crash is happening here: #0 _PG_init () at plpy_main.c:98 98 *plpython_version_bitmask_ptr |= (1 << PY_MAJOR_VERSION); and the reason is that the rendezvous variable sometimes contains garbage. Most sessions correctly see it as initially zero, but sometimes it contains (gdb) p plpython_version_bitmask_ptr $1 = (int *) 0x1d and I've also seen (gdb) p plpython_version_bitmask_ptr $1 = (int *) 0x7f7f7f7f It's mostly repeatable but not completely so: the 0x1d case seems to come up every time through the plpython_do test, but I don't always see the 0x7f7f7f7f case. (Maybe that's a timing artifact? It takes a variable amount of time to recover from the first crash in plpython_do, so the rest of the plpython test run isn't exactly operating in uniform conditions.) No idea what's going on here, and I'm about out of steam for tonight. regards, tom lane