Re: Allow CURRENT_ROLE in GRANTED BY
On 2020-12-30 13:43, Simon Riggs wrote:
> On Thu, 10 Dec 2020 at 18:40, Peter Eisentraut wrote:
>> On 2020-06-24 20:21, Peter Eisentraut wrote:
>>> On 2020-06-24 10:12, Vik Fearing wrote:
>>>> On 6/24/20 8:35 AM, Peter Eisentraut wrote:
>>>>> I was checking some loose ends in SQL conformance, when I noticed: We
>>>>> support GRANT role ... GRANTED BY CURRENT_USER, but we don't support
>>>>> CURRENT_ROLE in that place, even though in PostgreSQL they are
>>>>> equivalent. Here is a trivial patch to add that.
>>>>
>>>> The only thing that isn't dead-obvious about this patch is the commit
>>>> message says "[PATCH 1/2]". What is in the other part?
>>>
>>> Hehe. The second patch is some in-progress work to add the GRANTED BY
>>> clause to the regular GRANT command. More on that perhaps at a later date.
>>
>> Here is the highly anticipated and quite underwhelming second part of
>> this patch set.
>
> Looks great, but no test to confirm it works. I would suggest adding a
> test and committing directly since I don't see any cause for further
> discussion.

Committed with some tests. Thanks.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Re: SELECT INTO deprecation
On 2020-12-17 02:30, Michael Paquier wrote:
> On Wed, Dec 16, 2020 at 06:07:08PM +0100, Peter Eisentraut wrote:
>> Right, we would very likely not add it now. But it doesn't seem to cause
>> a lot of ongoing maintenance burden, so if there is a use case, it's not
>> unreasonable to keep it around.
>
> I have no other motive here, I was just curious. From the point of view
> of the code, this would simplify the grammar rules of SELECT, which is a
> good thing. And this would also simplify some code in the rewriter where
> we create a CreateTableAsStmt on-the-fly when an IntoClause is specified,
> but my take is that this is not really a burden when it comes to
> long-term maintenance. And the git history suggests, at least as far as
> I can see, that this code is not touched much.

I have committed my small documentation patch that points out compatibility with other implementations.
Re: [PATCH] pg_hba.conf error messages for logical replication connections
On Thu, Jan 28, 2021 at 8:17 PM Amit Kapila wrote:
>
> What exactly are you bothered about here? Is the database name not
> present in the message your concern, or that the message uses
> 'replication' but actually doesn't relate to 'replication' specified in
> pg_hba.conf? I think with the current scheme one might say that the
> replication word in the error message helps them to distinguish a
> logical replication connection error from a regular connection error.
> I am not telling what you are proposing is wrong, but just that the
> current scheme of things might be helpful to some users. If you can
> explain a bit how the current message misled you and how the proposed
> one solves that confusion, that would be better.
>

My main confusion arose from conflating the word "replication" in the error message with the "replication" keyword in pg_hba.conf. In my case, I was actually confused because I was creating logical replication connections that weren't getting rejected, despite a lack of any "replication" rules in my pg_hba.conf. I had the faulty assumption that a replication connection requires the "replication" keyword, and my change to the documentation makes it explicit that logical replication connections do not match the "replication" keyword. I was digging through the code trying to understand why it was working, and also making manual connections, before I stumbled upon these error messages.

The fact that the error message doesn't include the database name definitely contributed to my confusion. It only mentions the word "replication" and not a database name, which reinforces the idea that the "replication" keyword rule should apply, because it seems "replication" has replaced the database name.

But overall, I would agree that the current messages aren't wrong, and my fix could still cause confusion because now logical replication connections won't be described as "replication" connections. How about explicitly specifying physical vs. logical replication in the error message, and also adding hints clarifying the use of the "replication" keyword in pg_hba.conf?

if physical replication
    Error "pg_hba.conf rejects physical replication connection ..."
    Hint "Physical replication connections only match pg_hba.conf rules using the "replication" keyword"
else if logical replication
    Error "pg_hba.conf rejects logical replication connection to database %s ..."
    // Maybe add this?
    Hint "Logical replication connections do not match pg_hba.conf rules using the "replication" keyword"
else
    Error "pg_hba.conf rejects connection to database %s ..."

If I did go with this approach, would it be better to have three separate branches, or to just introduce another variable for the connection type? I'm not sure what is optimal for translation. (If both types of replication connections get hints, then probably three branches is better.)

const char *connection_type;
connection_type = am_db_walsender ? _("logical replication connection") :
    am_walsender ? _("physical replication connection") : _("connection")

- Paul
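For concreteness, the three-branch variant sketched above might look roughly like this, in the style of the existing ereport() calls in src/backend/libpq/auth.c. This is only an illustration of the proposal, not the submitted patch; am_walsender, am_db_walsender, hostinfo, and the Port fields exist in the real code, but the exact message and hint wording here is just the suggestion from this email.

	/* Sketch only: separate branches, which may be simpler for translators
	 * than assembling the message from a connection_type variable. */
	if (am_walsender && !am_db_walsender)	/* physical replication */
		ereport(FATAL,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("pg_hba.conf rejects physical replication connection for host \"%s\", user \"%s\"",
						hostinfo, port->user_name),
				 errhint("Physical replication connections only match pg_hba.conf rules using the \"replication\" keyword.")));
	else if (am_db_walsender)				/* logical replication */
		ereport(FATAL,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("pg_hba.conf rejects logical replication connection to database \"%s\" for host \"%s\", user \"%s\"",
						port->database_name, hostinfo, port->user_name),
				 errhint("Logical replication connections do not match pg_hba.conf rules using the \"replication\" keyword.")));
	else									/* regular connection */
		ereport(FATAL,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("pg_hba.conf rejects connection to database \"%s\" for host \"%s\", user \"%s\"",
						port->database_name, hostinfo, port->user_name)));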
Re: pgbench: option delaying queries till connections establishment?
Hello Thomas,

>> 3. Decide if it's sane for the Windows-based emulation to be in here
>> too, or if it should stay in pgbench.c. Or alternatively, if we're
>> emulating pthread stuff on Windows, why not also put the other pthread
>> emulation stuff from pgbench.c into a "ports" file; that seems
>> premature and overkill for your project. I dunno.
>
> I decided to solve only the macOS problem for now. So in this version,
> the A and B patches are exactly as you had them in your v7, except that
> B includes "port/pg_pthread.h" instead of <pthread.h>. Maybe it'd make
> sense to move the Win32 pthread emulation stuff out of pgbench.c into
> src/port too (the pre-existing stuff, and the new barrier stuff you
> added), but that seems like a separate patch, one that I'm not best
> placed to write, and it's not clear to me that we'll want to be using
> pthread APIs as our main abstraction if/when thread usage increases in
> the PG source tree anyway. Other opinions welcome.

I think it would be much more consistent to move all the thread emulation stuff there directly: currently (v8) the Windows implementation is in pgbench and the macOS implementation in port, which is quite messy.

Attached is a patch set which does that. I cannot test it on either Windows or macOS. Patches 1 & 2 are really independent.

-- 
Fabien.

diff --git a/configure b/configure
index e202697bbf..54c5a7963f 100755
--- a/configure
+++ b/configure
@@ -11668,6 +11668,62 @@ if test "$ac_res" != no; then :
 fi
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing pthread_barrier_wait" >&5
+$as_echo_n "checking for library containing pthread_barrier_wait... " >&6; }
+if ${ac_cv_search_pthread_barrier_wait+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_func_search_save_LIBS=$LIBS
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char pthread_barrier_wait ();
+int
+main ()
+{
+return pthread_barrier_wait ();
+  ;
+  return 0;
+}
+_ACEOF
+for ac_lib in '' pthread; do
+  if test -z "$ac_lib"; then
+    ac_res="none required"
+  else
+    ac_res=-l$ac_lib
+    LIBS="-l$ac_lib  $ac_func_search_save_LIBS"
+  fi
+  if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_search_pthread_barrier_wait=$ac_res
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext
+  if ${ac_cv_search_pthread_barrier_wait+:} false; then :
+  break
+fi
+done
+if ${ac_cv_search_pthread_barrier_wait+:} false; then :
+
+else
+  ac_cv_search_pthread_barrier_wait=no
+fi
+rm conftest.$ac_ext
+LIBS=$ac_func_search_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_pthread_barrier_wait" >&5
+$as_echo "$ac_cv_search_pthread_barrier_wait" >&6; }
+ac_res=$ac_cv_search_pthread_barrier_wait
+if test "$ac_res" != no; then :
+  test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
+
+fi
+
 # Solaris:
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing fdatasync" >&5
 $as_echo_n "checking for library containing fdatasync... " >&6; }
@@ -15853,6 +15909,19 @@ esac
 
 fi
 
+ac_fn_c_check_func "$LINENO" "pthread_barrier_wait" "ac_cv_func_pthread_barrier_wait"
+if test "x$ac_cv_func_pthread_barrier_wait" = xyes; then :
+  $as_echo "#define HAVE_PTHREAD_BARRIER_WAIT 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" pthread_barrier_wait.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS pthread_barrier_wait.$ac_objext"
+ ;;
+esac
+
+fi
+
 ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite"
 if test "x$ac_cv_func_pwrite" = xyes; then :
   $as_echo "#define HAVE_PWRITE 1" >>confdefs.h
diff --git a/configure.ac b/configure.ac
index a5ad072ee4..23ad861b27 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1152,6 +1152,7 @@ AC_SEARCH_LIBS(getopt_long, [getopt gnugetopt])
 AC_SEARCH_LIBS(shm_open, rt)
 AC_SEARCH_LIBS(shm_unlink, rt)
 AC_SEARCH_LIBS(clock_gettime, [rt posix4])
+AC_SEARCH_LIBS(pthread_barrier_wait, pthread)
 # Solaris:
 AC_SEARCH_LIBS(fdatasync, [rt posix4])
 # Required for thread_test.c on Solaris
@@ -1734,6 +1735,7 @@ AC_REPLACE_FUNCS(m4_normalize([
 	mkdtemp
 	pread
 	preadv
+	pthread_barrier_wait
 	pwrite
 	pwritev
 	random
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a4a3f40048..0ac0e186ea 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -66,6 +66,7 @@
 #include "libpq-fe.h"
 #include "pgbench.h"
 #include "portability/instr_time.h"
+#include "port/pg_pthread.h"
 
 #ifndef M_PI
 #define M_PI 3.14159265358979323846
@@ -109,26 +110,6 @@ typedef struct socket_set
 #endif   /* POLL_USING_SELECT */
 
-/*
- * Multi-platform pthread implementations
- */
-
-#ifdef WIN32
-/* Use native win32 threads on Windows */
-typedef struct win32_pthread *pthread_t;
-typedef int pthread_attr_t;
-
-static int pthread_create(pthread_t
Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Hello,

In tablespace.c, a comment explains that DROP TABLESPACE can fail bogusly because of Windows file semantics:

 * XXX On Windows, an unlinked file persists in the directory listing
 * until no process retains an open handle for the file.  The DDL
 * commands that schedule files for unlink send invalidation messages
 * directing other PostgreSQL processes to close the files.  DROP
 * TABLESPACE should not give up on the tablespace becoming empty
 * until all relevant invalidation processing is complete.

While trying to get the AIO patchset working on more operating systems, this turned out to be a problem. Andres mentioned the new ProcSignalBarrier stuff as a good way to tackle this, so I tried it and it seems to work well so far.

The idea in this initial version is to tell every process in the cluster to close all fds, and then try again. That's a pretty large hammer, but it isn't reached on Unix, and with slightly more work it could be made to happen only after 2 failures on Windows.

It was tempting to try to figure out how to use the sinval mechanism to close precisely the right files instead, but it doesn't look safe to run sinval at arbitrary CFI() points. It's easier to see that the pre-existing closeAllVfds() function has an effect that is local to fd.c and doesn't affect the VFDs or SMgrRelations, so any CFI() should be an OK time to run that.

While reading the ProcSignalBarrier code, I couldn't resist replacing its poll/sleep loop with condition variables.

Thoughts?

From bf32e7426cf780d33ae1140c443ae8964397a3c9 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sat, 30 Jan 2021 14:02:32 +1300
Subject: [PATCH 1/2] Use a global barrier to fix DROP TABLESPACE on Windows.

Previously, it was possible for DROP TABLESPACE to fail because a table
had been dropped, but other backends still had open file handles.
Windows won't allow a directory containing unlinked-but-still-open files
to be unlinked.  Tackle this problem by forcing all backends to close
all fds.  No change for Unix systems, which didn't suffer from the
problem, but simulate the need to do it in assertion builds just to
exercise the code.

XXX maybe only do this after 2nd failure?
---
 src/backend/commands/tablespace.c    | 22 +++---
 src/backend/storage/ipc/procsignal.c | 28
 src/backend/storage/smgr/md.c        |  6 ++
 src/backend/storage/smgr/smgr.c      |  6 ++
 src/include/storage/md.h             |  1 +
 src/include/storage/procsignal.h     |  7 +--
 src/include/storage/smgr.h           |  1 +
 7 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 69ea155d50..8853f1b658 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -520,15 +520,23 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 	 * but we can't tell them apart from important data files that we
 	 * mustn't delete.  So instead, we force a checkpoint which will clean
 	 * out any lingering files, and try again.
-	 *
-	 * XXX On Windows, an unlinked file persists in the directory listing
-	 * until no process retains an open handle for the file.  The DDL
-	 * commands that schedule files for unlink send invalidation messages
-	 * directing other PostgreSQL processes to close the files.  DROP
-	 * TABLESPACE should not give up on the tablespace becoming empty
-	 * until all relevant invalidation processing is complete.
 	 */
 	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+
+	/*
+	 * On Windows, an unlinked file persists in the directory listing until
+	 * no process retains an open handle for the file.  The DDL
+	 * commands that schedule files for unlink send invalidation messages
+	 * directing other PostgreSQL processes to close the files, but nothing
+	 * guarantees they'll be processed in time.  So, we'll also use a
+	 * global barrier to ask all backends to close all files, and wait
+	 * until they're finished.
+	 */
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+	LWLockRelease(TablespaceCreateLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+	LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
+#endif
+
+	/* And now try again. */
 	if (!destroy_tablespace_directories(tablespaceoid, false))
 	{
 		/* Still not empty, the files must be important then */
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..c7cce5967d 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -27,6 +27,7 @@
 #include "storage/latch.h"
 #include "storage/proc.h"
 #include "storage/shmem.h"
+#include "storage/smgr.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
@@ -98,7 +99,7 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProc
Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
> While reading the ProcSignalBarrier code, I couldn't resist replacing
> its poll/sleep loop with condition variables.

Oops, that version accidentally added and then removed an unnecessary change due to incorrect commit squashing. Here's a better pair of patches.

From 6a3f396bac604aa0b96b79d75983e1dfa2302ea6 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sat, 30 Jan 2021 14:02:32 +1300
Subject: [PATCH v2 1/2] Use a global barrier to fix DROP TABLESPACE on
 Windows.

Previously, it was possible for DROP TABLESPACE to fail because a table
had been dropped, but other backends still had open file handles.
Windows won't allow a directory containing unlinked-but-still-open files
to be unlinked.  Tackle this problem by forcing all backends to close
all fds.  No change for Unix systems, which didn't suffer from the
problem, but simulate the need to do it in assertion builds just to
exercise the code.

XXX maybe only do this after 2nd failure?
---
 src/backend/commands/tablespace.c    | 22 +++---
 src/backend/storage/ipc/procsignal.c | 22 ++
 src/backend/storage/smgr/md.c        |  6 ++
 src/backend/storage/smgr/smgr.c      |  6 ++
 src/include/storage/md.h             |  1 +
 src/include/storage/procsignal.h     |  7 +--
 src/include/storage/smgr.h           |  1 +
 7 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index 69ea155d50..8853f1b658 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -520,15 +520,23 @@ DropTableSpace(DropTableSpaceStmt *stmt)
 	 * but we can't tell them apart from important data files that we
 	 * mustn't delete.  So instead, we force a checkpoint which will clean
 	 * out any lingering files, and try again.
-	 *
-	 * XXX On Windows, an unlinked file persists in the directory listing
-	 * until no process retains an open handle for the file.  The DDL
-	 * commands that schedule files for unlink send invalidation messages
-	 * directing other PostgreSQL processes to close the files.  DROP
-	 * TABLESPACE should not give up on the tablespace becoming empty
-	 * until all relevant invalidation processing is complete.
 	 */
 	RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
+
+	/*
+	 * On Windows, an unlinked file persists in the directory listing until
+	 * no process retains an open handle for the file.  The DDL
+	 * commands that schedule files for unlink send invalidation messages
+	 * directing other PostgreSQL processes to close the files, but nothing
+	 * guarantees they'll be processed in time.  So, we'll also use a
+	 * global barrier to ask all backends to close all files, and wait
+	 * until they're finished.
+	 */
+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
+	LWLockRelease(TablespaceCreateLock);
+	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+	LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
+#endif
+
+	/* And now try again. */
 	if (!destroy_tablespace_directories(tablespaceoid, false))
 	{
 		/* Still not empty, the files must be important then */
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c43cdd685b..30082d443f 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -27,6 +27,7 @@
 #include "storage/latch.h"
 #include "storage/proc.h"
 #include "storage/shmem.h"
+#include "storage/smgr.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
@@ -98,7 +99,7 @@ static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 static void ResetProcSignalBarrierBits(uint32 flags);
-static bool ProcessBarrierPlaceholder(void);
+static bool ProcessBarrierSmgrRelease(void);
 
 /*
  * ProcSignalShmemSize
@@ -538,8 +539,8 @@ ProcessProcSignalBarrier(void)
 			type = (ProcSignalBarrierType) pg_rightmost_one_pos32(flags);
 			switch (type)
 			{
-				case PROCSIGNAL_BARRIER_PLACEHOLDER:
-					processed = ProcessBarrierPlaceholder();
+				case PROCSIGNAL_BARRIER_SMGRRELEASE:
+					processed = ProcessBarrierSmgrRelease();
 					break;
 			}
@@ -605,20 +606,9 @@ ResetProcSignalBarrierBits(uint32 flags)
 }
 
 static bool
-ProcessBarrierPlaceholder(void)
+ProcessBarrierSmgrRelease(void)
 {
-	/*
-	 * XXX. This is just a placeholder until the first real user of this
-	 * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
-	 * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
-	 * appropriately descriptive. Get rid of this function and instead have
-	 * ProcessBarrierSomethingElse. Most likely, that function should live in
-	 * the file pertaining to that subsystem, rather than here.
-	 *
-	 * The return value should be 'true' if the barrier was successfully
-	 * absorbed and 'f
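The hunk above is truncated in the archive right where the new barrier handler's body begins. For readers following along, the handler presumably just closes all open files and reports success; a hypothetical sketch of its shape is below. smgrreleaseall() is my guess at the new smgr-level helper (the patch's diffstat touches smgr.c and md.c), and closeAllVfds() is the pre-existing fd.c function named earlier in the thread; treat both calls here as assumptions, not the patch's actual code.

static bool
ProcessBarrierSmgrRelease(void)
{
	/* Assumed: release smgr-level file handles, then kernel fds. */
	smgrreleaseall();
	return true;
}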
Re: Single transaction in the tablesync worker?
On Fri, Jan 29, 2021 at 4:07 PM Peter Smith wrote:
>
> Differences from v21:
> + Patch is rebased to latest OSS HEAD @ 29/Jan.
> + Includes new code as suggested [ak0128] to ensure no dangling slots
>   at Drop/AlterSubscription.
> + Removes the slot/origin cleanup done by the process interrupt logic
>   (cleanup_at_shutdown function).
> + Addresses some minor review comments.
>

I have made the below changes in the patch. Let me know what you think about these?

1. It was a bit difficult to understand the code in DropSubscription, so I have rearranged the code to match the way we are doing it in HEAD, where we drop the slots at the end after finishing all the other cleanup.

2. In AlterSubscription_refresh(), we can't allow workers to be stopped at commit time, because by then we have already dropped the slots and the worker could access a dropped slot. We need to stop the workers before dropping slots. This makes all the code related to logicalrep_worker_stop_at_commit redundant.

3. In AlterSubscription_refresh(), we need to acquire the lock on pg_subscription_rel only when we try to remove any subscription rel.

4. Added/Changed quite a few comments.

-- 
With Regards,
Amit Kapila.

v23-0001-Tablesync-Solution1.patch
Description: Binary data
Re: Key management with tests
Thanks Stephen, Bruce and Masahiko,

>> discussions so far and the point behind the design so that everyone
>> can understand why this feature is designed in that way. To do that,
>> it might be a good start to sort the wiki page since it has data
>> encryption part, KMS, and ToDo mixed.
>
> I hope it's pretty clear that I'm also very much in support of both this
> effort with the KMS and of TDE in general- TDE is specifically,
> repeatedly, called out as a capability whose lack is blocking PG from
> being able to be used for certain use-cases that it would otherwise be
> well suited for, and that's really unfortunate.

It is clear you are supportive. As you know, I share your point of view that PG adoption is suffering for certain use cases because it does not have TDE.

> I appreciate the recent discussion and reviews of the KMS in particular,
> and of the patches which have been sent enabling TDE based on the KMS
> patches. Having them be relatively independent seems to be an ongoing
> concern and perhaps we should figure out a way to more clearly put them
> together. That is- the KMS patches have been posted on one thread, and
> TDE PoC patches which use the KMS patches have been on another thread,
> leading some to not realize that there's already been TDE PoC work done
> based on the KMS patches. Seems like it might make sense to get one
> patch set which goes all the way from the KMS and includes the TDE PoC,
> even if they don't all go in at once.

Sounds good. Thanks Masahiko, let's see if we can get consensus on the approach for moving this forward; see below.

> together, as a few on this thread have voiced, but there's no doubt that
> this is a large project and it's hard to see how we could possibly
> commit all of it at once.

I propose that we meet to discuss what approach we want to use to move TDE forward. We then start a new thread with a proposal on the approach and finalize it via community consensus. I will invite Bruce, Stephen and Masahiko to this meeting. If anybody else would like to participate in this discussion and subsequently in the effort to get TDE into PG1x, please let me know.

Assuming Bruce, Stephen and Masahiko are down for this, I (or a volunteer from this meeting) will post the proposal for how we move this patch forward in another thread. Hopefully, we can get consensus on that and subsequently restart the execution of delivering this feature.

> Thanks!
>
> Stephen

-- 
Thomas John Kincaid
Re: Should we make Bitmapsets a kind of Node?
Peter Geoghegan writes:
> On Fri, Jan 29, 2021 at 6:44 PM Tom Lane wrote:
>> Pointer width is interesting, but really it's a solved problem
>> compared to these.

> What about USE_FLOAT8_BYVAL?

That's an annoyance, sure, but I don't recall many recent bugs related to violations of that coding rule. As long as you don't violate the Datum abstraction it's pretty safe.

			regards, tom lane
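To illustrate the coding rule being referred to (my example, not from the thread): with USE_FLOAT8_BYVAL the float8 value lives in the Datum itself, while without it the Datum is a pointer to the value. Code that goes through the conversion macros works identically on both kinds of build:

#include "postgres.h"

static float8
twice(Datum d)
{
	/* Correct: DatumGetFloat8() hides whether float8 is passed by value
	 * or by reference. */
	float8		val = DatumGetFloat8(d);

	/* WRONG: *(float8 *) DatumGetPointer(d) would return garbage or
	 * crash on a USE_FLOAT8_BYVAL build, where d holds the bits of the
	 * double itself rather than a pointer. */
	return val * 2.0;
}

/* Construction side of the same abstraction: */
/*     Datum d = Float8GetDatum(21.0);        */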
Re: Thoughts on "killed tuples" index hint bits support on standby
Hello, Peter.

> Yeah, it would help a lot. But those bits are precious. So it makes
> sense to think about what to do with both of them in index AMs at the
> same time. Otherwise we risk missing some important opportunity.

Hm. I was trying to "expand the scope" as you said and got an idea... Why should we even do any conflict resolution for hint bits? Or share precious LP bits?

The only way a standby could get an "invalid" hint bit is from an FPI from the primary. We could just use the current *btree_mask* infrastructure and clear all "probably invalid" bits received from the primary on the standby while applying FPIs (in `XLogReadBufferForRedoExtended`)!

For binary compatibility, we could use one of the `btpo_flags` bits to mark the page as "primary bits masked". The same approach would work for hash\gist too.

No conflicts, only the LP_DEAD bit is used by the standby, `btpo_flags` has many free bits for now, it is easy to implement, and the page content of primary\standby already differs in these bits... Looks like an easy and effective solution to me. What do you think?

>> Also, btw, do you know any reason to keep minRecoveryPoint at a low value?
> Not offhand.

If so, it looks like it is not a bad idea to move minRecoveryPoint forward from time to time (for more aggressive standby index hint bits). For each `xl_running_xacts` (about every 15s), for example.

> BTW, what happens when the page splits on the primary? Does your
> patch "move over" the LP_DEAD bits to each half of the split?

That part is not changed in any way.

Thanks, Michail.
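To make the masking idea concrete, the FPI-time clearing might look roughly like the sketch below, called when a full-page image is restored during hot standby. This is entirely hypothetical code: BTP_LP_SAFE_ON_STANDBY is an invented flag name, and the helper itself is my illustration; only the page/item APIs (PageGetItemId(), ItemIdIsDead(), ItemIdSetNormal(), the BTPageOpaque special space) are real.

static void
btree_clear_lp_dead_for_standby(Page page)
{
	OffsetNumber offnum,
				maxoff = PageGetMaxOffsetNumber(page);
	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

	for (offnum = FirstOffsetNumber; offnum <= maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);

		/* Index items keep their storage when marked LP_DEAD, so we can
		 * clear the flag while preserving offset and length. */
		if (ItemIdIsDead(itemid))
			ItemIdSetNormal(itemid, ItemIdGetOffset(itemid),
							ItemIdGetLength(itemid));
	}

	/* Invented flag: remember that primary-set hints were masked. */
	opaque->btpo_flags |= BTP_LP_SAFE_ON_STANDBY;
}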
Re: Perform COPY FROM encoding conversions in larger chunks
On Thu, Jan 28, 2021 at 7:36 AM Heikki Linnakangas wrote:
>
> Even more surprising was that the second patch
> (0002-Replace-pg_utf8_verifystr-with-a-faster-implementati.patch)
> actually made things worse again. I thought it would give a modest gain,
> but nope.

Hmm, that surprised me too.

> Based on these results, I'm going to commit the first patch, but not the
> second one. There are much faster UTF-8 verification routines out there,
> using SIMD instructions and whatnot, and we should consider adopting one
> of those, but that's future work.

I have something in mind for that.

I took a look at v2, and for the first encoding I tried, it fails to report the error for invalid input:

create database euctest WITH ENCODING 'EUC_CN' LC_COLLATE='zh_CN.eucCN' LC_CTYPE='zh_CN.eucCN' TEMPLATE=template0;
\c euctest
create table foo (a text);

master:

euctest=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> ä
>> \.
ERROR:  character with byte sequence 0xc3 0xa4 in encoding "UTF8" has no equivalent in encoding "EUC_CN"
CONTEXT:  COPY foo, line 1

patch:

euctest=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> ä
>> \.
COPY 0
euctest=#

I believe the problem is in UtfToLocal(). I've attached a fix formatted as a text file to avoid confusing the cfbot. The fix keeps the debugging ereport() in case you find it useful. Some additional test coverage might be good here, but I'm not sure how much work that would be. I didn't check any other conversions yet.

v2-0002 seems fine to me, I just have cosmetic comments here:

+ * the same, no conversion is required by we must still validate that the

s/by/but/

This comment in copyfrom_internal.h above the *StateData struct is the same as the corresponding one in copyto.c:

 * Multi-byte encodings: all supported client-side encodings encode multi-byte
 * characters by having the first byte's high bit set. Subsequent bytes of the
 * character can have the high bit not set. When scanning data in such an
 * encoding to look for a match to a single-byte (ie ASCII) character, we must
 * use the full pg_encoding_mblen() machinery to skip over multibyte
 * characters, else we might find a false match to a trailing byte. In
 * supported server encodings, there is no possibility of a false match, and
 * it's faster to make useless comparisons to trailing bytes than it is to
 * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
 * when we have to do it the hard way.

The references to pg_encoding_mblen() and encoding_embeds_ascii are out of date for copy-from. I'm not sure the rest is relevant to copy-from anymore, either. Can you confirm?

This comment inside the struct is now out of date as well:

 * Similarly, line_buf holds the whole input line being processed. The
 * input cycle is first to read the whole line into line_buf, convert it
 * to server encoding there, and then extract the individual attribute

HEAD has this macro already:

/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)

It might make sense to create a CONVERSION_BUF_BYTES equivalent, since the patch calculates cstate->conversion_buf_len - cstate->conversion_buf_index in a couple of places.

-- 
John Naylor
EDB: http://www.enterprisedb.com

diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 60dfebb0bd..1681efc7a0 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -397,6 +397,9 @@ CopyConvertBuf(CopyFromState cstate)
 										  (unsigned char *) cstate->raw_buf + cstate->raw_buf_len,
 										  dstlen,
 										  true);
+
+		ereport(NOTICE, errmsg_internal("convertedbytes: %d", convertedbytes));
+
 		if (convertedbytes == 0)
 		{
 			/*
diff --git a/src/backend/utils/mb/conv.c b/src/backend/utils/mb/conv.c
index b83358bc7a..26600e68ee 100644
--- a/src/backend/utils/mb/conv.c
+++ b/src/backend/utils/mb/conv.c
@@ -497,7 +497,7 @@ UtfToLocal(const unsigned char *utf, int len,
 	int			l;
 	const pg_utf_to_local_combined *cp;
 	const unsigned char *start = utf;
-	const unsigned char *cur = utf;
+	unsigned char *cur = unconstify(unsigned char *, utf);
 
 	if (!PG_VALID_ENCODING(encoding))
 		ereport(ERROR,
@@ -511,8 +511,6 @@ UtfToLocal(const unsigned char *utf, int len,
 		unsigned char b3 = 0;
 		unsigned char b4 = 0;
 
-		cur = iso;
-
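For clarity, the suggested shorthand would presumably look like this, mirroring the existing RAW_BUF_BYTES definition quoted above (a sketch: the conversion_buf field names are taken from the patch under review, not from committed code):

/* Shorthand for number of unconverted bytes available in conversion_buf */
#define CONVERSION_BUF_BYTES(cstate) \
	((cstate)->conversion_buf_len - (cstate)->conversion_buf_index)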
Re: Add primary keys to system catalogs
On 2021-01-21 18:15, Tom Lane wrote:
> After reading the patch again, I have a couple more nits about comments,
> which I'll just present as a proposed delta patch. Otherwise it's good.
> I'll mark it RFC.

Committed with your update, thanks.

-- 
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
Re: WIP: BRIN multi-range indexes
On Tue, Jan 26, 2021 at 6:59 PM Tomas Vondra wrote:
>
> On 1/26/21 7:52 PM, John Naylor wrote:
> > On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra
> > <tomas.von...@enterprisedb.com> wrote:
> > > Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
> > > replacement for minmax (essentially, using these opclasses as the
> > > default ones, with the option to switch back to plain minmax). I'm not
> > > convinced we should do that, though. Imagine you have minmax indexes in
> > > your existing DB, it's working perfectly fine, and then we come and just
> > > silently change that during dump/restore. Is there some past example
> > > when we did something similar and it turned out to be OK?
> >
> > I was assuming pg_dump can be taught to insert explicit opclasses for
> > minmax indexes, so that upgrade would not cause surprises. If that's
> > true, only new indexes would have the different default opclass.
>
> Maybe, I suppose we could do that. But I always found such changes
> happening silently in the background a bit suspicious, because it may be
> quite confusing. I certainly wouldn't expect such a difference between
> creating a new index and an index created by dump/restore. Did we do such
> changes in the past? That might be a precedent, but I don't recall any
> example ...

I couldn't think of a comparable example either. It comes down to evaluating risk. On the one hand it's nice if users get an enhancement without having to know about it; on the other hand, if there is some kind of noticeable regression, that's bad.

-- 
John Naylor
EDB: http://www.enterprisedb.com
Re: Allow matching whole DN from a client certificate
On 1/29/21 10:10 AM, Andrew Dunstan wrote:
> On 1/28/21 5:10 PM, Andrew Dunstan wrote:
>>> (I'd still recommend switching to use the RFC
>>> flag to OpenSSL, to ease future improvements.) There should be a bunch
>>> of warning documentation saying not to do anything more complex unless
>>> you're an expert, and that the exact regular expression needed may
>>> change depending on the TLS backend.
>>
>> I'll play around with the RFC flag.
>>
>>> If you want to add UTF-8 support to the above, so that things outside
>>> ASCII can be matched nicely, then the ASN1_STRFLGS_ESC_MSB flag should
>>> be removed from the _print_ex() call per OpenSSL documentation, and we
>>> should investigate how it plays with other non-UTF-8 database
>>> encodings. That seems like work but not a huge amount of it.
>>
>> How about if we remove ASN1_STRFLGS_ESC_MSB when the server encoding is
>> UTF8? That should be an extremely simple change.
>
> Of course, we don't have any idea what the database is at this stage, so
> that wasn't well thought out.
>
> I'm inclined at this stage just to turn on RFC2253. If someone wants to
> deal with UTF8 (or other encodings) at a later stage they can.

Here's a version of the patch that does it that way. For fun I have modified the certificate so it has two OU fields in the DN. Finding out how to generate the new cert without regenerating everything else was also fun :-) Here's what I did in a pristine source with the patch applied, but where configure hasn't been run:

cd src/test/ssl
touch ../../Makefile.global
rm -f ssl/client-dn.crt ssl/client-dn.key
touch ssl/root_ca-certindex
echo 01 > ssl/root_ca.srl
make ssl/client-dn.crt
rm -rf ssl/*certindex* ssl/root_ca.srl ssl/new_certs_dir
rm ../../Makefile.global

Making incremental additions to the certificate set easier wouldn't be a bad thing. I wonder if we should really be setting 1 as the serial number, though. Might it not be better to use, say, `date +%Y%m%d01`, rather like we do with catalog version numbers?

cheers

andrew

-- 
Andrew Dunstan
EDB: https://www.enterprisedb.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c4b9971a20..948c4f8aab 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -596,7 +596,7 @@ hostnogssenc  database  user
-   In addition to the method-specific options listed below, there is one
+   In addition to the method-specific options listed below, there is a
    method-independent authentication option clientcert, which
    can be specified in any hostssl record.  This option can be set to
    verify-ca or
@@ -610,6 +610,19 @@ hostnogssenc  database  user
    hostssl entries.
 
+  
+   On any record using client certificate authentication, that is one
+   using the cert authentication method or one
+   using the clientcert option, you can specify
+   which part of the client certificate credentials to match using
+   the clientname option. This option can have one
+   of two values. If you specify clientname=CN, which
+   is the default, the username is matched against the certificate's
+   Common Name (CN). If instead you specify
+   clientname=DN the username is matched against the
+   entire Distinguished Name (DN) of the certificate.
+   This option is probably best used in conjunction with a username map.
+  
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 545635f41a..d2e4c0b8a8 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -2848,12 +2848,15 @@ static int
 CheckCertAuth(Port *port)
 {
 	int			status_check_usermap = STATUS_ERROR;
+	char	   *peer_username;
 
 	Assert(port->ssl);
 
 	/* Make sure we have received a username in the certificate */
-	if (port->peer_cn == NULL ||
-		strlen(port->peer_cn) <= 0)
+	peer_username = port->hba->clientcertname == clientCertCN ? port->peer_cn : port->peer_dn;
+
+	if (peer_username == NULL ||
+		strlen(peer_username) <= 0)
 	{
 		ereport(LOG,
 				(errmsg("certificate authentication failed for user \"%s\": client certificate contains no user name",
@@ -2861,8 +2864,8 @@ CheckCertAuth(Port *port)
 		return STATUS_ERROR;
 	}
 
-	/* Just pass the certificate cn to the usermap check */
-	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, port->peer_cn, false);
+	/* Just pass the certificate cn/dn to the usermap check */
+	status_check_usermap = check_usermap(port->hba->usermap, port->user_name, peer_username, false);
 	if (status_check_usermap != STATUS_OK)
 	{
 		/*
@@ -2873,7 +2876,7 @@ CheckCertAuth(Port *port)
 		if (port->hba->clientcert == clientCertFull && port->hba->auth_method != uaCert)
 		{
 			ereport(LOG,
-					(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN mismatch",
+					(errmsg("certificate validation (clientcert=verify-full) failed for user \"%s\": CN/
Re: Is it worth accepting multiple CRLs?
On 2021-01-19 09:32, Kyotaro Horiguchi wrote:
> At Tue, 19 Jan 2021 09:17:34 +0900 (JST), Kyotaro Horiguchi wrote in
>> By the way we can do the same thing on CA file/dir, but I personally
>> think that the benefit from the specify-by-directory for CA files is
>> far less than for CRL files. So I'm not going to do this for CA files
>> for now.
>
> This is it. A new GUC ssl_crl_dir and connection option crldir are added.

This looks pretty good to me overall. You need to update the expected result of the postgres_fdw test. Also check your patch for whitespace errors with git diff --check or similar.

> One problem raised upthread is that the footprint for the test is quite
> large because all certificate and key files are replaced by this patch.
> I think we can shrink the footprint by generating those files on demand,
> but that needs the openssl frontend to be installed on the development
> environment.

I don't understand why you need to recreate all these files. All your patch should contain are the new *.r0 files that are computed from the existing *.crl files. Nothing else should change, AIUI.

Some of the makefile rules for generating the CRL files need some refinement. In

+ssl/root+server-crldir: ssl/server.crl
+	mkdir ssl/root+server-crldir
+	cp ssl/server.crl ssl/root+server-crldir/`openssl crl -hash -noout -in ssl/server.crl`.r0
+	cp ssl/root.crl ssl/root+server-crldir/`openssl crl -hash -noout -in ssl/root.crl`.r0
+ssl/root+client-crldir: ssl/client.crl
+	mkdir ssl/root+client-crldir
+	cp ssl/client.crl ssl/root+client-crldir/`openssl crl -hash -noout -in ssl/client.crl`.r0
+	cp ssl/root.crl ssl/root+client-crldir/`openssl crl -hash -noout -in ssl/root.crl`.r0

the rules should also have a dependency on ssl/root.crl, in addition to ssl/server.crl.

By the way:

-	print $sslconf "ssl_crl_file='root+client.crl'\n";
+	print $sslconf "ssl_crl_file='$crlfile'\n" if (defined $crlfile);
+	print $sslconf "ssl_crl_dir='$crldir'\n" if (defined $crldir);

Trailing "if" doesn't need parentheses.
Re: Add primary keys to system catalogs
Peter Eisentraut writes:
> Committed with your update, thanks.

Hmm, shouldn't there have been a catversion bump in there?

			regards, tom lane
Re: [sqlsmith] Failed assertion during partition pruning
I wrote:
> As I said, I'm now thinking it's not the Assert that's faulty.
> If I'm right about that, it's likely that the mistaken labeling
> of these paths has other consequences beyond triggering this
> assertion. (If it has none, I think we'd be better off to remove
> these Path fields altogether, and re-identify the parent rels
> here from the RelOptInfo data.)

After more time poking at this, I've concluded that my parenthetical remark is the right solution: [Merge]AppendPath.partitioned_rels ought to be nuked from orbit. It requires a whole lot of squirrely, now-known-to-be-buggy logic in allpaths.c, and practically the only place where we use the result is in make_partition_pruneinfo(). That means we're expending a lot of work *per AppendPath*, which is quite foolish unless there's some reason we can't get the information at create_plan() time instead. Which we can.

For simplicity of review I divided the patch into two parts. 0001 revises make_partition_pruneinfo() and children to identify the relevant parent partitions for themselves, which is not too hard to do by chasing up the child-to-parent AppendRelInfo links. Not formerly documented, AFAICT, is that we want to collect just the parent partitions that are between the Append path's own rel (possibly equal to it) and the subpaths' rels. I'd first tried to code this by using the top_parent_relids and part_rels[] links in the RelOptInfos, but that turns out not to work. We might ascend to a top parent that's above the Append's rel (if considering an appendpath for a sub-partition, which happens in partitionwise join). We could also descend to a child at or below some subpath level, since there are also cases where subpaths correspond to unflattened non-leaf partitions. Either of those things result in failures. But once you wrap your head around handling those restrictions, it's quite simple.

Then 0002 just nukes all the no-longer-necessary logic to compute the partitioned_rels fields.

There are a couple of changes that are not quite trivial. create_[merge_]append_plan were testing best_path->partitioned_rels != NIL to shortcut calling make_partition_pruneinfo at all. I just dropped that, reasoning that it wasn't saving enough to worry about. Also, create_append_path was similarly testing partitioned_rels != NIL to decide if it needed to go to the trouble of using get_baserel_parampathinfo. I changed that to an IS_PARTITIONED_REL() test, which isn't *quite* the same thing but seems close enough. (It changes no regression results, anyway.)

This fixes the cases reported by Andreas and Jaime, leaving me more confident that there's nothing wrong with David's Assert. I did not try to add a regression test case, mainly because it's not quite clear to me where the original bug is. (I'm a little suspicious that the blame might lie with the "match_partition_order" cases in generate_orderedappend_paths, which try to bypass accumulate_append_subpath without updating the partitioned_rels data. But I'm not excited about trying to prove that.)

I wonder whether we should consider back-patching this. Another thing that seems unclear is whether there is any serious consequence to omitting some intermediate partitions from the set considered by make_partition_pruneinfo. Presumably it could lead to missing some potential run-time-pruning opportunities, but is there any worse issue? If there isn't, this is a bigger change than I want to put in the back branches.

			regards, tom lane

diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 42c7c5f554..c08cfd 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -138,10 +138,12 @@ typedef struct PruneStepResult
 } PruneStepResult;
 
+static List *add_part_rel_list(List *partrellists, List *partrellist);
 static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
 										   RelOptInfo *parentrel,
+										   List *partrellist,
+										   List *prunequal,
 										   int *relid_subplan_map,
-										   Relids partrelids, List *prunequal,
 										   Bitmapset **matchedsubplans);
 static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
 								PartClauseTarget target,
@@ -213,67 +215,103 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
  *
  * 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
  * of scan paths for its child rels.
- *
- * 'partitioned_rels' is a List containing Lists of relids of partitioned
- * tables (a/k/a non-leaf partitions) that are parents of some of the child
- * rels.  Here we attempt to populate the PartitionPruneInfo by adding a
Re: shared tempfile was not removed on statement_timeout
On Wed, Jan 27, 2021 at 9:34 AM Thomas Munro wrote:
> On Wed, Jan 27, 2021 at 12:22 AM Kyotaro Horiguchi wrote:
> > At Tue, 26 Jan 2021 11:00:56 +0200, Heikki Linnakangas wrote in
> > > Don't we potentially have the same problem with all on_dsm_detach
> > > callbacks? Looking at the other on_dsm_detach callbacks, I don't see
> > > any CHECK_FOR_INTERRUPT() calls in them, but it seems fragile to
> > > assume that.
> > >
> > > I'd suggest adding HOLD/RESUME_INTERRUPTS() to dsm_detach(). At least
> > > around the removal of the callback from the list and calling the
> > > callback. Maybe even over the whole function.
> >
> > Yes, I first came up with HOLD/RESUME_QUERY_INTERRUPTS() at the same
> > location.
>
> +1, this seems like a good idea. This is a little bit like the code
> near the comments "Don't joggle the elbow of proc_exit".

So that gives a very simple back-patchable patch.

From b27cbabee9a5980c8673c4fee4ea6f7e0c89bdbc Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sun, 31 Jan 2021 14:04:02 +1300
Subject: [PATCH] Hold interrupts while running dsm_detach() callbacks.

While cleaning up after a parallel query or parallel index creation
that created temporary files, we could be interrupted by a statement
timeout.  The error handling path would then fail to clean up the files
too, because the callback was already popped off the list.  Prevent
this hazard by holding interrupts while the cleanup code runs.

Thanks to Heikki Linnakangas for this suggestion.

Back-patch to all supported releases.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20191212180506.gr2...@telsasoft.com
---
 src/backend/storage/ipc/dsm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index ae82b4bdc0..23bf192727 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -771,8 +771,10 @@ dsm_detach(dsm_segment *seg)
 	/*
 	 * Invoke registered callbacks.  Just in case one of those callbacks
 	 * throws a further error that brings us back here, pop the callback
-	 * before invoking it, to avoid infinite error recursion.
+	 * before invoking it, to avoid infinite error recursion.  Don't allow
+	 * interrupts to prevent cleanup from running to completion.
 	 */
+	HOLD_INTERRUPTS();
 	while (!slist_is_empty(&seg->on_detach))
 	{
 		slist_node *node;
@@ -788,6 +790,7 @@ dsm_detach(dsm_segment *seg)
 
 		function(seg, arg);
 	}
+	RESUME_INTERRUPTS();
 
 	/*
 	 * Try to remove the mapping, if one exists.  Normally, there will be, but
-- 
2.20.1
Re: Thoughts on "killed tuples" index hint bits support on standby
On Sat, Jan 30, 2021 at 9:11 AM Michail Nikolaev wrote:
> > Yeah, it would help a lot. But those bits are precious. So it makes
> > sense to think about what to do with both of them in index AMs at the
> > same time. Otherwise we risk missing some important opportunity.
>
> Hm. I was trying to "expand the scope" as you said and got an idea...
> Why should we even do any conflict resolution for hint bits? Or share
> precious LP bits?

What does it mean today when an LP_DEAD bit is set on a standby? I don't think that it means nothing at all -- at least not if you think about it in the most general terms. In a way it actually means something like "recently dead" even today, at least in one narrow specific sense: recovery may end, and then we actually *will* do *something* with the LP_DEAD bit, without directly concerning ourselves with when or how each LP_DEAD bit was set.

During recovery, we will probably always have to consider the possibility that LP_DEAD bits that get set on the primary may be received by a replica through some implementation detail (e.g. LP_DEAD bits are set in FPIs we replay, maybe even some other thing that neither of us have thought of). We can't really mask LP_DEAD bits from the primary in recovery anyway, because of stuff like page-level checksums. I suspect that it just isn't useful to fight against that. These preexisting assumptions are baked into everything already. Why should we assume that *anybody* understands all of the ways in which that is true?

Even today, "LP_DEAD actually means a limited kind of 'recently dead' during recovery + hot standby" is something that is true (as I just went into), but at the same time also has a fuzzy definition. My gut instinct is to be conservative about that. As I suggested earlier, you could invent some distinct kind of "recently dead" that achieves your goals in a way that is 100% orthogonal.

The two unused LP dead bits (unused in indexes, though not tables) are only precious if we assume that somebody will eventually use them for something -- if nobody ever uses them then they're 100% useless. The number of possible projects that might end up using the two bits for something is not that high -- certainly not more than 3 or 4. Besides, it is always a good idea to keep the on-disk format as simple and explicit as possible -- it should be easy to analyze forensically in the event of bugs or some other kind of corruption, especially by using tools like amcheck.

As I said in my earlier email, we can even play tricks during page deletion by treating certain kinds of "recently dead" bits as approximate things. As a further example, we can "rely" on the "dead-to-all but only on standby" bits when recovery ends, during subsequent write transactions. We can do so by simply using them in _bt_deadblocks() as if they were LP_DEAD (we'll recheck heap tuples in heapam.c instead).

> The only way a standby could get an "invalid" hint bit is from an FPI
> from the primary. We could just use the current *btree_mask*
> infrastructure and clear all "probably invalid" bits received from the
> primary on the standby while applying FPIs (in
> `XLogReadBufferForRedoExtended`)!

I don't like that idea. Apart from what I said already, you're assuming that setting LP_DEAD bits in indexes on the primary won't eventually have some value on the standby after it is promoted and can accept writes -- they really do have meaning and value on standbys. Plus you'll introduce new work during replay, which creates significant overhead -- often most leaf pages have some LP_DEAD bits set during recovery.

> For binary compatibility, we could use one of the `btpo_flags` bits to
> mark the page as "primary bits masked". The same approach would work
> for hash\gist too.

I agree that there are plenty of btpo_flags bits. However, I have my doubts about this. Why shouldn't this break page-level checksums (or wal_log_hints) in some way? What about pg_rewind, some eventual implementation of incremental backups, etc? I suspect that it will be necessary to invent some entirely new concept that is like a hint bit, but works on standbys (without breaking anything else).

> No conflicts, only the LP_DEAD bit is used by the standby, `btpo_flags`
> has many free bits for now, it is easy to implement, and the page
> content of primary\standby already differs in these bits...
> Looks like an easy and effective solution to me.

Note that the BTP_HAS_GARBAGE flag (which goes in btpo_flags) was deprecated in commit cf2acaf4. It was never a reliable indicator of whether or not some LP_DEAD bits are set in the page. And I think that it never could be made to work like that.

> >> Also, btw, do you know any reason to keep minRecoveryPoint at a low value?
> > Not offhand.
>
> If so, it looks like it is not a bad idea to move minRecoveryPoint
> forward from time to time (for more aggressive standby index hint bits).
> For each `xl_running_xacts` (about every 15s), for example.

It's necessary for recoverypoints
Re: shared tempfile was not removed on statement_timeout
Thomas Munro writes:
>> +1, this seems like a good idea. This is a little bit like the code
>> near the comments "Don't joggle the elbow of proc_exit".

> So that gives a very simple back-patchable patch.

Hmm, so is the *rest* of that function perfectly okay with being interrupted?

			regards, tom lane
Re: Is Recovery actually paused?
On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar wrote:
>
> On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote:
> >
> > On Thu, 28 Jan 2021 09:55:42 +0530
> > Dilip Kumar wrote:
> >
> > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote:
> > > > >
> > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > Dilip Kumar wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada wrote:
> > > > > > >
> > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy wrote:
> > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when
> > > > > > > > > "is" exists in a function, I expect a boolean output.
> > > > > > > > > Others may have better thoughts.
> > > > > > > >
> > > > > > > > Maybe we should leave the existing function
> > > > > > > > pg_is_wal_replay_paused() alone and add a new one with the
> > > > > > > > name you suggest that returns text. That would create less
> > > > > > > > burden for tool authors.
> > > > > > >
> > > > > > > +1
> > > > > >
> > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > >
> > > > > This means pg_is_wal_replay_paused is left without any change and
> > > > > this returns whether pause is requested or not? If so, it seems
> > > > > good to modify the documentation of this function in order to note
> > > > > that this could not return the actual pause state.
> > > >
> > > > Yes, we can say that it will return true if the replay pause is
> > > > requested. I am changing that in my new patch.
> > >
> > > I have modified the patch; changes:
> > >
> > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > the pause request state.
> > > - Now, we are not waiting for the recovery to actually get paused, so
> > > I think it doesn't make sense to put a lot of checkpoints to check for
> > > the pause request, so I have removed that check from
> > > recoveryApplyDelay, but I think it is better we still keep that check
> > > in WaitForWalToBecomeAvailable because it can wait forever before the
> > > next WAL gets available.
> >
> > I think basically the check in WaitForWalToBecomeAvailable is independent
> > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > actual pause state. This function could just return 'pause requested'
> > if a pause is requested during waiting for WAL.
> >
> > However, I agree with the change to allow recovery to transition the
> > state to 'paused' during WAL waiting, because 'paused' has more useful
> > information for users than 'pause requested'. Returning 'paused' lets
> > users know clearly that no more WAL is applied until recovery is
> > resumed. On the other hand, when 'pause requested' is returned, the user
> > can't say whether the next WAL will be applied or not from this
> > information.
> >
> > For the same reason, I think it is also useful to call recoveryPausesHere
> > in recoveryApplyDelay.
>
> IMHO WaitForWalToBecomeAvailable can wait until the next WAL becomes
> available, so it cannot be controlled by the user, and it is good to put
> a check for the recovery pause there; however, recoveryApplyDelay waits
> for the apply delay which is configured by the user and is a predictable
> value to the user. I don't have much objection to putting that check in
> recoveryApplyDelay as well, but I feel it is not necessary. Any other
> thoughts on this?
>
> > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > if a parameter value has a problem. However, pg_get_wal_replay_pause_state
> > will return 'pause requested' in this case. So, I think, we should pass
> > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().
>
> Yeah, absolutely right, it must pass RECOVERY_PAUSED. I will change
> this, thanks for noticing this.

I have changed this in the new patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v9-0001-Provide-a-new-interface-to-get-the-recovery-pause.patch
Description: Binary data
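For reference, the text-returning function being discussed might look roughly like the sketch below. This is a hypothetical illustration, not the attached patch: RECOVERY_PAUSED and RECOVERY_PAUSE_REQUESTED are the states named in the thread, while GetRecoveryPauseState() and RECOVERY_NOT_PAUSED are my assumed names.

Datum
pg_get_wal_replay_pause_state(PG_FUNCTION_ARGS)
{
	const char *state = "not paused";

	/* GetRecoveryPauseState() assumed: reads the shared pause state. */
	switch (GetRecoveryPauseState())
	{
		case RECOVERY_NOT_PAUSED:
			state = "not paused";
			break;
		case RECOVERY_PAUSE_REQUESTED:
			state = "pause requested";
			break;
		case RECOVERY_PAUSED:
			state = "paused";
			break;
	}

	PG_RETURN_TEXT_P(cstring_to_text(state));
}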
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
On Fri, Jan 22, 2021 at 12:32 PM James Hilliard wrote:
>
> Fixes:
> gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
> -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security
> -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -O2
> -I../../../../src/include -isysroot
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
> -c -o fd.o fd.c
> fd.c:3661:10: warning: 'pwritev' is only available on macOS 11.0 or newer
> [-Wunguarded-availability-new]
> part = pg_pwritev(fd, iov, iovcnt, offset);
>        ^~
> ../../../../src/include/port/pg_iovec.h:49:20: note: expanded from macro
> 'pg_pwritev'
>        ^~~
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include/sys/uio.h:104:9:
> note: 'pwritev' has been marked as being introduced in macOS 11.0
> here, but the deployment target is macOS 10.15.0
> ssize_t pwritev(int, const struct iovec *, int, off_t)
> __DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0),
> watchos(7.0), tvos(14.0));
> ^
> fd.c:3661:10: note: enclose 'pwritev' in a __builtin_available check to
> silence this warning
> part = pg_pwritev(fd, iov, iovcnt, offset);
>        ^~
> ../../../../src/include/port/pg_iovec.h:49:20: note: expanded from macro
> 'pg_pwritev'
>        ^~~
> 1 warning generated.
>
> This results in a runtime error:
> running bootstrap script ... dyld: lazy symbol binding failed: Symbol not
> found: _pwritev
> Referenced from: /usr/local/pgsql/bin/postgres
> Expected in: /usr/lib/libSystem.B.dylib
>
> dyld: Symbol not found: _pwritev
> Referenced from: /usr/local/pgsql/bin/postgres
> Expected in: /usr/lib/libSystem.B.dylib
>
> child process was terminated by signal 6: Abort trap: 6
>
> To fix this we set -Werror=unguarded-availability-new so that a
> declaration check for preadv/pwritev will fail if the symbol is
> unavailable on the requested SDK version.
> ---
> Changes v2 -> v3:
> - Replace compile check with AC_CHECK_DECLS
> - Fix preadv detection as well
> Changes v1 -> v2:
> - Add AC_LIBOBJ(pwritev) when pwritev not available
> - set -Werror=unguarded-availability-new for CXX flags as well
> ---
>  configure                   | 164 ++--
>  configure.ac                |   9 +-
>  src/include/pg_config.h.in  |  14 +--
>  src/include/port/pg_iovec.h |   4 +-
>  src/tools/msvc/Solution.pm  |   4 +-
>  5 files changed, 157 insertions(+), 38 deletions(-)
>
> diff --git a/configure b/configure
> index 8af4b99021..07a9b08d80 100755
> --- a/configure
> +++ b/configure
> @@ -5373,6 +5373,98 @@ if test x"$pgac_cv_prog_CC_cflags__Werror_vla" = x"yes"; then
>  fi
>
>
> +  # Prevent usage of symbols marked as newer than our target.
> +
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports -Werror=unguarded-availability-new, for CFLAGS" >&5
> +$as_echo_n "checking whether ${CC} supports -Werror=unguarded-availability-new, for CFLAGS... " >&6; }
> +if ${pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  pgac_save_CFLAGS=$CFLAGS
> +pgac_save_CC=$CC
> +CC=${CC}
> +CFLAGS="${CFLAGS} -Werror=unguarded-availability-new"
> +ac_save_c_werror_flag=$ac_c_werror_flag
> +ac_c_werror_flag=yes
> +cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +
> +int
> +main ()
> +{
> +
> +  ;
> +  return 0;
> +}
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new=yes
> +else
> +  pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +ac_c_werror_flag=$ac_save_c_werror_flag
> +CFLAGS="$pgac_save_CFLAGS"
> +CC="$pgac_save_CC"
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" >&5
> +$as_echo "$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" >&6; }
> +if test x"$pgac_cv_prog_CC_cflags__Werror_unguarded_availability_new" = x"yes"; then
> +  CFLAGS="${CFLAGS} -Werror=unguarded-availability-new"
> +fi
> +
> +
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports -Werror=unguarded-availability-new, for CXXFLAGS" >&5
> +$as_echo_n "checking whether ${CXX} supports -Werror=unguarded-availability-new, for CXXFLAGS... " >&6; }
> +if ${pgac_cv_prog_CXX_cxxflags__Werror_unguarded_availability_new+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  pgac_save_CXXFLAGS=$CXXFLAGS
> +pgac_save_CXX=$CXX
> +CXX=${CXX}
> +CXXFLAGS="${CXXFLAGS} -Werror=unguarded-availability-new"
> +ac_save_cxx_werror_flag=$ac_cxx_werror_flag
> +ac_cxx_werror_flag=yes
> +ac_ext=cpp
> +ac_cpp='$CXX