Re: Asynchronous Append on postgres_fdw nodes.
On Mon, Dec 14, 2020 at 4:01 PM Kyotaro Horiguchi wrote: > At Sat, 12 Dec 2020 18:25:57 +0900, Etsuro Fujita > wrote in > > On Fri, Nov 20, 2020 at 3:51 PM Kyotaro Horiguchi > > wrote: > > > The reason for > > > the early fetching is letting fdw send the next request as early as > > > possible. (However, I didn't measure the effect of the > > > nodeAppend-level prefetching.) > > > > I agree that that would lead to an improved efficiency in some cases, > > but I still think that that would be useless in some other cases like > > SELECT * FROM sharded_table LIMIT 1. Also, I think the situation > > would get worse if we support Append on top of joins or aggregates > > over ForeignScans, which would be more expensive to perform than these > > ForeignScans. > > I'm not sure which gain we weigh on, but if doing "LIMIT 1" on Append > for many times is more common than fetching all or "LIMIT multiples of fetch_size>", that discussion would be convincing... Is > it really the case? I don't have a clear answer for that... Performance in the case you mentioned would be improved by async execution without prefetching by Append, so it seemed reasonable to me to remove that prefetching to avoid unnecessary overheads in the case I mentioned. BUT: I started to think my proposal, which needs an additional FDW callback routine (ie, ForeignAsyncBegin()), might be a bad idea, because it would increase the burden on FDW authors. > > If we do prefetching, I think it would be better that it’s the > > responsibility of the FDW to do prefetching, and I think that that > > could be done by letting the FDW to start another data fetch, > > independently of the core, in the ForeignAsyncNotify callback routine, > > FDW does prefetching (if it means sending request to remote) in my > patch, so I agree to that. It suspect that you were intended to say > the opposite. The core (ExecAppendAsyncGetNext()) controls > prefetching in your patch. No. That function just tries to retrieve a tuple from any of the ready subplans (ie, subplans marked as as_needrequest). > > which I revived from Robert's original patch. I think that that would > > be more efficient, because the FDW would no longer need to wait until > > all buffered tuples are returned to the core. In the WIP patch, I > > I don't understand. My patch sends a prefetch-query as soon as all the > tuples of the last remote-request is stored into FDW storage. The > reason for removing ExecAsyncNotify() was it is just redundant as far > as concerning Append asynchrony. But I particulary oppose to revive > the function. Sorry, my explanation was not good, but what I'm saying here is about my patch, not your patch. I think this FDW callback routine would be useful; it allows an FDW to perform another asynchronous data fetch before delivering a tuple to the core as discussed in [1]. Also, it would be useful when extending to the case where we have intermediate nodes between an Append and a ForeignScan such as joins or aggregates, which I'll explain below. > > only allowed the callback routine to put the corresponding ForeignScan > > node into a state where it’s either ready for a new request or needing > > a callback for another data fetch, but I think we could probably relax > > the restriction so that the ForeignScan node can be put into another > > state where it’s ready for a new request while needing a callback for > > the prefetch. > > I don't understand this, too. 
ExecAsyncNotify() doesn't touch any of > the bitmaps, as_needrequest, callback_pending nor as_asyncpending in > your patch. Am I looking into something wrong? I'm looking > async-wip-2020-11-17.patch. In the WIP patch I post, these bitmaps are modified in the core side based on the callback_pending and request_complete flags in AsyncRequests returned from FDWs (See ExecAppendAsyncEventWait()). > (By the way, it is one of those that make the code hard to read to me > that the "callback" means "calling an API function". I think none of > them (ExecAsyncBegin, ExecAsyncRequest, ExecAsyncNotify) are a > "callback".) I thought the word “callback” was OK, because these functions would call the corresponding FDW callback routines, but I’ll revise the wording. > > The reason why I disabled async execution when executing EPQ is to > > avoid sending asynchronous queries to the remote sides, which would be > > useless, because scan tuples for an EPQ recheck are obtained in a > > dedicated way. > > If EPQ is performed onto Append, I think it should gain from > asynchronous execution since it is going to fetch *a* tuple from > several partitions or children. I believe EPQ doesn't contain Append > in major cases, though. (Or I didn't come up with the steps for the > case to happen...) Sorry, I don’t understand this part. Could you elaborate a bit more on it? > > What do you mean by "push-up style executor"? > > The reverse of the volcano-style executor, which enters from the > topmost node and down to the
Re: Asynchronous Append on postgres_fdw nodes.
On Mon, Dec 14, 2020 at 5:56 PM Kyotaro Horiguchi wrote: > At Sat, 12 Dec 2020 19:06:51 +0900, Etsuro Fujita > wrote in > > On Fri, Nov 20, 2020 at 8:16 PM Kyotaro Horiguchi > > wrote: > > > + /* wait or poll async events */ > > > + if (!bms_is_empty(node->as_asyncpending)) > > > + { > > > + Assert(!node->as_syncdone); > > > + Assert(bms_is_empty(node->as_needrequest)); > > > + ExecAppendAsyncEventWait(node); > > > > > > You moved the function to wait for events from execAsync to > > > nodeAppend. The former is a generic module that can be used from any > > > kind of executor nodes, but the latter is specialized for nodeAppend. > > > In other words, the abstraction level is lowered here. What is the > > > reason for the change? > > > > The reason is just because that function is only called from > > ExecAppend(). I put some functions only called from nodeAppend.c in > > execAsync.c, though. > > (I think) You told me that you preferred the genericity of the > original interface, but you're doing the opposite. If you think we > can move such a generic feature to a part of Append node, all other > features can be move the same way. I guess there's a reason you want > only the this specific feature out of all of them be Append-spcific > and I want to know that. The reason is that I’m thinking to add a small feature for multiplexing Append subplans, not a general feature for async execution as discussed in [1], because this would be an interim solution until the executor rewrite is done. > > I think that when an async-aware node gets a tuple from an > > async-capable node, they should use ExecAsyncRequest() / > > ExecAyncHogeResponse() rather than ExecProcNode() [1]. I modified the > > patch so that ExecAppendAsyncResponse() is called from Append, but to > > support bubbling up the plan tree discussed in [2], I think it should > > be called from ForeignScans (the sides of async-capable nodes). Am I > > right? Anyway, I’ll rename ExecAppendAyncResponse() to the one > > proposed in Robert’s original patch. > > Even though I understand the concept but to make work it we need to > remember the parent *async* node somewhere. In my faint memory the > very early patch did something like that. > > So I think just providing ExecAsyncResponse() doesn't make it > true. But if we make it true, it would be something like > partially-reversed steps from what the current Exec*()s do for some of > the existing nodes and further code is required for some other nodes > like WindowFunction. Bubbling up works only in very simple cases where > a returned tuple is thrown up to further parent as-is or at least when > the node convers a tuple into another shape. If an async-receiver node > wants to process multiple tuples from a child or from multiple > children, it is no longer be just a bubbling up. I explained the meaning of “bubbling up the plan tree” in a previous email I sent a moment ago. > And.. I think the reason I feel uneasy for the patch may be that the > patch uses the interface names in somewhat different context. > Origianlly the fraemework resides in-between executor nodes, not on a > node of either side. ExecAsyncNotify() notifies the requestee about an > event and ExecAsyncResonse() notifies the requestor about a new > tuple. I don't feel strangeness in this usage. But this patch feels to > me using the same names in different (and somewhat wrong) context. Sorry, this is a WIP patch. Will fix. > > > + /* Perform the actual callback. */ > > > + ExecAsyncNotify(areq); > > > > > > Mmm. 
The usage of the function (or its name) looks completely reverse > > > to me. > > As mentioned in a previous email, this is an FDW callback routine > > revived from Robert’s patch. I think the naming is reasonable, > > because the callback routine notifies the FDW of readiness of a file > > descriptor. And actually, the callback routine tells the core whether > > the corresponding ForeignScan node is ready for a new request or not, > > by setting the callback_pending flag accordingly. > > Hmm. Agreed. The word "callback" is also used there [3]... I > remember and it seems reasonable that the core calls AsyncNotify() on > FDW and the FDW calls ExecForeignScan as a response to it and notify > back to core of that using ExecAsyncRequestDone(). But the patch here > feels a little strange, or uneasy, to me. I’m not sure what I should do to improve the patch. Could you elaborate a bit more on this part? > > > postgres_fdw.c > > > > > > > postgresIterateForeignScan(ForeignScanState *node) > > > > { > > > > PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state; > > > > TupleTableSlot *slot = node->ss.ss_ScanTupleSlot; > > > > > > > > /* > > > >* If this is the first call after Begin or ReScan, we need to > > > > create the > > > >
Re: Proposed patch for key managment
Hi Stephen On Fri, 18 Dec 2020 at 21:36, Stephen Frost wrote: > > Greetings Alastair, > > * Alastair Turner (min...@decodable.me) wrote: > > On Wed, 16 Dec 2020 at 22:43, Stephen Frost wrote: ... > > passphrase key wrapper, the secret store and the cloud/HW KMS. > > > > Since the examples expand the purpose of cluster_passphrase_command, > > let's call it cluster_key_challenge_command in the examples. > > These ideas don't seem too bad but I'd think we'd pass which key we want > on the command-line using a %i or something like that, rather than using > stdin, unless there's some reason that would be an issue..? Certainly > the CLI utilities I've seen tend to expect the name of the secret that > you're asking for to be passed on the command line. Agreed. I was working on the assumption that the calling process (initdb or pg_ctl) would have access to the challenge material and be passing it to the utility, so putting it in a command line would allow it to leak. If the utility is accessing the challenge material directly, then just an indicator of which key to work with would be a lot simpler, true. > > > Starting with the passphrase key wrapper, since it's what's in place now. > > > > - cluster_key_challenge_command = 'password_key_wrapper %q' > > I do tend to like this idea of having what > cluster_key_challenge_command, or whatever we call it, expects is an > actual key and have the command that is run be a separate command that > takes the passphrase and runs the KDF (key derivation function) on it, > when a passphrase is what the user wishes to use. > > That generally makes more sense to me than having the key derivation > effort built into the backend which I have a hard time seeing any > particular reason for, as long we're already calling out to some > external utility of some kind to get the key. > ... > > With the thought of trying to keep it a reasonably simple interface, I > had a thought along these lines: > > - Separate command that runs the KDF, this is simple enough as it's > really just a hash, and it returns the key on stdout. > - initdb time option that says if we're going to have PG manage the > sub-keys, or not. > - cluster_key_command defined as "a command that is passed the ID of > the key, or keys, required for the cluster to start" > > initdb --pg-subkeys > - Calls cluster_key_command once with "$PG_SYSTEM_ID-main" or similar > and expects the main key to be provided on stdout. Everything else > is then more-or-less as is today: PG generates DEK sub-keys for data > and WAL and then encrypts them with the main key and stores them. > > As long as the enveloped keys file exists on the filesystem, when PG > starts, it'll call the cluster_key_command and will expect the 'main' > key to be provided and it'll then decrypt and verify the DEK sub-keys, > very similar to today. > > In this scenario, cluster_key_command might be set to call a command > which accepts a passphrase and runs a KDF on it, or it might be set to > call out to an external vaulting system or to a Yubikey, etc. > > initdb --no-pg-subkeys > - Calls cluster_key_command for each of the sub-keys that "pg-subkeys" > would normally generate itself, passing "$PG_SYSTEM_ID-keyid" for > each (eg: $PG_SYSTEM_ID-data, $PG_SYSTEM_ID-wal), and getting back > the keys on stdout to use. > > When PG starts, it sees that the enveloped keys file doesn't exist and > does the same as initdb did- calls cluster_key_command multiple times to > get the keys which are needed. 
We'd want to make sure to have a way > early on that checks that the data DEK provided actually decrypts the > cluster, and bail out otherwise, before actually writing any data with > that key. I'll note though that this approach would actually allow you > to change the WAL key, if you wanted to, though, which could certainly > be nice (naturally there would be various caveats about doing so and > that replicas would have to also be switched to the new key, etc, but > that all seems reasonably solvable). Having a stand-alone utility that > could do that for the --pg-subkeys case would be useful too (and just > generally decrypt it for viewing/backup/replacement/etc). > > Down the line this might even allow us to do an online re-keying, at > least once the challenges around enabling online data checksums are > sorted out, but I don't think that's something to worry about today. > Still, it seems like this potentially provides a pretty clean interface > for that to happen eventually. > Changing the WAL key independently of the local storage key is the big operational advantage I see of managing the two keys separately. It lays the basis for a full key rotation story. Rotating WAL keys during switchover between a pair with the new key applicable from a certain timeline number maye be close enough to online to satisfy most requirements. Not something to worry about today, as you say. > > Right, I think we can agree on this aspect an
Re: Proposed patch for key managment
Hi Bruce On Sat, 19 Dec 2020 at 02:38, Bruce Momjian wrote: > > I am not going be as kind. Our workflow is: > > Desirability -> Design -> Implement -> Test -> Review -> Commit > https://wiki.postgresql.org/wiki/Todo#Development_Process > > I have already asked about the first item, and received a reply talking > about the design --- that is not helpful. I only have so much patience > for the "I want my secret decoder ring to glow in the dark" type of > feature additions to this already complex feature. Unless we stay on > Desirability, I will not be replying. If you can't answer the > Desirability question, well, talking about items farther right on the > list is not very helpful. > > Now, I will say that your question about how a KMS will use this got me > thinking about how to test this, and that got me to implement the AWS > Secret Manager script, so that we definitely helpful. My point is that > I don't think it is helpful to get into long discussions unless the > Desirability is clearly answered. This is not just related to this > thread --- this kind of jump-over-Desirability has happened a lot in > relation to this feature, so I thought it would be good to clearly state > it now. > Sorry, I have waved Desirability through under the headings of ease of adoption or not raising barriers to adoption, without detailing what barriers I see or how to avoid them. I also realise that "don't scare the users" is so open-ended so as to be actively unhelpful and very quickly starts to sound like "I want my secret decoder ring to glow pink, or blue, or green when anyone asks". Given the complexity of the feature and pixels spilled in discussing it, I understand that it gets frustrating. That said, I believe there is an important case to be made here. In summary, I believe that forcing an approach for key generation and wrapping onto users of Cluster File Encryption limits the Desirability of the feature. Cluster File Encryption for Postgres would be Desirable to many users I meet if, and only if, the generation and handling of keys fits with their corporate policies. Therefore, giving a user the option to pass an encryption key to Postgres for CFE is Desirable. I realise that the option for feeding the keys in directly is an additional feature, and that it has a user experience impact if a passphrase is used. It is, however, a feature which opens up near-infinite flexibility. To stretch the analogy, it is the clip for attaching coloured or opaque coverings to the glowy bit on the secret decoder ring. The generation of keys and the wrapping of keys are contentious issues and are frequently addressed/specified in corporate security policies, standards and technical papers (NIST 800-38F is often mentioned in product docs). There are, for instance, policies in the wild which require that keys for long term use are generated from the output of a True Random Number Generator - the practical implication being that the key used to encrypt data at rest should be created by an HSM. When and why to use symmetric or asymmetric cyphers for key wrapping is another item which different organisations have different policies on - the hardware and cloud service vendors all offer both options, even if they recommend one and make it easier to use. Thanks Alastair
Re: Double partition lock in bufmgr
On 19.12.2020 10:53, Zhihong Yu wrote: Hi, w.r.t. the code in BufferAlloc(), the pointers are compared. Should we instead compare the tranche Id of the two LWLock ? Cheers As far as LWlocks are stored in the array, comparing indexes in this array (tranche Id) is equivalent to comparing element's pointers. So I do not see any problem here. Just as experiment I tried a version of BufferAlloc without double locking (patch is attached). I am not absolutely sure that my patch is correct: my main intention was to estimate influence of this buffer reassignment on performance. I just run standard pgbench for database with scale 100 and default shared buffers size (256Mb). So there are should be a lot of page replacements. I do not see any noticeable difference: vanilla: 13087.596845 patch: 13184.442130 diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index ad0d1a9..91eb93d 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -70,6 +70,8 @@ #define RELS_BSEARCH_THRESHOLD 20 +#define NO_DOUBLE_BUFFER_LOCK 1 + typedef struct PrivateRefCountEntry { Buffer buffer; @@ -197,6 +199,7 @@ static PrivateRefCountEntry *NewPrivateRefCountEntry(Buffer buffer); static PrivateRefCountEntry *GetPrivateRefCountEntry(Buffer buffer, bool do_move); static inline int32 GetPrivateRefCount(Buffer buffer); static void ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref); +static void InvalidateBuffer(BufferDesc *buf); /* * Ensure that the PrivateRefCountArray has sufficient space to store one more @@ -1093,12 +1096,22 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0); +#if NO_DOUBLE_BUFFER_LOCK + if (buf_state & BM_TAG_VALID) + { + InvalidateBuffer(buf); /* releases spinlock */ + continue; + } +#endif /* Must copy buffer flags while we still hold the spinlock */ oldFlags = buf_state & BUF_FLAG_MASK; /* Pin the buffer and then release the buffer spinlock */ PinBuffer_Locked(buf); +#if NO_DOUBLE_BUFFER_LOCK + Assert(!(oldFlags & (BM_DIRTY|BM_TAG_VALID))); +#else /* * If the buffer was dirty, try to write it out. There is a race * condition here, in that someone might dirty it after we released it @@ -1216,6 +1229,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, } } else +#endif { /* if it wasn't valid, we need only the new partition */ LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
Re: multi-install PostgresNode
On 12/17/20 7:55 PM, Michael Paquier wrote: > On Thu, Dec 17, 2020 at 04:37:54PM -0500, Andrew Dunstan wrote: >> The proposed module would look something like this: >> >> [...] >> >> use parent PostgresNode; >> >> sub get_new_node >> { >> my $installpath= shift; >> my $node = PostgresNode::get_new_node(@_); >> bless $node; # re-bless into current class >> $node->{_installpath} = $installpath; >> return $node; >> } > Passing down the installpath as argument and saving it within a > PostgresNode or child class looks like the correct way of doing things > to me. This would require an extra routine to be able to get the > install path from a node as _installpath would remain internal to the > module file, right? Shouldn't it be something that ought to be > directly part of PostgresNode actually, where we could enforce the lib > and bin paths to the output of pg_config if an _installpath is not > provided by the caller? In short, I am not sure that we need an extra > module here. > >> and then for each class method in PostgresNode.pm we'd have an override >> something like: >> >> sub foo >> { >> my $node=shift; >> my $inst = $node->{_installpath}; >> local %ENV = %ENV; >> $ENV{PATH} = "$inst/bin:$ENV{PATH}"; >> $ENV{LD_LIBRARY_PATH} = "$inst/lib:$ENV{LD_LIBRARY_PATH}"; >> $node->SUPER::foo(@_); >> } >> >> There might be more elegant ways of doing this, but that's what I came >> up with. > As long as it does not become necessary to pass down _installpath to > all indidivual binary calls we have in PostgresNode or the extra > module, this gets a +1 from me. So, if I am getting that right, the > key point is the use of local %ENV here to make sure that PATH and > LD_LIBRARY_PATH are only enforced when it comes to calls within > PostgresNode.pm, right? That's an elegant solution. This is > something I have wanted for a long time for pg_upgrade to be able to > get rid of its test.sh. > >> My main question is: do we want something like this in the core code >> (presumably in src/test/perl), or is it not of sufficiently general >> interest? If it's wanted I'll submit a patch, probably for the March CF, >> but January if I manage to get my running shoes on. If not, I'll put it >> in the buildfarm code, but then any TAP tests that want it will likewise >> need to live there. > This facility gives us the possibility to clean up the test code of > pg_upgrade and move it to a TAP test, so I'd say that it is worth > having in the core code in the long-term. This turns out to be remarkably short, with the use of a little eval magic. Give the attached, this test program works just fine: #!/bin/perl use PostgresNodePath; $ENV{PG_REGRESS} = '/home/andrew/pgl/vpath.12/src/test/regress/pg_regress'; my $node = get_new_node('/home/andrew/pgl/inst.12.5711','blurfl'); print $node->info; print $node->connstr(),"\n"; $node->init(); cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com PostgresNodePath.pm Description: Perl program
Re: Proposed patch for key managment
On Sat, Dec 19, 2020 at 11:45:15AM +, Alastair Turner wrote: > Sorry, I have waved Desirability through under the headings of ease of > adoption or not raising barriers to adoption, without detailing what > barriers I see or how to avoid them. I also realise that "don't scare > the users" is so open-ended so as to be actively unhelpful and very > quickly starts to sound like "I want my secret decoder ring to glow > pink, or blue, or green when anyone asks". Given the complexity of the > feature and pixels spilled in discussing it, I understand that it gets > frustrating. That said, I believe there is an important case to be > made here. I am pleased you understood my feelings on this. Our last big discussion on this topic is here: https://www.postgresql.org/message-id/flat/CA%2Bfd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD%3D54Zp6NBQTCQ%40mail.gmail.com and it was so unproductive that we started having closed voice calls every other Friday so we could discuss this without lots of "decoder ring" ideas that had to be explained. The result is our wiki page: https://wiki.postgresql.org/wiki/Transparent_Data_Encryption It has taken me a while to understand why this topic seems to almost uniquely gravitate toward "decoder ring" discussion. > In summary, I believe that forcing an approach for key generation and > wrapping onto users of Cluster File Encryption limits the Desirability > of the feature. > > Cluster File Encryption for Postgres would be Desirable to many users > I meet if, and only if, the generation and handling of keys fits with > their corporate policies. Therefore, giving a user the option to pass > an encryption key to Postgres for CFE is Desirable. I realise that the > option for feeding the keys in directly is an additional feature, and > that it has a user experience impact if a passphrase is used. It is, > however, a feature which opens up near-infinite flexibility. To > stretch the analogy, it is the clip for attaching coloured or opaque > coverings to the glowy bit on the secret decoder ring. > > The generation of keys and the wrapping of keys are contentious issues > and are frequently addressed/specified in corporate security policies, > standards and technical papers (NIST 800-38F is often mentioned in > product docs). There are, for instance, policies in the wild which > require that keys for long term use are generated from the output of a > True Random Number Generator - the practical implication being that > the key used to encrypt data at rest should be created by an HSM. When > and why to use symmetric or asymmetric cyphers for key wrapping is > another item which different organisations have different policies on > - the hardware and cloud service vendors all offer both options, even > if they recommend one and make it easier to use. To enable the direct injection of keys into the server, we would need a new command for this, since trying to make the passphrase command do this will lead to unnecessary complexity. The passphrase command should do one thing, and you can't have it changing its behavior based on parameters, since the parameters are for the script to process, not to change behavior of the server. If we wanted to do this, we would need a new command, and one of the parameters would be %k or something that would identify the key number we want to retrieve from the KMS. Stephen pointed out that we could still validate that key; the key would not be stored wrapped in the file system, but we could store an HMAC in the file system, and use that for validation. 
On other interesting approach would be to allow the server to call out for a KMS when it needs to generate the initial data keys that are wrapped by the passphrase; this is in contrast to calling the KMS everytime it needs the data keys. We should create code for the general use-case, which is currently passphrase-wrapped system-generated data keys, and then get feedback on what else we need. However, I should point out that the community is somewhat immune to the "my company needs this" kind of argument without logic to back it up, though sometimes I think this entire feature is in that category. The data keys are generated using this random value code: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/port/pg_strong_random.c;hb=HEAD so someone would have to explain why generating this remotely in a KMS is superior, not just based on company policy. My final point is that we can find ways to do what you are suggesting as an addition to what we are adding now. What we need is clear justification of why these additional features are needed. Requiring the use of a true random number generator is a valid argument, but we need to figure out, once this is implemented, who really wants that, and how to implement it cleanly. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cu
RE: libpq @windows : leaked singlethread_lock makes AppVerifier unhappy
Hi

Here is my workaround (from unit_tests.dll DLL_PROCESS_DETACH):

//3. Destroy LIBPQ! static pthread_mutex_t singlethread_lock
HMODULE hLeakedLibPQ = ::GetModuleHandleA("libpq.dll"); //libpq.dll v13.0.1.20323 (https://ftp.postgresql.org/pub/odbc/versions/msi/psqlodbc_13_00_.zip)
if (hLeakedLibPQ) {
    void **singlethread_lock_ptr = (void **)(((BYTE *)hLeakedLibPQ) +
#ifdef _WIN64
        0x484b8
#else
        0x3F26C
#endif //_WIN64
        );
    if (*singlethread_lock_ptr) {
        DeleteCriticalSection((LPCRITICAL_SECTION)(*singlethread_lock_ptr));
        typedef void(*pthread_mutex_destroy)(void *mutex);
        pthread_mutex_destroy freemtx = (pthread_mutex_destroy)::GetProcAddress(hLeakedLibPQ, "PQfreemem");
        assert(freemtx != NULL);
        if (freemtx) freemtx(*singlethread_lock_ptr);
    }
}
v10 release notes for extended stats
2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients
2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats
2017-05-12 [bc085205c] Change CREATE STATISTICS syntax

The existing notes say:

|Add multi-column optimizer statistics to compute the correlation ratio and number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera)
|New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
|This feature is helpful in estimating query memory usage and when combining the statistics from individual columns.

"correlation ratio" is referring to stxkind=d (dependencies), right? That's very unclear.

"helpful in estimating query memory usage": I guess it means that this allows the planner to correctly account for large vs small number of GROUP BY values, but it sounds more like it's going to help a user to estimate memory use.

"when combining the statistics from individual columns." this is referring to stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user to know that.

Also, maybe it should say "combining stats from columns OF THE SAME TABLE".

So I propose:

|Allow creation of multi-column statistics objects, for computing the
|dependencies between columns and number of distinct values of combinations of columns
|(Tomas Vondra,
|David Rowley, Álvaro Herrera)
|The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS.
|Improved statistics allow the planner to generate better query plans with more accurate
|estimates of the row count and memory usage when grouping by multiple
|columns, and more accurate estimates of the row count if WHERE clauses apply
|to multiple columns and values of some columns are correlated with values of
|other columns.
Re: v10 release notes for extended stats
On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote: > 2017-03-24 [7b504eb28] Implement multivariate n-distinct coefficients > 2017-04-05 [2686ee1b7] Collect and use multi-column dependency stats > 2017-05-12 [bc085205c] Change CREATE STATISTICS syntax > > The existing notes say: > |Add multi-column optimizer statistics to compute the correlation ratio and > number of distinct values (Tomas Vondra, David Rowley, Álvaro Herrera) > |New commands are CREATE STATISTICS, ALTER STATISTICS, and DROP STATISTICS. > |This feature is helpful in estimating query memory usage and when combining > the statistics from individual columns. > > "correlation ratio" is referring to stxkind=d (dependencies), right ? That's > very unclear. > > "helpful in estimating query memory usage": I guess it means that this allows > the planner to correctly account for large vs small number of GROUP BY values, > but it sounds more like it's going to help a user to estimate memory use. > > "when combining the statistics from individual columns." this is referring to > stxkind=d, handling correlated/redundant clauses, but it'd be hard for a user > to know that. > > Also, maybe it should say "combining stats from columns OF THE SAME TABLE". > > So I propose: > |Allow creation of multi-column statistics objects, for computing the > |dependencies between columns and number of distinct values of combinations > of columns > |(Tomas Vondra, |David Rowley, Álvaro Herrera) > |The new commands are CREATE STATISTICS, ALTER STATISTICS, and DROP > STATISTICS. > |Improved statistics allow the planner to generate better query plans with > more accurate > |estimates of the row count and memory usage when grouping by multiple > |columns, and more accurate estimates of the row count if WHERE clauses apply > |to multiple columns and values of some columns are correlated with values of > |other columns. Uh, at the time, that was the best text we could come up with. We don't usually go back to update them unless there is a very good reason, and I am not seeing that above. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Proposed patch for key managment
On Sat, Dec 19, 2020 at 11:58:37AM -0500, Bruce Momjian wrote: > My final point is that we can find ways to do what you are suggesting as > an addition to what we are adding now. What we need is clear > justification of why these additional features are needed. Requiring > the use of a true random number generator is a valid argument, but we > need to figure out, once this is implemented, who really wants that, and > how to implement it cleanly. I added a comment to this script to explain how to generate a true random passphrase, attached. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee pass_fd.sh Description: Bourne shell script
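For readers who want to try the "true random" route discussed above without the attached script, here is a minimal standalone sketch. It draws 32 bytes from the operating system's entropy pool and hex-encodes them; the use of /dev/urandom is an assumption (POSIX-style systems), and this is only an illustration, not a replacement for pass_fd.sh or an HSM-backed generator.

/*
 * Sketch only: print a 256-bit random passphrase as hex, using the
 * OS entropy source.  /dev/urandom is assumed to exist (POSIX-ish
 * systems); a policy-compliant setup might use an HSM instead.
 */
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    unsigned char buf[32];      /* 256 bits of entropy */
    FILE       *f = fopen("/dev/urandom", "rb");

    if (f == NULL)
    {
        perror("/dev/urandom");
        return EXIT_FAILURE;
    }
    if (fread(buf, 1, sizeof(buf), f) != sizeof(buf))
    {
        fprintf(stderr, "could not read enough random bytes\n");
        fclose(f);
        return EXIT_FAILURE;
    }
    fclose(f);

    for (size_t i = 0; i < sizeof(buf); i++)
        printf("%02x", buf[i]);
    printf("\n");
    return EXIT_SUCCESS;
}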
Re: Weird special case in jsonb_concat()
I wrote: > However, further experimentation found a case that fails: > regression=# select '3'::jsonb || '{}'::jsonb; > ERROR: invalid concatenation of jsonb objects > I wonder what is the point of this weird exception, and whether > whoever devised it can provide a concise explanation of what > they think the full behavior of "jsonb || jsonb" is. Why isn't > '[3, {}]' a reasonable result here, if the cases above are OK? Here is a proposed patch for that. It turns out that the third else-branch in IteratorConcat() already does the right thing, if we just remove its restrictive else-condition and let it handle everything except the two-objects and two-arrays cases. But it seemed to me that trying to handle both the object || array and array || object cases in that one else-branch was poorly thought out: only one line of code can actually be shared, and it took several extra lines of infrastructure to support the sharing. So I split those cases into separate else-branches. This also addresses the inadequate documentation that was the original complaint. Thoughts? Should we back-patch this? The existing behavior seems to me to be inconsistent enough to be arguably a bug, but we've not had field complaints saying "this should work". regards, tom lane diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index df29af6371..5f3c393a25 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14715,8 +14715,12 @@ table2-mapping Concatenates two jsonb values. -Concatenating two objects generates an object with the union of their +Concatenating two arrays generates an array containing all the +elements of each input. Concatenating two objects generates an +object containing the union of their keys, taking the second object's value when there are duplicate keys. +All other cases are treated by converting a non-array input into a +single-element array, and then proceeding as for two arrays. Does not operate recursively: only the top-level array or object structure is merged. @@ -14727,6 +14731,14 @@ table2-mapping '{"a": "b"}'::jsonb || '{"c": "d"}'::jsonb {"a": "b", "c": "d"} + + +'[1, 2]'::jsonb || '3'::jsonb +[1, 2, 3] + + +'{"a": "b"}'::jsonb || '42'::jsonb +[{"a": "b"}, 42] diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index 7a25415078..5beade00f4 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -4690,11 +4690,14 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2, rk2 = JsonbIteratorNext(it2, &v2, false); /* - * Both elements are objects. + * JsonbIteratorNext reports raw scalars as if they were single-element + * arrays; hence we only need consider "object" and "array" cases here. */ if (rk1 == WJB_BEGIN_OBJECT && rk2 == WJB_BEGIN_OBJECT) { /* + * Both elements are objects. + * * Append all the tokens from v1 to res, except last WJB_END_OBJECT * (because res will not be finished yet). */ @@ -4703,18 +4706,18 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2, pushJsonbValue(state, r1, &v1); /* - * Append all the tokens from v2 to res, include last WJB_END_OBJECT - * (the concatenation will be completed). + * Append all the tokens from v2 to res, including last WJB_END_OBJECT + * (the concatenation will be completed). Any duplicate keys will + * automatically override the value from the first object. */ while ((r2 = JsonbIteratorNext(it2, &v2, true)) != WJB_DONE) res = pushJsonbValue(state, r2, r2 != WJB_END_OBJECT ? 
&v2 : NULL); } - - /* - * Both elements are arrays (either can be scalar). - */ else if (rk1 == WJB_BEGIN_ARRAY && rk2 == WJB_BEGIN_ARRAY) { + /* + * Both elements are arrays. + */ pushJsonbValue(state, rk1, NULL); while ((r1 = JsonbIteratorNext(it1, &v1, true)) != WJB_END_ARRAY) @@ -4731,46 +4734,40 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2, res = pushJsonbValue(state, WJB_END_ARRAY, NULL /* signal to sort */ ); } - /* have we got array || object or object || array? */ - else if (((rk1 == WJB_BEGIN_ARRAY && !(*it1)->isScalar) && rk2 == WJB_BEGIN_OBJECT) || - (rk1 == WJB_BEGIN_OBJECT && (rk2 == WJB_BEGIN_ARRAY && !(*it2)->isScalar))) + else if (rk1 == WJB_BEGIN_OBJECT) { - JsonbIterator **it_array = rk1 == WJB_BEGIN_ARRAY ? it1 : it2; - JsonbIterator **it_object = rk1 == WJB_BEGIN_OBJECT ? it1 : it2; - bool prepend = (rk1 == WJB_BEGIN_OBJECT); + /* + * We have object || array. + */ + Assert(rk2 == WJB_BEGIN_ARRAY); pushJsonbValue(state, WJB_BEGIN_ARRAY, NULL); - if (prepend) - { - pushJsonbValue(state, WJB_BEGIN_OBJECT, NULL); - while ((r1 = JsonbIteratorNext(it_object, &v1, true)) != WJB_DONE) -pushJs
Re: v10 release notes for extended stats
Bruce Momjian writes: > On Sat, Dec 19, 2020 at 01:39:27PM -0600, Justin Pryzby wrote: >> So I propose: > Uh, at the time, that was the best text we could come up with. We don't > usually go back to update them unless there is a very good reason, and I > am not seeing that above. Yeah, it's a couple years too late to be worth spending effort on improving the v10 notes, I fear. If there's text in the main documentation that could be improved, that's a different story. regards, tom lane
pg_preadv() and pg_pwritev()
Hello hackers, I want to be able to do synchronous vectored file I/O, so I made wrapper macros for preadv() and pwritev() with fallbacks for systems that don't have them. Following the precedent of the pg_pread() and pg_pwrite() macros, the "pg_" prefix reflects a subtle contract change: the fallback paths might have the side effect of changing the file position. They're non-standard system calls, but the BSDs and Linux have had them for a long time, and for other systems we can use POSIX readv()/writev() with an additional lseek(). The worst case is Windows (and maybe our favourite antique Unix build farm animal?) which has none of those things, so there is a further fallback to a loop. Windows does have ReadFileScatter() and WriteFileGather(), but those only work for overlapped (= asynchronous), unbuffered, page aligned access. They'll very likely be useful for native AIO+DIO support in the future, but don't fit the bill here. This is part of a project to consolidate and offload I/O (about which more soon), but seemed isolated enough to post separately and I guess it could be independently useful. From 4f4e26d0e2c023d62865365e3cb5ce1c89e63cb0 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 15:48:31 +1300 Subject: [PATCH 07/11] Add pg_preadv() and pg_pwritev(). Provide synchronous scatter/gather I/O routines. These map to preadv(), pwritev() with various fallbacks for systems that don't have them. --- configure | 30 ++-- configure.ac | 9 +++-- src/include/pg_config.h.in | 15 ++ src/include/port.h | 37 ++ src/port/Makefile | 2 ++ src/port/pread.c | 41 -- src/port/pwrite.c | 41 -- src/tools/msvc/Solution.pm | 5 + 8 files changed, 146 insertions(+), 34 deletions(-) diff --git a/configure b/configure index cc2089e596..d25b8f2900 100755 --- a/configure +++ b/configure @@ -13199,7 +13199,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h fi -for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h wctype.h +for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h do : as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" @@ -15293,7 +15293,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale wcstombs_l +for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pread preadv pstat pthread_is_threaded_np pwrite pwritev readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale wcstombs_l writev do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" @@ -15970,32 +15970,6 @@ esac fi -ac_fn_c_check_func 
"$LINENO" "pread" "ac_cv_func_pread" -if test "x$ac_cv_func_pread" = xyes; then : - $as_echo "#define HAVE_PREAD 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pread.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pread.$ac_objext" - ;; -esac - -fi - -ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite" -if test "x$ac_cv_func_pwrite" = xyes; then : - $as_echo "#define HAVE_PWRITE 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pwrite.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pwrite.$ac_objext" - ;; -esac - -fi - ac_fn_c_check_func "$LINENO" "random" "ac_cv_func_random" if test "x$ac_cv_func_random" = xyes; then : $as_echo "#define HAVE_RANDOM 1" >>confdefs.h diff --git a/configure.ac b/configure.ac index c819aeab26..9c21e46ec8 100644 --- a/configure.ac +++ b/configure.ac @@ -1342,6 +1342,7 @@ AC_CHECK_HEADERS(m4_normalize([ sys/shm.h sys/sockio.h sys/tas.h + sys/uio.h sys/un.h termios.h ucred.h @@ -1671,9 +1672,14 @@ AC_CHECK_FUNCS(m4_normalize([ poll posix_fallocate ppoll + pread + preadv pstat pthread_is_threaded_np + pwrite + pwritev readlink + readv setproctitle setproctitle_fast setsid @@ -1684,6 +1690,7 @@ AC_CHE
Re: pg_preadv() and pg_pwritev()
Thomas Munro writes:
> I want to be able to do synchronous vectored file I/O, so I made
> wrapper macros for preadv() and pwritev() with fallbacks for systems
> that don't have them. Following the precedent of the pg_pread() and
> pg_pwrite() macros, the "pg_" prefix reflects a subtle contract
> change: the fallback paths might have the side effect of changing the
> file position.

In a quick look, seems OK with some nits:

1. port.h cannot assume that <limits.h> has already been included; nor do I want to fix that by including <limits.h> there. Do we really need to define a fallback value of IOV_MAX? If so, maybe the answer is to put the replacement struct iovec and IOV_MAX in some new header.

2. I'm not really that happy about loading <sys/uio.h> into every compilation we do, which would be another reason for a new specialized header that either includes <sys/uio.h> or provides fallback definitions.

3. The patch as given won't prove anything except that the code compiles. Is it worth fixing at least one code path to make use of pg_preadv and pg_pwritev, so we can make sure this code is tested before there's layers of other new code on top?

regards, tom lane
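As a side note for readers following the thread, the fallback contract described above — emulate pwritev() with an lseek() followed by writev(), accepting that the file position moves — can be sketched in a few lines. This is an illustration only, not the patch's pg_pwritev(); the function name, file name and demo data are made up.

/*
 * Illustration of the fallback idea: emulate pwritev() with
 * lseek() + writev().  Unlike a real pwritev(), this moves the
 * file position -- the "subtle contract change" noted above.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

static ssize_t
pwritev_fallback(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    return writev(fd, iov, iovcnt);
}

int
main(void)
{
    struct iovec iov[2];
    int         fd = open("demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);

    iov[0].iov_base = "hello ";
    iov[0].iov_len = strlen("hello ");
    iov[1].iov_base = "vectored world\n";
    iov[1].iov_len = strlen("vectored world\n");

    if (fd < 0 || pwritev_fallback(fd, iov, 2, 0) < 0)
        perror("pwritev_fallback demo");
    if (fd >= 0)
        close(fd);
    return 0;
}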
Re: Deleting older versions in unique indexes to avoid page splits
On Wed, Dec 9, 2020 at 5:12 PM Peter Geoghegan wrote: > Most of the real changes in v11 (compared to v10) are in heapam.c. > I've completely replaced the table_compute_xid_horizon_for_tuples() > interface with a new interface that supports all existing requirements > (from index deletions that support LP_DEAD deletion), while also > supporting these new requirements (e.g. bottom-up index deletion). I came up with a microbenchmark that is designed to give some general sense of how helpful it is to include "extra" TIDs alongside LP_DEAD-in-index TIDs (when they happen to point to the same table block as the LP_DEAD-in-index TIDs, which we'll have to visit anyway). The basic idea is simple: Add custom instrumentation that summarizes each B-Tree index deletion in LOG output, and then run the regression tests. Attached in the result of this for a "make installcheck-world". If you're just looking at this thread for the first time in a while: note that what I'm about to go into isn't technically bottom-up deletion (though you will see some of that in the full log output if you look). It's a closely related optimization that only appeared in recent versions of the patch. So I'm comparing the current approach to simple deletion of LP_DEAD-marked index tuples to a new enhanced approach (that makes it a little more like bottom-up deletion, but only a little). Here is some sample output (selected almost at random from a text file consisting of 889 lines of similar output): ... exact TIDs deleted 17, LP_DEAD tuples 4, LP_DEAD-related table blocks 2 ) ... exact TIDs deleted 38, LP_DEAD tuples 28, LP_DEAD-related table blocks 1 ) ... exact TIDs deleted 39, LP_DEAD tuples 1, LP_DEAD-related table blocks 1 ) ... exact TIDs deleted 44, LP_DEAD tuples 42, LP_DEAD-related table blocks 3 ) ... exact TIDs deleted 6, LP_DEAD tuples 2, LP_DEAD-related table blocks 2 ) (The initial contents of each line were snipped here, to focus on the relevant metrics.) Here we see that the actual number of TIDs/index tuples deleted often *vastly* exceeds the number of LP_DEAD-marked tuples (which are all we would have been able to delete with the existing approach of just deleting LP_DEAD items). It's pretty rare for us to fail to at least delete a couple of extra TIDs. Clearly this technique is broadly effective, because in practice there are significant locality-related effects that we can benefit from. It doesn't really matter that it's hard to precisely describe all of these locality effects. IMV the question that really matters is whether or not the cost of trying is consistently very low (relative to the cost of our current approach to simple LP_DEAD deletion). We do need to understand that fully. It's tempting to think about this quantitatively (and it also bolsters the case for the patch), but that misses the point. The right way to think about this is as a *qualitative* thing. The presence of LP_DEAD bits gives us certain reliable information about the nature of the index tuple (that it is dead-to-all, and therefore safe to delete), but it also *suggests* quite a lot more than that. In practice bloat is usually highly correlated/concentrated, especially when we limit our consideration to workloads where bloat is noticeable in any practical sense. So we're very well advised to look for nearby deletable index tuples in passing -- especially since it's practically free to do so. (Which is what the patch does here, of course.) 
Let's compare this to an extreme variant of my patch that runs the same test, to see what changes. Consider a variant that exhaustively checks every index tuple on the page at the point of a simple LP_DEAD deletion operation, no matter what. Rather than only including those TIDs that happen to be on the same heap/table blocks (and thus are practically free to check), we include all of them. This design isn't acceptable in the real world because it does a lot of unnecessary I/O, but that shouldn't invalidate any comparison I make here. This is still a reasonable approximation of a version of the patch with magical foreknowledge of where to find dead TIDs. It's an Oracle (ahem) that we can sensibly compare to the real patch within certain constraints. The results of this comparative analysis seem to suggest something important about the general nature of what's going on. The results are: There are only 844 deletion operations total with the Oracle. While this is less than the actual patch's 889 deletion operations, you would expect a bigger improvement from using what is after all supposed to apply magic! This suggests to me that the precise intervention of the patch here (the new LP_DEAD deletion stuff) has an outsized impact. The correlations that naturally exist make this asymmetrical payoff-to-cost situation possible. Simple cheap heuristics once again go a surprisingly long way, kind of like bottom-up deletion itself. -- Peter Geoghegan delete_stats_regression_tests.txt.gz Description: application/gz
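To make the heuristic being measured here concrete for readers skimming the thread: whenever an LP_DEAD index tuple forces a visit to some table block, every other index tuple on the same leaf page whose TID points into that same block is practically free to check as well. The toy sketch below shows just that selection step; it is not the patch's heapam.c code, and the struct name and data are invented for illustration.

/*
 * Toy illustration of the "extra TIDs" selection: visit-worthy tuples
 * are the LP_DEAD ones plus any tuple sharing a table block with an
 * LP_DEAD tuple, since that block must be read anyway.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct ItemPointer { unsigned block; unsigned offset; } ItemPointer;

#define NTUPLES 8

int
main(void)
{
    /* TIDs of the index tuples on one leaf page (made-up data) */
    ItemPointer tids[NTUPLES] = {
        {10, 1}, {10, 2}, {11, 5}, {12, 3}, {12, 4}, {12, 9}, {13, 1}, {14, 2}
    };
    /* Which of them were marked LP_DEAD by earlier index scans */
    bool lp_dead[NTUPLES] = {true, false, false, true, false, false, false, false};

    for (int i = 0; i < NTUPLES; i++)
    {
        bool check = lp_dead[i];

        /* Not LP_DEAD itself, but does it share a block with one that is? */
        for (int j = 0; !check && j < NTUPLES; j++)
            if (lp_dead[j] && tids[j].block == tids[i].block)
                check = true;

        if (check)
            printf("check TID (%u,%u)%s\n", tids[i].block, tids[i].offset,
                   lp_dead[i] ? " [LP_DEAD]" : " [extra, same block]");
    }
    return 0;
}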
Re: pg_preadv() and pg_pwritev()
On Sun, Dec 20, 2020 at 12:34 PM Tom Lane wrote: > Thomas Munro writes: > > I want to be able to do synchronous vectored file I/O, so I made > > wrapper macros for preadv() and pwritev() with fallbacks for systems > > that don't have them. Following the precedent of the pg_pread() and > > pg_pwrite() macros, the "pg_" prefix reflects a subtle contract > > change: the fallback paths might have the side effect of changing the > > file position. > > In a quick look, seems OK with some nits: Thanks for looking! > 1. port.h cannot assume that has already been included; > nor do I want to fix that by including there. Do we > really need to define a fallback value of IOV_MAX? If so, > maybe the answer is to put the replacement struct iovec and > IOV_MAX in some new header. Ok, I moved all this stuff into port/pg_uio.h. > 2. I'm not really that happy about loading into > every compilation we do, which would be another reason for a > new specialized header that either includes or > provides fallback definitions. Ack. > 3. The patch as given won't prove anything except that the code > compiles. Is it worth fixing at least one code path to make > use of pg_preadv and pg_pwritev, so we can make sure this code > is tested before there's layers of other new code on top? OK, here's a patch to zero-fill fresh WAL segments with pwritev(). I'm drawing a blank on trivial candidate uses for preadv(), without infrastructure from later patches. From d7178f296642e177bc57fabe93abec395cbaac5f Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Thu, 26 Nov 2020 15:48:31 +1300 Subject: [PATCH v2 1/2] Add pg_preadv() and pg_pwritev(). Provide synchronous scatter/gather I/O routines. These map to preadv(), pwritev() with various fallbacks for systems that don't have them. Reviewed-by: Tom Lane Discussion: https://postgr.es/m/CA%2BhUKGJA%2Bu-220VONeoREBXJ9P3S94Y7J%2BkqCnTYmahvZJwM%3Dg%40mail.gmail.com --- configure | 30 ++ configure.ac | 9 +-- src/include/pg_config.h.in | 15 +++ src/include/port.h | 2 ++ src/include/port/pg_uio.h | 51 ++ src/port/Makefile | 2 ++ src/port/pread.c | 43 ++-- src/port/pwrite.c | 43 ++-- src/tools/msvc/Solution.pm | 5 9 files changed, 166 insertions(+), 34 deletions(-) create mode 100644 src/include/port/pg_uio.h diff --git a/configure b/configure index 11a4284e5b..5887c712cc 100755 --- a/configure +++ b/configure @@ -13061,7 +13061,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h fi -for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h wctype.h +for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h do : as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" @@ -15155,7 +15155,7 @@ fi LIBS_including_readline="$LIBS" LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'` -for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pstat pthread_is_threaded_np readlink setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink 
sync_file_range uselocale wcstombs_l +for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pread preadv pstat pthread_is_threaded_np pwrite pwritev readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink sync_file_range uselocale wcstombs_l writev do : as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" @@ -15832,32 +15832,6 @@ esac fi -ac_fn_c_check_func "$LINENO" "pread" "ac_cv_func_pread" -if test "x$ac_cv_func_pread" = xyes; then : - $as_echo "#define HAVE_PREAD 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pread.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pread.$ac_objext" - ;; -esac - -fi - -ac_fn_c_check_func "$LINENO" "pwrite" "ac_cv_func_pwrite" -if test "x$ac_cv_func_pwrite" = xyes; then : - $as_echo "#define HAVE_PWRITE 1" >>confdefs.h - -else - case " $LIBOBJS " in - *" pwrite.$ac_objext "* ) ;; - *) LIBOBJS="$LIBOBJS pwrite.$ac_objext" - ;; -esac - -fi - ac_fn_c_check_func "$LINENO" "random" "ac_cv_func_random" if test "x$ac_cv_func_random" = xyes; then : $as_echo
Re: [PATCH] Logical decoding of TRUNCATE
On Sat, Apr 07, 2018 at 07:40:11PM -0400, Peter Eisentraut wrote: > Committed with those changes. Since commit 039eb6e added logical replication support for TRUNCATE, logical apply of the TRUNCATE fails if it chooses a parallel index build: cat >/tmp/most_parallel.conf <
Re: Weird special case in jsonb_concat()
so 19. 12. 2020 v 21:35 odesílatel Tom Lane napsal: > I wrote: > > However, further experimentation found a case that fails: > > regression=# select '3'::jsonb || '{}'::jsonb; > > ERROR: invalid concatenation of jsonb objects > > I wonder what is the point of this weird exception, and whether > > whoever devised it can provide a concise explanation of what > > they think the full behavior of "jsonb || jsonb" is. Why isn't > > '[3, {}]' a reasonable result here, if the cases above are OK? > > Here is a proposed patch for that. It turns out that the third > else-branch in IteratorConcat() already does the right thing, if > we just remove its restrictive else-condition and let it handle > everything except the two-objects and two-arrays cases. But it > seemed to me that trying to handle both the object || array > and array || object cases in that one else-branch was poorly > thought out: only one line of code can actually be shared, and it > took several extra lines of infrastructure to support the sharing. > So I split those cases into separate else-branches. > > This also addresses the inadequate documentation that was the > original complaint. > > Thoughts? Should we back-patch this? The existing behavior > seems to me to be inconsistent enough to be arguably a bug, > but we've not had field complaints saying "this should work". > +1 Pavel > regards, tom lane > >
Re: On login trigger: take three
čt 17. 12. 2020 v 19:30 odesílatel Pavel Stehule napsal: > > > čt 17. 12. 2020 v 14:04 odesílatel Konstantin Knizhnik < > k.knizh...@postgrespro.ru> napsal: > >> >> >> On 17.12.2020 9:31, Pavel Stehule wrote: >> >> >> >> st 16. 12. 2020 v 20:38 odesílatel Pavel Stehule >> napsal: >> >>> Attached please find new versoin of the patch based on >>> on_connect_event_trigger_WITH_SUGGESTED_UPDATES.patch >>> So there is still only "disable_client_connection_trigger" GUC? because we need possibility to disable client connect triggers and there is no such need for other event types. As you suggested have added "dathaslogontriggers" flag which is set when client connection trigger is created. This flag is never cleaned (to avoid visibility issues mentioned in my previous mail). >>> >>> This is much better - I don't see any slowdown when logon trigger is not >>> defined >>> >>> I did some benchmarks and looks so starting language handler is >>> relatively expensive - it is about 25% and starting handler like event >>> trigger has about 35%. But this problem can be solved later and elsewhere. >>> >>> I prefer the inverse form of disable_connection_trigger. Almost all GUC >>> are in positive form - so enable_connection_triggger is better >>> >>> I don't think so current handling dathaslogontriggers is good for >>> production usage. The protection against race condition can be solved by >>> lock on pg_event_trigger >>> >> >> I thought about it, and probably the counter of connect triggers will be >> better there. The implementation will be simpler and more robust. >> >> >> >> >> I prefer to implement different approach: unset dathaslogontriggers flag >> in event trigger itself when no triggers are returned by >> EventTriggerCommonSetup. >> I am using double checking to prevent race condition. >> The main reason for this approach is that dropping of triggers is not >> done in event_trigger.c: it is done by generic code for all resources. >> It seems to be there is no right place where decrementing of trigger >> counters can be inserted. >> Also I prefer to keep all logon-trigger specific code in one file. >> >> Also, as you suggested, I renamed disable_connection_trigger to >> enable_connection_trigger. >> >> >> New version of the patch is attached. >> > > yes, it can work > for complete review the new field in pg_doc should be documented Regards Pavel > Regards > > Pavel > > > > >> >> >> >> -- >> Konstantin Knizhnik >> Postgres Professional: http://www.postgrespro.com >> The Russian Postgres Company >> >>
Re: pg_preadv() and pg_pwritev()
Thomas Munro writes:
> OK, here's a patch to zero-fill fresh WAL segments with pwritev().
> I'm drawing a blank on trivial candidate uses for preadv(), without
> infrastructure from later patches.

This looks OK to me. I tried it on prairiedog (has writev and pwrite but not pwritev) as well as gaur (has only writev). They seem happy.

One minor thought is that in

+ struct iovec iov[Min(IOV_MAX, 1024)]; /* cap stack space */

it seems like pretty much every use of IOV_MAX would want some similar cap. Should we centralize that idea with, say,

#define PG_IOV_MAX Min(IOV_MAX, 1024)

? Or will the plausible cap vary across uses?

regards, tom lane
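A sketch of the centralized cap being proposed, assuming the hypothetical PG_IOV_MAX name from the message above and spelling out Min() so the fragment stands alone:

#include <limits.h>             /* for IOV_MAX, where the OS provides it */

#ifndef IOV_MAX
#define IOV_MAX 16              /* illustrative fallback only */
#endif

/* Cap on-stack iovec array sizes, per the suggestion above. */
#define PG_IOV_MAX (IOV_MAX < 1024 ? IOV_MAX : 1024)

A caller would then declare struct iovec iov[PG_IOV_MAX]; instead of sizing the array with the raw, potentially very large IOV_MAX.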
Re: Weird special case in jsonb_concat()
On Sat, Dec 19, 2020, at 21:35, Tom Lane wrote:
>Here is a proposed patch for that.

I've tested the patch and "All 202 tests passed". In addition, I've tested it on a json intensive project, which passes all its own tests. I haven't studied the jsonfuncs.c code in detail, but the new code looks much cleaner, nice.

>This also addresses the inadequate documentation that was the
>original complaint.

Looks good. In addition, to the user wondering how to append a json array-value "as is", I think it would be useful to provide an example on how to do this in the documentation. I think there is a risk users will attempt much more fragile hacks to achieve this, if we don't provide guidance in the documentation.

Suggestion:

'["a", "b"]'::jsonb || '["a", "d"]'::jsonb
["a", "b", "a", "d"]
+
+'["a", "b"]'::jsonb || jsonb_build_array('["a", "d"]'::jsonb)
+["a", "b", ["a", "d"]]
+
'{"a": "b"}'::jsonb || '{"c": "d"}'::jsonb
{"a": "b", "c": "d"}

> Thoughts? Should we back-patch this? The existing behavior
> seems to me to be inconsistent enough to be arguably a bug,
> but we've not had field complaints saying "this should work".

+1 back-patch, I think it's a bug.

Best regards,

Joel