Re: Comment fix of config_default.pl
On Fri, Jul 12, 2019 at 05:01:41PM +0900, Michael Paquier wrote:
> I would also patch GetFakeConfigure in Solution.pm (no need to send a
> new patch), and I thought that you'd actually do the change. What do
> you think?

OK, applied, as I have been able to look at it again and after fixing
the portion for GetFakeConfigure. Thanks!
--
Michael
Re: [PATCH] Implement uuid_version()
On 13/7/19 8:31, Fabien COELHO wrote:
> Hello Jose,

Hello, Fabien. Thanks for taking a look.

>> Got it, and done. Please find attached a v2 patch with the upgrade
>> script included.
>
> Patch v2 applies cleanly. Compiles cleanly (once running configure
> --with-uuid=...). Local make check ok. Doc build ok.
>
> There are no tests. I'd suggest adding some under sql/ (and changing
> expected/ accordingly) which would return all possible values,
> including a call to pg_random_uuid(), which should return 4.
>
> Documentation describes uuid_version(); should it not describe
> uuid_version(namespace uuid)? I'd suggest adding an example.
>
> The extension update script seems ok, but ISTM that "uuid-ossp-1.1.sql"
> should be replaced with an updated "uuid-ossp-1.2.sql".

This was a quite naive approach to the issue on my part, more a
"scratch my own itch" than anything else, but it definitely sparked
some interest. Thanks to all involved.

Considering the later arguments on-list, I plan on submitting a more
elaborate patchset integrating the feedback received so far, along the
following lines:

- uuid type, v4 generation and uuid_version() in core
- Provide a means to rename/supersede extensions keeping backwards
  compatibility (i.e. uuid-ossp -> uuid, keep old code working)
- Miscellaneous other functionality
- Drop "dead" code

...but I've tried to keep quiet so as not to disturb too much around
release time. I intend to continue working on this in late July, aiming
for the following commitfest (once the more "urgent" patches have
landed).

Thanks again,
J.L.
Re: POC: Cleaning up orphaned files using undo logs
On Fri, Jul 12, 2019 at 7:08 PM Robert Haas wrote:
> On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila wrote:
>> Apart from this, the duplicate key (ex. for size queues the size of
>> two requests can be the same) handling might need some work.
>> Basically, either a special combine function needs to be written (not
>> sure yet what we should do there) or we always need to ensure that
>> the key is unique, like (size + start_urec_ptr). If the size is the
>> same, then we can decide based on start_urec_ptr.
>
> I think that this problem is somewhat independent of whether we use an
> rbtree or a binaryheap or some other data structure.

I think then I am missing something, because what I am talking about is
the below code in rbt_insert():

rbt_insert()
{
..
	cmp = rbt->comparator(data, current, rbt->arg);
	if (cmp == 0)
	{
		/*
		 * Found node with given key.  Apply combiner.
		 */
		rbt->combiner(current, data, rbt->arg);
		*isNew = false;
		return current;
	}
..
}

If you see, it doesn't add the duplicate key to the tree, and that is
not the case with binaryheap as far as I can understand.

> I would be inclined to use XID as a tiebreak for the size queue, so
> that it's more likely to match the ordering of the XID queue, but if
> that's inconvenient, then some other arbitrary value like
> start_urec_ptr should be fine.

I think it would be better to use start_urec_ptr, because XID can be
non-unique in our case. As I explained in one of the emails above [1],
we register the requests for logged and unlogged relations separately,
so XID can be non-unique.

>> I think even if we currently go with a binary heap, it will be
>> possible to change it to an rbtree later, but I am fine either way.
>
> Well, I don't see much point in revising all of this logic twice. We
> should pick the way we want it to work and make it work that way.

Yeah, I agree. So, I am assuming here that, as you have discussed this
idea with Andres offlist, he is on board with changing it as he
originally suggested using binaryheap. Andres, do let us know if you
think differently here. It would be good if anyone else following the
thread could also weigh in.

[1] - https://www.postgresql.org/message-id/CAA4eK1LEKyPZD5Dy4j1u2smUUyMzxgC2YLj8E%2BaJpsvG7sVJYA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
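To make the tiebreak idea concrete, a comparator along these lines
would keep every key in the size queue distinct — a sketch only;
UndoRequestNode and its field names are illustrative, not taken from
the actual patch set:

/*
 * Sketch of a size-queue comparator with start_urec_ptr as a tiebreak,
 * so that rbt_insert() never sees two "equal" keys and the combiner
 * path is never reached.
 */
typedef struct UndoRequestNode
{
	RBTNode		rbtnode;		/* rbtree node header, must be first */
	Size		request_size;	/* primary key: size of the undo request */
	UndoRecPtr	start_urec_ptr; /* tiebreak: unique per request */
} UndoRequestNode;

static int
undo_size_comparator(const RBTNode *a, const RBTNode *b, void *arg)
{
	const UndoRequestNode *ra = (const UndoRequestNode *) a;
	const UndoRequestNode *rb = (const UndoRequestNode *) b;

	if (ra->request_size != rb->request_size)
		return (ra->request_size < rb->request_size) ? -1 : 1;
	if (ra->start_urec_ptr != rb->start_urec_ptr)
		return (ra->start_urec_ptr < rb->start_urec_ptr) ? -1 : 1;
	return 0;					/* same request */
}

With such a comparator, no combine semantics need to be invented at
all, since the combiner branch in rbt_insert() becomes unreachable.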
Re: refactoring - share str2*int64 functions
On Mon, Jul 8, 2019 at 3:22 PM Thomas Munro wrote:
> Here's some semi-automated feedback, noted while going through
> failures on cfbot.cputube.org. You have a stray editor file
> src/backend/parser/parse_node.c.~1~. Something is failing to compile
> while doing the temp-install in make check-world, which probably
> indicates that some test or contrib module is using the interface you
> changed?

Please disregard the comment about the ".~1~" file, my mistake. As for
the check-world failure, it's here:

pg_stat_statements.c:1024:11: error: implicit declaration of function
'pg_strtouint64' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
        rows = pg_strtouint64(completionTag + 5, NULL, 10);
               ^

Apparently it needs to include common/string.h.

--
Thomas Munro
https://enterprisedb.com
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On 7/12/19 2:45 PM, Bruce Momjian wrote:
> On Fri, Jul 12, 2019 at 12:41:19PM -0600, Ryan Lambert wrote:
>>>> I vote for AES 256 rather than 128.
>>>
>>> Why? This page seems to think 128 is sufficient:
>>>
>>> https://crypto.stackexchange.com/questions/20/what-are-the-practical-differences-between-256-bit-192-bit-and-128-bit-aes-enc
>>>
>>> For practical purposes, 128-bit keys are sufficient to ensure
>>> security. The larger key sizes exist mostly to satisfy some US
>>> military regulations which call for the existence of several
>>> distinct "security levels", regardless of whether breaking the
>>> lowest level is already far beyond existing technology.
>>
>> After researching AES key sizes a bit more my vote is (surprisingly?)
>> for AES-128. My reasoning is about security; I did not consider
>> performance impacts in my decision.
>
> Thank you for this exhaustive research.

First of all, that is a mischaracterization of the issue. That paper
also states:

"we describe several key derivation attacks of practical complexity on
AES-256 when its number of rounds is reduced to approximately that of
AES-128. The best previously published attacks on such variants were
far from practical, requiring 4 related keys and 2^120 time to break a
9-round version of AES-256 [9], and 64 related keys and 2^172 time to
break a 10-round version of AES-256 ([9], see also [2]). In this paper
we describe an attack on 9-round AES-256 which can find its complete
256-bit key in 2^39 time by using only the simplest type of related
keys (in which the chosen plaintexts are encrypted under two keys whose
XOR difference can be chosen in many different ways). Our best attack
on 10-round AES-256 requires only two keys and 2^45 time, but it uses a
stronger type of related subkey attack. These attacks can be extended
into a quasi-practical 2^70 attack on 11-round AES, and into a trivial
2^26 attack on 8-round AES."

Notice 2 key things about this:

1. The attacks described are against a reduced number of rounds.
   AES-256 is 14 rounds, not 9 or 10.
2. They are "related key" attacks, which can be avoided by not using
   related keys, and we certainly should not be doing that.

Also, please read the links that Sehrope sent earlier if you have not
done so. In particular this one:

https://www.ecrypt.eu.org/csa/documents/PQC-whitepaper.pdf

which says:

"Post-quantum cryptography is an area of cryptography in which systems
are studied under the security assumption that the attacker has a
quantum computer. This attack model is interesting because Shor has
shown a quantum algorithm that breaks RSA, ECC, and finite field
discrete logarithms in polynomial time. This means that in this model
all commonly used public-key systems are no longer secure. Symmetric
cryptography is also affected but significantly less. For systems that
do not rely on mathematical structures the main effect is that an
algorithm due to Grover halves the security level, i.e., breaking
AES-128 takes 2^64 quantum operations while current attacks take 2^128
steps. While this is a big change, it can be managed quite easily by
doubling the key sizes, e.g., by deploying AES-256. The operations
needed in Grover's algorithm are inherently sequential, which has led
some to doubt that even 2^64 quantum operations are feasible, but since
the remedy of changing to larger key sizes is very inexpensive it is
generally recommended to do so."
In addition, that first paper was written in 2010, yet in 2016 NSA
published one of the other documents referenced by Sehrope:

https://apps.nsa.gov/iaarchive/customcf/openAttachment.cfm?FilePath=/iad/library/ia-guidance/ia-solutions-for-classified/algorithm-guidance/assets/public/upload/CNSA-Suite-and-Quantum-Computing-FAQ.pdf&WpKes=aF6woL7fQp3dJiyWTFKrYn3ZZShmLnzECSjJhf

which states:

"If you have already implemented Suite B algorithms using the larger
(for TOP SECRET) key sizes, you should continue to use those algorithms
and key sizes through this upcoming transition period. In many products
changing to a larger key size can be done via a configuration change.
Implementations using only the algorithms previously approved for
SECRET and below in Suite B should not be used in NSS. In more precise
terms this means that NSS (National Security Systems) should no longer
use:

• ECDH and ECDSA with NIST P-256
• SHA-256
• AES-128
• RSA with 2048-bit keys
• Diffie-Hellman with 2048-bit keys"

I stand by my position. At a minimum, we need a choice of AES-128 and
AES-256.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
Re: [PATCH] Implement uuid_version()
On 2019-07-13 08:08, Fabien COELHO wrote:
> About doc: I'd consider "generation" instead of "generating" as a
> secondary index term.

We do use the "-ing" form for other secondary index terms. It's useful
because the concatenation of primary and secondary term should usually
make a phrase of some sort. The alternative would be "generation of",
but that doesn't seem clearly better.

>> (There is also precedent for redirecting the extension function to
>> the internal one by changing the SQL-level function definition using
>> CREATE OR REPLACE FUNCTION ... LANGUAGE INTERNAL. But that seems
>> more complicated and would require a new extension version. It could
>> maybe be included if the extension version is changed for other
>> reasons.)
>
> What about avoiding a redirection with something like:
>
> Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;

That seems very confusing.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: initdb recommendations
On 2019-07-11 21:34, Julien Rouhaud wrote:
>> Note that with this change, running initdb without arguments will now
>> error on those platforms: You need to supply either a password or
>> select a different default authentication method.
>
> Should we make this explicitly stated in the documentation? As a
> reference, it's saying:
>
> The default client authentication setup is such that users can
> connect over the Unix-domain socket to the same database user name as
> their operating system user names (on operating systems that support
> this, which are most modern Unix-like systems, but not Windows) and
> otherwise with a password. To assign a password to the initial
> database superuser, use one of initdb's -W, --pwprompt or --pwfile
> options.

Do you have a suggestion for where to put this and exactly how to
phrase it? I think the initdb reference page would be more appropriate
than runtime.sgml.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Bad canonicalization for dateranges with 'infinity' bounds
On Fri, May 3, 2019 at 12:49 AM Laurenz Albe wrote:
> I propose the attached patch which fixes the problem.

Hi Laurenz,

I agree that the patch makes the code match the documentation. The
documented behaviour seems to make more sense than the code, since
unpatched master gives this nonsense result when it flips the inclusive
flag but doesn't adjust the value (because it can't):

postgres=# select '(-infinity,infinity]'::daterange @> 'infinity'::date;
 ?column?
----------
 f
(1 row)

-    if (!upper.infinite && upper.inclusive)
+    if (!(upper.infinite || DATE_NOT_FINITE(upper.val)) && upper.inclusive)

Even though !(X || Y) is equivalent to !X && !Y, by my reading of
range_in(), lower.value can be uninitialised when lower.infinite is
true, and it's also a bit hard to read IMHO, so I'd probably write that
as !upper.infinite && !DATE_NOT_FINITE(upper.val) && upper.inclusive.
I don't think it can affect the result, but it might upset Valgrind or
similar.

--
Thomas Munro
https://enterprisedb.com
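Spelled out, the upper-bound branch of daterange_canonical() would then
read roughly as follows — a sketch of the suggested form only; the
committed version may differ:

/*
 * Sketch: skip the +1 adjustment when the bound is infinite *or* a
 * finite-type 'infinity' date, using the flattened test suggested
 * above.
 */
if (!upper.infinite && !DATE_NOT_FINITE(upper.val) && upper.inclusive)
{
	upper.val = DirectFunctionCall2(date_pli, upper.val, Int32GetDatum(1));
	upper.inclusive = false;
}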
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On 7/11/19 9:05 PM, Bruce Momjian wrote:
> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>> Our first implementation will encrypt the entire cluster. We can
>>> later consider encryption per table or tablespace. It is unclear if
>>> encrypting different parts of the system with different keys is
>>> useful or feasible. (This is separate from key rotation.)
>>
>> I still object strongly to using a single key for the entire
>> database. I think we can use a single key for WAL, but we need some
>> way to split the heap so that multiple keys are used. If not by
>> tablespace, then some other method.
>
> What do you base this on?

I have stated this more than once, and you and Stephen discussed it
earlier as well, but I will find all the links back into the thread and
references and address this in a separate email.

>> Regardless of the method to split the heap into different keys, I
>> think there should be an option for some tables to not be encrypted.
>> If we decide it must be all or nothing for the first implementation I
>> guess I could live with it, but I would be very disappointed.
>
> What does it mean you "could live with it"? Why do you consider having
> some tables unencrypted important?

I think it is pretty obvious, isn't it? You have talked about the
performance impact. Why would I want to encrypt, for example, a lookup
table if there is nothing in that table warranting encryption? I think
in many if not most applications the sensitive data is limited to much
less than all of the tables, and I'd rather not take the hit for those
tables.

>> The keys themselves should be in a file which is encrypted by a
>> master key. Obtaining the master key should be patterned after the
>> GUC ssl_passphrase_command.
>>
>>> We will use CBC AES128 mode for tables/indexes, and CTR AES128 for
>>> WAL. 8k pages will use the LSN as a nonce, which will be encrypted
>>> to generate the initialization vector (IV). We will not encrypt the
>>> first 16 bytes of each page so the LSN can be used in this way. The
>>> WAL will use the WAL file segment number as the nonce and the IV
>>> will be created in the same way.
>>
>> I vote for AES 256 rather than 128.
>
> Why? This page seems to think 128 is sufficient:

Addressed in the other email.

>> Thinking out loud (and I believe somewhere in this massive thread
>> someone else already said this), if we had a way to flag "key
>> version" at the page level it seems like we could potentially rekey
>> page-by-page while online, locking only one page at a time. We
>> really only need to support 2 key versions and could ping-pong
>> between them as they change. Or maybe this is a crazy idea.
>
> Yes, we did talk about this. It is certainly possible, but we would
> still need a tool to guarantee all pages are using the new version, so
> I am not sure what per-page buys us except making the later check
> faster. I don't see this as a version-1 feature, frankly.

If we allow for, say, 2 versions of the key to exist at any given time,
and if we could store that key version information on each page, we
could change the key from old to new without locking the entire table
at once, just locking one page at a time. Or at least that was my
thinking.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
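For illustration only, the page-at-a-time rekey loop being imagined
could be shaped like this. None of the key-version functions below
exist anywhere, and a real implementation would have to do the
cryptography below the buffer layer; the sketch just shows the locking
granularity being proposed:

/*
 * Hypothetical sketch of online rekeying with a per-page key-version
 * flag.  PageGetKeyVersion/PageSetKeyVersion, decrypt_page and
 * encrypt_page are invented names for this sketch.
 */
static void
rekey_relation(Relation rel, const Key *old_key, const Key *new_key,
			   uint8 new_version)
{
	BlockNumber nblocks = RelationGetNumberOfBlocks(rel);

	for (BlockNumber blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf = ReadBuffer(rel, blkno);
		Page		page;

		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE); /* one page at a time */
		page = BufferGetPage(buf);

		if (PageGetKeyVersion(page) != new_version)
		{
			decrypt_page(page, old_key);
			encrypt_page(page, new_key);
			PageSetKeyVersion(page, new_version);
			MarkBufferDirty(buf);
		}

		UnlockReleaseBuffer(buf);
	}
}

Since only two key versions ever coexist, a subsequent rekey simply
flips new_version back to the other value, giving the ping-pong
behaviour described above.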
Re: [PATCH] Implement uuid_version()
Hello Peter,

>> About doc: I'd consider "generation" instead of "generating" as a
>> secondary index term.
>
> We do use the "-ing" form for other secondary index terms. It's useful
> because the concatenation of primary and secondary term should usually
> make a phrase of some sort. The alternative would be "generation of",
> but that doesn't seem clearly better.

Ok, fine. I looked but did not find other instances of "generating".

>> What about avoiding a redirection with something like:
>>
>> Datum (* const pg_random_uuid)(PG_FUNCTION_ARGS) = gen_random_uuid;
>
> That seems very confusing.

Dunno. Possibly. The user does not have to look at the implementation,
and probably such code would deserve a comment. The point is to avoid
one call so as to get the same performance (otherwise pg_random_uuid
would be slightly slower), and to ensure that it behaves the same, as
it would be the very same function by construction.

I've switched the patch to ready anyway.

--
Fabien.
Re: Check-out mutable functions in check constraints
Tomas Vondra writes:
> On Fri, Jul 12, 2019 at 07:59:13PM -0400, Tom Lane wrote:
>> I'm pretty sure this change has been proposed before, and rejected
>> before. Has anybody excavated in the archives for prior discussions?
>
> Yes, I've done some quick searches like "volatile constraint" and so
> on. There are a couple of relevant discussions:
>
> 2004:
> https://www.postgresql.org/message-id/flat/0C3A1AEC-6BE4-11D8-9224-000A95C88220%40myrealbox.com
>
> 2010:
> https://www.postgresql.org/message-id/flat/12849.1277918175%40sss.pgh.pa.us#736c8ef9d7810c0bb85f495490fd40f5
>
> But I don't think the conclusions are particularly clear.
>
> In the first thread you seem to agree with requiring immutable
> functions for check constraints (and triggers for one-time checks).
> The second thread ended up discussing some new related stuff in the
> SQL standard.

Well, I think that second thread is very relevant here, because it
correctly points out that we are *required by spec* to allow check
constraints of the form

    CHECK (datecol <= CURRENT_DATE)

and related tests. See the stuff about "retrospectively deterministic"
predicates in SQL:2003 or later.

I suppose you could imagine writing some messy logic that allowed the
specific cases called out by the spec but not any other non-immutable
function calls. But that just leaves us with an inconsistent
restriction. If the spec is allowing this because it can be seen to be
safe, why should we not allow other cases that the user has taken the
trouble to prove to themselves are safe? (If their proof is wrong,
well, it wouldn't be the first bug in anyone's SQL application.)

			regards, tom lane
Re: [PATCH] Implement uuid_version()
Hello Jose,

> Considering the later arguments on-list, I plan on submitting a more
> elaborate patchset integrating the feedback received so far, along
> the following lines:
>
> - uuid type, v4 generation and uuid_version() in core
> - Provide a means to rename/supersede extensions keeping backwards
>   compatibility (i.e. uuid-ossp -> uuid, keep old code working)
> - Miscellaneous other functionality
> - Drop "dead" code
>
> ...but I've tried to keep quiet so as not to disturb too much around
> release time. I intend to continue working on this in late July,
> aiming for the following commitfest (once the more "urgent" patches
> have landed).

Ok. I've changed the patch status for this CF to "Moved to next CF",
and to "Waiting on author" there. The idea is to go on in the same
thread when you are ready.

--
Fabien.
Re: refactoring - share str2*int64 functions
Hello Thomas,

> pg_stat_statements.c:1024:11: error: implicit declaration of function
> 'pg_strtouint64' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>         rows = pg_strtouint64(completionTag + 5, NULL, 10);
>                ^
>
> Apparently it needs to include common/string.h.

Yep, I gathered that as well, but did not act promptly on your help.
Thanks for it! Here is the updated patch, on which I checked "make
check-world".

--
Fabien.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 00cae04eea..8f40245ac4 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -79,6 +79,7 @@
 #include "utils/builtins.h"
 #include "utils/hashutils.h"
 #include "utils/memutils.h"
+#include "common/string.h"
 
 PG_MODULE_MAGIC;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 8eedb613a1..e8744f054c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -21,6 +21,7 @@
 #include "catalog/heap.h"
 #include "catalog/pg_type.h"
 #include "commands/trigger.h"
+#include "common/string.h"
 #include "executor/executor.h"
 #include "executor/spi_priv.h"
 #include "miscadmin.h"
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6c2626ee62..3735268e3a 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -34,6 +34,7 @@
 #include "fmgr.h"
 #include "miscadmin.h"
+#include "common/string.h"
 #include "nodes/extensible.h"
 #include "nodes/parsenodes.h"
 #include "nodes/plannodes.h"
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 1baf7ef31f..53b89ece39 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -25,10 +25,10 @@
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "utils/builtins.h"
-#include "utils/int8.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/varbit.h"
+#include "common/string.h"
 
 static void pcb_error_callback(void *arg);
@@ -496,7 +496,7 @@ make_const(ParseState *pstate, Value *value, int location)
 		case T_Float:
 			/* could be an oversize integer as well as a float ... */
-			if (scanint8(strVal(value), true, &val64))
+			if (pg_strtoint64(strVal(value), true, &val64))
 			{
 				/*
 				 * It might actually fit in int32.  Probably only INT_MIN can
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index d317fd7006..70681588a1 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -12,6 +12,7 @@
  */
 #include "postgres.h"
+#include "common/string.h"
 #include "catalog/pg_publication.h"
 
 #include "replication/logical.h"
@@ -20,7 +21,6 @@
 #include "replication/pgoutput.h"
 #include "utils/inval.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/syscache.h"
 #include "utils/varlena.h"
@@ -111,7 +111,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
 					 errmsg("conflicting or redundant options")));
 			protocol_version_given = true;
-			if (!scanint8(strVal(defel->arg), true, &parsed))
+			if (!pg_strtoint64(strVal(defel->arg), true, &parsed))
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid proto_version")));
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index c92e9d5046..618660f372 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -26,7 +26,6 @@
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 #include "utils/cash.h"
-#include "utils/int8.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 206576d4bd..1f17987e85 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -92,7 +92,6 @@
 #include "utils/datetime.h"
 #include "utils/float.h"
 #include "utils/formatting.h"
-#include "utils/int8.h"
 #include "utils/memutils.h"
 #include "utils/numeric.h"
 #include "utils/pg_locale.h"
diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 0ff9394a2f..98e7b52ce9 100644
--- a/src/backend/utils/adt/int8.c
+++ b/src/backend/utils/adt/int8.c
@@ -18,12 +18,12 @@
 #include 
 
 #include "common/int.h"
+#include "common/string.h"
 #include "funcapi.h"
 #include "libpq/pqformat.h"
 #include "nodes/nodeFuncs.h"
 #include "nodes/supportnodes.h"
 #include "optimizer/optimizer.h"
-#include "utils/int8.h"
 #include "utils/builtins.h"
@@ -47,89 +47,6 @@ typedef struct
  * Formatting and conversion routines.
  *-*/
 
-/*
- * scanint8 --- try to parse a string into an int8.
- *
- * If errorOK is false, ereport a useful error message if the string is ba
Re: [sqlsmith] Crash in mcv_get_match_bitmap
Tomas Vondra writes:
> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>> The right way to determine operator semantics is to look to see
>>> whether they are in a btree opclass. That's what the rest of the
>>> planner does, and there is no good reason for the mcv code to do it
>>> some other way.
>>
>> Hmmm, but that will mean we're unable to estimate operators that are
>> not part of a btree opclass. Which is a bit annoying, because that
>> would also affect equalities (and thus functional dependencies), in
>> which case the type may easily have just a hash opclass or something.

After thinking about this more, I may have been analogizing to the
wrong code. It's necessary to use opclass properties when we're
reasoning about operators in a way that *must* be correct, for instance
to conclude that a partition can be pruned from a query. But this code
is just doing selectivity estimation, so the correctness standards are
a lot lower. In particular I see that the long-established
range-query-detection code in clauselist_selectivity is looking for
operators that have F_SCALARLTSEL etc. as restriction estimators (in
fact I'm guessing you lifted parts of the mcv code from that, cause it
looks pretty similar). So if we've gotten away with that so far over
there, there's probably no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.

> Here are WIP patches addressing two of the issues:
> 1) determining operator semantics by matching them to btree opclasses

Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.

> 2) deconstructing OpExpr into Var/Const nodes

deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

    leftop = linitial(expr->args);
    while (IsA(leftop, RelabelType))
        leftop = ((RelabelType *) leftop)->arg;

    // and similarly for rightop

    if (IsA(leftop, Var) && IsA(rightop, Const))
        // return appropriate results
    else if (IsA(leftop, Const) && IsA(rightop, Var))
        // return appropriate results
    else
        // fail

Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing. It'd be okay as a static function name perhaps,
but not if you're going to expose it globally.

> a) I don't think unary-argument OpExpr are an issue, because this is
> checked when determining which clauses are compatible (and the code
> only allows the case with 2 arguments).

OK.

> b) Const with constisnull=true - I'm not yet sure what to do about
> this. The easiest fix would be to deem those clauses incompatible, but
> that seems a bit too harsh. The right thing is probably passing the
> NULL to the operator proc (but that means we can't use FunctionCall).

No, because most of the functions in question are strict and will just
crash on null inputs. Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.

> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID
> when calling the operator is the right thing. We're using
> type->typcollation when building the stats, so maybe we should do the
> same thing here.

Yeah, I was wondering that too. But really you should be using the
column's collation not the type's default collation. See commit
5e0928005.
regards, tom lane
Re: POC: converting Lists into arrays
David Rowley writes:
> Thanks for the speedy turnaround. I've looked at v8, as far as a diff
> between the two patches, and I'm happy. I've marked it as ready for
> committer.

So ... last chance for objections? I see from the cfbot that v8 is
already broken (a new call of lnext needs to be fixed). Don't really
want to keep chasing a moving target, so unless I hear objections I'm
going to adjust the additional spot(s) and commit this pretty soon,
like tomorrow or Monday.

			regards, tom lane
Re: initdb recommendations
On Sat, Jul 13, 2019 at 2:44 PM Peter Eisentraut wrote:
> On 2019-07-11 21:34, Julien Rouhaud wrote:
>>> Note that with this change, running initdb without arguments will
>>> now error on those platforms: You need to supply either a password
>>> or select a different default authentication method.
>>
>> Should we make this explicitly stated in the documentation? As a
>> reference, it's saying:
>>
>> The default client authentication setup is such that users can
>> connect over the Unix-domain socket to the same database user name
>> as their operating system user names (on operating systems that
>> support this, which are most modern Unix-like systems, but not
>> Windows) and otherwise with a password. To assign a password to the
>> initial database superuser, use one of initdb's -W, --pwprompt or
>> --pwfile options.
>
> Do you have a suggestion for where to put this and exactly how to
> phrase this?
>
> I think the initdb reference page would be more appropriate than
> runtime.sgml.

Yes, initdb.sgml seems more suitable. I was thinking of something very
similar to your note, maybe like this (also attached in case my MUA
ruins it):

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index c47b9139eb..764cf737c7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -143,6 +143,15 @@ PostgreSQL documentation
    connections.
 
+
+   Running initdb without arguments on platforms lacking
+   peer or Unix-domain socket connections will exit
+   with an error.  On such environments, you need to either provide a
+   password or choose a different authentication method.
+
+
    Do not use trust unless you trust all local users
    on your system.
Re: Add CREATE DATABASE LOCALE option
Hello Peter,

>> I think pg_dump/t/002_pg_dump.pl might be a good place. Not the
>> easiest program in the world to work with, admittedly.
>
> Updated patch with test and expanded documentation.

Patch v2 applies cleanly, compiles, and make check-world is ok with TAP
tests enabled. Doc gen ok. The addition looks reasonable.

Could the second error message, about conflicting options, be more
explicit than a terse "conflicting or redundant options"? The user may
expect later options to supersede earlier options, for instance.

About the pg_dump code, I'm wondering whether it is worth generating
LOCALE at all, as it breaks backward compatibility (e.g. dumping a new
db to load it into an older version). If it is to be generated, I'd
merge the two conditions instead of nesting:

    if (strlen(collate) > 0 && strcmp(collate, ctype) == 0)
        // generate LOCALE

--
Fabien.
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On 7/13/19 9:38 AM, Joe Conway wrote:
> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>>> Our first implementation will encrypt the entire cluster. We can
>>>> later consider encryption per table or tablespace. It is unclear
>>>> if encrypting different parts of the system with different keys is
>>>> useful or feasible. (This is separate from key rotation.)
>>>
>>> I still object strongly to using a single key for the entire
>>> database. I think we can use a single key for WAL, but we need some
>>> way to split the heap so that multiple keys are used. If not by
>>> tablespace, then some other method.
>>
>> What do you base this on?

Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting. To some extent this degenerated into a
general search for relevant information:

---

[1] and [2] show that at least some file system encryption uses a
different key per file.

---

[2] also shows that file system encryption uses a KDF (key derivation
function), which we may want to use ourselves. The analogy would be a
per-table derived key instead of a per-file derived key. Note that a
KDF is a safe way to derive a key, and it is not the same as a "related
key", which was mentioned in another email as an attack vector.

---

[2] also says it provides additional support for AES 256. It also
mentions CBC versus XTS -- I came across this elsewhere and it bears
discussion:

"Currently, the following pairs of encryption modes are supported:

  AES-256-XTS for contents and AES-256-CTS-CBC for filenames
  AES-128-CBC for contents and AES-128-CTS-CBC for filenames
  Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."

---

[2] also states this, which again makes me think in terms of a table
being the moral equivalent of a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user's
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."

---

[3] suggests 68 GB per key and unique IV in GCM mode.

---

[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.

---

[5] has this to say, which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e.
n*2n/2 bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use
128-bit blocks). This means a limit of more than 250 millions of
terabytes, which is sufficiently large not to be a problem. That's
precisely why AES was defined with 128-bit blocks, instead of the more
common (at that time) 64-bit blocks: so that data size is practically
unlimited."

But goes on to say:

"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"

It is hard to say if that recommendation is per key or per key+IV.

---

[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems
to imply a single key is used, although at one point it says
"transparent data encryption master key, also known as the transparent
data encryption protector". The term "master key" indicates that they
likely use derived keys under the covers.

---

[7] is a generally useful read about how many of the things we have
been discussing are done in SQL Server.

---

[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".

---

[9] IETF RFC for KDF

---

[10] IETF RFC for key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK), i.e. the outer key
provided by the user or command on postmaster start.

---

Based on all of that, I cannot find a requirement that we use more than
one key per database. But I did find that files in an encrypted file
system are encrypted with keys derived from a master key, and I view
this as analogous to what we are doing.

As an aside to the specific question, I also found more evidence that
AES 256 is appropriate.

Joe

[1] https://www.po
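As a concrete illustration of the KDF idea, here is a minimal sketch of
deriving a per-table key from a master key with HKDF (RFC 5869) via
OpenSSL 1.1+. The choice of OpenSSL, and of a table identifier as the
HKDF "info" input, are assumptions made for the sketch, not part of any
proposed patch:

#include <openssl/evp.h>
#include <openssl/kdf.h>

/*
 * Sketch: derive a per-table key from the master key using HKDF.
 * What exactly identifies a table (e.g. tablespace + relfilenode)
 * is an open design question; table_id here is a placeholder.
 */
static bool
derive_table_key(const unsigned char *master_key, int master_len,
				 const unsigned char *table_id, int table_id_len,
				 unsigned char *derived, size_t derived_len)
{
	EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new_id(EVP_PKEY_HKDF, NULL);
	bool		ok;

	ok = pctx != NULL &&
		EVP_PKEY_derive_init(pctx) > 0 &&
		EVP_PKEY_CTX_set_hkdf_md(pctx, EVP_sha256()) > 0 &&
		EVP_PKEY_CTX_set1_hkdf_key(pctx, master_key, master_len) > 0 &&
		EVP_PKEY_CTX_add1_hkdf_info(pctx, table_id, table_id_len) > 0 &&
		EVP_PKEY_derive(pctx, derived, &derived_len) > 0;

	EVP_PKEY_CTX_free(pctx);
	return ok;
}

Because the derivation is one-way, compromise of one derived table key
does not reveal the master key or any sibling key, which is the
property that distinguishes a KDF from the "related keys" discussed in
the attack literature above.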
Re: [PATCH] Improve performance of NOTIFY over many databases (issue blocking on AccessExclusiveLock on object 0 of class 1262 of database 0)
Martijn van Oosterhout writes:
> Please find attached updated versions of the patches, I've now tested
> them. Also attached is a reproduction script to verify that they
> actually work.

I looked through these (a bit cursorily).

I'm generally on board with the idea of 0001, but not with the patch
details. As coded, asyncQueueAdvanceTail is supposing that it can
examine the shared QUEUE_HEAD and QUEUE_TAIL pointers without any lock
whatsoever. That's probably unsafe, and if it is safe for some reason,
you haven't made the argument why. Moreover, it seems unnecessary to
make any such assumption. Why not put back the advanceTail tests you
removed, but adjust them so that advanceTail isn't set true unless
QUEUE_HEAD and QUEUE_TAIL point to different pages? (Note that in the
existing coding, those tests are made while holding an appropriate
lock, so it's safe to look at those pointers there.)

It might be a good idea to make a macro encapsulating this new, more
complicated rule for setting advanceTail, instead of relying on keeping
the various call sites in sync.

More attention to comments is also needed. For instance, you've made a
lie out of the documentation of the tail pointer:

	QueuePosition tail;			/* the global tail is equivalent to the pos of
								 * the "slowest" backend */

It needs to say something like "is <= the pos of the slowest backend",
instead. I think the explanation of why this algorithm is good could
use more effort, too.

Comments for 0002 are about the same: for no explained reason, and
certainly no savings, you've put the notify_all test in an unsafe place
rather than a safe one (viz, two lines down, *after* taking the
relevant lock). And 0002 needs more commentary about why its
optimization is safe and useful, too. In particular it's not obvious
why QUEUE_HEAD being on a different page from QUEUE_TAIL has anything
to do with whether we should wake up other backends.

I'm not very persuaded by 0003, mainly because it seems likely to me
that 0001 and 0002 will greatly reduce the possibility that the
early-exit can happen. So it seems like it's adding cycles (in a spot
where we hold exclusive lock) without a good chance of saving any
cycles.

Taking a step back in hopes of seeing the bigger picture ... as you
already noted, these changes don't really fix the "thundering herd of
wakeups" problem, they just arrange for it to happen once per SLRU page
rather than once per message. I wonder if we could improve matters by
stealing an idea from the sinval code: when we're trying to cause
advance of the global QUEUE_TAIL, waken only the slowest backend, and
have it waken the next-slowest after it's done. In sinval there are
some additional provisions to prevent a nonresponsive backend from
delaying matters for other backends, but I think maybe we don't need
that here. async.c doesn't have anything equivalent to sinval reset, so
there's no chance of overruling a slow backend's failure to advance its
pos pointer, so there's not much reason not to just wait till it does
do so.

A related idea is to awaken only one backend at a time when we send a
new message (i.e., advance QUEUE_HEAD), but I think that would likely
be bad. The hazard with the chained-wakeup method is that a slow
backend blocks wakeup of everything else. We don't care about that
hugely for QUEUE_TAIL advance, because we're just hoping to free some
SLRU space. But for QUEUE_HEAD advance it'd mean failing to deliver
notifies in a timely way, which we do care about. (Also, if I remember
correctly, the processing on that side only requires shared lock so
it's less of a problem if many backends do it at once.)

			regards, tom lane
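The suggested encapsulation might look something like the following
sketch, reusing async.c's existing QUEUE_POS_PAGE, QUEUE_HEAD and
QUEUE_TAIL macros; the new macro's name is invented here:

/*
 * Sketch: only consider advancing the global tail when the head and
 * tail have drifted onto different SLRU pages, since truncation of
 * the queue only happens in whole-page units.  Must be evaluated
 * while holding AsyncQueueLock, per the review comments above.
 */
#define QUEUE_TAIL_PAGE_LAGS() \
	(QUEUE_POS_PAGE(QUEUE_HEAD) != QUEUE_POS_PAGE(QUEUE_TAIL))

	/* at each call site, under the lock: */
	if (QUEUE_TAIL_PAGE_LAGS())
		advanceTail = true;

Centralizing the test in one macro keeps the call sites from drifting
apart if the rule gets more complicated later.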
Re: [sqlsmith] Crash in mcv_get_match_bitmap
On Sat, Jul 13, 2019 at 11:39:55AM -0400, Tom Lane wrote:
> Tomas Vondra writes:
>> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>>> The right way to determine operator semantics is to look to see
>>>> whether they are in a btree opclass. That's what the rest of the
>>>> planner does, and there is no good reason for the mcv code to do
>>>> it some other way.
>>>
>>> Hmmm, but that will mean we're unable to estimate operators that
>>> are not part of a btree opclass. Which is a bit annoying, because
>>> that would also affect equalities (and thus functional
>>> dependencies), in which case the type may easily have just a hash
>>> opclass or something.
>
> After thinking about this more, I may have been analogizing to the
> wrong code. It's necessary to use opclass properties when we're
> reasoning about operators in a way that *must* be correct, for
> instance to conclude that a partition can be pruned from a query. But
> this code is just doing selectivity estimation, so the correctness
> standards are a lot lower. In particular I see that the
> long-established range-query-detection code in clauselist_selectivity
> is looking for operators that have F_SCALARLTSEL etc. as restriction
> estimators (in fact I'm guessing you lifted parts of the mcv code
> from that, cause it looks pretty similar). So if we've gotten away
> with that so far over there, there's probably no reason not to do
> likewise here.
>
> I am a little troubled by the point I made about operators possibly
> wanting to have a more-specific estimator than scalarltsel, but that
> seems like an issue to solve some other time; and if we do change
> that logic then clauselist_selectivity needs to change too.
>
>> Here are WIP patches addressing two of the issues:
>> 1) determining operator semantics by matching them to btree
>> opclasses
>
> Per above, I'm sort of inclined to drop this, unless you feel better
> about doing it like this than the existing way.

OK. TBH I don't have a very strong opinion on this - I always disliked
how we rely on the estimator OIDs in this code, and relying on btree
opclasses seems somewhat more principled. But I'm not sure I understand
all the implications of such a change (and I have some concerns about
it too, per my last message), so I'd revisit that in PG13.

>> 2) deconstructing OpExpr into Var/Const nodes
>
> deconstruct_opexpr is still failing to verify that the Var is a Var.
> I'd try something like
>
>     leftop = linitial(expr->args);
>     while (IsA(leftop, RelabelType))
>         leftop = ((RelabelType *) leftop)->arg;
>
>     // and similarly for rightop
>
>     if (IsA(leftop, Var) && IsA(rightop, Const))
>         // return appropriate results
>     else if (IsA(leftop, Const) && IsA(rightop, Var))
>         // return appropriate results
>     else
>         // fail

Ah, right. The RelabelType might be on top of something that's not a
Var.

> Also, I think deconstruct_opexpr is far too generic a name for what
> this is actually doing. It'd be okay as a static function name
> perhaps, but not if you're going to expose it globally.

I agree. I can't quite make it static, because it's used from multiple
places, but I'll move it to extended_stats_internal.h (and I'll see if
I can think of a better name too).

>> a) I don't think unary-argument OpExpr are an issue, because this is
>> checked when determining which clauses are compatible (and the code
>> only allows the case with 2 arguments).
>
> OK.
>
>> b) Const with constisnull=true - I'm not yet sure what to do about
>> this. The easiest fix would be to deem those clauses incompatible,
>> but that seems a bit too harsh. The right thing is probably passing
>> the NULL to the operator proc (but that means we can't use
>> FunctionCall).
>
> No, because most of the functions in question are strict and will
> just crash on null inputs. Perhaps you could just deem that cases
> involving a null Const don't match what you're looking for.

Makes sense, I'll do that.

>> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID
>> when calling the operator is the right thing. We're using
>> type->typcollation when building the stats, so maybe we should do
>> the same thing here.
>
> Yeah, I was wondering that too. But really you should be using the
> column's collation not the type's default collation. See commit
> 5e0928005.

OK, thanks.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
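Put together, the reworked helper could end up shaped roughly like this
(a sketch; the final function name and its placement were still open at
this point in the thread):

/*
 * Sketch: strip RelabelType from both operands, insist on exactly one
 * Var and one non-null Const, and fail otherwise -- folding in both
 * review points from above.
 */
static bool
examine_opclause_expression(OpExpr *expr, Var **varp, Const **cstp,
							bool *varonleftp)
{
	Node	   *leftop = linitial(expr->args);
	Node	   *rightop = lsecond(expr->args);

	while (IsA(leftop, RelabelType))
		leftop = (Node *) ((RelabelType *) leftop)->arg;
	while (IsA(rightop, RelabelType))
		rightop = (Node *) ((RelabelType *) rightop)->arg;

	if (IsA(leftop, Var) && IsA(rightop, Const))
	{
		*varp = (Var *) leftop;
		*cstp = (Const *) rightop;
		*varonleftp = true;
	}
	else if (IsA(leftop, Const) && IsA(rightop, Var))
	{
		*varp = (Var *) rightop;
		*cstp = (Const *) leftop;
		*varonleftp = false;
	}
	else
		return false;

	/* per the review: a null Const simply doesn't match */
	if ((*cstp)->constisnull)
		return false;

	return true;
}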
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:
> On 7/13/19 9:38 AM, Joe Conway wrote:
>> On 7/11/19 9:05 PM, Bruce Momjian wrote:
>>> On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:
>>>> On 7/11/19 6:37 PM, Bruce Momjian wrote:
>>>>> Our first implementation will encrypt the entire cluster. We can
>>>>> later consider encryption per table or tablespace. It is unclear
>>>>> if encrypting different parts of the system with different keys
>>>>> is useful or feasible. (This is separate from key rotation.)
>>>>
>>>> I still object strongly to using a single key for the entire
>>>> database. I think we can use a single key for WAL, but we need
>>>> some way to split the heap so that multiple keys are used. If not
>>>> by tablespace, then some other method.
>>>
>>> What do you base this on?
>
> Ok, so here we go. See links below. I skimmed through the entire
> thread and FWIW it was exhausting. To some extent this degenerated
> into a general search for relevant information:
>
> ---
>
> [1] and [2] show that at least some file system encryption uses a
> different key per file.
>
> ---
>
> [2] also shows that file system encryption uses a KDF (key derivation
> function), which we may want to use ourselves. The analogy would be a
> per-table derived key instead of a per-file derived key. Note that a
> KDF is a safe way to derive a key, and it is not the same as a
> "related key", which was mentioned in another email as an attack
> vector.
>
> ---
>
> [2] also says it provides additional support for AES 256. It also
> mentions CBC versus XTS -- I came across this elsewhere and it bears
> discussion:
>
> "Currently, the following pairs of encryption modes are supported:
>
>   AES-256-XTS for contents and AES-256-CTS-CBC for filenames
>   AES-128-CBC for contents and AES-128-CTS-CBC for filenames
>   Adiantum for both contents and filenames
>
> If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.
>
> AES-128-CBC was added only for low-powered embedded devices with
> crypto accelerators such as CAAM or CESA that do not support XTS."
>
> ---
>
> [2] also states this, which again makes me think in terms of a table
> being the moral equivalent of a file:
>
> "Unlike dm-crypt, fscrypt operates at the filesystem level rather
> than at the block device level. This allows it to encrypt different
> files with different keys and to have unencrypted files on the same
> filesystem. This is useful for multi-user systems where each user's
> data-at-rest needs to be cryptographically isolated from the others.
> However, except for filenames, fscrypt does not encrypt filesystem
> metadata."
>
> ---
>
> [3] suggests 68 GB per key and unique IV in GCM mode.
>
> ---
>
> [4] specifies 68 GB per key and unique IV in CTR mode -- this applies
> directly to our proposal to use CTR for WAL.
>
> ---
>
> [5] has this to say, which seems independent of mode:
>
> "When encrypting data with a symmetric block cipher, which uses
> blocks of n bits, some security concerns begin to appear when the
> amount of data encrypted with a single key comes close to 2n/2
> blocks, i.e. n*2n/2 bits. With AES, n = 128 (AES-128, AES-192 and
> AES-256 all use 128-bit blocks). This means a limit of more than 250
> millions of terabytes, which is sufficiently large not to be a
> problem. That's precisely why AES was defined with 128-bit blocks,
> instead of the more common (at that time) 64-bit blocks: so that data
> size is practically unlimited."

FWIW I was a bit confused at first, because the copy-paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

> But goes on to say:
>
> "I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once
> you reach that number of bits the probability of a collision will
> grow quickly and you will be way over 50% probability of a collision
> by the time you reach 2*n*2^(n/2) bits. In order to keep the
> probability of a collision negligible I recommend encrypting no more
> than n*2^(n/4) bits with the same key. In the case of AES that works
> out to 64GB"
>
> It is hard to say if that recommendation is per key or per key+IV.

Hmm, yeah. The question is what collisions they have in mind?
Presumably it's AES(block1,key) = AES(block2,key), in which case it'd
be with a fixed IV, so per key+IV.

> ---
>
> [6] shows that Azure SQL Database uses AES 256 for TDE. It also seems
> to imply a single key is used, although at one point it says
> "transparent data encryption master key, also known as the
> transparent data encryption protector". The term "master key"
> indicates that they likely use derived keys under the covers.
>
> ---
>
> [7] is a generally useful read about how many of the things we have
> been discussing are done in SQL Server.
>
> ---
>
> [8] was referenced by Sehrope. In addition to support for AES 256 for
> long term use, table 5.1 is interesting. It lists CBC mode as
> "legacy" but not "future".
>
> ---
>
> [9] IETF RFC for KDF
>
> ---
>
> [10] IETF RFC for key wrapping -- this is probably how we should wrap
> the master key with the Key Encryption Key (KEK), i.e. the outer key
> provided by the user or command on postmaster start.
>
> ---
>
> Based on all of that, I cannot find a requirement that we use more
> than one key per
Re: Patch to document base64 encoding
Hi Fabien,

Attached is doc_base64_v10.patch.

On Fri, 12 Jul 2019 15:58:21 +0200 (CEST) Fabien COELHO wrote:
>>> I suggested "Function <>decode ...", which is the kind of thing
>>> we do in academic writing to improve precision, because I thought
>>> it could be better:-)
>>
>> "Function <>decode ..." just does not work in English.
>
> It really works in research papers: "Theorem X can be proven by
> applying Proposition Y. See Figure 2 for details. Algorithm Z
> describes whatever, which is listed in Table W..."

I've not thought about it before, but I suppose the difference is
between declarative and descriptive, the latter being more inviting and
allowing better flow between sentences. Otherwise you're writing in
bullet points. So it is a question of balance between specification and
narration. In regular prose you're always going to see the "the" unless
the sentence starts with the name. The trouble is that we can't start
sentences with function names because of capitalization confusion.

>> I hope that 3 patches will make review easier.
>
> Not really. I'm reviewing the 3 patches put together rather than each
> one individually, which would require more work.

I figured that with, e.g., a separate patch for moving and
alphabetizing, it'd be easier to check that nothing got lost. Anyhow,
just one patch this time.

> convert: I'd merge the 2 first sentences to state that it converts
> from X to Y. The doc does not say explicitly what happens if a
> character cannot be converted. After testing, an error is raised. The
> example comment could add ", if possible".

Done. Good idea. I reworked the whole paragraph to shorten and clarify
since I was altering it anyway. This does introduce some inconsistency
with wording that appears elsewhere, but it seems worth it because the
listentry box was getting overfull.

> to_hex: add "." at the end of the sentence?

I left it as-is, without a ".". The only consistent rule about periods
in the listentries seems to be that if there's more than one sentence
then there are periods -- I think. In any case a lot of them don't have
periods and probably don't need periods. I don't know what to do, and
since the original did not have a period it seems better to leave well
enough alone.

> Minor comment: you usually put two spaces between a "." and the first
> word of the next sentence, but not always.

I now always put 2 spaces after the end of a sentence. But I've only
done this where I've changed text, not when moving pre-existing text
around. Again, there seems to be no consistency in the original. (I
believe docbook redoes all inter-sentence spacing anyway.)

Thanks for the help. I'll be sure to sign up to review a patch (or
patches) when life permits.

Regards,
Karl
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a25c122ac8..131a63b36c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1795,48 +1795,6 @@
-       convert
-
-       convert(string bytea,
-       src_encoding name,
-       dest_encoding name)
-
-       bytea
-
-       Convert string to dest_encoding. The
-       original encoding is specified by
-       src_encoding. The
-       string must be valid in this encoding.
-       Conversions can be defined by CREATE CONVERSION.
-       Also there are some predefined conversions. See
-       for available conversions.
-
-       convert('text_in_utf8', 'UTF8', 'LATIN1')
-       text_in_utf8 represented in Latin-1
-       encoding (ISO 8859-1)
-
-
-       convert_from
-
-       convert_from(string bytea,
-       src_encoding name)
-
-       text
-
-       Convert string to the database encoding. The original encoding
-       is specified by src_encoding. The
-       string must be valid in this encoding.
-
-       convert_from('text_in_utf8', 'UTF8')
-       text_in_utf8 represented in the current database encoding
-
        convert_to

        convert_to(string text,
@@ -1844,7 +1802,8 @@
        bytea

-       Convert string to dest_encoding.
+       Convert string to dest_encoding. See
+       for available conversions.
        convert_to('some text', 'UTF8')
        some text represented in the UTF8 encoding
@@ -1855,39 +1814,24 @@
        decode
+
+       base64 encoding
+
        decode(string text,
        format text)

        bytea

        Decode binary data from textual representation in
        string.
-       Options for format are same as in encode.
+       Options
+       for format are same as
+       in encode.
        decode('MTIzAAE=', 'base64')
        \x3132330001

-
-       encode
-
Re: SHOW CREATE
> On 2019-07-05, at 12:14, Corey Huinker wrote:
>
> In doing that work, it became clear that the command was serving two
> masters:
> 1. A desire to see the underlying nuts and bolts of a given database
>    object.
> 2. A desire to essentially make the schema portion of pg_dump a
>    server-side command.
>
> To that end, I see splitting this into two commands, SHOW CREATE and
> SHOW DUMP.

I like the idea of having these features available via SQL as opposed
to separate tools. Is it necessary to have specific commands for them?
It seems they would potentially be more useful as functions, where
they'd be available for all of the programmatic features of the rest of
SQL.

Michael Glaesemann
grzm seespotcode net
Re: Avoiding hash join batch explosions with extreme skew and weird stats
On Wed, Jul 03, 2019 at 02:22:09PM -0700, Melanie Plageman wrote:
> On Tue, Jun 18, 2019 at 3:24 PM Melanie Plageman wrote:
>> These questions will probably make a lot more sense with
>> corresponding code, so I will follow up with the second version of
>> the state machine patch once I finish it.
>
> I have changed the state machine and resolved the questions I had
> raised in the previous email. This seems to work for the parallel and
> non-parallel cases. I have not yet rewritten the unmatched outer tuple
> status as a bitmap in a spill file (for ease of debugging).
>
> Before doing that, I wanted to ask what a desirable fallback condition
> would be. In this patch, fallback to hashloop join happens only when,
> while loading tuples into the hashtable for a batch after batch 0,
> inserting another tuple from the batch file would exceed work_mem.
> This means you can't increase nbatches, which, I would think, is
> undesirable.

Yes, I think that's undesirable.

> I thought a bit about when fallback should happen. So, let's say that
> we would like to fall back to hashloop join when we have increased
> nbatches X times. At that point, since we do not want to fall back to
> hashloop join for all batches, we have to make a decision. After
> increasing nbatches the Xth time, do we then fall back for all batches
> for which inserting inner tuples exceeds work_mem? Do we use this
> strategy but with work_mem plus some fudge factor? Or, do we instead
> try to determine if data skew led us to increase nbatches both times
> and then determine which batch, given the new nbatches, contains that
> data, set fallback to true only for that batch, and let all other
> batches use the existing logic (with no fallback option) unless they
> contain a value which leads to increasing nbatches X number of times?

I think we should try to detect the skew and use this hashloop logic
only for the one batch. That's based on the assumption that the
hashloop is less efficient than the regular hashjoin. We may need to
apply it even for some non-skewed (but misestimated) cases, though. At
some point we'd need more than work_mem for the BufFiles, at which
point we ought to use this hashloop.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
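As a rough sketch of the per-batch decision being discussed — the
growth counter and threshold below are hypothetical, since the
heuristic itself was still undecided:

/*
 * Hypothetical fallback test: after nbatch has already been doubled
 * MAX_NBATCH_GROWTHS times, stop splitting further and mark only the
 * offending batch for the hashloop strategy.  nbatch_growths is an
 * invented field, not from the actual patch.
 */
#define MAX_NBATCH_GROWTHS	2

static bool
should_fallback_to_hashloop(HashJoinTable hashtable,
							Size batch_inner_bytes)
{
	if (hashtable->nbatch_growths < MAX_NBATCH_GROWTHS)
		return false;			/* keep using the regular nbatch doubling */

	/* a skewed (or badly misestimated) batch falls back by itself */
	return batch_inner_bytes > hashtable->spaceAllowed;
}

The intent matches the position above: batches that fit after the
normal splitting keep the regular hashjoin path, and only the batch
whose inner side still exceeds work_mem pays the hashloop cost.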
Re: SHOW CREATE
On Sat, Jul 13, 2019 at 06:32:41PM -0500, Michael Glaesemann wrote:
> > On 2019-07-05, at 12:14, Corey Huinker wrote:
> >
> > In doing that work, it became clear that the command was serving two
> > masters:
> > 1. A desire to see the underlying nuts and bolts of a given database object.
> > 2. A desire to essentially make the schema portion of pg_dump a server side
> > command.
> >
> > To that end, I see splitting this into two commands, SHOW CREATE
> > and SHOW DUMP.
>
> I like the idea of having these features available via SQL as
> opposed to separate tools. Is it necessary to have specific commands
> for them? It seems they would potentially be more useful as functions,
> where they'd be available for all of the programmatic features of
> the rest of SQL.

Having commands for them would help meet people's expectations coming from
other RDBMSs. On the other hand, making functions could just be done in SQL,
which might hurry the process along.

Best,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: Conflict handling for COPY FROM
On Fri, Jul 12, 2019 at 1:42 AM Surafel Temesgen wrote:
> Here is the patch that contains all the comments given, except adding a
> way to specify ignoring all errors, because specifying a high number can
> do the work, and trying to store such badly structured data may be a bad
> idea.

Hi Surafel,

FYI GCC warns:

copy.c: In function ‘CopyFrom’:
copy.c:3383:8: error: ‘dest’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  (void) dest->receiveSlot(myslot, dest);
        ^
copy.c:2702:16: note: ‘dest’ was declared here
  DestReceiver *dest;
               ^

--
Thomas Munro
https://enterprisedb.com
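For anyone who wants to see the shape of the problem GCC is pointing at,
here is a minimal standalone reproduction with simplified, made-up types
(this is not the actual copy.c code):

/* Simplified stand-in for the real DestReceiver; illustrative only. */
typedef struct DestReceiver
{
    int         (*receiveSlot) (int slot);
} DestReceiver;

int
copy_like(int mode, DestReceiver *a, DestReceiver *b)
{
    DestReceiver *dest;         /* declared without an initializer */

    if (mode == 1)
        dest = a;
    else if (mode == 2)
        dest = b;

    /*
     * Any other mode reaches this call with 'dest' unset, which is what
     * -Wmaybe-uninitialized flags when building with optimization.
     */
    return dest->receiveSlot(mode);
}

The conventional fixes are to initialize dest (e.g. to NULL) at the
declaration, to add an else branch that reports an error, or to restructure
the code so that every path assigns it before use.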
Re: pgbench - implement strict TPC-B benchmark
On Tue, Apr 9, 2019 at 3:58 AM Fabien COELHO wrote:
> The attached patch does $SUBJECT, as a showcase for recently added
> features, including advanced expressions (CASE...), \if, \gset, ending SQL
> commands at ";"...

Hi Fabien,

+  the account branch has a 15% probability to be in the same branch as the
+  teller (unless

I would say "... has a 15% probability of being in the same ...". The same
wording appears further down in the comment.

I see that the parameters you propose match the TPCB 2.0 description[1],
and the account balance was indeed supposed to be returned to the driver.

I wonder if "strict" is the right name here though. "tpcb-like-2" at least
leaves room for someone to propose yet another variant, and still includes
the "-like" disclaimer, which I interpret as a way of making it clear that
this benchmark and results produced by it have no official TPC audited
status.

> There is also a small fix to the doc which describes the tpcb-like
> implementation but gets one variable name wrong: balance -> delta.

Agreed. I committed that part. Thanks!

[1] http://www.tpc.org/tpcb/spec/tpcb_current.pdf

--
Thomas Munro
https://enterprisedb.com
Re: Bad canonicalization for dateranges with 'infinity' bounds
On Sun, Jul 14, 2019 at 12:44 AM Thomas Munro wrote:
> Even though !(X || Y) is equivalent to !X && !Y, by my reading of
> range_in(), lower.val can be uninitialised when lower.infinite is
> true, and it's also a bit hard to read IMHO, so I'd probably write
> that as !upper.infinite && !DATE_NOT_FINITE(upper.val) &&
> upper.inclusive. I don't think it can affect the result but it might
> upset Valgrind or similar.

I take back the bit about reading an uninitialised value (X || Y doesn't
access Y if X is true... duh), but I still think the other way of putting it
is a bit easier to read. YMMV.

Generally, +1 for this patch. I'll wait a couple of days for more feedback
to appear.

--
Thomas Munro
https://enterprisedb.com
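The short-circuit behaviour is easy to demonstrate in isolation. A tiny
standalone program (not PostgreSQL code; not_finite() is an invented
stand-in for DATE_NOT_FINITE applied to the bound's value):

#include <assert.h>
#include <stdbool.h>

static bool value_was_read;

/* Stand-in for inspecting bound.val; records that it was touched. */
static bool
not_finite(void)
{
    value_was_read = true;
    return false;
}

int
main(void)
{
    bool        infinite = true;    /* bound.infinite set, value unset */

    value_was_read = false;
    bool        a = !(infinite || not_finite());    /* || short-circuits */
    assert(!value_was_read);

    value_was_read = false;
    bool        b = !infinite && !not_finite();     /* && short-circuits too */
    assert(!value_was_read);

    assert(a == b);             /* De Morgan: !(X || Y) == !X && !Y */
    return 0;
}

Both spellings leave the possibly-uninitialised value untouched when the
bound is infinite, so the difference really is just readability.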
Re: Built-in connection pooler
On Tue, Jul 9, 2019 at 8:30 AM Konstantin Knizhnik wrote:
> Rebased version of the patch is attached.

Thanks for including nice documentation in the patch, which gives a good
overview of what's going on. I haven't read any code yet, but I took it for
a quick drive to understand the user experience. These are just some first
impressions.

I started my server with -c connection_proxies=1 and tried to connect to
port 6543, and the proxy segfaulted on a null pointer while accessing
port->gss->enc. I rebuilt without --with-gssapi to get past that.

Using SELECT pg_backend_pid() from many different connections, I could see
that they were often being served by the same process (although sometimes it
created an extra one when there didn't seem to be a good reason for it to do
that). I could see the proxy managing these connections with SELECT * FROM
pg_pooler_state() (I suppose this would be wrapped in a view with a name
like pg_stat_proxies). I could see that once I did something like SET
foo.bar = 42, a backend became dedicated to my connection and no other
connection could use it. As described. Neat.

Obviously your concept of tainted backends (= backends that can't be reused
by other connections because they contain non-default session state) is
quite simplistic and would help only the very simplest use cases, and
obviously the problems that need to be solved to do better than that are
quite large. Personally I think we should move all GUCs into the Session
struct, put the Session struct into shared memory, and then figure out how
to put things like prepared plans into something like Ideriha-san's
experimental shared memory context so that they can also be accessed by any
process; then we'll mostly be tackling problems that we'll have to tackle
for threads too. But I think you made the right choice to experiment with
just reusing the backends that have no such state.

On my FreeBSD box (which doesn't have epoll(), so it's latch.c's old school
poll() for now), I see the connection proxy process eating a lot of CPU and
the temperature rising. I see with truss that it's doing this as fast as it
can:

poll({ 13/POLLIN 17/POLLIN|POLLOUT },2,1000) = 1 (0x1)

Ouch. I admit that I had the idea to test on FreeBSD because I noticed the
patch introduces EPOLLET and I figured this might have been tested only on
Linux. FWIW the same happens on a Mac. (A minimal reproduction of this
effect appears after this message.)

That's all I had time for today, but I'm planning to poke this some more
and get a better understanding of how this works at an OS level. I can see
fd passing, IO multiplexing, and other interesting things happening. I
suspect there are many people on this list who have thoughts about the
architecture we should use to allow a smaller number of PGPROCs and a
larger number of connections, with various different motivations.

> Thank you, I will look at Takeshi Ideriha's patch.

Cool.

> > Could you please fix these compiler warnings so we can see this
> > running check-world on CI?
> >
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678
>
> Sorry, I do not have access to a Windows host, so I cannot check the
> Win32 build myself.
C:\projects\postgresql\src\include\../interfaces/libpq/libpq-int.h(33):
fatal error C1083: Cannot open include file: 'pthread-win32.h': No such
file or directory (src/backend/postmaster/proxy.c)
[C:\projects\postgresql\postgres.vcxproj]

These relative includes in proxy.c are part of the problem:

#include "../interfaces/libpq/libpq-fe.h"
#include "../interfaces/libpq/libpq-int.h"

I didn't dig into this much, but some first reactions:

1. I see that proxy.c uses libpq and correctly loads it as a dynamic
library, just like postgres_fdw does. Unfortunately proxy.c is part of
core, so it can't use the same technique as postgres_fdw to add the libpq
headers to the include path.

2. libpq-int.h isn't supposed to be included by code outside libpq, and in
this case it fails to find pthread-win32.h, which it apparently expects to
find in either the same directory or the include path. I didn't look into
exactly what is going on (I don't have Windows either), but I think we can
say the root problem is that you shouldn't be including that file from
backend code.

--
Thomas Munro
https://enterprisedb.com
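The busy loop in the FreeBSD report above is easy to reproduce outside the
patch. A minimal sketch of the suspected mechanism (illustrative only, using
a pipe rather than the proxy's sockets): a level-triggered poll() that
always asks for POLLOUT on a writable descriptor returns immediately on
every call, whereas edge-triggered epoll (EPOLLET) reports the transition to
writable only once.

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int         pipefd[2];

    if (pipe(pipefd) != 0)
        return 1;

    /* Watch the write end for both readability and writability, the way
     * the truss output above shows the proxy doing for fd 17. */
    struct pollfd pfd = {.fd = pipefd[1], .events = POLLIN | POLLOUT};

    /*
     * An empty pipe's write end is always writable, so each poll() call
     * returns immediately with POLLOUT set instead of sleeping for the
     * 1000 ms timeout: a loop built this way spins at 100% CPU. That is
     * consistent with the truss output, and with the problem being
     * invisible on Linux, where EPOLLET suppresses the repeats.
     */
    for (int i = 0; i < 3; i++)
    {
        int         rc = poll(&pfd, 1, 1000);

        printf("poll() = %d, revents = 0x%x\n", rc, pfd.revents);
    }
    return 0;
}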
Fix typos and inconsistencies for HEAD (take 6)
Hello hackers,

Please consider fixing the next batch of typos and inconsistencies in the
tree:

6.1. FADVISE_WILLNEED -> POSIX_FADV_WILLNEED
6.2. failOK -> missing_ok
6.3. failOnerror -> failOnSignal
6.4. fakebits -> remove (irrelevant since introduction in 945543d9)
6.5. FastPathGetLockEntry -> FastPathGetRelationLockEntry
6.6. FAST_PATH_HASH_BUCKETS -> FAST_PATH_STRONG_LOCK_HASH_PARTITIONS
6.7. FastPathTransferLocks -> FastPathTransferRelationLocks
6.8. GenericOptionFlags -> remove (unused since 090173a3)
6.9. fetch_data -> fetched_data
6.10. fildes -> fd
6.11. filedescriptors -> file descriptors
6.12. fillatt -> remove (orphaned since 8609d4ab)
6.13. finalfunction -> finalfn
6.14. flail -> fail
6.15. FlushBuffers -> FlushBuffer & rephrase a comment (incorrectly updated in 6f5c38dc)
6.16. flush_context -> wb_context
6.17. followon -> follow-on
6.18. force_quotes -> remove (orphaned since e18d900d)
6.19. formatstring -> format-string
6.20. formarray, formfloat -> remove (orphaned since a237dd2b)
6.21. found_row_type -> found_whole_row
6.22. freeScanStack -> remove a comment (irrelevant since 2a636834)
6.23. free_segment_counter -> freed_segment_counter
6.24. FreeSpace Map -> FreeSpaceMap
6.25. user friendly-operand -> user-friendly operand
6.26. frozenids -> frozenxids
6.27. fsm_internal.h -> fsm_internals.h
6.28. fsm_size_to_avail_cat -> fsm_space_avail_to_cat
6.29. full_result -> full_results
6.30. FULL_SIZ -> remove (orphaned since 65b731bd)
6.31. funxtions -> functions
6.32. generate_nonunion_plan, generate_union_plan -> generate_nonunion_paths, generate_union_paths
6.33. getaddinfo -> getaddrinfo
6.34. get_expr, get_indexdef, get_ruledef, get_viewdef, get_triggerdef, get_userbyid -> pg_get_*
6.35. GetHashPageStatis -> GetHashPageStats
6.36. GetNumShmemAttachedBgworkers -> remove (orphaned since 6bc8ef0b)
6.37. get_one_range_partition_bound_string -> get_range_partbound_string
6.38. getPartitions -> remove a comment (irrelevant since 44c52881)
6.39. GetRecordedFreePage -> GetRecordedFreeSpace
6.40. get_special_varno -> resolve_special_varno
6.41. gig -> GB
6.42. GinITupIsCompressed -> GinItupIsCompressed
6.43. GinPostingListSegmentMaxSize-bytes -> GinPostingListSegmentMaxSize bytes
6.44. gistfindCorrectParent -> gistFindCorrectParent
6.45. gistinserthere -> gistinserttuple
6.46. GISTstate -> giststate
6.47. GlobalSerializableXmin, SerializableGlobalXmin -> SxactGlobalXmin
6.48. Greenwish -> Greenwich
6.49. groupClauseVars -> groupClauseCommonVars

As a side note, while looking at dt_common.c (fixing 6.47), I've got a
feeling that the datetktbl is largely outdated and thus mostly unuseful
(e.g. the USSR hasn't existed for almost 30 years).

Best regards,
Alexander

diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 763b8cf7fd..c95776eb75 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -921,7 +921,7 @@ my_fetch(PG_FUNCTION_ARGS)
      * Convert 'fetched_data' into the a Datum of the original datatype.
      */
-    /* fill *retval from fetch_data. */
+    /* fill *retval from fetched_data. */
     gistentryinit(*retval, PointerGetDatum(converted_datum),
                   entry->rel, entry->page, entry->offset, FALSE);
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 47db7a8a88..d78f706ff7 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -155,7 +155,7 @@ static void TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
  * NB: this is a low-level routine and is NOT the preferred entry point
  * for most uses; functions in transam.c are the intended callers.
  *
- * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
+ * XXX Think about issuing POSIX_FADV_WILLNEED on pages that we will need,
  * but aren't yet in cache, as well as hinting pages not to fall out of
  * cache yet.
  */
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 05c6ca81b9..0aa12e8dde 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -2399,7 +2399,7 @@ TSParserIsVisible(Oid prsId)
 /*
  * get_ts_dict_oid - find a TS dictionary by possibly qualified name
  *
- * If not found, returns InvalidOid if failOK, else throws error
+ * If not found, returns InvalidOid if missing_ok, else throws error
  */
 Oid
 get_ts_dict_oid(List *names, bool missing_ok)
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 6745a2432e..e2bed96b9b 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -216,9 +216,9 @@ static PROCLOCK *FastPathGetRelationLockEntry(LOCALLOCK *locallock);
 /*
  * To make the fast-path lock mechanism work, we must have some way of
- * preventing the use of the fast-path when a conflicting lock might be
- * present. We partition* the locktag space into FAST_PATH_HASH_BUCKETS
- * partitions, and maintain an integer count of the number of "strong" lockers
+