Re: POC: rational number type (fractions)
Hello,

It seems you are not the first to be interested in such a feature. There was a similar extension used in "incremental view maintenance" testing:

https://github.com/nuko-yokohama/pg_fraction

I haven't tried it myself.

Regards,
PAscal
--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Index Skip Scan
On Fri, Feb 07, 2020 at 05:25:43PM +0100, Dmitry Dolgov wrote:
> On Thu, Feb 06, 2020 at 09:22:20PM +0900, Kyotaro Horiguchi wrote:
> > At Thu, 6 Feb 2020 11:57:07 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote in
> > > On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
> > > > At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote in
> > > > We could add an additional parameter "in_cursor" to
> > > > ExecSupportBackwardScan and let skip scan return false if in_cursor is
> > > > true, but I'm not sure it's acceptable.
> > >
> > > I also was thinking about whether it's possible to use
> > > ExecSupportBackwardScan here, but skip scan is just a mode of an
> > > index/indexonly scan. Which means that ExecSupportBackwardScan also
> > > needs to know somehow if this mode is being used, and then, since this
> > > function is called after it's already decided to use skip scan in the
> > > resulting plan, somehow correct the plan (exclude skipping and try to
> > > find the next best path?) - do I understand your suggestion correctly?
> >
> > I hadn't thought it through, but a quick check told me that
> > IndexSupportsBackwardScan returns a fixed flag for the AM. It seems that
> > things are not that simple.
>
> Yes, I've mentioned that already in one of the previous emails :) The
> simplest way I see to achieve what we want is to do something like in
> the attached modified version with a new hasDeclaredCursor field. It's not
> a final version though, but posted just for discussion, so feel free to
> suggest any improvements or alternatives.

IMO the proper fix for this case (moving forward, reading backwards) is simply making it work by properly checking deleted tuples etc. Not sure why that would be so much more complex (I haven't tried implementing it)?

I think making this depend on things like declared cursors etc. is going to be tricky, may easily be more complex than checking deleted tuples, and the behavior may be quite surprising.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
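For readers who want to reproduce the scenario under discussion, here is a minimal sketch of the problematic access pattern - a scrollable cursor over a DISTINCT query that a skip scan could serve (the table and column names are invented for illustration):

BEGIN;
-- assumes a table t(a, b) with an index on (a, b), as elsewhere in this thread
DECLARE c SCROLL CURSOR FOR SELECT DISTINCT a, b FROM t;
FETCH FORWARD 5 FROM c;   -- the skip scan jumps ahead through the index
FETCH BACKWARD 3 FROM c;  -- the scan must now return exactly the rows it
                          -- already produced, in reverse order
COMMIT;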
Re: Marking some contrib modules as trusted extensions
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Julien Rouhaud writes: > >>> Probably NO, if only because you'd need additional privileges > >>> to use these anyway: > >>> pg_stat_statements > > > But the additional privileges are global, so assuming the extension > > has been properly setup, wouldn't it be sensible to ease the > > per-database installation? If not properly setup, there's no harm in > > creating the extension anyway. > > Mmm, I'm not convinced --- the ability to see what statements are being > executed in other sessions (even other databases) is something that > paranoid installations might not be so happy about. Of course, but that's why we have a default role which allows installations to control access to that kind of information- and it's already being checked in the pg_stat_statements case and in the pg_stat_activity case: /* Superusers or members of pg_read_all_stats members are allowed */ is_allowed_role = is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_STATS); > Our previous > discussions about what privilege level is needed to look at > pg_stat_statements info were all made against a background assumption > that you needed some extra privilege to set up the view in the first > place. I think that would need another look or two before being > comfortable that we're not shifting the goal posts too far. While you could maybe argue that's true for pg_stat_statements, it's certainly not true for pg_stat_activity, so I don't buy it for either. This looks like revisionist history to justify paranoia. I understand the general concern, but if we were really depending on the mere installation of the extension to provide security then we wouldn't have bothered putting in checks like the one above, and, worse, I think our users would be routinely complaining that our extensions don't follow our security model and how they can't install them. Lots of people want to use pg_stat_statements, even in environments where not everyone on the database server, or even in the database you want pg_stat_statements in, is trusted, and therefore we have to have these additional checks inside the extension itself. The same goes for just about everything else (I sure hope, at least) in our extensions set- none of the core extensions should be allowing access to things which break our security model, even if they're installed, unless some additional privileges are granted out. The act of installing a core extension should not create a security risk for our users- if it did, it'd be a security issue and CVE-worthy. As such, I really don't agree with this entire line of thinking when it comes to our core extensions. I view the 'trusted extension' model as really for things where the extension author doesn't care about, and doesn't wish to care about, dealing with our security model and making sure that it's followed. We do care, and we do maintain, the security model that we have throughout the core extensions. What I expect and hope will happen is that people will realize that, now that they can have non-superusers installing these extensions and therefore they don't have to give out superuser-level rights as much, there will be asks for more default roles to allow granting out of access to formerly superuser-only capabilities. 
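To make that concrete, the existing default-role machinery already works along these lines (a minimal sketch; the role name is invented for illustration):

CREATE ROLE dbadmin LOGIN;
GRANT pg_read_all_stats TO dbadmin WITH ADMIN OPTION;
-- dbadmin can now read complete pg_stat_activity / pg_stat_statements
-- data, and can grant that ability onward, without ever being superuser.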
There's a bit of a complication there since there might be privileges that only make sense for a specific extension, but an extension can't really install a new default role (and, even if it did, the role would have to be only available to the superuser initially anyway), so we might have to try and come up with some more generic and reusable default role for that case. Still, we can try to deal with that when it happens. Consider that you may wish to have a system that, once installed, a superuser will virtually never access again, but one where you want users to be able to install and use extensions like postgis and pg_stat_statements. That can be done with these changes, and that's fantastic progress- you just install PG, create a non-superuser role, make them the DB owner, and then GRANT things like pg_read_all_stats to their role with admin rights, and boom, they're good to go and you didn't have to hack up the PG source code at all. > The bigger picture here is that I don't want to get push-back that > we've broken somebody's security posture by marking too many extensions > trusted. So for anything where there's any question about security > implications, we should err in the conservative direction of leaving > it untrusted. This is just going to a) cause our users to complain about not being able to install extensions that they've routinely installed in the past, and b) make our users wonder what it is about these extensions that we've decided can't be trusted to even just be installed and if they're at risk today because they've installed them. While it might not seem obvious, the discussion over on the thread about DEFAULT PRIVILEGES and pg_
Re: Index Skip Scan
Hi,

I've done some testing and benchmarking of the v31 patch, looking for regressions, costing issues etc. Essentially, I've run a bunch of SELECT DISTINCT queries on data sets of various sizes, numbers of distinct values etc. The results are fairly large, so I've uploaded them to github

https://github.com/tvondra/skip-scan-test

There are four benchmark groups, depending on how the data are generated, availability of extended stats, and whether the columns are independent:

1) skipscan - just indexes, columns are independent
2) skipscan-with-stats - indexes and extended stats, independent columns
3) skipscan-correlated - just indexes, correlated columns
4) skipscan-correlated-with-stats - indexes and extended stats, correlated columns

The github repository contains *.ods spreadsheets comparing duration with the regular query plan (no skip scan) and skip scan. In general, there are pretty massive speedups, often by about two orders of magnitude.

There are a couple of regressions, where the plan with skipscan enabled is ~10x slower. But this seems to happen only in high-cardinality cases where we misestimate the number of groups. Consider a table with two independent columns

CREATE TABLE t (a text, b text);
INSERT INTO t SELECT
  md5((1*random())::int::text),
  md5((1*random())::int::text)
FROM generate_series(1,100) s(i);

CREATE INDEX ON t(a,b);

ANALYZE;

which then behaves like this:

test=# select * from (select distinct a,b from t) foo offset 1000;
Time: 3138.222 ms (00:03.138)
test=# set enable_indexskipscan = off;
Time: 0.312 ms
test=# select * from (select distinct a,b from t) foo offset 1000;
Time: 199.749 ms

So in this case the skip scan is ~15x slower than the usual plan (index only scan + unique). The reason why this happens is pretty simple - to estimate the number of groups we multiply the ndistinct estimates for the two columns (which both have n_distinct = 1), but then we cap the estimate to 10% of the table. But when the columns are independent with high cardinalities this under-estimates the actual value, making the cost for skip scan much lower than it should be.

I don't think this is an issue the skipscan patch needs to fix, though. Firstly, the regressed cases are a tiny minority. Secondly, we already have a way to improve the root cause - creating extended stats with ndistinct coefficients generally makes the problem go away.

One interesting observation however is that this regression only happened with text columns but not with int or bigint. My assumption is that this is due to text comparisons being much more expensive. Not sure if there is something we could do to deal with this - reduce the number of comparisons or something?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
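For reference, the extended-stats fix mentioned above amounts to the following on the example table (a sketch; the statistics object name is arbitrary):

CREATE STATISTICS t_ab_ndistinct (ndistinct) ON a, b FROM t;
ANALYZE t;
-- the planner can now use the multivariate ndistinct estimate for (a,b)
-- instead of multiplying per-column estimates and capping at 10% of the table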
Re: Index Skip Scan
OK, a couple more comments based on a quick review of the patch, particularly the part related to planning:

1) create_skipscan_unique_path has one assert commented out. Either it's something we want to enforce, or we should remove it.

/*Assert(distinctPrefixKeys <= list_length(pathnode->path.pathkeys));*/

2) I wonder if the current costing model is overly optimistic. We simply copy the startup cost from the IndexPath, which seems fine. But for total cost we do this:

pathnode->path.total_cost = basepath->startup_cost * numGroups;

which seems a bit too simplistic. The startup cost is pretty much just the cost to find the first item in the index, but surely we need to do more to find the next group - we need to do comparisons to skip some of the items, etc. If we think that's unnecessary, we need to explain it in a comment or something.

3) I don't think we should make planning dependent on hasDeclareCursor.

4) I'm not quite sure how sensible it is to create a new IndexPath in create_skipscan_unique_path. On the one hand it works, but I don't think any other path is constructed like this, so I wonder if we're missing something. Perhaps it'd be better to just add a new path node on top of the IndexPath, and then handle this in create_plan. We already do something similar for Bitmap Index Scans, where we create a different executor node from IndexPath depending on the parent node.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
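To spell out point 2 a bit: a less optimistic model might look roughly like the following (purely illustrative, not a worked-out proposal). Instead of

    total_cost = startup_cost * numGroups

something along the lines of

    total_cost = startup_cost + numGroups * (descent_cost + skip_comparison_cost)

where descent_cost is the cost of re-descending the tree (or stepping within a page) to reach the next group, and skip_comparison_cost accounts for the key comparisons performed while skipping items within a group.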
ALTER TABLE rewrite to use clustered order
Forking this thread https://www.postgresql.org/message-id/20181227132417.xe3oagawina7775b%40alvherre.pgsql

On Wed, Dec 26, 2018 at 01:09:39PM -0500, Robert Haas wrote:
> ALTER TABLE already has a lot of logic that is oriented towards being
> able to do multiple things at the same time. If we added CLUSTER,
> VACUUM FULL, and REINDEX to that set, then you could, say, change a
> data type, cluster, and change tablespaces all in a single SQL
> command.

On Thu, Dec 27, 2018 at 10:24:17AM -0300, Alvaro Herrera wrote:
> I think it would be valuable to have those ALTER TABLE variants that rewrite
> the table do so using the cluster order, if there is one, instead of the heap
> order, which is what it does today.

That's a neat idea. I haven't yet fit all of ALTER's processing logic in my head ... but there's an issue that ALTER (unlike CLUSTER) needs to deal with column type promotion, so the indices may need to be dropped and recreated. The table rewrite happens AFTER dropping indices (and all other processing), but the clustered index can't be scanned if it's just been dropped. I handled that by using a tuplesort, same as heapam_relation_copy_for_cluster.

Experimental patch attached. With clustered ALTER:

template1=# DROP TABLE t; CREATE TABLE t AS SELECT generate_series(1,999) i; CREATE INDEX ON t(i DESC); ALTER TABLE t CLUSTER ON t_i_idx; ALTER TABLE t ALTER i TYPE bigint; SELECT * FROM t LIMIT 9;
DROP TABLE
SELECT 999
CREATE INDEX
ALTER TABLE
ALTER TABLE
  i
-----
 999
 998
 997
 996
 995
 994
 993
 992
 991
(9 rows)

0001 patch is stolen from the nearby thread:
https://www.postgresql.org/message-id/flat/20200207143935.GP403%40telsasoft.com
It doesn't make much sense for ALTER to use a clustered index when rewriting a table if it doesn't also go to the effort of preserving the cluster property when rebuilding its indices.

0002 patch is included and not squished with 0003 to show the original implementation using an index scan (by not dropping indices on the old table, and breaking various things), and the evolution to tuplesort.

Note, this doesn't use clustered order when rewriting only due to a tablespace change. ALTER currently does an AM-specific block copy without looking at tuples. But I think it'd be possible to use tuplesort and copy if desired.
>From f93bbd6c30e883068f46ff86def28d0e66aea4f5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Fri, 7 Feb 2020 22:06:57 -0600
Subject: [PATCH v1 1/3] Preserve CLUSTER ON index during ALTER rewrite

Amit Langote and Justin Pryzby
https://www.postgresql.org/message-id/flat/20200202161718.GI13621%40telsasoft.com
---
 src/backend/commands/tablecmds.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b7c8d66..642a85c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 static void RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 									 Relation rel, List *domname,
 									 const char *conname);
+static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
@@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			newcmd->def = (Node *) stmt;
 			tab->subcmds[AT_PASS_OLD_INDEX] =
 				lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+
+			/* Preserve index's indisclustered property, if set. */
+			PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId);
 		}
 		else if (IsA(stmt, AlterTableStmt))
 		{
@@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 										 rel, NIL,
 										 indstmt->idxname);
+
+				/* Preserve index's indisclustered property, if set. */
+				PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid);
 			}
 			else if (cmd->subtype == AT_AddConstraint)
 			{
@@ -11997,6 +12004,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 }
 
 /*
+ * For a table's index that is to be recreated due to PostAlterType
+ * processing, preserve its indisclustered property by issuing ALTER TABLE
+ * CLUSTER ON command on the table that will run after the command to recreate
+ * the index.
+ */
+static void
+PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	Assert(OidIsValid(indoid));
+	Assert(pass == AT_PASS_OLD_INDEX);
+
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indoid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	if (indexForm->indisclustered)
+	{
+		Alt
Re: Internal key management system
On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:
> On Sat, 8 Feb 2020 at 03:24, Andres Freund wrote:
> > Hi,
> >
> > On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
> > > Yeah I'm not going to use pgcrypto for transparent data encryption.
> > > The KMS patch includes the new basic infrastructure for cryptographic
> > > functions (mainly AES-CBC). I'm thinking we can expand that
> > > infrastructure so that we can also use it for TDE purposes by
> > > supporting new cryptographic functions such as AES-CTR. Anyway, I
> > > agree to not have it depend on pgcrypto.
> >
> > I thought for a minute, before checking the patch, that you were saying
> > above that the KMS patch includes its *own* implementation of
> > cryptographic functions. I think it's pretty crucial that it continues
> > not to do that...
>
> I meant that we're going to use OpenSSL for AES encryption and
> decryption independent of pgcrypto's openssl code, as the first step.
> That is, KMS is available only when configured --with-openssl. And
> hopefully we eventually merge this openssl code and have pgcrypto use
> it, like when we introduced SCRAM.

I don't think it's very likely we'll ever merge any openssl code into our repository, e.g. because of licensing. But we already have an AES implementation in pgcrypto - why not use that? I'm not saying we should make this depend on pgcrypto, but maybe we should move the AES library from pgcrypto into src/common or something like that.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Internal key management system
Hi, I wonder if this is meant to support external KMS systems/services like Vault (from HashiCorp) or CloudHSM (from AWS) or a hardware HSM. AFAICS the current implementation does not allow storing keys in such external systems, right? But it seems kinda reasonable to want to do that, when already using the HSM for other parts of the system. Now, I'm not saying the first version we commit has to support this, or that it necessarily makes sense. But for example MariaDB seems to support this [1]. [1] https://mariadb.com/kb/en/encryption-key-management/ -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Index Skip Scan
> On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
>
> I've done some testing and benchmarking of the v31 patch, looking for
> regressions, costing issues etc. Essentially, I've run a bunch of SELECT
> DISTINCT queries on data sets of various size, number of distinct values
> etc. The results are fairly large, so I've uploaded them to github
>
> https://github.com/tvondra/skip-scan-test

Thanks a lot for testing!

> There are a couple of regressions, where the plan with skipscan enabled
> is ~10x slower. But this seems to happen only in high-cardinality cases
> where we misestimate the number of groups. Consider a table with two
> independent columns
>
> CREATE TABLE t (a text, b text);
> INSERT INTO t SELECT
>   md5((1*random())::int::text),
>   md5((1*random())::int::text)
> FROM generate_series(1,100) s(i);
>
> CREATE INDEX ON t(a,b);
>
> ANALYZE;
>
> which then behaves like this:
>
> test=# select * from (select distinct a,b from t) foo offset 1000;
> Time: 3138.222 ms (00:03.138)
> test=# set enable_indexskipscan = off;
> Time: 0.312 ms
> test=# select * from (select distinct a,b from t) foo offset 1000;
> Time: 199.749 ms
>
> So in this case the skip scan is ~15x slower than the usual plan (index
> only scan + unique). The reason why this happens is pretty simple - to
> estimate the number of groups we multiply the ndistinct estimates for
> the two columns (which both have n_distinct = 1), but then we cap
> the estimate to 10% of the table. But when the columns are independent
> with high cardinalities that under-estimates the actual value, making
> the cost for skip scan much lower than it should be.

The current implementation checks if we can find the next value on the same page to take a shortcut instead of a tree traversal and improve that kind of situation, but I can easily imagine that it's still not enough in some extreme situations.

> I don't think this is an issue the skipscan patch needs to fix, though.
> Firstly, the regressed cases are a tiny minority. Secondly, we already
> have a way to improve the root cause - creating extended stats with
> ndistinct coefficients generally makes the problem go away.

Yes, I agree.

> One interesting observation however is that this regression only
> happened with text columns but not with int or bigint. My assumption is
> that this is due to text comparisons being much more expensive. Not sure
> if there is something we could do to deal with this - reduce the number
> of comparisons or something?

Hm, interesting. I need to check that we do not do any unnecessary comparisons.

> On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:
>
> > Yes, I've mentioned that already in one of the previous emails :) The
> > simplest way I see to achieve what we want is to do something like in
> > the attached modified version with a new hasDeclaredCursor field. It's
> > not a final version though, but posted just for discussion, so feel
> > free to suggest any improvements or alternatives.
>
> IMO the proper fix for this case (moving forward, reading backwards) is
> simply making it work by properly checking deleted tuples etc. Not sure
> why that would be so much more complex (haven't tried implementing it)?

It's probably not that complex by itself, but it requires changing how responsibilities are isolated. At the moment the current implementation leaves jumping over the tree fully to _bt_skip, and heap visibility checks only to IndexOnlyNext.
To check deleted tuples properly we need to either verify the corresponding heap tuple's visibility inside _bt_skip (as I've mentioned in one of the previous emails, checking whether an index tuple is dead is not enough), or teach the code in IndexOnlyNext to understand that _bt_skip can lead to returning the same tuple while moving forward & reading backward. Do you think it still makes sense to go this way?
Re: Internal key management system
Hi,

On February 8, 2020 7:08:26 AM PST, Tomas Vondra wrote:
> On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:
> > On Sat, 8 Feb 2020 at 03:24, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
> > > > Yeah I'm not going to use pgcrypto for transparent data encryption.
> > > > The KMS patch includes the new basic infrastructure for cryptographic
> > > > functions (mainly AES-CBC). I'm thinking we can expand that
> > > > infrastructure so that we can also use it for TDE purposes by
> > > > supporting new cryptographic functions such as AES-CTR. Anyway, I
> > > > agree to not have it depend on pgcrypto.
> > >
> > > I thought for a minute, before checking the patch, that you were saying
> > > above that the KMS patch includes its *own* implementation of
> > > cryptographic functions. I think it's pretty crucial that it continues
> > > not to do that...
> >
> > I meant that we're going to use OpenSSL for AES encryption and
> > decryption independent of pgcrypto's openssl code, as the first step.
> > That is, KMS is available only when configured --with-openssl. And
> > hopefully we eventually merge this openssl code and have pgcrypto use
> > it, like when we introduced SCRAM.
>
> I don't think it's very likely we'll ever merge any openssl code into
> our repository, e.g. because of licensing. But we already have an AES
> implementation in pgcrypto - why not use that? I'm not saying we
> should make this depend on pgcrypto, but maybe we should move the AES
> library from pgcrypto into src/common or something like that.

The code uses functions exposed by openssl, it doesn't copy their code.

And no, I don't think we should copy the implementation from pgcrypto - it's not good. We should remove it entirely.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Internal key management system
On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote:
> Hi,
>
> On February 8, 2020 7:08:26 AM PST, Tomas Vondra wrote:
> > On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:
> > > On Sat, 8 Feb 2020 at 03:24, Andres Freund wrote:
> > > > Hi,
> > > >
> > > > On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
> > > > > Yeah I'm not going to use pgcrypto for transparent data encryption.
> > > > > The KMS patch includes the new basic infrastructure for cryptographic
> > > > > functions (mainly AES-CBC). I'm thinking we can expand that
> > > > > infrastructure so that we can also use it for TDE purposes by
> > > > > supporting new cryptographic functions such as AES-CTR. Anyway, I
> > > > > agree to not have it depend on pgcrypto.
> > > >
> > > > I thought for a minute, before checking the patch, that you were saying
> > > > above that the KMS patch includes its *own* implementation of
> > > > cryptographic functions. I think it's pretty crucial that it continues
> > > > not to do that...
> > >
> > > I meant that we're going to use OpenSSL for AES encryption and
> > > decryption independent of pgcrypto's openssl code, as the first step.
> > > That is, KMS is available only when configured --with-openssl. And
> > > hopefully we eventually merge this openssl code and have pgcrypto use
> > > it, like when we introduced SCRAM.
> >
> > I don't think it's very likely we'll ever merge any openssl code into
> > our repository, e.g. because of licensing. But we already have an AES
> > implementation in pgcrypto - why not use that? I'm not saying we
> > should make this depend on pgcrypto, but maybe we should move the AES
> > library from pgcrypto into src/common or something like that.
>
> The code uses functions exposed by openssl, it doesn't copy their code.

Sure, I know the code is currently calling openssl functions. I was responding to Masahiko-san's message that we might eventually merge this openssl code into our tree.

> And no, I don't think we should copy the implementation from pgcrypto -
> it's not good. We should remove it entirely.

OK, no opinion on the quality of this implementation.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ALTER TABLE rewrite to use clustered order
Justin Pryzby writes: > On Thu, Dec 27, 2018 at 10:24:17AM -0300, Alvaro Herrera wrote: >> I think it would be valuable to have those ALTER TABLE variants that rewrite >> the table do so using the cluster order, if there is one, instead of the heap >> order, which is what it does today. > That's a neat idea. TBH, I'm -1 on this. The current behavior of preserving physical order is perfectly sane, and it's faster than anything involving CLUSTER is going to be, and if you try to change that you are going to have enormous headaches with the variants of ALTER TABLE that would change the semantics of the CLUSTER index columns. (Unless of course your theory is that you don't actually care exactly what the finished order is, in which case why are we bothering?) The proposed patch which *forces* it to be done like that, whether the user wants it or not, seems particularly poorly thought out. regards, tom lane
Re: Index Skip Scan
On Sat, Feb 08, 2020 at 04:24:40PM +0100, Dmitry Dolgov wrote:
> > On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
> >
> > I've done some testing and benchmarking of the v31 patch, looking for
> > regressions, costing issues etc. Essentially, I've run a bunch of SELECT
> > DISTINCT queries on data sets of various size, number of distinct values
> > etc. The results are fairly large, so I've uploaded them to github
> >
> > https://github.com/tvondra/skip-scan-test
>
> Thanks a lot for testing!
>
> > There are a couple of regressions, where the plan with skipscan enabled
> > is ~10x slower. But this seems to happen only in high-cardinality cases
> > where we misestimate the number of groups. Consider a table with two
> > independent columns
> >
> > CREATE TABLE t (a text, b text);
> > INSERT INTO t SELECT
> >   md5((1*random())::int::text),
> >   md5((1*random())::int::text)
> > FROM generate_series(1,100) s(i);
> >
> > CREATE INDEX ON t(a,b);
> >
> > ANALYZE;
> >
> > which then behaves like this:
> >
> > test=# select * from (select distinct a,b from t) foo offset 1000;
> > Time: 3138.222 ms (00:03.138)
> > test=# set enable_indexskipscan = off;
> > Time: 0.312 ms
> > test=# select * from (select distinct a,b from t) foo offset 1000;
> > Time: 199.749 ms
> >
> > So in this case the skip scan is ~15x slower than the usual plan (index
> > only scan + unique). The reason why this happens is pretty simple - to
> > estimate the number of groups we multiply the ndistinct estimates for
> > the two columns (which both have n_distinct = 1), but then we cap
> > the estimate to 10% of the table. But when the columns are independent
> > with high cardinalities that under-estimates the actual value, making
> > the cost for skip scan much lower than it should be.
>
> The current implementation checks if we can find the next value on the
> same page to take a shortcut instead of a tree traversal and improve
> that kind of situation, but I can easily imagine that it's still not
> enough in some extreme situations.

Yeah. I'm not sure there's room for further improvements. The regressed cases were subject to the 10% cap, and with ndistinct being more than 10% of the table, we probably can find many distinct keys on each index page - we know the values change every ~10 rows.

> > I don't think this is an issue the skipscan patch needs to fix, though.
> > Firstly, the regressed cases are a tiny minority. Secondly, we already
> > have a way to improve the root cause - creating extended stats with
> > ndistinct coefficients generally makes the problem go away.
>
> Yes, I agree.
>
> > One interesting observation however is that this regression only
> > happened with text columns but not with int or bigint. My assumption is
> > that this is due to text comparisons being much more expensive. Not sure
> > if there is something we could do to deal with this - reduce the number
> > of comparisons or something?
>
> Hm, interesting. I need to check that we do not do any unnecessary
> comparisons.
>
> > On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:
> > > Yes, I've mentioned that already in one of the previous emails :) The
> > > simplest way I see to achieve what we want is to do something like in
> > > the attached modified version with a new hasDeclaredCursor field. It's
> > > not a final version though, but posted just for discussion, so feel
> > > free to suggest any improvements or alternatives.
> >
> > IMO the proper fix for this case (moving forward, reading backwards) is
> > simply making it work by properly checking deleted tuples etc. Not sure
> > why that would be so much more complex (haven't tried implementing it)?
>
> It's probably not that complex by itself, but it requires changing how
> responsibilities are isolated.
> At the moment the current implementation leaves jumping over the tree
> fully to _bt_skip, and heap visibility checks only to IndexOnlyNext.
> To check deleted tuples properly we need to either verify the
> corresponding heap tuple's visibility inside _bt_skip (as I've mentioned
> in one of the previous emails, checking whether an index tuple is dead
> is not enough), or teach the code in IndexOnlyNext to understand that
> _bt_skip can lead to returning the same tuple while moving forward &
> reading backward. Do you think it still makes sense to go this way?

Not sure. I have to think about this first.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Internal key management system
Tomas Vondra writes:
> On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote:
> > On February 8, 2020 7:08:26 AM PST, Tomas Vondra wrote:
> > > I don't think it's very likely we'll ever merge any openssl code into
> > > our repository, e.g. because of licensing. But we already have an AES
> > > implementation in pgcrypto - why not use that? I'm not saying we
> > > should make this depend on pgcrypto, but maybe we should move the AES
> > > library from pgcrypto into src/common or something like that.
> >
> > The code uses functions exposed by openssl, it doesn't copy their code.
>
> Sure, I know the code is currently calling openssl functions. I was
> responding to Masahiko-san's message that we might eventually merge this
> openssl code into our tree.

No. This absolutely, positively, will not happen. There will never be crypto functions in our core tree, because then there'd be no recourse for people who want to use Postgres in countries with restrictions on crypto software. It's hard enough for them that we have such code in contrib --- but at least they can remove pgcrypto and be legal. If it's in src/common then they're stuck.

For the same reason, I don't think that an "internal key management" feature in the core code is ever going to be acceptable. It has to be an extension. (But, as long as it's an extension, whether it's bringing its own crypto or relying on some other extension for that doesn't matter from the legal standpoint.)

regards, tom lane
Re: Marking some contrib modules as trusted extensions
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Our previous >> discussions about what privilege level is needed to look at >> pg_stat_statements info were all made against a background assumption >> that you needed some extra privilege to set up the view in the first >> place. I think that would need another look or two before being >> comfortable that we're not shifting the goal posts too far. > While you could maybe argue that's true for pg_stat_statements, it's > certainly not true for pg_stat_activity, so I don't buy it for either. The analogy of pg_stat_activity certainly suggests that there shouldn't be a reason *in principle* why pg_stat_statements couldn't be made trusted. There's a difference between that statement and saying that *in practice* pg_stat_statements is ready to be trusted right now with no further changes. I haven't done the analysis needed to conclude that, and don't care to do so as part of this patch proposal. > The same goes for just about everything else (I sure hope, at least) in > our extensions set- none of the core extensions should be allowing > access to things which break our security model, even if they're > installed, unless some additional privileges are granted out. Maybe not, but the principle of defense-in-depth still says that admins could reasonably want to not let dangerous tools get installed in the first place. > As such, I really don't agree with this entire line of thinking when it > comes to our core extensions. I view the 'trusted extension' model as > really for things where the extension author doesn't care about, and > doesn't wish to care about, dealing with our security model and making > sure that it's followed. We do care, and we do maintain, the security > model that we have throughout the core extensions. I am confused as to what "entire line of thinking" you are objecting to. Are you now saying that we should forget the trusted-extension model? Or maybe that we can just mark *everything* we ship as trusted? I'm not going to agree with either. >> The bigger picture here is that I don't want to get push-back that >> we've broken somebody's security posture by marking too many extensions >> trusted. So for anything where there's any question about security >> implications, we should err in the conservative direction of leaving >> it untrusted. > This is just going to a) cause our users to complain about not being > able to install extensions that they've routinely installed in the past, That's utter nonsense. Nothing here is taking away privileges that existed before; if you could install $whatever as superuser before, you still can. OTOH, we *would* have a problem of that sort if we marked $whatever as trusted and then later had to undo it. So I think there's plenty of reason to be conservative about the first wave of what-to-mark-as-trusted. Once we've got more experience with this mechanism under our belts, we might decide we can be more liberal about it. > and b) make our users wonder what it is about these extensions that > we've decided can't be trusted to even just be installed and if they're > at risk today because they've installed them. Yep, you're right, this patch does make value judgements of that sort, and I'll stand behind them. Giving people the impression that, say, postgres_fdw isn't any more dangerous than cube isn't helpful. 
> While it might not seem obvious, the discussion over on the thread about
> DEFAULT PRIVILEGES and pg_init_privs is actually a lot more relevant
> here- there's extensions we have that expect certain functions, once
> installed, to be owned by a superuser (which will still be the case
> here, thanks to how you've addressed that), but then to not have EXECUTE
> rights GRANT'd to anyone (thanks to the REVOKE FROM PUBLIC in the
> installation), but that falls apart when someone's decided to set
> up DEFAULT PRIVILEGES for the superuser. While no one seems to want to
> discuss that with me, unfortunately, it's becoming more and more clear
> that we need to skip DEFAULT PRIVILEGES from being applied during
> extension creation.

Or that we can't let people apply default privileges to superuser-created objects at all. But I agree that that's a different discussion.

regards, tom lane
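To illustrate the hazard described in the quoted text, this is roughly the kind of setup that defeats an extension script's REVOKE (a sketch; the superuser role name is assumed):

ALTER DEFAULT PRIVILEGES FOR ROLE postgres
    GRANT EXECUTE ON FUNCTIONS TO PUBLIC;
-- every function postgres creates from now on - including functions created
-- by CREATE EXTENSION - gets EXECUTE granted to PUBLIC, overriding the
-- extension script's intended REVOKE ... FROM PUBLIC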
Re: Index Skip Scan
On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote: > > So in this case the skip scan is ~15x slower than the usual plan (index > > only scan + unique). The reason why this happens is pretty simple - to > > estimate the number of groups we multiply the ndistinct estimates for > > the two columns (which both have n_distinct = 1), but then we cap > > the estimate to 10% of the table. But when the columns are independent > > with high cardinalities that under-estimates the actual value, making > > the cost for skip scan much lower than it should be. > > The current implementation checks if we can find the next value on the > same page to do a shortcut instead of tree traversal and improve such > kind of situations, but I can easily imagine that it's still not enough > in some extreme situations. This is almost certainly rehashing already covered ground, but since I doubt it's been discussed recently, would you be able to summarize that choice (not to always get the next tuple by scanning from the top of the tree again) and the performance/complexity tradeoffs? Thanks, James
Re: Internal key management system
On Sun, 9 Feb 2020 at 01:53, Tomas Vondra wrote:
>
> On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote:
> > Hi,
> >
> > On February 8, 2020 7:08:26 AM PST, Tomas Vondra wrote:
> > > On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:
> > > > On Sat, 8 Feb 2020 at 03:24, Andres Freund wrote:
> > > > > Hi,
> > > > >
> > > > > On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
> > > > > > Yeah I'm not going to use pgcrypto for transparent data encryption.
> > > > > > The KMS patch includes the new basic infrastructure for
> > > > > > cryptographic functions (mainly AES-CBC). I'm thinking we can
> > > > > > expand that infrastructure so that we can also use it for TDE
> > > > > > purposes by supporting new cryptographic functions such as
> > > > > > AES-CTR. Anyway, I agree to not have it depend on pgcrypto.
> > > > >
> > > > > I thought for a minute, before checking the patch, that you were
> > > > > saying above that the KMS patch includes its *own* implementation
> > > > > of cryptographic functions. I think it's pretty crucial that it
> > > > > continues not to do that...
> > > >
> > > > I meant that we're going to use OpenSSL for AES encryption and
> > > > decryption independent of pgcrypto's openssl code, as the first step.
> > > > That is, KMS is available only when configured --with-openssl. And
> > > > hopefully we eventually merge this openssl code and have pgcrypto use
> > > > it, like when we introduced SCRAM.
> > >
> > > I don't think it's very likely we'll ever merge any openssl code into
> > > our repository, e.g. because of licensing. But we already have an AES
> > > implementation in pgcrypto - why not use that? I'm not saying we
> > > should make this depend on pgcrypto, but maybe we should move the AES
> > > library from pgcrypto into src/common or something like that.
> >
> > The code uses functions exposed by openssl, it doesn't copy their code.
>
> Sure, I know the code is currently calling openssl functions. I was
> responding to Masahiko-san's message that we might eventually merge this
> openssl code into our tree.

Sorry for confusing you. What I wanted to say is to write the AES encryption code in src/common using the openssl library as the first step, apart from pgcrypto's openssl code, and then merge these two code libraries in src/common as the next step. That is, it's moving the AES library from pgcrypto to src/common, as you mentioned. IIRC when we introduced SCRAM we moved the sha2 library from pgcrypto to src/common.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
On Sun, Aug 25, 2019 at 1:56 PM Tom Lane wrote: > I agree that teaching opclasses to say whether this is okay is a > reasonable approach. I've begun working on this, with help from Anastasia. My working assumption is that I only need to care about opclass-declared input data types (pg_opclass.opcintype), plus the corresponding collations -- the former can be used to lookup an appropriate pg_amproc entry (i.e. B-Tree support function 4), while the latter are passed to the support function to get an answer about whether or not it's okay to use deduplication. This approach seems to be good enough as far as the deduplication project's needs are concerned. However, I think that I probably need to take a broader view of the problem than that. Any guidance would be much appreciated. > > Consumers of this new infrastructure probably won't be limited to the > > deduplication feature; > > Indeed, we run up against this sort of thing all the time in, eg, planner > optimizations. I think some sort of "equality is precise" indicator > would be really useful for a lot of things. Suppose I wanted to add support for deduplication of a B-Tree index on an array of integers. This probably wouldn't be very compelling, but just suppose. It's not clear how this could work within the confines of the type and operator class systems. I can hardly determine that it's safe or unsafe to do so at CREATE INDEX time, since the opclass-declared input data type is always the pg_type.oid corresponding to 'anyarray' -- I am forced to make a generic assumption that deduplication is not safe. I must make this conservative assumption since, in general, the indexed column could turn out to be an array of numeric datums -- a "transitively unsafe" anyarray (numeric's display scale issue could leak into anyarray). I'm not actually worried about any practical downside that this may create for users of the B-Tree deduplication feature; a B-Tree index on an array *is* a pretty niche thing. Does seem like I should make sure that I get this right, though. Code like the 'anyarray' B-Tree support function 1 (i.e. btarraycmp()/array_cmp()) doesn't hint at a solution -- it merely does a lookup of the underlying type's comparator using the typcache. That depends on having actual anyarray datums to do something with, which isn't something that this new infrastructure can rely on in any way. I suppose that the only thing that would work here would be to somehow look through the pg_attribute entry for the index column, which will have the details required to distinguish between (say) an array of integers (which is safe, I think) from an array of numerics (which is unsafe). From there, the information about the element type could (say) be passed to the anyarray default opclass' support function 4, which could do its own internal lookup. That seems like it might be a solution in search of a problem, though. BTW, I currently forbid cross-type support function 4 entries for an opclass, on the grounds that that isn't sensible for deduplication. Do you think that that restriction is appropriate in general, given the likelihood that this support function will be used in several other areas? Thanks -- Peter Geoghegan
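As a side note, the catalog entries involved here can be inspected directly; a sketch of a query listing B-Tree operator families that register a support function 4, under the numbering described above:

SELECT opf.opfname,
       amp.amproclefttype::regtype AS input_type,
       amp.amproc
FROM pg_amproc amp
JOIN pg_opfamily opf ON opf.oid = amp.amprocfamily
JOIN pg_am am ON am.oid = opf.opfmethod
WHERE am.amname = 'btree'
  AND amp.amprocnum = 4;
-- shows, per operator family, the declared input type and the procedure
-- registered as B-Tree support function 4 (none yet, without the patch)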
Re: logical decoding : exceeded maxAllocatedDescs for .spill files
On Sat, Feb 8, 2020 at 12:10 AM Andres Freund wrote:
>
> Hi,
>
> On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
> > I performed the same test in pg11 and reproduced the issue on the
> > commit prior to a4ccc1cef5a04 (Generational memory allocator).
> >
> > ulimit -s 1024
> > ulimit -v 30
> >
> > wal_level = logical
> > max_replication_slots = 4
> >
> > [...]
> >
> > After that, I applied the "Generational memory allocator" patch and
> > that solved the issue. From the error message, it is evident that the
> > underlying code is trying to allocate a MaxTupleSize memory for each
> > tuple. So, I re-introduced the following lines (which are removed by
> > a4ccc1cef5a04) on top of the patch:
> >
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -417,6 +417,9 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
> >
> >  	alloc_len = tuple_len + SizeofHeapTupleHeader;
> >
> > +	if (alloc_len < MaxHeapTupleSize)
> > +		alloc_len = MaxHeapTupleSize;
>
> Maybe I'm being slow here - but what does this actually prove? Before
> the generation contexts were introduced we avoided fragmentation (which
> would make things unusably slow) using a brute force method (namely
> forcing all tuple allocations to be of the same/maximum size).

It seems that for this we kept a cache of max_cached_tuplebufs objects, and we don't need to allocate more than that number of tuples of size MaxHeapTupleSize, because we will anyway return that memory to aset.c.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: Postgres 32 bits client compilation fail. Win32 bits client is supported?
On Sat, Feb 8, 2020 at 8:05 AM Ranier Vilela wrote:
>
> Hi,
> I am migrating my applications that use the postgres client from msvc 2010
> (32 bits) to msvc 2019 (32 bits).
> Compilation using msvc 2019 (64 bits) works very well.
> But the build using msvc 2019 (32 bits) is not working.
> The 32-bit Platform variable is set to x86, resulting in the first error.
>
> "Project "C:\dll\postgres\pgsql.sln" on node 1 (default targets).
> C:\dll\postgres\pgsql.sln.metaproj: error MSB4126: The specified solution
> configuration "Release|x86" is invalid. Please specify a valid solution
> configuration using the Configuration and Platform properties (e.g.
> MSBuild.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or
> leave those properties blank to use the default solution configuration.
> [C:\dll\postgres\pgsql.sln]
> Done Building Project "C:\dll\postgres\pgsql.sln" (default targets) -- FAILED."
>
> This is because the Sub DeterminePlatform function of the Solution.pm
> program uses the following expression:
> "my $output = `cl /? 2>&1`;"
> The result of msvc 2019 (32 bits) is:
> "Microsoft (R) C/C++ Optimizing Compiler Version 19.24.28315 for x86"
>
> By setting the Platform variable manually to Win32, the compilation process
> continues until it stops at:
> "Generating configuration headers ..."
>
> Where the second error occurs:
> unused defines: HAVE_STRUCT_CMSGCRED USE_NAMED_POSI... etc...
> ALIGNOF_DOUBLE USE_DEV_URANDOM at C:\dll\postgres\src\tools\msvc/Mkvcbuild.pm line 842.
>
> Try by removing src/include/pg_config.
>
> Question:
> Will Postgres continue to support 32-bit client?

I am not aware of any discussion related to stopping the support of the 32-bit client.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com