Re: POC: rational number type (fractions)

2020-02-08 Thread legrand legrand
Hello,
It seems you are not the first to be interested in such a feature.

There was a similar extension used in "incremental view maintenance"
testing:
https://github.com/nuko-yokohama/pg_fraction

I didn't try it myself.

Regards
PAscal 



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html




Re: Index Skip Scan

2020-02-08 Thread Tomas Vondra

On Fri, Feb 07, 2020 at 05:25:43PM +0100, Dmitry Dolgov wrote:

On Thu, Feb 06, 2020 at 09:22:20PM +0900, Kyotaro Horiguchi wrote:
At Thu, 6 Feb 2020 11:57:07 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote in
> > On Thu, Feb 06, 2020 at 10:24:50AM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 5 Feb 2020 17:37:30 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote in
> > We could add an additional parameter "in_cursor" to
> > ExecSupportBackwardScan and let skip scan return false if in_cursor is
> > true, but I'm not sure it's acceptable.
>
> I also was thinking about whether it's possible to use
> ExecSupportBackwardScan here, but skip scan is just a mode of an
> index/indexonly scan. Which means that ExecSupportBackwardScan also needs
> to know somehow if this mode is being used, and then, since this
> function is called after it's already decided to use skip scan in the
> resulting plan, somehow correct the plan (exclude skipping and try to
> find the next best path?) - do I understand your suggestion correctly?

I hadn't thought it through that far, but a quick check told me that
IndexSupportsBackwardScan returns a fixed flag for the AM.  It seems that
things are not that simple.


Yes, I've mentioned that already in one of the previous emails :) The
simplest way I see to achieve what we want is to do something like in
attached modified version with a new hasDeclaredCursor field. It's not a
final version though, but posted just for discussion, so feel free to
suggest any improvements or alternatives.


IMO the proper fix for this case (moving forward, reading backwards) is
simply making it work by properly checking deleted tuples etc. Not sure
why that would be so much more complex (haven't tried implementing it)?

I think making this depend on things like declared cursor etc. is going
to be tricky, may easily be more complex than checking deleted tuples,
and the behavior may be quite surprising.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Marking some contrib modules as trusted extensions

2020-02-08 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Julien Rouhaud  writes:
> >>> Probably NO, if only because you'd need additional privileges
> >>> to use these anyway:
> >>> pg_stat_statements
> 
> > But the additional privileges are global, so assuming the extension
> > has been properly setup, wouldn't it be sensible to ease the
> > per-database installation?  If not properly setup, there's no harm in
> > creating the extension anyway.
> 
> Mmm, I'm not convinced --- the ability to see what statements are being
> executed in other sessions (even other databases) is something that
> paranoid installations might not be so happy about.

Of course, but that's why we have a default role which allows
installations to control access to that kind of information- and it's
already being checked in the pg_stat_statements case and in the
pg_stat_activity case:

/* Superusers or members of pg_read_all_stats members are allowed */
is_allowed_role = is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_STATS);

> Our previous
> discussions about what privilege level is needed to look at
> pg_stat_statements info were all made against a background assumption
> that you needed some extra privilege to set up the view in the first
> place.  I think that would need another look or two before being
> comfortable that we're not shifting the goal posts too far.

While you could maybe argue that's true for pg_stat_statements, it's
certainly not true for pg_stat_activity, so I don't buy it for either.
This looks like revisionist history to justify paranoia.  I understand
the general concern, but if we were really depending on the mere
installation of the extension to provide security then we wouldn't have
bothered putting in checks like the one above, and, worse, I think our
users would be routinely complaining that our extensions don't follow
our security model and how they can't install them.

Lots of people want to use pg_stat_statements, even in environments
where not everyone on the database server, or even in the database you
want pg_stat_statements in, is trusted, and therefore we have to have
these additional checks inside the extension itself.

The same goes for just about everything else (I sure hope, at least) in
our extensions set- none of the core extensions should be allowing
access to things which break our security model, even if they're
installed, unless some additional privileges are granted out.  The act
of installing a core extension should not create a security risk for our
users- if it did, it'd be a security issue and CVE-worthy.

As such, I really don't agree with this entire line of thinking when it
comes to our core extensions.  I view the 'trusted extension' model as
really for things where the extension author doesn't care about, and
doesn't wish to care about, dealing with our security model and making
sure that it's followed.  We do care about, and we do maintain, the
security model that we have throughout the core extensions.

What I expect and hope will happen is that people will realize that, now
that they can have non-superusers installing these extensions and
therefore they don't have to give out superuser-level rights as much,
there will be asks for more default roles to allow granting out of
access to formerly superuser-only capabilities.  There's a bit of a
complication there since there might be privileges that only make sense
for a specific extension, but an extension can't really install a new
default role (and, even if it did, the role would have to be only
available to the superuser initially anyway), so we might have to try
and come up with some more generic and reusable default role for that
case.  Still, we can try to deal with that when it happens.

Consider that you may wish to have a system that, once installed, a
superuser will virtually never access again, but one where you want
users to be able to install and use extensions like postgis and
pg_stat_statements.  That can be done with these changes, and that's
fantastic progress- you just install PG, create a non-superuser role,
make them the DB owner, and then GRANT things like pg_read_all_stats to
their role with admin rights, and boom, they're good to go and you
didn't have to hack up the PG source code at all.
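
As a concrete sketch of that setup (the role and database names here are
just examples, not from the thread):

  CREATE ROLE app_owner LOGIN;
  CREATE DATABASE app OWNER app_owner;
  GRANT pg_read_all_stats TO app_owner WITH ADMIN OPTION;

After that, app_owner can create trusted extensions in its own database
and grant pg_read_all_stats onward to other roles, without any further
superuser involvement.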

> The bigger picture here is that I don't want to get push-back that
> we've broken somebody's security posture by marking too many extensions
> trusted.  So for anything where there's any question about security
> implications, we should err in the conservative direction of leaving
> it untrusted.

This is just going to a) cause our users to complain about not being
able to install extensions that they've routinely installed in the past,
and b) make our users wonder what it is about these extensions that
we've decided can't be trusted to even just be installed and if they're
at risk today because they've installed them.

While it might not seem obvious, the discussion over on the thread about
DEFAULT PRIVILEGES and pg_

Re: Index Skip Scan

2020-02-08 Thread Tomas Vondra

Hi,

I've done some testing and benchmarking of the v31 patch, looking for
regressions, costing issues etc. Essentially, I've run a bunch of SELECT
DISTINCT queries on data sets of various size, number of distinct values
etc. The results are fairly large, so I've uploaded them to github

https://github.com/tvondra/skip-scan-test

There are four benchmark groups, depending on how the data are generated
and availability of extended stats and if the columns are independent:

1) skipscan - just indexes, columns are independent

2) skipscan-with-stats - indexes and extended stats, independent columns

3) skipscan-correlated - just indexes, correlated columns

4) skipscan-correlated-with-stats - indexes and extended stats,
correlated columns

The github repository contains *.ods spreadsheet comparing duration with
the regular query plan (no skip scan) and skip scan. In general, there
are pretty massive speedups, often by about two orders of magnitude.

There are a couple of regressions, where the plan with skipscan enabled
is ~10x slower. But this seems to happen only in high-cardinality cases
where we misestimate the number of groups. Consider a table with two
independent columns

  CREATE TABLE t (a text, b text);
  INSERT INTO t SELECT
md5((1*random())::int::text),
md5((1*random())::int::text)
  FROM generate_series(1,100) s(i);

  CREATE INDEX ON t(a,b);

  ANALYZE;

which then behaves like this:

  test=# select * from (select distinct a,b from t) foo offset 1000;
  Time: 3138.222 ms (00:03.138)
  test=# set enable_indexskipscan = off;
  Time: 0.312 ms
  test=# select * from (select distinct a,b from t) foo offset 1000;
  Time: 199.749 ms

So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 1), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.
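
To see the estimates involved, a sketch reusing the table from the
example above:

  SELECT attname, n_distinct FROM pg_stats WHERE tablename = 't';
  EXPLAIN SELECT DISTINCT a, b FROM t;

The per-column n_distinct values get multiplied and then capped, and the
row estimate in the EXPLAIN output shows the resulting group count.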

I don't think this is an issue the skipscan patch needs to fix, though.
Firstly, the regressed cases are a tiny minority. Secondly, we already
have a way to improve the root cause - creating extended stats with
ndistinct coefficients generally makes the problem go away.
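
For example (a sketch; the statistics object name is arbitrary):

  CREATE STATISTICS t_a_b (ndistinct) ON a, b FROM t;
  ANALYZE t;

after which the planner can use the multi-column ndistinct estimate for
(a, b) instead of multiplying the per-column estimates.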

One interesting observation however is that this regression only
happened with text columns but not with int or bigint. My assumption is
that this is due to text comparisons being much more expensive. Not sure
if there is something we could do to deal with this - reduce the number
of comparisons or something?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Index Skip Scan

2020-02-08 Thread Tomas Vondra

OK,

A couple more comments based on quick review of the patch, particularly
the part related to planning:

1) create_skipscan_unique_path has one assert commented out. Either it's
something we want to enforce, or we should remove it.

  /*Assert(distinctPrefixKeys <= list_length(pathnode->path.pathkeys));*/


2) I wonder if the current costing model is overly optimistic. We simply
copy the startup cost from the IndexPath, which seems fine. But for
total cost we do this:

  pathnode->path.total_cost = basepath->startup_cost * numGroups;

which seems a bit too simplistic. The startup cost is pretty much just
the cost to find the first item in the index, but surely we need to do
more to find the next group - we need to do comparisons to skip some of
the items, etc. If we think that's unnecessary, we need to explain it in
a comment or something.


3) I don't think we should make planning dependent on hasDeclareCursor.


4) I'm not quite sure how sensible it is to create a new IndexPath in
create_skipscan_unique_path. On the one hand it works, but I don't think
any other path is constructed like this so I wonder if we're missing
something. Perhaps it'd be better to just add a new path node on top of
the IndexPath, and then handle this in create_plan. We already do
something similar for Bitmap Index Scans, where we create a different
executor node from IndexPath depending on the parent node.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




ALTER TABLE rewrite to use clustered order

2020-02-08 Thread Justin Pryzby
Forking this thread
https://www.postgresql.org/message-id/20181227132417.xe3oagawina7775b%40alvherre.pgsql

On Wed, Dec 26, 2018 at 01:09:39PM -0500, Robert Haas wrote:
> ALTER TABLE already has a lot of logic that is oriented towards being
> able to do multiple things at the same time.  If we added CLUSTER,
> VACUUM FULL, and REINDEX to that set, then you could, say, change a
> data type, cluster, and change tablespaces all in a single SQL
> command.

On Thu, Dec 27, 2018 at 10:24:17AM -0300, Alvaro Herrera wrote:
> I think it would be valuable to have those ALTER TABLE variants that rewrite
> the table do so using the cluster order, if there is one, instead of the heap
> order, which is what it does today.

That's a neat idea.

I haven't yet fit all of ALTER's processing logic in my head ... but there's an
issue that ALTER (unlike CLUSTER) needs to deal with column type promotion, so
the indices may need to be dropped and recreated.  The table rewrite happens
AFTER dropping indices (and all other processing), but the clustered index
can't be scanned if it's just been dropped.  I handled that by using a
tuplesort, same as heapam_relation_copy_for_cluster.

Experimental patch attached.  With clustered ALTER:

template1=# DROP TABLE t; CREATE TABLE t AS SELECT generate_series(1,999)i; 
CREATE INDEX ON t(i DESC); ALTER TABLE t CLUSTER ON t_i_idx; ALTER TABLE t 
ALTER i TYPE bigint; SELECT * FROM t LIMIT 9;
DROP TABLE
SELECT 999
CREATE INDEX
ALTER TABLE
ALTER TABLE
  i  
-
 999
 998
 997
 996
 995
 994
 993
 992
 991
(9 rows)

0001 patch is stolen from the nearby thread:
https://www.postgresql.org/message-id/flat/20200207143935.GP403%40telsasoft.com
It doesn't make much sense for ALTER to use a clustered index when rewriting a
table if it doesn't also go to the effort of preserving the cluster property
when rebuilding its indices.

0002 patch is included and not squashed with 0003 to show the original
implementation using an index scan (by not dropping indices on the old table,
and breaking various things), and the evolution to tuplesort.

Note, this doesn't use clustered order when rewriting only due to a tablespace
change.  ALTER currently does an AM-specific block copy without looking at
tuples.  But I think it'd be possible to use tuplesort and copy if desired.
>From f93bbd6c30e883068f46ff86def28d0e66aea4f5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 7 Feb 2020 22:06:57 -0600
Subject: [PATCH v1 1/3] Preserve CLUSTER ON index during ALTER rewrite

Amit Langote and Justin Pryzby

https://www.postgresql.org/message-id/flat/20200202161718.GI13621%40telsasoft.com
---
 src/backend/commands/tablecmds.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b7c8d66..642a85c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -490,6 +490,7 @@ static void ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId,
 static void RebuildConstraintComment(AlteredTableInfo *tab, int pass,
 	 Oid objid, Relation rel, List *domname,
 	 const char *conname);
+static void PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid);
 static void TryReuseIndex(Oid oldId, IndexStmt *stmt);
 static void TryReuseForeignKey(Oid oldId, Constraint *con);
 static ObjectAddress ATExecAlterColumnGenericOptions(Relation rel, const char *colName,
@@ -11838,6 +11839,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			newcmd->def = (Node *) stmt;
 			tab->subcmds[AT_PASS_OLD_INDEX] =
 lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+
+			/* Preserve index's indisclustered property, if set. */
+			PreserveClusterOn(tab, AT_PASS_OLD_INDEX, oldId);
 		}
 		else if (IsA(stm, AlterTableStmt))
 		{
@@ -11874,6 +11878,9 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 			 rel,
 			 NIL,
 			 indstmt->idxname);
+
+	/* Preserve index's indisclustered property, if set. */
+	PreserveClusterOn(tab, AT_PASS_OLD_INDEX, indoid);
 }
 else if (cmd->subtype == AT_AddConstraint)
 {
@@ -11997,6 +12004,38 @@ RebuildConstraintComment(AlteredTableInfo *tab, int pass, Oid objid,
 }
 
 /*
+ * For a table's index that is to be recreated due to PostAlterType
+ * processing, preserve its indisclustered property by issuing ALTER TABLE
+ * CLUSTER ON command on the table that will run after the command to recreate
+ * the index.
+ */
+static void
+PreserveClusterOn(AlteredTableInfo *tab, int pass, Oid indoid)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	Assert(OidIsValid(indoid));
+	Assert(pass == AT_PASS_OLD_INDEX);
+
+	indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indoid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indoid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	if (indexForm->indisclustered)
+	{
+		Alt

Re: Internal key management system

2020-02-08 Thread Tomas Vondra

On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:

On Sat, 8 Feb 2020 at 03:24, Andres Freund  wrote:


Hi,

On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
> Yeah I'm not going to use pgcrypto for transparent data encryption.
> The KMS patch includes the new basic infrastructure for cryptographic
> functions (mainly AES-CBC). I'm thinking we can expand that
> infrastructure so that we can also use it for TDE purpose by
> supporting new cryptographic functions such as AES-CTR. Anyway, I
> agree to not have it depend on pgcrypto.

I thought for a minute, before checking the patch, that you were saying
above that the KMS patch includes its *own* implementation of
cryptographic functions.  I think it's pretty crucial that it continues
not to do that...


I meant that we're going to use OpenSSL for AES encryption and
decryption independent of pgcrypto's openssl code, as the first step.
That is, KMS is available only when configured --with-openssl. And
hopefully we eventually merge this openssl code and have pgcrypto use
it, like when we introduced SCRAM.



I don't think it's very likely we'll ever merge any openssl code into
our repository, e.g. because of licensing. But we already have an AES
implementation in pgcrypto - why not use that? I'm not saying we
should make this depend on pgcrypto, but maybe we should move the AES
library from pgcrypto into src/common or something like that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Internal key management system

2020-02-08 Thread Tomas Vondra

Hi,

I wonder if this is meant to support external KMS systems/services like
Vault (from HashiCorp) or CloudHSM (from AWS) or a hardware HSM. AFAICS
the current implementation does not allow storing keys in such external
systems, right? But it seems kinda reasonable to want to do that, when
already using the HSM for other parts of the system.

Now, I'm not saying the first version we commit has to support this, or
that it necessarily makes sense. But for example MariaDB seems to
support this [1].

[1] https://mariadb.com/kb/en/encryption-key-management/

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Index Skip Scan

2020-02-08 Thread Dmitry Dolgov
> On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
>
> I've done some testing and benchmarking of the v31 patch, looking for
> regressions, costing issues etc. Essentially, I've run a bunch of SELECT
> DISTINCT queries on data sets of various size, number of distinct values
> etc. The results are fairly large, so I've uploaded them to github
>
> https://github.com/tvondra/skip-scan-test

Thanks a lot for testing!

> There are a couple of regressions, where the plan with skipscan enabled
> is ~10x slower. But this seems to happen only in high-cardinality cases
> where we misestimate the number of groups. Consider a table with two
> independent columns
>
>   CREATE TABLE t (a text, b text);
>   INSERT INTO t SELECT
> md5((1*random())::int::text),
> md5((1*random())::int::text)
>   FROM generate_series(1,100) s(i);
>
>   CREATE INDEX ON t(a,b);
>
>   ANALYZE;
>
> which then behaves like this:
>
>   test=# select * from (select distinct a,b from t) foo offset 1000;
>   Time: 3138.222 ms (00:03.138)
>   test=# set enable_indexskipscan = off;
>   Time: 0.312 ms
>   test=# select * from (select distinct a,b from t) foo offset 1000;
>   Time: 199.749 ms
>
> So in this case the skip scan is ~15x slower than the usual plan (index
> only scan + unique). The reason why this happens is pretty simple - to
> estimate the number of groups we multiply the ndistinct estimates for
> the two columns (which both have n_distinct = 1), but then we cap
> the estimate to 10% of the table. But when the columns are independent
> with high cardinalities that under-estimates the actual value, making
> the cost for skip scan much lower than it should be.

The current implementation checks whether we can find the next value on the
same page, as a shortcut instead of a full tree traversal, to improve such
situations, but I can easily imagine that it's still not enough
in some extreme cases.

> I don't think this is an issue the skipscan patch needs to fix, though.
> Firstly, the regressed cases are a tiny minority. Secondly, we already
> have a way to improve the root cause - creating extended stats with
> ndistinct coefficients generally makes the problem go away.

Yes, I agree.

> One interesting observation however is that this regression only
> happened with text columns but not with int or bigint. My assumption is
> that this is due to text comparisons being much more expensive. Not sure
> if there is something we could do to deal with this - reduce the number
> of comparisons or something?

Hm, interesting. I need to check that we do not do any unnecessary
comparisons.

> On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:
> > Yes, I've mentioned that already in one of the previous emails :) The
> > simplest way I see to achieve what we want is to do something like in
> > attached modified version with a new hasDeclaredCursor field. It's not a
> > final version though, but posted just for discussion, so feel free to
> > suggest any improvements or alternatives.
>
> IMO the proper fix for this case (moving forward, reading backwards) is
> simply making it work by properly checking deleted tuples etc. Not sure
> why that would be so much more complex (haven't tried implementing it)?

It's probably not that complex by itself, but it requires changing how
responsibilities are isolated. At the moment the current implementation leaves
jumping over the tree entirely to _bt_skip, and heap visibility checks only
to IndexOnlyNext. To check deleted tuples properly we need to either
verify the corresponding heap tuple's visibility inside _bt_skip (as I've
mentioned in one of the previous emails, checking if an index tuple is
dead is not enough), or teach the code in IndexOnlyNext to understand
that _bt_skip can lead to returning the same tuple while moving forward
& reading backward. Do you think it still makes sense to go this way?




Re: Internal key management system

2020-02-08 Thread Andres Freund
Hi, 

On February 8, 2020 7:08:26 AM PST, Tomas Vondra  
wrote:
>On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:
>>On Sat, 8 Feb 2020 at 03:24, Andres Freund  wrote:
>>>
>>> Hi,
>>>
>>> On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
>>> > Yeah I'm not going to use pgcrypto for transparent data
>encryption.
>>> > The KMS patch includes the new basic infrastructure for
>cryptographic
>>> > functions (mainly AES-CBC). I'm thinking we can expand that
>>> > infrastructure so that we can also use it for TDE purpose by
>>> > supporting new cryptographic functions such as AES-CTR. Anyway, I
>>> > agree to not have it depend on pgcrypto.
>>>
>>> I thought for a minute, before checking the patch, that you were
>saying
>>> above that the KMS patch includes its *own* implementation of
>>> cryptographic functions.  I think it's pretty crucial that it
>continues
>>> not to do that...
>>
>>I meant that we're going to use OpenSSL for AES encryption and
>>decryption independent of pgcrypto's openssl code, as the first step.
>>That is, KMS is available only when configured --with-openssl. And
>>hopefully we eventually merge this openssl code and have pgcrypto use
>>it, like when we introduced SCRAM.
>>
>
>I don't think it's very likely we'll ever merge any openssl code into
>our repository, e.g. because of licensing. But we already have an AES
>implementation in pgcrypto - why not use that? I'm not saying we
>should make this depend on pgcrypto, but maybe we should move the AES
>library from pgcrypto into src/common or something like that.

The code uses functions exposed by openssl, it doesn't copy their code.

And no, I don't think we should copy the implementation from pgcrypto - it's
not good. We should remove it entirely.

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Internal key management system

2020-02-08 Thread Tomas Vondra

On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote:

Hi,

On February 8, 2020 7:08:26 AM PST, Tomas Vondra  
wrote:

On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:

On Sat, 8 Feb 2020 at 03:24, Andres Freund  wrote:


Hi,

On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
> Yeah I'm not going to use pgcrypto for transparent data

encryption.

> The KMS patch includes the new basic infrastructure for

cryptographic

> functions (mainly AES-CBC). I'm thinking we can expand that
> infrastructure so that we can also use it for TDE purpose by
> supporting new cryptographic functions such as AES-CTR. Anyway, I
> agree to not have it depend on pgcrypto.

I thought for a minute, before checking the patch, that you were

saying

above that the KMS patch includes its *own* implementation of
cryptographic functions.  I think it's pretty crucial that it

continues

not to do that...


I meant that we're going to use OpenSSL for AES encryption and
decryption independent of pgcrypto's openssl code, as the first step.
That is, KMS is available only when configured --with-openssl. And
hopefully we eventually merge this openssl code and have pgcrypto use
it, like when we introduced SCRAM.



I don't think it's very likely we'll ever merge any openssl code into
our repository, e.g. because of licensing. But we already have an AES
implementation in pgcrypto - why not use that? I'm not saying we
should make this depend on pgcrypto, but maybe we should move the AES
library from pgcrypto into src/common or something like that.


The code uses functions exposed by openssl, it doesn't copy their code.



Sure, I know the code is currently calling openssl functions. I was
responding to Masahiko-san's message that we might eventually merge this
openssl code into our tree.


And no, I don't think we should copy the implementation from pgcrypto -
it's not good. We should remove it entirely.


OK, no opinion on the quality of this implementation.


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE rewrite to use clustered order

2020-02-08 Thread Tom Lane
Justin Pryzby  writes:
> On Thu, Dec 27, 2018 at 10:24:17AM -0300, Alvaro Herrera wrote:
>> I think it would be valuable to have those ALTER TABLE variants that rewrite
>> the table do so using the cluster order, if there is one, instead of the heap
>> order, which is what it does today.

> That's a neat idea.

TBH, I'm -1 on this.  The current behavior of preserving physical order is
perfectly sane, and it's faster than anything involving CLUSTER is going
to be, and if you try to change that you are going to have enormous
headaches with the variants of ALTER TABLE that would change the semantics
of the CLUSTER index columns.  (Unless of course your theory is that you
don't actually care exactly what the finished order is, in which case why
are we bothering?)

The proposed patch which *forces* it to be done like that, whether the
user wants it or not, seems particularly poorly thought out.

regards, tom lane




Re: Index Skip Scan

2020-02-08 Thread Tomas Vondra

On Sat, Feb 08, 2020 at 04:24:40PM +0100, Dmitry Dolgov wrote:

On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:

I've done some testing and benchmarking of the v31 patch, looking for
regressions, costing issues etc. Essentially, I've run a bunch of SELECT
DISTINCT queries on data sets of various size, number of distinct values
etc. The results are fairly large, so I've uploaded them to github

https://github.com/tvondra/skip-scan-test


Thanks a lot for testing!


There are a couple of regressions, where the plan with skipscan enabled
is ~10x slower. But this seems to happen only in high-cardinality cases
where we misestimate the number of groups. Consider a table with two
independent columns

  CREATE TABLE t (a text, b text);
  INSERT INTO t SELECT
md5((1*random())::int::text),
md5((1*random())::int::text)
  FROM generate_series(1,100) s(i);

  CREATE INDEX ON t(a,b);

  ANALYZE;

which then behaves like this:

  test=# select * from (select distinct a,b from t) foo offset 1000;
  Time: 3138.222 ms (00:03.138)
  test=# set enable_indexskipscan = off;
  Time: 0.312 ms
  test=# select * from (select distinct a,b from t) foo offset 1000;
  Time: 199.749 ms

So in this case the skip scan is ~15x slower than the usual plan (index
only scan + unique). The reason why this happens is pretty simple - to
estimate the number of groups we multiply the ndistinct estimates for
the two columns (which both have n_distinct = 1), but then we cap
the estimate to 10% of the table. But when the columns are independent
with high cardinalities that under-estimates the actual value, making
the cost for skip scan much lower than it should be.


The current implementation checks whether we can find the next value on the
same page, as a shortcut instead of a full tree traversal, to improve such
situations, but I can easily imagine that it's still not enough
in some extreme cases.



Yeah. I'm not sure there's room for further improvements. The regressed
cases were subject to the 10% cap, and with ndistinct being more than
10% of the table, we probably can find many distinct keys on each index
page - we know that every ~10 rows the values change.


I don't think this is an issue the skipscan patch needs to fix, though.
Firstly, the regressed cases are a tiny minority. Secondly, we already
have a way to improve the root cause - creating extended stats with
ndistinct coefficients generally makes the problem go away.


Yes, I agree.


One interesting observation however is that this regression only
happened with text columns but not with int or bigint. My assumption is
that this is due to text comparisons being much more expensive. Not sure
if there is something we could do to deal with this - reduce the number
of comparisons or something?


Hm, interesting. I need to check that we do not do any unnecessary
comparisons.


On Sat, Feb 08, 2020 at 02:11:59PM +0100, Tomas Vondra wrote:
> Yes, I've mentioned that already in one of the previous emails :) The
> simplest way I see to achieve what we want is to do something like in
> attached modified version with a new hasDeclaredCursor field. It's not a
> final version though, but posted just for discussion, so feel free to
> suggest any improvements or alternatives.

IMO the proper fix for this case (moving forward, reading backwards) is
simply making it work by properly checking deleted tuples etc. Not sure
why that would be so much more complex (haven't tried implementing it)?


It's probably not that complex by itself, but it requires changing how
responsibilities are isolated. At the moment the current implementation leaves
jumping over the tree entirely to _bt_skip, and heap visibility checks only
to IndexOnlyNext. To check deleted tuples properly we need to either
verify the corresponding heap tuple's visibility inside _bt_skip (as I've
mentioned in one of the previous emails, checking if an index tuple is
dead is not enough), or teach the code in IndexOnlyNext to understand
that _bt_skip can lead to returning the same tuple while moving forward
& reading backward. Do you think it still makes sense to go this way?


Not sure. I have to think about this first.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Internal key management system

2020-02-08 Thread Tom Lane
Tomas Vondra  writes:
> On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote:
>> On February 8, 2020 7:08:26 AM PST, Tomas Vondra wrote:
 I don't think it's very likely we'll ever merge any openssl code into
 our repository, e.g. because of licensing. But we already have AES
 implementation in pgcrypto - why not to use that? I'm not saying we
 should make this depend on pgcrypto, but maybe we should move the AES
 library from pgcrypto into src/common or something like that.

>> The code uses functions exposed by openssl, it doesn't copy their code.

> Sure, I know the code is currently calling openssl functions. I was
> responding to Masahiko-san's message that we might eventually merge this
> openssl code into our tree.

No.  This absolutely, positively, will not happen.  There will never be
crypto functions in our core tree, because then there'd be no recourse for
people who want to use Postgres in countries with restrictions on crypto
software.  It's hard enough for them that we have such code in contrib
--- but at least they can remove pgcrypto and be legal.  If it's in
src/common then they're stuck.

For the same reason, I don't think that an "internal key management"
feature in the core code is ever going to be acceptable.  It has to
be an extension.  (But, as long as it's an extension, whether it's
bringing its own crypto or relying on some other extension for that
doesn't matter from the legal standpoint.)

regards, tom lane




Re: Marking some contrib modules as trusted extensions

2020-02-08 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Our previous
>> discussions about what privilege level is needed to look at
>> pg_stat_statements info were all made against a background assumption
>> that you needed some extra privilege to set up the view in the first
>> place.  I think that would need another look or two before being
>> comfortable that we're not shifting the goal posts too far.

> While you could maybe argue that's true for pg_stat_statements, it's
> certainly not true for pg_stat_activity, so I don't buy it for either.

The analogy of pg_stat_activity certainly suggests that there shouldn't be
a reason *in principle* why pg_stat_statements couldn't be made trusted.
There's a difference between that statement and saying that *in practice*
pg_stat_statements is ready to be trusted right now with no further
changes.  I haven't done the analysis needed to conclude that, and don't
care to do so as part of this patch proposal.

> The same goes for just about everything else (I sure hope, at least) in
> our extensions set- none of the core extensions should be allowing
> access to things which break our security model, even if they're
> installed, unless some additional privileges are granted out.

Maybe not, but the principle of defense-in-depth still says that admins
could reasonably want to not let dangerous tools get installed in the
first place.

> As such, I really don't agree with this entire line of thinking when it
> comes to our core extensions.  I view the 'trusted extension' model as
> really for things where the extension author doesn't care about, and
> doesn't wish to care about, dealing with our security model and making
> sure that it's followed.  We do care, and we do maintain, the security
> model that we have throughout the core extensions.  

I am confused as to what "entire line of thinking" you are objecting
to.  Are you now saying that we should forget the trusted-extension
model?  Or maybe that we can just mark *everything* we ship as trusted?
I'm not going to agree with either.

>> The bigger picture here is that I don't want to get push-back that
>> we've broken somebody's security posture by marking too many extensions
>> trusted.  So for anything where there's any question about security
>> implications, we should err in the conservative direction of leaving
>> it untrusted.

> This is just going to a) cause our users to complain about not being
> able to install extensions that they've routinely installed in the past,

That's utter nonsense.  Nothing here is taking away privileges that
existed before; if you could install $whatever as superuser before,
you still can.  OTOH, we *would* have a problem of that sort if we
marked $whatever as trusted and then later had to undo it.  So I
think there's plenty of reason to be conservative about the first
wave of what-to-mark-as-trusted.  Once we've got more experience
with this mechanism under our belts, we might decide we can be more
liberal about it.

> and b) make our users wonder what it is about these extensions that
> we've decided can't be trusted to even just be installed and if they're
> at risk today because they've installed them.

Yep, you're right, this patch does make value judgements of that
sort, and I'll stand behind them.  Giving people the impression that,
say, postgres_fdw isn't any more dangerous than cube isn't helpful.

> While it might not seem obvious, the discussion over on the thread about
> DEFAULT PRIVILEGES and pg_init_privs is actually a lot more relevant
> here- there's extensions we have that expect certain functions, once
> installed, to be owned by a superuser (which will still be the case
> here, thanks to how you've addressed that), but then to not have EXECUTE
> rights GRANT'd to anyone (thanks to the REVOKE FROM PUBLIC in the
> installation), but that falls apart when someone's decided to set
> up DEFAULT PRIVILEGES for the superuser.  While no one seems to want to
> discuss that with me, unfortunately, it's becoming more and more clear
> that we need to skip DEFAULT PRIVILEGES from being applied during
> extension creation.

Or that we can't let people apply default privileges to superuser-created
objects at all.  But I agree that that's a different discussion.

regards, tom lane




Re: Index Skip Scan

2020-02-08 Thread James Coleman
On Sat, Feb 8, 2020 at 10:24 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Sat, Feb 08, 2020 at 03:22:17PM +0100, Tomas Vondra wrote:
> > So in this case the skip scan is ~15x slower than the usual plan (index
> > only scan + unique). The reason why this happens is pretty simple - to
> > estimate the number of groups we multiply the ndistinct estimates for
> > the two columns (which both have n_distinct = 1), but then we cap
> > the estimate to 10% of the table. But when the columns are independent
> > with high cardinalities that under-estimates the actual value, making
> > the cost for skip scan much lower than it should be.
>
> The current implementation checks whether we can find the next value on the
> same page, as a shortcut instead of a full tree traversal, which improves
> such situations, but I can easily imagine that it's still not enough in
> some extreme cases.

This is almost certainly rehashing already covered ground, but since I
doubt it's been discussed recently, would you be able to summarize
that choice (not to always get the next tuple by scanning from the top
of the tree again) and the performance/complexity tradeoffs?

Thanks,
James




Re: Internal key management system

2020-02-08 Thread Masahiko Sawada
On Sun, 9 Feb 2020 at 01:53, Tomas Vondra  wrote:
>
> On Sat, Feb 08, 2020 at 07:47:24AM -0800, Andres Freund wrote:
> >Hi,
> >
> >On February 8, 2020 7:08:26 AM PST, Tomas Vondra wrote:
> >>On Sat, Feb 08, 2020 at 02:48:54PM +0900, Masahiko Sawada wrote:
> >>>On Sat, 8 Feb 2020 at 03:24, Andres Freund  wrote:
> 
>  Hi,
> 
>  On 2020-02-07 20:44:31 +0900, Masahiko Sawada wrote:
>  > Yeah I'm not going to use pgcrypto for transparent data
> >>encryption.
>  > The KMS patch includes the new basic infrastructure for
> >>cryptographic
>  > functions (mainly AES-CBC). I'm thinking we can expand that
>  > infrastructure so that we can also use it for TDE purpose by
>  > supporting new cryptographic functions such as AES-CTR. Anyway, I
>  > agree to not have it depend on pgcrypto.
> 
>  I thought for a minute, before checking the patch, that you were
> >>saying
>  above that the KMS patch includes its *own* implementation of
>  cryptographic functions.  I think it's pretty crucial that it
> >>continues
>  not to do that...
> >>>
> >>>I meant that we're going to use OpenSSL for AES encryption and
> >>>decryption independent of pgcrypto's openssl code, as the first step.
> >>>That is, KMS is available only when configured --with-openssl. And
> >>>hopefully we eventually merge these openssl code and have pgcrypto use
> >>>it, like when we introduced SCRAM.
> >>>
> >>
> >>I don't think it's very likely we'll ever merge any openssl code into
> >>our repository, e.g. because of licensing. But we already have AES
> >>implementation in pgcrypto - why not to use that? I'm not saying we
> >>should make this depend on pgcrypto, but maybe we should move the AES
> >>library from pgcrypto into src/common or something like that.
> >
> >The code uses functions exposed by openssl, it doesn't copy their code.
> >
>
> Sure, I know the code is currently calling openssl functions. I was
> responding to Masahiko-san's message that we might eventually merge this
> openssl code into our tree.

Sorry for the confusion. What I wanted to say is that we write the AES
encryption code in src/common using the openssl library as the first step,
separate from pgcrypto's openssl code, and then merge these two code
libraries in src/common as the next step. That is, it's moving the AES
library from pgcrypto to src/common, as you mentioned. IIRC when we
introduced SCRAM we moved the sha2 library from pgcrypto to src/common.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

2020-02-08 Thread Peter Geoghegan
On Sun, Aug 25, 2019 at 1:56 PM Tom Lane  wrote:
> I agree that teaching opclasses to say whether this is okay is a
> reasonable approach.

I've begun working on this, with help from Anastasia.

My working assumption is that I only need to care about
opclass-declared input data types (pg_opclass.opcintype), plus the
corresponding collations -- the former can be used to lookup an
appropriate pg_amproc entry (i.e. B-Tree support function 4), while
the latter are passed to the support function to get an answer about
whether or not it's okay to use deduplication. This approach seems to
be good enough as far as the deduplication project's needs are
concerned. However, I think that I probably need to take a broader
view of the problem than that. Any guidance would be much appreciated.
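For illustration, such a lookup roughly corresponds to the catalog query
below. Support function number 4 is the proposed new slot, so this is a
sketch of the intended shape rather than something that works against an
unpatched server:

```sql
-- Fetch B-Tree support function 4 for a given opclass input type.
SELECT amp.amproc,
       opf.opfname,
       amp.amproclefttype::regtype AS input_type
FROM pg_amproc amp
JOIN pg_opfamily opf ON opf.oid = amp.amprocfamily
WHERE opf.opfmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
  AND amp.amprocnum = 4
  AND amp.amproclefttype = 'text'::regtype;
```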

> > Consumers of this new infrastructure probably won't be limited to the
> > deduplication feature;
>
> Indeed, we run up against this sort of thing all the time in, eg, planner
> optimizations.  I think some sort of "equality is precise" indicator
> would be really useful for a lot of things.

Suppose I wanted to add support for deduplication of a B-Tree index on
an array of integers. This probably wouldn't be very compelling, but
just suppose. It's not clear how this could work within the confines
of the type and operator class systems.

I can hardly determine that it's safe or unsafe to do so at CREATE
INDEX time, since the opclass-declared input data type is always the
pg_type.oid corresponding to 'anyarray' -- I am forced to make a
generic assumption that deduplication is not safe. I must make this
conservative assumption since, in general, the indexed column could
turn out to be an array of numeric datums -- a "transitively unsafe"
anyarray (numeric's display scale issue could leak into anyarray). I'm
not actually worried about any practical downside that this may create
for users of the B-Tree deduplication feature; a B-Tree index on an
array *is* a pretty niche thing. Does seem like I should make sure
that I get this right, though.

Code like the 'anyarray' B-Tree support function 1 (i.e.
btarraycmp()/array_cmp()) doesn't hint at a solution -- it merely does
a lookup of the underlying type's comparator using the typcache. That
depends on having actual anyarray datums to do something with, which
isn't something that this new infrastructure can rely on in any way.

I suppose that the only thing that would work here would be to somehow
look through the pg_attribute entry for the index column, which will
have the details required to distinguish between (say) an array of
integers (which is safe, I think) from an array of numerics (which is
unsafe). From there, the information about the element type could
(say) be passed to the anyarray default opclass' support function 4,
which could do its own internal lookup. That seems like it might be a
solution in search of a problem, though.
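If one did go the pg_attribute route, the element-type lookup itself would
be straightforward -- something along these lines, with 'my_array_idx'
standing in for an actual index name (illustrative SQL, not existing code):

```sql
-- Resolve the concrete element type behind an array-typed index column.
SELECT a.attname,
       a.atttypid::regtype AS array_type,
       t.typelem::regtype  AS element_type
FROM pg_attribute a
JOIN pg_type t ON t.oid = a.atttypid
WHERE a.attrelid = 'my_array_idx'::regclass
  AND t.typelem <> 0;   -- only array columns
```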

BTW, I currently forbid cross-type support function 4 entries for an
opclass, on the grounds that that isn't sensible for deduplication. Do
you think that that restriction is appropriate in general, given the
likelihood that this support function will be used in several other
areas?

Thanks
-- 
Peter Geoghegan




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-02-08 Thread Amit Kapila
On Sat, Feb 8, 2020 at 12:10 AM Andres Freund  wrote:
>
> Hi,
>
> On 2020-02-04 10:15:01 +0530, Kuntal Ghosh wrote:
> > I performed the same test in pg11 and reproduced the issue on the
> > commit prior to a4ccc1cef5a04 (Generational memory allocator).
> >
> > ulimit -s 1024
> > ulimit -v 30
> >
> > wal_level = logical
> > max_replication_slots = 4
> >
> > [...]
>
> > After that, I applied the "Generational memory allocator" patch and
> > that solved the issue. From the error message, it is evident that the
> > underlying code is trying to allocate a MaxTupleSize memory for each
> > tuple. So, I re-introduced the following lines (which are removed by
> > a4ccc1cef5a04) on top of the patch:
>
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -417,6 +417,9 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
> >
> > alloc_len = tuple_len + SizeofHeapTupleHeader;
> >
> > +   if (alloc_len < MaxHeapTupleSize)
> > +   alloc_len = MaxHeapTupleSize;
>
> Maybe I'm being slow here - but what does this actually prove? Before
> the generation contexts were introduced we avoided fragmentation (which
> would make things unusably slow) using a a brute force method (namely
> forcing all tuple allocations to be of the same/maximum size).
>

It seems that for this we formed a cache of max_cached_tuplebufs objects,
and we don't need to allocate more than that number of tuples of size
MaxHeapTupleSize, because we will return that memory to aset.c anyway.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Postgres 32 bits client compilation fail. Win32 bits client is supported?

2020-02-08 Thread Amit Kapila
On Sat, Feb 8, 2020 at 8:05 AM Ranier Vilela  wrote:
>
> Hi,
> I am migrating my applications that use postgres client from msvc 2010 
> (32bits) to msvc 2019 (32 bits).
> Compilation using msvc 2019 (64 bits) works very well.
> But the build using msvc 2019 (32 bit) is not working.
> The 32-bit Platform variable is set to x86, resulting in the first error.
>
> "Project "C:\dll\postgres\pgsql.sln" on node 1 (default targets).
> C:\dll\postgres\pgsql.sln.metaproj: error MSB4126: The specified solution
> configuration "Release|x86" is invalid. Please specify a valid solution
> configuration using the Configuration and Platform properties (e.g.
> MSBuild.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or
> leave those properties blank to use the default solution configuration.
> [C:\dll\postgres\pgsql.sln]
> Done Building Project "C:\dll\postgres\pgsql.sln" (default targets) -- FAILED."
>
> This is because the DeterminePlatform sub of the Solution.pm program
> uses the following expression:
> my $output = `cl /? 2>&1`;
> The result under msvc 2019 (32 bits) is:
> "Microsoft (R) C/C++ Optimizing Compiler Version 19.24.28315 for x86"
>
> By setting the Platform variable manually to Win32, the compilation process
> continues until it stops at:
> "Generating configuration headers ..."
>
> Where the second error occurs:
> unused defines: HAVE_STRUCT_CMSGCRED USE_NAMED_POSI... etc...
> ALIGNOF_DOUBLE USE_DEV_URANDOM at C:\dll\postgres\src\tools\msvc/Mkvcbuild.pm line 842.
>

Try by removing src/include/pg_config.
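For reference, the platform detection that fails here amounts to matching
the compiler banner; a hypothetical 32-bit-aware version could look like
this (a sketch of the idea, not the actual Solution.pm code):

```perl
# Determine the MSBuild platform from the cl.exe banner.
my $output = `cl /? 2>&1`;
my $platform;
if ($output =~ /^\/favor:<.+AMD64/m || $output =~ /\bfor x64\b/)
{
    $platform = 'x64';
}
elsif ($output =~ /\bfor (80)?x86\b/)
{
    $platform = 'Win32';    # MSBuild expects "Win32", not "x86"
}
else
{
    die "Unable to determine platform from compiler output:\n$output";
}
```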

> Question:
> Will Postgres continue to support 32-bit client?
>

I am not aware of any discussion related to stopping the support of
32-bit client.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com