Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-09 Thread David G. Johnston
On Mon, Jul 8, 2019 at 11:34 PM Michael Paquier  wrote:

> On Mon, Jul 08, 2019 at 08:12:18PM -0700, David G. Johnston wrote:
> > Reads a bit backward.  How about:
> >
> > "As uniqueness can only be enforced within an individual partition when
> > defining a primary key on a partitioned table all columns present in the
> > partition key must also exist in the primary key."
>
> Yes, I was not really inspired on this one.
>
> Looking closely at the code in DefineIndex() (and as Rajkumar has
> mentioned upthread for unique constraints) this can happen for primary
> keys, unique constraints and exclusion constraints.  So we had better
> mention all three of them.  I am not sure that we need to be explicit
> about the uniqueness part though, let's say the following:
> "When defining a primary key, a unique constraint or an exclusion
> constraint on a partitioned table, all the columns present in the
> constraint definition must be included in the partition key."
>
>
That isn't true, it needs to be reversed at least:

"Table-scoped constraints defined on a partitioned table - primary key,
unique, and exclusion - must include the partition key columns because the
enforcement of such constraints is performed independently on each
partition."

The complaint here is that the user puts a PK id column on their
partitioned table and wonders why the partition key columns also need to
be in the PK.  The answer is the description provided above - with the
reminder (or initial clue, depending on the reader) that this limitation
exists because we do not implement global constraints/indexes; instead
the definition on the partitioned table is simply copied to all of its
partitions.  For me this seems worth recapping at this location (I
haven't gone looking for a nice cross-reference link to put there).
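
To make the complaint concrete, here is a minimal sketch (table and
column names are made up; the error text is approximate):

-- fails: the primary key does not include the partition key column
create table measurement (
    id      bigint not null,
    logdate date   not null,
    reading numeric,
    primary key (id)
) partition by range (logdate);
ERROR:  insufficient columns in PRIMARY KEY constraint definition

-- works: the partition key column is included, and each partition then
-- enforces uniqueness of (id, logdate) on its own
create table measurement (
    id      bigint not null,
    logdate date   not null,
    reading numeric,
    primary key (id, logdate)
) partition by range (logdate);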

David J.


Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Michael Paquier
On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote:
> On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud  wrote:
>> I'll resubmit the parallel patch using per-table only
>> approach
> 
> Attached.

I have had a look at this patch set with a focus on the refactoring
part, and the split is a bit confusing.

+void
+DisconnectDatabase(ParallelSlot *slot)
+{
+   charerrbuf[256];
common.c already has an API to connect to a database.  It would be
more natural to move the disconnection part to common.c as well and
have the caller of DisconnectDatabase reset the slot connection by
itself.  disconnectDatabase() (lower case for the first character)
would make the set more consistent.  We could also have a wrapper, say
DiscardSlot(), which does this work, but that seems like overkill for a
single slot if one API can do the cleanup of the full set.

$ git grep select_loop
scripts_parallel.c: /* We must reconstruct the fd_set for each
call to select_loop */
scripts_parallel.c: i = select_loop(maxFd, &slotset, &aborting);
scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool
*aborting)
scripts_parallel.h:extern int   select_loop(int maxFd, fd_set
*workerset, bool *aborting);

select_loop is only present in scripts_parallel.c, so it can remain
static.

+   slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) *
concurrentCons);
+   init_slot(slots, conn);
+   if (parallel)
+   {
+   for (i = 1; i < concurrentCons; i++)
+   {
+   conn = connectDatabase(dbname, host, port,
username, prompt_password,
+
progname, echo, false, true);
+   init_slot(slots + i, conn);
+   }
+   }

This comes from 0002 and could be refactored further, as vacuumdb does
the same thing.  Based on 0001, init_slot() is now called in vacuumdb.c
and initializes a set of slots while connecting to a given database.
In short, the input is a set of parameters and a request to open
connections for N slots, and the result is a pg_malloc'd array of
slots ready to be used.  We could just call that ParallelSlotInit()
(if you have a better name, feel free).

+/*
+ * Get the connection slot to use.  If in parallel mode, here we wait
+ * for one connection to become available if none already is. In
+ * non-parallel mode we simply use the only slot we have, which we
+ * know to be free.
+ */
+if (parallel)
This business also is duplicated in both reindexdb.c and vacuumdb.c.

+bool
+GetQueryResult(PGconn *conn, const char *progname)
+{
This also does not belong with the parallel stuff, as it's basically
only getting a query result.  We could move that into common.c.

Patch 2 has no documentation.  The option as well as the restrictions
in place need to be documented properly.

Here is a small idea about the set of routines we could have for the
parallel stuff, with only three of them needed to work on the parallel
slots and get free connections:
- Initialization of the full slot set.
- Cleanup and disconnection of the slots.
- Fetch an idle connection and wait for one until available.
--
Michael


signature.asc
Description: PGP signature


Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-09 Thread Amit Langote
Sorry for jumping in late here.

On Tue, Jul 9, 2019 at 3:51 PM Michael Paquier  wrote:
> On Tue, Jul 09, 2019 at 03:34:48PM +0900, Michael Paquier wrote:
> > Looking closely at the code in DefineIndex() (and as Rajkumar has
> > mentioned upthread for unique constraints) this can happen for primary
> > keys, unique constraints and exclusion constraints.  So we had better
> > mention all three of them.  I am not sure that we need to be explicit
> > about the uniqueness part though, let's say the following:
> > "When defining a primary key, a unique constraint or an exclusion
> > constraint on a partitioned table, all the columns present in the
> > constraint definition must be included in the partition key."
>
> Let's try that again (it's been a long day..):
> "When defining a primary key, a unique constraint or an exclusion
> constraint on a partitioned table, all the columns present in the
> partition key must be included in the constraint definition."

As mentioned in the docs, defining exclusion constraints on
partitioned tables is not supported.

-- on 13dev
create table p (a int, exclude using gist (a with &&)) partition by list (a);
ERROR:  exclusion constraints are not supported on partitioned tables

Regarding primary key and unique constraints, how about writing it
such that it's clear that there are limitations?  Maybe like:

"While defining a primary key and unique constraints on partitioned
tables is supported, the set of columns being constrained must include
all of the partition key columns."

Maybe, as David also says, it might be a good idea to mention the
reason why.  So maybe like:

"While defining a primary key and unique constraints on partitioned
tables is supported, the set of columns being constrained must include
all of the partition key columns.  This limitation exists because
PostgreSQL can ensure uniqueness only within
a given partition."

Thanks,
Amit

[1] 
https://www.postgresql.org/docs/12/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE
5.11.2.3. Limitations
The following limitations apply to partitioned tables:
* There is no way to create an exclusion constraint spanning all
partitions; it is only possible to constrain each leaf partition
individually.




Re: Extra quote_all_identifiers in _dumpOptions

2019-07-09 Thread Michael Paquier
On Mon, Jul 08, 2019 at 07:32:07PM -0400, Bruce Momjian wrote:
> Wow, good catch.  I thought C compilers would have reported this issue,
> but obviously not.  Patch applied to head.  Thanks.

Yes, I don't recall that gcc or clang have a magic recipe for that.
We have a couple of other orphaned ones in the backend actually.
--
Michael


signature.asc
Description: PGP signature


Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Julien Rouhaud
Thanks for the review.

On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier  wrote:
>
> On Mon, Jul 08, 2019 at 11:02:14PM +0200, Julien Rouhaud wrote:
> > On Mon, Jul 8, 2019 at 9:08 PM Julien Rouhaud  wrote:
> >> I'll resubmit the parallel patch using per-table only
> >> approach
> >
> > Attached.
>
> I have had a look at this patch set with a focus on the refactoring
> part, and the split is a bit confusing.

Yes, that wasn't a smart split  :(

> +void
> +DisconnectDatabase(ParallelSlot *slot)
> +{
> +   charerrbuf[256];
> common.c already has an API to connect to a database.  It would be
> more natural to move the disconnection part to common.c as well and
> have the caller of DisconnectDatabase reset the slot connection by
> itself.

Ok.

> $ git grep select_loop
> scripts_parallel.c: /* We must reconstruct the fd_set for each
> call to select_loop */
> scripts_parallel.c: i = select_loop(maxFd, &slotset, &aborting);
> scripts_parallel.c:select_loop(int maxFd, fd_set *workerset, bool
> *aborting)
> scripts_parallel.h:extern int   select_loop(int maxFd, fd_set
> *workerset, bool *aborting);
>
> select_loop is only present in scripts_parallel.c, so it can remain
> static.

Good point.

> +   slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) *
> concurrentCons);
> +   init_slot(slots, conn);
> +   if (parallel)
> +   {
> +   for (i = 1; i < concurrentCons; i++)
> +   {
> +   conn = connectDatabase(dbname, host, port,
> username, prompt_password,
> +
> progname, echo, false, true);
> +   init_slot(slots + i, conn);
> +   }
> +   }
>
> This comes from 0002 and could be refactored further, as vacuumdb does
> the same thing.  Based on 0001, init_slot() is now called in vacuumdb.c
> and initializes a set of slots while connecting to a given database.
> In short, the input is a set of parameters and a request to open
> connections for N slots, and the result is a pg_malloc'd array of
> slots ready to be used.  We could just call that ParallelSlotInit()
> (if you have a better name, feel free).

Given how the rest of the functions are named, I'll probably use
InitParallelSlots().

>
> +/*
> + * Get the connection slot to use.  If in parallel mode, here we wait
> + * for one connection to become available if none already is. In
> + * non-parallel mode we simply use the only slot we have, which we
> + * know to be free.
> + */
> +if (parallel)
> This business also is duplicated in both reindexdb.c and vacuumdb.c.
>
> +bool
> +GetQueryResult(PGconn *conn, const char *progname)
> +{
> This also does not belong with the parallel stuff, as it's basically
> only getting a query result.  We could move that into common.c.

This function also has a bad name, as it's discarding the result via
ProcessQueryResult.  Maybe we should rename them to GetQuerySuccess()
and ConsumeAndTrashQueryResult()?

> Patch 2 has no documentation.  The option as well as the restrictions
> in place need to be documented properly.

I forgot that I had forgotten to add documentation :( will fix this time.




Re: Extra quote_all_identifiers in _dumpOptions

2019-07-09 Thread Arthur Zakirov

On 09.07.2019 02:32, Bruce Momjian wrote:

diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index db30b54a92..8c0cedcd98 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -153,7 +153,6 @@ typedef struct _dumpOptions
int no_synchronized_snapshots;
int no_unlogged_table_data;
int serializable_deferrable;
-   int quote_all_identifiers;
int disable_triggers;
int outputNoTablespaces;
int use_setsessauth;


Wow, good catch.  I thought C compilers would have reported this issue,
but obviously not.  Patch applied to head.  Thanks.


Thank you, Bruce!

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 06:24:40PM -0400, Joe Conway wrote:

On 7/8/19 6:04 PM, Stephen Frost wrote:

* Bruce Momjian (br...@momjian.us) wrote:

Uh, well, renaming the user was a big problem, but that is the only case
I can think of.  I don't see that as an issue for block or WAL sequence
numbers.  If we want to use a different nonce, we have to find a way to
store it or look it up efficiently.  Considering the nonce size, I don't
see how that is possible.


No, this also meant that, as an attacker, I *knew* the salt ahead of
time and therefore could build rainbow tables specifically for that
salt.  I could also use those *same* tables for any system where that
user had an account, even if they used different passwords on different
systems...

I appreciate that *some* of this might not be completely relevant for
the way a nonce is used in cryptography, but I'd be very surprised to
have a cryptographer tell me that a deterministic nonce didn't have
similar issues or didn't reduce the value of the nonce significantly.


I have worked side by side on projects with bona fide cryptographers and
I can assure you that they recommended random nonces. Granted, that was
in the early 2000s, but I don't think "modern cryptography" has changed
that any more than "web scale" has made Postgres irrelevant in the
intervening years.

Related links:

https://defuse.ca/cbcmodeiv.htm
https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors



AFAIK it very much depends on the encryption mode. CBC mode does require
random nonces, other modes may be fine even with sequences as long as
the values are not reused. In that case we might even use the LSN, for
example. And I wonder if sha2(LSN) could be considered "random", but
maybe that's an entirely silly idea ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:

On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:

Yes, 'postgres' can be used to create a nice md5 rainbow table that
works on many servers --- good point.  Are rainbow tables possible with
something like AES?

> I appreciate that *some* of this might not be completely relevant for
> the way a nonce is used in cryptography, but I'd be very surprised to
> have a cryptographer tell me that a deterministic nonce didn't have
> similar issues or didn't reduce the value of the nonce significantly.

This post:


https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm

says:

GCM is a variation on Counter Mode (CTR).  As you say, with any variant
of Counter Mode, it is essential  that the Nonce is not repeated with
the same key.  Hence CTR mode  Nonces often include either a counter or
a timer element: something that  is guaranteed not to repeat over the
lifetime of the key.

CTR is what we use for WAL.  For 8k pages, we would use CBC, which means
we need a random nonce.  I need to dig deeper into the ECB mode attack.


Looking here:


https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm

I think the issue is that we can't use a _counter_ for the nonce since
each page-0 of each table would use the same nonce, and each page-1,
etc.  I assume we would use the table oid and page number as the nonce.
We can't use the database oid since we copy the files from one database
to another via file system copy and not through the shared buffer cache
where they would be re encrypted.  Using relfilenode seems dangerous.
For WAL I think it would be the WAL segment number.  It would be nice
to mix that with the "Database system identifier:", but are these the
same on primary and replicas?



Can't we just store the nonce somewhere? What if for encrypted pages we
only use/encrypt (8kB - X bytes), where X bytes is just enough to store
the nonce and maybe some other encryption metadata (key ID?).

This would be similar to the "special" area on a page, except that that
relies on page header which is encrypted (and thus not accessible before
decrypting the page).

So encryption would:

1) encrypt the (8kB - X bytes) with nonce suitable for the used
  encryption mode (sequence, random, ...)

2) store the nonce / key ID etc. to the reserved space

and decryption would

1) look at the encryption metadata at the end (nonce, key ID, ...)

2) decrypt the page using that info

Or maybe we could define a new relation fork for encrypted relations,
storing all this metadata (not sure if we need that just for the main
fork or for all forks including vm, fsm ...)?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: progress report for ANALYZE

2019-07-09 Thread Tatsuro Yamada

Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas  wrote:


On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  wrote:

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either.  We're not too late to change the CREATE INDEX behavior, but
let's discuss what it is that we want.


I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.


+1

Note that this patch is already behaving like that if the table only
contains dead rows.



I fixed the patch, including:

  - Replace "if" with "else if". (Suggested by Julien)
  - Fix typo s/ech/each/. (Suggested by Anthony)
  - Add the phase "analyzing complete" to the pgstat view. (Suggested by Julien,
    Robert and me)  I had overlooked adding it to system_views.sql.

I share my re-test result, see below:

-
[Session #1]
create table hoge as select * from generate_series(1, 100) a;
analyze verbose hoge;

[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001

3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and fixed. :)
-

Thanks,
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..c368444 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  
 
  
+  
pg_stat_progress_analyzepg_stat_progress_analyze
+  One row for each backend (including autovacuum worker processes) 
running
+   ANALYZE, showing current progress.
+   See .
+  
+ 
+
+ 
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
CLUSTER or VACUUM FULL, showing 
current progress.
@@ -3927,6 +3935,134 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  
 
+ 
+  ANALYZE Progress Reporting
+
+  
+   Whenever ANALYZE is running, the
+   pg_stat_progress_analyze view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  
+
+  
+   pg_stat_progress_analyze View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+ 
+  datid
+  oid
+  OID of the database to which this backend is connected.
+ 
+ 
+  datname
+  name
+  Name of the database to which this backend is connected.
+ 
+ 
+  relid
+  oid
+  OID of the table being analyzed.
+ 
+ 
+  include_children
+  boolean
+  Whether the current scan includes legacy inheritance 
children.
+ 
+ 
+  scanning_table
+  oid
+  
+   The table being scanned (differs from relid
+   only when processing partitions or inheritance children).
+  
+ 
+ 
+  phase
+  text
+  Current processing phase. See 
+ 
+ 
+ heap_blks_total
+ bigint
+ 
+   Total number of heap blocks to scan in the current table.
+ 
+ 
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.
+ 
+
+   
+   
+  
+
+  
+   ANALYZE phases
+   
+
+
+  Phase
+  Description
+ 
+
+   
+
+ initializing
+ 
+   The command is preparing to begin scanning the heap.  This phase is
+   expected to be very brief.
+ 
+
+
+ scanning table
+ 
+   The command is currently scanning the table.
+ 
+
+
+ analyzing sample
+ 
+   ANALYZE is currently extracting statistical data
+   from the sample obtained.
+ 
+
+
+ analyzing sample (extended)
+ 
+   ANALYZE is currently extracting statistical data
+   for extended statistics from the sample obtained.
+ 
+
+
+ analyzing complete
+ 
+   The command is updating pg_class. When this phase is completed, 
+   ANALYZE will end.
+ 
+
+   
+   
+  
+ 
+
  
   CLUSTER Progress Reporting
 
diff --git a/src/backend/catalog/system_vie

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 05:41:55PM -0400, Bruce Momjian wrote:

On Mon, Jul  8, 2019 at 09:30:03PM +0200, Tomas Vondra wrote:

I think Bruce's proposal was to minimize the time the key is "unlocked"
in memory by only unlocking them when the user connects and supplies
some sort of secret (passphrase), and remove them from memory when the
user disconnects. So there's no way for the auxiliary processes to gain
access to those keys, because only the user knows the secret.


I mentioned that because I thought that was the security value that
people wanted.  While I can see the value, I don't see how it can be
cleanly accomplished.  Keeping the keys unlocked at all times seems to
be possible, but of much smaller value.

Part of my goal in this discussion is to reverse the rush to implement
and pick apart exactly what is possible, and desirable.


FWIW I have doubts this scheme actually measurably improves privacy in
practice, because most busy applications will end up having the keys in
the memory all the time anyway.


Yep.


It also assumes memory is unsafe, i.e. bad actors can read it, and
that's probably a valid concern (root access, vulnerabilities etc.). But
in that case we already have plenty of issues with data in flight
anyway, and I doubt TDE is an answer to that.


Agreed.


> Ideally, all of this would leverage a vaulting system or other mechanism
> which manages access to the keys and allows their usage to be limited.
> That's been generally accepted as a good way to bridge the gap between
> having to ask users every time for a key and having keys stored
> long-term in memory.

Right. I agree with this.

> Having *only* the keys for the data which the
> currently connected user is allowed to access would certainly be a great
> initial capability, even if system processes (including potentially WAL
> replay) have to have access to all of the keys.  And yes, shared buffers
> being unencrypted and accessible by every backend continues to be an
> issue- it'd be great to improve on that situation too.  I don't think
> having everything encrypted in shared buffers is likely the solution,
> rather, segregating it up might make more sense, again, along similar
> lines to keys and using metadata that's outside of the catalogs, which
> has been discussed previously, though I don't think anyone's actively
> working on it.
>

I very much doubt TDE is a solution to this. Essentially, TDE is a good
data-at-rest solution, but this seems more like protecting data during
execution. And in that case I think we may need an entirely different
encryption scheme.


I thought client-level encryption or pgcrypto-style encryption fits that
need better.



I'm not sure client-level encryption is something people really want.

It essentially means moving a lot of the logic to the client. For
example, when you want to do grouping or joins on encrypted columns, we
can't do that in the database when only the client knows the key. And we
know how terribly half-baked those client-side implementations are, even
without the additional encryption complexity.

So it's more a case of not having a better in-database solution :-(

pgcrypto is a bit ugly and tedious, but in general it's a step in the
right direction. The main issue is (1) the key management (or lack of
it) and (2) essentially decrypting once and then processing plaintext in
the rest of the query.

I think (1) could be improved if we had a key vault of some sorts (which
we might get as part of TDE) and (2) might be improved by having special
encrypted data types, with operations offloaded to a trusted execution
environment (TrustZone, SGX, ...).

But that's a very separate topic, unrelated to TDE.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pg_receivewal documentation

2019-07-09 Thread Laurenz Albe
On Thu, 2019-06-27 at 10:06 -0400, Jesper Pedersen wrote:
> Here is a patch for the pg_receivewal documentation to highlight that
> WAL is never acknowledged as applied.

I think it is a good idea to document this, but I have a few quibbles
with the patch as it is:

- I think there shouldn't be commas after the "note" and before the "if".
  Disclaimer: I am not a native speaker, so I am lacking authority.

- The assertion is wrong.  "on" (remote flush) is perfectly fine
  for synchronous_commit, only "remote_apply" is a problem.

- There is already something about "--synchronous" in the "Description"
  section.  It might make sense to add the additional information there.

How about the attached patch?

Yours,
Laurenz Albe
From c18b4b384a963e04cc5b5b50537c150858824f0a Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 9 Jul 2019 11:15:09 +0200
Subject: [PATCH] Better documentation for "pg_receivewal --synchronous"

"synchronous_commit" must not be set to "remote_apply" because
pg_receivewal doesn't apply WAL.
---
 doc/src/sgml/ref/pg_receivewal.sgml | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 0506120c00..43932c 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -52,7 +52,10 @@ PostgreSQL documentation
Unlike the WAL receiver of a PostgreSQL standby server, pg_receivewal
by default flushes WAL data only when a WAL file is closed.
The option --synchronous must be specified to flush WAL data
-   in real time.
+   in real time.  Note that while WAL will be flushed with this setting,
+   it will never be applied, so  must
+   not be set to remote_apply if pg_receivewal
+   is the only synchronous standby.
   
 
   
-- 
2.20.1
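
To illustrate the scenario the added sentence warns about, here is a
hedged configuration sketch (it assumes pg_receivewal connects with its
default application_name of "pg_receivewal"):

-- pg_receivewal --synchronous as the only synchronous standby
ALTER SYSTEM SET synchronous_standby_names = 'pg_receivewal';
ALTER SYSTEM SET synchronous_commit = 'on';  -- waits for remote flush only: fine
-- synchronous_commit = 'remote_apply' would make commits wait forever,
-- because pg_receivewal flushes WAL but never applies it.
SELECT pg_reload_conf();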



Re: errbacktrace

2019-07-09 Thread Peter Eisentraut
After further research I'm thinking about dropping the libunwind
support.  The backtrace()/backtrace_symbols() API is more widely
available: darwin, freebsd, linux, netbsd, openbsd (via port), solaris,
and of course it's built-in, whereas libunwind is only available for
linux, freebsd, hpux, solaris, and requires an external dependency.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Duplicated LSN in ReorderBuffer

2019-07-09 Thread Masahiko Sawada
On Mon, Jul 8, 2019 at 11:46 PM Masahiko Sawada  wrote:
>
> On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro  wrote:
> >
> > On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin  wrote:
> > > Attached is a simple patch that uses subxid instead of top-level xid
> > > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > > i'm not sure that this is a valid change. Can someone please verify it
> > > or maybe suggest a better solution for the issue?
> >
>
> I've reproduced this issue with the script Ildar provided. I haven't
> found the root cause yet and I'm not sure the patch takes the correct
> approach to fix this.
>
> In my environment, I got the following pg_waldump output and the
> logical decoding failed at 0/019FA058 when processing NEW_CID.
>
>   90489 rmgr: Transaction len (rec/tot): 38/38, tx:
> 1999, lsn: 0/019F9E80, prev 0/019F9E38, desc: ASSIGNMENT xtop 1998:
> subxacts: 1999
>   90490 rmgr: Standby len (rec/tot):405/   405, tx:
> 0, lsn: 0/019F9EA8, prev 0/019F9E80, desc: RUNNING_XACTS nextXid 2000
> latestCompletedXid 1949 oldestRunningXid 1836; 48 xacts: 1990 1954
> 1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987 1975 1858
> 1914 1982 1958 1840 1920 1926 1992 1962 1
>   90490 910 1950 1874 1928 1974 1968 1946 1912 1918 1996 1922 1930
> 1964 1952 1994 1934 1980 1836 1984 1960 1956 1916 1908 1938
>   90491 rmgr: Heap2   len (rec/tot): 60/60, tx:
> 1999, lsn: 0/019FA058, prev 0/019F9EA8, desc: NEW_CID rel
> 1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295, combo:
> 4294967295
>   90492 rmgr: Heaplen (rec/tot):127/   127, tx:
> 1999, lsn: 0/019FA098, prev 0/019FA058, desc: INSERT off 59 flags
> 0x00, blkref #0: rel 1663/12678/2615 blk 11
>
> I thought that the logical decoding doesn't create ReorderBufferTXN of
> xid=1999 when processing NEW_CID since it decodes ASSIGNMENT of
> xid=1999 beforehand. But what actually happens is that it skips NEW_CID
> since the state of snapshot builder is SNAPBUILD_BUILDING_SNAPSHOT yet
> and then the state becomes SNAPBUILD_FULL_SNAPSHOT when processing
> RUNNING_XACTS , and therefore it creates two ReorderBufferTXN entries
> for xid = 1999 and xid = 1998 as top-level transactions when
> processing NEW_CID (ReorderBufferXidSetCatalogChanges creates xid=1999
> and ReorderBufferAddNewTupleCids creates xid = 1998).

I think the cause of this bug is that a ReorderBufferTXN entry for a
subtransaction is created as a top-level transaction. And this happens
because we skip decoding ASSIGNMENT records while the state of the
snapshot builder is < SNAPBUILD_FULL_SNAPSHOT.

@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder,
TransactionId xid,
  */
  ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

- ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+ ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,
  xlrec->target_node, xlrec->target_tid,
  xlrec->cmin, xlrec->cmax,
  xlrec->combocid);

The above change in the proposed patch makes SnapBuildProcessNewCid
pass the subtransaction id instead of the top-level transaction id to
ReorderBufferAddNewTupleCids, which adds a (relfilenode, tid) -> (cmin,
cmax) mapping to the transaction. But I think that fix is not correct
since, as the comment on ReorderBufferTXN describes, the mappings are
always assigned to the top-level transaction.

in reorderbuffer.h,
/*
 * List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
 * Those are always assigned to the toplevel transaction. (Keep track of
 * #entries to create a hash of the right size)
 */
dlist_head  tuplecids;
uint64  ntuplecids;

Instead, I wonder if we can decode ASSIGNMENT records even when the
state of the snapshot builder is < SNAPBUILD_FULL_SNAPSHOT. That way,
the ReorderBufferTXN entries for both the top-level transaction and the
subtransaction are created properly before we decode NEW_CID.

The attached patch does that. In my environment the issue seems to be
fixed, but I'm still not confident that this is the right fix. Please
review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


always_decode_assignment.patch
Description: Binary data


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Peter Eisentraut
On 2019-07-08 18:09, Joe Conway wrote:
> In my mind, and in practice to a
> large extent, a postgres tablespace == a unique mount point.

But a critical difference is that in file systems, a separate mount
point has its own journal.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Masahiko Sawada
On Mon, Jul 8, 2019 at 11:20 PM Bruce Momjian  wrote:
>
> On Mon, Jul  8, 2019 at 06:04:28PM +0900, Masahiko Sawada wrote:
> > On Sun, Jul 7, 2019 at 1:05 AM Bruce Momjian  wrote:
> > > What about referential integrity constraints that need to check primary
> > > keys in the encrypted tables?  I also don't see a way of delaying that,
> > > and if you can't do referential integrity into the encrypted tables, it
> > > reduces the value of having encrypted data in the same database rather
> > > than in another database or cluster?
> >
> > I just thought that PostgreSQL's auxiliary processes such as
> > autovacuum, startup, checkpointer and bgwriter should always be able to
> > access all keys because they are already inside the database. Even
> > today these processes don't check any privileges when accessing
> > data. What security threats can we protect data from by requiring
> > privileges even for auxiliary processes? If this is a security problem,
> > isn't it also true for cluster-wide encryption? I guess that processes
> > that have an access privilege on the table can always get the
> > corresponding encryption key. And no process can access an
> > encryption key directly without accessing a database object.
>
> Well, see my list of three things that users want in an earlier email:
>
> 
> https://www.postgresql.org/message-id/20190706160514.b67q4f7abcxfd...@momjian.us
>
> When people are asking for multiple keys (not just for key rotation),
> they are asking to have multiple keys that can be supplied by users only
> when they need to access the data.  Yes, the keys are always in the
> database, but the feature request is that they are only unlocked when the
> user needs to access the data.  Obviously, that will not work for
> autovacuum when the encryption is at the block level.

I get your point. I also feel that client-side encryption, or
encryption during execution (by using pgcrypto with triggers and
views), would fit these requirements better.
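
For reference, here is a minimal sketch of the pgcrypto-with-a-view
approach mentioned above (the pgcrypto functions are real; the table,
view, and setting names are made up for illustration):

create extension if not exists pgcrypto;

create table customer_secret (
    id      integer primary key,
    payload bytea            -- output of pgp_sym_encrypt()
);

-- the client supplies the key for its own session only
set app.encryption_key = 'correct horse battery staple';

insert into customer_secret
    values (1, pgp_sym_encrypt('4111-1111-1111-1111',
                               current_setting('app.encryption_key')));

-- decryption happens during query execution, using the session's key
create view customer_secret_plain as
    select id,
           pgp_sym_decrypt(payload, current_setting('app.encryption_key')) as payload
    from customer_secret;

select * from customer_secret_plain;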

>
> If the key is always unlocked, there is questionable security value of
> having multiple keys, beyond key rotation.
>
> > > I still feel we have not clearly described what the options are:
> > >
> > > 1.  Encrypt everything
> > >
> > > 2.  Encrypt only some tables (for performance reasons), and use only one
> > > key, or use multiple keys to allow for key rotation.  All keys are
> > > always unlocked.
> > >
> > > 3.  Encrypt only some tables with different keys, and the keys are not
> > > always unlocked.
> > >
> > > As Tomas already stated, using tablespaces to distinguish encrypted from
> > > non-encrypted tables doesn't make sense since the file system used for
> > the storage is immaterial to the encryption status. An easier way would
> > > be to just add a bit to WAL that would indicate if the rest of the WAL
> > > record is encrypted, though I doubt the performance boost is worth the
> > > complexity.
> >
> > Okay, instead of using tablespaces we can create groups, grouping
> > tables that are encrypted with the same key. I think one of the most
> > important points here is to provide a granular encryption feature and
>
> Why is this important?  What are you trying to accomplish?

Because it can both suppress the performance overhead of
encryption/decryption and reduce the amount of data that has to be
encrypted.

Also, having fewer keys in the database would make the implementation
simpler, especially for recovery. Since system caches are not available
during recovery, we might need a caching mechanism for keys if we had
several thousand keys in the database.

>
> > have fewer keys in the database cluster, not to provide a
> > per-tablespace encryption feature. I'm not going to insist it should
> > be per-tablespace encryption.
>
> It is unclear which item you are looking for.  Which number are you
> suggesting from the three listed above in the email URL?
>

Sorry, I'm referring to #2.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Attempt to consolidate reading of XLOG page

2019-07-09 Thread Antonin Houska
Thomas Munro  wrote:

> On Tue, May 21, 2019 at 9:12 PM Antonin Houska  wrote:
> > Robert Haas  wrote:
> > > It seems to me that it's better to unwind the stack i.e. have the
> > > function return the error information to the caller and let the caller
> > > do as it likes.
> >
> > Thanks for a hint. The next version tries to do that.
> 
> Hi Antonin,
> 
> Could you please send a fresh rebase for the new Commitfest?

Rebased.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

>From 3d441d7f584182dfbc76c14591fe0c045679b66a Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Tue, 9 Jul 2019 11:54:15 +0200
Subject: [PATCH 1/5] Make XLogReaderInvalReadState static.

This change is not necessary for the XLogRead() refactoring itself, but I
noticed the problem while working on it. Not sure it's worth a separate CF
entry.
---
 src/backend/access/transam/xlogreader.c | 4 ++--
 src/include/access/xlogreader.h | 4 
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 41dae916b4..08bc9695c1 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -35,10 +35,10 @@ static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
 			XLogRecPtr recptr);
+static void XLogReaderInvalReadState(XLogReaderState *state);
 static int	ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
 			 int reqLen);
 static void report_invalid_record(XLogReaderState *state, const char *fmt,...) pg_attribute_printf(2, 3);
-
 static void ResetDecoder(XLogReaderState *state);
 
 /* size of the buffer allocated for error message. */
@@ -620,7 +620,7 @@ err:
 /*
  * Invalidate the xlogreader's read state to force a re-read.
  */
-void
+static void
 XLogReaderInvalReadState(XLogReaderState *state)
 {
 	state->readSegNo = 0;
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 04228e2a87..a3d3cc1e7b 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -212,13 +212,9 @@ extern struct XLogRecord *XLogReadRecord(XLogReaderState *state,
 extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
 		 XLogRecPtr recptr, char *phdr);
 
-/* Invalidate read state */
-extern void XLogReaderInvalReadState(XLogReaderState *state);
-
 #ifdef FRONTEND
 extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr);
 #endif			/* FRONTEND */
-
 /* Functions for decoding an XLogRecord */
 
 extern bool DecodeXLogRecord(XLogReaderState *state, XLogRecord *record,
-- 
2.16.4

>From 4e933993cbf5ad6d8646b064907d3c861ea1536f Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Tue, 9 Jul 2019 11:54:15 +0200
Subject: [PATCH 2/5] Remove TLI from some argument lists.

The timeline information is available to caller via XLogReaderState. Now that
XLogRead() is gonna be (sometimes) responsible for determining the TLI, it
would have to be added the (TimeLineID *) argument too, just to be consistent
with the current coding style. Since XLogRead() updates also other
position-specific fields of XLogReaderState, it seems simpler if we remove the
output argument from XLogPageReadCB and always report the TLI via
XLogReaderState.
---
 src/backend/access/transam/xlog.c  |  7 +++
 src/backend/access/transam/xlogreader.c|  6 +++---
 src/backend/access/transam/xlogutils.c | 11 ++-
 src/backend/replication/logical/logicalfuncs.c |  4 ++--
 src/backend/replication/walsender.c|  2 +-
 src/bin/pg_rewind/parsexlog.c  |  9 -
 src/bin/pg_waldump/pg_waldump.c|  2 +-
 src/include/access/xlogreader.h|  8 +++-
 src/include/access/xlogutils.h |  5 ++---
 src/include/replication/logicalfuncs.h |  2 +-
 10 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b6c9353cbd..f30c2ce0ce 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -885,8 +885,7 @@ static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 		 int source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int	XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
-		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf,
-		 TimeLineID *readTLI);
+		 int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		bool fetching_ckpt, XLogRecPtr tliRecPtr);
 static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
@@ -11520,7 +11519,7 @@ CancelBackup(void)
  */
 static int
 XLogPage

Re: POC: Cleaning up orphaned files using undo logs

2019-07-09 Thread Dilip Kumar
On Sat, Jul 6, 2019 at 8:26 PM Dilip Kumar  wrote:
>
> On Thu, Jul 4, 2019 at 5:24 PM Amit Kapila  wrote:
> >
> PFA, the latest version of the undo interface and undo processing patches.
>
> Summary of the changes in the patch set
>
> 1. Undo Interface
> - Rebased over latest undo storage code
> - Implemented undo page compression (don't store the common fields in
> all the records; instead we get them from the first complete record of
> the page).
> - As per Robert's comment, UnpackedUndoRecord is divided in two parts,
>   a) All fields which are set by the caller.
>   b) Pointer to structures which are set internally.
> - Epoch and the Transaction id are  unified as full transaction id
> - Fixed handling of dbid during recovery (TODO in PrepareUndoInsert)
>
> Pending:
> - Move loop in UndoFetchRecord to outside and test performance with
> keeping pin vs pin+lock across undo records.  This will be done after
> testing performance over the zheap code.
> - I need to investigate whether Discard checking can be unified in
> master and HotStandby in UndoFetchRecord function.
>
> 2. Undo Processing
> - Defect fix in multi-log rollback for subtransaction.
> - Assorted defect fixes.
>
> Others
>- Fixup for undo log code to handle full transaction id in
> UndoLogSlot for discard and other bug fixes in undo log.
>- Fixup for Orphan file cleanup to pass dbid in PrepareUndoInsert
>
PFA, updated patch version which includes
- One defect fix in undo interface related to undo page compression
for handling persistence level
- Implemented pending TODO optimization in undo page compression.
- One defect fix in undo processing related to the prepared transaction

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


undo_20190709.tar.gz
Description: GNU Zip compressed data


Re: Introduce timeout capability for ConditionVariableSleep

2019-07-09 Thread Thomas Munro
On Sun, Jul 7, 2019 at 3:09 PM Thomas Munro  wrote:
> +   /* Timed out */
> +   if (rc == 0)
> +   return true;

Here's a version without that bit, because I don't think we need it.

> That still leaves the danger that the CV can be signalled some time
> after ConditionVariableTimedSleep() returns.  So now I'm wondering if
> ConditionVariableCancelSleep() should signal the CV if it discovers
> that this process is not in the proclist, on the basis that that must
> indicate that we've been signalled even though we're not interested in
> the message anymore, and yet some other process else might be
> interested, and that might have been the only signal that is ever
> going to be delivered by ConditionVariableSignal().

And a separate patch to do that.  Thoughts?


--
Thomas Munro
https://enterprisedb.com


0001-Introduce-timed-waits-for-condition-variables-v6.patch
Description: Binary data


0002-Forward-received-condition-variable-signals-on-ca-v6.patch
Description: Binary data


Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?

2019-07-09 Thread Daniel Gustafsson
> On 8 Jul 2019, at 00:02, Thomas Munro  wrote:
> 
> On Wed, Jun 12, 2019 at 1:21 AM Daniel Gustafsson  wrote:
>> The patch is still rough around the edges (TODO’s left to mark some areas), 
>> but
>> I prefer to get some feedback early rather than working too far in 
>> potentially
>> the wrong direction, so parking this in the CF for now.
> 
> Hi Daniel,
> 
> Given the above disclaimers the following may be entirely expected,
> but just in case you weren't aware:
> t/010_logical_decoding_timelines.pl fails with this patch applied.
> 
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/555205042

I hadn’t seen that, since I had fat-fingered and accidentally run the full tests
in a tree without assertions.  The culprit here seems to be an assertion in the
logical decoding code which doesn’t account for heap_multi_insert into catalog
relations (of which there are none now; this patch introduces them and thus
trips the assertion).  As the issue is somewhat unrelated, I’ve opened a
separate thread with a small patch:

https://postgr.es/m/cbffd532-c033-49eb-9a5a-f67eaee9e...@yesql.se

The attached v3 also has that fix in order to see if the cfbot is happier with
this.

cheers ./daniel



catalog_multi_insert-v3.patch
Description: Binary data


Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Peter Eisentraut
On 2019-07-08 21:08, Julien Rouhaud wrote:
> Don't get me wrong, I do agree that implementing filtering in the
> backend is a better design.  What's bothering me is that I also agree
> that there will be more glibc breakage, and if that happens within a
> few years, a lot of people will still be using pg12- versions, and they
> still won't have an efficient way to rebuild their indexes.  Now, it'd
> be easy to publish an external tool that does simple
> parallel-and-glibc-filtering reindexing and will serve that purpose
> for the few years it'll be needed, so everyone can be happy.

You can already do that: Run a query through psql to get a list of
affected tables or indexes and feed those to reindexdb using -i or -t
options.
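
For example, something along these lines (a sketch; the catalog filter
is only a rough approximation of "collation-dependent index"):

-- list indexes that have at least one collatable column, i.e. indexes
-- whose ordering may depend on the OS collation library
select distinct i.indexrelid::regclass as index_name
from pg_index i
join pg_class c on c.oid = i.indexrelid
join pg_namespace n on n.oid = c.relnamespace
join pg_attribute a on a.attrelid = i.indexrelid
where a.attcollation <> 0
  and n.nspname not in ('pg_catalog', 'pg_toast', 'information_schema');

The resulting names can then be passed to reindexdb, one -i option per
index.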

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: FETCH FIRST clause PERCENT option

2019-07-09 Thread Surafel Temesgen
Hi Ryan,
On Tue, Jul 9, 2019 at 1:27 AM Ryan Lambert  wrote:

> Hi Surafel,
>
> The v5 patch [1] applies cleanly and passes make installcheck-world.
>
> My concern with this is its performance.  As a user I would expect using
> this feature to enable queries to run faster than they would simply pulling
> the full table.  I tested on tables ranging from 10k rows up to 10 million
> with the same basic result that using FETCH FIRST N PERCENT is slower than
> using the full table.  At best it ran slightly slower than querying the
> full table, at worst it increased execution times by 1400% when using a
> large high percentage (95%).
>
>

The cost of FETCH FIRST N PERCENT execution in the current implementation
is the cost of pulling the full table plus the cost of storing and
fetching the tuples from a tuplestore, so it cannot perform better than
pulling the full table in any case.  This is because we can't determine
the number of rows to return without executing the plan to the end.  The
planner can estimate the number of rows that will be returned, but that
estimate is not exact.
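
For reference, the syntax under discussion looks roughly like this (a
sketch with a made-up table):

-- all qualifying rows are first spooled into a tuplestore, because the
-- executor cannot know how many rows "5 percent" is until the underlying
-- plan has run to completion
select *
from measurement
order by logdate
fetch first 5 percent rows only;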


regards

Surafel


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 4:34 AM, Tomas Vondra wrote:
> On Mon, Jul 08, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:
>>On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
>>> Yes, 'postgres' can be used to create a nice md5 rainbow table that
>>> works on many servers --- good point.  Are rainbow tables possible with
>>> something like AES?
>>>
>>> > I appreciate that *some* of this might not be completely relevant for
>>> > the way a nonce is used in cryptography, but I'd be very surprised to
>>> > have a cryptographer tell me that a deterministic nonce didn't have
>>> > similar issues or didn't reduce the value of the nonce significantly.
>>>
>>> This post:
>>>
>>> 
>>> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>>
>>> says:
>>>
>>> GCM is a variation on Counter Mode (CTR).  As you say, with any variant
>>> of Counter Mode, it is essential  that the Nonce is not repeated with
>>> the same key.  Hence CTR mode  Nonces often include either a counter or
>>> a timer element: something that  is guaranteed not to repeat over the
>>> lifetime of the key.
>>>
>>> CTR is what we use for WAL.  For 8k pages, we would use CBC, which means
>>> we need a random nonce.  I need to dig deeper into the ECB mode attack.
>>
>>Looking here:
>>
>>  
>> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
>>
>>I think the issue is that we can't use a _counter_ for the nonce since
>>each page-0 of each table would use the same nonce, and each page-1,
>>etc.  I assume we would use the table oid and page number as the nonce.
>>We can't use the database oid since we copy the files from one database
>>to another via file system copy and not through the shared buffer cache
>>where they would be re encrypted.  Using relfilenode seems dangerous.
>>For WAL I think it would be the WAL segment number.  It would be nice
>>to mix that with the "Database system identifier:", but are these the
>>same on primary and replicas?
>>
> 
> Can't we just store the nonce somewhere? What if for encrypted pages we
> only use/encrypt (8kB - X bytes), where X bytes is just enough to store
> the nonce and maybe some other encryption metadata (key ID?).
> 
> This would be similar to the "special" area on a page, except that that
> relies on page header which is encrypted (and thus not accessible before
> decrypting the page).
> 
> So encryption would:
> 
> 1) encrypt the (8kB - X bytes) with nonce suitable for the used
>encryption mode (sequence, random, ...)
> 
> 2) store the nonce / key ID etc. to the reserved space
> 
> and decryption would
> 
> 1) look at the encryption metadata at the end (nonce, key ID, ...)
> 
> 2) decrypt the page using that info

That is pretty much what I had been envisioning.

> Or maybe we could define a new relation fork for encrypted relations,
> storing all this metadata (not sure if we need that just for the main
> fork or for all forks including vm, fsm ...)?

I like the idea of a fork if it is workable.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 4:23 AM, Tomas Vondra wrote:
> On Mon, Jul 08, 2019 at 06:24:40PM -0400, Joe Conway wrote:
>>On 7/8/19 6:04 PM, Stephen Frost wrote:
>>> * Bruce Momjian (br...@momjian.us) wrote:
 Uh, well, renaming the user was a big problem, but that is the only case
 I can think of.  I don't see that as an issue for block or WAL sequence
 numbers.  If we want to use a different nonce, we have to find a way to
 store it or look it up efficiently.  Considering the nonce size, I don't
 see how that is possible.
>>>
>>> No, this also meant that, as an attacker, I *knew* the salt ahead of
>>> time and therefore could build rainbow tables specifically for that
>>> salt.  I could also use those *same* tables for any system where that
>>> user had an account, even if they used different passwords on different
>>> systems...
>>>
>>> I appreciate that *some* of this might not be completely relevant for
>>> the way a nonce is used in cryptography, but I'd be very surprised to
>>> have a cryptographer tell me that a deterministic nonce didn't have
>>> similar issues or didn't reduce the value of the nonce significantly.
>>
>>I have worked side by side on projects with bona fide cryptographers and
>>I can assure you that they recommended random nonces. Granted, that was
>>in the early 2000s, but I don't think "modern cryptography" has changed
>>that any more than "web scale" has made Postgres irrelevant in the
>>intervening years.
>>
>>Related links:
>>
>>https://defuse.ca/cbcmodeiv.htm
>>https://www.cryptofails.com/post/70059609995/crypto-noobs-1-initialization-vectors
>>
> 
> AFAIK it very much depends on the encryption mode. CBC mode does require
> random nonces, other modes may be fine even with sequences as long as
> the values are not reused. In that case we might even use the LSN, for
> example. And I wonder if sha2(LSN) could be considered "random", but
> maybe that's an entirely silly idea ...


Yeah, we worked mostly with CBC so that could be the case in terms of
what is required. But I don't think it is ever a bad idea.

But as Stephen pointed out elsewhere on this thread, I think we should
be getting our guidance from places like NIST, which has actual experts
in this stuff.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> On 2019-07-08 18:09, Joe Conway wrote:
>> In my mind, and in practice to a
>> large extent, a postgres tablespace == a unique mount point.
> 
> But a critical difference is that in file systems, a separate mount
> point has its own journal.

While it would be ideal to have separate WAL, and even separate shared
buffer pools, per tablespace, I think that is too much complexity for
the first implementation and we could have a single separate key for all
WAL for now. The main thing I don't think we want is e.g. a 50TB
database with everything encrypted with a single key -- for the reasons
previously stated.

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 03:37:03AM +0200, Tomas Vondra wrote:


...

Notice that the cost of the second plan is almost double that of the first
one. That means 0004 does not even generate the first plan, i.e. there are cases
where we don't try to add the explicit sort before passing the path to
generate_gather_paths().

And I think I know why that is - while gather_grouping_paths() tries to
add explicit sort below the gather merge, there are other places that
call generate_gather_paths() that don't do that. In this case it's
probably apply_scanjoin_target_to_paths() which simply builds

 parallel (seq|index) scan + gather merge

and that's it. The problem is likely the same - the code does not know
which pathkeys are "interesting" at that point. We probably need to
teach the planner to do this.



I've looked into this again, and yes - that's the reason. I've added
generate_useful_gather_paths() which is a wrapper on top of
generate_gather_paths(). It essentially does what the 0001 patch did directly
in generate_gather_paths(), so it's more like create_grouping_paths().

And that does the trick - we now generate the cheaper paths, and I don't
see any crashes in regression tests etc.

I still suspect we already have code somewhere doing similar checks on
whether pathkeys might be useful. I've looked into pathkeys.c and
elsewhere, but no luck.

Attached is a slightly modified patch series:

1) 0001 considers incremental sort paths in various places (adds the new
generate_useful_gather_paths and modifies places calling create_sort_path)

2) 0002 and 0003 are fixes I mentioned before

3) 0004 adds a new GUC force_incremental_sort that (when set to 'on')
tries to nudge the optimizer into using incremental sort by essentially
making it free (i.e. using startup/total costs of the subpath). I've found
this useful when trying to force incremental sorts into plans where it may
not be the best strategy.

I won't have time to hack on this over the next ~2 weeks, but I'll try to
respond to questions when possible.



FWIW tweaking all the create_sort_path() places to also consider adding
incremental sort is a bit tedious and invasive, and it almost doubles
the amount of repetitive code. It's OK for an experiment like this, but
we should try handling this in a nicer way (move it to a separate
function that does both, or something like that).



This definitely needs more work. We need to refactor it in some way, e.g.
have a function that would consider both explicit sort (on the cheapest
path) and incremental sort (on all paths), and call it from all those
places. Haven't tried it, though.
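
To make that concrete, here is a rough (untested) sketch of what such a helper
might look like. The helper name, and the exact signatures of
create_incremental_sort_path() and pathkeys_common_contained_in(), are
assumptions based on the patch series, not code from the attached patches:

    /*
     * Consider producing "pathkeys" ordering for rel: an explicit sort on the
     * cheapest path, plus incremental sorts on any partially presorted paths.
     */
    static void
    consider_ordered_paths(PlannerInfo *root, RelOptInfo *rel,
                           List *pathkeys, double limit_tuples)
    {
        ListCell   *lc;

        /* explicit full sort on top of the cheapest path */
        add_path(rel, (Path *)
                 create_sort_path(root, rel, rel->cheapest_total_path,
                                  pathkeys, limit_tuples));

        /* incremental sort on top of each partially presorted path */
        foreach(lc, rel->pathlist)
        {
            Path       *subpath = (Path *) lfirst(lc);
            int         presorted_keys;
            bool        is_sorted;

            is_sorted = pathkeys_common_contained_in(pathkeys,
                                                     subpath->pathkeys,
                                                     &presorted_keys);

            /* already fully sorted, or no common prefix to exploit */
            if (is_sorted || presorted_keys == 0)
                continue;

            add_path(rel, (Path *)
                     create_incremental_sort_path(root, rel, subpath,
                                                  pathkeys, presorted_keys,
                                                  limit_tuples));
        }
    }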

There's also a couple more places where we do create_sort_path() and don't
consider incremental sort yet - window functions, distinct etc.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

>From e7e97daf447a91e090809be4f07a5eee650eb5e7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 9 Jul 2019 00:12:45 +0200
Subject: [PATCH 1/4] fix pathkey processing in generate_gather_paths

---
 src/backend/optimizer/path/allpaths.c   | 365 +++-
 src/backend/optimizer/plan/createplan.c |  10 +-
 src/backend/optimizer/plan/planner.c| 301 ++-
 src/include/optimizer/paths.h   |   2 +
 4 files changed, 673 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 3efc807164..acddbef064 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -556,7 +556,7 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 */
if (rel->reloptkind == RELOPT_BASEREL &&
bms_membership(root->all_baserels) != BMS_SINGLETON)
-   generate_gather_paths(root, rel, false);
+   generate_useful_gather_paths(root, rel, false);
 
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
@@ -2730,6 +2730,367 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo 
*rel, bool override_rows)
}
 }
 
+/*
+ * Find an equivalence class member expression, all of whose Vars, come from
+ * the indicated relation.
+ */
+static Expr *
+find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
+{
+   ListCell   *lc_em;
+
+   foreach(lc_em, ec->ec_members)
+   {
+   EquivalenceMember *em = lfirst(lc_em);
+
+   if (bms_is_subset(em->em_relids, rel->relids) &&
+   !bms_is_empty(em->em_relids))
+   {
+   /*
+* If there is more than one equivalence member whose 
Vars are
+* taken entirely from this relation, we'll be content 
to choose
+* any one of those.
+*/
+   return em->em_expr;
+   }
+   }
+
+   /* We didn't find any suitable equivalence class expres

Re: [HACKERS] Cached plans and statement generalization

2019-07-09 Thread Thomas Munro
On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik
 wrote:
> Sorry, are you tests autoprepare-16.patch I have sent in the last e-mail?
> I can not reproduce the problem with building documentation:

+  autoprepare_threshold (integer/type>)

The problem is that "integer/type>".  (Missing "<".)  There are more
than one of those.  Not sure why that doesn't fail for you.

> Also autoporepare-16.patch doesn't include any junk
>
> src/include/catalog/pg_proc.dat.~1~

Oops, yeah, sorry for the noise.  I did something stupid in my apply script.

-- 
Thomas Munro
https://enterprisedb.com




Re: [Patch] Mingw: Fix import library extension, build actual static libraries

2019-07-09 Thread Andrew Dunstan


On 4/16/19 1:22 AM, Noah Misch wrote:
> On Thu, Mar 07, 2019 at 03:23:50PM +0100, Sandro Mani wrote:
>> Related, no actual static libraries are produced alongside the respective
>> dlls. The attached patch 0002-Build-static-libraries.patch addresses this,
>> in a similar fashion as is already done for the AIX case in Makefile.shlib.
> We don't build static libraries on AIX, though Makefile.shlib uses the
> $(stlib) variable to get a name for the *.a shared library it makes.  Here's
> an example of one AIX Makefile.shlib build sequence, from
> https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hornet&dt=2019-04-15%2022%3A35%3A52&stg=make
>
> rm -f libpq.a
> ar crs libpq.a fe-auth.o fe-auth-scram.o fe-connect.o fe-exec.o fe-misc.o 
> fe-print.o fe-lobj.o fe-protocol2.o fe-protocol3.o pqexpbuffer.o fe-secure.o 
> libpq-events.o encnames.o wchar.o fe-secure-openssl.o fe-secure-common.o
> touch libpq.a
> ../../../src/backend/port/aix/mkldexport.sh libpq.a libpq.so.5 >libpq.exp
> xlc_r -qmaxmem=33554432 -D_LARGE_FILES=1  -qnoansialias -g -O2 -qmaxmem=16384 
> -qsrcmsg -D_REENTRANT -D_THREAD_SAFE -D_POSIX_PTHREAD_SEMANTICS  -o 
> libpq.so.5 libpq.a -Wl,-bE:libpq.exp -L../../../src/port 
> -L../../../src/common -lpgcommon_shlib -lpgport_shlib  
> -L/home/nm/sw/nopath/libxml2-64/lib -L/home/nm/sw/nopath/icu58.2-64/lib  
> -L/home/nm/sw/nopath/uuid-64/lib -L/home/nm/sw/nopath/openldap-64/lib 
> -L/home/nm/sw/nopath/icu58.2-64/lib -L/home/nm/sw/nopath/libxml2-64/lib 
> -Wl,-blibpath:'/home/nm/farm/xlc64/HEAD/inst/lib:/home/nm/sw/nopath/libxml2-64/lib:/home/nm/sw/nopath/icu58.2-64/lib:/home/nm/sw/nopath/uuid-64/lib:/home/nm/sw/nopath/openldap-64/lib:/home/nm/sw/nopath/icu58.2-64/lib:/home/nm/sw/nopath/libxml2-64/lib:/usr/lib:/lib'
>   -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE -lintl -lssl -lcrypto -lm -lldap_r 
> -llber -lpthreads
> rm -f libpq.a
> ar crs libpq.a libpq.so.5
>
>

I'm wondering if it wouldn't be better to set the value of stlib for
windows, instead of changes like this:



-    $(CC) $(CFLAGS)  -shared -static-libgcc -o $@  $(OBJS) $(LDFLAGS)
$(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) -Wl,--export-all-symbols
-Wl,--out-implib=$(stlib)
+    $(CC) $(CFLAGS)  -shared -static-libgcc -o $@  $(OBJS) $(LDFLAGS)
$(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) -Wl,--export-all-symbols
-Wl,--out-implib=lib$(NAME).dll.a



I'm also wondering if changing this will upset third party authors.


Thoughts?


cheers


andrew


-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Ryan Lambert
Hi Thomas,

> CBC mode does require
> random nonces, other modes may be fine with even sequences as long as
> the values are not reused.

I disagree that CBC mode requires random nonces, at least based on what
NIST has published.  They only require that the IV (not the nonce) must be
unpredictable per [1]:

" For the CBC and CFB modes, the IVs must be unpredictable."

The unpredictable IV can be generated from a non-random nonce including a
counter:

"There are two recommended methods for generating unpredictable IVs. The
first method is to apply the forward cipher function, under the same key
that is used for the encryption of the plaintext, to a nonce. The nonce
must be a data block that is unique to each execution of the encryption
operation. For example, the nonce may be a counter, as described in
Appendix B, or a message number."

[1]
https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
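
For illustration only (my sketch, not text from NIST or from any patch in this
thread), deriving an unpredictable CBC IV by applying the forward cipher to a
unique counter nonce, assuming OpenSSL's EVP interface, might look like this:

    #include <openssl/evp.h>
    #include <stdint.h>
    #include <string.h>

    /* Returns 1 on success, 0 on failure; "key" is the data encryption key. */
    static int
    derive_cbc_iv(const uint8_t key[32], uint64_t counter, uint8_t iv[16])
    {
        uint8_t     nonce[16] = {0};
        int         outlen = 0;
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();

        if (ctx == NULL)
            return 0;

        /* the nonce only needs to be unique; here it is a simple counter */
        memcpy(nonce, &counter, sizeof(counter));

        /* forward cipher = encrypt one block (ECB) under the same key */
        if (EVP_EncryptInit_ex(ctx, EVP_aes_256_ecb(), NULL, key, NULL) != 1 ||
            EVP_EncryptUpdate(ctx, iv, &outlen, nonce, sizeof(nonce)) != 1 ||
            outlen != 16)
        {
            EVP_CIPHER_CTX_free(ctx);
            return 0;
        }

        EVP_CIPHER_CTX_free(ctx);
        return 1;
    }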

Thanks,
Ryan Lambert




Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-09 Thread Julien Rouhaud
On Tue, Jul 9, 2019 at 9:52 AM Julien Rouhaud  wrote:
>
> On Tue, Jul 9, 2019 at 9:24 AM Michael Paquier  wrote:
> >
> > I have done a lookup of this patch set with a focus on the refactoring
> > part, and the split is a bit confusing.
> [...]

I've finished a better refactoring, and ended up with this API in
scripts_parallel:

extern ParallelSlot *ConsumeIdleSlot(ParallelSlot *slots, int numslots,
const char *progname);

extern ParallelSlot *SetupParallelSlots(const char *dbname, const char *host,
const char *port,
const char *username, bool prompt_password,
const char *progname, bool echo,
PGconn *conn, int numslots);

extern bool WaitForSlotsCompletion(ParallelSlot *slots, int numslots,
   const char *progname);

ConsumeIdleSlot() being a wrapper on top of (now static) GetIdleSlot,
which handles parallelism and possible failure.
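
For illustration, a caller loop might drive this API roughly as follows. This
is a sketch only, not code from the attached patch: the slot member name
"connection" and the run_reindex_command() helper are placeholders, and
scripts_parallel.h / fe_utils/simple_list.h are assumed to be included.

    static void
    reindex_tables_in_parallel(PGconn *conn, SimpleStringList *tables,
                               int concurrentCons, const char *progname)
    {
        ParallelSlot *slots;
        SimpleStringListCell *cell;

        /* open concurrentCons connections, reusing conn for the first slot */
        slots = SetupParallelSlots(PQdb(conn), NULL, NULL, NULL, false,
                                   progname, false, conn, concurrentCons);

        for (cell = tables->head; cell; cell = cell->next)
        {
            /* blocks until one connection is idle, NULL on worker failure */
            ParallelSlot *free_slot = ConsumeIdleSlot(slots, concurrentCons,
                                                      progname);

            if (!free_slot)
                exit(1);

            /* send the REINDEX for this table asynchronously on that slot */
            run_reindex_command(free_slot->connection, cell->val);
        }

        /* wait for all outstanding commands and report any errors */
        if (!WaitForSlotsCompletion(slots, concurrentCons, progname))
            exit(1);
    }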

Attached v3, including updated documentation for the new -j option.


0002-Add-parallel-processing-to-reindexdb-v3.patch
Description: Binary data


0001-Export-vacuumdb-s-parallel-infrastructure-v3.patch
Description: Binary data


Re: FETCH FIRST clause PERCENT option

2019-07-09 Thread Ryan Lambert
Surafel,

> The cost of FETCH FIRST N PERCENT execution in the current implementation is
> the cost of pulling the full table plus the cost of storing and fetching
> the tuple from tuplestore, so it can not perform better than pulling the
> full table in any case. This is because we can't determine the number of
> rows to return without executing the plan until the end. We can find the
> estimation of rows that will be returned in planner estimation but that is
> not exact.

Ok, I can live with that for the normal use cases.  This example from the
end of my previous message using 95% seems like a problem still, I don't
like syntax that unexpectedly kills performance like this one.  If this
can't be improved in the initial release of the feature I'd suggest we at
least make a strong disclaimer in the docs, along the lines of:

"It is possible for FETCH FIRST N PERCENT to create poorly performing query
plans when the N supplied exceeds 50 percent.  In these cases query
execution can take an order of magnitude longer to execute than simply
returning the full table.  If performance is critical using an explicit row
count for limiting is recommended."

I'm not certain the 50 percent is the true threshold of where things start
to fall apart, I just used that as a likely guess for now.  I can do some
more testing this week to identify where things start falling apart
performance wise.  Thanks,

EXPLAIN (ANALYZE, COSTS)
WITH t AS (
SELECT id, v1, v2
FROM r10mwide
FETCH FIRST 95 PERCENT ROWS ONLY
) SELECT AVG(v1), MIN(v1), AVG(v1 + v2) FROM t
;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=651432.48..651432.49 rows=1 width=24) (actual
time=58981.043..58981.044 rows=1 loops=1)
   ->  Limit  (cost=230715.67..461431.34 rows=9500057 width=20) (actual
time=0.017..55799.389 rows=950 loops=1)
 ->  Seq Scan on r10mwide  (cost=0.00..242858.60 rows=1060
width=20) (actual time=0.014..3847.146 rows=1000 loops=1)
 Planning Time: 0.117 ms
 Execution Time: 59079.680 ms
(5 rows)

Ryan Lambert



Re: Contribution to Perldoc for TestLib module in Postgres

2019-07-09 Thread Daniel Gustafsson
> On 7 Apr 2019, at 20:04, Ramanarayana  wrote:

> Please find the updated patch. Added to the commitfest as well

The v2 patch is somewhat confused as it has Windows carriage returns rather
than newlines, so it replaces the entire file making the diff hard to read.  It
also includes a copy of TestLib and the v1 patch and has a lot of whitespace
noise.

Please redo the patch on a clean tree to get a more easily digestable patch.

cheers ./daniel





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 8:39 AM, Ryan Lambert wrote:
> Hi Thomas,
> 
>> CBC mode does require
>> random nonces, other modes may be fine with even sequences as long as
>> the values are not reused.   
> 
> I disagree that CBC mode requires random nonces, at least based on what
> NIST has published.  They only require that the IV (not the nonce) must
> be unpredictable per [1]:
> 
> " For the CBC and CFB modes, the IVs must be unpredictable."
> 
> The unpredictable IV can be generated from a non-random nonce including
> a counter:
> 
> "There are two recommended methods for generating unpredictable IVs. The
> first method is to apply the forward cipher function, under the same key
> that is used for the encryption of the plaintext, to a nonce. The nonce
> must be a data block that is unique to each execution of the encryption
> operation. For example, the nonce may be a counter, as described in
> Appendix B, or a message number."
> 
> [1] 
> https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf


The terms nonce and IV are often used more-or-less interchangeably, and
it is important to be clear when we are talking about an IV specifically
- an IV is a specific type of nonce. Nonce means "number used once".
i.e. unique, whereas an IV (for CBC use anyway) should be unique and
random but not necessarily kept secret. The NIST requirements that
Stephen referenced elsewhere on this thread are as I understand it
intended to ensure the random but unique property with high probability.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





RE: extension patch of CREATE OR REPLACE TRIGGER

2019-07-09 Thread osumi.takami...@fujitsu.com
Dear Michael san

> > So, I'll fix this within a couple of weeks.
> Please note that I have switched the patch as waiting on author.

Thanks for your support.
I've rebased the previous patch so that it applies
to the latest PostgreSQL without any regression test failures.

Best, 
Takamichi Osumi


CREATE_OR_REPLACE_TRIGGER_v02.patch
Description: CREATE_OR_REPLACE_TRIGGER_v02.patch


[PATCH] Fix trigger argument propagation to child partitions

2019-07-09 Thread Patrick McHardy
The following patch fixes propagation of arguments to the trigger
function to child partitions both when initially creating the trigger
and when adding new partitions to a partitioned table.

The included regression test should demonstrate the problem, for clarity
repeated in slightly more readable form here:

bb=> create table parted_trig (a int) partition by list (a);
CREATE TABLE
bb=> create table parted_trig1 partition of parted_trig for values in (1);
CREATE TABLE
bb=> create or replace function trigger_notice() returns trigger as $$
bb$>   declare
bb$> arg1 text = TG_ARGV[0];
bb$> arg2 integer = TG_ARGV[1];
bb$>   begin
bb$> raise notice 'trigger % on % % % for % args % %', TG_NAME, 
TG_TABLE_NAME, TG_WHEN, TG_OP, TG_LEVEL, arg1, arg2;
bb$> return null;
bb$>   end;
bb$>   $$ language plpgsql;
CREATE FUNCTION
bb=> create trigger aaa after insert on parted_trig for each row execute 
procedure trigger_notice('text', 1);
CREATE TRIGGER
bb=> \d parted_trig
                Table "public.parted_trig"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
Partition key: LIST (a)
Triggers:
    aaa AFTER INSERT ON parted_trig FOR EACH ROW EXECUTE PROCEDURE 
trigger_notice('text', '1')
Number of partitions: 1 (Use \d+ to list them.)

bb=> \d parted_trig1
                Table "public.parted_trig1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
Partition of: parted_trig FOR VALUES IN (1)
Triggers:
    aaa AFTER INSERT ON parted_trig1 FOR EACH ROW EXECUTE PROCEDURE 
trigger_notice()

Fixed:

bb=> \d parted_trig1
                Table "public.parted_trig1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
Partition of: parted_trig FOR VALUES IN (1)
Triggers:
    aaa AFTER INSERT ON parted_trig1 FOR EACH ROW EXECUTE PROCEDURE 
trigger_notice('text', '1')

Patch is against 11.4, but applies to master with minor offset.

All regression test pass.
commit eb33c2cc0047bee6fb6ae20fd11aec438c3463da
Author: Patrick McHardy 
Date:   Tue Jul 9 14:41:21 2019 +0200

Fix trigger argument propagation to child partitions

When creating a trigger on a partitioned table, the trigger is
propagated to child partitions. However arguments specified for the
trigger function are not propagated, leading to the trigger function
being called without any arguments.

Similarly, when attaching a new partition to a partitioned table, the
trigger is copied without arguments.

Fix this by:

- Passing the original argument list to CreateTrigger when creating
  triggers for existing partitions.

- Reconstructing the original argument list from the heap and passing it
  to CreateTrigger when creating triggers for newly attached partitions.

Additionally add a simple regression test for both cases.

Introduced in 86f575948c7.

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5c278463c6..655d39ef71 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15145,6 +15145,7 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 		Datum		value;
 		bool		isnull;
 		List	   *cols = NIL;
+		List	   *args = NIL;
 		MemoryContext oldcxt;
 
 		/*
@@ -15209,11 +15210,28 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 			}
 		}
 
+		/* Reconstruct trigger arguments list. */
+		if (trigForm->tgnargs > 0)
+		{
+			char 	   *p;
+			int			i;
+
+			value = heap_getattr(tuple, Anum_pg_trigger_tgargs,
+ RelationGetDescr(pg_trigger), &isnull);
+			p = (char *) VARDATA_ANY(DatumGetByteaPP(value));
+
+			for (i = 0; i < trigForm->tgnargs; i++)
+			{
+				args = lappend(args, makeString(pstrdup(p)));
+				p += strlen(p) + 1;
+			}
+		}
+
 		trigStmt = makeNode(CreateTrigStmt);
 		trigStmt->trigname = NameStr(trigForm->tgname);
 		trigStmt->relation = NULL;
 		trigStmt->funcname = NULL;	/* passed separately */
-		trigStmt->args = NULL;	/* passed separately */
+		trigStmt->args = args;
 		trigStmt->row = true;
 		trigStmt->timing = trigForm->tgtype & TRIGGER_TYPE_TIMING_MASK;
 		trigStmt->events = trigForm->tgtype & TRIGGER_TYPE_EVENT_MASK;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f6c7a3fefc..a716827c77 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -1153,7 +1153,6 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			 */
 			childStmt = (CreateTrigStmt *) copyObject(stmt);
 			childStmt->funcname = NIL;
-			childStmt->args = NIL;
 			childStmt->whenClause = NULL;
 
 			/* If there 

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-09 Thread James Coleman
On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
 wrote:
>
> On Mon, Jul 08, 2019 at 12:07:06PM -0400, James Coleman wrote:
> >On Mon, Jul 8, 2019 at 10:58 AM Tomas Vondra
> > wrote:
> >>
> >> On Mon, Jul 08, 2019 at 10:32:18AM -0400, James Coleman wrote:
> >> >On Mon, Jul 8, 2019 at 9:59 AM Tomas Vondra
> >> > wrote:
> >> >>
> >> >> On Mon, Jul 08, 2019 at 09:22:39AM -0400, James Coleman wrote:
> >> >> >On Sun, Jul 7, 2019 at 5:02 PM Tomas Vondra
> >> >> > wrote:
> >> >> >> We're running query like this:
> >> >> >>
> >> >> >>   SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a HAVING 
> >> >> >> avg(b) < 3 ORDER BY 1, 2, 3
> >> >> >>
> >> >> >> but we're trying to add the incremental sort *before* the 
> >> >> >> aggregation,
> >> >> >> because the optimizer also considers group aggregate with a sorted
> >> >> >> input. And (a) is a prefix of (a,sum(b),count(b)) so we think we
> >> >> >> actually can do this, but clearly that's nonsense, because we can't
> >> >> >> possibly know the aggregates yet. Hence the error.
> >> >> >>
> >> >> >> If this is the actual issue, we need to ensure we actually can 
> >> >> >> evaluate
> >> >> >> all the pathkeys. I don't know how to do that yet. I thought that 
> >> >> >> maybe
> >> >> >> we should modify pathkeys_common_contained_in() to set 
> >> >> >> presorted_keys to
> >> >> >> 0 in this case.
> >> >> >>
> >> >> >> But then I started wondering why we don't see this issue even for
> >> >> >> regular (non-incremental-sort) paths built in create_ordered_paths().
> >> >> >> How come we don't see these failures there? I've modified costing to
> >> >> >> make all incremental sort paths very cheap, and still nothing.
> >> >> >
> >> >> >I assume you mean you modified costing to make regular sort paths very 
> >> >> >cheap?
> >> >> >
> >> >>
> >> >> No, I mean costing of incremental sort paths, so that they end up being
> >> >> the cheapest ones. If some other path is cheaper, we won't see the error
> >> >> because it only happens when building plan from the cheapest path.
> >> >
> >> >Ah, I misunderstood as you trying to figure out a way to try to cause
> >> >the same problem with a regular sort.
> >> >
> >> >> >> So presumably there's a check elsewhere (either implicit or 
> >> >> >> explicit),
> >> >> >> because create_ordered_paths() uses pathkeys_common_contained_in() 
> >> >> >> and
> >> >> >> does not have the same issue.
> >> >> >
> >> >> >Given this comment in create_ordered_paths():
> >> >> >
> >> >> >  generate_gather_paths() will have already generated a simple Gather
> >> >> >  path for the best parallel path, if any, and the loop above will have
> >> >> >  considered sorting it.  Similarly, generate_gather_paths() will also
> >> >> >  have generated order-preserving Gather Merge plans which can be used
> >> >> >  without sorting if they happen to match the sort_pathkeys, and the 
> >> >> > loop
> >> >> >  above will have handled those as well.  However, there's one more
> >> >> >  possibility: it may make sense to sort the cheapest partial path
> >> >> >  according to the required output order and then use Gather Merge.
> >> >> >
> >> >> >my understanding is that generate_gather_paths() only considers paths
> >> >> >that already happen to be sorted (not explicit sorts), so I'm
> >> >> >wondering if it would make more sense for the incremental sort path
> >> >> >creation for this case to live alongside the explicit ordered path
> >> >> >creation in create_ordered_paths() rather than in
> >> >> >generate_gather_paths().
> >> >> >
> >> >>
> >> >> How would that solve the issue? Also, we're building a gather path, so
> >> >> I think generate_gather_paths() is the right place where to do it. And
> >> >> we're not changing the semantics of generate_gather_paths() - the result
> >> >> path should be sorted "correctly" with respect to sort_pathkeys.
> >> >
> >> >Does that imply what the explicit sort in create_ordered_paths() is in
> >> >the wrong spot?
> >> >
> >>
> >> I think those are essentially the right places where to do this sort of
> >> stuff. Maybe there's a better place, but I don't think those places are
> >> somehow wrong.
> >>
> >> >Or, to put it another way, do you think that both kinds of sorts
> >> >should be added in the same place? It seems confusing to me that
> >> >they'd be split between the two methods (unless I'm completely
> >> >misunderstanding how the two work).
> >> >
> >>
> >> The paths built in those two places were very different in one aspect:
> >>
> >> 1) generate_gather_paths only ever looked at pathkeys for the subpath, it
> >> never even looked at ordering expected by paths above it (or the query as
> >> a whole). Plain Gather ignores pathkeys entirely, Gather Merge only aims
> >> to maintain ordering of the different subpaths.
> >>
> >> 2) create_oredered_paths is meant to enforce ordering needed by upper
> >> parts of the plan - either by using a properly sorted path, or adding an
> >> explicit sort.
> >>
> >>
> >> We want to exten

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-09 Thread James Coleman
On Mon, Jul 8, 2019 at 6:37 PM Alexander Korotkov
 wrote:
>
> On Thu, Jul 4, 2019 at 4:25 PM James Coleman  wrote:
> > Process questions:
> > - Do I need to explicitly move the patch somehow to the next CF?
>
> We didn't manage to register it on current (July) commitfest.  So,
> please, register it on next (September) commitfest.

I've moved it to the September cf.

> > - Since I've basically taken over patch ownership, should I move my
> > name from reviewer to author in the CF app? And can there be two
> > authors listed there?
>
> Sure, you're co-author of this patch.  Two or more authors could be
> listed at CF app, you can find a lot of examples on the list.

Done.

James Coleman




Re: Index Skip Scan

2019-07-09 Thread Jesper Pedersen

Hi,

On 7/4/19 6:59 AM, Thomas Munro wrote:

For the MIN query you just need a path with Pathkeys: { i ASC, j ASC
}, UniqueKeys: { i, j }, doing the MAX query you just need j DESC.




David, are you thinking about something like the attached ?

Some questions.

* Do you see UniqueKey as a "complete" planner node ?
  - I didn't update the nodes/*.c files for this yet

* Is a UniqueKey with a list of EquivalenceClasses best, or a list of 
UniqueKeys each with a single EquivalenceClass? (Both shapes are sketched 
below.)
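
Roughly, the two shapes in question look like this (hypothetical type names;
the attached patch currently uses the first form, and "nodes/pathnodes.h" is
assumed for NodeTag, List and EquivalenceClass):

    /* Form A: one UniqueKey node carrying all classes of the unique set */
    typedef struct UniqueKeyListForm
    {
        NodeTag     type;
        List       *eq_clauses;     /* List of EquivalenceClass * */
    } UniqueKeyListForm;

    /* Form B: a List of UniqueKeys, each wrapping a single class */
    typedef struct UniqueKeySingleForm
    {
        NodeTag     type;
        EquivalenceClass *eq_clause;
    } UniqueKeySingleForm;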


Likely more questions around this coming -- should this be a separate 
thread ?


Based on this I'll start to update the v21 patch to use UniqueKey, and 
post a new version.



While updating the Loose Index Scan wiki page with links to other
products' terminology on this subject, I noticed that MySQL can
skip-scan MIN() and MAX() in the same query.  Hmm.  That seems quite
desirable.  I think it requires a new kind of skipping: I think you
have to be able to skip to the first AND last key that has each
distinct prefix, and then stick a regular agg on top to collapse them
into one row.  Such a path would not be so neatly describable by
UniqueKeys, or indeed by the amskip() interface in the current patch.
I mention all this stuff not because I want us to run before we can
walk, but because to be ready to commit the basic distinct skip scan
feature, I think we should know approximately how it'll handle the
future stuff we'll need.



Thomas, do you have any ideas for this ? I can see that MySQL did the 
functionality in two change sets (base and function support), but like 
you said we shouldn't paint ourselves into a corner.


Feedback greatly appreciated.

Best regards,
 Jesper
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index 4b9e141404..2e07db2e6e 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -459,6 +459,44 @@ print_pathkeys(const List *pathkeys, const List *rtable)
printf(")\n");
 }
 
+/*
+ * print_unique_key -
+ *   unique_key an UniqueKey
+ */
+void
+print_unique_key(const UniqueKey *unique_key, const List *rtable)
+{
+   ListCell   *l;
+
+   printf("(");
+   foreach(l, unique_key->eq_clauses)
+   {
+   EquivalenceClass *eclass = (EquivalenceClass *) lfirst(l);
+   ListCell   *k;
+   bool   first = true;
+
+   /* chase up */
+   while (eclass->ec_merged)
+   eclass = eclass->ec_merged;
+
+   printf("(");
+   foreach(k, eclass->ec_members)
+   {
+   EquivalenceMember *mem = (EquivalenceMember *) 
lfirst(k);
+
+   if (first)
+   first = false;
+   else
+   printf(", ");
+   print_expr((Node *) mem->em_expr, rtable);
+   }
+   printf(")");
+   if (lnext(l))
+   printf(", ");
+   }
+   printf(")\n");
+}
+
 /*
  * print_tl
  *   print targetlist in a more legible way.
diff --git a/src/backend/optimizer/path/Makefile 
b/src/backend/optimizer/path/Makefile
index 6864a62132..8249a6b6db 100644
--- a/src/backend/optimizer/path/Makefile
+++ b/src/backend/optimizer/path/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = allpaths.o clausesel.o costsize.o equivclass.o indxpath.o \
-   joinpath.o joinrels.o pathkeys.o tidpath.o
+   joinpath.o joinrels.o pathkeys.o tidpath.o uniquekey.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index b7723481b0..a8c8fe8a30 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3957,6 +3957,14 @@ print_path(PlannerInfo *root, Path *path, int indent)
print_pathkeys(path->pathkeys, root->parse->rtable);
}
 
+   if (path->unique_key)
+   {
+   for (i = 0; i < indent; i++)
+   printf("\t");
+   printf("  unique_key: ");
+   print_unique_key(path->unique_key, root->parse->rtable);
+   }
+
if (join)
{
JoinPath   *jp = (JoinPath *) path;
diff --git a/src/backend/optimizer/path/costsize.c 
b/src/backend/optimizer/path/costsize.c
index a2a9b1f7be..dbd0bbf3dc 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -705,6 +705,11 @@ cost_index(IndexPath *path, PlannerInfo *root, double 
loop_count,
path->path.parallel_aware = true;
}
 
+   /* Consider cost based on unique key */
+   if (path->path.unique_key)
+   {
+   }
+
/*
 * Now interpolate based on estimated index order correlation to get 
total
 * disk I/O cost for main table accesses.
diff --git a/src/backend/optimizer/path/uniquekey.c 

Re: [HACKERS] WIP: Aggregation push-down

2019-07-09 Thread Antonin Houska
Richard Guo  wrote:

> Another rebase is needed for the patches.

Done.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

>From f656bd8d46afb9cb0a331cf3d34b9eed39f5e360 Mon Sep 17 00:00:00 2001
From: Antonin Houska 
Date: Tue, 9 Jul 2019 15:30:13 +0200
Subject: [PATCH 1/3] Introduce RelInfoList structure.

---
 contrib/postgres_fdw/postgres_fdw.c|   3 +-
 src/backend/nodes/outfuncs.c   |  11 +++
 src/backend/optimizer/geqo/geqo_eval.c |  12 +--
 src/backend/optimizer/plan/planmain.c  |   3 +-
 src/backend/optimizer/util/relnode.c   | 157 -
 src/include/nodes/nodes.h  |   1 +
 src/include/nodes/pathnodes.h  |  28 --
 7 files changed, 136 insertions(+), 79 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 033aeb2556..90414f1168 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5205,7 +5205,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 	 */
 	Assert(fpinfo->relation_index == 0);	/* shouldn't be set yet */
 	fpinfo->relation_index =
-		list_length(root->parse->rtable) + list_length(root->join_rel_list);
+		list_length(root->parse->rtable) +
+		list_length(root->join_rel_list->items);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8400dd319e..4529b5c63b 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2278,6 +2278,14 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_NODE_FIELD(partitioned_child_rels);
 }
 
+static void
+_outRelInfoList(StringInfo str, const RelInfoList *node)
+{
+	WRITE_NODE_TYPE("RELOPTINFOLIST");
+
+	WRITE_NODE_FIELD(items);
+}
+
 static void
 _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
 {
@@ -4052,6 +4060,9 @@ outNode(StringInfo str, const void *obj)
 			case T_RelOptInfo:
 _outRelOptInfo(str, obj);
 break;
+			case T_RelInfoList:
+_outRelInfoList(str, obj);
+break;
 			case T_IndexOptInfo:
 _outIndexOptInfo(str, obj);
 break;
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 6c69c1c147..c69f3469ba 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -92,11 +92,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	 *
 	 * join_rel_level[] shouldn't be in use, so just Assert it isn't.
 	 */
-	savelength = list_length(root->join_rel_list);
-	savehash = root->join_rel_hash;
+	savelength = list_length(root->join_rel_list->items);
+	savehash = root->join_rel_list->hash;
 	Assert(root->join_rel_level == NULL);
 
-	root->join_rel_hash = NULL;
+	root->join_rel_list->hash = NULL;
 
 	/* construct the best path for the given combination of relations */
 	joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	 * Restore join_rel_list to its former state, and put back original
 	 * hashtable if any.
 	 */
-	root->join_rel_list = list_truncate(root->join_rel_list,
-		savelength);
-	root->join_rel_hash = savehash;
+	root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+			   savelength);
+	root->join_rel_list->hash = savehash;
 
 	/* release all the memory acquired within gimme_tree */
 	MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 2dbf1db844..0bc8a6 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,7 @@ query_planner(PlannerInfo *root,
 	 * NOTE: append_rel_list was set up by subquery_planner, so do not touch
 	 * here.
 	 */
-	root->join_rel_list = NIL;
-	root->join_rel_hash = NULL;
+	root->join_rel_list = makeNode(RelInfoList);
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 6054bd2b53..c238dd6538 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -31,11 +31,11 @@
 #include "utils/hsearch.h"
 
 
-typedef struct JoinHashEntry
+typedef struct RelInfoEntry
 {
-	Relids		join_relids;	/* hash key --- MUST BE FIRST */
-	RelOptInfo *join_rel;
-} JoinHashEntry;
+	Relids		relids;			/* hash key --- MUST BE FIRST */
+	void	   *data;
+} RelInfoEntry;
 
 static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
 RelOptInfo *input_rel);
@@ -375,11 +375,11 @@ find_base_rel(PlannerInfo *root, int relid)
 }
 
 /*
- * build_join_rel_hash
- *	  Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ *	  Construct the auxiliary hash table for relation specific data.
  */
 static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
 {
 	HTAB	   *hashtab;
 	HASHCTL		hash_ctl;
@@ -388,47 

Re: Feature: Add Greek language fulltext search

2019-07-09 Thread Panagiotis Mavrogiorgos
On Thu, Jul 4, 2019 at 1:39 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-25 12:04, Panagiotis Mavrogiorgos wrote:
> > Last November snowball added support for Greek language [1]. Following
> > the instructions [2], I wrote a patch that adds fulltext search for
> > Greek in Postgres. The patch is attached.
>
> I have committed a full sync from the upstream snowball repository,
> which pulled in the new greek stemmer.
>
> Could you please clarify where you got the stopword list from?  The
> README says those need to be downloaded separately, but I wasn't able to
> find the download location.  It would be good to document this, for
> example in the commit message.  I haven't committed the stopword list yet.
>

Thank you Peter,

Here is the repo with the stop-words:
https://github.com/pmav99/greek_stopwords
The list is based on an earlier publication with modification by me. All
the relevant info is on github.

Disclaimer 1: The list has not been validated by an expert.

Disclaimer 2: There are more stop-words lists on the internet, but they are
less complete and they also use ancient greek words. Furthermore, my
testing showed that snowball needs to handle accents (tonous) and ς (teliko
sigma) in a special way if you want the stemmer to work with capitalized
words too.

https://github.com/Xangis/extra-stopwords/blob/master/greek
https://github.com/stopwords-iso/stopwords-el/tree/master/raw

all the best,
Panagiotis


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Tue, Jul  9, 2019 at 08:01:35AM -0400, Joe Conway wrote:
> On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> > On 2019-07-08 18:09, Joe Conway wrote:
> >> In my mind, and in practice to a
> >> large extent, a postgres tablespace == a unique mount point.
> > 
> > But a critical difference is that in file systems, a separate mount
> > point has its own journal.
> 
> While it would be ideal to have separate WAL, and even separate shared
> buffer pools, per tablespace, I think that is too much complexity for
> the first implementation and we could have a single separate key for all
> WAL for now. 

Agreed.  I have thought about this some more.  There is certainly value
in layered security, so if something gets violated, it doesn't open the
whole system.  However, I think the layering has to be done at the right
levels, and I think you want levels that have clear boundaries, like IP
filtering or monitoring.  Placing a boundary inside the database seems
much too complex a level to be effective.  Using separate encrypted and
unencrypted clusters and allowing the encrypted cluster to query the
unencrypted clusters using FDWs does seem like good layering, though the
FDW queries might leak information.

> The main thing I don't think we want is e.g. a 50TB
> database with everything encrypted with a single key -- for the reasons
> previously stated.

Yes, I think we need to research in which cases the nonce must be
random, and how much key space the secret+nonce gives us.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 06:45:50PM -0400, Bruce Momjian wrote:
> On Mon, Jul  8, 2019 at 06:23:13PM -0400, Bruce Momjian wrote:
> > Yes, 'postgres' can be used to create a nice md5 rainbow table that
> > works on many servers --- good point.  Are rainbow tables possible with
> > something like AES?
> > 
> > > I appreciate that *some* of this might not be completely relevant for
> > > the way a nonce is used in cryptography, but I'd be very surprised to
> > > have a cryptographer tell me that a deterministic nonce didn't have
> > > similar issues or didn't reduce the value of the nonce significantly.
> > 
> > This post:
> > 
> > 
> > https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> > 
> > says:
> > 
> > GCM is a variation on Counter Mode (CTR).  As you say, with any variant
> > of Counter Mode, it is essential  that the Nonce is not repeated with
> > the same key.  Hence CTR mode  Nonces often include either a counter or
> > a timer element: something that  is guaranteed not to repeat over the
> > lifetime of the key.
> > 
> > CTR is what we use for WAL.  8k pages, we would use CBC, which says we
> > need a random nonce.  I need to dig deeper into ECB mode attack.
> 
> Looking here:
> 
>   
> https://stackoverflow.com/questions/36760973/why-is-random-iv-fine-for-aes-cbc-but-not-for-aes-gcm
> 
> I think the issue is that we can't use a _counter_ for the nonce since
> each page-0 of each table would use the same nonce, and each page-1,
> etc.  I assume we would use the table oid and page number as the nonce. 
> We can't use the database oid since we copy the files from one database
> to another via file system copy and not through the shared buffer cache
> where they would be re encrypted.  Using relfilenode seems dangerous. 

FYI, pg_upgrade already preserves the pg_class.oid, which is why I
recommended it over pg_class.relfilenode:


https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_upgrade/pg_upgrade.c;h=ff78929707ef12699a7579274693f6020c54c755;hb=HEAD#l14

We control all assignments of pg_class.oid (and relfilenode) so toast
oids are the same between old and new clusters.  This is important
because toast oids are stored as toast pointers in user tables.

While pg_class.oid and pg_class.relfilenode are initially the same
in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
FULL.  In the new cluster, pg_class.oid and pg_class.relfilenode will
be the same and will match the old pg_class.oid value.  Because of
this, old/new pg_class.relfilenode values will not match if CLUSTER,
REINDEX, or VACUUM FULL have been performed in the old cluster.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

2019-07-09 Thread Paul Guo
Yes, the patches changed the Makefile so that pg_rewind and pg_basebackup could
use some common code, but for the Windows build I'm not sure where those
Windows build files are. Does anyone know about that? Thanks.

On Tue, Jul 9, 2019 at 6:55 AM Thomas Munro  wrote:

> On Tue, Jul 2, 2019 at 5:46 PM Paul Guo  wrote:
> > Rebased, aligned with recent changes in pg_rewind/pg_basebackup and then
> retested. Thanks.
>
> Hi Paul,
>
> A minor build problem on Windows:
>
> src/bin/pg_rewind/pg_rewind.c(32): fatal error C1083: Cannot open
> include file: 'backup_common.h': No such file or directory
> [C:\projects\postgresql\pg_rewind.vcxproj]
>
>
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46422
>
> http://cfbot.cputube.org/paul-guo.html
>
> --
> Thomas Munro
>
> https://enterprisedb.com
>


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Mon, Jul  8, 2019 at 09:57:57PM -0600, Ryan Lambert wrote:
> Hey everyone,
> 
> Here is my input regarding nonces and randomness.
> 
> > As I understand it, the NIST recommendation is a 96-bit *random* nonce,
> 
> I could not find that exact requirement in the NIST documents, though given 
> the
> volume of that library it would be possible to miss.  The recommendation I
> repeatedly saw for the nonce was unique.  There is also an important
> distinction that the nonce is not the Initialization Vector (IV), it can be
> used as part of the IV, more on that later.
> 
> The most succinct definition for nonce I found was in SP-800-38A [1] page 4:
>  "A value that is only used once."
> SP-800-90A [2] (page 6) expands on the definition: "A time-varying value that
> has at most a negligible chance of repeating, e.g., a random value that is
> generated anew for each use, a timestamp, a sequence number, or some
> combination of these."
> 
> The second definition references randomness but does not require it.  [1] (pg
> 19) reinforces the importance of uniqueness:  "A procedure should be
> established to ensure the uniqueness of the message nonces"

Yes, that is what I remembered but the URL I referenced stated
randomness is preferred.  I was hopeful that whatever was preferring
randomness was trying to avoid a problem we didn't have.

> Further, knowing the nonce gets you nowhere; it isn't the salt until it is run
> through the forward cipher with the encryption key. Even with the nonce, the
> attacker doesn't know the salt unless they steal the key, and that's a
> bigger problem.

Yes, I had forgotten about that step --- good point, meaning that the
nonce for block zero is different for every encryption key.

> The strictest definition of nonce I found was in [2] (pg 19) defining nonces 
> to
> use in the process of random generation:
> 
> "The nonce shall be either:
> a. A value with at least (security_strength/2) bits of entropy, or
> b. A value that is expected to repeat no more often than a (security_strength/
> 2)-bit random
> string would be expected to repeat."
> 
> Even there it is randomness (a) or uniqueness (b).

Thanks, this was very helpful.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Tue, Jul  9, 2019 at 10:34:06AM +0200, Tomas Vondra wrote:
> > I think the issue is that we can't use a _counter_ for the nonce since
> > each page-0 of each table would use the same nonce, and each page-1,
> > etc.  I assume we would use the table oid and page number as the nonce.
> > We can't use the database oid since we copy the files from one database
> > to another via file system copy and not through the shared buffer cache
> > where they would be re encrypted.  Using relfilenode seems dangerous.
> > For WAL I think it would be the WAL segment number.  It would be nice
> > to mix that with the "Database system identifier:", but are these the
> > same on primary and replicas?
> > 
> 
> Can't we just store the nonce somewhere? What if for encrypted pages we
> only use/encrypt (8kB - X bytes), where X bytes is just enough to store
> the nonce and maybe some other encryption metadata (key ID?).

Storing the nonce on each 8k page is going to add complexity, so I am
trying to figure out if it is a security requirement.
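
For reference, the deterministic alternative discussed above (table oid plus
block number, later run through the forward cipher under the key to obtain the
IV) would be laid out something like the following sketch. This is illustrative
only, with placeholder names, and is not a design proposal:

    #include <stdint.h>
    #include <string.h>

    #define NONCE_LEN 16    /* one AES block */

    /*
     * A nonce that is unique per (table, block) within a cluster.  The caller
     * would still run this block through the forward cipher under the key to
     * obtain the actual IV.
     */
    static void
    build_page_nonce(uint8_t nonce[NONCE_LEN], uint32_t table_oid,
                     uint32_t block_number)
    {
        memset(nonce, 0, NONCE_LEN);
        memcpy(nonce, &table_oid, sizeof(table_oid));
        memcpy(nonce + sizeof(table_oid), &block_number, sizeof(block_number));
    }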

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 09:28:42AM -0400, James Coleman wrote:

On Mon, Jul 8, 2019 at 9:37 PM Tomas Vondra
 wrote:


On Mon, Jul 08, 2019 at 12:07:06PM -0400, James Coleman wrote:
> ...
>
>I guess I'm still not following. If (2) is responsible (currently) for
>adding an explicit sort, why wouldn't adding an incremental sort be an
>example of that responsibility? The subpath that either a Sort or
>IncrementalSort is being added on top of (to then feed into the
>GatherMerge) is the same in both cases right?
>
>Unless you're saying that the difference is that since we have a
>partial ordering already for incremental sort then incremental sort
>falls into the category of "maintaining" existing ordering of the
>subpath?
>

Oh, I think I understand what you're saying. Essentially, we should not
be making generate_gather_paths responsible for adding the incremental
sort. Instead, we should be looking at places than are adding explicit
sort (using create_sort_path) and also consider adding incremental sort.

I definitely agree with the second half - we should look at all places
that create explicit sorts and make them also consider incremental
sorts. That makes sense.


Yep, exactly.



If I remember correctly, one of the previous patch versions (in the early
2018 commitfests) actually modified many of those places, but it did that
in a somewhat "naive" way. It simply used incremental sort whenever the
path was partially sorted, or something like that. So it did not use
costing properly. There was an attempt to fix that in the last commitfest
but the costing model was deemed to be a bit too rough and unreliable
(especially the ndistinct estimates can be quite weak), so the agreement 
was to try salvaging the patch for PG11 by only considering incremental

sort in "safest" places with greatest gains.

We've significantly improved the costing model since then, and the
implementation likely handles the corner cases much better. But that does
not mean we have to introduce the incremental sort to all those places at
once - it might be wise to split that into separate patches.

It's not just about picking the right plan - we haven't really looked at what
impact these extra paths might have on planner performance, so maybe we should
look at that too. And the impact might be different for each of those places.

I'll leave that up to you, but I certainly won't insist on doing it all in
one huge patch.


But I'm not sure it'll address all cases - the problem is that those
places add the explicit sort because they need sorted input. Gather
Merge does not do that, it only preserves existing ordering of paths.

So it's possible the path does not have an explicit sort on top, and
gather merge will not know to add it. And once we have the gather merge
in place, we can't push a sort "under" it.


That's the explanation I was missing; and it makes sense (to restate:
sometimes sorting is useful even when not required for correctness of
the user returned data).



Yes, although even when the sorting is required for correctness (because
the user specified ORDER BY) you can do it at different points in the
plan. We'd still produce correct results, but the sort might be done at
the very end without these changes.

For example we might end up with plans

  Incremental Sort (presorted: a, path keys: a,b)
   -> Gather Merge (path keys: a)
   -> Index Scan (path keys: a)

but with those changes we might push the incremental sort down into the
parallel part:

  Gather Merge (path keys: a,b)
   -> Incremental Sort (presorted: a, path keys: a,b)
   -> Index Scan (path keys: a)

which is likely better. Perhaps that's what you meant, though.



In fact, we already have code dealing with this "issue" for a special
case - see gather_grouping_paths(). It generates plain gather merge
paths, but then also considers building one with explicit sort. But it
only does that for grouping paths (when it's clear we need to be looking
at grouping_pathkeys), and there are other generate_gather_paths() callers
that don't have similar treatment.


I just find it humorous both of us were writing separate emails
mentioning that function at the same time.



;-)


...

And I think I know why that is - while gather_grouping_paths() tries to
add explicit sort below the gather merge, there are other places that
call generate_gather_paths() that don't do that. In this case it's
probably apply_scanjoin_target_to_paths() which simply builds

   parallel (seq|index) scan + gather merge

and that's it. The problem is likely the same - the code does not know
which pathkeys are "interesting" at that point. We probably need to
teach planner to do this.


I had also noticed that that was an obvious place where
generate_gather_paths() was used but an explicit sort wasn't also
added separately, which makes me think the division of labor is
probably currently wrong regardless of the incremental sort patch.

Do you agree? Should we try to fix that (likely with your new
"interesting paths" version of g

Re: fix for BUG #3720: wrong results at using ltree

2019-07-09 Thread Oleg Bartunov
On Mon, Jul 8, 2019 at 7:22 AM Thomas Munro  wrote:
>
> On Sun, Apr 7, 2019 at 3:46 AM Tom Lane  wrote:
> > =?UTF-8?Q?Filip_Rembia=C5=82kowski?=  writes:
> > > Here is my attempt to fix a 12-years old ltree bug (which is a todo item).
> > > I see it's not backward-compatible, but in my understanding that's
> > > what is documented. Previous behavior was inconsistent with
> > > documentation (where single asterisk should match zero or more
> > > labels).
> > > http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php
>
> [...]
>
> > In short, I'm wondering if we should treat this as a documentation
> > bug not a code bug.  But to do that, we'd need a more accurate
> > description of what the code is supposed to do, because the statement
> > quoted above is certainly not a match to the actual behavior.
>
> This patch doesn't apply.  More importantly, it seems like we don't
> have a consensus on whether we want it.
>
> Teodor, Oleg, would you like to offer an opinion here?  If I
> understand correctly, the choices are doc change, code/comment change
> or WONT_FIX.  This seems to be an entry that we can bring to a
> conclusion in this CF with some input from the ltree experts.

We are currently very busy and will look at the problem (and dig into
our memory)
later.  There is also another ltree patch
(https://commitfest.postgresql.org/23/1977/); it would be
nice if Filip could try it.

>
> --
> Thomas Munro
> https://enterprisedb.com



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Jul  9, 2019 at 08:01:35AM -0400, Joe Conway wrote:
> > On 7/9/19 6:07 AM, Peter Eisentraut wrote:
> > > On 2019-07-08 18:09, Joe Conway wrote:
> > >> In my mind, and in practice to a
> > >> large extent, a postgres tablespace == a unique mount point.
> > > 
> > > But a critical difference is that in file systems, a separate mount
> > > point has its own journal.
> > 
> > While it would be ideal to have separate WAL, and even separate shared
> > buffer pools, per tablespace, I think that is too much complexity for
> > the first implementation and we could have a single separate key for all
> > WAL for now. 

I agree that all of that isn't necessary for an initial implementation,
I was rather trying to lay out how we could improve on this in the
future and why having the keying done at a tablespace level makes sense
initially because we can then potentially move forward with further
segregation to improve the situation.  I do believe it's also useful in
its own right, to be clear, just not as nice since a compromised backend
could still get access to data in shared buffers that it really
shouldn't be able to, even broadly, see.

> Agreed.  I have thought about this some more.  There is certainly value
> in layered security, so if something gets violated, it doesn't open the
> whole system.  However, I think the layering has to be done at the right
> levels, and I think you want levels that have clear boundaries, like IP
> filtering or monitoring.  Placing a boundary inside the database seems
> much too complex a level to be effective.  Using separate encrypted and
> unencrypted clusters and allowing the encrypted cluster to query the
> unencrypted clusters using FDWs does seem like good layering, though the
> FDW queries might leak information.

Using FDWs simply isn't a solution to this, for a few different reasons-
the first is that our solution to authentication for FDWs is to store
passwords in our catalog tables, but an FDW table also doesn't behave
like a regular table in many important cases.

> > The main thing I don't think we want is e.g. a 50TB
> > database with everything encrypted with a single key -- for the reasons
> > previously stated.
> 
> Yes, I think we need to research in which cases the nonce must be
> random, and how much key space the secret+nonce gives us.

Agreed on both.

Thanks,

Stephen




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> I agree that all of that isn't necessary for an initial implementation,
> I was rather trying to lay out how we could improve on this in the
> future and why having the keying done at a tablespace level makes sense
> initially because we can then potentially move forward with further
> segregation to improve the situation.  I do believe it's also useful in
> its own right, to be clear, just not as nice since a compromised backend
> could still get access to data in shared buffers that it really
> shouldn't be able to, even broadly, see.

I think TDE is a feature of questionable value at best, and the idea that
we would fundamentally change the internals of Postgres to add more
features to it seems very unlikely.  I realize we have to discuss it so
we don't block reasonable future feature development.

> > Agreed.  I have thought about this some more.  There is certainly value
> > in layered security, so if something gets violated, it doesn't open the
> > whole system.  However, I think the layering has to be done at the right
> > levels, and I think you want levels that have clear boundaries, like IP
> > filtering or monitoring.  Placing a boundary inside the database seems
> > much too complex a level to be effective.  Using separate encrypted and
> > unencrypted clusters and allowing the encrypted cluster to query the
> > unencrypted clusters using FDWs does seem like good layering, though the
> > FDW queries might leak information.
> 
> Using FDWs simply isn't a solution to this, for a few different reasons-
> the first is that our solution to authentication for FDWs is to store
> passwords in our catalog tables, but an FDW table also doesn't behave
> like a regular table in many important cases.

The FDW authentication problem is something I think we need to improve
no matter what.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Tue, Jul  9, 2019 at 09:16:17AM -0400, Joe Conway wrote:
> On 7/9/19 8:39 AM, Ryan Lambert wrote:
> > Hi Thomas,
> > 
> >> CBC mode does require
> >> random nonces, other modes may be fine with even sequences as long as
> >> the values are not reused.   
> > 
> > I disagree that CBC mode requires random nonces, at least based on what
> > NIST has published.  They only require that the IV (not the nonce) must
> > be unpredictable per [1]:
> > 
> > " For the CBC and CFB modes, the IVs must be unpredictable."
> > 
> > The unpredictable IV can be generated from a non-random nonce including
> > a counter:
> > 
> > "There are two recommended methods for generating unpredictable IVs. The
> > first method is to apply the forward cipher function, under the same key
> > that is used for the encryption of the plaintext, to a nonce. The nonce
> > must be a data block that is unique to each execution of the encryption
> > operation. For example, the nonce may be a counter, as described in
> > Appendix B, or a message number."
> > 
> > [1] 
> > https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
> 
> 
> The terms nonce and IV are often used more-or-less interchangeably, and
> it is important to be clear when we are talking about an IV specifically
> - an IV is a specific type of nonce. Nonce means "number used once".
> i.e. unique, whereas an IV (for CBC use anyway) should be unique and
> random but not necessarily kept secret. The NIST requirements that
> Stephen referenced elsewhere on this thread are as I understand it
> intended to ensure the random but unique property with high probability.

Good point about nonce and IV.  I wonder if running the nonce through
the cipher with the key makes it random enough to use as an IV.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Jul  9, 2019 at 10:59:12AM -0400, Stephen Frost wrote:
> > * Bruce Momjian (br...@momjian.us) wrote:
> > I agree that all of that isn't necessary for an initial implementation,
> > I was rather trying to lay out how we could improve on this in the
> > future and why having the keying done at a tablespace level makes sense
> > initially because we can then potentially move forward with further
> > segregation to improve the situation.  I do believe it's also useful in
> > its own right, to be clear, just not as nice since a compromised backend
> > could still get access to data in shared buffers that it really
> > shouldn't be able to, even broadly, see.
> 
> I think TDE is a feature of questionable value at best, and the idea that
> we would fundamentally change the internals of Postgres to add more
> features to it seems very unlikely.  I realize we have to discuss it so
> we don't block reasonable future feature development.

We'd be getting to something much better than just TDE by going down
that road- we'd be able to properly leverage the kernel to enforce real
MAC.  I get that this would be a change but I'm not entirely convinced
that it'd be as much of a fundamental change as implied here.  I expect
that we're going to get to a point where we want to have multiple shared
buffer segments for other reasons anyway.

> > > Agreed.  I have thought about this some more.  There is certainly value
> > > in layered security, so if something gets violated, it doesn't open the
> > > whole system.  However, I think the layering has to be done at the right
> > > levels, and I think you want levels that have clear boundaries, like IP
> > > filtering or monitoring.  Placing a boundary inside the database seems
> > > much too complex a level to be effective.  Using separate encrypted and
> > > unencrypted clusters and allowing the encrypted cluster to query the
> > > unencrypted clusters using FDWs does seem like good layering, though the
> > > FDW queries might leak information.
> > 
> > Using FDWs simply isn't a solution to this, for a few different reasons-
> > the first is that our solution to authentication for FDWs is to store
> > passwords in our catalog tables, but an FDW table also doesn't behave
> > like a regular table in many important cases.
> 
> The FDW authentication problem is something I think we need to improve
> no matter what.

Yes, constrained delegation with Kerberos would certainly be an
improvement, and having a way to do something like peer auth when local,
and maybe even a server-to-server trust based on certificates or similar
might be an option.

Thanks,

Stephen




Re: range_agg

2019-07-09 Thread David Fetter
On Mon, Jul 08, 2019 at 09:46:44AM -0700, Paul A Jungwirth wrote:
> On Sat, Jul 6, 2019 at 12:13 PM Jeff Davis  wrote:
> >
> > On Fri, 2019-07-05 at 09:58 -0700, Paul A Jungwirth wrote:
> > > user-defined range types. So how about I start on it and see how it
> > > goes? I expect I can follow the existing code for range types pretty
> > > closely, so maybe it won't be too hard.
> >
> > That would be great to at least take a look. If it starts to look like
> > a bad idea, then we can re-evaluate and see if it's better to just
> > return arrays.
> 
> I made some progress over the weekend. I don't have a patch yet but I
> thought I'd ask for opinions on the approach I'm taking:
> 
> - A multirange type is an extra thing you get when you define a range
> (just like how you get a tstzrange[]). Therefore
> - I don't need separate commands to add/drop multirange types. You get
> one when you define a range type, and if you drop a range type it gets
> dropped automatically.

Yay for fewer manual steps!

> - I'm adding a new typtype for multiranges. ('m' in pg_type).
> - I'm just adding a mltrngtypeid column to pg_range. I don't think I
> need a new pg_multirange table.

Makes sense, as they'd no longer be separate concepts.

> - You can have a multirange[].

I can see how that would fall out of this, but I'm a little puzzled as
to what people might use it for. Aggregates, maybe?

> - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'
> - I'll add an anymultirange pseudotype. When it's the return type of a
> function that has an "anyrange" parameter, it will use the same base
> element type. (So basically anymultirange : anyrange :: anyarray ::
> anyelement.)

Neat!

> - You can cast from a multirange to an array. The individual ranges
> are always sorted in the result array.

Is this so people can pick individual ranges out of the multirange,
or...? Speaking of casts, it's possible that a multirange is also a
range. Would it make sense to have a cast from multirange to range?

> - You can cast from an array to a multirange but it will error if
> there are overlaps (or not?).

An alternative would be to canonicalize into non-overlapping ranges.
There's some precedent for this in casts to JSONB. Maybe a function
that isn't a cast should handle such things.

> The array's ranges don't have to be sorted but they will be after a
> "round trip".

Makes sense.

> - Interesting functions:
>   - multirange_length

Is that the sum of the lengths of the ranges?  Are we guaranteeing a
measure in addition to ordering on ranges now?

>   - range_agg (range_union_agg if you like)
>   - range_intersection_agg
> - You can subscript a multirange like you do an array (? This could be
> a function instead.)

How would this play with the generic subscripting patch in flight?

> - operators:
>   - union (++) and intersection (*):
> - We already have + for range union but it raises an error if
> there is a gap, so ++ is the same but with no errors.
> - r ++ r = mr (commutative, associative)
> - mr ++ r = mr
> - r ++ mr = mr
> - r * r = r (unchanged)
> - mr * r = r
> - r * mr = r

Shouldn't the two above both yield multirange ? For example, if I
understand correctly,

{"[1,3)","[5,7)"} * [2,6) should be {"[2,3)","[5,6)"}

> - mr - r = mr
> - r - mr = mr
> - mr - mr = mr
>   - comparison
> - mr = mr
> - mr @> x

x is in the domain of the (multi)range?

> - mr @> r
> - mr @> mr
> - x <@ mr
> - r <@ mr
> - mr <@ mr
> - mr << mr (strictly left of)
> - mr >> mr (strictly right of)
> - mr &< mr (does not extend to the right of)
> - mr &> mr (does not extend to the left of)
>   - inverse operator?:
> - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.

Is that the same as ["(∞, ∞)"] - {"[1,2)"}? I seem to recall that the
usual convention (at least in math) is to use intervals that are
generally represented as open on the infinity side, but that might not
fit how we do things.

> - not sure we want this or what the symbol should be. I don't like
> -mr as an inverse because then mr - mr != mr ++ -mr.

!mr , perhaps?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: clean up docs for v12

2019-07-09 Thread Justin Pryzby
I made a bunch of changes based on Andres' review and split some more
indisputable one-line changes out of the large commit, hoping it will be easier to
review both.  Several bits and pieces of the patch have been applied piecemeal,
but I was hoping to avoid continuing to do that.

I think at least these are also necessary.
v5-0002-Say-it-more-naturally.patch
v5-0010-spelling-and-typos.patch

I suggest that anyone reading this look at the large patch last, since its changes
are longer and less easy to read.  Many of the changes are intended to improve
the text rather than to fix a definite error.

Justin
>From 292f9854ac0f847a1783a7d276e6009b3bdd3e91 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Thu, 28 Mar 2019 18:50:03 -0500
Subject: [PATCH v5 01/12] review docs for pg12dev

---
 doc/src/sgml/catalogs.sgml |  4 ++--
 doc/src/sgml/config.sgml   | 14 +++--
 doc/src/sgml/ddl.sgml  | 16 +++---
 doc/src/sgml/ecpg.sgml | 16 +++---
 doc/src/sgml/monitoring.sgml   | 43 +++---
 doc/src/sgml/perform.sgml  |  8 +++
 doc/src/sgml/protocol.sgml |  6 +++---
 doc/src/sgml/ref/create_index.sgml |  4 ++--
 doc/src/sgml/ref/pg_rewind.sgml| 13 ++--
 doc/src/sgml/ref/pgbench.sgml  |  7 +++
 doc/src/sgml/ref/vacuum.sgml   |  6 +++---
 doc/src/sgml/runtime.sgml  |  4 ++--
 doc/src/sgml/sources.sgml  | 18 
 doc/src/sgml/xfunc.sgml| 12 +--
 src/bin/pg_upgrade/check.c |  6 +++---
 15 files changed, 87 insertions(+), 90 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3428a7c..47482f2 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3060,7 +3060,7 @@ SCRAM-SHA-256$:&l
simplifies ATTACH/DETACH PARTITION operations:
the partition dependencies need only be added or removed.
Example: a child partitioned index is made partition-dependent
-   on both the partition table it is on and the parent partitioned
+   on both the table partition and the parent partitioned
index, so that it goes away if either of those is dropped, but
not otherwise.  The dependency on the parent index is primary,
so that if the user tries to drop the child partitioned index,
@@ -3123,7 +3123,7 @@ SCRAM-SHA-256$:&l
Note that it's quite possible for two objects to be linked by more than
one pg_depend entry.  For example, a child
partitioned index would have both a partition-type dependency on its
-   associated partition table, and an auto dependency on each column of
+   associated table partition, and an auto dependency on each column of
that table that it indexes.  This sort of situation expresses the union
of multiple dependency semantics.  A dependent object can be dropped
without CASCADE if any of its dependencies satisfies
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index aaab1c5..fa8b40a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3433,7 +3433,7 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
 current when the base backup was taken.  The
 value latest recovers
 to the latest timeline found in the archive, which is useful in
-a standby server.  latest is the default.
+a standby server.  The default is latest.

 

@@ -4195,8 +4195,9 @@ ANY num_sync ( bytea
 
  
-  The handling of the bytea type is also similar to
-  the VARCHAR. The definition on an array of type
+  The handling of the bytea type is similar to
+  the VARCHAR. The definition of an array of type
   bytea is converted into a named struct for every
   variable. A declaration like:
 
@@ -1220,9 +1220,8 @@ bytea var[180];
 
 struct bytea_var { int len; char arr[180]; } var;
 
-  The member arr hosts binary format
-  data. It also can handle even '\0' as part of
-  data unlike VARCHAR.
+  The member arr stores binary format
+  data, which can include zero bytes, unlike VARCHAR.
   The data is converted from/to hex format and sent/received by
   ecpglib.
  
@@ -7571,8 +7570,7 @@ PREPARE name FROM 

 A literal C string or a host variable containing a preparable
-statement, one of the SELECT, INSERT, UPDATE, or
-DELETE.
+statement (SELECT, INSERT, UPDATE, or DELETE).

   
  
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..7cec551 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/do

Re: range_agg

2019-07-09 Thread Paul A Jungwirth
On Mon, Jul 8, 2019 at 10:09 PM Pavel Stehule  wrote:
> po 8. 7. 2019 v 18:47 odesílatel Paul A Jungwirth 
>  napsal:
> I am not against a multirange type, but I miss an explanation of why you
> introduce a new kind of type and don't just use an array of ranges.

Hi Pavel, I'm sorry, and thanks for your feedback! I had a response to
your earlier email but it was stuck in my Drafts folder.

I do think a multirange would have enough new functionality to be
worth doing. I was pretty reluctant to take it on at first but the
idea is growing on me, and it does seem to offer a more sensible
interface. A lot of the value would come from range and multirange
operators, which we can't do with just arrays (I think). Having a
range get merged correctly when you add it would be very helpful. Also
it would be nice to have a data type that enforces a valid structure,
since not all range arrays are valid multiranges. My reply yesterday
to Jeff expands on this with all the functions/operators/etc we could
offer.

Your other email also asked:
> I don't think so - working with large arrays is slow, due to frequent cache misses.
> I understand that it depends on the data, but in this area I can imagine a
> significant memory reduction from incremental processing.
>
> The array builder doesn't respect work_mem, and I think any different approach
> is safer.

I'm still thinking about this one. I tried to work out how I'd
implement a tree-based sorted list of ranges so that I can quickly
insert/remove ranges. It is very complicated, and I started to feel
like I was just re-implementing GiST but in memory. I did find the
interval_set class from Boost's boost_icl library which could offer
some guidance. But for now I want to press forward with a
sort-then-iterate implementation and consider a different
implementation later. If you have any guidance I would appreciate it!
I especially want something that is O(n log n) to insert n ranges.
Other suggestions here are very welcome! :-)
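
As an illustration only, here is a rough sketch of the sort-then-iterate
merge in plain SQL over today's ranges (made-up sample data; finite,
non-empty ranges assumed; a real implementation would of course be in C):

WITH input(r) AS (
    VALUES (int4range(1,3)), (int4range(2,5)), (int4range(8,9))
), marked AS (
    SELECT r,
           CASE WHEN lower(r) <= max(upper(r)) OVER (ORDER BY r
                     ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS new_grp   -- 1 = gap before this range
    FROM input
), grouped AS (
    SELECT r, sum(new_grp) OVER (ORDER BY r) AS grp
    FROM marked
)
SELECT int4range(min(lower(r)), max(upper(r))) AS merged   -- [1,5) and [8,9)
FROM grouped
GROUP BY grp
ORDER BY grp;

The sort dominates, so it is O(n log n) overall, which matches the goal above.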

Regards,
Paul




Development Environment

2019-07-09 Thread Igal @ Lucee.org
I have been wanting to contribute to the Postgres project for a while 
now, and I wanted to get some suggestions about the IDE and other tools 
that others are using (preferably, somewhat modern tools).


Can anyone share what IDE they are using and if they have any other tips 
on setting up a development environment etc.?


I use both Windows and Linux so either OS is fine.

Thank you,

Igal Sapir
Lucee Core Developer
Lucee.org 



Re: range_agg

2019-07-09 Thread Paul A Jungwirth
On Tue, Jul 9, 2019 at 8:51 AM David Fetter  wrote:
> > - A multirange type is an extra thing you get when you define a range
> > (just like how you get a tstzrange[]). Therefore
> > - I don't need separate commands to add/drop multirange types. You get
> > one when you define a range type, and if you drop a range type it gets
> > dropped automatically.
>
> Yay for fewer manual steps!

Thanks for taking a look and sharing your thoughts!

> > - You can have a multirange[].
>
> I can see how that would fall out of this, but I'm a little puzzled as
> to what people might use it for. Aggregates, maybe?

I don't know either, but I thought it was standard to define a T[] for
every T. Anyway it doesn't seem difficult.

> > - You can cast from a multirange to an array. The individual ranges
> > are always sorted in the result array.
>
> Is this so people can pick individual ranges out of the multirange,
> or...?

Yes. I want this for foreign keys actually, where I construct a
multirange and ask for just its first range.

> Speaking of casts, it's possible that a multirange is also a
> range. Would it make sense to have a cast from multirange to range?

Hmm, that seems strange to me. You don't cast from an array to one of
its elements. If we have subscripting, why use casting to get the
first element?

> > - You can cast from an array to a multirange but it will error if
> > there are overlaps (or not?).
>
> An alternative would be to canonicalize into non-overlapping ranges.
> There's some precedent for this in casts to JSONB. Maybe a function
> that isn't a cast should handle such things.

I agree it'd be nice to have both.

> > - Interesting functions:
> >   - multirange_length
>
> Is that the sum of the lengths of the ranges?  Are we guaranteeing a
> measure in addition to ordering on ranges now?

Just the number of disjoint ranges in the multirange.

> > - You can subscript a multirange like you do an array (? This could be
> > a function instead.)
>
> How would this play with the generic subscripting patch in flight?

I'm not aware of that patch but I guess I better check it out. :-)

> > - operators:
> > - mr * r = r
> > - r * mr = r
>
> Shouldn't the two above both yield multirange ? For example, if I
> understand correctly,

You're right! Thanks for the correction.

> >   - comparison
> > - mr = mr
> > - mr @> x
>
> x is in the domain of the (multi)range?

Yes. It's the scalar base type the range type is based on. I had in
mind the math/ML convention of `x` for scalar and `X` for
vector/matrix.

> >   - inverse operator?:
> > - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
>
> Is that the same as ["(∞, ∞)"] - {"[1,2)"}?

Yes.

> I seem to recall that the
> usual convention (at least in math) is to use intervals that are
> generally represented as open on the infinity side, but that might not
> fit how we do things.

I think it does, unless I'm misunderstanding?

> > - not sure we want this or what the symbol should be. I don't like
> > -mr as an inverse because then mr - mr != mr ++ -mr.
>
> !mr , perhaps?

I like that suggestion. Honestly I'm not sure we even want an inverse,
but it's so important theoretically we should at least consider
whether it is appropriate here. Or maybe "inverse" is the wrong word
for this, or there is a different meaning it should have.

Thanks,
Paul




Re: [PATCH] Fix trigger argument propagation to child partitions

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 03:00:27PM +0200, Patrick McHardy wrote:

The following patch fixes propagation of arguments to the trigger
function to child partitions both when initially creating the trigger
and when adding new partitions to a partitioned table.



Thanks for the report and bugfix. It seems the parameters in row triggers
on partitioned tables never worked :-( For a moment I was wondering why it
shows on 11 and not 10 (based on the assumption you'd send a patch against
10 if it was affected), but 10 actually did not support row triggers on
partitioned tables.

The fix seems OK to me, although I see we're parsing tgargs in ruleutils.c
and that version (around line ~1050) uses fastgetattr instead of
heap_getattr, and checks the isnull parameter after the call. I guess we
should do the same thing here.
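
A minimal reproduction of the problem being fixed might look like this
(table name, trigger name and argument values are made up for
illustration; before the patch the arguments were not propagated to the
trigger clones created on the partitions):

CREATE TABLE measurements (ts timestamptz, val int) PARTITION BY RANGE (ts);
CREATE TABLE measurements_2019 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2020-01-01');

CREATE FUNCTION log_change() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'trigger args: %', TG_ARGV;  -- expected: {audit,42} on every partition
    RETURN NEW;
END $$;

CREATE TRIGGER t_log AFTER INSERT ON measurements
    FOR EACH ROW EXECUTE FUNCTION log_change('audit', '42');

INSERT INTO measurements VALUES ('2019-07-09', 1);  -- fires the clone on measurements_2019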


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: pg_receivewal documentation

2019-07-09 Thread Jesper Pedersen

Hi Laurenz,

On 7/9/19 5:16 AM, Laurenz Albe wrote:

On Thu, 2019-06-27 at 10:06 -0400, Jesper Pedersen wrote:

Here is a patch for the pg_receivewal documentation to highlight that
WAL isn't acknowledged to be applied.


I think it is a good idea to document this, but I have a few quibbles
with the patch as it is:

- I think there shouldn't be commas after the "note" and before the "if".
   Disclaimer: I am not a native speaker, so I am lacking authority.

- The assertion is wrong.  "on" (remote flush) is perfectly fine
   for synchronous_commit, only "remote_apply" is a problem.

- There is already something about "--synchronous" in the "Description"
   section.  It might make sense to add the additional information there.

How about the attached patch?



Thanks for the review, and the changes.

However, I think it belongs in the --synchronous section, so what about 
moving it there as attached?
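
As a concrete illustration of the point (the standby name below is an
assumption -- it depends on the application_name pg_receivewal connects
with), the safe combination would be something like:

ALTER SYSTEM SET synchronous_standby_names = 'pg_receivewal';
ALTER SYSTEM SET synchronous_commit = 'remote_write';  -- anything but remote_apply
SELECT pg_reload_conf();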


Best regards,
 Jesper
>From 6cd525b365f3afcdb63f478c4410d6e20ca2f6e0 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Tue, 9 Jul 2019 13:14:25 -0400
Subject: [PATCH] Highlight that pg_receivewal doesn't acknowledge that WAL has
 been applied, and as such synchronous-commit needs to be remote_write or
 lower.

Authors: Laurenz Albe and Jesper Pedersen
Review-by: Laurenz Albe
---
 doc/src/sgml/ref/pg_receivewal.sgml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/src/sgml/ref/pg_receivewal.sgml b/doc/src/sgml/ref/pg_receivewal.sgml
index 0506120c00..46605db662 100644
--- a/doc/src/sgml/ref/pg_receivewal.sgml
+++ b/doc/src/sgml/ref/pg_receivewal.sgml
@@ -207,6 +207,13 @@ PostgreSQL documentation
 server as a synchronous standby, to ensure that timely feedback is
 sent to the server.

+
+   
+Note that while WAL will be flushed with this setting,
+it will never be applied, so  must
+not be set to remote_apply if pg_receivewal
+is the only synchronous standby.
+   
   
  
 
-- 
2.21.0



Re: Increasing default value for effective_io_concurrency?

2019-07-09 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 08:11:55PM -0400, Bruce Momjian wrote:

On Wed, Jul  3, 2019 at 11:42:49AM -0400, Robert Haas wrote:

On Wed, Jul 3, 2019 at 11:24 AM Tomas Vondra
 wrote:
> Maybe. And it would probably work for the systems I used for benchmarks.
>
> It however assumes two things: (a) the storage system actually has
> spindles and (b) you know how many spindles there are. Which is becoming
> less and less safe these days - flash storage becomes pretty common, and
> even when there are spindles they are often hidden behind the veil of
> virtualization in a SAN, or something.

Yeah, that's true.

> I wonder if we might provide something like pg_test_prefetch which would
> measure performance with different values, similarly to pg_test_fsync.

That's not a bad idea, but I'm not sure if the results that we got in
a synthetic test - presumably unloaded - would be a good guide to what
to use in a production situation.  Maybe it would; I'm just not sure.


I think it would be better than what we have now.



TBH I don't know how useful that tool would be. AFAICS the key assumptions
prefetching relies on are that (a) issuing the prefetch request is much
cheaper than just doing the I/O, and (b) the prefetch request can be
completed before we actually need the page.

(a) is becoming not quite true on new hardware - if you look at results
from the NVMe device, the improvements are much smaller compared to the
other storage systems. The speedup is ~1.6x, no matter the e_i_c value,
while on other storage types it's easily 10x in some cases.

But this is something we could measure using the new tool, because it's
mostly hardware dependent.

But (b) is the hard bit, because it depends on how much time it takes to
process a page read from the heap - if it takes a lot of time, lower e_i_c
values are fine. If it's fast, we need to increase the prefetch distance.

Essentially, from the tests I've done it seems fetching just 1 page in
advance is way too conservative, because (1) it does not really increase
I/O concurrency at the storage level and (2) we often get into a situation
where the prefetch is still in progress when we actually need the page.

I don't know how to meaningfully benchmark this, though - it's way too
dependent on the particular workload / query.


Of course, backend concurrency just makes it even more complicated.
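
In the meantime, anyone who wants to try different values on their own
workload can already do so per session or per tablespace and compare
timings (a sketch only; the tablespace, table and column names are made
up, and only plans that prefetch, e.g. bitmap heap scans, will react):

SET effective_io_concurrency = 64;                               -- session level
ALTER TABLESPACE fast_ssd SET (effective_io_concurrency = 128);  -- per tablespace
EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM events WHERE created BETWEEN '2019-01-01' AND '2019-02-01';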


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 11:11 AM, Bruce Momjian wrote:
> On Tue, Jul  9, 2019 at 09:16:17AM -0400, Joe Conway wrote:
>> On 7/9/19 8:39 AM, Ryan Lambert wrote:
>> > Hi Thomas,
>> > 
>> >> CBC mode does require
>> >> random nonces, other modes may be fine with even sequences as long as
>> >> the values are not reused.   
>> > 
>> > I disagree that CBC mode requires random nonces, at least based on what
>> > NIST has published.  They only require that the IV (not the nonce) must
>> > be unpredictable per [1]:
>> > 
>> > " For the CBC and CFB modes, the IVs must be unpredictable."
>> > 
>> > The unpredictable IV can be generated from a non-random nonce including
>> > a counter:
>> > 
>> > "There are two recommended methods for generating unpredictable IVs. The
>> > first method is to apply the forward cipher function, under the same key
>> > that is used for the encryption of the plaintext, to a nonce. The nonce
>> > must be a data block that is unique to each execution of the encryption
>> > operation. For example, the nonce may be a counter, as described in
>> > Appendix B, or a message number."
>> > 
>> > [1] 
>> > https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf
>> 
>> 
>> The terms nonce and IV are often used more-or-less interchangeably, and
>> it is important to be clear when we are talking about an IV specifically
>> - an IV is a specific type of nonce. Nonce means "number used once".
>> i.e. unique, whereas an IV (for CBC use anyway) should be unique and
>> random but not necessarily kept secret. The NIST requirements that
>> Stephen referenced elsewhere on this thread are as I understand it
>> intended to ensure the random but unique property with high probability.
> 
> Good point about nonce and IV.  I wonder if running the nonce through
> the cipher with the key makes it random enough to use as an IV.

Based on that NIST document it seems so.

The trick will be to be 100% sure we never reuse a nonce that is used to
produce the IV when using the same key.

I think the potential to get that wrong (i.e. inadvertently reuse a
nonce) would lead to using the second described method

  "The second method is to generate a random data block using a
   FIPS-approved random number generator."

That method is what I am used to seeing. But with the second method we
need to store the IV; with the first we could reproduce it if we select
our initial nonce carefully.

So thinking out loud, and perhaps you already said this Bruce, but I
guess the input nonce used to generate the IV could be something like
pg_class.oid and blocknum concatenated together with some delimiting
character as long as we guarantee that we generate different keys in
different databases. Then there would be no need to store the IV since
we could reproduce it. This all assumes that we encrypt each block
independently. Sound correct?
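
As a rough sketch of that first NIST method (shown with pgcrypto purely
to illustrate the idea -- the OIDs and key are made-up example values,
and a real implementation would do this in C): encrypt the deterministic
nonce under the per-database key to get the per-block IV.

CREATE EXTENSION IF NOT EXISTS pgcrypto;

SELECT encrypt(
         int4send(16384) || int4send(42) || '\x0000000000000000'::bytea,
           -- pg_class.oid || block number, zero-padded to one 16-byte AES block
         decode('000102030405060708090a0b0c0d0e0f', 'hex'),
           -- per-database key (example value only)
         'aes-ecb/pad:none') AS block_iv;

Because the input block is unique per (relation, block) and the key is
unique per database, the output is an unpredictable but reproducible IV
that never needs to be stored.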

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Proposal to add GUC_REPORT to lc_monetary, lc_numeric and search_path

2019-07-09 Thread Dave Cramer
See attached patch.

I think adding GUC_REPORT to search_path is probably the most important as
this is potentially a security issue. See Joe Conway's blog on security and
search path here
https://info.crunchydata.com/blog/postgresql-defaults-and-impact-on-security-part-2

I also see there was a proposal to make reportable GUC's configurable here
https://www.postgresql.org/message-id/ca+tgmobsxsy0kfr_vdqqoxjxqafnesfxf_-darne+qhhqcw...@mail.gmail.com

I don't really care which one gets implemented, although I think the latter
makes more sense.

Dave Cramer


On Fri, 5 Jul 2019 at 08:05, Shay Rojansky  wrote:

> > The latter is important for similar reasons. JDBC caches prepared
> statements internally and if the user changes the search path without using
> setSchema or uses a function to change it then internally it would be
> necessary to invalidate the cache. Currently if this occurs these
> statements fail.
>
> While Npgsql specifically doesn't care about any locale/formatting (being
> a binary-only driver), knowing about search_path changes would benefit
> Npgsql in the same way as it would JDBC.
>
> > This seems like a rather innocuous change as the protocol is not
> changed, rather the amount of information returned on startup is increased
> marginally.
>
> Although adding these specific parameters are easy to add, we could also
> think of a more generic way for clients to subscribe to parameter updates
> (am not sure if this was previously discussed - I cannot see anything
> obvious in the wiki TODO page). At its simplest, this could be a new
> parameter containing a comma-separated list of parameters for which
> asynchronous updates should be sent.  This new parameter would default to
> the current hard-coded list (as documented in
> https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-ASYNC).
> Unless I'm mistaken, one issue (as in general with new parameters) is that
> drivers wouldn't be able to send this new parameter in the startup package
> because they don't yet know whether they're talking to a PostgreSQL version
> which supports it.
>
>


0001-make-lc_monetary-lc_numeric-and-search_path-GUC_REPO.patch
Description: Binary data


Re: range_agg

2019-07-09 Thread Jeff Davis
On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote:
> 
> I am not against a multirange type, but I miss an explanation of why you
> introduce a new kind of type and don't just use an array of ranges.
> 
> Introducing a new kind of type is not like introducing a new type.

The biggest benefit, in my opinion, is that it means you can define
functions/operators that take an "anyrange" and return an
"anymultirange". That way you don't have to define different functions
for int4 ranges, date ranges, etc.

It starts to get even more complex when you want to add opclasses, etc.

Ranges and arrays are effectively generic types that need a type
parameter to become a concrete type. Ideally, we'd have first-class
support for generic types, but I think that's a different topic ;-)
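
For comparison, here is what a simple range-collecting aggregate looks
like when written against one concrete range type (names are made up for
illustration); the operators and opclasses mentioned above would multiply
the same way, which is what a polymorphic anyrange -> anymultirange
signature avoids:

CREATE FUNCTION int4range_accum(int4range[], int4range)
  RETURNS int4range[] LANGUAGE sql IMMUTABLE
  AS $$ SELECT $1 || $2 $$;

CREATE AGGREGATE int4range_agg(int4range) (
    sfunc    = int4range_accum,
    stype    = int4range[],
    initcond = '{}'
);

-- ...and similar boilerplate again for daterange, tstzrange, numrange, ...
-- With anymultirange, a single range_agg(anyrange) returning anymultirange
-- could cover all of them.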

Regards,
Jeff Davis






Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)

2019-07-09 Thread Dave Cramer
On Mon, 22 Jan 2018 at 07:39, Robert Haas  wrote:

> On Sun, Jan 21, 2018 at 5:41 PM, Craig Ringer 
> wrote:
> > If we'd done server_version_num in 9.5, for example, less stuff would've
> > broken with pg10.
>
> Yeah, and if Tom hadn't forced it to be reverted from *8.2*, then
> every version anyone still cares about would now have support for it.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
So did this die from lack of interest?

I have proposed in another thread adding more GUC REPORT variables, but I
see this as a much better way.

I'm willing to code the patch if we can get some buy-in here?

Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: PGOPTIONS="-fh" make check gets stuck since Postgres 11

2019-07-09 Thread Melanie Plageman
On Mon, Jul 8, 2019 at 12:21 PM Tom Lane  wrote:

> The point of regressplans.sh is to see if anything goes seriously
> wrong when forcing non-default plan choices --- seriously wrong being
> defined as crashes or semantically wrong answers.  It's not expected
> that the regression tests will automatically pass when you do that,
> because of their dependencies on output row ordering, not to mention
> all the EXPLAINs.  I'm not for removing it --- the fact that its
> results require manual evaluation doesn't make it useless.
>
>
It might be worth post-processing results files to ignore row ordering
in some cases to allow for easier comparison. Has this been proposed
in the past?

-- 
Melanie Plageman


Re: range_agg

2019-07-09 Thread Alvaro Herrera
On 2019-Jul-08, Paul A Jungwirth wrote:

> - You can subscript a multirange like you do an array (? This could be
> a function instead.)

Note that we already have a patch in the pipe to make subscripting an
extensible operation, which would fit pretty well here, I think.

Also, I suppose you would need unnest(multirange) to yield the set of
ranges.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: range_agg

2019-07-09 Thread Jeff Davis
On Mon, 2019-07-08 at 09:46 -0700, Paul A Jungwirth wrote:
> - A multirange type is an extra thing you get when you define a range
> (just like how you get a tstzrange[]). Therefore

Agreed.

> - I'm adding a new typtype for multiranges. ('m' in pg_type).

Sounds reasonable.

> - I'm just adding a mltrngtypeid column to pg_range. I don't think I
> need a new pg_multirange table.
> - You can have a multirange[].
> - Multirange in/out work just like arrays, e.g. '{"[1,3)", "[5,6)"}'

It would be cool to have a better text representation. We could go
simple like:

   '[1,3) [5,6)'

Or maybe someone has another idea for how to represent a multirange in a
more visually descriptive way?

> - I'll add an anymultirange pseudotype. When it's the return type of
> a
> function that has an "anyrange" parameter, it will use the same base
> element type. (So basically anymultirange : anyrange :: anyarray ::
> anyelement.)

I like it.

>   - range_agg (range_union_agg if you like)
>   - range_intersection_agg

I'm fine with those names.

> - You can subscript a multirange like you do an array (? This could
> be
> a function instead.)

I wouldn't try too hard to make them subscriptable. I'm not opposed to
it, but it's easy enough to cast to an array and then subscript.

> - operators:
>   - union (++) and intersection (*):
> - We already have + for range union but it raises an error if
> there is a gap, so ++ is the same but with no errors.
> - r ++ r = mr (commutative, associative)
> - mr ++ r = mr
> - r ++ mr = mr

I like it.
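
(For reference, the current behaviour that ++ would relax -- the first two
statements are runnable today, the last line shows the proposed syntax:)

SELECT int4range(1,3) + int4range(2,5);  -- [1,5)
SELECT int4range(1,3) + int4range(5,6);  -- ERROR:  result of range union would not be contiguous
-- proposed: int4range(1,3) ++ int4range(5,6)  =>  {"[1,3)", "[5,6)"}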

>   - inverse operator?:
> - the inverse of {"[1,2)"} would be {"[null, 1)", "[2, null)"}.
> - not sure we want this or what the symbol should be. I don't
> like
> -mr as an inverse because then mr - mr != mr ++ -mr.

I think "complement" might be a better name than "inverse".

m1 - m2 = m1 * complement(m2)

What about "~"?



There will be some changes to parse_coerce.c, just like in range types.
I took a brief look here and it looks pretty reasonable; hopefully
there aren't any hidden surprises.

Regards,
Jeff Davis






Re: range_agg

2019-07-09 Thread Paul Jungwirth

On 7/9/19 12:01 PM, Alvaro Herrera wrote:

On 2019-Jul-08, Paul A Jungwirth wrote:


- You can subscript a multirange like you do an array (? This could be
a function instead.)


Note that we already have a patch in the pipe to make subscripting an
extensible operation, which would fit pretty well here, I think.


I'll take a look at that!


Also, I suppose you would need unnest(multirange) to yield the set of
ranges.


I think that would be really nice, although it isn't critical if
you can do something like UNNEST(multirange::tstzrange[]).


--
Paul  ~{:-)
p...@illuminatedcomputing.com




Re: range_agg

2019-07-09 Thread Pavel Stehule
út 9. 7. 2019 v 20:25 odesílatel Jeff Davis  napsal:

> On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote:
> >
> > I am not against a multirange type, but I miss an explanation of why you
> > introduce a new kind of type and don't just use an array of ranges.
> >
> > Introducing a new kind of type is not like introducing a new type.
>
> The biggest benefit, in my opinion, is that it means you can define
> functions/operators that take an "anyrange" and return an
> "anymultirange". That way you don't have to define different functions
> for int4 ranges, date ranges, etc.
>
>
I am not sure how strong this argument is.

I think the introduction of an anyrangearray polymorphic type, plus enhancing
some type deduction, could do the same work.

It starts to get even more complex when you want to add opclasses, etc.
>
> Ranges and arrays are effectively generic types that need a type
> parameter to become a concrete type. Ideally, we'd have first-class
> support for generic types, but I think that's a different topic ;-)


I'm afraid that with a generic multirange type a lot of the array infrastructure
will be duplicated.

Regards

Pavel


> Regards,
> Jeff Davis
>
>
>


Re: Broken defenses against dropping a partitioning column

2019-07-09 Thread Tomas Vondra

On Mon, Jul 08, 2019 at 10:58:56AM -0400, Tom Lane wrote:

Alvaro Herrera  writes:

That said, I'm not sure I see the use case for an ALTER TABLE .. DROP
COLUMN command that turns a partitioned table (with existing partitions
containing data) into one non-partitioned table with all data minus the
partitioning column(s).


Yeah, it'd be a lot of work for a dubious goal.


This seems vaguely related to the issue of dropping foreign keys; see
https://postgr.es/m/20190329152239.GA29258@alvherre.pgsql wherein I
settled with a non-ideal solution to the problem of being unable to
depend on something that did not cause the entire table to be dropped
in certain cases.


That's an interesting analogy.  Re-reading that thread, what I said
in <29497.1554217...@sss.pgh.pa.us> seems pretty apropos to the
current problem:


FWIW, I think that the dependency mechanism is designed around the idea
that whether it's okay to drop a *database object* depends only on what
other *database objects* rely on it, and that you can always make a DROP
valid by dropping all the dependent objects.  That's not an unreasonable
design assumption, considering that the SQL standard embeds the same
assumption in its RESTRICT/CASCADE syntax.


I think that is probably a fatal objection to my idea of putting an error
check into RemoveAttributeById().  As an example, consider the possibility
that somebody makes a temporary type and then makes a permanent table with
a partitioning column of that type.  What shall we do at session exit?
Failing to remove the temp type is not an acceptable choice, because that
leaves us with a permanently broken temp schema (compare bug #15631 [1]).

Also, I don't believe we can make that work without order-of-operations
problems in cases comparable to the original bug in this thread [2].
One or the other order of the object OIDs is going to lead to the column
being visited for deletion before the whole table is, and again rejecting
the column deletion is not going to be an acceptable behavior.

So I think we're probably stuck with the approach of adding new internal
dependencies.  If we go that route, then our options for the released
branches are (1) do nothing, or (2) back-patch the code that adds such
dependencies, but without a catversion bump.  That would mean that only
tables created after the next minor releases would have protection against
this problem.  That's not ideal but maybe it's okay, considering that we
haven't seen actual field reports of trouble of this kind.



Couldn't we also write a function that adds those dependencies for
existing objects, and request users to run it after the update?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: range_agg

2019-07-09 Thread Pavel Stehule
út 9. 7. 2019 v 21:10 odesílatel Pavel Stehule 
napsal:

>
>
> út 9. 7. 2019 v 20:25 odesílatel Jeff Davis  napsal:
>
>> On Tue, 2019-07-09 at 07:08 +0200, Pavel Stehule wrote:
>> >
>> > I am not against a multirange type, but I miss an explanation of why you
>> > introduce a new kind of type and don't just use an array of ranges.
>> >
>> > Introducing a new kind of type is not like introducing a new type.
>>
>> The biggest benefit, in my opinion, is that it means you can define
>> functions/operators that take an "anyrange" and return an
>> "anymultirange". That way you don't have to define different functions
>> for int4 ranges, date ranges, etc.
>>
>>
> I am not sure how strong this argument is.
>
> I think the introduction of an anyrangearray polymorphic type, plus enhancing
> some type deduction, could do the same work.
>
> It starts to get even more complex when you want to add opclasses, etc.
>>
>> Ranges and arrays are effectively generic types that need a type
>> parameter to become a concrete type. Ideally, we'd have first-class
>> support for generic types, but I think that's a different topic ;-)
>
>
> I'm afraid that with a generic multirange type a lot of the array infrastructure
> will be duplicated.
>

On the other hand, it is true that classic array concatenation is not optimal for a
set of ranges, so some functionality would have to be redefined each time.

I don't know what is possible, but to me multiranges are a special kind
(subset) of arrays and could be implemented as such. I recall another
possible kind of array - "sets" without duplicates. It is a similar
case, I think.

Maybe introducing multiranges as a new generic type is the wrong direction, and
it would be better and more extensible in the future to introduce something like
special kinds of arrays. Then, for example, unnest could be used directly for both
arrays and multiranges, because there would be a common base.

Regards

Pavel



> Regards
>
> Pavel
>
>
>> Regards,
>> Jeff Davis
>>
>>
>>


Re: Ltree syntax improvement

2019-07-09 Thread Dmitry Belyavsky
On Mon, Jul 8, 2019 at 11:33 PM Alvaro Herrera 
wrote:

> On 2019-Jul-08, Dmitry Belyavsky wrote:
>
> > I did not introduce any functions. I've just changed the parser.
>
> I mean the C-level functions -- count_parts_ors() and so on.
>
Added a comment to count_parts_ors().

The other functions introduced by me were already described.

-- 
SY, Dmitry Belyavsky
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out
index 8226930905..5f45726229 100644
--- a/contrib/ltree/expected/ltree.out
+++ b/contrib/ltree/expected/ltree.out
@@ -1,4 +1,5 @@
 CREATE EXTENSION ltree;
+SET standard_conforming_strings=on;
 -- Check whether any of our opclasses fail amvalidate
 SELECT amname, opcname
 FROM pg_opclass opc LEFT JOIN pg_am am ON am.oid = opcmethod
@@ -7679,3 +7680,1587 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ;
 15
 (1 row)
 
+-- Extended syntax, escaping, quoting etc
+-- success
+SELECT E'\\.'::ltree;
+ ltree 
+---
+ "."
+(1 row)
+
+SELECT E'\\ '::ltree;
+ ltree 
+---
+ " "
+(1 row)
+
+SELECT E''::ltree;
+ ltree 
+---
+ "\"
+(1 row)
+
+SELECT E'\\a'::ltree;
+ ltree 
+---
+ a
+(1 row)
+
+SELECT E'\\n'::ltree;
+ ltree 
+---
+ n
+(1 row)
+
+SELECT E'x'::ltree;
+ ltree 
+---
+ "x\"
+(1 row)
+
+SELECT E'x\\ '::ltree;
+ ltree 
+---
+ "x "
+(1 row)
+
+SELECT E'x\\.'::ltree;
+ ltree 
+---
+ "x."
+(1 row)
+
+SELECT E'x\\a'::ltree;
+ ltree 
+---
+ xa
+(1 row)
+
+SELECT E'x\\n'::ltree;
+ ltree 
+---
+ xn
+(1 row)
+
+SELECT 'a b.с d'::ltree;
+ltree
+-
+ "a b"."с d"
+(1 row)
+
+SELECT ' e . f '::ltree;
+ ltree 
+---
+ e.f
+(1 row)
+
+SELECT ' '::ltree;
+ ltree 
+---
+ 
+(1 row)
+
+SELECT E'\\ g  . h\\ '::ltree;
+   ltree   
+---
+ " g"."h "
+(1 row)
+
+SELECT E'\\ g'::ltree;
+ ltree 
+---
+ " g"
+(1 row)
+
+SELECT E' h\\ '::ltree;
+ ltree 
+---
+ "h "
+(1 row)
+
+SELECT '" g  "." h "'::ltree;
+ltree 
+--
+ " g  "." h "
+(1 row)
+
+SELECT '" g  " '::ltree;
+ ltree  
+
+ " g  "
+(1 row)
+
+SELECT '" g  "   ." h "  '::ltree;
+ltree 
+--
+ " g  "." h "
+(1 row)
+
+SELECT nlevel(E'Bottom\\.Test'::ltree);
+ nlevel 
+
+  1
+(1 row)
+
+SELECT subpath(E'Bottom\\.'::ltree, 0, 1);
+  subpath  
+---
+ "Bottom."
+(1 row)
+
+SELECT subpath(E'a\\.b', 0, 1);
+ subpath 
+-
+ "a.b"
+(1 row)
+
+SELECT subpath(E'a\\..b', 1, 1);
+ subpath 
+-
+ b
+(1 row)
+
+SELECT subpath(E'a\\..\\b', 1, 1);
+ subpath 
+-
+ b
+(1 row)
+
+SELECT subpath(E'a b.с d'::ltree, 1, 1);
+ subpath 
+-
+ "с d"
+(1 row)
+
+SELECT(
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'\z\z\z\z\z')::ltree;
+  ltree  
+-
+ 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789z
+(1 row)
+
+SELECT('   ' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'01234567890123456789012345678901234567890123456789' ||
+'\a\b\c\d\e   ')::ltree;
+  ltree  
+-
+ 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abcde
+(1 row)
+
+SELECT 'abc\|d'::lquery;
+ lquery  
+-
+ "abc|d"
+(1 row)
+
+SELECT 'abc\|d'::ltree ~ 'abc\|d'::lquery;
+ ?column? 
+--

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Bruce Momjian
On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
> On 7/9/19 11:11 AM, Bruce Momjian wrote:
> > Good point about nonce and IV.  I wonder if running the nonce
> > through the cipher with the key makes it random enough to use as an
> > IV.
>
> Based on that NIST document it seems so.
>
> The trick will be to be 100% sure we never reuse a nonce that is used
> to produce the IV when using the same key.
>
> I think the potential to get that wrong (i.e. inadvertently reuse a
> nonce) would lead to using the second described method
>
>   "The second method is to generate a random data block using a
>   FIPS-approved random number generator."
>
> That method is what I am used to seeing. But with the second method
> we need to store the IV, with the first we could reproduce it if we
> select our initial nonce carefully.
>
> So thinking out loud, and perhaps you already said this Bruce, but I
> guess the input nonce used to generate the IV could be something like
> pg_class.oid and blocknum concatenated together with some delimiting
> character as long as we guarantee that we generate different keys in
> different databases. Then there would be no need to store the IV since
> we could reproduce it.

Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
be preserved by pg_upgrade anyway), and the page number.  However,
different databases can have the same pg_class.oid/page number
combination, so there would be duplication between databases.  Now, you
might say let's add the pg_database.oid, but unfortunately, because of
the way we file-system-copy files from one database to another during
database creation (it doesn't go through shared buffers), we can't use
pg_database.oid as part of the nonce.

My only idea here is that we actually decrypt/re-encrypt pages as we
copy them at the file system level during database creation to match the
new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
(I think we will need to modify pg_upgrade to preserve pg_database.oid.)

If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
4-bytes for the block number, which gets us to 32TB before the page
counter would overflow a 4-byte value, and our max table size is 32TB
anyway, so that all works.

> This all assumes that we encrypt each block independently. Sound
> correct?

Yes, I think 8k encryption granularity is a requirement.  If not, you
would need to potentially load and write multiple 8k pages for a single
8k page change, which seems very complex.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 3:50 PM, Bruce Momjian wrote:
> On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:
>> On 7/9/19 11:11 AM, Bruce Momjian wrote:
>> > Good point about nonce and IV.  I wonder if running the nonce
>> > through the cipher with the key makes it random enough to use as an
>> > IV.
>>
>> Based on that NIST document it seems so.
>>
>> The trick will be to be 100% sure we never reuse a nonce that is used
>> to produce the IV when using the same key.
>>
>> I think the potential to get that wrong (i.e. inadvertently reuse a
>> nonce) would lead to using the second described method
>>
>>   "The second method is to generate a random data block using a
>>   FIPS-approved random number generator."
>>
>> That method is what I am used to seeing. But with the second method
>> we need to store the IV, with the first we could reproduce it if we
>> select our initial nonce carefully.
>>
>> So thinking out loud, and perhaps you already said this Bruce, but I
>> guess the input nonce used to generate the IV could be something like
>> pg_class.oid and blocknum concatenated together with some delimiting
>> character as long as we guarantee that we generate different keys in
>> different databases. Then there would be no need to store the IV since
>> we could reproduce it.
> 
> Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
> be preserved by pg_upgrade anyway), and the page number.  However,
> different databases can have the same pg_class.oid/page number
> combination, so there would be duplication between databases.

But as I said "as long as we guarantee that we generate different keys
in different databases". The IV only needs to be unique for a given key.
The combination of oid and page number, when run through the cipher,
should always produce a unique IV with high probability. And if we
generate random keys with sufficient entropy the chances of collision
should approach zero.

> If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
> pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
> 4-bytes for the block number, which gets us to 32TB before the page
> counter would overflow a 4-byte value, and our max table size is 32TB
> anyway, so that all works.

The IV will be the same size as the algorithm block size (128 bits for AES).
It gets XOR'd with the first block of plaintext in CBC mode. The nonce used
to produce the IV does not need to be the same size, but by running it
through the cipher the output IV will be the correct size.

https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation
https://en.wikipedia.org/wiki/Initialization_vector

>> This all assumes that we encrypt each block independently. Sound
>> correct?
> 
> Yes, I think 8k encryption granularity is a requirement.  If not, you
> would need to potentially load and write multiple 8k pages for a single
> 8k page change, which seems very complex.

Exactly, and it would also be terrible for performance.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:

On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:

On 7/9/19 11:11 AM, Bruce Momjian wrote:
> Good point about nonce and IV.  I wonder if running the nonce
> through the cipher with the key makes it random enough to use as an
> IV.

Based on that NIST document it seems so.

The trick will be to be 100% sure we never reuse a nonce that is used
to produce the IV when using the same key.

I think the potential to get that wrong (i.e. inadvertently reuse a
nonce) would lead to using the second described method

  "The second method is to generate a random data block using a
  FIPS-approved random number generator."

That method is what I am used to seeing. But with the second method
we need to store the IV, with the first we could reproduce it if we
select our initial nonce carefully.

So thinking out loud, and perhaps you already said this Bruce, but I
guess the input nonce used to generate the IV could be something like
pg_class.oid and blocknum concatenated together with some delimiting
character as long as we guarantee that we generate different keys in
different databases. Then there would be no need to store the IV since
we could reproduce it.


Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
be preserved by pg_upgrade anyway), and the page number.  However,
different databases can have the same pg_class.oid/page number
combination, so there would be duplication between databases.  Now, you
might say let's add the pg_database.oid, but unfortunately, because of
the way we file-system-copy files from one database to another during
database creation (it doesn't go through shared buffers), we can't use
pg_database.oid as part of the nonce.

My only idea here is that we actually decrypt/re-encrypt pages as we
copy them at the file system level during database creation to match the
new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
(I think we will need to modify pg_upgrade to preserve pg_database.oid.)



Or you could just encrypt them with a different key, and you would not
need to make database OID part of the nonce.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: benchmarking Flex practices

2019-07-09 Thread Tom Lane
John Naylor  writes:
> [ v4 patches for trimming lexer table size ]

I reviewed this and it looks pretty solid.  One gripe I have is
that I think it's best to limit backup-prevention tokens such as
quotecontinuefail so that they match only exact prefixes of their
"success" tokens.  This seems clearer to me, and in at least some cases
it can save a few flex states.  The attached v5 patch does it like that
and gets us down to 22331 states (from 23696).  In some places it looks
like you did that to avoid writing an explicit "{other}" match rule for
an exclusive state, but I think it's better for readability and
separation of concerns to go ahead and have those explicit rules
(and it seems to make no difference table-size-wise).

I also made some cosmetic changes (mostly improving comments) and
smashed the patch series down to 1 patch, because I preferred to
review it that way and we're not really going to commit these
separately.

I did a little bit of portability testing, to the extent of verifying
that the oldest and newest Flex versions I have handy (2.5.33 and 2.6.4)
agree on the table size change and get through regression tests.  So
I think we should be good from that end.

We still need to propagate these changes into the psql and ecpg lexers,
but I assume you were waiting to agree on the core patch before touching
those.  If you're good with the changes I made here, have at it.

regards, tom lane

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index e1cae85..899da09 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -168,12 +168,14 @@ extern void core_yyset_column(int column_no, yyscan_t yyscanner);
  *   delimited identifiers (double-quoted identifiers)
  *   hexadecimal numeric string
  *   standard quoted strings
+ *   quote stop (detect continued strings)
  *   extended quoted strings (support backslash escape sequences)
  *   $foo$ quoted strings
  *   quoted identifier with Unicode escapes
- *   end of a quoted identifier with Unicode escapes, UESCAPE can follow
  *   quoted string with Unicode escapes
- *   end of a quoted string with Unicode escapes, UESCAPE can follow
+ *   end of a quoted string or identifier with Unicode escapes,
+ *UESCAPE can follow
+ *   expecting escape character literal after UESCAPE
  *   Unicode surrogate pair in extended quoted string
  *
  * Remember to add an <> case whenever you add a new exclusive state!
@@ -185,12 +187,13 @@ extern void core_yyset_column(int column_no, yyscan_t yyscanner);
 %x xd
 %x xh
 %x xq
+%x xqs
 %x xe
 %x xdolq
 %x xui
-%x xuiend
 %x xus
-%x xusend
+%x xuend
+%x xuchar
 %x xeu
 
 /*
@@ -231,19 +234,18 @@ special_whitespace		({space}+|{comment}{newline})
 horiz_whitespace		({horiz_space}|{comment})
 whitespace_with_newline	({horiz_whitespace}*{newline}{special_whitespace}*)
 
+quote			'
+/* If we see {quote} then {quotecontinue}, the quoted string continues */
+quotecontinue	{whitespace_with_newline}{quote}
+
 /*
- * To ensure that {quotecontinue} can be scanned without having to back up
- * if the full pattern isn't matched, we include trailing whitespace in
- * {quotestop}.  This matches all cases where {quotecontinue} fails to match,
- * except for {quote} followed by whitespace and just one "-" (not two,
- * which would start a {comment}).  To cover that we have {quotefail}.
- * The actions for {quotestop} and {quotefail} must throw back characters
- * beyond the quote proper.
+ * {quotecontinuefail} is needed to avoid lexer backup when we fail to match
+ * {quotecontinue}.  It might seem that this could just be {whitespace}*,
+ * but if there's a dash after {whitespace_with_newline}, it must be consumed
+ * to see if there's another dash --- which would start a {comment} and thus
+ * allow continuation of the {quotecontinue} token.
  */
-quote			'
-quotestop		{quote}{whitespace}*
-quotecontinue	{quote}{whitespace_with_newline}{quote}
-quotefail		{quote}{whitespace}*"-"
+quotecontinuefail	{whitespace}*"-"?
 
 /* Bit string
  * It is tempting to scan the string for only those characters
@@ -304,10 +306,15 @@ xdstop			{dquote}
 xddouble		{dquote}{dquote}
 xdinside		[^"]+
 
-/* Unicode escapes */
-uescape			[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']{quote}
+/* Optional UESCAPE after a quoted string or identifier with Unicode escapes */
+uescape			[uU][eE][sS][cC][aA][pP][eE]
+/* error rule to avoid backup */
+uescapefail		[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]
+
+/* escape character literal */
+uescchar		{quote}[^']{quote}
 /* error rule to avoid backup */
-uescapefail		[uU][eE][sS][cC][aA][pP][eE]{whitespace}*"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]
+uesccharfail	{quote}[^']|{quote}
 
 /* Quoted identifier with Un

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 4:12 PM, Tomas Vondra wrote:
> On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:
>>On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:

>>> the input nonce used to generate the IV could be something like
>>> pg_class.oid and blocknum concatenated together with some delimiting
>>> character as long as we guarantee that we generate different keys in
>>> different databases.



> Or you could just encrypt them with a different key, and you would not
> need to make database OID part of the nonce.

Yeah that was pretty much exactly what I was trying to say above ;-)

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Broken defenses against dropping a partitioning column

2019-07-09 Thread Tom Lane
Tomas Vondra  writes:
> On Mon, Jul 08, 2019 at 10:58:56AM -0400, Tom Lane wrote:
>> So I think we're probably stuck with the approach of adding new internal
>> dependencies.  If we go that route, then our options for the released
>> branches are (1) do nothing, or (2) back-patch the code that adds such
>> dependencies, but without a catversion bump.  That would mean that only
>> tables created after the next minor releases would have protection against
>> this problem.  That's not ideal but maybe it's okay, considering that we
>> haven't seen actual field reports of trouble of this kind.

> Couldn't we also write a function that adds those dependencies for
> existing objects, and request users to run it after the update?

Maybe.  I'm not volunteering to write such a thing.

BTW, it looks like somebody actually did think about this problem with
respect to external dependencies of partition expressions:

regression=# create function myabs(int) returns int language internal as 
'int4abs' immutable strict parallel safe;
CREATE FUNCTION
regression=# create table foo (f1 int) partition by range (myabs(f1));
CREATE TABLE
regression=# drop function myabs(int);
ERROR:  cannot drop function myabs(integer) because other objects depend on it
DETAIL:  table foo depends on function myabs(integer)
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Unfortunately, there's still no dependency on the column f1 in this
scenario.  That means any function that wants to reconstruct the
correct dependencies would need a way to scan the partition expressions
for Vars.  Not much fun from plpgsql, for sure.

regards, tom lane




Re: [PATCH] Fix trigger argument propagation to child partitions

2019-07-09 Thread Alvaro Herrera
On 2019-Jul-09, Tomas Vondra wrote:

> On Tue, Jul 09, 2019 at 03:00:27PM +0200, Patrick McHardy wrote:
> > The following patch fixes propagation of arguments to the trigger
> > function to child partitions both when initially creating the trigger
> > and when adding new partitions to a partitioned table.
> 
Thanks for the report and bugfix. It seems the parameters in row triggers
> on partitioned tables never worked :-( For a moment I was wondering why it
> shows on 11 and not 10 (based on the assumption you'd send a patch against
> 10 if it was affected), but 10 actually did not support row triggers on
> partitioned tables.

Right ...

> The fix seems OK to me, although I see we're parsing tgargs in ruleutils.c
> and that version (around line ~1050) uses fastgetattr instead of
> heap_getattr, and checks the isnull parameter after the call. I guess we
> should do the same thing here.

Yeah, absolutely.  The attached v2 is basically Patrick's patch with
very minor style changes.  I'll get this pushed as soon as the tests
finish running.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From e7d3738739ece9fa7a48192c7ebc481b27a4a682 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 9 Jul 2019 16:46:16 -0400
Subject: [PATCH v2] first

---
 src/backend/commands/tablecmds.c   | 23 ++-
 src/backend/commands/trigger.c |  1 -
 src/test/regress/expected/triggers.out | 24 
 src/test/regress/sql/triggers.sql  | 23 +++
 4 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c62922a112..5fbe7242c4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15202,6 +15202,7 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 		Datum		value;
 		bool		isnull;
 		List	   *cols = NIL;
+		List	   *trigargs = NIL;
 		MemoryContext oldcxt;
 
 		/*
@@ -15266,11 +15267,31 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 			}
 		}
 
+		/* Reconstruct trigger arguments list. */
+		if (trigForm->tgnargs > 0)
+		{
+			char	   *p;
+
+			value = heap_getattr(tuple, Anum_pg_trigger_tgargs,
+ RelationGetDescr(pg_trigger), &isnull);
+			if (isnull)
+elog(ERROR, "tgargs is null for trigger \"%s\" in partition \"%s\"",
+	 NameStr(trigForm->tgname), RelationGetRelationName(partition));
+
+			p = (char *) VARDATA_ANY(DatumGetByteaPP(value));
+
+			for (int i = 0; i < trigForm->tgnargs; i++)
+			{
+trigargs = lappend(trigargs, makeString(pstrdup(p)));
+p += strlen(p) + 1;
+			}
+		}
+
 		trigStmt = makeNode(CreateTrigStmt);
 		trigStmt->trigname = NameStr(trigForm->tgname);
 		trigStmt->relation = NULL;
 		trigStmt->funcname = NULL;	/* passed separately */
-		trigStmt->args = NULL;	/* passed separately */
+		trigStmt->args = trigargs;
 		trigStmt->row = true;
 		trigStmt->timing = trigForm->tgtype & TRIGGER_TYPE_TIMING_MASK;
 		trigStmt->events = trigForm->tgtype & TRIGGER_TYPE_EVENT_MASK;
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index f6c7a3fefc..a716827c77 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -1153,7 +1153,6 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 			 */
 			childStmt = (CreateTrigStmt *) copyObject(stmt);
 			childStmt->funcname = NIL;
-			childStmt->args = NIL;
 			childStmt->whenClause = NULL;
 
 			/* If there is a WHEN clause, create a modified copy of it */
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 31dbc9bcfc..5be0009f00 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2092,6 +2092,30 @@ NOTICE:  trigger zzz on parted_trig_1_1 AFTER INSERT for ROW
 NOTICE:  trigger bbb on parted_trig_2 AFTER INSERT for ROW
 NOTICE:  trigger zzz on parted_trig_2 AFTER INSERT for ROW
 drop table parted_trig;
+-- Verify propagation of trigger arguments to partitions
+create table parted_trig (a int) partition by list (a);
+create table parted_trig1 partition of parted_trig for values in (1);
+create or replace function trigger_notice() returns trigger as $$
+  declare
+arg1 text = TG_ARGV[0];
+arg2 integer = TG_ARGV[1];
+  begin
+raise notice 'trigger % on % % % for % args % %',
+		TG_NAME, TG_TABLE_NAME, TG_WHEN, TG_OP, TG_LEVEL, arg1, arg2;
+return null;
+  end;
+  $$ language plpgsql;
+create trigger aaa after insert on parted_trig
+   for each row execute procedure trigger_notice('quirky', 1);
+-- Verify propagation of trigger arguments to partitions attached after creating trigger
+create table parted_trig2 partition of parted_trig for values in (2);
+create table parted_trig3 (like parted_trig);
+alter table parted_trig attach partition parted_trig3 for values in (3);
+insert in

Re: [HACKERS] Cached plans and statement generalization

2019-07-09 Thread Konstantin Knizhnik



On 09.07.2019 15:16, Thomas Munro wrote:

On Tue, Jul 9, 2019 at 7:32 AM Konstantin Knizhnik
 wrote:

Sorry, are you testing autoprepare-16.patch, which I sent in the last e-mail?
I can not reproduce the problem with building documentation:

+  autoprepare_threshold (integer/type>)

The problem is that "integer/type>".  (Missing "<").  There is more
than one of those.  Not sure why that doesn't fail for you.
Sorry for the noise.  I did something stupid in my apply script.



Sorry, there really were several syntax problems.
I don't understand why I had not noticed them before.
A fixed version of the patch is attached.
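
For reference, a minimal usage sketch of the feature as described in the
attached documentation (the GUC and view names are taken from the patch;
the threshold and limit values are arbitrary examples):

-- autoprepare a statement once it has been seen 5 times
SET autoprepare_threshold = 5;
-- cap the per-backend cache of autoprepared statements
SET autoprepare_limit = 100;

-- repeated queries that differ only in literals become candidates
SELECT abalance FROM pgbench_accounts WHERE aid = 42;

-- inspect what has been autoprepared in this session
SELECT statement, parameter_types, exec_count
  FROM pg_autoprepared_statements;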
diff --git a/doc/src/sgml/autoprepare.sgml b/doc/src/sgml/autoprepare.sgml
new file mode 100644
index 00..d20a4d0098
--- /dev/null
+++ b/doc/src/sgml/autoprepare.sgml
@@ -0,0 +1,62 @@
+
+
+ 
+  Autoprepared statements
+
+  
+   autoprepared statements
+  
+
+  
+PostgreSQL makes it possible prepare
+frequently used statements to eliminate cost of their compilation
+and optimization on each execution of the query. On simple queries
+(like ones in pgbench -S) using prepared statements
+increase performance more than two times.
+  
+
+  
+Unfortunately not all database applications are using prepared statements
+and, moreover, it is not always possible. For example, in case of using
+pgbouncer or any other session pooler,
+there is no session state (transactions of one client may be executed at different
+backends) and so prepared statements can not be used.
+  
+
+  
+Autoprepare mode allows to overcome this limitation.
+In this mode Postgres tries to generalize executed statements
+and build parameterized plan for them. Speed of execution of
+autoprepared statements is almost the same as of explicitly
+prepared statements.
+  
+
+  
+By default autoprepare mode is switched off. To enable it, assign non-zero
+value to GUC variable autoprepare_threshold.
+This variable specified minimal number of times the statement should be
+executed before it is autoprepared. Please notice that, despite to the
+value of this parameter, Postgres makes a decision about using
+generalized plan vs. customized execution plans based on the results
+of comparison of average time of five customized plans with
+time of generalized plan.
+  
+
+  
+If number of different statements issued by application is large enough,
+then autopreparing all of them can cause memory overflow
+(especially if there are many active clients, because prepared statements cache
+is local to the backend). To prevent growth of backend's memory because of
+autoprepared cache, it is possible to limit number of autoprepared statements
+by setting autoprepare_limit GUC variable. LRU strategy will be used
+to keep in memory most frequently used queries.
+  
+
+  
+It is possible to inspect autoprepared queries in the backend using
+pg_autoprepared_statements view. It shows original text of the
+query, types of the extracted parameters (replacing literals) and
+query execution counter.
+  
+
+ 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f2b9d404cb..cb703f2084 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8313,6 +8313,11 @@ SCRAM-SHA-256$:&l
   prepared statements
  
 
+ 
+  pg_autoprepared_statements
+  autoprepared statements
+ 
+
  
   pg_prepared_xacts
   prepared transactions
@@ -9630,6 +9635,68 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+
+ 
+  pg_autoprepared_statements
+
+  
+   pg_autoprepared_statements
+  
+
+  
+   The pg_autoprepared_statements view displays
+   all the autoprepared statements that are available in the current
+   session. See  for more information about autoprepared
+   statements.
+  
+
+  
+   pg_autoprepared_statements Columns
+
+   
+
+ 
+  Name
+  Type
+  Description
+ 
+
+
+ 
+  statement
+  text
+  
+The query string submitted by the client from which this prepared statement
+was created. Please notice that literals in this statement are not
+replaced with prepared statement placeholders.
+  
+ 
+ 
+  parameter_types
+  regtype[]
+  
+   The expected parameter types for the autoprepared statement in the
+   form of an array of regtype. The OID corresponding
+   to an element of this array can be obtained by casting the
+   regtype value to oid.
+  
+ 
+ 
+  exec_count
+  int8
+  
+Number of times this statement was executed.
+  
+ 
+
+   
+  
+
+  
+   The pg_autoprepared_statements view is read only.
+  
+ 
+
  
   pg_prepared_xacts
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 84341a30e5..8a9fff3756 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Alvaro Herrera
On 2019-Jul-09, Joe Conway wrote:

> > Or you could just encrypt them with a different key, and you would not
> > need to make database OID part of the nonce.
> 
> Yeah that was pretty much exactly what I was trying to say above ;-)

So you need to decrypt each file and encrypt again when doing CREATE
DATABASE?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Optimize partial TOAST decompression

2019-07-09 Thread Tomas Vondra

On Sat, Jul 06, 2019 at 05:23:37PM +0200, Tomas Vondra wrote:

On Sat, Jul 06, 2019 at 02:27:56AM +0800, Binguo Bao wrote:

Hi, Tomas!
Thanks for your testing and the suggestion.

That's quite bizarre behavior - it does work with a prefix, but not with

suffix. And the exact ERROR changes after the prefix query.



I think bug is caused by "#2  0x004c3b08 in
heap_tuple_untoast_attr_slice (attr=, sliceoffset=0,
slicelength=-1) at tuptoaster.c:315",
since I ignore the case where slicelength is negative, and I've appended
some comments for heap_tuple_untoast_attr_slice for the case.

FWIW the new code added to this

function does not adhere to our code style, and would deserve some
additional explanation of what it's doing/why. Same for the
heap_tuple_untoast_attr_slice, BTW.



I've added more comments to explain the code's behavior.
Besides, I also modified the macro "TOAST_COMPRESS_RAWDATA" to
"TOAST_COMPRESS_DATA" since
it is used to get toast compressed data rather than raw data.



Thanks, this seems to address the issue - I can no longer reproduce the
crashes, allowing the benchmark to complete. I'm attaching the script I
used and spreadsheet with a summary of results.

For the cases I've tested (100k - 10M values, different compressibility,
different prefix/length values), the results are kinda mixed - many
cases got much faster (~2x), but other cases got slower too. We're
however talking about queries taking a couple of milliseconds, so in
absolute numbers the differences are small.

That does not mean the optimization is useless - but the example shared
at the beginning of this thread is quite extreme, as the values are
extremely compressible. Each value is ~38MB (in plaintext), but a table
with 100 such values has only ~40MB. That's a 100:1 compression ratio,
which I think is not typical for real-world data sets.
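
To make the compressibility point concrete, the extreme case looks roughly
like this (sizes are illustrative, not the exact data from the thread):

-- a ~38MB value made of one repeated character compresses to almost nothing
CREATE TABLE toast_demo (id int, payload text);
INSERT INTO toast_demo
SELECT g, repeat('x', 38 * 1024 * 1024)
FROM generate_series(1, 10) AS g;

-- prefix access, which is what the patch optimizes: only the beginning
-- of the compressed datum needs to be decompressed
SELECT substr(payload, 1, 100) FROM toast_demo WHERE id = 1;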

The data I've used are less extreme, depending on the fraction of random
data in values.

I went through the code too. I've reworded a couple of comments and code
style issues, but there are a couple of more serious issues.


1) Why rename TOAST_COMPRESS_RAWDATA to TOAST_COMPRESS_DATA?

This seems unnecessary, and it discards the clear hint that it's about
accessing the *raw* data, and the relation to TOAST_COMPRESS_RAWSIZE.
IMHO we should keep the original naming.


2) pglz_maximum_compressed_size signatures are confusing

There are two places with pglz_maximum_compressed_size signature, and
those places are kinda out of sync when it comes to parameter names:

int32
pglz_maximum_compressed_size(int32 raw_slice_size,
 int32 total_compressed_size)

extern
int32 pglz_maximum_compressed_size(int32 raw_slice_size,
   int32 raw_size);

Also, pg_lzcompress.c has no concept of a "slice" because it only deals
with simple compression, slicing is responsibility of the tuptoaster. So
we should not mix those two, not even in comments.


I propose tweaks per the attached patch - I think it makes the code
clearer, and it's mostly cosmetic stuff. But I haven't tested the
changes beyond "it compiles".


regards



FWIW I've done another round of tests with larger values (up to ~10MB)
and the larger the values the better the speedup (a bit as expected).
Attached is the script I used and a spreadsheet with a summary.

We still need a patch addressing the review comments, though.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


random-bench.sh
Description: Bourne shell script


results-large.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

2019-07-09 Thread Tomas Vondra

On Sun, Jul 07, 2019 at 11:06:38PM +, Tom Mercha wrote:

On 06/07/2019 00:06, Tomas Vondra wrote:

First of all, it's pretty difficult to follow the discussion when it's
not clear what's the original message and what's the response. E-mail
clients generally indent the original message with '>' or someting like
that, but your client does not do that (which is pretty silly). And
copying the message at the top does not really help. Please do something
about that.


I would like to apologise. I did not realize that my client was doing
that and now I have changed the client. I hope it's fine now.



Thanks, seems fine now.



On Fri, Jul 05, 2019 at 09:37:03PM +, Tom Mercha wrote:

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.


I was considering having two distinct grammars as an option - thanks
for indicating the effort involved. At the end of the day I want both
my DSL and the PostgreSQL grammars to coexist. Is extending
PostgreSQL's grammar with my own through the PostgreSQL extension
infrastructure worth consideration or is it also difficult to develop?
Could you suggest any reference material on this topic?



Well, I'm not an expert in that area, but we currently don't have any
infrastructure to support that. It's a topic that was discussed in the
past (perhaps you can find some references in the archives) and it
generally boils down to:

1) We're using bison as parser generator.
2) Bison does not allow adding rules on the fly.

So you have to modify the in-core src/backend/parser/gram.y and rebuild
postgres. See for example this discussion:

https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com


When two of the smartest people on the list say it's a hard problem, it
probably is. Particularly for someone who does not know the internals.

You are right. Thanks for bringing it to my attention!

I didn't design my language for interaction with triggers and whatnot,
but I think that it would be very interesting to support those as well,
so looking at CREATE LANGUAGE functionality is actually exciting and
appropriate once I make some changes in design. Thanks again for this point!



;-)


I hope this is not off topic but I was wondering if you know what are
the intrinsic differences between HANDLER and INLINE parameters of
CREATE LANGUAGE? I know that they are functions which are invoked at
different instances of time (e.g. one is for handling anonymous code
blocks), but at the end of the day they seem to have the same purpose?



I've never written any PL handler, so I don't know. All I know is this
quote from the docs, right below the simple example of PL handler:

   Only a few thousand lines of code have to be added instead of the
   dots to complete the call handler.

I suppose the best idea to start an implementation is to copy an
existing PL implementation, and modify that. That's usually much easier
than starting from scratch, because you have something that works. Not
sure if PL/pgSQL is the right choice though, perhaps pick some other
language from https://wiki.postgresql.org/wiki/PL_Matrix


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:

On 2019-Jul-09, Joe Conway wrote:


> Or you could just encrypt them with a different key, and you would not
> need to make database OID part of the nonce.

Yeah that was pretty much exactly what I was trying to say above ;-)


So you need to decrypt each file and encrypt again when doing CREATE
DATABASE?



The question is whether we actually need to do that? Do we change OIDs
of relations when creating the database? If not, we don't need to
re-encrypt because having copies of the same block encrypted with the
same nonce is not an issue (just like copying encrypted files is not an
issue).

Of course, we may need a CREATE DATABASE option that would force
re-encryption with a different key, but it's not necessary because of
nonces or whatnot.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Alvaro Herrera
On 2019-Jul-09, Tomas Vondra wrote:

> On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
> > On 2019-Jul-09, Joe Conway wrote:
> > 
> > > > Or you could just encrypt them with a different key, and you would not
> > > > need to make database OID part of the nonce.
> > > 
> > > Yeah that was pretty much exactly what I was trying to say above ;-)
> > 
> > So you need to decrypt each file and encrypt again when doing CREATE
> > DATABASE?
> 
> The question is whether we actually need to do that?

I mean if the new database is supposed to be encrypted with key B, you
can't just copy the files from the other database, since they are
encrypted with key A, right?  Even if you consider that both copies of
each table have the same OID and each block has the same nonce.

> Do we change OIDs of relations when creating the database? If not, we
> don't need to re-encrypt because having copies of the same block
> encrypted with the same nonce is not an issue (just like copying
> encrypted files is not an issue).

Are you thinking that the files can be decrypted by the two keys
somehow?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
> >On 2019-Jul-09, Joe Conway wrote:
> >
> >>> Or you could just encrypt them with a different key, and you would not
> >>> need to make database OID part of the nonce.
> >>
> >>Yeah that was pretty much exactly what I was trying to say above ;-)
> >
> >So you need to decrypt each file and encrypt again when doing CREATE
> >DATABASE?
> 
> The question is whether we actually need to do that? Do we change OIDs
> of relations when creating the database? If not, we don't need to
> re-encrypt because having copies of the same block encrypted with the
> same nonce is not an issue (just like copying encrypted files is not an
> issue).
> 
> Of course, we may need a CREATE DATABASE option that would force
> re-encryption with a different key, but it's not necessary because of
> nonces or whatnot.

This also depends on if we actually encrypt the template databases.
Seems like that could be optional, if we're supporting different keys
for different databases.

In that case we'd need the "encrypt this database" option during CREATE
DATABASE, of course.

Thanks,

Stephen




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 03:50:39PM -0400, Bruce Momjian wrote:

On Tue, Jul 9, 2019 at 02:09:38PM -0400, Joe Conway wrote:

On 7/9/19 11:11 AM, Bruce Momjian wrote:
> Good point about nonce and IV.  I wonder if running the nonce
> through the cipher with the key makes it random enough to use as an
> IV.

Based on that NIST document it seems so.

The trick will be to be 100% sure we never reuse a nonce that is used
to produce the IV when using the same key.

I think the potential to get that wrong (i.e. inadvertently reuse a
nonce) would lead to using the second described method

  "The second method is to generate a random data block using a
  FIPS-approved random number generator."

That method is what I am used to seeing. But with the second method
we need to store the IV, with the first we could reproduce it if we
select our initial nonce carefully.

So thinking out loud, and perhaps you already said this Bruce, but I
guess the input nonce used to generate the IV could be something like
pg_class.oid and blocknum concatenated together with some delimiting
character as long as we guarantee that we generate different keys in
different databases. Then there would be no need to store the IV since
we could reproduce it.


Uh, yes, and no.  Yes, we can use the pg_class.oid (since it has to
be preserved by pg_upgrade anyway), and the page number.  However,
different databases can have the same pg_class.oid/page number
combination, so there would be duplication between databases.  Now, you
might say let's add the pg_database.oid, but unfortunately, because of
the way we file-system-copy files from one database to another during
database creation (it doesn't go through shared buffers), we can't use
pg_database.oid as part of the nonce.

My only idea here is that we actually decrypt/re-encrypt pages as we
copy them at the file system level during database creation to match the
new pg_database.oid.  This would allow pg_database.oid in the nonce/IV.
(I think we will need to modify pg_upgrade to preserve pg_database.oid.)

If the nonce/IV is 96 bits, then that is 12 bytes or 3 4-byte values.
pg_class.oid is 4 bytes, pg_database.oid is 4 bytes, and that leaves
4-bytes for the block number, which gets us to 32TB before the page
counter would overflow a 4-byte value, and our max table size is 32TB
anyway, so that all works.



I don't think that works, because that'd mean we're encrypting the same
page with the same nonce over and over, which means reusing the nonce
(even if you hash/encrypt it). Or did I miss something?

There are two basic ways to construct nonces - CSPRNG and sequences, and
then a combination of both, i.e. one part is generated from a sequence
and one randomly.

FWIW not sure using OIDs as nonces directly is a good idea, as those are
inherently low entropy data - how often do you see databases with OIDs
above 1M or so? Probably not very often, and in most cases those are
databases where those OIDs are for OIDs and large objects, so irrelevant
for this purpose. I might be wrong but having a 96-bit nonce with maybe
just 32 bits of entropy seems suspicious.

That does not mean we can't use the OIDs at all, but maybe hashing them
into a single 4B value, and then picking the remaining 8B randomly.
Also, we have a "natural" sequence in the database - LSNs, maybe that
would be a good source of nonces too?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Tomas Vondra

On Tue, Jul 09, 2019 at 05:31:49PM -0400, Alvaro Herrera wrote:

On 2019-Jul-09, Tomas Vondra wrote:


On Tue, Jul 09, 2019 at 05:06:45PM -0400, Alvaro Herrera wrote:
> On 2019-Jul-09, Joe Conway wrote:
>
> > > Or you could just encrypt them with a different key, and you would not
> > > need to make database OID part of the nonce.
> >
> > Yeah that was pretty much exactly what I was trying to say above ;-)
>
> So you need to decrypt each file and encrypt again when doing CREATE
> DATABASE?

The question is whether we actually need to do that?


I mean if the new database is supposed to be encrypted with key B, you
can't just copy the files from the other database, since they are
encrypted with key A, right?  Even if you consider that both copies of
each table have the same OID and each block has the same nonce.



Sure, if the databases are supposed to be encrypted with different keys,
then we may need to re-encrypt the files. I don't see a way around that,
but maybe we could use the scheme with master key somehow.


Do we change OIDs of relations when creating the database? If not, we
don't need to re-encrypt because having copies of the same block
encrypted with the same nonce is not an issue (just like copying
encrypted files is not an issue).


Are you thinking that the files can be decrypted by the two keys
somehow?



No, I was kinda assuming the database will start with the same key, but
that might have been a silly idea.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: progress report for ANALYZE

2019-07-09 Thread Alvaro Herrera
On 2019-Jul-08, Robert Haas wrote:

> On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera  
> wrote:
> > Yeah, I got the impression that that was determined to be the desirable
> > behavior, so I made it do that, but I'm not really happy about it
> > either.  We're not too late to change the CREATE INDEX behavior, but
> > let's discuss what is it that we want.
> 
> I don't think I intended to make any such determination -- which
> commit do you think established this as the canonical behavior?

No commit, just discussion in the CREATE INDEX thread.

> I propose that once a field is set, we should leave it set until the end.

Hmm, ok.  In CREATE INDEX, we use the block counters multiple times. We
can leave them set until the next time we need them, I suppose.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pg_receivewal documentation

2019-07-09 Thread Laurenz Albe
Jesper Pedersen wrote:
> Thanks for the review, and the changes.
> 
> However, I think it belongs in the --synchronous section, so what about 
> moving it there as attached ?

Works for me.

Marked as "ready for committer".

Yours,
Laurenz Albe





Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-09 Thread Alvaro Herrera
On 2019-Jul-09, Amit Langote wrote:

> As mentioned in the docs, defining exclusion constraints on
> partitioned tables is not supported.

Right.

> "While defining a primary key and unique constraints on partitioned
> tables is supported, the set of columns being constrained must include
> all of the partition key columns.  This limitation exists because
> PostgreSQL can ensure uniqueness only
> across a given partition."

I feel that PKs are mostly a special case of UNIQUE keys, so I tend to
mention UNIQUE as the central element and let PKs fall out from that.
That's a mild personal preference only though.  Anyway, based on your
proposed wording, I wrote this:

 
  
   Unique constraints on partitioned tables (as well as primary keys)
   must constrain all the partition key columns.  This limitation exists
   because PostgreSQL can only enforce
   uniqueness in each partition individually.
  
 

I'm not really sure about the "must constrain" verbiage.  Is that really
comprehensible?  Also, I chose to place it just above the existing para
that mentions FK limitations, which reads:

 
  
   While primary keys are supported on partitioned tables, foreign
   keys referencing partitioned tables are not supported.  (Foreign key
   references from a partitioned table to some other table are supported.)
  

Your proposed wording seemed to use too many of the same words, which
prompted me to change it a bit.  Maybe I read too many novels and not
enough technical literature.

In CREATE TABLE, we already have this:
 
  When establishing a unique constraint for a multi-level partition
  hierarchy, all the columns in the partition key of the target
  partitioned table, as well as those of all its descendant partitioned
  tables, must be included in the constraint definition.
 

which may not be the pinnacle of clarity, but took some time to craft
and I think is correct.  Also it doesn't mention primary keys
explicitly; maybe we should patch it by adding "(as well as a primary
key)" right after "a unique constraint".  Thoughts?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Postgres 11: Table Partitioning and Primary Keys

2019-07-09 Thread Tom Lane
Alvaro Herrera  writes:
> That's a mild personal preference only though.  Anyway, based on your
> proposed wording, I wrote this:

>  
>   
>Unique constraints on partitioned tables (as well as primary keys)
>must constrain all the partition key columns.  This limitation exists
>because PostgreSQL can only enforce
>uniqueness in each partition individually.
>   
>  

> I'm not really sure about the "must constrain" verbiage.  Is that really
> comprehensible?

I think "must include" might be better.

> In CREATE TABLE, we already have this:
>  
>   When establishing a unique constraint for a multi-level partition
>   hierarchy, all the columns in the partition key of the target
>   partitioned table, as well as those of all its descendant partitioned
>   tables, must be included in the constraint definition.
>  

> which may not be the pinnacle of clarity, but took some time to craft
> and I think is correct.  Also it doesn't mention primary keys
> explicitly; maybe we should patch it by adding "(as well as a primary
> key)" right after "a unique constraint".  Thoughts?

I'd leave that alone.  I don't think the parenthetical comment about
primary keys in your new text is adding much either.

regards, tom lane
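
For readers who have not hit the limitation being documented, the behavior
is easy to demonstrate (a sketch; the error text is approximate):

-- rejected: the partition key column is not part of the primary key
CREATE TABLE measurement (
    id  bigint,
    ts  timestamptz NOT NULL,
    val numeric,
    PRIMARY KEY (id)
) PARTITION BY RANGE (ts);
-- ERROR:  insufficient columns in PRIMARY KEY constraint definition

-- accepted: the constraint includes the partition key column
CREATE TABLE measurement (
    id  bigint,
    ts  timestamptz NOT NULL,
    val numeric,
    PRIMARY KEY (id, ts)
) PARTITION BY RANGE (ts);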




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Joe Conway
On 7/9/19 5:42 PM, Tomas Vondra wrote:
> There are two basic ways to construct nonces - CSPRNG and sequences, and
> then a combination of both, i.e. one part is generated from a sequence
> and one randomly.
> 
> FWIW not sure using OIDs as nonces directly is a good idea, as those are
> inherently low entropy data - how often do you see databases with OIDs
> above 1M or so? Probably not very often, and in most cases those are
> databases where those OIDs are for OIDs and large objects, so irrelevant
> for this purpose. I might be wrong but having a 96-bit nonce with maybe
> just 32 bits of entropy seems suspicious.
> 
> That does not mean we can't use the OIDs at all, but maybe hashing them
> into a single 4B value, and then picking the remaining 8B randomly.
> Also, we have a "natural" sequence in the database - LSNs, maybe that
> would be a good source of nonces too?

I think you missed the quoted part (upthread) from the NIST document:

  "There are two recommended methods for generating unpredictable IVs.
   The first method is to apply the forward cipher  function, under the
   same key that is used for the encryption of the plaintext, to a
   nonce. The nonce must be a data block that is unique to each
   execution of the encryption operation. For example, the nonce may be
   a counter, as described in Appendix B, or a message number. The
   second method is to generate a random data block using a
   FIPS-approved random number generator."

That first method says a counter as input produces an acceptably
unpredictable IV as long as it is unique to each encryption operation.
If each page is going to be an "encryption operation", then as long as our
input nonce is unique for a given key, we should be ok. If the input
nonce is tableoid+pagenum and the key is different per database (at
least, hopefully different per tablespace too), we should be good to go,
at least from what I can see.

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Stephen Frost
Greetings,

* Joe Conway (m...@joeconway.com) wrote:
> On 7/9/19 5:42 PM, Tomas Vondra wrote:
> > There are two basic ways to construct nonces - CSPRNG and sequences, and
> > then a combination of both, i.e. one part is generated from a sequence
> > and one randomly.
> > 
> > FWIW not sure using OIDs as nonces directly is a good idea, as those are
> > inherently low entropy data - how often do you see databases with OIDs
> > above 1M or so? Probably not very often, and in most cases those are
> > databases where those OIDs are for OIDs and large objects, so irrelevant
> > for this purpose. I might be wrong but having a 96-bit nonce with maybe
> > just 32 bits of entropy seems suspicious.
> > 
> > That does not mean we can't use the OIDs at all, but maybe hashing them
> > into a single 4B value, and then picking the remaining 8B randomly.
> > Also, we have a "natural" sequence in the database - LSNs, maybe that
> > would be a good source of nonces too?
> 
> I think you missed the quoted part (upthread) from the NIST document:
> 
>   "There are two recommended methods for generating unpredictable IVs.
>The first method is to apply the forward cipher  function, under the
>same key that is used for the encryption of the plaintext, to a
>nonce. The nonce must be a data block that is unique to each
>execution of the encryption operation. For example, the nonce may be
>a counter, as described in Appendix B, or a message number. The
>second method is to generate a random data block using a
>FIPS-approved random number generator."
> 
> That first method says a counter as input produces an acceptably
> unpredictable IV as long as it is unique to each encryption operation.
> If each page is going to be an "encryption operation", then as long as our
> input nonce is unique for a given key, we should be ok. If the input
> nonce is tableoid+pagenum and the key is different per database (at
> least, hopefully different per tablespace too), we should be good to go,
> at least from what I can see.

What I think Tomas is getting at here is that we don't write a page only
once.

A nonce of tableoid+pagenum will only be unique the first time we write
out that page.  Seems unlikely that we're only going to be writing these
pages once though- what we need is a nonce that's unique for *every
write* of the 8k page, isn't it?  As every write of the page is going to
be encrypting something new.

With sufficient randomness, we can at least be more likely to have a
unique nonce for each 8K write.  Including the LSN seems like it'd be a
possible alternative.

Thanks,

Stephen




Re: warning to publication created and wal_level is not set to logical

2019-07-09 Thread Thomas Munro
On Tue, Jul 9, 2019 at 5:40 PM Lucas Viecelli  wrote:
> Follow the correct file, I added the wrong patch in the previous email

New status: Ready for Committer.  If nobody wants to bikeshed the
wording or other details, I will commit this tomorrow.

-- 
Thomas Munro
https://enterprisedb.com




Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

2019-07-09 Thread Tom Mercha
On 09/07/2019 23:22, Tomas Vondra wrote:
> On Sun, Jul 07, 2019 at 11:06:38PM +, Tom Mercha wrote:
>> On 06/07/2019 00:06, Tomas Vondra wrote:
>>> First of all, it's pretty difficult to follow the discussion when it's
>>> not clear what's the original message and what's the response. E-mail
>>> clients generally indent the original message with '>' or someting like
>>> that, but your client does not do that (which is pretty silly). And
>>> copying the message at the top does not really help. Please do something
>>> about that.
>>
>> I would like to apologise. I did not realize that my client was doing
>> that and now I have changed the client. I hope it's fine now.
>>
> 
> Thanks, seems fine now.
> 
>>>
>>> On Fri, Jul 05, 2019 at 09:37:03PM +, Tom Mercha wrote:
> I might be missing something, but it seems like you intend to replace
> the SQL grammar we have with something else. It's not clear to me what
> would be the point of doing that, and it definitely looks like a huge
> amount of work - e.g. we don't have any support for switching between
> two distinct grammars the way you envision, and just that alone seems
> like a multi-year project. And if you don't have that capability, all
> external tools kinda stop working. Good luck with running such 
> database.

 I was considering having two distinct grammars as an option - thanks
 for indicating the effort involved. At the end of the day I want both
 my DSL and the PostgreSQL grammars to coexist. Is extending
 PostgreSQL's grammar with my own through the PostgreSQL extension
 infrastructure worth consideration or is it also difficult to develop?
 Could you suggest any reference material on this topic?

>>>
>>> Well, I'm not an expert in that area, but we currently don't have any
>>> infrastructure to support that. It's a topic that was discussed in the
>>> past (perhaps you can find some references in the archives) and it
>>> generally boils down to:
>>>
>>> 1) We're using bison as parser generator.
>>> 2) Bison does not allow adding rules on the fly.
>>>
>>> So you have to modify the in-core src/backend/parser/gram.y and rebuild
>>> postgres. See for example for an example of such discussion
>>>
>>> https://www.postgresql.org/message-id/flat/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_%3D5i5C2jog%40mail.gmail.com
>>>  
>>>
>>>
>>>
>>> When two of the smartest people on the list say it's a hard problem, it
>>> probably is. Particularly for someone who does not know the internals.
>> You are right. Thanks for bringing it to my attention!
>>
>> I didn't design my language for interaction with triggers and whatnot,
>> but I think that it would be very interesting to support those as well,
>> so looking at CREATE LANGUAGE functionality is actually exciting and
>> appropriate once I make some changes in design. Thanks again for this 
>> point!
>>
> 
> ;-)
> 
>> I hope this is not off topic but I was wondering if you know what are
>> the intrinsic differences between HANDLER and INLINE parameters of
>> CREATE LANGUAGE? I know that they are functions which are invoked at
>> different instances of time (e.g. one is for handling anonymous code
>> blocks), but at the end of the day they seem to have the same purpose?
>>
> 
> I've never written any PL handler, so I don't know. All I know is this
> quote from the docs, right below the simple example of PL handler:
> 
>     Only a few thousand lines of code have to be added instead of the
>     dots to complete the call handler.
> 
> I suppose the best idea to start an implementation is to copy an
> existing PL implementation, and modify that. That's usually much easier
> than starting from scratch, because you have something that works. Not
> sure if PL/pgSQL is the right choice though, perhaps pick some other
> language from https://wiki.postgresql.org/wiki/PL_Matrix
> 

I understand that you never wrote any PL handler but was just thinking 
about this functionality as a follow-up to our conversation. I was just 
wondering whether anonymous DO blocks *must* return void or not?

The docs for DO say it is a function returning void - 
https://www.postgresql.org/docs/current/sql-do.html

But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return 
void' - https://www.postgresql.org/docs/current/sql-createlanguage.html

Is the implication that we can make the DO block return something 
somehow? I would be quite interested if there is a way of achieving this 
kind of functionality. My experiments using an SRF, which I have 
written, within an anonymous DO block just gives me an "ERROR: 
set-valued function called in context that cannot accept a set".

Anyway maybe I'm going off on a tangent here... perhaps it is better to 
open a new thread?

> 
> regards
> 


Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

2019-07-09 Thread David G. Johnston
On Tue, Jul 9, 2019 at 5:23 PM Tom Mercha  wrote:

>
> I understand that you never wrote any PL handler but was just thinking
> about this functionality as a follow-up to our conversation. I was just
> wondering whether anonymous DO blocks *must* return void or not?
>
> The docs for DO say it is a function returning void -
> https://www.postgresql.org/docs/current/sql-do.html



>

> But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return
> void' - https://www.postgresql.org/docs/current/sql-createlanguage.html


No, the language cannot override the SQL execution environment's
limitations.

"The code block is treated as though it were the body of a function with no
parameters, returning void. It is parsed and executed a single time."

The above applies regardless of the language the code block is written in.

It can, however, affect permanent session state (so, use tables).

David J.


Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

2019-07-09 Thread Tom Mercha
On 10/07/2019 02:31, David G. Johnston wrote:
> On Tue, Jul 9, 2019 at 5:23 PM Tom Mercha  wrote:
> 
>>
>> I understand that you never wrote any PL handler but was just thinking
>> about this functionality as a follow-up to our conversation. I was just
>> wondering whether anonymous DO blocks *must* return void or not?
>>
>> The docs for DO say it is a function returning void -
>> https://www.postgresql.org/docs/current/sql-do.html
> 
> 
> 
>>
> 
>> But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return
>> void' - https://www.postgresql.org/docs/current/sql-createlanguage.html
> 
> 
> No, the language cannot override the SQL execution environment's
> limitations.
> 
> "The code block is treated as though it were the body of a function with no
> parameters, returning void. It is parsed and executed a single time."
> 
> The above applies regardless of the language the code block is written in.
> 
> It can, however, affect permanent session state (so, use tables).
> 

Thank you very much for addressing the question.

I am still a bit of a novice with PostgreSQL internals. Could you please 
provide some more detail on your comment regarding affecting permanent 
session state?

> David J.
> 


Re: warning to publication created and wal_level is not set to logical

2019-07-09 Thread Tom Lane
Thomas Munro  writes:
> New status: Ready for Committer.  If nobody wants to bikeshed the
> wording or other details, I will commit this tomorrow.

Hm, so:

1.

+   errmsg("insufficient wal_level to publish logical 
changes"),

Might read better as "wal_level is insufficient to publish logical changes"?

2.

+   errhint("Set wal_level to logical before creating 
subscriptions")));

This definitely is not per style guidelines, needs a trailing period.

3. AFAICS, the proposed test case changes will cause the core regression
tests to fail if wal_level is not replica.  This is not true today ---
they pass regardless of wal_level --- and I object in the strongest terms
to making it otherwise.

I'm not really convinced that we need regression tests for this change at
all, but if we do, put them in one of the TAP replication test suites,
which already depend on wal_level being set to something in particular.

regards, tom lane
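
For context, the user-visible behavior under discussion would look roughly
like this (message wording per the suggestions above, so treat it as a
sketch rather than the final text):

-- with wal_level = replica (the default), the publication is created,
-- but the new warning points out that logical decoding cannot run
CREATE PUBLICATION all_tables FOR ALL TABLES;
-- WARNING:  wal_level is insufficient to publish logical changes
-- HINT:  Set wal_level to logical before creating subscriptions.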




coypu: "FATAL: sorry, too many clients already"

2019-07-09 Thread Thomas Munro
Hello,

Several times on master[1] beginning with an initial occurrence 36
days ago, and every time on REL_12_STABLE[2], but not on older
branches, build farm animal coypu has failed in the regression tests
with the error given in the subject.  How can there be too many if
there are only 20 in a parallel group?

[1] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=coypu&br=HEAD
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=coypu&br=REL_12_STABLE

-- 
Thomas Munro
https://enterprisedb.com




Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

2019-07-09 Thread David G. Johnston
On Tue, Jul 9, 2019 at 5:43 PM Tom Mercha  wrote:

> I am still a bit of a novice with PostgreSQL internals. Could you please
> provide some more detail on your comment regarding affecting permanent
> session state?


I was not referring to internals.

BEGIN;
CREATE TEMP TABLE tempdo (id int);
DO $$
BEGIN
INSERT INTO tempdo VALUES (1);
END;
$$;
SELECT * FROM tempdo;
ROLLBACK;

David J.


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-07-09 Thread Ryan Lambert
> What I think Tomas is getting at here is that we don't write a page only
> once.

> A nonce of tableoid+pagenum will only be unique the first time we write
> out that page.  Seems unlikely that we're only going to be writing these
> pages once though- what we need is a nonce that's unique for *every
> write* of the 8k page, isn't it?  As every write of the page is going to
>  be encrypting something new.

> With sufficient randomness, we can at least be more likely to have a
> unique nonce for each 8K write.  Including the LSN seems like it'd be a
> possible alternative.

Agreed.  I know little of the inner details about the LSN but what I read
in [1] sounds encouraging in addition to tableoid + pagenum.

[1] https://www.postgresql.org/docs/current/datatype-pg-lsn.html

Ryan Lambert


>

