Re: Performance issue in foreign-key-aware join estimation

2019-06-16 Thread David Rowley
On Sat, 25 May 2019 at 16:37, David Rowley  wrote:
> The only problem I see is that you're not performing a bms_copy() of
> the parent's eclass_indexes in add_child_rel_equivalences(). You've
> written a comment that claims it's fine to just point to the parent's
> in:
>
> /*
> * The child is now mentioned in all the same eclasses as its parent ---
> * except for corner cases such as a volatile EC.  But it's okay if
> * eclass_indexes lists too many rels, so just borrow the parent's index
> * set rather than making a new one.
> */
> child_rel->eclass_indexes = parent_rel->eclass_indexes;
>
> but that's not true since in get_eclass_for_sort_expr() we perform:
>
> rel->eclass_indexes = bms_add_member(rel->eclass_indexes,
> ec_index);
>
> which will work fine in about 63 out of 64 cases, but once
> bms_add_member has to reallocate the set then it'll leave the child
> rel's eclass_indexes pointing to freed memory.  I'm not saying I have
> any reproducible test case that can crash the backend from this, but
> it does seem like a bug waiting to happen.
>
> Apart from that, as far as I can tell, the patch seems fine.
>
> I didn't add the bms_copy(). I'll wait for your comments before doing so.

I've rebased this on top of the current master. d25ea0127 conflicted
with the old version.

I also tried to get the planner to crash by trying to find a case
where a new eclass is added after setting the child relation's
eclass_indexes. I thought this could be done via a call to
make_pathkey_from_sortinfo(), but I was unable to find any case that
passes create_it as true. I added some code to emit a warning when
this happens after a call to add_child_rel_equivalences() and found
that the warning wasn't raised after running make check. However, I'm
still pretty scared by not making a copy of the Bitmapset. It seems
like if anything ever changed in this area then we could end up with
some very rare crashes if the parent's eclass_indexes grew another
bitmapword since the child took its copy of them, so I added the
bms_copy() in the attached version.
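For illustration, here is the shape of the hazard and the fix, with my
own comments added (bms_copy() and bms_add_member() are the real
bitmapset functions; the surrounding code is add_child_rel_equivalences()):

/*
 * Hazard: borrowing the parent's pointer means both rels share one
 * Bitmapset:
 *
 *     child_rel->eclass_indexes = parent_rel->eclass_indexes;
 *
 * If a new eclass is later recorded against the parent, bms_add_member()
 * may repalloc the set when it needs another bitmapword, leaving the
 * child pointing at freed memory:
 *
 *     parent_rel->eclass_indexes =
 *         bms_add_member(parent_rel->eclass_indexes, ec_index);
 *
 * Taking an independent copy at child-creation time avoids the dangling
 * pointer at the cost of one small allocation:
 */
child_rel->eclass_indexes = bms_copy(parent_rel->eclass_indexes);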

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


eclass_indexes_v7.patch
Description: Binary data


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Joe Conway
On 6/15/19 9:28 PM, Bruce Momjian wrote:
> On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
>> On 6/13/19 11:07 AM, Bruce Momjian wrote:
>> > On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
>> >> Yeah, in principle since data key of 2 tier key architecture should
>> >> not go outside database I think we should not tell data keys to
>> >> utility commands. So the rearranging WAL format seems to be a better
>> >> solution but is there any reason why the main data is placed at end of
>> >> WAL record? I wonder if we can assemble WAL records as following order
>> >> and encrypt only 3 and 4.
>> >> 
>> >> 1. Header data (XLogRecord and other headers)
>> >> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
>> >> 3. Block data (Tuple data, FPI)
>> >> 4. Sub data (e.g tuple data for logical decoding)
>> > 
>> > Yes, that does sound like a reasonable idea.  It is similar to us not
>> > encrypting the clog --- there is little value.  However, if we only
>> > encrypt the cluster, we don't need to expose the relfilenode and we can
>> > just encrypt the entire WAL --- I like that simplicity.  We might find
>> > that the complexity of encrypting only certain tablespaces makes the
>> > system slower than just encrypting the entire cluster.
>> 
>> 
>> There are reasons other than performance to want more granular than
>> entire cluster encryption. Limiting the volume of encrypted data with
>> any one key for example. And not encrypting #1 & 2 above would help
>> avoid known-plaintext attacks I would think.
> 
> There are no known non-exhaustive plaintext attacks on AES:
> 
>   
> https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks

Even that non-authoritative stackexchange thread has varying opinions.
Surely you don't claim that limiting known plaintext as much as is
practical is a bad idea in general.

> Even if we don't encrypt the first part of the WAL record (1 & 2), the
> block data (3) probably has enough format for a plaintext attack.

Perhaps.

In any case it doesn't address my first point, which is limiting the
volume encrypted with the same key. Another valid reason is you might
have data at varying sensitivity levels and prefer different keys be
used for each level.

And although I'm not proposing this for the first implementation, yet
another reason is I might want to additionally control "transparent
access" to data based on who is logged in. That could be done by
layering an additional key on top of the per-tablespace key for example.

The bottom line in my mind is encrypting the entire database with a
single key is not much different/better than using filesystem
encryption, so I'm not sure it is worth the effort and complexity to get
that capability. I think having the ability to encrypt at the tablespace
level adds a lot of capability for minimal extra complexity.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Bruce Momjian
On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> On 6/15/19 9:28 PM, Bruce Momjian wrote:
> >> There are reasons other than performance to want more granular than
> >> entire cluster encryption. Limiting the volume of encrypted data with
> >> any one key for example. And not encrypting #1 & 2 above would help
> >> avoid known-plaintext attacks I would think.
> > 
> > There are no known non-exhaustive plaintext attacks on AES:
> > 
> > 
> > https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks
> 
> Even that non-authoritative stackexchange thread has varying opinions.
> Surely you don't claim that limiting known plaintext as much as is
> practical is a bad idea in general.

I think we have to look at complexity vs. benefit.

> > Even if we don't encrypt the first part of the WAL record (1 & 2), the
> > block data (3) probably has enough format for a plaintext attack.
> 
> Perhaps.
> 
> In any case it doesn't address my first point, which is limiting the
> volume encrypted with the same key. Another valid reason is you might
> have data at varying sensitivity levels and prefer different keys be
> used for each level.

That seems quite complex.

> And although I'm not proposing this for the first implementation, yet
> another reason is I might want to additionally control "transparent
> access" to data based on who is logged in. That could be done by
> layering an additional key on top of the per-tablespace key for example.
> 
> The bottom line in my mind is encrypting the entire database with a
> single key is not much different/better than using filesystem
> encryption, so I'm not sure it is worth the effort and complexity to get
> that capability. I think having the ability to encrypt at the tablespace
> level adds a lot of capability for minimal extra complexity.

I disagree.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Bruce Momjian
On Sun, Jun 16, 2019 at 09:45:09AM -0400, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> > And although I'm not proposing this for the first implementation, yet
> > another reason is I might want to additionally control "transparent
> > access" to data based on who is logged in. That could be done by
> > layering an additional key on top of the per-tablespace key for example.
> > 
> > The bottom line in my mind is encrypting the entire database with a
> > single key is not much different/better than using filesystem
> > encryption, so I'm not sure it is worth the effort and complexity to get
> > that capability. I think having the ability to encrypt at the tablespace
> > level adds a lot of capability for minimal extra complexity.
> 
> I disagree.

I will add that OpenSSL has been removing features and compatibility
because the added complexity had hidden exploits that they could not
have anticipated.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Index Skip Scan

2019-06-16 Thread Dmitry Dolgov
> On Fri, Jun 14, 2019 at 9:20 AM David Rowley  wrote:
>
> The code in create_distinct_paths() I think should work a different
> way. I think it would be much better to add a new field to Path and
> allow a path to know what keys it is distinct for.  This sort of goes
> back to an idea I thought about when developing unique joins
> (9c7f5229ad) about an easier way to detect fields that a relation is
> unique for. I've been calling these "UniqueKeys" in a few emails [1].
> The idea was to tag these onto RelOptInfo to mention which columns or
> exprs a relation is unique by so that we didn't continuously need to
> look at unique indexes in all the places that call
> relation_has_unique_index_for().  The idea there was that unique joins
> would know when a join was unable to duplicate rows. If the outer side
> of a join didn't duplicate the inner side, then the join RelOptInfo
> could keep the UniqueKeys from the inner rel, and vice-versa. If both
> didn't duplicate then the join rel would obtain the UniqueKeys from
> both sides of the join.  The idea here is that this would be a better
> way to detect unique joins, and also when it came to the grouping
> planner we'd also know if the distinct or group by should be a no-op.
> DISTINCT could be skipped, and GROUP BY could do a group aggregate
> without any sort.
>
> I think these UniqueKeys tie into this work, perhaps not adding
> UniqueKeys to RelOptInfo, but just to Path so that we create paths
> that have UniqueKeys during create_index_paths() based on some
> uniquekeys that are stored in PlannerInfo, similar to how we create
> index paths in build_index_paths() by checking if the index
> has_useful_pathkeys().  Doing it this way would open up more
> opportunities to use skip scans. For example, semi-joins and
> anti-joins could make use of them if the uniquekeys covered the entire
> join condition. With this idea, the code you've added in
> create_distinct_paths() can just search for the cheapest path that has
> the correct uniquekeys, or if none exist then just do the normal
> sort->unique or hash agg.  I'm not entirely certain how we'd instruct
> a semi/anti joined relation to build such paths, but that seems like a
> problem that could be dealt with when someone does the work to allow
> skip scans to be used for those.
>
> Also, I'm not entirely sure that these UniqueKeys should make use of
> PathKey since there's no real need to know about pk_opfamily,
> pk_strategy, pk_nulls_first as those all just describe how the keys
> are ordered. We just need to know if they're distinct or not. All
> that's left after removing those fields is pk_eclass, so could
> UniqueKeys just be a list of EquivalenceClass? or perhaps even a
> Bitmapset with indexes into PlannerInfo->ec_classes (that might be
> premature for now since we've not yet got
> https://commitfest.postgresql.org/23/1984/ or
> https://commitfest.postgresql.org/23/2019/ )  However, if we did use
> PathKey, that does allow us to quickly check if the UniqueKeys are
> contained within the PathKeys, since pathkeys are canonical which
> allows us just to compare their memory address to know if two are
> equal, however, if you're storing eclasses we could probably get the
> same just by comparing the address of the eclass to the pathkey's
> pk_eclass.

Interesting, thanks for sharing this.
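To make the pointer-comparison idea concrete, here is a rough sketch
(UniqueKeys does not exist anywhere yet and this helper is hypothetical;
List, ListCell, PathKey and EquivalenceClass are the existing planner
types):

/*
 * Hypothetical helper: would the given uniquekeys (a List of
 * EquivalenceClass pointers) be covered by a path's pathkeys?  Because
 * canonical pathkeys and eclasses are unique objects, pointer equality
 * is enough --- no structural comparison is needed.
 */
static bool
uniquekeys_contained_in(List *uniquekeys, List *pathkeys)
{
	ListCell   *lc;

	foreach(lc, uniquekeys)
	{
		EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc);
		ListCell   *lc2;
		bool		found = false;

		foreach(lc2, pathkeys)
		{
			PathKey    *pathkey = (PathKey *) lfirst(lc2);

			if (pathkey->pk_eclass == ec)	/* pointer equality suffices */
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}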

> I also agree with James that this should not be limited to Index Only
> Scans. From testing the patch, the following seems pretty strange to
> me:
> ...
> explain analyze select distinct on (a) a,b from abc order by a,b;
> explain analyze select distinct on (a) a,b,c from abc order by a,b;
> ...

Yes, but I believe this limitation is not intrinsic to the idea of the patch,
and the very same approach can be used for IndexScan in the second example.
I've already prepared a new version to enable it for IndexScan with minimal
modifications, just need to rebase it on top of the latest changes and then
can post it. Although there would still be some limitations, I guess (e.g. the
first thing I've stumbled upon is that an index scan with a filter wouldn't
work well, because qual checking for a filter happens after ExecScanFetch)




Re: Extracting only the columns needed for a query

2019-06-16 Thread Corey Huinker
>
> The thing that most approaches to this have fallen down on is triggers ---
> that is, a trigger function might access columns mentioned nowhere in the
> SQL text.  (See 8b6da83d1 for a recent example :-()  If you have a plan
> for dealing with that, then ...
>

Well, if we had a trigger language that compiled to  at creation
time, and that trigger didn't do any dynamic/eval code, we could store
which attributes and rels were touched inside the trigger.

I'm not sure if that trigger language would be sql, plpgsql with a
"compile" pragma, or maybe we exhume PSM, but it could have some side
benefits:

  1. This same issue haunts any attempts at refactoring triggers and
referential integrity, so narrowing the scope of what a trigger touches
will help there too
  2. additional validity checks
  3. (this is an even bigger stretch) possibly a chance to combine multiple
triggers into one statement, or combine multiple row-based triggers into a
statement-level trigger

Of course, this all falls apart with one dynamic SQL or one SELECT *, but
it would be incentive for the users to refactor code to not do things that
impede trigger optimization.


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Joe Conway
On 6/16/19 9:45 AM, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> In any case it doesn't address my first point, which is limiting the
>> volume encrypted with the same key. Another valid reason is you might
>> have data at varying sensitivity levels and prefer different keys be
>> used for each level.
> 
> That seems quite complex.


How? It is no more complex than encrypting at the tablespace level
already gives you - in that case you get this property for free if you
care to use it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Joe Conway
On 6/16/19 9:46 AM, Bruce Momjian wrote:
> On Sun, Jun 16, 2019 at 09:45:09AM -0400, Bruce Momjian wrote:
>> On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
>> > And although I'm not proposing this for the first implementation, yet
>> > another reason is I might want to additionally control "transparent
>> > access" to data based on who is logged in. That could be done by
>> > layering an additional key on top of the per-tablespace key for example.
>> > 
>> > The bottom line in my mind is encrypting the entire database with a
>> > single key is not much different/better than using filesystem
>> > encryption, so I'm not sure it is worth the effort and complexity to get
>> > that capability. I think having the ability to encrypt at the tablespace
>> > level adds a lot of capability for minimal extra complexity.
>> 
>> I disagree.
> 
> I will add that OpenSSL has been removing features and compatibility
> because the added complexity had hidden exploits that they could not
> have anticipated.

Sorry but I'm not buying it.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


$host_cpu -> $target_cpu in configure?

2019-06-16 Thread Tom Lane
There are a few places in configure and the makefiles that are looking
at $host_cpu to decide what to do.  As far as I can tell, almost all of
them are wrong and should be looking at $target_cpu instead.  (The
lack of complaints indicates that nobody is trying very hard to test
cross-compilation.)

I'm not too sure about this case in makefiles/Makefile.hpux:

ifeq ($(host_cpu), ia64)
   DLSUFFIX = .so
else
   DLSUFFIX = .sl
endif

Does HPUX even support cross-compiling, and if so what shlib extension
do you get in that case?

The other references seem definitely wrong ...

regards, tom lane




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-16 Thread Stephen Frost
Greetings,

* Amit Kapila (amit.kapil...@gmail.com) wrote:
> On Fri, Jun 14, 2019 at 9:38 PM Stephen Frost  wrote:
> > * Ian Barwick (ian.barw...@2ndquadrant.com) wrote:
> > >
> > > Note this issue is not specific to pg_basebackup, primary_conninfo (or
> > > any other settings formerly in recovery.conf); it has just manifested
> > > itself as the built-in toolset as of now provides a handy way of getting
> > > into this situation without too much effort (and any utilities which
> > > clone standbys and append the replication configuration to
> > > "postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone
> > > to running into the same situation).
> >
> > This is absolutely the fault of the system for putting in multiple
> > entries into the auto.conf, which it wasn't ever written to handle.
> 
> Right.  I think if possible, it should use existing infrastructure to
> write to postgresql.auto.conf rather than inventing a new way to
> change it.  Apart from this issue, if we support multiple ways to edit
> postgresql.auto.conf, we might end up with more problems like this in
> the future where one system is not aware of the way the file is being
> edited by another system.

I agree that there should have been some effort put into making the way
postgresql.auto.conf is modified consistent between the backend and
utilities like pg_basebackup (which would also help third-party tools
understand how a non-backend application should modify the file).

> > > I had previously always assumed that ALTER SYSTEM would change the
> > > *last* occurrence of the parameter in "postgresql.auto.conf", in the
> > > same way you'd need to be sure to change the last occurrence in the
> > > normal configuration files; however, this is actually not the case -
> > > as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):
> > >
> > > /* Search the list for an existing match (we assume there's only one) */
> > >
> > > the *first* match is replaced.
> > >
> > > Attached patch attempts to rectify this situation by having
> > > replace_auto_config_value() delete any duplicate entries first, before
> > > making any changes to the last entry.
> >
> > While this might be a good belt-and-suspenders kind of change to
> > include,
> 
> Another possibility to do something on these lines is to extend Alter
> System Reset  to remove all the duplicate entries.  Then
> the user has a way to remove all duplicate entries if any and set the
> new value.  I think handling duplicate entries in *.auto.conf files is
> an enhancement of the existing system and there could be multiple
> things we can do there, so we shouldn't try to do that as a bug-fix.

Unless there's actually a use-case for duplicate entries in
postgresql.auto.conf, what we should do is clean them up (and possibly
throw a WARNING or similar at the user saying "something modified your
postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
every ALTER SYSTEM call.
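A minimal sketch of what that cleanup could look like (a hypothetical
helper, not the actual guc.c code; ConfigVariable is the existing
parsed-entry struct, and freeing the unlinked entries is elided):

/*
 * Drop all but the last occurrence of each name from a parsed
 * postgresql.auto.conf list, warning when duplicates are found.
 */
static ConfigVariable *
dedup_auto_conf(ConfigVariable *head)
{
	ConfigVariable **link = &head;
	ConfigVariable *item = head;

	while (item != NULL)
	{
		ConfigVariable *later;
		bool		superseded = false;

		for (later = item->next; later != NULL; later = later->next)
		{
			if (strcmp(item->name, later->name) == 0)
			{
				superseded = true;
				break;
			}
		}

		if (superseded)
		{
			ereport(WARNING,
					(errmsg("duplicate entry for \"%s\" in postgresql.auto.conf",
							item->name)));
			*link = item->next;	/* unlink the earlier duplicate */
			item = *link;
		}
		else
		{
			link = &item->next;
			item = item->next;
		}
	}
	return head;
}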

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-16 Thread Tom Lane
Stephen Frost  writes:
> Unless there's actually a use-case for duplicate entries in
> postgresql.auto.conf,

There isn't --- guc.c will just discard the earlier duplicates.

> what we should do is clean them up (and possibly
> throw a WARNING or similar at the user saying "something modified your
> postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> every ALTER SYSTEM call.

+1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
a WARNING would seem too in-your-face.

regards, tom lane




Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-16 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Unless there's actually a use-case for duplicate entries in
> > postgresql.auto.conf,
> 
> There isn't --- guc.c will just discard the earlier duplicates.

One might be able to argue for trying to create a stack or some such, to
allow you to more easily move between values or to see what the value
was set to at some point in the past, etc etc.  Until we see an actual
thought out use-case along those lines that requires supporting
duplicates in some fashion though, with code to make it all work, I
don't think we should allow it.

> > what we should do is clean them up (and possibly
> > throw a WARNING or similar at the user saying "something modified your
> > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> > every ALTER SYSTEM call.
> 
> +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
> a WARNING would seem too in-your-face.

I'd hope for a warning from basically every part of the system when it
detects, clearly, that a file was changed in a way that it shouldn't
have been.  If we don't throw a warning, then we're implying that it's
acceptable while still cleaning up the duplicates, which seems pretty
confusing.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-16 Thread Magnus Hagander
On Sun, Jun 16, 2019 at 7:43 PM Stephen Frost  wrote:

>
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
>
> > > what we should do is clean them up (and possibly
> > > throw a WARNING or similar at the user saying "something modified your
> > > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> > > every ALTER SYSTEM call.
> >
> > +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
> > a WARNING would seem too in-your-face.
>
> I'd hope for a warning from basically every part of the system when it
> detects, clearly, that a file was changed in a way that it shouldn't
> have been.  If we don't throw a warning, then we're implying that it's
> acceptable while still cleaning up the duplicates, which seems pretty
> confusing.
>

+1. Silently "fixing" the file by cleaning up duplicates is going to be
even more confusing to users who had seen them be there before.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Stephen Frost
Greetings,

* Joe Conway (m...@joeconway.com) wrote:
> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> In any case it doesn't address my first point, which is limiting the
> >> volume encrypted with the same key. Another valid reason is you might
> >> have data at varying sensitivity levels and prefer different keys be
> >> used for each level.
> > 
> > That seems quite complex.
> 
> How? It is no more complex than encrypting at the tablespace level
> already gives you - in that case you get this property for free if you
> care to use it.

Perhaps not surprising, but I'm definitely in agreement with Joe
regarding having multiple keys when possible and (reasonably)
straight-forward to do so.  I also don't buy off on the OpenSSL
argument; their more severe issues certainly haven't been due to key
management issues such as what we're discussing here, so I don't think
the argument applies.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Extracting only the columns needed for a query

2019-06-16 Thread Ashwin Agrawal
On Sat, Jun 15, 2019 at 10:02 AM Tom Lane  wrote:

> > Approach B: after parsing and/or after planning
>
> If we wanted to do something about this, making the planner record
> the set of used columns seems like the thing to do.  We could avoid
> the expense of doing it when it's not needed by setting up an AM/FDW/
> etc property or callback to request it.
>

Sounds good. In the Zedstore patch, we have added an AM property to convey
that the AM leverages column projection, and we currently skip the physical
tlist optimization based on it. So yes, it can similarly be leveraged for
other planning needs.


> Another reason for having the planner do this is that presumably, in
> an AM that's excited about this, the set of fetched columns should
> play into the cost estimates for the scan.  I've not been paying
> enough attention to the tableam work to know if we've got hooks for
> the AM to affect scan costing ... but if we don't, that seems like
> a hole that needs plugged.
>

The AM callback relation_estimate_size exists currently, and the planner
leverages it: via this callback it fetches tuples, pages, etc. So our
thought is to extend this API, if possible, to pass down the needed columns
and help perform better costing for the query. Though we think that if we
wish to leverage this function, we need to know the list of columns before
planning, hence we might need to use the query tree.
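For reference, the callback as it exists in tableam.h, plus a
hypothetical column-aware variant sketching the extension described
above (the second signature does not exist anywhere; it is only an
illustration):

/* Existing callback, called by the planner via estimate_rel_size(): */
void		(*relation_estimate_size) (Relation rel, int32 *attr_widths,
									   BlockNumber *pages, double *tuples,
									   double *allvisfrac);

/* Hypothetical extension: pass down the set of referenced columns so a
 * columnar AM can cost only the data it would actually fetch. */
void		(*relation_estimate_size_cols) (Relation rel,
											Bitmapset *needed_cols,
											int32 *attr_widths,
											BlockNumber *pages,
											double *tuples,
											double *allvisfrac);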


> > Approach B, however, does not work for utility statements which do
> > not go through planning.
>
> I'm not sure why you're excited about that case?  Utility statements
> tend to be pretty much all-or-nothing as far as data access goes.
>

Statements like COPY, CREATE INDEX, CREATE CONSTRAINTS, etc. can benefit
from scanning only a subset of columns. For example, in Zedstore, for
CREATE INDEX we currently extract the needed columns by walking
indexInfo->ii_Predicate and indexInfo->ii_Expressions. For COPY, we
currently use cstate->attnumlist to know which columns need to be scanned.


Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Bruce Momjian
On Sun, Jun 16, 2019 at 12:42:55PM -0400, Joe Conway wrote:
> On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> >> In any case it doesn't address my first point, which is limiting the
> >> volume encrypted with the same key. Another valid reason is you might
> >> have data at varying sensitivity levels and prefer different keys be
> >> used for each level.
> > 
> > That seems quite complex.
> 
> 
> How? It is no more complex than encrypting at the tablespace level
> already gives you - in that case you get this property for free if you
> care to use it.

All keys used to encrypt WAL data must be unlocked at all times, or crash
recovery, PITR, and replication will stop when they hit a locked key.
Given that, how much value is there in allowing a key per tablespace?

I don't see how this is better than telling users they have to create a
separate cluster per key.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-15, Alvaro Herrera wrote:

> But that's not the danger ... with the current coding, it's initialized
> to false every time through that block; that means the tuple lock will
> never be skipped if we jump back to l3.  So the danger is that the first
> iteration sets the variable, then jumps back; second iteration
> initializes the variable again, so instead of skipping the lock, it
> takes it, causing a spurious deadlock.

So, I'm too lazy today to generate a case that fully reproduces the
deadlock, because you need to stall 's2' a little bit using the
well-known advisory lock trick, but this one hits the code that would
re-initialize the variable.

I'm going to push the change of lifetime of the variable for now.
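As a self-contained illustration of that lifetime problem (names are
hypothetical; the real flag lives in heap_lock_tuple() in heapam.c):

#include <stdbool.h>

/* With INIT_INSIDE defined, the flag is re-initialized on every pass
 * through the retry block, so the decision made on the first pass is
 * forgotten --- the analogue of re-taking the tuple lock after jumping
 * back to l3, which is what produced the spurious deadlock. */
static bool
retry_block(void)
{
	bool		skip_lock = false;	/* correct: one initialization */
	int			passes = 0;

l3:
	passes++;
#ifdef INIT_INSIDE
	skip_lock = false;			/* buggy: resets the decision each pass */
#endif
	if (passes == 1)
	{
		skip_lock = true;		/* first pass: decide to skip the lock */
		goto l3;				/* concurrent update: restart */
	}
	return skip_lock;			/* true only if the fix is in place */
}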

setup
{
drop table if exists tlu_job;
create table tlu_job (id integer primary key, name text);

insert into tlu_job values(1, 'a');
}


teardown
{
drop table tlu_job;
}

session "s0"
setup { begin; set deadlock_timeout=1}
step "s0_fornokeyupdate" { select id from tlu_job where id = 1 for no key 
update; }
step "s0_update" { update tlu_job set name = 's0' where id = 1; }
step "s0_commit" { commit; }

session "s1"
setup { begin; set deadlock_timeout=1}
step "s1_for_key_share" { select id from tlu_job where id = 1 for key share; }
step "s1_for_update" { select id from tlu_job where id = 1 for update; }
step "s1_rollback" { rollback; }

session "s2"
setup { begin; set deadlock_timeout=1}
step "s2_for_key_share" { select id from tlu_job where id = 1 for key share; }
step "s2_for_share" { select id from tlu_job where id = 1 for share; }
step "s2_rollback" { rollback; }

session "s3"
setup { begin; set deadlock_timeout=1}
step "s3_update" { update tlu_job set name = 'c' where id = 1; }
step "s3_rollback" { rollback; }

permutation "s1_for_key_share" "s2_for_key_share" "s0_fornokeyupdate" 
"s2_for_share" "s0_update" "s0_commit" "s1_rollback" "s2_rollback" "s3_rollback"

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Sun, Jun 16, 2019 at 12:42:55PM -0400, Joe Conway wrote:
> > On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> > >> In any case it doesn't address my first point, which is limiting the
> > >> volume encrypted with the same key. Another valid reason is you might
> > >> have data at varying sensitivity levels and prefer different keys be
> > >> used for each level.
> > > 
> > > That seems quite complex.
> > 
> > How? It is no more complex than encrypting at the tablespace level
> > already gives you - in that case you get this property for free if you
> > care to use it.
> 
> All keys used to encrypt WAL data must be unlocked at all times or crash
> recovery, PITR, and replication will not stop when it hits a locked key.
> Given that, how much value is there in allowing a key per tablespace?

There's a few different things to discuss here, admittedly, but I don't
think it means that there's no value in having a key per tablespace.

Ideally, a given backend would only need, and only have access to, the
keys for the tablespaces that it is allowed to operate on.  I realize
that's a bit farther than what we're talking about today, but hopefully
not too much to be able to consider.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: $host_cpu -> $target_cpu in configure?

2019-06-16 Thread Noah Misch
On Sun, Jun 16, 2019 at 12:56:52PM -0400, Tom Lane wrote:
> There are a few places in configure and the makefiles that are looking
> at $host_cpu to decide what to do.  As far as I can tell, almost all of
> them are wrong and should be looking at $target_cpu instead.  (The
> lack of complaints indicates that nobody is trying very hard to test
> cross-compilation.)

https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Specifying-Target-Triplets.html
describes the intended usage.  When cross-compiling, $host_cpu is the machine
able to run the resulting PostgreSQL installation, and $build_cpu is the
machine creating that installation.  PostgreSQL does not contain a compiler
that emits code as output to the user, so $target_cpu is meaningless.  Every
use of $host_cpu looks correct.




Re: $host_cpu -> $target_cpu in configure?

2019-06-16 Thread Tom Lane
Noah Misch  writes:
> https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Specifying-Target-Triplets.html
> describes the intended usage.  When cross-compiling, $host_cpu is the machine
> able to run the resulting PostgreSQL installation, and $build_cpu is the
> machine creating that installation.  PostgreSQL does not contain a compiler
> that emits code as output to the user, so $target_cpu is meaningless.  Every
> use of $host_cpu looks correct.

Hmph ... okay, but that's sure a confusing usage of "host".

regards, tom lane




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-16 Thread Tomas Vondra

On Sun, Jun 16, 2019 at 02:10:23PM -0400, Stephen Frost wrote:

> Greetings,
> 
> * Joe Conway (m...@joeconway.com) wrote:
> > On 6/16/19 9:45 AM, Bruce Momjian wrote:
> > > On Sun, Jun 16, 2019 at 07:07:20AM -0400, Joe Conway wrote:
> > > > In any case it doesn't address my first point, which is limiting the
> > > > volume encrypted with the same key. Another valid reason is you might
> > > > have data at varying sensitivity levels and prefer different keys be
> > > > used for each level.
> > > 
> > > That seems quite complex.
> > 
> > How? It is no more complex than encrypting at the tablespace level
> > already gives you - in that case you get this property for free if you
> > care to use it.
> 
> Perhaps not surprising, but I'm definitely in agreement with Joe
> regarding having multiple keys when possible and (reasonably)
> straight-forward to do so.  I also don't buy off on the OpenSSL
> argument; their more severe issues certainly haven't been due to key
> management issues such as what we're discussing here, so I don't think
> the argument applies.



I'm not sure what exactly is the "OpenSSL argument" you're disagreeing
with? IMHO Bruce is quite right that the risk of vulnerabilities grows
with the complexity of the system (both due to implementation bugs and
general design weaknesses). I don't think it's tied to the key
management specifically, except that it's one of the parts that may
contribute to the complexity.

(It's often claimed that key management is one of the weakest points of
current crypto systems - we have safe (a)symmetric algorithms, but safe
handling of keys is an issue. I don't have data / papers supporting this
claim, I kinda believe it.)

Now, I'm not opposed to eventually implementing something more
elaborate, but I also think just encrypting the whole cluster (not
necessarily with a single key, but with one master key) would be enough
for vast majority of users. Plus it's less error prone and easier to
operate (backups, replicas, crash recovery, ...).

But there's about 0% chance we'll get that in v1, of course, so we need
a "minimum viable product" to build on anyway.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-16 Thread Gavin Flower

On 17/06/2019 05:58, Magnus Hagander wrote:
> On Sun, Jun 16, 2019 at 7:43 PM Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Stephen Frost  writes:
> > > > what we should do is clean them up (and possibly
> > > > throw a WARNING or similar at the user saying "something modified your
> > > > postgresql.auto.conf in an unexpected way").  I'd suggest we do that on
> > > > every ALTER SYSTEM call.
> > > 
> > > +1 for having ALTER SYSTEM clean out duplicates.  Not sure whether
> > > a WARNING would seem too in-your-face.
> > 
> > I'd hope for a warning from basically every part of the system when it
> > detects, clearly, that a file was changed in a way that it shouldn't
> > have been.  If we don't throw a warning, then we're implying that it's
> > acceptable while still cleaning up the duplicates, which seems pretty
> > confusing.
> 
> +1. Silently "fixing" the file by cleaning up duplicates is going to be
> even more confusing to users who had seen them be there before.
> 
> -- 
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/


I think fixing this silently should be at least a hanging offence.

At one time I came across a language, PL/1, that would silently
'correct' some mistakes without indicating what it did.  I thought this
was extremely dangerous, as it could lead to some very nasty and
unexpected bugs!


It is most important that people be aware of possibly conflicting 
changes, or that values they saw in postgresql.conf may have been changed.


Hmm... this suggests that all the implied defaults should be explicitly
set!  Would that be too great a change to make?



Cheers,
Gavin





Re: range_agg

2019-06-16 Thread Paul A Jungwirth
On Wed, May 8, 2019 at 9:54 PM Paul A Jungwirth  wrote:
> Here is an initial patch. I'd love to have some feedback! :-)

Here is a v2 rebased off current master. No substantive changes, but
it does fix one trivial git conflict.

After talking with David in Ottawa and hearing a good use-case from
one other person for his proposed weighted_range_agg and
covering_range_agg, I think I *will* add those to this patch, but if
anyone wants to offer feedback on my approach so far, I'd appreciate
that too.

Yours,
Paul


range_agg_v0002.patch
Description: Binary data


Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Tom Lane
Alvaro Herrera  writes:
> I'm going to push the change of lifetime of the variable for now.

If you're going to push anything before tomorrow's wrap, please do it
*now* not later.  We're running out of time to get a full sample of
buildfarm results.

regards, tom lane




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-16, Tom Lane wrote:

> Alvaro Herrera  writes:
> > I'm going to push the change of lifetime of the variable for now.
> 
> If you're going to push anything before tomorrow's wrap, please do it
> *now* not later.  We're running out of time to get a full sample of
> buildfarm results.

Yeah, I had to bail out earlier today, so the only thing I'm confident
pushing now is a revert.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jun-16, Tom Lane wrote:
>> If you're going to push anything before tomorrow's wrap, please do it
>> *now* not later.  We're running out of time to get a full sample of
>> buildfarm results.

> Yeah, I had to bail out earlier today, so the only thing I'm confident
> pushing now is a revert.

Yeah, let's do that.  I don't want to risk shipping broken code.
We can try again for the next updates.

regards, tom lane




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Michael Paquier
On Sun, Jun 16, 2019 at 09:10:13PM -0400, Tom Lane wrote:
> Yeah, let's do that.  I don't want to risk shipping broken code.
> We can try again for the next updates.

Could you revert asap please then?
--
Michael


signature.asc
Description: PGP signature


nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

2019-06-16 Thread Michael Paquier
Hi all,
(Adding in CC relevant committer and author, Teodor and Alexander)

I have been looking today at a crash of pg_waldump on one of the test
builds kept running in some internal environment.  Luckily, I have
been able to put my hands on a core file with 11.2 running.  The
backtrace is not that interesting:
(gdb) bt
#0 btree_desc (buf=0x0, record=0x22ce590) at nbtdesc.c:103
#1 0x0040419f in XLogDumpDisplayRecord (config=0x7fff1ccfd360,
record=0x22ce590) at
/build/mts/release/bora-13598999/vpostgres/src/postgres/src/bin/pg_waldump/pg_waldump.c:558
#2 main (argc=, argv=) at
/build/mts/release/bora-13598999/vpostgres/src/postgres/src/bin/pg_waldump/pg_waldump.c:1170
(gdb) down 1
#0 btree_desc (buf=0x0, record=0x22ce590) at nbtdesc.c:103
103 nbtdesc.c: No such file or directory.
(gdb) p record
$1 = (XLogReaderState *) 0x22ce590
(gdb) p *record
$2 = {wal_segment_size = 16777216, read_page = 0x405220
, system_identifier = 0, private_data =
0x7fff1ccfd380, ReadRecPtr = 67109592, EndRecPtr = 67109672,
decoded_record = 0x22cf178, main_data = 0x0, main_data_len = 0,
main_data_bufsz = 0, record_origin = 0, blocks = {{in_use = true,
rnode = {spcNode = 16399, dbNode = 16386, relNode = 19907}, forknum =
MAIN_FORKNUM, blkno = 0, flags = 96 '`', has_image = false,
apply_image = false, bkp_image = 0x0, hole_offset = 0,  hole_length =
0, bimg_len = 0, bimg_info = 0 '\000', has_data = true, data =
0x22db2c0 "\003", data_len = 32,  data_bufsz = 8192}, {in_use = false,
rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forknum =
MAIN_FORKNUM, blkno = 0, flags = 0 '\000', has_image = false,
apply_image = false, bkp_image = 0x0, hole_offset = 0, hole_length =
0, bimg_len = 0, bimg_info = 0 '\000', has_data = false, data = 0x0,
data_len = 0, data_bufsz = 0} }, max_block_id = 0,
readBuf = 0x22ceea0 "\230\320\a", readLen = 8192, readSegNo = 4,
readOff = 0, readPageTLI = 0, latestPagePtr = 67108864, latestPageTLI
= 1, currRecPtr = 67109592, currTLI = 0, currTLIValidUntil = 0,
nextTLI = 0, readRecordBuf = 0x22d12b0 "L", readRecordBufSize = 40960, 
errormsg_buf = 0x22d0eb0 ""}
(gdb) p xlrec
$5 = (xl_btree_metadata *) 0x0

Anyway, after looking at the code relevant to XLOG_BTREE_META_CLEANUP,
I have noticed that the meta-data associated to the first buffer is
registered via XLogRegisterBufData() (this is correct because we want
to associate this data to the metadata buffer).  However, nbtdesc.c
assumes that xl_btree_metadata is from the main record data, causing a
crash because we have nothing in this case.
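For reference, the record is assembled along these lines (a sketch
reconstructed from memory of _bt_update_meta_cleanup_info(), not a
verbatim quote):

xl_btree_metadata md;
XLogRecPtr	recptr;

XLogBeginInsert();
XLogRegisterBuffer(0, metabuf, REGBUF_WILL_INIT | REGBUF_STANDARD);

/* ... md's other fields filled in from the metapage ... */
md.oldest_btpo_xact = oldestBtpoXact;
md.last_cleanup_num_heap_tuples = numHeapTuples;

/* Attached to block 0, not registered as main record data --- hence
 * XLogRecGetData() yields nothing and the desc routine dereferences
 * a NULL pointer. */
XLogRegisterBufData(0, (char *) &md, sizeof(xl_btree_metadata));

recptr = XLogInsert(RM_BTREE_ID, XLOG_BTREE_META_CLEANUP);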

I think that we could just patch nbtpage.c so that we fetch the
metadata using XLogRecGetBlockData(), and log its data.  The error
comes from 3d92796, which is one year old and has visibly been
committed untested.  I am surprised that we have not seen a complaint
about it yet.  Attached is a patch, which looks right to me and should
be back-patched down to v11.  I have not taken the time to actually
test it though.

Thoughts?
--
Michael
diff --git a/src/backend/access/rmgrdesc/nbtdesc.c b/src/backend/access/rmgrdesc/nbtdesc.c
index 989c85ac08..a14eb792ec 100644
--- a/src/backend/access/rmgrdesc/nbtdesc.c
+++ b/src/backend/access/rmgrdesc/nbtdesc.c
@@ -97,8 +97,10 @@ btree_desc(StringInfo buf, XLogReaderState *record)
 			}
 		case XLOG_BTREE_META_CLEANUP:
 			{
-xl_btree_metadata *xlrec = (xl_btree_metadata *) rec;
+xl_btree_metadata *xlrec;
 
+xlrec = (xl_btree_metadata *) XLogRecGetBlockData(record, 0,
+  NULL);
 appendStringInfo(buf, "oldest_btpo_xact %u; last_cleanup_num_heap_tuples: %f",
  xlrec->oldest_btpo_xact,
  xlrec->last_cleanup_num_heap_tuples);


signature.asc
Description: PGP signature


Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

2019-06-16 Thread Peter Geoghegan
On Sun, Jun 16, 2019 at 6:31 PM Michael Paquier  wrote:
> I think that we could just patch nbtpage.c so as we fetch the
> metadata using XLogRecGetBlockData(), and log its data.

Don't you mean nbtdesc.c?

> The error
> comes from 3d92796, which is one year-old and has been visibly
> committed untested.  I am surprised that we  have not seen that
> complain yet.

Why is that surprising?

https://coverage.postgresql.org/src/backend/access/rmgrdesc/index.html

-- 
Peter Geoghegan




Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

2019-06-16 Thread Michael Paquier
On Sun, Jun 16, 2019 at 06:54:57PM -0700, Peter Geoghegan wrote:
> On Sun, Jun 16, 2019 at 6:31 PM Michael Paquier  wrote:
>> I think that we could just patch nbtpage.c so as we fetch the
>> metadata using XLogRecGetBlockData(), and log its data.
> 
> Don't you mean nbtdesc.c?

Yeah, I meant nbtdesc.c, sorry.  This will have to wait until after this
week's release for a fix, by the way...

> Why is that surprising?
> 
> https://coverage.postgresql.org/src/backend/access/rmgrdesc/index.html

I would have supposed that more people scan WAL records using the
description callbacks.
--
Michael


signature.asc
Description: PGP signature


Re: nbtdesc.c and nbtpage.c are inconsistent with XLOG_BTREE_META_CLEANUP (11~)

2019-06-16 Thread Peter Geoghegan
On Sun, Jun 16, 2019 at 7:05 PM Michael Paquier  wrote:
> I would have supposed that more people scan WAL records using the
> description callbacks.

The WAL record in question, XLOG_BTREE_META_CLEANUP, is certainly one
of the less common record types used by nbtree. I agree that this
should have been tested when it went in, but I'm not surprised that
the bug remained undetected for a year. Not that many people use
pg_waldump.

-- 
Peter Geoghegan




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Alvaro Herrera
On 2019-Jun-17, Michael Paquier wrote:

> On Sun, Jun 16, 2019 at 09:10:13PM -0400, Tom Lane wrote:
> > Yeah, let's do that.  I don't want to risk shipping broken code.
> > We can try again for the next updates.
> 
> Could you revert asap please then?

Done.

I initially thought to keep the test in place, but then realized it
might be unstable, so I removed that too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jun-17, Michael Paquier wrote:
>> Could you revert asap please then?

> Done.

Thanks.

regards, tom lane




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-16 Thread Michael Paquier
On Sun, Jun 16, 2019 at 10:27:25PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
>> On 2019-Jun-17, Michael Paquier wrote:
>>> Could you revert asap please then?
> 
>> Done.
> 
> Thanks.

Thanks, Alvaro.
--
Michael


signature.asc
Description: PGP signature


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-06-16 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> I've revised the patch to add a new constant named
> LOCKMETHODLOCALHASH_SHRINK_SIZE. I've set this to 64 for now. Once the hash

Thank you, and good performance.  The patch passed make check.

I'm OK with the current patch, but I have a few comments.  Please take them as 
you see fit (I wouldn't mind if you don't.)


(1)
+#define LOCKMETHODLOCALHASH_SHRINK_SIZE 64

How about LOCKMETHODLOCALHASH_SHRINK_THRESHOLD, because this determines the 
threshold value to trigger shrinkage?  Code in PostgreSQL seems to use the term 
threshold.


(2)
+/* Complain if the above are not set to something sane */
+#if LOCKMETHODLOCALHASH_SHRINK_SIZE < LOCKMETHODLOCALHASH_INIT_SIZE
+#error "invalid LOCKMETHODLOCALHASH_SHRINK_SIZE"
+#endif

I don't think these are necessary, because these are fixed and not 
configurable.  FYI, src/include/utils/memutils.h doesn't have #error to test 
these macros.

#define ALLOCSET_DEFAULT_MINSIZE   0
#define ALLOCSET_DEFAULT_INITSIZE  (8 * 1024)
#define ALLOCSET_DEFAULT_MAXSIZE   (8 * 1024 * 1024)


(3)
+   if (hash_get_num_entries(LockMethodLocalHash) == 0 &&
+   hash_get_max_bucket(LockMethodLocalHash) >
+   LOCKMETHODLOCALHASH_SHRINK_SIZE)
+   CreateLocalLockHash();

I get an impression that Create just creates something where there's nothing.  
How about Init or Recreate?
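Putting (1) and (3) together, the end-of-transaction check would read
something like this (a sketch of the suggested renames only, not the
patch itself; hash_get_max_bucket() is the function the patch adds):

/* Threshold of buckets beyond which an empty local lock table is
 * rebuilt at its initial size (renamed per comment (1)). */
#define LOCKMETHODLOCALHASH_SHRINK_THRESHOLD	64

	/* At transaction end: the table is empty, but has it grown? */
	if (hash_get_num_entries(LockMethodLocalHash) == 0 &&
		hash_get_max_bucket(LockMethodLocalHash) >
		LOCKMETHODLOCALHASH_SHRINK_THRESHOLD)
		InitLocalLockHash();	/* renamed per comment (3) */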


Regards
Takayuki Tsunakawa



Re: Problem with default partition pruning

2019-06-16 Thread Shawn Wang
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hi Hosoya-san,
I tested different types of key values and multi-level partitioned tables, and
found no problems.
Only the SQL in src/test/regress/results/partition_prune.out has a stray
space, which caused the regression test to fail.

The new status of this patch is: Waiting on Author


Re: proposal: new polymorphic types - commontype and commontypearray

2019-06-16 Thread Pavel Stehule
Hi

On Fri, Jun 14, 2019 at 6:09 AM Pavel Stehule  wrote:

>
>
> On Thu, Jun 13, 2019 at 2:37 AM Tom Lane  wrote:
>
>> Greg Stark  writes:
>> > The proposals I see above are "commontype", "supertype",
>> "anycommontype",
>> > and various abbreviations of those. I would humbly add "compatibletype".
>> > Fwiw I kind of like commontype.
>> > Alternately an argument could be made that length and typing convenience
>> > isn't really a factor here since database users never have to type these
>> > types. The only place they get written is when defining polymorphic
>> > functions which is a pretty uncommon operation.
>> > In which case a very explicit "anycompatibletype" may be better.
>>
>> I could go with "anycompatibletype".  That would lead us to needing
>> related names like "anycompatiblearraytype", which is getting annoyingly
>> long, but you might be right that people wouldn't have to type it that
>> often.
>>
>> Also, given the precedent of "anyarray" and "anyrange", it might be
>> okay to make these just "anycompatible" and "anycompatiblearray".
>>
>
> I like anycompatible and anycompatiblearray.
>
> I'll update the patch
>

and here it is

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>> [ wanders away wondering if psql can tab-complete type names in
>> function definitions ... ]
>>
>> regards, tom lane
>>
>
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 35ecd48ed5..d440f1c6ed 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4791,6 +4791,14 @@ SELECT * FROM pg_attribute
 anyrange

 
+   
+anycompatible
+   
+
+   
+anycompatiblearray
+   
+

 void

@@ -4900,6 +4908,34 @@ SELECT * FROM pg_attribute
 ).

 
+   
+anycompatible
+Indicates that a function accepts any data type. Values
+are converted to real common type.
+(see ).
+   
+
+   
+anycompatiblearray
+Indicates that a function accepts any array data type. The
+elements of array are converted to common type of these values.
+(see ).
+   
+
+   
+anycompatiblenonarray
+Indicates that a function accepts any non-array data type
+(see ).
+   
+
+   
+anycompatiblerange
+Indicates that a function accepts any range data type
+(see  and
+). The subtype can be used for
+deduction of common type.
+   
+

 cstring
 Indicates that a function accepts or returns a null-terminated C string.
diff --git a/doc/src/sgml/extend.sgml b/doc/src/sgml/extend.sgml
index b5e59d542a..2de97eab33 100644
--- a/doc/src/sgml/extend.sgml
+++ b/doc/src/sgml/extend.sgml
@@ -231,7 +231,7 @@
 
  Five pseudo-types of special interest are anyelement,
  anyarray, anynonarray, anyenum,
- and anyrange,
+ anyrange, anycompatible and anycompatiblearray,
  which are collectively called polymorphic types.
  Any function declared using these types is said to be
  a polymorphic function.  A polymorphic function can
@@ -267,6 +267,15 @@
  be an enum type.
 
 
+
+ A second family of polymorphic types consists of anycompatible and
+ anycompatiblearray. These types are similar to
+ anyelement and anyarray. Arguments declared
+ as anyelement require the same real type for all passed values; for
+ anycompatible arguments, a common type is selected, and
+ all these arguments are then cast to this common type.
+
+
 
  Thus, when more than one argument position is declared with a polymorphic
  type, the net effect is that only certain combinations of actual argument
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 7cab039ded..b3e7eb391b 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -136,7 +136,7 @@ AggregateCreate(const char *aggName,
 	hasInternalArg = false;
 	for (i = 0; i < numArgs; i++)
 	{
-		if (IsPolymorphicType(aggArgTypes[i]))
+		if (IsPolymorphicTypeAny(aggArgTypes[i]))
 			hasPolyArg = true;
 		else if (aggArgTypes[i] == INTERNALOID)
 			hasInternalArg = true;
@@ -146,7 +146,7 @@ AggregateCreate(const char *aggName,
 	 * If transtype is polymorphic, must have polymorphic argument also; else
 	 * we will have no way to deduce the actual transtype.
 	 */
-	if (IsPolymorphicType(aggTransType) && !hasPolyArg)
+	if (IsPolymorphicTypeAny(aggTransType) && !hasPolyArg)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
  errmsg("cannot determine transition data type"),
@@ -156,7 +156,7 @@ AggregateCreate(const char *aggName,
 	 * Likewise for moving-aggregate transtype, if any
 	 */
 	if (OidIsValid(aggmTransType) &&
-		IsPolymorphicType(aggmTransType) && !hasPolyArg)
+		IsPolymorphicTypeAny(aggmTransType) && !hasPolyArg)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
  errmsg("

Race conditions with TAP test for syncrep

2019-06-16 Thread Michael Paquier
Hi all,

Alvaro has reported a rather rare buildfarm failure involving
007_sync_rep.pl to which I have responded here:
https://www.postgresql.org/message-id/20190613060123.gc1...@paquier.xyz

The buildfarm failure is here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2019-05-12%2020%3A37%3A11

It involves a race condition related to the way the standbys of the
test are stopped and restarted to ensure that they appear in the
correct order in the WAL sender array of the primary, but feel free to
look at the message above for all the details.

Attached is a patch to improve the stability of the test.  The fix I
am proposing is very simple: in order to make sure that a standby is
added into the WAL sender array of the primary, let's check after
pg_stat_replication after a standby is started.  This can be done
consistently with a small wrapper in the tests.

Any thoughts?
--
Michael
diff --git a/src/test/recovery/t/007_sync_rep.pl b/src/test/recovery/t/007_sync_rep.pl
index bba47da17a..653a6bc842 100644
--- a/src/test/recovery/t/007_sync_rep.pl
+++ b/src/test/recovery/t/007_sync_rep.pl
@@ -27,6 +27,23 @@ sub test_sync_state
 	return;
 }
 
+# Start a standby and check that it is registered within the WAL sender
+# array of the given primary.  This polls the primary's pg_stat_replication
+# until the standby is confirmed as registered.
+sub start_standby_and_wait
+{
+	my ($master, $standby) = @_;
+	my $master_name  = $master->name;
+	my $standby_name = $standby->name;
+	my $query =
+	  "SELECT count(1) = 1 FROM pg_stat_replication WHERE application_name = '$standby_name'";
+
+	$standby->start;
+
+	print "### Waiting for standby \"$standby_name\" on \"$master_name\"\n";
+	$master->poll_query_until('postgres', $query);
+}
+
 # Initialize master node
 my $node_master = get_new_node('master');
 $node_master->init(allows_streaming => 1);
@@ -36,23 +53,27 @@ my $backup_name = 'master_backup';
 # Take backup
 $node_master->backup($backup_name);
 
+# Create all the standbys.  Their status on the primary is checked to
+# ensure the ordering of each one of them in the WAL sender array of the
+# primary.
+
 # Create standby1 linking to master
 my $node_standby_1 = get_new_node('standby1');
 $node_standby_1->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_1->start;
+start_standby_and_wait($node_master, $node_standby_1);
 
 # Create standby2 linking to master
 my $node_standby_2 = get_new_node('standby2');
 $node_standby_2->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_2->start;
+start_standby_and_wait($node_master, $node_standby_2);
 
 # Create standby3 linking to master
 my $node_standby_3 = get_new_node('standby3');
 $node_standby_3->init_from_backup($node_master, $backup_name,
 	has_streaming => 1);
-$node_standby_3->start;
+start_standby_and_wait($node_master, $node_standby_3);
 
 # Check that sync_state is determined correctly when
 # synchronous_standby_names is specified in old syntax.
@@ -82,8 +103,10 @@ $node_standby_1->stop;
 $node_standby_2->stop;
 $node_standby_3->stop;
 
-$node_standby_2->start;
-$node_standby_3->start;
+# Make sure that each standby reports back to the primary in
+# the wanted order.
+start_standby_and_wait($node_master, $node_standby_2);
+start_standby_and_wait($node_master, $node_standby_3);
 
 # Specify 2 as the number of sync standbys.
 # Check that two standbys are in 'sync' state.
@@ -94,7 +117,7 @@ standby3|3|sync),
 	'2(standby1,standby2,standby3)');
 
 # Start standby1
-$node_standby_1->start;
+start_standby_and_wait($node_master, $node_standby_1);
 
 # Create standby4 linking to master
 my $node_standby_4 = get_new_node('standby4');


signature.asc
Description: PGP signature