Set fallback_application_name for a walreceiver to cluster_name

2019-02-08 Thread Peter Eisentraut
By default, the fallback_application_name for a physical walreceiver is
"walreceiver".  This means that multiple standbys cannot be
distinguished easily on a primary, for example in pg_stat_activity or
synchronous_standby_names.

I propose, if cluster_name is set, use that for
fallback_application_name in the walreceiver.  (If it's not set, it
remains "walreceiver".)  If someone set cluster_name to identify their
instance, we might as well use that by default to identify the node
remotely as well.  It's still possible to specify another
application_name in primary_conninfo explicitly.

Then you can do something like cluster_name = 'nodeN' and
synchronous_standby_names = 'node1,node2,node3' without any further
fiddling with application_name.
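
For example, a minimal sketch of that setup (node names are just an
illustration, and this assumes the patch is applied):

-- on each standby, name the instance (takes effect after a restart)
ALTER SYSTEM SET cluster_name = 'node1';   -- 'node2', 'node3' on the others
-- on the primary, refer to the standbys by those names
ALTER SYSTEM SET synchronous_standby_names = 'node1,node2,node3';
-- the standbys then show up under their names on the primary
SELECT application_name, state, sync_state FROM pg_stat_replication;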

See attached patches.

I also included a patch to set cluster_name in PostgresNode.pm
instances, for easier identification and a bit of minimal testing.
Because of the issues described in [0], this doesn't allow dropping the
explicit application_name assignments in tests yet, but it's part of the
path to get there.

[0]:


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From b2901f54e943f6b609a93890c682aa9cf416e3c1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 8 Feb 2019 08:17:21 +0100
Subject: [PATCH 1/2] Set fallback_application_name for a walreceiver to
 cluster_name

By default, the fallback_application_name for a physical walreceiver
is "walreceiver".  This means that multiple standbys cannot be
distinguished easily on a primary, for example in pg_stat_activity or
synchronous_standby_names.

If cluster_name is set, use that for fallback_application_name in the
walreceiver.  (If it's not set, it remains "walreceiver".)  If someone
set cluster_name to identify their instance, we might as well use that
by default to identify the node remotely as well.  It's still possible
to specify another application_name in primary_conninfo explicitly.
---
 doc/src/sgml/config.sgml  | 14 +++---
 src/backend/replication/walreceiver.c |  2 +-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7e208a4b81..cab858b54e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3659,7 +3659,8 @@ Master Server
 application_name setting of the standby, as set in the
 standby's connection information.  In case of a physical replication
 standby, this should be set in the primary_conninfo
-setting; the default is walreceiver.
+setting; the default is the setting of 
+if set, else walreceiver.
 For logical replication, this can be set in the connection
 information of the subscription, and it defaults to the
 subscription name.  For other replication stream consumers,
@@ -6560,8 +6561,15 @@ Process Title
   
   

-Sets the cluster name that appears in the process title for all
-server processes in this cluster. The name can be any string of less
+Sets a name that identifies this database cluster (instance) for
+various purposes.  The cluster name appears in the process title for
+all server processes in this cluster.  Moreover, it is the default
+application name for a standby connection (see .)
+   
+
+   
+The name can be any string of less
 than NAMEDATALEN characters (64 characters in a standard
 build). Only printable ASCII characters may be used in the
 cluster_name value. Other characters will be
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2e90944ad5..9eaaa8ff50 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -293,7 +293,7 @@ WalReceiverMain(void)
 
/* Establish the connection to the primary for XLOG streaming */
EnableWalRcvImmediateExit();
-   wrconn = walrcv_connect(conninfo, false, "walreceiver", &err);
+   wrconn = walrcv_connect(conninfo, false, cluster_name[0] ? cluster_name : "walreceiver", &err);
if (!wrconn)
ereport(ERROR,
(errmsg("could not connect to the primary 
server: %s", err)));

base-commit: 3677a0b26bb2f3f72d16dc7fa6f34c305badacce
-- 
2.20.1

From a332a82d1f7554f1b81fe4d59f3f9aea27bf4c0e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 8 Feb 2019 08:38:54 +0100
Subject: [PATCH 2/2] Set cluster_name for PostgresNode.pm instances

This can help identifying test instances more easily at run time, and
it also provides some minimal test coverage for the cluster_name
feature.
---
 src/test/perl/PostgresNode.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/

Re: ToDo: show size of partitioned table

2019-02-08 Thread Pavel Stehule
On Thu, Feb 7, 2019 at 9:51 AM Amit Langote wrote:

> Hi Pavel,
>
> Thanks for sending the updated patch.
>
> On 2018/12/19 15:38, Pavel Stehule wrote:
> > On Tue, Dec 18, 2018 at 8:49 AM Amit Langote wrote:
> >> On 2018/12/17 17:48, Pavel Stehule wrote:
> >>> I can imagine new additional flag - line "n" nested - and then we can
> >>> display nested partitioned tables with parent table info. Some like
> >>>
> >>> \dPt - show only root partition tables
> >>> \dPnt or \dPtn - show root and nested partitioned tables
> >>
> >> Too much complication maybe?
> >>
> >
> > I wrote it - the setup query is more complex, but not too much. I fixed
> > the size calculation when nested partitioned tables are visible - it
> > calculates partitions only from the level-1 group. Then the displayed
> > size is the same as the total size.
>
> \dPn seems to work fine, but I don't quite understand why \dPn+ should
> show the sizes only for nested partitions of level 1.  Consider the
> following example outputs:
>
> create table p (a int, b char) partition by list (a);
> create table p_1 partition of p for values in (1) partition by list (b);
> create table p_1_a partition of p_1 for values in ('a');
> create table p_1_bc partition of p_1 for values in ('b', 'c') partition by
> list (b);
> create table p_1_b partition of p_1_bc for values in ('b');
> create table p_1_c partition of p_1_bc for values in ('c');
> create table p_2 partition of p for values in (2);
> insert into p values (1, 'a');
> insert into p values (1, 'b');
> insert into p values (2);
>
> \dP+
> List of partitioned relations
>  Schema │ Name │ Owner │ Size  │ Description
> ┼──┼───┼───┼─
>  public │ p│ amit  │ 24 kB │
> (1 row)
>
> -- size of 'p' shown as 8KB, whereas it's actually 24KB as per above
> -- size of 'p_1' shown as 8KB, whereas it's actually 16KB
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 8192 bytes │
>  public │ p_1│ amit  │ p   │ 8192 bytes │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> Also, if the root partitioned table doesn't have a directly attached leaf
> partition, its size is shown as 0 bytes, whereas I think it should
> consider the sizes of its other nested partitions.
>
> drop table p_2;
>
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 0 bytes│
>  public │ p_1│ amit  │ p   │ 8192 bytes │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> If I remove the following two statements from the patched code:
>
> +if (show_nested_partitions)
> +appendPQExpBuffer(&buf, "\n WHERE d.level = 1");
>
> +/*
> + * Assign size just for directly assigned tables, when nested
> + * partitions are visible
> + */
> +if (show_nested_partitions)
> +appendPQExpBuffer(&buf, "\n WHERE ppt.isleaf AND
> ppt.level = 1");
>
> I get the following output, which I find more intuitive:
>
> create table p_2 partition of p for values in (2);
> insert into p values (2);
>
> \dP+
> List of partitioned relations
>  Schema │ Name │ Owner │ Size  │ Description
> ┼──┼───┼───┼─
>  public │ p│ amit  │ 24 kB │
> (1 row)
>
>
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 24 kB  │
>  public │ p_1│ amit  │ p   │ 16 kB  │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> drop table p_2;
>
> \dPn+
>   List of partitioned relations
>  Schema │  Name  │ Owner │ Parent name │Size│ Description
> ┼┼───┼─┼┼─
>  public │ p  │ amit  │ │ 16 kB  │
>  public │ p_1│ amit  │ p   │ 16 kB  │
>  public │ p_1_bc │ amit  │ p_1 │ 8192 bytes │
> (3 rows)
>
> Thoughts?
>
>
I renamed the originally calculated column "size" to "direct partitions size"
(see Alvaro's comment). Then I introduced a new column "total partitions size"
that is calculated as you propose.

Now the result of \dPn+ looks like

 List of partitioned relations
┌┬┬───┬─┬┬───┬─┐
│ Schema │  Name  │ Owner │ Parent name │ Direct partitions size │ Total partitions size │ Description │
╞╪╪═══╪═╪═

Re: Inconsistent error handling in the openssl init code

2019-02-08 Thread Daniel Gustafsson
> On 8 Feb 2019, at 01:10, Michael Paquier  wrote:
> 
> On Thu, Feb 07, 2019 at 10:03:30AM +0100, Daniel Gustafsson wrote:
>> Doh, managed to completely overlook that.  The attached updated patch also
>> fixes the comment, thanks!
> 
> That looks fine to me.  Could you just add it to the next CF as a bug
> fix so as we don't forget?  I am pretty sure that Peter will look at
> that soon.

Done, thanks!  I took the liberty to mark you as reviewer since you’ve already
spent time looking at the patch.

cheers ./daniel


RE: Commit Fest 2019-01 is now closed

2019-02-08 Thread Tsunakawa, Takayuki
From: Bruce Momjian [mailto:br...@momjian.us]
> I am thinking we should see which items we really want for PG 12 _now_
> and allocate resources/help to get them done, rather than being
> surprised they didn't make it.  I am glad we are in good shape with
> CTEs, since that has been a long-requested feature.

I want the partitioning performance to be comparable to a certain commercial 
DBMS, and Horiguchi-san's "protect syscache ..." to limit the upper size of 
syscache/relcache.  Otherwise, our organization may be put in a difficult 
situation...  Anyway, I recognize my colleagues and I also have to review 
others' patches.

I'm sorry if I repeat what someone proposed in the past, but it would be nice 
if the CF app could show the number of modified lines (added + deleted) for 
each patch, e.g., the last line of diffstat command output.  That would help 
young newbies to choose patches.

Regards
Takayuki Tsunakawa





RE: speeding up planning with partitions

2019-02-08 Thread Imai, Yoshikazu
On Fri, Feb 8, 2019 at 1:34 AM, I wrote:
> On Wed, Feb 6, 2019 at 2:04 AM, Tsunakawa, Takayuki wrote:
> > Can you compare the performance of auto and force_custom_plan again
> > with the attached patch?  It uses PGPROC's LOCALLOCK list instead of
> > the hash table.
> 
> Thanks for the patch, but it seems to have some problems.

That was just my compilation mistake.

Performance degradation I saw before is improved! The results are below.

[v20 + faster-locallock-scan.patch]
auto:   9,069 TPS
custom: 9,015 TPS

[v20]
auto:   8,037 TPS
custom: 8,933 TPS

As David and I mentioned, this patch should be discussed on another thread, so
Tsunakawa-san, could you start a separate thread, please?

Thanks
--
Yoshikazu Imai



Re: ON SELECT rule on a table without columns

2019-02-08 Thread Rushabh Lathia
On Fri, Feb 8, 2019 at 12:48 PM Andres Freund  wrote:

> Hi,
>
> On 2019-02-08 12:18:32 +0530, Ashutosh Sharma wrote:
> > When "ON SELECT" rule is created on a table without columns, it
> > successfully converts a table into the view. However, when the same is
> > done using CREATE VIEW command, it fails with an error saying: "view
> > must have at least one column". Here is what I'm trying to say:
> >
> > -- create table t1 without columns
> > create table t1();
> >
> > -- create table t2 without columns
> > create table t2();
> >
> > -- create ON SELECT rule on t1 - this would convert t1 from table to view
> > create rule "_RETURN" as on select to t1 do instead select * from t2;
> >
> > -- now check the definition of t1
> > \d t1
> >
> > postgres=# \d+ t1
> > View "public.t1"
> >  Column | Type | Collation | Nullable | Default | Storage | Description
> > +--+---+--+-+-+-
> > View definition:
> >  SELECT
> >FROM t2;
> >
> > The output of "\d+ t1" shows the definition of converted view t1 which
> > doesn't have any columns in the select query.
> >
> > Now, when i try creating another view with the same definition using
> > CREATE VIEW command, it fails with the error -> ERROR:  view must have
> > at least one column. See below
> >
> > postgres=# create view v1 as select from t2;
> > ERROR:  view must have at least one column
> >
> > OR,
> >
> > postgres=# create view v1 as select * from t2;
> > ERROR:  view must have at least one column
> >
> > Isn't that a bug in create rule command or am i missing something here ?
> >
> > If it is a bug, then, attached is the patch that fixes it.
> >
> > --
> > With Regards,
> > Ashutosh Sharma
> > EnterpriseDB:http://www.enterprisedb.com
>
> > diff --git a/src/backend/rewrite/rewriteDefine.c
> b/src/backend/rewrite/rewriteDefine.c
> > index 3496e6f..cb51955 100644
> > --- a/src/backend/rewrite/rewriteDefine.c
> > +++ b/src/backend/rewrite/rewriteDefine.c
> > @@ -473,6 +473,11 @@ DefineQueryRewrite(const char *rulename,
> >errmsg("could not convert
> table \"%s\" to a view because it has row security enabled",
> >
>  RelationGetRelationName(event_relation;
> >
> > + if (event_relation->rd_rel->relnatts == 0)
> > + ereport(ERROR,
> > +
>  (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> > +  errmsg("view must have at
> least one column")));
> > +
> >   if (relation_has_policies(event_relation))
> >   ereport(ERROR,
> >
>  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>
> Maybe I'm missing something, but why do we want to forbid this?


Because pg_dump produces output for such a case as:

 CREATE VIEW public.foo AS
 SELECT
   FROM public.bar;

which fails to restore because we forbid this in create view:

postgres@20625=#CREATE VIEW public.foo AS
postgres-#  SELECT
postgres-#FROM public.bar;
ERROR:  view must have at least one column
postgres@20625=#

> Given
> that we these days allow selects without columns, I see no reason to
> require this for views.  The view error check long predates allowing
> SELECT and CREATE TABLE without columns.  I think its existence is just
> an oversight.  Tom, you did relax the permissive cases, any opinion?
>
> Greetings,
>
> Andres Freund
>
>

-- 
Rushabh Lathia


Re: ON SELECT rule on a table without columns

2019-02-08 Thread Ashutosh Sharma
On Fri, Feb 8, 2019 at 12:48 PM Andres Freund  wrote:
>
> Hi,
>
> On 2019-02-08 12:18:32 +0530, Ashutosh Sharma wrote:
> > When "ON SELECT" rule is created on a table without columns, it
> > successfully converts a table into the view. However, when the same is
> > done using CREATE VIEW command, it fails with an error saying: "view
> > must have at least one column". Here is what I'm trying to say:
> >
> > -- create table t1 without columns
> > create table t1();
> >
> > -- create table t2 without columns
> > create table t2();
> >
> > -- create ON SELECT rule on t1 - this would convert t1 from table to view
> > create rule "_RETURN" as on select to t1 do instead select * from t2;
> >
> > -- now check the definition of t1
> > \d t1
> >
> > postgres=# \d+ t1
> > View "public.t1"
> >  Column | Type | Collation | Nullable | Default | Storage | Description
> > +--+---+--+-+-+-
> > View definition:
> >  SELECT
> >FROM t2;
> >
> > The output of "\d+ t1" shows the definition of converted view t1 which
> > doesn't have any columns in the select query.
> >
> > Now, when i try creating another view with the same definition using
> > CREATE VIEW command, it fails with the error -> ERROR:  view must have
> > at least one column. See below
> >
> > postgres=# create view v1 as select from t2;
> > ERROR:  view must have at least one column
> >
> > OR,
> >
> > postgres=# create view v1 as select * from t2;
> > ERROR:  view must have at least one column
> >
> > Isn't that a bug in create rule command or am i missing something here ?
> >
> > If it is a bug, then, attached is the patch that fixes it.
> >
> > --
> > With Regards,
> > Ashutosh Sharma
> > EnterpriseDB:http://www.enterprisedb.com
>
> > diff --git a/src/backend/rewrite/rewriteDefine.c 
> > b/src/backend/rewrite/rewriteDefine.c
> > index 3496e6f..cb51955 100644
> > --- a/src/backend/rewrite/rewriteDefine.c
> > +++ b/src/backend/rewrite/rewriteDefine.c
> > @@ -473,6 +473,11 @@ DefineQueryRewrite(const char *rulename,
> >errmsg("could not convert 
> > table \"%s\" to a view because it has row security enabled",
> >   
> > RelationGetRelationName(event_relation;
> >
> > + if (event_relation->rd_rel->relnatts == 0)
> > + ereport(ERROR,
> > + 
> > (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> > +  errmsg("view must have at 
> > least one column")));
> > +
> >   if (relation_has_policies(event_relation))
> >   ereport(ERROR,
> >   
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>
> Maybe I'm missing something, but why do we want to forbid this? Given
> that we these days allow selects without columns, I see no reason to
> require this for views.  The view error check long predates allowing
> SELECT and CREATE TABLE without columns.  I think its existence is just
> an oversight.  Tom, you did relax the permissive cases, any opinion?
>

That's because we don't allow creating a view on a table without
columns. So shouldn't we do the same when converting a table into a view
that doesn't have any columns? Regarding why we can't allow a select on
a view without columns, given that a select on a table without columns
is possible, I don't have any answer :)

I can see that even a SELECT without any target list or table name
works fine; see this:

postgres=# select;
--
(1 row)
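
And for what it's worth, a zero-column table can even hold rows (continuing
the example above):

insert into t2 default values;
select from t2;
--
(1 row)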



Re: libpq compression

2019-02-08 Thread Konstantin Knizhnik




On 08.02.2019 10:01, Iwata, Aya wrote:

Hi,

I am sorry for my late reply.


I fixed all issues you have reported except using list of supported compression
algorithms.

Sure. I confirmed that.


It will require extra round of communication between client and server to
make a decision about used compression algorithm.

At the beginning of this thread, Robbie Harwood said that no extra
communication is needed.
I think so, too.


Well, I think that this problem is more complex and requires more 
discussion.

There are three places determining the choice of compression algorithm:
1. Specification of the compression algorithm by the client. Right now it is
just a boolean "compression" parameter in the connection string, but it is
obviously possible to specify a concrete algorithm here.
2. The list of compression algorithms supported by the client.
3. The list of compression algorithms supported by the server.

Concerning the first option, I have very serious doubts that it is a good
idea to let the client choose the compression protocol.

Without an extra round trip it can only be implemented in this way:
if the client toggles the compression option in the connection string, then
libpq includes the list of supported compression algorithms in the startup
packet. The server then intersects this list with its own set of supported
compression algorithms and, if the result is not empty, somehow chooses one
of the commonly supported algorithms and sends it to the client with the
'z' message.


One more question: should we allow custom-defined compression methods, and
if so, how can they be handled on the client side?  (On the server we can
use the standard extension dynamic-loading mechanism.)

Frankly speaking, I do not think that such flexibility in choosing
compression algorithms is really needed.
I do not expect that there will be many situations where an old client has
to communicate with a new server or vice versa.
In most cases both client and server belong to the same PostgreSQL
distribution and so implement the same compression algorithm.
As far as we are compressing only transient data (traffic), the problem of
providing backward compatibility does not seem so important.







I am still not sure whether it is a good idea to make it possible for the
user to explicitly specify the compression algorithm.
Right now the streaming compression algorithm used is hardcoded and depends
on the --use-zstd or --use-zlib configuration options.
If client and server were built with the same options, then they are able
to use compression.

I understand that the compression algorithm is hardcoded in your proposal.
However, given the possibility of future implementations, I think
it would be better to have the flexibility to choose the compression library.

src/backend/libpq/pqcomm.c :
In current Postgres source code, pq_recvbuf() calls secure_read()
and pq_getbyte_if_available() also calls secure_read().
It means these functions are on the same level.
However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
and  pq_recvbuf() calls secure_read(). The level of these functions is 
different.

I think the purpose of pq_getbyte_if_available() is to get a character if it 
exists and
the purpose of pq_recvbuf() is to acquire data up to the expected length.
In your change, pq_getbyte_if_available() may have to do unnecessary process 
waiting or something.


Sorry, but this change is essential. We can have some data available in the
compression buffer, and we need to try to fetch it in
pq_getbyte_if_available() instead of just returning EOF.


So how about changing your code like this?
The part that checks whether it is compressed is implemented as a #define
macro (like in fe-misc.c), and pq_recvbuf() and pq_getbyte_if_available()
would change only a little, like this:

-   r = secure_read(MyProcPort, PqRecvBuffer + PqRecvLength,
-   PQ_RECV_BUFFER_SIZE - 
PqRecvLength);
+r = SOME_DEFINE_NAME_();

configure:
Add the following message (similar to the zlib one) to configure:
```
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with zstd support" >&5
$as_echo_n "checking whether to build with zstd support... " >&6;}
```


Sorry, but it seems to me that the following fragment of configure is 
doing it:



+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress 
in -lzstd" >&5

+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compre

RE: Libpq support to connect to standby server as priority

2019-02-08 Thread Tsunakawa, Takayuki
From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> target_session_attrs checks for the default_transaction_readonly or not?

PG 11 uses transaction_read_only, not default_transaction_readonly.  That's 
fine, because its purpose is to get a read-only session as the name suggests, 
not to connect to a standby.

> target_server_type checks for whether the server is in recovery or not?

Yes.


> I feel having two options make this feature complex to use it from the user
> point of view?
> 
> The need of two options came because of a possibility of a master server
> with default_transaction_readonly set to true. Even if the default
> transaction
> is readonly, it is user changeable parameter, so there shouldn't be any
> problem.

No.  It's not good if the user has to be bothered by
default_transaction_read_only when he simply wants to connect to a standby.

> how about just adding one parameter that takes the options similar like
> JDBC?
> target_server_type - Master, standby and prefer-standby. (The option names
> can revised based on the common words on the postgresql docs?)

"Getting a read-only session" is not equal to "connecting to a standby", so two 
different parameters make sense.


> And one more thing, what happens when the server promotes to master but
> the connection requested is standby? I feel we can maintain the existing
> connections
> and later new connections can be redirected? comments?

Ideally, it should be possible for the user to choose the behavior like Oracle 
below.  But that's a separate feature.


9.2 Role Transitions Involving Physical Standby Databases
https://docs.oracle.com/en/database/oracle/oracle-database/18/sbydb/managing-oracle-data-guard-role-transitions.html#GUID-857F6F45-DC1C-4345-BD39-F3BE7D79F1CD
--
Keeping Physical Standby Sessions Connected During Role Transition

As of Oracle Database 12c Release 2 (12.2.0.1), when a physical standby 
database is converted into a primary you have the option to keep any sessions 
connected to the physical standby connected, without disruption, during the 
switchover/failover. 

To enable this feature, set the STANDBY_DB_PRESERVE_STATES initialization 
parameter in your init.ora file before the standby instance is started. This 
parameter applies to physical standby databases only. The allowed values are: 

NONE — No sessions on the standby are retained during a switchover/failover. 
This is the default value. 

ALL — User sessions are retained during switchover/failover. 

SESSION — User sessions are retained during switchover/failover. 
--


Would you like to work on this patch?  I'm not sure if I can take time, but I'm 
willing to do it if you don't have enough time.

As Tom mentioned, we need to integrate and clean patches in three mail threads:

* Make a new GUC_REPORT parameter, server_type, to show the server role 
(primary or standby).
* Add target_server_type libpq connection parameter, whose values are either 
primary, standby, or prefer_standby.
* Failover timeout, load balancing, etc. that someone proposed in the other 
thread?

(I wonder which of server_type or server_role feels natural in English.)

Or, would you like to share the work, e.g., libpq and server-side?


Regards
Takayuki Tsunakawa





Re: speeding up planning with partitions

2019-02-08 Thread David Rowley
On Fri, 8 Feb 2019 at 22:12, Amit Langote  wrote:
> 0001 is now a patch to remove duplicate code from set_append_rel_size.  It
> combines multiple blocks that have the same body doing
> set_dummy_rel_pathlist().
>
> 0002 is the "overhaul inherited update/delete planning"

I had started looking at v20's 0001.  I've not done a complete pass
over it yet, but I'll likely just start again since v21 is out now:

I've removed the things you've fixed in v21. I think most of these
will apply to the 0002 patch, apart form maybe #2.

1. In set_rel_pathlist(), I wonder if you should be skipping the
set_rel_pathlist_hook call when inherited_update is true.

Another comment mentions:

* not this rel.  Also, this rel's sole path (ModifyTable) will be set
* by inheritance_planner later, so we can't check its paths yet.

So any paths the hook added for this rel would be useless anyway.


2. The comment here mentions "partition", but it might just be a child
of an inheritance parent:

if (childpruned ||
!apply_child_basequals(root, rel, childrel, childRTE, appinfo) ||
relation_excluded_by_constraints(root, childrel, childRTE))
{
/* This partition needn't be scanned; skip it. */
set_dummy_rel_pathlist(childrel);
continue;
}

This occurs in both set_inherit_target_rel_sizes() and set_append_rel_size()

3. I think the following comment:

/* If the child itself is partitioned it may turn into a dummy rel. */

It might be better to have it as:

/*
* If the child is a partitioned table it may have been marked
* as a dummy rel from having all its partitions pruned.
*/

I mostly think that by saying "if the child itself is partitioned",
then you're implying that we're currently operating on a partitioned
table, but we might be working on an inheritance parent.

4. In set_inherit_target_rel_pathlists(), you have:

/*
* If set_append_rel_size() decided the parent appendrel was
* parallel-unsafe at some point after visiting this child rel, we
* need to propagate the unsafety marking down to the child, so that
* we don't generate useless partial paths for it.
*/
if (!rel->consider_parallel)
childrel->consider_parallel = false;

But I don't quite see why set_append_rel_size() would have ever been
called for that rel. It should have been
set_inherit_target_rel_sizes(). It seems like the entire chunk can be
removed since set_inherit_target_rel_sizes() does not set
consider_parallel.

5. In planner.c you mention:

* inherit_make_rel_from_joinlist - this translates the jointree, replacing
 * the target relation in the original jointree (the root parent) by
 * individual child target relations and performs join planning on the
 * resulting join tree, saving the RelOptInfos of resulting join relations
 * into the top-level PlannerInfo


I can't see a function named inherit_make_rel_from_joinlist().

6. No such function:

* First, save the unexpanded version of the query's targetlist.
* create_inherit_target_child_root will use it as base when expanding
* it for a given child relation as the query's target relation instead
* of the parent.
*/

and in:

/*
* Add missing Vars to child's reltarget.
*
* create_inherit_target_child_root() would've added only those that
* are needed to be present in the top-level tlist (or ones that
* preprocess_targetlist thinks are needed to be in the tlist.)  We
* may need other attributes such as those contained in WHERE clauses,
* which are already computed for the parent during
* deconstruct_jointree processing of the original query (child's
* query never goes through deconstruct_jointree.)
*/

7. Where is "false" being passed here?

/* We haven't expanded inheritance yet, so pass false. */
tlist = preprocess_targetlist(root);
root->processed_tlist = tlist;
qp_extra.tlist = tlist;
qp_extra.activeWindows = NIL;
qp_extra.groupClause = NIL;
planned_rel = query_planner(root, tlist, standard_qp_callback, &qp_extra);

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: libpq compression

2019-02-08 Thread Daniel Gustafsson
> On 8 Feb 2019, at 10:15, Konstantin Knizhnik  
> wrote:

> Frankly speaking, I do not think that such flexibility in choosing 
> compression algorithms is really needed.
> I do not expect that there will be many situations where an old client has to
> communicate with a new server or vice versa.
> In most cases both client and server belong to the same PostgreSQL
> distribution and so implement the same compression algorithm.
> As far as we are compressing only transient data (traffic), the problem of
> providing backward compatibility does not seem so important.

I don’t think this assumption is entirely valid, and would risk unnecessary
breakage.

cheers ./daniel




Re: ON SELECT rule on a table without columns

2019-02-08 Thread Andres Freund



On February 8, 2019 10:05:03 AM GMT+01:00, Rushabh Lathia 
 wrote:
>On Fri, Feb 8, 2019 at 12:48 PM Andres Freund 
>wrote:
>
>> Hi,
>>
>> On 2019-02-08 12:18:32 +0530, Ashutosh Sharma wrote:
>> > When "ON SELECT" rule is created on a table without columns, it
>> > successfully converts a table into the view. However, when the same
>is
>> > done using CREATE VIEW command, it fails with an error saying:
>"view
>> > must have at least one column". Here is what I'm trying to say:
>> >
>> > -- create table t1 without columns
>> > create table t1();
>> >
>> > -- create table t2 without columns
>> > create table t2();
>> >
>> > -- create ON SELECT rule on t1 - this would convert t1 from table
>to view
>> > create rule "_RETURN" as on select to t1 do instead select * from
>t2;
>> >
>> > -- now check the definition of t1
>> > \d t1
>> >
>> > postgres=# \d+ t1
>> > View "public.t1"
>> >  Column | Type | Collation | Nullable | Default | Storage |
>Description
>> >
>+--+---+--+-+-+-
>> > View definition:
>> >  SELECT
>> >FROM t2;
>> >
>> > The output of "\d+ t1" shows the definition of converted view t1
>which
>> > doesn't have any columns in the select query.
>> >
>> > Now, when i try creating another view with the same definition
>using
>> > CREATE VIEW command, it fails with the error -> ERROR:  view must
>have
>> > at least one column. See below
>> >
>> > postgres=# create view v1 as select from t2;
>> > ERROR:  view must have at least one column
>> >
>> > OR,
>> >
>> > postgres=# create view v1 as select * from t2;
>> > ERROR:  view must have at least one column
>> >
>> > Isn't that a bug in create rule command or am i missing something
>here ?
>> >
>> > If it is a bug, then, attached is the patch that fixes it.
>> >
>> > --
>> > With Regards,
>> > Ashutosh Sharma
>> > EnterpriseDB:http://www.enterprisedb.com
>>
>> > diff --git a/src/backend/rewrite/rewriteDefine.c
>> b/src/backend/rewrite/rewriteDefine.c
>> > index 3496e6f..cb51955 100644
>> > --- a/src/backend/rewrite/rewriteDefine.c
>> > +++ b/src/backend/rewrite/rewriteDefine.c
>> > @@ -473,6 +473,11 @@ DefineQueryRewrite(const char *rulename,
>> >errmsg("could not
>convert
>> table \"%s\" to a view because it has row security enabled",
>> >
>>  RelationGetRelationName(event_relation;
>> >
>> > + if (event_relation->rd_rel->relnatts == 0)
>> > + ereport(ERROR,
>> > +
>>  (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
>> > +  errmsg("view must
>have at
>> least one column")));
>> > +
>> >   if (relation_has_policies(event_relation))
>> >   ereport(ERROR,
>> >
>>  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>
>> Maybe I'm missing something, but why do we want to forbid this?
>
>
>Because pg_dump - produce the output for such case as:
>
> CREATE VIEW public.foo AS
> SELECT
>   FROM public.bar;
>
>which fails to restore because we forbid this in create view:
>
>postgres@20625=#CREATE VIEW public.foo AS
>postgres-#  SELECT
>postgres-#FROM public.bar;
>ERROR:  view must have at least one column
>postgres@20625=#

You misunderstood my point: I'm asking why we shouldn't remove that check from 
views, rather than adding it to create rule.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



RE: Protect syscache from bloating with negative cache entries

2019-02-08 Thread Ideriha, Takeshi
>From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>I made a rerun of benchmark using "-S -T 30" on the server build with no 
>assertion and
>-O2. The numbers are the best of three successive attempts.  The patched 
>version is
>running with cache_target_memory = 0, cache_prune_min_age = 600 and
>cache_entry_limit = 0 but pruning doesn't happen by the workload.
>
>master: 13393 tps
>v12   : 12625 tps (-6%)
>
>Significant degradation is found.
>
>Reducing the frequency of dlist_move_tail by taking a 1ms interval between two
>successive updates on the same entry made the degradation disappear.
>
>patched  : 13720 tps (+2%)

It would be good to introduce some interval.
I followed your benchmark (initialization scale factor = 10, other options
the same) and found the same tendency.
These are averages of 5 trials.
master:   7640.000538 
patch_v12:7417.981378 (3 % down against master)
patch_v13:7645.071787 (almost same as master)

These cases are workloads where no pruning happens, as you mentioned.
I'd like to benchmark the cache-pruning case as well.
To exercise the pruning case, right now I'm creating hundreds of partitions
and running a select query for each partition using a pgbench custom file.
Maybe using a small cache_prune_min_age or a hard limit would be better.
Is there any good model?
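
Roughly what I have in mind (the details are provisional):

-- setup: a list-partitioned table with many partitions
create table pt (id int) partition by list (id);
do $$
begin
  for i in 1..300 loop
    execute format('create table pt_%s partition of pt for values in (%s)', i, i);
  end loop;
end $$;
-- pgbench custom script: touch a random partition in each transaction
-- \set pid random(1, 300)
-- select * from pt_:pid;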

># I'm not sure the name LRU_IGNORANCE_INTERVAL makes sens..
How about MIN_LRU_UPDATE_INTERVAL?

Regards,
Takeshi Ideriha




Re: Inconsistent error handling in the openssl init code

2019-02-08 Thread Michael Paquier
On Fri, Feb 08, 2019 at 09:36:59AM +0100, Daniel Gustafsson wrote:
> Done, thanks!  I took the liberty to mark you as reviewer since you’ve already
> spent time looking at the patch.

Thanks.  Please note that I can take care of the patch in a couple of
days if need be.
--
Michael




Re: ON SELECT rule on a table without columns

2019-02-08 Thread Ashutosh Sharma
On Fri, Feb 8, 2019 at 3:05 PM Andres Freund  wrote:
>
>
>
> On February 8, 2019 10:05:03 AM GMT+01:00, Rushabh Lathia 
>  wrote:
> >On Fri, Feb 8, 2019 at 12:48 PM Andres Freund 
> >wrote:
> >
> >> Hi,
> >>
> >> On 2019-02-08 12:18:32 +0530, Ashutosh Sharma wrote:
> >> > When "ON SELECT" rule is created on a table without columns, it
> >> > successfully converts a table into the view. However, when the same
> >is
> >> > done using CREATE VIEW command, it fails with an error saying:
> >"view
> >> > must have at least one column". Here is what I'm trying to say:
> >> >
> >> > -- create table t1 without columns
> >> > create table t1();
> >> >
> >> > -- create table t2 without columns
> >> > create table t2();
> >> >
> >> > -- create ON SELECT rule on t1 - this would convert t1 from table
> >to view
> >> > create rule "_RETURN" as on select to t1 do instead select * from
> >t2;
> >> >
> >> > -- now check the definition of t1
> >> > \d t1
> >> >
> >> > postgres=# \d+ t1
> >> > View "public.t1"
> >> >  Column | Type | Collation | Nullable | Default | Storage |
> >Description
> >> >
> >+--+---+--+-+-+-
> >> > View definition:
> >> >  SELECT
> >> >FROM t2;
> >> >
> >> > The output of "\d+ t1" shows the definition of converted view t1
> >which
> >> > doesn't have any columns in the select query.
> >> >
> >> > Now, when i try creating another view with the same definition
> >using
> >> > CREATE VIEW command, it fails with the error -> ERROR:  view must
> >have
> >> > at least one column. See below
> >> >
> >> > postgres=# create view v1 as select from t2;
> >> > ERROR:  view must have at least one column
> >> >
> >> > OR,
> >> >
> >> > postgres=# create view v1 as select * from t2;
> >> > ERROR:  view must have at least one column
> >> >
> >> > Isn't that a bug in create rule command or am i missing something
> >here ?
> >> >
> >> > If it is a bug, then, attached is the patch that fixes it.
> >> >
> >> > --
> >> > With Regards,
> >> > Ashutosh Sharma
> >> > EnterpriseDB:http://www.enterprisedb.com
> >>
> >> > diff --git a/src/backend/rewrite/rewriteDefine.c
> >> b/src/backend/rewrite/rewriteDefine.c
> >> > index 3496e6f..cb51955 100644
> >> > --- a/src/backend/rewrite/rewriteDefine.c
> >> > +++ b/src/backend/rewrite/rewriteDefine.c
> >> > @@ -473,6 +473,11 @@ DefineQueryRewrite(const char *rulename,
> >> >errmsg("could not
> >convert
> >> table \"%s\" to a view because it has row security enabled",
> >> >
> >>  RelationGetRelationName(event_relation;
> >> >
> >> > + if (event_relation->rd_rel->relnatts == 0)
> >> > + ereport(ERROR,
> >> > +
> >>  (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> >> > +  errmsg("view must
> >have at
> >> least one column")));
> >> > +
> >> >   if (relation_has_policies(event_relation))
> >> >   ereport(ERROR,
> >> >
> >>  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >>
> >> Maybe I'm missing something, but why do we want to forbid this?
> >
> >
> >Because pg_dump - produce the output for such case as:
> >
> > CREATE VIEW public.foo AS
> > SELECT
> >   FROM public.bar;
> >
> >which fails to restore because we forbid this in create view:
> >
> >postgres@20625=#CREATE VIEW public.foo AS
> >postgres-#  SELECT
> >postgres-#FROM public.bar;
> >ERROR:  view must have at least one column
> >postgres@20625=#
>
> You misunderstood my point: I'm asking why we shouldn't remove that check 
> from views, rather than adding it to create rule.
>

Here is the second point from my previous response:

"Regarding why we can't allow select on a view without columns given
that select on a table without column is possible, I don't have any
answer :)"

I prepared the patch assuming that the current behaviour of create
view on a table without column is fine.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: PATCH: Include all columns in default names for foreign key constraints.

2019-02-08 Thread Peter Eisentraut
On 13/01/2019 01:55, Paul Martinez wrote:
> The two constraints have nearly identical names. With my patch the default
> names will include both column names, so we will instead have this output:
> 
>  Foreign-key constraints:
>   "comments_tenant_id_commenter_id_fkey" FOREIGN KEY (tenant_id,
> commenter_id) REFERENCES users(tenant_id, id)
>   "comments_tenant_id_post_id_fkey" FOREIGN KEY (tenant_id, post_id)
> REFERENCES posts(tenant_id, id)
> 
> This makes the default names for foreign keys in line with the default names
> for indexes. Hopefully an uncontroversial change!

I think this is a good change.

> I pretty much just copied and pasted the implementation from
> ChooseIndexNameAddition and placed it in src/backend/commands/tablecmds.c.

The use of "name2" in the comment doesn't make sense outside the context
of indexcmds.c.  Maybe rewrite that a bit.

> Regression tests are in src/test/regress/sql/foreign_key.sql. I create two
> composite foreign keys on table, one via the CREATE TABLE statement, and the
> other in a ALTER TABLE statement. The generated names of the constraints are
> then queried from the pg_constraint table.

Existing regression tests already exercise this, and they are failing
all over the place because of the changes of the generated names.  That
is to be expected.  You should investigate those failures and adjust the
"expected" files.  Then you probably don't need your additional tests.

It might be worth having a test that runs into the 63-character length
limit somehow.
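
For instance, something along these lines might exercise it (just a sketch;
the names here are arbitrary):

CREATE TABLE fk_name_len_ref (
  a_really_long_column_name_to_push_the_limit_aaaaaaaaaa int,
  another_really_long_column_name_bbbbbbbbbbbbbbbbbbbbbb int,
  PRIMARY KEY (a_really_long_column_name_to_push_the_limit_aaaaaaaaaa,
               another_really_long_column_name_bbbbbbbbbbbbbbbbbbbbbb)
);
CREATE TABLE fk_name_len_child (
  a_really_long_column_name_to_push_the_limit_aaaaaaaaaa int,
  another_really_long_column_name_bbbbbbbbbbbbbbbbbbbbbb int,
  FOREIGN KEY (a_really_long_column_name_to_push_the_limit_aaaaaaaaaa,
               another_really_long_column_name_bbbbbbbbbbbbbbbbbbbbbb)
    REFERENCES fk_name_len_ref
);
-- the generated name should be truncated to fit within NAMEDATALEN
SELECT conname FROM pg_constraint
WHERE conrelid = 'fk_name_len_child'::regclass AND contype = 'f';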

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Pluggable Storage - Andres's take

2019-02-08 Thread Amit Khandekar
On Wed, 6 Feb 2019 at 18:30, Amit Khandekar  wrote:
>
> On Mon, 21 Jan 2019 at 08:31, Andres Freund  wrote:
> >
> > Hi,
> >
> > (resending with compressed attachements, perhaps that'll go through)
> >
> > On 2018-12-10 18:13:40 -0800, Andres Freund wrote:
> > > On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
> > > > FWIW, now that oids are removed, and the tuple table slot abstraction
> > > > got in, I'm working on rebasing the pluggable storage patchset ontop of
> > > > that.
> > >
> > > I've pushed a version to that to the git tree, including a rebased
> > > version of zheap:
> > > https://github.com/anarazel/postgres-pluggable-storage
>
> I worked on a slight improvement on the
> 0040-WIP-Move-xid-horizon-computation-for-page-level patch . Instead
> of pre-fetching all the required buffers beforehand, the attached WIP
> patch pre-fetches the buffers keeping a constant distance ahead of the
> buffer reads. It's a WIP patch because right now it just uses a
> hard-coded 5 buffers ahead. Haven't used effective_io_concurrency like
> how it is done in nodeBitmapHeapscan.c. Will do that next. But before
> that, any comments on the way I did the improvements would be nice.
>
> Note that for now, the patch is based on the pluggable-storage latest
> commit; it does not replace the 0040 patch in the patch series.

In the attached v1 patch, the prefetch_distance is calculated as
effective_io_concurrency + 10. Also it has some cosmetic changes.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


prefetch_xid_horizon_scan_v1.patch
Description: Binary data


Re: phase out ossp-uuid?

2019-02-08 Thread Jose Luis Tallon

On 7/2/19 23:03, Andres Freund wrote:

Hi,

On 2019-02-07 09:03:06 +, Dave Page wrote:

On Thu, Feb 7, 2019 at 8:26 AM Peter Eisentraut
 wrote:

I suggest we declare it deprecated in PG12 and remove it altogether in PG13.

Much as I'd like to get rid of it, we don't have an alternative for
Windows do we? The docs for 11 imply it's required for UUID support
(though the wording isn't exactly clear, saying it's required for
UUID-OSSP support!):
https://www.postgresql.org/docs/11/install-windows-full.html#id-1.6.4.8.8

Given that we've now integrated strong crypto support, and are relying
on it for security (scram), perhaps we should just add a core uuidv4?


This. But just make it "uuid", so both parties will get their own:

On 7/2/19 11:37, I wrote:

AFAIR, Windows has its own DCE/v4 UUID generation support if needed;
UUID v5 can be generated using built-in crypto hashes. v1 UUIDs are the
ones that are (potentially) more cumbersome to generate, plus they are
the least useful IMHO.


- UUIDv3    <- with built-in crypto hashes

- UUIDv4    <- with built-in crypto random

- UUIDv5    <- with built-in crypto hashes

Only v1 remains. For those use cases one could still use ossp-uuid, so what
about:


* Rename the extension's type to ossp_uuid or the like.

* Have uuid in-core (we already have the required platform-independent
crypto, so I wouldn't expect portability issues)


I reckon that most use cases should be either UUID v4 or UUID v5 these 
days. For those using v1 UUIDs, either implement v1 in core or provide 
some fallback/migration path; This would only affect the 
"uuid_generate_v1()" and "uuid_generate_v1mc()" calls AFAICS.



Moreover, the documentation (as far back as 9.4) already states:

"If you only need randomly-generated (version 4) UUIDs, consider using 
the |gen_random_uuid()| function from the pgcrypto 
 module instead."


So just importing the datatype into core would go a long way towards 
removing the dependency for most users.
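
That is, for the v4-only case something like this already works without
ossp-uuid (illustration only):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
SELECT gen_random_uuid();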



Thanks,

    / J.L.




RE: [PROPOSAL]a new data type 'bytea' for ECPG

2019-02-08 Thread Matsumura, Ryo
Meskes-san

Thank you for your comment.
I removed it and attached a new patch. Please review it.
I feel sorry for asking you to review without contributing myself.

Regards
Ryo Matsumura

> -Original Message-
> From: Michael Meskes [mailto:mes...@postgresql.org]
> Sent: Friday, February 1, 2019 6:29 PM
> To: Matsumura, Ryo/松村 量 
> Cc: 'Michael Meskes' ; Tsunakawa, Takayuki/綱川 貴
> 之 ; pgsql-hackers@lists.postgresql.org
> Subject: Re: [PROPOSAL]a new data type 'bytea' for ECPG
> 
> Matsumura-san,
> 
> > Sorry to bother you, but I would be grateful if you would comment to me.
> 
> Sorry, I didn't know you were waiting on a reply by me.
> 
> > > I have a question about ecpg manual when I add article for bytea.
> > > I wonder what does the following about VARCHAR mean.
> > > ...
> > > I think, if the footnote of VARCHAR is meaningless, I remove it while I
> add
> > > the article for bytea. (I didn't remove in this patch.)
> 
> I have no idea where the footnote comes from, but I agree that it doesn't seem
> to make sense. The datatype varchar in the C code is handled by the
> preprocessor and replaced by a struct definition anyway.
> 
> Feel free to remove.
> 
> Michael
> --
> Michael Meskes
> Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
> Meskes at (Debian|Postgresql) dot Org
> Jabber: michael at xmpp dot meskes dot org
> VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL
> 



ecpg_bytea_v1_3.patch
Description: ecpg_bytea_v1_3.patch


Re: Incorrect visibility test function assigned to snapshot

2019-02-08 Thread Antonin Houska
Bruce Momjian  wrote:

> On Wed, May 30, 2018 at 09:28:54AM -0400, Alvaro Herrera wrote:
> > On 2018-May-30, Antonin Houska wrote:
> > 
> > > In the header comment, SnapBuildInitialSnapshot() claims to set
> > > snapshot->satisfies to the HeapTupleSatisfiesMVCC test function, and 
> > > indeed it
> > > converts the "xid" array to match its semantics (i.e. the xid items 
> > > eventually
> > > represent running transactions as opposed to the committed ones). However 
> > > the
> > > test function remains HeapTupleSatisfiesHistoricMVCC as set by
> > > SnapBuildBuildSnapshot().
> > 
> > Interesting.  While this sounds like an oversight that should have
> > horrible consequences, it's seems not to because the current callers
> > don't seem to care about the ->satisfies function.  Are you able to come
> > up with some scenario in which it causes an actual problem?
> 
> Uh, are we going to fix this anyway?  Seems we should.

Sorry, I forgot. Patch is below and I'm going to add an entry to the next CF.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 2f185f7823..3394abb602 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -615,6 +615,8 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 
 		TransactionIdAdvance(xid);
 	}
+	/* And of course, adjust snapshot type accordingly. */
+	snap->snapshot_type = SNAPSHOT_MVCC;
 
 	snap->xcnt = newxcnt;
 	snap->xip = newxip;
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 6e02585e10..bc0166cc6f 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1502,6 +1502,13 @@ ImportSnapshot(const char *idstr)
 	 */
 	memset(&snapshot, 0, sizeof(snapshot));
 
+	/*
+	 * Do not rely on the fact that SNAPSHOT_MVCC is zero. (The core code
+	 * currently does not use this field of imported snapshot, but let's keep
+	 * things consistent.)
+	 */
+	snapshot.snapshot_type = SNAPSHOT_MVCC;
+
 	parseVxidFromText("vxid:", &filebuf, path, &src_vxid);
 	src_pid = parseIntFromText("pid:", &filebuf, path);
 	/* we abuse parseXidFromText a bit here ... */


Re: Inconsistent error handling in the openssl init code

2019-02-08 Thread Peter Eisentraut
On 08/02/2019 11:01, Michael Paquier wrote:
> On Fri, Feb 08, 2019 at 09:36:59AM +0100, Daniel Gustafsson wrote:
>> Done, thanks!  I took the liberty to mark you as reviewer since you’ve 
>> already
>> spent time looking at the patch.
> 
> Thanks.  Please note that I can take care of the patch in a couple of
> days if need be.

Fixed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Commit Fest 2019-01 is now closed

2019-02-08 Thread Chris Travers
On Fri, Feb 8, 2019 at 9:40 AM Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Bruce Momjian [mailto:br...@momjian.us]
> > I am thinking we should see which items we really want for PG 12 _now_
> > and allocate resources/help to get them done, rather than being
> > surprised they didn't make it.  I am glad we are in good shape with
> > CTEs, since that has been a long-requested feature.
>
> I want the partitioning performance to be comparable to a certain
> commercial DBMS, and Horiguchi-san's "protect syscache ..." to limit the
> upper size of syscache/relcache.  Otherwise, our organization may be put in
> a difficult situation...  Anyway, I recognize my colleagues and I also have
> to review others' patches.
>
> I'm sorry if I repeat what someone proposed in the past, but it would be
> nice if the CF app could show the number of modified lines (added +
> deleted) for each patch, e.g., the last line of diffstat command output.
> That would help young newbies to choose patches.
>

An important prerequisite here would be to list all patches on the most
recent email.  Right now it just lists one so you have to follow the link
to the hackers list entry and look at each patch one at a time.

So if we could grab all attachments ending in .diff or .patch and list line
counts, that would be a welcome improvement.


>
> Regards
> Takayuki Tsunakawa
>
>
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Commit Fest 2019-01 is now closed

2019-02-08 Thread Magnus Hagander
On Thu, Feb 7, 2019 at 10:34 PM Gavin Flower 
wrote:

> On 08/02/2019 00:53, Peter Eisentraut wrote:
> > On 06/02/2019 21:09, Magnus Hagander wrote:
> >> This has now been pushed and is available. I've set it up with stable,
> >> 12 and 13 as possible versions for now, but I have not added any tags to
> >> the existing patches (except for one, in order to test it).
> > What is the meaning of this?  If something is meant for 13, shouldn't it
> > be moved to the next commit fest?
> >
> > I find the current display a bit hard to read.  If we are going to use
> > the white-on-color style, perhaps each version could have a different
> > color.  Or just make it plain black-on-white text.
> >
> If your eyesight is poor, then odd colour combinations like white on
> grey, or other colours, is hard to read.
>
> Ten years ago, before about 7 eye operations, poor contrast would be
> extremely difficult, if not impossible, for me to read.
>
> Remember also that about 1 in 12 men are colour blind.
>

I'd be more than happy for somebody with more knowledge of such matters
than me to think up a better color scheme. The only reason it has those
colors is that they're the default ones in the Bootstrap CSS framework.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


RE: performance issue in remove_from_unowned_list()

2019-02-08 Thread Ideriha, Takeshi
>From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
>But it's a bit funnier, because there's also DropRelationFiles() which does 
>smgrclose on
>a batch of relations too, and it says this
>
>/*
> * Call smgrclose() in reverse order as when smgropen() is called.
> * This trick enables remove_from_unowned_list() in smgrclose()
> * to search the SMgrRelation from the unowned list,
> * with O(1) performance.
> */
>for (i = ndelrels - 1; i >= 0; i--)
>...
>
>but it's called from two places in xact.c only. And if you trigger the issue 
>with 100k x
>CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
>actually get the very same behavior with always traversing the unowned list 
>for some
>reason. I'm not quite sure why, but it seems the optimization is exactly the 
>wrong thing
>to do here.
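
If I understand correctly, the reproduction is roughly this (a sketch):

BEGIN;
DO $$
begin
  for i in 1..100000 loop
    execute format('create table t_%s (a int)', i);
  end loop;
end $$;
ROLLBACK;
-- then force a crash (e.g. kill -9 the postmaster) so that recovery
-- replays the file drops via DropRelationFiles()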

So when DropRelationFiles() is called, is the order of calling smgrclose()
always the same as the order of the unowned list?

This one was introduced in b4166911, and I'd like to hear the author's and
reviewer's opinions.
https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com

Regards,
Takeshi Ideriha


Re: Inconsistent error handling in the openssl init code

2019-02-08 Thread Daniel Gustafsson
> On 8 Feb 2019, at 12:01, Peter Eisentraut  
> wrote:
> 
> On 08/02/2019 11:01, Michael Paquier wrote:
>> On Fri, Feb 08, 2019 at 09:36:59AM +0100, Daniel Gustafsson wrote:
>>> Done, thanks!  I took the liberty to mark you as reviewer since you’ve 
>>> already
>>> spent time looking at the patch.
>> 
>> Thanks.  Please note that I can take care of the patch in a couple of
>> days if need be.
> 
> Fixed, thanks.

Thanks, I’ve closed it in the commitfest with you as committer.

cheers ./daniel


Re: speeding up planning with partitions

2019-02-08 Thread David Rowley
On Fri, 8 Feb 2019 at 22:27, David Rowley  wrote:
> I had started looking at v20's 0001.  I've not done a complete pass
> over it yet, but I'll likely just start again since v21 is out now:

I've now done a complete pass over v21.

Here's what I wrote down.

8. Is this code in the wrong patch? I don't see any function named
build_dummy_partition_rel in this patch.

* Make child entries in the EquivalenceClass as well.  If the childrel
* appears to be a dummy one (one built by build_dummy_partition_rel()),
* no need to make any new entries, because anything that'd need those
* can instead use the parent's (rel).
*/
if (childrel->relid != rel->relid &&

9. "to use" seems out of place here. It makes more sense if you remove
those words.

* Add child subroots needed to use during planning for individual child
* targets

10. Is this comment really needed?

/*
* This is needed, because inheritance_make_rel_from_joinlist needs to
* translate root->join_info_list executing make_rel_from_joinlist for a
* given child.
*/

None of the other node types mention what they're used for.  Seems
like something that might get outdated pretty quickly.

11. adjust_appendrel_attrs_mutator: This does not seem very robust:

/*
* No point in searching if parent rel not mentioned in eclass; but we
* can't tell that for sure if parent rel is itself a child.
*/
for (cnt = 0; cnt < nappinfos; cnt++)
{
if (bms_is_member(appinfos[cnt]->parent_relid, ec->ec_relids))
{
appinfo = appinfos[cnt];
break;
}
}

What if the caller passes multiple appinfos and actually wants them
all processed?  You'll only process the first one you find that has an
eclass member.  I think you should just loop over each appinfo and
process all the ones that have a match, not just the first.

I understand the current caller only passes 1, but I don't think that
gives you an excuse to take a shortcut on the implementation.  I think
probably you've done this as that's what is done for Var in
adjust_appendrel_attrs_mutator(), but a Var can only belong to a
single relation. An EquivalenceClass can have members for multiple
relations.
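
Roughly what I have in mind, as a sketch only (it reuses the names from the
snippet above, and the actual translation step is elided):

for (cnt = 0; cnt < nappinfos; cnt++)
{
	AppendRelInfo *appinfo = appinfos[cnt];

	/* skip appinfos whose parent rel isn't mentioned in this EC */
	if (!bms_is_member(appinfo->parent_relid, ec->ec_relids))
		continue;

	/*
	 * ... translate the parent's EC members for this parent/child pair,
	 * as the current code does only for the first matching appinfo ...
	 */
}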


13. adjust_appendrel_attrs_mutator: This seems wrong:

/*
* We have found and replaced the parent expression, so done
* with EC.
*/
break;

Surely there could be multiple members for the parent. Say:

UPDATE parted SET ... WHERE x = y AND y = 1;

14. adjust_appendrel_attrs_mutator: Comment is wrong. There's no
function called adjust_inherited_target_child_root and the EC is
copied in the function, not the caller.

/*
* Now fix up EC's relids set.  It's OK to modify EC like this,
* because caller must have made a copy of the original EC.
* For example, see adjust_inherited_target_child_root.
*/


15. I don't think "Copy it before modifying though." should be part of
this comment.

/*
* Adjust all_baserels to replace the original target relation with the
* child target relation.  Copy it before modifying though.
*/
subroot->all_baserels = adjust_child_relids(subroot->all_baserels,
1, &appinfo);

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-02-08 Thread Raúl Marín Rodríguez
Hi,

Any way I can help to get this committed? Should we add this to the commitfest?

Regards,
-- 
Raúl Marín Rodríguez
carto.com



Re: Problems with plan estimates in postgres_fdw

2019-02-08 Thread Etsuro Fujita

Hi Antonin,

(2019/02/08 2:04), Antonin Houska wrote:

Etsuro Fujita  wrote:

Here is an updated version of the patch set.  Changes are:


I've spent some time reviewing the patches.


Thanks for the review!


First, I wonder if the information on LIMIT needs to be passed to the
FDW. Cannot the FDW routines use root->tuple_fraction?


I think it's OK for the FDW to use the tuple_fraction, but I'm not sure
the tuple_fraction would be enough.  For example, I don't think we can
re-compute the LIMIT/OFFSET values from the tuple_fraction.



I think (although have
not verified experimentally) that the problem reported in [1] is not limited
to the LIMIT clause. I think the same problem can happen if client uses cursor
and sets cursor_tuple_fraction very low. Since the remote node is not aware of
this setting when running EXPLAIN, the low fraction can also make postgres_fdw
think that retrieval of the whole set is cheaper than it will appear to be
during the actual execution.


I think it would be better to get a fast-start plan on the remote side 
in such a situation, but I'm not sure we can do that as well with this 
patch, because in the cursor case, the FDW couldn't know in advance 
exactly how many rows will be fetched from the remote side using that
cursor, so the FDW couldn't push down LIMIT.  So I'd like to leave that 
for another patch.



Some comments on coding:

0001-postgres_fdw-Perform-UPPERREL_ORDERED-step-remotely-v3.patch
-

* I'm not sure if UPPERREL_ORDERED should deal with LIMIT at all. Note that
grouping_planner() does not call create_limit_path() until it's done with
create_ordered_paths(), and create_ordered_paths() is where the FDW is
requested to add its paths to UPPERREL_ORDERED relation.


Maybe I'm missing something, but the 0001 patch doesn't deal with LIMIT 
at all.  I added the parameter limit_tuples to PgFdwPathExtraData so 
that we can pass limit_tuples=-1, which means no LIMIT, to cost_sort(), 
though.



* add_foreign_ordered_paths()

Expression root->upper_targets[UPPERREL_FINAL] is used to access the target.

I think create_ordered_paths() should actually set the target so that
postgresGetForeignUpperPaths() can simply find it in
output_rel->reltarget. This is how postgres_fdw already retrieves the target
for grouped paths. Small patch is attached that makes this possible.


Seems like a good idea.  Thanks for the patch!  Will review.


* regression tests: I think test(s) should be added for queries that have
   ORDER BY clause but do not have GROUP BY (and also no LIMIT / OFFSET)
   clause. I haven't noticed such tests.


Will do.


0002-postgres_fdw-Perform-UPPERREL_FINAL-step-remotely-v3.patch
---

* add_foreign_final_paths()

Similar problem I reported for add_foreign_ordered_paths() above.

* adjust_limit_rows_costs()

Doesn't this function address more generic problem than the use of
postgres_fdw? If so, then it should be submitted as a separate patch. BTW, the
function is only used in pathnode.c, so it should be declared static.


Actually, this is simple refactoring for create_limit_path() so that 
that function can be shared with postgres_fdw.  See 
estimate_path_cost_size().  I'll separate that refactoring in the next 
version of the patch set.



Both patches:


One thing that makes me confused when I try to understand details is that the
new functions add_foreign_ordered_paths() and add_foreign_final_paths() both
pass "input_rel" for the "foreignrel" argument of estimate_path_cost_size():

estimate_path_cost_size(root, input_rel, ...)

whereas the existing add_foreign_grouping_paths() passes "grouped_rel":

estimate_path_cost_size(root, grouped_rel, ...)

Can you please make this consistent? If you do, you'll probably need to
reconsider your changes to estimate_path_cost_size().


This is intended and I think I should add more comments on that.  Let me 
explain.  These patches modified estimate_path_cost_size() so that we 
can estimate the costs of a foreign path for the UPPERREL_ORDERED or 
UPPERREL_FINAL rel, by adding the costs of the upper-relation processing 
(ie, ORDER BY and/or LIMIT/OFFSET, specified by PgFdwPathExtraData) to 
the costs of implementing the *underlying* relation for the upper 
relation (ie, scan, join or grouping rel, specified by the input_rel). 
So those functions call estimate_path_cost_size() with the input_rel, 
not the ordered_rel/final_rel, along with PgFdwPathExtraData.



And maybe related problem: why should FDW care about the effect of
apply_scanjoin_target_to_paths() like you claim to do here?

/*
 * If this is UPPERREL_ORDERED and/or UPPERREL_FINAL steps on the
 * final scan/join relation, the costs obtained from the cache
 * wouldn't yet contain the eval cost for the final scan/join target,
 * which would have been updated by apply_scanjoin_target_to

Re: Commit Fest 2019-01 is now closed

2019-02-08 Thread Peter Eisentraut
On 08/02/2019 12:27, Magnus Hagander wrote:
> I'd be more than happy for somebody with morge knowledge of such matters
> than me to think up a better color scheme. The only reason it has those
> colors is that they're the default ones in the Bootstrap CSS framework.

Can we have that column just normal black-and-white?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-02-08 Thread Daniel Gustafsson
> On 8 Feb 2019, at 13:33, Raúl Marín Rodríguez  wrote:

> Any way I can help to get this commited? Should we add this to the commitfest?

Please do, it’s a good way to ensure that the patch is tracked and not
forgotten about.

cheers ./daniel


Re: performance issue in remove_from_unowned_list()

2019-02-08 Thread Tomas Vondra



On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:
>> From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
>> But it's a bit funnier, because there's also DropRelationFiles() which does 
>> smgrclose on
>> a batch of relations too, and it says this
>>
>>/*
>> * Call smgrclose() in reverse order as when smgropen() is called.
>> * This trick enables remove_from_unowned_list() in smgrclose()
>> * to search the SMgrRelation from the unowned list,
>> * with O(1) performance.
>> */
>>for (i = ndelrels - 1; i >= 0; i--)
>>...
>>
>> but it's called from two places in xact.c only. And if you trigger the issue 
>> with 100k x
>> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you
>> actually get the very same behavior with always traversing the unowned list 
>> for some
>> reason. I'm not quite sure why, but it seems the optimization is exactly the 
>> wrong thing
>> to do here.
> 
> So when DropRelationFiles() is called, is the order of calling smgrclose()
> always the same as the order of the unowned list?
> 

Well, let's say you create 10k tables in a single transaction, and then
kill the server with "kill -9" right after the commit. Then on restart,
during recovery this happens

2019-02-08 14:16:21.781 CET [12817] LOG:  remove_from_unowned_list 10015
2019-02-08 14:16:21.871 CET [12817] LOG:  remove_from_unowned_list 10005
2019-02-08 14:16:21.967 CET [12817] LOG:  remove_from_unowned_list 10004
2019-02-08 14:16:22.057 CET [12817] LOG:  remove_from_unowned_list 10001
2019-02-08 14:16:22.147 CET [12817] LOG:  remove_from_unowned_list 1
2019-02-08 14:16:22.238 CET [12817] LOG:  remove_from_unowned_list 
2019-02-08 14:16:22.327 CET [12817] LOG:  remove_from_unowned_list 9998
2019-02-08 14:16:22.421 CET [12817] LOG:  remove_from_unowned_list 9996
2019-02-08 14:16:22.513 CET [12817] LOG:  remove_from_unowned_list 9995
2019-02-08 14:16:22.605 CET [12817] LOG:  remove_from_unowned_list 9994
2019-02-08 14:16:22.696 CET [12817] LOG:  remove_from_unowned_list 9993
...
2019-02-08 14:19:13.921 CET [12817] LOG:  remove_from_unowned_list 8396
2019-02-08 14:19:14.025 CET [12817] LOG:  remove_from_unowned_list 8395
2019-02-08 14:19:14.174 CET [12817] LOG:  remove_from_unowned_list 8394
2019-02-08 14:19:14.277 CET [12817] LOG:  remove_from_unowned_list 8393
2019-02-08 14:19:14.387 CET [12817] LOG:  remove_from_unowned_list 8392
2019-02-08 14:19:14.508 CET [12817] LOG:  remove_from_unowned_list 8391
2019-02-08 14:19:14.631 CET [12817] LOG:  remove_from_unowned_list 8390
2019-02-08 14:19:14.770 CET [12817] LOG:  remove_from_unowned_list 8389
...

This is with the attached debug patch, which simply prints the index of
the entry in the unowned list. And yes, it took ~3 minutes to get from 10k
to 8k (at which point I interrupted the recovery and killed the cluster).

I see a similar issue after creating a lot of tables in the same xact and
rolling it back, although in this case it's faster for some reason.
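
For anyone following along: the unowned list is a plain singly-linked list
with new entries pushed at the head, which is why closing in reverse open
order (as DropRelationFiles() does) can hit the head each time. The cost
therefore depends entirely on the removal order; a standalone illustration
in plain C (not the actual smgr code):

#include <stdio.h>
#include <stdlib.h>

typedef struct Node
{
	int			id;
	struct Node *next;
} Node;

static Node *head = NULL;

static void
push(int id)
{
	Node	   *n = malloc(sizeof(Node));

	n->id = id;
	n->next = head;				/* new entries go to the head */
	head = n;
}

/* Unlink the entry with the given id; return how many entries were scanned. */
static int
remove_id(int id)
{
	Node	  **link = &head;
	int			scanned = 0;

	for (Node *cur = *link; cur != NULL; link = &cur->next, cur = *link)
	{
		if (cur->id == id)
		{
			*link = cur->next;
			free(cur);
			break;
		}
		scanned++;
	}
	return scanned;
}

int
main(void)
{
	int			n = 10000;
	long		oldest_first = 0;
	long		newest_first = 0;

	for (int i = 0; i < n; i++)
		push(i);
	for (int i = 0; i < n; i++)			/* same order as insertion */
		oldest_first += remove_id(i);

	for (int i = 0; i < n; i++)
		push(i);
	for (int i = n - 1; i >= 0; i--)	/* reverse order: always at the head */
		newest_first += remove_id(i);

	printf("oldest-first scanned %ld entries, newest-first scanned %ld\n",
		   oldest_first, newest_first);
	return 0;
}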

> This one was introduced in b4166911 and I'd like to hear the author's and
> reviewer's opinions.
> https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com
> 

That probably explains why we haven't seen the issue before.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..8eb724b80d 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -281,6 +281,7 @@ remove_from_unowned_list(SMgrRelation reln)
 {
 	SMgrRelation *link;
 	SMgrRelation cur;
+	int			idx = 0;
 
 	for (link = &first_unowned_reln, cur = *link;
 		 cur != NULL;
@@ -292,7 +293,12 @@ remove_from_unowned_list(SMgrRelation reln)
 			cur->next_unowned_reln = NULL;
 			break;
 		}
+
+		idx++;
 	}
+
+	if (idx > 0)
+		elog(LOG, "remove_from_unowned_list %d", idx);
 }
 
 /*


Re: Protect syscache from bloating with negative cache entries

2019-02-08 Thread MauMau
From: Tomas Vondra
> I don't think we need to remove the expired entries right away, if there
> are only very few of them. The cleanup requires walking the hash table,
> which means significant fixed cost. So if there are only few expired
> entries (say, less than 25% of the cache), we can just leave them around
> and clean them if we happen to stumble on them (although that may not be
> possible with dynahash, which has no concept of expiration) or before
> enlarging the hash table.

I agree that we don't need to evict cache entries as long as the
memory permits (within the control of the DBA).

But how does the concept of expiration fit the catcache?  How would
the user determine the expiration time, i.e. setting of
syscache_prune_min_age?  If you set a small value to evict unnecessary
entries faster, necessary entries will also be evicted.  Some access
counter would keep accessed entries longer, but some idle time (e.g.
lunch break) can flush entries that you want to access after the lunch
break.

The idea of expiration applies to the case where we want possibly
stale entries to vanish and load newer data upon the next access.  For
example, the TTL (time-to-live) of Memcached, Redis, DNS, ARP.  Is the
catcache based on the same idea as them?  No.

What we want to do is to evict never or infrequently used cache
entries.  That's naturally the task of LRU, isn't it?  Even the
high-performance Memcached and Redis use LRU when the cache is full.
As Bruce said, we don't have to worry about lock contention or the
like, because we're talking about a backend-local cache.  Are we
worried about the overhead of manipulating the LRU chain?  The
current catcache already does it on every access; it calls
dlist_move_head() to put the accessed entry at the front of the hash
bucket.
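
For what it's worth, the LRU bookkeeping being discussed is tiny. A
standalone sketch in plain C (not the catcache code; the real thing would
presumably reuse the dlist_* helpers from lib/ilist.h, e.g.
dlist_move_head(), rather than hand-rolled pointers):

#include <stddef.h>

typedef struct LruNode
{
	struct LruNode *prev;
	struct LruNode *next;
} LruNode;

typedef struct LruList
{
	LruNode		head;			/* head.next is the most recently used */
} LruList;

void
lru_init(LruList *lru)
{
	lru->head.prev = lru->head.next = &lru->head;
}

static void
lru_unlink(LruNode *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Used both for inserting a new entry and as part of touching one. */
void
lru_push_head(LruList *lru, LruNode *n)
{
	n->next = lru->head.next;
	n->prev = &lru->head;
	lru->head.next->prev = n;
	lru->head.next = n;
}

/* On every cache hit: a few pointer updates, no scan of other entries. */
void
lru_touch(LruList *lru, LruNode *n)
{
	lru_unlink(n);
	lru_push_head(lru, n);
}

/* When the cache exceeds its size limit: evict from the tail. */
LruNode *
lru_evict_tail(LruList *lru)
{
	LruNode    *victim = lru->head.prev;

	if (victim == &lru->head)
		return NULL;			/* list is empty */
	lru_unlink(victim);
	return victim;
}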


> So if we want to address this case too (and we probably want), we may
> need to discard the old cache memory context somehow (e.g. rebuild the
> cache in a new one, and copy the non-expired entries). Which is a nice
> opportunity to do the "full" cleanup, of course.

The straightforward, natural, and familiar way is to limit the cache
size, which I mentioned in a previous mail.  We should give the DBA
the ability to control memory usage, rather than considering what to
do after letting the memory area grow unnecessarily large.  That's
what a typical "cache" is, isn't it?

https://en.wikipedia.org/wiki/Cache_(computing)

"To be cost-effective and to enable efficient use of data, caches must
be relatively small."


Another relevant, though suboptimal, idea would be to provide each
catcache with a separate memory context that is a child of
CacheMemoryContext.  This gives a slight optimization by using the slab
context (slab.c) for a catcache with fixed-sized tuples.  But that'd
be a bit complex, I'm afraid, for PG 12.


Regards
MauMau





Re: [PATCH] pgbench tap tests fail if the path contains a perl special character

2019-02-08 Thread Raúl Marín
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Tested `pgbench-tap-glob-fix-1.patch` with a "+" in the path and the tap tests 
now pass.

Re: Protect syscache from bloating with negative cache entries

2019-02-08 Thread MauMau
From: Alvaro Herrera
> I think seqscanning the hash table is going to be too slow; Ideriha-san's
> idea of having a dlist with the entries in LRU order (where each entry
> is moved to head of list when it is touched) seemed good: it allows you
> to evict older ones when the time comes, without having to scan the rest
> of the entries.  Having a dlist means two more pointers on each cache
> entry AFAIR, so it's not a huge amount of memory.

Absolutely.  We should try to avoid unpredictably long response times
caused by occasional unlucky batch processing.  That makes
troubleshooting difficult when the user asks why they experience
unsteady response times.

Regards
MauMau








Re: ON SELECT rule on a table without columns

2019-02-08 Thread Tom Lane
Andres Freund  writes:
> You misunderstood my point: I'm asking why we shouldn't remove that check 
> from views, rather than adding it to create rule.

+1.  This seems pretty obviously to be something we just missed when
we changed things to allow zero-column tables.

regards, tom lane



Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-02-08 Thread Alexey Kondratov

On 21.01.2019 23:50, a.kondra...@postgrespro.ru wrote:

Thank you for the review! I have updated the patch according to your
comments and remarks. Please find the new version attached.


During self-review of the code and tests, I discovered some problems
with the build on Windows. A new version of the patch is attached; it
fixes this issue and includes some minor code revisions.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

>From 99c6d94f37a797400d41545a271ff111b92e9361 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov 
Date: Fri, 21 Dec 2018 14:00:30 +0300
Subject: [PATCH] pg_rewind: options to use restore_command from
 postgresql.conf or command line.

---
 doc/src/sgml/ref/pg_rewind.sgml   |  30 +-
 src/backend/Makefile  |   4 +-
 src/backend/commands/extension.c  |   1 +
 src/backend/utils/misc/.gitignore |   1 -
 src/backend/utils/misc/Makefile   |   8 -
 src/backend/utils/misc/guc.c  | 434 +--
 src/bin/pg_rewind/Makefile|   2 +-
 src/bin/pg_rewind/parsexlog.c | 166 +-
 src/bin/pg_rewind/pg_rewind.c | 100 +++-
 src/bin/pg_rewind/pg_rewind.h |  10 +-
 src/bin/pg_rewind/t/001_basic.pl  |   4 +-
 src/bin/pg_rewind/t/002_databases.pl  |   4 +-
 src/bin/pg_rewind/t/003_extrafiles.pl |   4 +-
 src/bin/pg_rewind/t/RewindTest.pm |  93 +++-
 src/common/.gitignore |   1 +
 src/common/Makefile   |   9 +-
 src/{backend/utils/misc => common}/guc-file.l | 518 --
 src/include/common/guc-file.h |  50 ++
 src/include/utils/guc.h   |  39 +-
 src/tools/msvc/Mkvcbuild.pm   |   7 +-
 src/tools/msvc/clean.bat  |   2 +-
 21 files changed, 973 insertions(+), 514 deletions(-)
 delete mode 100644 src/backend/utils/misc/.gitignore
 rename src/{backend/utils/misc => common}/guc-file.l (60%)
 create mode 100644 src/include/common/guc-file.h

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 53a64ee29e..0c2441afa7 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -67,8 +67,10 @@ PostgreSQL documentation
ancestor. In the typical failover scenario where the target cluster was
shut down soon after the divergence, this is not a problem, but if the
target cluster ran for a long time after the divergence, the old WAL
-   files might no longer be present. In that case, they can be manually
-   copied from the WAL archive to the pg_wal directory, or
+   files might no longer be present. In that case, they can be automatically
+   copied by pg_rewind from the WAL archive to the 
+   pg_wal directory if either the -r or
+   -R option is specified, or
fetched on startup by configuring  or
.  The use of
pg_rewind is not limited to failover, e.g.  a standby
@@ -200,6 +202,30 @@ PostgreSQL documentation
   
  
 
+ 
+  -r
+  --use-postgresql-conf
+  
+   
+Use restore_command from postgresql.conf to retrieve
+from the WAL archive any WAL files that are missing from the
+target pg_wal directory.
+   
+  
+ 
+
+ 
+  -R restore_command
+  --restore-command=restore_command
+  
+   
+Specifies the restore_command to use for retrieving
+from the WAL archive any WAL files that are missing from the
+target pg_wal directory.
+   
+  
+ 
+
  
   --debug
   
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 478a96db9b..721cb57e89 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -186,7 +186,7 @@ distprep:
 	$(MAKE) -C replication	repl_gram.c repl_scanner.c syncrep_gram.c syncrep_scanner.c
 	$(MAKE) -C storage/lmgr	lwlocknames.h lwlocknames.c
 	$(MAKE) -C utils	distprep
-	$(MAKE) -C utils/misc	guc-file.c
+	$(MAKE) -C common	guc-file.c
 	$(MAKE) -C utils/sort	qsort_tuple.c
 
 
@@ -307,7 +307,7 @@ maintainer-clean: distclean
 	  replication/syncrep_scanner.c \
 	  storage/lmgr/lwlocknames.c \
 	  storage/lmgr/lwlocknames.h \
-	  utils/misc/guc-file.c \
+	  common/guc-file.c \
 	  utils/sort/qsort_tuple.c
 
 
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index daf3f51636..195eb8a821 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -50,6 +50,7 @@
 #include "commands/defrem.h"
 #include "commands/extension.h"
 #include "commands/schemacmds.h"
+#include "common/guc-file.h"
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
diff --git a/src/backend/utils/misc/.gitignore b/src/backend/utils/misc/.gitignore
deleted file mode 100644
index 495b1aec76..00
--- a/src/backend/utils/misc/.gitignore
+++ /dev/null

Patch for SortSupport implementation on inet/cidr

2019-02-08 Thread Brandur Leach
Hi list,

I've attached a patch that implements SortSupport for the
inet/cidr types. It has the effect of typically reducing
the time taken to sort these types by ~50-60% (as measured
by `SELECT COUNT(DISTINCT ...)`), which will carry over to
common operations like index creation, `ORDER BY`, and
`DISTINCT`.

As with other SortSupport implementations, this one
proposes an abbreviated key design that packs in as much
sorting-relevant information into the available datum as
possible while remaining faithful to the existing sorting
rules for the types. The key format depends on datum size
and whether working with IPv4 or IPv6. In most cases that
involves storing as many netmask bits as we can fit, but we
can do something a little more complete with IPv4 on 64-bit
— because inet data is small and the datum is large, we
can store enough information for a successful key-only
comparison in the majority of cases. Precise details
including background and bit-level diagrams are included in
the comments of the attached patch.
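
Purely to illustrate the packing idea (this is not the layout the patch
actually uses, and the real inet ordering rules are more involved; see the
patch comments for the real design): for a hypothetical type ordered by a
32-bit value and then an 8-bit masklen, a single 64-bit key compared as an
unsigned integer preserves that order if the value occupies the high bits
and the masklen sits just below it.

#include <stdint.h>

/*
 * Illustration only: pack (value, masklen) so that plain unsigned
 * comparison of the packed keys matches lexicographic order on
 * (value, masklen).
 */
static inline uint64_t
pack_abbrev_key(uint32_t value, uint8_t masklen)
{
	return ((uint64_t) value << 32) | ((uint64_t) masklen << 24);
}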

I've tried to take a variety of steps to build confidence
that the proposed SortSupport-based sort is correct:

1. It passes the existing inet regression suite (which was
   pretty complete already).

2. I added a few new test cases to the suite, specifically
   trying to target edge cases like the minimums and
   maximums of each abbreviated key bit "boundary". The new
   cases were run against the master branch to double-check
   that they're right.

3. I've added a variety of assertions throughout the
   implementation to make sure that each step is seeing
   inputs with expected parameters, and only manipulates
   the bits that it's supposed to be manipulating.

4. I built large random data sets (10M rows) and sorted
   them against a development build to try and trigger the
   aforementioned assertions.

5. I built an index on 10M values and ran amcheck against
   it.

6. I wrote some unit tests to verify that the bit-level
   representation of the new abbreviated keys are indeed
   what we expect. They're available here [3]. I didn't
   include them in the patch because I couldn't find an
   easy way to produce an expected `.out` file for a 32-bit
   machine (experiments building with `setarch` didn't
   succeed). They might be overkill anyway. I can continue
   to pursue getting them working if reviewers think they'd
   be useful.

My benchmarking methodology and script is available here
[1], and involves gathering statistics for 100
`count(distinct ...)` queries at various data sizes. I've
saved the results I got on my machine here [2].

Hat tip to Peter Geoghegan for advising on an appropriate
abbreviated key design and helping answer many questions
along the way.

Brandur


SortSupport-for-inet-cidr-types-v1.patch
Description: Binary data


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-08 Thread Tom Lane
I wrote:
> I believe that we need to establish the following principles:
>
> * The terminology "internal_auto" is disastrously unhelpful.
> I propose calling these things "partition" dependencies instead.
>
> * Partition dependencies are not singletons.  They should *always*
> come in pairs, one on the parent partitioned object (partitioned
> index, constraint, trigger, etc) and one on the child partitioned
> table.  (Have I mentioned that our partition/partitioned terminology
> is also a disaster?)  Maybe someday there'll be a reason to have
> three or more of these for the same dependent object, but there's
> no reason to have just one --- you should use an internal dependency
> instead for that case.
>
> * Partition dependencies are made *in addition to*, not instead of,
> the dependencies the object would normally have.  In this way,
> for example, the unlink action in IndexSetParentIndex would just
> delete the partition dependencies and not have to worry about putting
> back the index's usual dependencies.  Consistent with that, it's
> simply wrong that index_create sometimes marks the "usual"
> dependencies as internal_auto rather than auto.  (It wasn't even
> doing that consistently; expression and predicate column dependencies
> were still "auto", which makes no sense at all.)  They should go
> back to being plain auto with the partition dependencies being added
> separately.  That will work properly given the changes that say that
> arriving at a partition-dependent object via an auto dependency is
> not sufficient license to delete the object.

Here's a patch along these lines.  Some notes worth making:

* After spending a fair amount of effort on the description of the
dependency types in catalogs.sgml, I decided to just rip out the
duplicative text in dependency.h in favor of a pointer to catalogs.sgml.
If anybody's really hot to have two copies, we could put that back, but
I don't see the point.

* The core idea of the changes in dependency.c is just to wait till the
end of the entire dependency tree traversal, and then (before we start
actually deleting anything) check to make sure that each targeted
partition-dependent object has been reached via a partition-type
dependency, implying that at least one of its partition owners was
deleted.  The existing data structure handles this nicely, we just
need a couple more flag bits for the specific need.  (BTW, I experimented
with whether we could handle internal dependencies the same way; but
it didn't work, because of the previously-poorly-documented fact that
an internal dependency is immediately turned into a reverse-direction
recursion.  We can't really muck with the dependency traversal order,
or we find ourselves deleting tables before their indices, etc.)

* I found two different ways in which dependency.c was still producing
traversal-order-sensitive results.  One we already understood is that the
first loop in findDependentObjects threw an error for the first internal
or partition dependency it came across; since an object can have both of
those, the results varied.  This was fixed by postponing the actual error
report to after the loop and adding an explicit preference order for what
to report.

* The other such issue is a pre-existing bug, which maybe we ought to
back-patch, though I can't recall seeing any field reports that seem
to match it: after recursing to an internal/extension dependency,
we need to ensure that whatever objflags were passed down to our level
get applied to the targetObjects entry for the current object.  Otherwise
the final state of the entry can vary depending on traversal order
(since orders in which the current call sees the object already in
targetObjects, or on the stack, would apply the objflags).  This also
provides a chance to verify, not just assume, that processing of the
internal/extension dependency deleted the current object.  As I mentioned
the other day, failing to ensure that the current object gets deleted
is very bad, and the pg_depend network is no longer so immutable that
we really ought to just assume that control came back to our object.

* The changes outside dependency.c just amount to applying the rules
stated above about how to use partition dependencies.  I reverted the
places where v11 had decided that partition dependencies could be
substituted for auto dependencies, and made sure they are always
created in pairs.
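
To illustrate the core idea described above (wait until the traversal is
complete, then verify that every partition-dependent target was reached via
a partition-type dependency), the deferred check amounts to something like
this standalone sketch; the flag names and error wording are made up, and
the real code works on the targetObjects flag bits mentioned above:

#include <stdbool.h>
#include <stdio.h>

#define FLAG_IS_PART_DEPENDENT	0x01	/* object has partition dependencies */
#define FLAG_REACHED_VIA_PART	0x02	/* reached via a partition dependency */

typedef struct TargetObject
{
	const char *name;
	int			flags;
} TargetObject;

/* Run once, after the whole dependency tree has been traversed. */
bool
check_partition_targets(const TargetObject *targets, int ntargets)
{
	for (int i = 0; i < ntargets; i++)
	{
		const TargetObject *obj = &targets[i];

		if ((obj->flags & FLAG_IS_PART_DEPENDENT) &&
			!(obj->flags & FLAG_REACHED_VIA_PART))
		{
			/* none of its partition owners is being deleted: complain */
			fprintf(stderr, "cannot drop %s independently of its partition parent\n",
					obj->name);
			return false;
		}
	}
	return true;
}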


There are a couple of things I didn't do here because those parts
of the patch were written while I still had some hope of applying
it to v11.  Given that a catversion bump is needed anyway due to
the changes in what pg_depend entries are created for partitions,
we could change these:

* I renamed the dependency type to DEPENDENCY_PARTITION internally,
but left its pg_depend representation as 'I'.  In a green field
we'd probably have made it 'P' instead.

* We could get rid of partition_dependency_matches(), which is not
really anything but a kluge, by splitting DEPENDENCY_

Re: libpq compression

2019-02-08 Thread Robbie Harwood
Konstantin Knizhnik  writes:

> On 08.02.2019 10:01, Iwata, Aya wrote:
>
>>> I fixed all issues you have reported except using list of supported
>>> compression algorithms.
>>
>> Sure. I confirmed that.
>>
>>> It will require extra round of communication between client and
>>> server to make a decision about used compression algorithm.
>>
>> In beginning of this thread, Robbie Harwood said that no extra
>> communication needed.  I think so, too.
>
> Well, I think that this problem is more complex and requires more 
> discussion.
> There are three places determining choice of compression algorithm:
> 1. Specification of compression algorithm by client. Right now it is 
> just boolean "compression" parameter in connection string,
> but it is obviously possible to specify concrete algorithm here.
> 2. List of compression algorithms supported by client.
> 3. List of compression algorithms supported by server.
>
> Concerning first option I have very serious doubt that it is good idea 
> to let client choose compression protocol.
> Without extra round-trip it can be only implemented in this way:
> if client toggles compression option in connection string, then libpq 
> includes in startup packet list of supported compression algorithms.
> Then server intersects this list with its own set of supported 
> compression algorithms and if result is not empty, then
> somehow choose  one of the commonly supported algorithms and sends it to 
> the client with 'z' command.

The easiest way, which I laid out earlier in
id:jlgfu1gqjbk@redhat.com, is to have the server perform selection.
The client sends a list of supported algorithms in startup.  Startup has
a reply, so if the server wishes to use compression, then its startup
reply contains which algorithm to use.  Compression then begins after
startup.
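
A minimal sketch of that server-side selection (illustrative names only,
not the existing protocol code): intersect the client's startup list with
the server's preference-ordered list and report the winner, or no
compression if the lists don't intersect.

#include <stddef.h>
#include <string.h>

/*
 * Return the first algorithm in the server's preference order that the
 * client also announced in its startup packet, or NULL if there is no
 * common algorithm (i.e. the startup reply should say "no compression").
 */
const char *
choose_compression(const char *const *server_algos, int n_server,
				   const char *const *client_algos, int n_client)
{
	for (int s = 0; s < n_server; s++)
		for (int c = 0; c < n_client; c++)
			if (strcmp(server_algos[s], client_algos[c]) == 0)
				return server_algos[s];
	return NULL;
}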

If you really wanted to compress half the startup for some reason, you
can even have the server send a packet which consists of the choice of
compression algorithm and everything else in it subsequently
compressed.  I don't see this being useful.  However, you can use a
similar approach to let the client choose the algorithm if there were
some compelling reason for that (there is none I'm aware of to prefer
one to the other) - startup from client requests compression, reply from
server lists supported algorithms, next packet from client indicates
which one is in use along with compressed payload.

It may help to keep in mind that you are defining your own message type
here.

> Frankly speaking, I do not think that such flexibility in choosing 
> compression algorithms is really needed.
> I do not expect that there will be many situations where old client has 
> to communicate with new server or visa versa.
> In most cases both client and server belongs to the same postgres 
> distributive and so implements the same compression algorithm.
> As far as we are compressing only temporary data (traffic), the problem 
> of providing backward compatibility seems to be not so important.

Your comments have been heard, but this is the model that numerous folks
from the project have told you we have.  Your code will not pass review
without algorithm agility.

>> src/backend/libpq/pqcomm.c :
>> In current Postgres source code, pq_recvbuf() calls secure_read()
>> and pq_getbyte_if_available() also calls secure_read().
>> It means these functions are on the same level.
>> However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
>> and  pq_recvbuf() calls secure_read(). The level of these functions is 
>> different.
>>
>> I think the purpose of pq_getbyte_if_available() is to get a
>> character if it exists and the purpose of pq_recvbuf() is to acquire
>> data up to the expected length.  In your change,
>> pq_getbyte_if_available() may have to do unnecessary process waiting
>> or something.
>
> Sorry, but this change is essential. We can have some available data
> in compression buffer and we need to try to fetch it in
> pq_getbyte_if_available() instead of just returning EOF.

Aya is correct about the purposes of these functions.  Take a look at
how the buffering in TLS or GSSAPI works for an example of how to do
this correctly.

As with agility, this is what multiple folks from the project have told
you is a hard requirement.  None of us will be okaying your code without
proper transport layering.

Thanks,
--Robbie




Re: ON SELECT rule on a table without columns

2019-02-08 Thread Ashutosh Sharma
On Fri, Feb 8, 2019 at 7:55 PM Tom Lane  wrote:
>
> Andres Freund  writes:
> > You misunderstood my point: I'm asking why we shouldn't remove that check 
> > from views, rather than adding it to create rule.
>
> +1.  This seems pretty obviously to be something we just missed when
> we changed things to allow zero-column tables.
>

Thanks Andres for bringing up that point and thanks Tom for the confirmation.

Attached is the patch that allows us to create a view on a table without
columns. I've also added some test cases for it in create_view.sql.
Please have a look and let me know your opinion.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index 65f4b40..c49ae97 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -111,10 +111,6 @@ DefineVirtualRelation(RangeVar *relation, List *tlist, bool replace,
 		}
 	}
 
-	if (attrList == NIL)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
- errmsg("view must have at least one column")));
 
 	/*
 	 * Look up, check permissions on, and lock the creation namespace; also
diff --git a/src/test/regress/expected/create_view.out b/src/test/regress/expected/create_view.out
index 141fc6d..ee41c40 100644
--- a/src/test/regress/expected/create_view.out
+++ b/src/test/regress/expected/create_view.out
@@ -1706,9 +1706,16 @@ select pg_get_ruledef(oid, true) from pg_rewrite
  43 AS col_b;
 (1 row)
 
+-- create view on a table without columns
+create table t0();
+create view v0 as select * from t0;
+select * from v0;
+--
+(0 rows)
+
 -- clean up all the random objects we made above
 \set VERBOSITY terse \\ -- suppress cascade details
 DROP SCHEMA temp_view_test CASCADE;
 NOTICE:  drop cascades to 27 other objects
 DROP SCHEMA testviewschm2 CASCADE;
-NOTICE:  drop cascades to 62 other objects
+NOTICE:  drop cascades to 64 other objects
diff --git a/src/test/regress/sql/create_view.sql b/src/test/regress/sql/create_view.sql
index 9480030..e5ca690 100644
--- a/src/test/regress/sql/create_view.sql
+++ b/src/test/regress/sql/create_view.sql
@@ -580,6 +580,11 @@ select pg_get_viewdef('tt23v', true);
 select pg_get_ruledef(oid, true) from pg_rewrite
   where ev_class = 'tt23v'::regclass and ev_type = '1';
 
+-- create view on a table without columns
+create table t0();
+create view v0 as select * from t0;
+select * from v0;
+
 -- clean up all the random objects we made above
 \set VERBOSITY terse \\ -- suppress cascade details
 DROP SCHEMA temp_view_test CASCADE;


Re: libpq compression

2019-02-08 Thread Konstantin Knizhnik




On 08.02.2019 12:33, Daniel Gustafsson wrote:

On 8 Feb 2019, at 10:15, Konstantin Knizhnik  wrote:
Frankly speaking, I do not think that such flexibility in choosing compression 
algorithms is really needed.
I do not expect that there will be many situations where old client has to 
communicate with new server or visa versa.
In most cases both client and server belongs to the same postgres distributive 
and so implements the same compression algorithm.
As far as we are compressing only temporary data (traffic), the problem of 
providing backward compatibility seems to be not so important.

I don’t think this assumption is entirely valid, and would risk unnecessary
breakage.


I am also not sure about this assumption. We (PostgresPro) are having some
issues now with CFS (file-level compression of the Postgres database).

Some builds are using zstd, some are using zlib (the default)...
zstd is faster, provides a better compression ratio, and is available on
most platforms.
But zlib is available almost everywhere and is used by Postgres by
default...
The only thing I know for sure: if we implement several algorithms and
make it possible for the database user to make a choice, then we will get
many more problems.

Right now Postgres uses zlib as the only supported compression
algorithm in many places.
So maybe libpq compression should also use only zlib and provide no
other choices?


Concerning backward compatibility: assume that we allow the use of zstd,
but then Facebook changes the zstd license or some critical bug is found
in it. Then we will have to drop the dependency on zstd. So there will be
no backward compatibility in any case, even if we support more
sophisticated negotiation between client and server in choosing a
compression algorithm.


What could be more interesting is to support custom compression
algorithms (optimized for the particular data flow).
But that seems to be a completely different and much more sophisticated
story. We would have to provide some mechanism for loading foreign
libraries on the client side!

IMHO it is overkill.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Add client connection check during the execution of the query

2019-02-08 Thread s . cherkashin
The purpose of this patch is to stop the execution of long-running
queries when the client has disconnected. In most cases, the client must
wait for a response from the server before sending new data, which means
there should be no unread data remaining on the socket, so we can detect
a broken connection.
Perhaps exceptions are possible, but I could not think of such a use
case (except COPY). I would be grateful if someone could suggest such
cases or their solutions.
I added a test for the GUC variable when the client connects via SSL, 
but I'm not sure that this test is really necessary.


Best regards,
Sergey Cherkashin.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b6f5822b84..36d031df3c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8259,6 +8259,26 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+Sets a time interval, in milliseconds, between periodic
+verification of the client-server connection during query execution.
+If the client aborts the connection, the query is terminated.
+   
+   
+The default value is zero, which disables connection
+checks, so the backend will detect client disconnection only when trying
+to send a response to the query.
+   
+  
+ 
+
  
 

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index c39617a430..7e43734845 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -120,6 +120,7 @@
  */
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
+int 		client_connection_check_interval;
 
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
@@ -1926,3 +1927,33 @@ pq_setkeepalivescount(int count, Port *port)
 
 	return STATUS_OK;
 }
+
+/* 
+ *	pq_check_client_connection - check whether the client is still connected
+ * 
+ */
+void pq_check_client_connection(void)
+{
+	CheckClientConnectionPending = false;
+	if (IsUnderPostmaster &&
+		MyProcPort != NULL && !PqCommReadingMsg && !PqCommBusy)
+	{
+		char nextbyte;
+		int r;
+
+#ifdef WIN32
+		pgwin32_noblock = 1;
+#endif
+		r = recv(MyProcPort->sock, &nextbyte, 1, MSG_PEEK);
+#ifdef WIN32
+		pgwin32_noblock = 0;
+#endif
+
+		if (r == 0 || (r == -1 &&
+			errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR))
+		{
+			ClientConnectionLost = true;
+			InterruptPending = true;
+		}
+	}
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e773f20d9f..0230a0fdd0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3030,6 +3030,8 @@ ProcessInterrupts(void)
 	(errcode(ERRCODE_ADMIN_SHUTDOWN),
 	 errmsg("terminating connection due to administrator command")));
 	}
+	if (CheckClientConnectionPending)
+		pq_check_client_connection();
 	if (ClientConnectionLost)
 	{
 		QueryCancelPending = false; /* lost connection trumps QueryCancel */
@@ -4208,6 +4210,9 @@ PostgresMain(int argc, char *argv[],
 		 */
 		CHECK_FOR_INTERRUPTS();
 		DoingCommandRead = false;
+		if (client_connection_check_interval)
+			enable_timeout_after(SKIP_CLIENT_CHECK_TIMEOUT,
+ client_connection_check_interval);
 
 		/*
 		 * (5) turn off the idle-in-transaction timeout
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index fd51934aaf..c15aef3793 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -30,6 +30,7 @@ ProtocolVersion FrontendProtocol;
 volatile sig_atomic_t InterruptPending = false;
 volatile sig_atomic_t QueryCancelPending = false;
 volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t CheckClientConnectionPending = false;
 volatile sig_atomic_t ClientConnectionLost = false;
 volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index c0b6231458..45bc8babbb 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -34,6 +34,7 @@
 #include "catalog/pg_db_role_setting.h"
 #include "catalog/pg_tablespace.h"
 #include "libpq/auth.h"
+#include "libpq/libpq.h"
 #include "libpq/libpq-be.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -72,6 +73,7 @@ static void ShutdownPostgres(int code, Datum arg);
 static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
 static void IdleInTransactionSessionTimeoutHandler(void);
+static void ClientCheckTimeoutHandler(void);
 static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oi

Re: libpq compression

2019-02-08 Thread Konstantin Knizhnik




On 08.02.2019 19:26, Robbie Harwood wrote:

Konstantin Knizhnik  writes:


On 08.02.2019 10:01, Iwata, Aya wrote:


I fixed all issues you have reported except using list of supported
compression algorithms.

Sure. I confirmed that.


It will require extra round of communication between client and
server to make a decision about used compression algorithm.

In beginning of this thread, Robbie Harwood said that no extra
communication needed.  I think so, too.

Well, I think that this problem is more complex and requires more
discussion.
There are three places determining choice of compression algorithm:
1. Specification of compression algorithm by client. Right now it is
just boolean "compression" parameter in connection string,
but it is obviously possible to specify concrete algorithm here.
2. List of compression algorithms supported by client.
3. List of compression algorithms supported by server.

Concerning first option I have very serious doubt that it is good idea
to let client choose compression protocol.
Without extra round-trip it can be only implemented in this way:
if client toggles compression option in connection string, then libpq
includes in startup packet list of supported compression algorithms.
Then server intersects this list with its own set of supported
compression algorithms and if result is not empty, then
somehow choose  one of the commonly supported algorithms and sends it to
the client with 'z' command.

The easiest way, which I laid out earlier in
id:jlgfu1gqjbk@redhat.com, is to have the server perform selection.
The client sends a list of supported algorithms in startup.  Startup has
a reply, so if the server wishes to use compression, then its startup
reply contains which algorithm to use.  Compression then begins after
startup.

If you really wanted to compress half the startup for some reason, you
can even have the server send a packet which consists of the choice of
compression algorithm and everything else in it subsequently
compressed.  I don't see this being useful.  However, you can use a
similar approach to let the client choose the algorithm if there were
some compelling reason for that (there is none I'm aware of to prefer
one to the other) - startup from client requests compression, reply from
server lists supported algorithms, next packet from client indicates
which one is in use along with compressed payload.
I already replied to you that the next packet cannot indicate which
algorithm the client has chosen.

Using magic numbers or other such tricks is not a reliable or acceptable
solution.
It can only be done by introducing an extra message type.

Actually it is not really needed, because if the client sends the server a
list of supported algorithms in the startup packet, then the server can
make a decision and inform the client about it (using a special message
type, as is done now).
Then the client doesn't need to make a choice. I can do it if everybody
thinks that the choice of compression algorithm is such an important
feature.
I just wonder: Postgres currently lives happily with a hardcoded zlib and a
built-in LZ compression algorithm implementation.



It may help to keep in mind that you are defining your own message type
here.


Frankly speaking, I do not think that such flexibility in choosing
compression algorithms is really needed.
I do not expect that there will be many situations where old client has
to communicate with new server or visa versa.
In most cases both client and server belongs to the same postgres
distributive and so implements the same compression algorithm.
As far as we are compressing only temporary data (traffic), the problem
of providing backward compatibility seems to be not so important.

Your comments have been heard, but this is the model that numerous folks
from the project have told you we have.  Your code will not pass review
without algorithm agility.


src/backend/libpq/pqcomm.c :
In current Postgres source code, pq_recvbuf() calls secure_read()
and pq_getbyte_if_available() also calls secure_read().
It means these functions are on the same level.
However in your change, pq_getbyte_if_available() calls pq_recvbuf(),
and  pq_recvbuf() calls secure_read(). The level of these functions is 
different.

I think the purpose of pq_getbyte_if_available() is to get a
character if it exists and the purpose of pq_recvbuf() is to acquire
data up to the expected length.  In your change,
pq_getbyte_if_available() may have to do unnecessary process waiting
or something.

Sorry, but this change is essential. We can have some available data
in compression buffer and we need to try to fetch it in
pq_getbyte_if_available() instead of just returning EOF.

Aya is correct about the purposes of these functions.  Take a look at
how the buffering in TLS or GSSAPI works for an example of how to do
this correctly.

As with agility, this is what multiple folks from the project have told
you is a hard requirement.  None of us will be okaying your code without
proper transport layering.

Guys, 

Re: pgsql: Remove references to Majordomo

2019-02-08 Thread Magnus Hagander
On Sat, Feb 2, 2019 at 9:18 AM Noah Misch  wrote:

> On Mon, Jan 28, 2019 at 07:29:39PM +0100, Magnus Hagander wrote:
> > On Mon, Jan 28, 2019 at 7:26 PM Tom Lane  wrote:
> > > Stephen Frost  writes:
> > > >> On Sun, Jan 27, 2019 at 2:28 AM Noah Misch 
> wrote:
> > > >>> What are those blocked infrastructure improvements?
> > >
> > > > The specific improvements we're talking about are DKIM/DMARC/SPF,
> which
> > > > is becoming more and more important to making sure that the email
> from
> > > > our lists can actually get through to the subscribers.
> > >
> > > Certainly those are pretty critical.  But can you give us a quick
> > > refresher on why dropping the @postgresql.org list aliases is
> > > necessary for that?  I thought we'd already managed to make the
> > > lists compliant with those specs.
> >
> > I believe it doesn't, as Stephen also agreed with upthread.
> >
> > We needed to move our *sending* out of the postgresql.org domain in
> order
> > to be able to treat them differently. But there is nothing preventing us
> > from receiving to e.g. pgsql-b...@postgresql.org and internally forward
> it
> > to @lists.postgresql.org, where we then deliver from.
> >
> > I believe we *can* do the same for all lists, but that part is more a
> > matter of cleaning up our infrastructure, which has a fair amount of
> cruft
> > to deal with those things. We have an easy workaround for a couple of
> lists
> > which owuld take only a fairly small amount of traffic over it, but we'd
> > like to get rid of the cruft to deal with the large batch of them.
>
> Ceasing to accept mail at pgsql-...@postgresql.org would cause a concrete,
> user-facing loss in that users replying to old messages would get a bounce.
> Also, I find pgsql-...@lists.postgresql.org uglier, since "lists" adds
> negligible information.  (The same is true of "pgsql", alas.)  If the cost
> of
> keeping pgsql-...@postgresql.org is limited to "cruft", I'd prefer to keep
> pgsql-...@postgresql.org indefinitely.
>

It very specifically *does* convey important information. It may not do so
to you, but posting to an @lists. domain is something that
implies that you understand you are posting to a list, more or less. Thus
it makes a big difference when it comes to things like GDPR, per the
information we have received from people who know a lot more about that
than we do. That part only applies to lists that are being delivered and
archived publicly.

I had forgotten about that part and went back to my notes.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Peter Geoghegan
Why don't we provide a small reserved OID range that can be used by
patch authors temporarily, with the expectation that they'll be
replaced by "real" OIDs at the point the patch gets committed? This
would be similar to the situation with catversion bumps -- we don't
expect patches that will eventually need them to have them.

It's considered good practice to choose an OID that's at the beginning
of the range shown by the unused_oids script, so naturally there is a
good chance that any patch that adds a system catalog entry will bit
rot prematurely. This seems totally unnecessary to me. You could even
have a replace_oids script under this system. That would replace the
known-temporary OIDs with mapped contiguous real values at the time of
commit (maybe it would just print out which permanent OIDs to use in
place of the temporary ones, and leave the rest up to the committer).
I don't do Perl, so I'm not volunteering for this.

-- 
Peter Geoghegan



Re: ON SELECT rule on a table without columns

2019-02-08 Thread Tom Lane
Ashutosh Sharma  writes:
> Attached is the patch that allows us to create a view on a table without
> columns. I've also added some test cases for it in create_view.sql.
> Please have a look and let me know your opinion.

Haven't read the patch, but a question seems in order here: should
we regard this as a back-patchable bug fix?  The original example
shows that it's possible to create a zero-column view in existing
releases, which I believe would then lead to dump/reload failures.
So that seems to qualify as a bug not just a missing feature.
On the other hand, given the lack of field complaints, maybe it's
not worth the trouble to back-patch.  I don't have a strong
opinion either way.

BTW, has anyone checked on what the matview code paths will do?
Or SELECT INTO?

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Tom Lane
Peter Geoghegan  writes:
> Why don't we provide a small reserved OID range that can be used by
> patch authors temporarily, with the expectation that they'll be
> replaced by "real" OIDs at the point the patch gets committed? This
> would be similar the situation with catversion bumps -- we don't
> expect patches that will eventually need them to have them.

Quite a few people have used OIDs up around 8000 or 9000 for this purpose;
I doubt we need a formally reserved range for it.  The main problem with
doing it is the hazard that the patch'll get committed just like that,
suddenly breaking things for everyone else doing likewise.

(I would argue, in fact, that the reason we have any preassigned OIDs
above perhaps 6000 is that exactly this has happened before.)

A script such as you suggest might be a good way to reduce the temptation
to get lazy at the last minute.  Now that the catalog data is pretty
machine-readable, I suspect it wouldn't be very hard --- though I'm
not volunteering either.  I'm envisioning something simple like "renumber
all OIDs in range <m>-<n> into range <x>-<y>", perhaps with the
ability to skip any already-used OIDs in the target range.
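
The mapping rule itself is trivial; a standalone sketch of just that part
(C here purely for illustration; the real tool would rewrite the .dat files
and could just as well be Perl):

#include <stdbool.h>

typedef unsigned int Oid;

/*
 * Map an OID in the temporary range [temp_lo, temp_hi] onto the k-th free
 * OID in the target range [target_lo, target_hi], where k is the OID's
 * position within the temporary range.  OIDs outside the temporary range
 * are returned unchanged.
 */
Oid
map_temp_oid(Oid oid, Oid temp_lo, Oid temp_hi,
			 Oid target_lo, Oid target_hi,
			 bool (*oid_in_use) (Oid))
{
	Oid			next = target_lo;

	if (oid < temp_lo || oid > temp_hi)
		return oid;

	for (Oid t = temp_lo; t <= temp_hi && next <= target_hi; t++)
	{
		while (next <= target_hi && oid_in_use(next))
			next++;				/* skip already-used OIDs in the target range */
		if (next > target_hi)
			break;
		if (t == oid)
			return next;
		next++;
	}
	return oid;					/* ran out of free target OIDs; caller must complain */
}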

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 10:14 AM Tom Lane  wrote:
> A script such as you suggest might be a good way to reduce the temptation
> to get lazy at the last minute.  Now that the catalog data is pretty
> machine-readable, I suspect it wouldn't be very hard --- though I'm
> not volunteering either.  I'm envisioning something simple like "renumber
> all OIDs in range - into range -", perhaps with the
> ability to skip any already-used OIDs in the target range.

I imagined that the machine-readable catalog data would allow us to
assign non-numeric identifiers to this OID range. Perhaps there'd be a
textual symbol with a number in the range of 0-20 at the end. Those
would stick out like a sore thumb, making it highly unlikely that
anybody would forget about it at the last minute.

-- 
Peter Geoghegan



Re: Patch for SortSupport implementation on inet/cidr

2019-02-08 Thread Brandur Leach
Attached a V2 patch: identical to V1 except rebased and
with a new OID selected.


SortSupport-for-inet-cidr-types-v2.patch
Description: Binary data


Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Feb 8, 2019 at 10:14 AM Tom Lane  wrote:
>> A script such as you suggest might be a good way to reduce the temptation
>> to get lazy at the last minute.  Now that the catalog data is pretty
>> machine-readable, I suspect it wouldn't be very hard --- though I'm
>> not volunteering either.  I'm envisioning something simple like "renumber
>> all OIDs in range <m>-<n> into range <x>-<y>", perhaps with the
>> ability to skip any already-used OIDs in the target range.

> I imagined that the machine-readable catalog data would allow us to
> assign non-numeric identifiers to this OID range. Perhaps there'd be a
> textual symbol with a number in the range of 0-20 at the end. Those
> would stick out like a sore thumb, making it highly unlikely that
> anybody would forget about it at the last minute.

Um.  That would not be just an add-on script but something that
genbki.pl would have to accept.  I'm not excited about that; it would
complicate what's already complex, and if it works enough for test
purposes then it wouldn't really stop a committer who wasn't paying
attention from committing the patch un-revised.

To the extent that this works at all, OIDs in the 9000 range ought
to be enough of a flag already, I think.

regards, tom lane



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 10:29 AM Tom Lane  wrote:
> Um.  That would not be just an add-on script but something that
> genbki.pl would have to accept.  I'm not excited about that; it would
> complicate what's already complex, and if it works enough for test
> purposes then it wouldn't really stop a committer who wasn't paying
> attention from committing the patch un-revised.
>
> To the extent that this works at all, OIDs in the 9000 range ought
> to be enough of a flag already, I think.

I tend to agree that this isn't enough of a problem to justify making
genbki.pl significantly more complicated.

-- 
Peter Geoghegan



Re: ON SELECT rule on a table without columns

2019-02-08 Thread Ashutosh Sharma
On Fri, Feb 8, 2019 at 11:32 PM Tom Lane  wrote:
>
> Ashutosh Sharma  writes:
> > Attached is the patch that allows us to create view on a table without
> > columns. I've also added some test-cases for it in create_view.sql.
> > Please have a look and let me know your opinion.
>
> Haven't read the patch, but a question seems in order here: should
> we regard this as a back-patchable bug fix?  The original example
> shows that it's possible to create a zero-column view in existing
> releases, which I believe would then lead to dump/reload failures.
> So that seems to qualify as a bug not just a missing feature.
> On the other hand, given the lack of field complaints, maybe it's
> not worth the trouble to back-patch.  I don't have a strong
> opinion either way.
>

In my opinion, this looks like a bug fix that needs to be back-patched;
otherwise we might encounter dump/restore failures in some cases, like
the one described in the first email.

> BTW, has anyone checked on what the matview code paths will do?
> Or SELECT INTO?
>

I just checked on that and found that both materialized views and the
SELECT INTO statement work like the CREATE TABLE AS command: they don't
really care about the target list of the source table, unlike normal
views, which would error out when the source table has no columns.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com



Re: libpq compression

2019-02-08 Thread Andres Freund
On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:
> Frankly speaking, I do not think that such flexibility in choosing
> compression algorithms is really needed.
> I do not expect that there will be many situations where an old client has to
> communicate with a new server or vice versa.
> In most cases both client and server belong to the same postgres
> distribution and so implement the same compression algorithm.
> As far as we are compressing only temporary data (traffic), the problem of
> providing backward compatibility seems to be not so important.

I think we should outright reject any patch without compression type
negotiation.



Re: Protect syscache from bloating with negative cache entries

2019-02-08 Thread Tomas Vondra



On 2/8/19 2:27 PM, MauMau wrote:
> From: Tomas Vondra
>> I don't think we need to remove the expired entries right away, if
>> there are only very few of them. The cleanup requires walking the
>> hash table, which means significant fixed cost. So if there are
>> only few expired entries (say, less than 25% of the cache), we can
>> just leave them around and clean them if we happen to stumble on
>> them (although that may not be possible with dynahash, which has no
>> concept of expiration) or before enlarging the hash table.
> 
> I agree in that we don't need to evict cache entries as long as the 
> memory permits (within the control of the DBA.)
> 
> But how does the concept of expiration fit the catcache?  How would 
> the user determine the expiration time, i.e. setting of 
> syscache_prune_min_age?  If you set a small value to evict
> unnecessary entries faster, necessary entries will also be evicted.
> Some access counter would keep accessed entries longer, but some idle
> time (e.g. lunch break) can flush entries that you want to access
> after the lunch break.
> 

I'm not sure what you mean by "necessary" and "unnecessary" here. What
matters is how often an entry is accessed - if it's accessed often, it
makes sense to keep it in the cache. Otherwise evict it. Entries not
accessed for 5 minutes are clearly not accessed very often, so
getting rid of them will not hurt the cache hit ratio very much.

So I agree with Robert that a time-based approach should work well here. It
does not have the issues with setting exact syscache size limit, it's
kinda self-adaptive etc.
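
To be concrete, what I have in mind is roughly the following sketch (not
the actual patch -- it assumes a hypothetical "lastaccess" timestamp on
each entry, and uses the syscache_prune_min_age knob mentioned above;
the details surely differ):

/*
 * Sketch only: evict catcache entries not accessed for more than
 * syscache_prune_min_age seconds.  Assumes a hypothetical "lastaccess"
 * field updated on every cache hit.
 */
static void
prune_old_entries(CatCache *cache, TimestampTz now)
{
    for (int i = 0; i < cache->cc_nbuckets; i++)
    {
        dlist_mutable_iter iter;

        dlist_foreach_modify(iter, &cache->cc_bucket[i])
        {
            CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);

            /* never evict entries that are still referenced */
            if (ct->refcount > 0)
                continue;

            if (TimestampDifferenceExceeds(ct->lastaccess, now,
                                           syscache_prune_min_age * 1000))
                CatCacheRemoveCTup(cache, ct);
        }
    }
}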

In a way, this is exactly what the 5 minute rule [1] says about caching.

[1] http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf


> The idea of expiration applies to the case where we want possibly 
> stale entries to vanish and load newer data upon the next access.
> For example, the TTL (time-to-live) of Memcached, Redis, DNS, ARP.
> Is the catcache based on the same idea with them?  No.
> 

I'm not sure what this has to do with those other databases.

> What we want to do is to evict never or infrequently used cache
> entries.  That's naturally the task of LRU, isn't it?  Even the high
> performance Memcached and Redis uses LRU when the cache is full.  As
> Bruce said, we don't have to be worried about the lock contention or
> something, because we're talking about the backend local cache.  Are
> we worried about the overhead of manipulating the LRU chain?  The
> current catcache already does it on every access; it calls
> dlist_move_head() to put the accessed entry to the front of the hash
> bucket.
> 

I'm certainly worried about the performance aspect of it. The syscache
is in plenty of hot paths, so adding overhead may have significant
impact. But that depends on how complex the eviction criteria will be.

And then there may be cases conflicting with the criteria, i.e. running
into just-evicted entries much more often. This is the issue with the
initially proposed hard limits on cache sizes, where it'd be trivial to
under-size it just a little bit.

> 
>> So if we want to address this case too (and we probably want), we 
>> may need to discard the old cache memory context somehow (e.g. 
>> rebuild the cache in a new one, and copy the non-expired entries). 
>> Which is a nice opportunity to do the "full" cleanup, of course.
> 
> The straightforward, natural, and familiar way is to limit the cache 
> size, which I mentioned in some previous mail.  We should give the
> DBA the ability to control memory usage, rather than considering what
> to do after leaving the memory area grow unnecessarily too large.
> That's what a typical "cache" is, isn't it?
> 

Not sure which mail you're referring to - this seems to be the first
e-mail from you in this thread (per our archives).

I personally don't find explicit limit on cache size very attractive,
because it's rather low-level and difficult to tune, and very easy to
get it wrong (at which point you fall from a cliff). All the information
is in backend private memory, so how would you even identify that the syscache
the thing you need to tune, or how would you determine the correct size?

> https://en.wikipedia.org/wiki/Cache_(computing)
> 
> "To be cost-effective and to enable efficient use of data, caches must
> be relatively small."
> 

Relatively small compared to what? It's also a question of how expensive
cache misses are.

> 
> Another relevant suboptimal idea would be to provide each catcache
> with a separate memory context, which is the child of
> CacheMemoryContext.  This gives slight optimization by using the slab
> context (slab.c) for a catcache with fixed-sized tuples.  But that'd
> be a bit complex, I'm afraid for PG 12.
> 

I don't know, but that does not seem very attractive. Each memory
context has some overhead, and it does not solve the issue of never
releasing memory to the OS. So we'd still have to rebuild the contexts
at some point, I'm afraid.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-08 Thread Alvaro Herrera
On 2019-Feb-08, Tom Lane wrote:

> Also, I came across some coding in CloneFkReferencing() that looks fishy
> as hell: that function imagines that it can delete an existing trigger
> with nothing more than a summary CatalogTupleDelete().  I didn't do
> anything about that here, but if it's not broken, I'd like to see an
> explanation why not.  I added a comment complaining about the lack of
> pg_depend cleanup, and there's also the question of whether we don't
> need to broadcast a relcache inval for the trigger's table.

Oops, this is new code in 0464fdf07f69 (Jan 21st).  Unless you object,
I'll study a fix for this now, to avoid letting it appear in the minor
next week.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 8:15 AM Tom Lane  wrote:
> > * Partition dependencies are not singletons.  They should *always*
> > come in pairs, one on the parent partitioned object (partitioned
> > index, constraint, trigger, etc) and one on the child partitioned
> > table.

> > * Partition dependencies are made *in addition to*, not instead of,
> > the dependencies the object would normally have.

> Here's a patch along these lines.

Formality: Unsurprisingly, your patch removes any need for the nbtree
patch series to paper over test failures in indexing.out and
triggers.out, without creating any new issues with the regression
tests (a tiny number of harmless noise changes are needed -- no change
there). I can confirm that your patch fixes remaining regression test
breakage of the general variety described in my original e-mail of
November 5th. Once you commit this patch, I won't need to include any
kind of wonky workaround with each new revision of the patch series.

Many thanks for your help here.

> * After spending a fair amount of effort on the description of the
> dependency types in catalogs.sgml, I decided to just rip out the
> duplicative text in dependency.h in favor of a pointer to catalogs.sgml.

+1 -- I see no need for repetition. In general, the pg_depend docs
seem easier to follow now.

> * The core idea of the changes in dependency.c is just to wait till the
> end of the entire dependency tree traversal, and then (before we start
> actually deleting anything) check to make sure that each targeted
> partition-dependent object has been reached via a partition-type
> dependency, implying that at least one of its partition owners was
> deleted.  The existing data structure handles this nicely, we just
> need a couple more flag bits for the specific need.  (BTW, I experimented
> with whether we could handle internal dependencies the same way; but
> it didn't work, because of the previously-poorly-documented fact that
> an internal dependency is immediately turned into a reverse-direction
> recursion.  We can't really muck with the dependency traversal order,
> or we find ourselves deleting tables before their indices, etc.)
>
> * I found two different ways in which dependency.c was still producing
> traversal-order-sensitive results.  One we already understood is that the
> first loop in findDependentObjects threw an error for the first internal
> or partition dependency it came across; since an object can have both of
> those, the results varied.  This was fixed by postponing the actual error
> report to after the loop and adding an explicit preference order for what
> to report.

Makes sense.

> * The other such issue is a pre-existing bug, which maybe we ought to
> back-patch, though I can't recall seeing any field reports that seem
> to match it: after recursing to an internal/extension dependency,
> we need to ensure that whatever objflags were passed down to our level
> get applied to the targetObjects entry for the current object.  Otherwise
> the final state of the entry can vary depending on traversal order
> (since orders in which the current call sees the object already in
> targetObjects, or on the stack, would apply the objflags).

Hmm. This seems very subtle to me. Perhaps the comment you've added
above the new object_address_present_add_flags() call in
findDependentObjects() ought to explain the "current object gets
marked with objflags" issue first, while only then mentioning the
cross-check. The interface that object_address_present_add_flags()
presents seems kind of odd to me, though I don't doubt that it makes
sense in the wider context of the code.

> * The changes outside dependency.c just amount to applying the rules
> stated above about how to use partition dependencies.  I reverted the
> places where v11 had decided that partition dependencies could be
> substituted for auto dependencies, and made sure they are always
> created in pairs.

Makes sense. It's inherently necessary for code outside of
dependency.c to get it right.

> The main argument against changing these would be the risk that
> client code already knows about 'I'.  But neither pg_dump nor psql
> do, so I judge that probably there's little if anything out there
> that is special-casing that dependency type.

I lean towards changing these on HEAD, now that it's clear that
something very different will be needed for v11. I agree that the
"internal_auto" terminology is very unhelpful, and it seems like a
good idea to fully acknowledge that certain dependency graphs have
qualities that are best thought of as peculiar to partitioning. In the
unlikely event that this did end up breaking external client code that
relies on the old constants, then the client code was probably subtly
wrong to begin with. This may be one of those cases where breaking
client code in a noticeable way is desirable.

That said, I still don't think that partition_dependency_matches() is
all that bad, since the state is still right there in the

Re: performance issue in remove_from_unowned_list()

2019-02-08 Thread Tomas Vondra
On 2/8/19 2:27 PM, Tomas Vondra wrote:
> 
> 
> 
> On 2/8/19 12:32 PM, Ideriha, Takeshi wrote:
>>> From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
>>> But it's a bit funnier, because there's also DropRelationFiles() which does 
>>> smgrclose on
>>> a batch of relations too, and it says this
>>>
>>>/*
>>> * Call smgrclose() in reverse order as when smgropen() is called.
>>> * This trick enables remove_from_unowned_list() in smgrclose()
>>> * to search the SMgrRelation from the unowned list,
>>> * with O(1) performance.
>>> */
>>>for (i = ndelrels - 1; i >= 0; i--)
>>>...
>>>
>>> but it's called from two places in xact.c only. And if you trigger the 
>>> issue with 100k x
>>> CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, 
>>> you
>>> actually get the very same behavior with always traversing the unowned list 
>>> for some
>>> reason. I'm not quite sure why, but it seems the optimization is exactly 
>>> the wrong thing
>>> to do here.
>>
>> So when DropRelationFiles() is called, order of calling smgr_close()
>> and the order of the unowned list is always the same?
>>
> 
> Well, let's say you create 10k tables in a single transaction, and then
> kill the server with "kill -9" right after the commit. Then on restart,
> during recovery this happens
> 
> 2019-02-08 14:16:21.781 CET [12817] LOG:  remove_from_unowned_list 10015
> 2019-02-08 14:16:21.871 CET [12817] LOG:  remove_from_unowned_list 10005
> 2019-02-08 14:16:21.967 CET [12817] LOG:  remove_from_unowned_list 10004
> 2019-02-08 14:16:22.057 CET [12817] LOG:  remove_from_unowned_list 10001
> 2019-02-08 14:16:22.147 CET [12817] LOG:  remove_from_unowned_list 1
> 2019-02-08 14:16:22.238 CET [12817] LOG:  remove_from_unowned_list 
> 2019-02-08 14:16:22.327 CET [12817] LOG:  remove_from_unowned_list 9998
> 2019-02-08 14:16:22.421 CET [12817] LOG:  remove_from_unowned_list 9996
> 2019-02-08 14:16:22.513 CET [12817] LOG:  remove_from_unowned_list 9995
> 2019-02-08 14:16:22.605 CET [12817] LOG:  remove_from_unowned_list 9994
> 2019-02-08 14:16:22.696 CET [12817] LOG:  remove_from_unowned_list 9993
> ...
> 2019-02-08 14:19:13.921 CET [12817] LOG:  remove_from_unowned_list 8396
> 2019-02-08 14:19:14.025 CET [12817] LOG:  remove_from_unowned_list 8395
> 2019-02-08 14:19:14.174 CET [12817] LOG:  remove_from_unowned_list 8394
> 2019-02-08 14:19:14.277 CET [12817] LOG:  remove_from_unowned_list 8393
> 2019-02-08 14:19:14.387 CET [12817] LOG:  remove_from_unowned_list 8392
> 2019-02-08 14:19:14.508 CET [12817] LOG:  remove_from_unowned_list 8391
> 2019-02-08 14:19:14.631 CET [12817] LOG:  remove_from_unowned_list 8390
> 2019-02-08 14:19:14.770 CET [12817] LOG:  remove_from_unowned_list 8389
> ...
> 
> This is with the attached debug patch, which simply prints index of the
> unowned list index entry. And yes, it took ~3 minutes to get from 10k to
> 8k (at which point I interrupted the recovery and killed the cluster).
> 
> I see a similar issue after creating a lot of tables in the same xact and
> rolling it back, although in this case it's faster for some reason.
> 
>> This one was introduced at b4166911 and I'd like to hear the author and 
>> reviewer's opinion. 
>> https://www.postgresql.org/message-id/CAHGQGwH0hwXwrCDnmUU2Twj5YgHcrmMCVD7O%3D1NrRTpHcbtCBw%40mail.gmail.com
>>
> 
> That probably explains why we haven't seen the issue before.
> 

FWIW I've just hit this issue (remove_from_unowned_list consuming 100%
CPU for long periods of time) in the checkpointer, which calls smgrcloseall().
And that does this:

hash_seq_init(&status, SMgrRelationHash);

while ((reln = (SMgrRelation) hash_seq_search(&status)) != NULL)
smgrclose(reln);

which means the order of closing relations is entirely random in this
case, and there's no hope of smartly ordering the close requests.

I'm wondering if we should just get rid of all such optimizations, and
make the unowned list doubly-linked (WIP patch attached; the comments
etc. still need fixing).

The comment before remove_from_unowned_list() claims it's not worth it,
but I guess we may need to revisit the assumptions - the fact that we've
accumulated ordering optimizations in a number of places suggests they
may not be entirely accurate.

The doubly-linked list seems way less fragile than the attempts to order
the requests in a particular way, and it works all the time. It requires
an extra pointer in the struct, but per pahole the size changes from 88 to
96 bytes - so the number of cachelines does not change, and the allocset
chunk size remains the same. So no problem here, IMO. But some testing
is probably needed.
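
For illustration, the shape of the change is roughly this (just a sketch of
the idea, not the attached patch; the field name is illustrative):

/* SMgrRelationData gets a dlist_node (say "unowned_node") instead of the
 * single forward pointer, and the list head becomes a dlist_head. */
static dlist_head unowned_relns = DLIST_STATIC_INIT(unowned_relns);

static void
add_to_unowned_list(SMgrRelation reln)
{
    dlist_push_head(&unowned_relns, &reln->unowned_node);
}

static void
remove_from_unowned_list(SMgrRelation reln)
{
    /* O(1) regardless of the order in which relations get closed */
    dlist_delete(&reln->unowned_node);
}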

I'm going to use this for the syscache patch benchmarking, because it's
rather impossible without it.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..02a5d5cdda 100644
--- a/s

Re: libpq compression

2019-02-08 Thread Konstantin Knizhnik




On 08.02.2019 21:57, Andres Freund wrote:
> On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:
>> Frankly speaking, I do not think that such flexibility in choosing
>> compression algorithms is really needed.
>> I do not expect that there will be many situations where an old client has to
>> communicate with a new server or vice versa.
>> In most cases both client and server belong to the same postgres
>> distribution and so implement the same compression algorithm.
>> As far as we are compressing only temporary data (traffic), the problem of
>> providing backward compatibility seems to be not so important.
>
> I think we should outright reject any patch without compression type
> negotiation.

Does it mean that it is necessary to support multiple compression
algorithms and make it possible to switch between them at runtime?
Right now the compression algorithm is linked statically.
Negotiation of the compression type is currently performed, but it only
checks that the server and client implement the same algorithm and
disables compression if they do not.


If we are going to support multiple compression algorithms, do we need
dynamic loading of the corresponding compression libraries, or is static
linking OK? In the case of dynamic linking we would need to somehow
specify information about the available compression algorithms.
Perhaps a special subdirectory for them, so that I can traverse this
directory and try to load the corresponding libraries?


Am I the only one who finds this too complicated for the problem being
addressed?



Re: libpq compression

2019-02-08 Thread Tomas Vondra
On 2/8/19 11:10 PM, Konstantin Knizhnik wrote:
> 
> 
> On 08.02.2019 21:57, Andres Freund wrote:
>> On 2019-02-08 12:15:58 +0300, Konstantin Knizhnik wrote:
>>> Frankly speaking, I do not think that such flexibility in choosing
>>> compression algorithms is really needed.
>>> I do not expect that there will be many situations where an old client
>>> has to communicate with a new server or vice versa.
>>> In most cases both client and server belong to the same postgres
>>> distribution and so implement the same compression algorithm.
>>> As far as we are compressing only temporary data (traffic), the problem
>>> of providing backward compatibility seems to be not so important.
>> I think we should outright reject any patch without compression type
>> negotiation.
> Does it mean that it is necessary to support multiple compression
> algorithms and make it possible to switch between them at
> runtime?

IMHO the negotiation should happen at connection time, i.e. the server
should support connections compressed by different algorithms. Not sure
if that's what you mean by runtime.

AFAICS this is quite close to how negotiation of encryption algorithms
works in TLS and so on: the client specifies the algorithms it supports,
the server compares that to its own list, deduces the algorithm to use,
and notifies the client.

To allow falling back to an uncompressed connection, use "none" as the
algorithm. If there's no common algorithm, fail.
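
To make that concrete, the server-side choice could be as simple as this
sketch (illustrative only; the function and the algorithm names are
placeholders, not anything from the patch):

#include <string.h>

/* algorithms this server build understands; "none" means no compression */
static const char *server_algorithms[] = {"zstd", "zlib", "none"};

/*
 * Pick the first algorithm offered by the client that the server also
 * supports; return NULL if there is no common algorithm, in which case
 * the connection attempt should fail.
 */
static const char *
choose_compression_algorithm(const char *const *client_algorithms, int nclient)
{
    for (int i = 0; i < nclient; i++)
    {
        for (int j = 0; j < (int) (sizeof(server_algorithms) /
                                   sizeof(server_algorithms[0])); j++)
        {
            if (strcmp(client_algorithms[i], server_algorithms[j]) == 0)
                return server_algorithms[j];    /* notify the client of the choice */
        }
    }
    return NULL;
}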

> Right now the compression algorithm is linked statically.
> Negotiation of the compression type is currently performed, but it only
> checks that the server and client implement the same algorithm and
> disables compression if they do not.
> 

I don't think we should automatically fall back to disabled compression
when a client specifies a compression algorithm.

> If we are going to support multiple compression algorithms, do we need
> dynamic loading of the corresponding compression libraries, or is static
> linking OK? In the case of dynamic linking we would need to somehow
> specify information about the available compression algorithms.
> Perhaps a special subdirectory for them, so that I can traverse this
> directory and try to load the corresponding libraries?
> 
> Am I the only one who finds this too complicated for the problem being
> addressed?
> 

I don't think we need dynamic algorithms in v1, but IMHO it'd be pretty
simple to do - just add a shared_preload_library which registers it in a
list in memory.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-08 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-08, Tom Lane wrote:
>> Also, I came across some coding in CloneFkReferencing() that looks fishy
>> as hell: that function imagines that it can delete an existing trigger
>> with nothing more than a summary CatalogTupleDelete().  I didn't do
>> anything about that here, but if it's not broken, I'd like to see an
>> explanation why not.  I added a comment complaining about the lack of
>> pg_depend cleanup, and there's also the question of whether we don't
>> need to broadcast a relcache inval for the trigger's table.

> Oops, this is new code in 0464fdf07f69 (Jan 21st).  Unless you object,
> I'll study a fix for this now, to avoid letting it appear in the minor
> next week.

+1.  The best solution would presumably be to go through the normal
object deletion mechanism; though possibly there's a reason that
won't work given you're already inside some other DDL.

regards, tom lane



First-draft release notes for next week's releases

2019-02-08 Thread Tom Lane
... are committed at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5996cfc4665735a7e6e8d473bd66e8b11e320bbb

Please send comments/corrections by Sunday.

regards, tom lane



Re: First-draft release notes for next week's releases

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 5:19 PM Tom Lane  wrote:
> Please send comments/corrections by Sunday.

Note that "Fix deadlock in GIN vacuum introduced by 218f51584d5"
(which is commit fd83c83d on the master branch) is not that far from
being a complete revert of a v10 feature (this appears as "Reduce page
locking during vacuuming of GIN indexes" in the v10 release notes).
Perhaps that's something that needs to be pointed out directly, as
happened with the recheck_on_update issue in the last point
release. We may even need to revise the v10 release notes.

-- 
Peter Geoghegan



Re: First-draft release notes for next week's releases

2019-02-08 Thread Tom Lane
Peter Geoghegan  writes:
> Note that "Fix deadlock in GIN vacuum introduced by 218f51584d5"
> (which is commit fd83c83d on the master branch) is not that far from
> being a complete revert of a v10 feature (this appears as "Reduce page
> locking during vacuuming of GIN indexes" in the v10 release notes).

Yeah, I saw that in the commit message, but didn't really think that
the release note entry needed to explain it that way.  Could be
argued differently though.

> We may even need to revise the v10 release notes.

Perhaps just remove that item from the 10.0 notes?

regards, tom lane



Re: First-draft release notes for next week's releases

2019-02-08 Thread Peter Geoghegan
On Fri, Feb 8, 2019 at 6:00 PM Tom Lane  wrote:
> Yeah, I saw that in the commit message, but didn't really think that
> the release note entry needed to explain it that way.  Could be
> argued differently though.

I'm pretty confident that somebody is going to miss this
functionality, if this account of how the patch helped Yandex is
anything to go by:

https://www.postgresql.org/message-id/7B44397E-5E0A-462F-8148-1C444640FA0B%40simply.name

> > We may even need to revise the v10 release notes.
>
> Perhaps just remove that item from the 10.0 notes?

The wording could be changed to reflect the current reality within
GIN. It's still useful that posting trees are only locked when there
are pages to be deleted.

-- 
Peter Geoghegan



Re: First-draft release notes for next week's releases

2019-02-08 Thread Amit Kapila
On Sat, Feb 9, 2019 at 6:49 AM Tom Lane  wrote:
>
> ... are committed at
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=5996cfc4665735a7e6e8d473bd66e8b11e320bbb
>

Author: Alvaro Herrera 
+Branch: master [0464fdf07] 2019-01-21 20:08:52 -0300
+Branch: REL_11_STABLE [123cc697a] 2019-01-21 19:59:07 -0300
+-->
+ 
+  Create or delete foreign key enforcement triggers correctly when
+  attaching or detaching a partition in a a partitioned table that
+  has a foreign-key constraint (Amit Langote, Álvaro Herrera)
+ 
+

It seems like there is a typo in the above sentence. /a a
partitioned/a partitioned

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: First-draft release notes for next week's releases

2019-02-08 Thread Tom Lane
Amit Kapila  writes:
> On Sat, Feb 9, 2019 at 6:49 AM Tom Lane  wrote:
> +  Create or delete foreign key enforcement triggers correctly when
> +  attaching or detaching a partition in a a partitioned table that
> +  has a foreign-key constraint (Amit Langote, Álvaro Herrera)

> It seems like there is a typo in the above sentence. /a a
> partitioned/a partitioned

Ooops, obviously my eyes had glazed over by the time I went back to
proofread.  Thanks for catching that.

regards, tom lane



Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-02-08 Thread Amit Kapila
On Fri, Feb 8, 2019 at 6:55 AM Haribabu Kommi  wrote:
>
> On Thu, Feb 7, 2019 at 9:31 PM Haribabu Kommi  
> wrote:
>>
>>
>> This is because of a larger xact_commit value than with the default
>> configuration. The changed server configuration leads to generating more
>> parallel workers, and every parallel worker operation is treated as an
>> extra commit; for this reason the total number of commits increased, but
>> the overall query performance decreased.
>>
>> Is there any relation of transaction commits to performance?
>>
>> Is there any specific reason to consider the parallel worker activity also 
>> as a transaction commit? Especially in my observation, if we didn't consider 
>> the parallel worker activity as separate commits, the test doesn't show an 
>> increase in transaction commits.
>
>
> The following statements shows the increase in the xact_commit value with
> parallel workers. I can understand that workers updating the seq_scan stats
> as they performed the seq scan.
>

Yeah, that seems okay; however, one could argue that the scan should be
counted as a single scan even if part of it is accomplished by workers,
or perhaps that there should be a separate counter for parallel worker
scans.

> Is the same applied to parallel worker transaction
> commits also?
>

I don't think so.  It seems to me that we should consider it as a
single transaction.  Do you want to do the leg work for this and try
to come up with a patch?  On a quick look, I think we might want to
change AtEOXact_PgStat so that the commits for parallel workers are
not considered.  I think the caller has that information.
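
Something along these lines, perhaps (just a sketch of the shape of the
change; the real signature and call sites would need more careful thought):

void
AtEOXact_PgStat(bool isCommit, bool is_parallel_worker)
{
    /* ... existing end-of-transaction bookkeeping ... */

    /* don't count work done by parallel workers as additional commits */
    if (!is_parallel_worker)
    {
        if (isCommit)
            pgStatXactCommit++;
        else
            pgStatXactRollback++;
    }

    /* ... */
}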

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Patch for SortSupport implementation on inet/cdir

2019-02-08 Thread Edmund Horner
On Sat, 9 Feb 2019 at 04:48, Brandur Leach  wrote:
> I've attached a patch that implements SortSupport for the
> inet/cidr types. It has the effect of typically reducing
> the time taken to sort these types by ~50-60% (as measured
> by `SELECT COUNT(DISTINCT ...)`), which will carry over to
> common operations like index creation, `ORDER BY`, and
> `DISTINCT`.

Hi Brandur,

I had a look at this.  Your V2 patch applies cleanly, and the code was
straightforward and well commented.  I appreciate the big comment at the
top of network_abbrev_convert explaining how you encode the data.

The tests pass.  I ran a couple of large scale tests myself and didn't find
any problems.  Sorting a million random inets in work_mem = 256MB goes from
roughly 3670ms to 1620ms with the SortSupport, which is pretty impressive.
(But that's in my debug build, so not a serious benchmark.)

An interesting thing about sorting IPv4 inets on 64-bit machines is that
when the inets are the same, the abbreviated comparator will return 0, which
is taken by the sorting machinery to mean "the datums are the same up to
this point, so you need to call the full comparator" -- but, in this case,
0 means "the datums truly are the same, no need to call the full
comparator".  Since the full comparator assumes its arguments to be the
original (typically pass-by-reference) datums, you can't do it there.
You'd need to add another optional comparator to be called after the
abbreviated one.  In inet's case on a 64-bit machine, it would look at the
abbreviated datums and, if they're both in the IPv4 family, would return 0
(knowing that the abbreviated comparator has already done the real work).
I have no reason to believe this particular optimisation is worth anything
much, though; it's outside the scope of this patch, besides.
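
Just to illustrate the idea (purely hypothetical -- no such extra
SortSupport hook exists, and I'm assuming for illustration that the
address family ends up in the most significant bit of the abbreviated key):

/*
 * Given two abbreviated keys that compared equal, report whether that
 * equality is authoritative, i.e. whether the sort machinery could skip
 * the call to the full comparator.
 */
static bool
inet_abbrev_equality_is_authoritative(Datum abbrev_x, Datum abbrev_y)
{
    bool        x_is_ipv4 = (DatumGetUInt64(abbrev_x) >> 63) == 0;
    bool        y_is_ipv4 = (DatumGetUInt64(abbrev_y) >> 63) == 0;

    /* an IPv4 address (plus netmask bits) fits entirely in a 64-bit key */
    return x_is_ipv4 && y_is_ipv4;
}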

I have some comments on the comments:

network.c:552
* SortSupport conversion routine. Converts original inet/cidr representations
* to abbreviated keys . The inet/cidr types are pass-by-reference, so is an
* optimization so that sorting routines don't have to pull full values from
* the heap to compare.

Looks like you have an extra space before the "." on line 553.  And
abbreviated keys being an optimisation for pass-by-reference types can be
taken for granted, so I think the last sentence is redundant.

network.c:567
* IPv4 and IPv6 are identical in this makeup, with the difference being that
* IPv4 addresses have a maximum of 32 bits compared to IPv6's 64 bits, so in
* IPv6 each part may be larger.

IPv6 addresses are 128 bits.  I'm not sure if "maximum" is accurate,
or whether you should just say "IPv4 addresses have 32 bits".

network.c:571
* inet/cdir types compare using these sorting rules. If inequality is detected
* at any step, comparison is done. If any rule is a tie, the algorithm drops
* through to the next to break it:

When you say "comparison is done" it sounds like more comparing is going to
be done, but what I think you mean is that comparison is finished.

> [...]

> My benchmarking methodology and script is available here
> [1], and involves gathering statistics for 100
> `count(distinct ...)` queries at various data sizes. I've
> saved the results I got on my machine here [2].

I didn't see any links for [1], [2] and [3] in your email.

Finally, there's a duplicate CF entry:
https://commitfest.postgresql.org/22/1990/ .

Since you're updating https://commitfest.postgresql.org/22/1991/ , I
suggest you mark 1990 as Withdrawn to avoid confusion.  If there's a way to
remove it from the CF list, that would be even better.

Edmund


Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-08 Thread Amit Langote
On Sat, Feb 9, 2019 at 9:41 AM Tom Lane  wrote:
> Alvaro Herrera  writes:
> > On 2019-Feb-08, Tom Lane wrote:
> >> Also, I came across some coding in CloneFkReferencing() that looks fishy
> >> as hell: that function imagines that it can delete an existing trigger
> >> with nothing more than a summary CatalogTupleDelete().  I didn't do
> >> anything about that here, but if it's not broken, I'd like to see an
> >> explanation why not.  I added a comment complaining about the lack of
> >> pg_depend cleanup, and there's also the question of whether we don't
> >> need to broadcast a relcache inval for the trigger's table.
>
> > Oops, this is new code in 0464fdf07f69 (Jan 21st).  Unless you object,
> > I'll study a fix for this now, to avoid letting it appear in the minor
> > next week.
>
> +1.  The best solution would presumably be to go through the normal
> object deletion mechanism; though possibly there's a reason that
> won't work given you're already inside some other DDL.

Maybe:

- CatalogTupleDelete(trigrel, &trigtup->t_self);
+ RemoveTriggerById(trgform->oid)?

Thanks,
Amit



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2019-02-08 Thread Tom Lane
Amit Langote  writes:
> On Sat, Feb 9, 2019 at 9:41 AM Tom Lane  wrote:
>> +1.  The best solution would presumably be to go through the normal
>> object deletion mechanism; though possibly there's a reason that
>> won't work given you're already inside some other DDL.

> Maybe:
> - CatalogTupleDelete(trigrel, &trigtup->t_self);
> + RemoveTriggerById(trgform->oid)?

No, that's still the back end of the deletion machinery, and in particular
it would fail to clean pg_depend entries for the trigger.  Going in by the
front door would use performDeletion().  (See deleteOneObject() to get
an idea of what's possibly being missed here.)
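
Roughly like this, I'd imagine (untested; the flags may need adjusting
given that we're already inside other DDL, and trigger_oid stands for the
OID taken from the trigger tuple):

ObjectAddress trig;

ObjectAddressSet(trig, TriggerRelationId, trigger_oid);
performDeletion(&trig, DROP_RESTRICT, PERFORM_DELETION_INTERNAL);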

regards, tom lane