Using logical replication with older version subscribers

2019-01-07 Thread Masahiko Sawada
Hi,

Logical replication enables us to replicate data changes to a different
major version of PostgreSQL, as the docs say[1]. However, the current
logical replication works fine only when replicating to a newer major
version of PostgreSQL, such as from 10 to 11. When replicating to an
older major version, say from 11 to 10, replication will stop when the
subscriber receives a truncate change, because truncate is not supported
in PostgreSQL 10.  I think there are use cases for using logical
replication with a subscriber of an older major version of PostgreSQL,
but I'm not sure whether we should support them.

Of course, in such a case we can create the publication with publish =
'insert, update, delete' so that truncate changes are not sent, but that
requires users to recognize the feature differences between major
versions, and it will only get more complex in the future. So I think it
would be better for this to be configured automatically by PostgreSQL.

To fix this, we could make subscribers send the message types they
support to the publisher at startup, so that the publisher doesn't send
message types the subscriber cannot handle. Alternatively, we could make
subscribers ignore unsupported logical replication message types instead
of raising an error. Feedback is very welcome.

[1] https://www.postgresql.org/docs/devel/logical-replication.html

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: error message when subscription target is a partitioned table

2019-01-07 Thread Amit Langote
On 2019/01/07 16:35, Michael Paquier wrote:
> On Mon, Jan 07, 2019 at 01:49:49PM +0900, Amit Langote wrote:
>>  {
>>  /*
>> - * We currently only support writing to regular tables.
>> + * We currently only support writing to regular tables.  However, give
>> + * a more specific error for partitioned and foreign tables.
>>   */
>> +if (relkind == RELKIND_PARTITIONED_TABLE)
>> +ereport(ERROR,
>> +(errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> + errmsg("\"%s.%s\" is a partitioned table",
>> +nspname, relname),
>> + errdetail("Partitioned tables are not
>> supported as logical replication targets.")));
> 
> Could it be possible to avoid a full sentence in the primary error
> message?  Usually these are avoided:
> https://www.postgresql.org/docs/devel/error-style-guide.html
> 
> It seems to me that we may want something more like:
> Primary: "could not use \"%s.%s\" as logical replication target".
> Detail: "Relation %s.%s is a foreign table", "not a table", etc.

I've thought about that before and I tend to agree with you.  Maybe:

ERROR: cannot use "%s.%s" as logical replication target
DETAIL: Using partitioned tables as logical replication target is not
supported.

Sounds a bit repetitive, but perhaps it's better to use the words "not
supported" in the DETAIL message.
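
In code, that would look roughly like this (untested sketch, reusing the
names from the hunk quoted above):

if (relkind == RELKIND_PARTITIONED_TABLE)
    ereport(ERROR,
            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
             errmsg("cannot use \"%s.%s\" as logical replication target",
                    nspname, relname),
             errdetail("Using partitioned tables as logical replication target is not supported.")));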

Thanks,
Amit




Re: Implementing Incremental View Maintenance

2019-01-07 Thread Mitar
Hi!

On Thu, Dec 27, 2018 at 4:57 AM Yugo Nagata  wrote:
> I would like to implement Incremental View Maintenance (IVM) on PostgreSQL.
> IVM is a technique to maintain materialized views which computes and applies
> only the incremental changes to the materialized views rather than
> recomputing the contents as the current REFRESH command does.

That sounds great! I am interested in this topic because I am
interested in reactive/live queries and support for them in
PostgreSQL. [1]

In that context, the problem is very similar: based on some state of the
query results and the updated source tables, determine what new updates
should be sent to the client to describe changes to the query results.
So after computing those incremental changes, instead of applying them
to a materialized view I would send them to the client. One could see
materialized views as just one type of consumer of such information
about incremental changes.

So I would like to ask whether whatever is done in this setting could be
done in a way that can also be used outside the context of materialized
views. I am not sure what the API would be, though.

From the perspective of reactive/live queries, this package [2] is
interesting. To my understanding, it adds two columns to all base tables,
one for a unique ID and one for the revision of the row, and then
rewrites queries so that this information is passed all the way to the
query results. In this way it can then determine the mapping between
inputs and outputs. I am not sure whether it then does an incremental
update or just uses that to determine whether the view is invalidated.
Nor am I sure whether there is anything about such an approach in the
literature, or why both the ID and revision columns are needed.

> For these reasons, we started to think about implementing IVM without
> relying on OIDs and did a bit more surveying.

I also do not see much difference between asking users to have a primary
key on base tables and asking them to have OIDs. Why do you think that
a requirement for primary keys is a hard one? I think we should first
focus on having IVM for base tables with primary keys. Maybe later on we
could improve on that and make it work without them as well.

To me personally, having a unique index on the source tables and also on
the materialized view is a reasonable restriction for this feature,
especially for its initial versions.

> However, the discussion about IVM is now stopped, so we would like to
> restart and progress this.

What would be next steps in your view to move this further?

> If we can represent a change of UPDATE on a base table as query-like
> rather than OLD and NEW, it may be possible to update the materialized
> view directly instead of performing delete & insert.

Why do you need OLD and NEW? Don't you need just NEW and a list of
columns which changed from those in NEW? I use such a diffing query [4]
to represent changes: the first column has a flag telling whether the row
represents an insert, update, or remove; the second column tells which
columns are being changed in the case of an update; and then the NEW
columns follow.
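
To make that shape concrete, one row of such a diff could be represented
in C roughly like this (just an illustrative sketch; the type and field
names are invented, and the actual representation is of course open):

#include "postgres.h"
#include "nodes/bitmapset.h"

typedef enum ChangeKind
{
	CHANGE_INSERT,
	CHANGE_UPDATE,
	CHANGE_DELETE
} ChangeKind;

typedef struct ChangeRow
{
	ChangeKind	kind;			/* flag: insert, update, or remove */
	Bitmapset  *changed_cols;	/* which columns changed (updates only) */
	Datum	   *new_values;		/* the NEW column values */
	bool	   *new_isnull;		/* NULL flags for the NEW values */
} ChangeRow;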

I think that maybe standardizing the structure for representing those
changes would be a good step towards making this modular and reusable,
because then we can have three parts:

* Recording and storing changes in a standard format.
* A function which, given the original data and the stored changes,
computes the updates needed, also in some standard format.
* A function which, given the original data and the updates needed,
applies them.

> In the previous discussion[4], it was planned to start from the "eager"
> approach. In our PoC implementation, we used the other approach, that is,
> using the REFRESH command to perform IVM. I am not sure which is better
> as a starting point, but I begin to think that the eager approach may be
> simpler, since we don't have to maintain base table changes from other
> past transactions.

I think if we split things into the three parts I described above, then
this is just a question of configuration. Either you call all three
inside one trigger to update in an "eager" fashion, or you store the
computed updates somewhere and then apply them on demand in a "lazy"
fashion.

> In the eager maintenance approach, we have to consider a race condition
> where two different transactions change base tables simultaneously, as
> discussed in [4].

But in the case of "lazy" maintenance there is a mirror problem: what if
later changes to the base tables invalidate some previous change to the
materialized view? Imagine that one cell in a base table is first updated
to "foo" and we compute an update for the materialized view to set it to
"foo". And then the same cell is updated to "bar" and we compute an
update for the materialized view again. If we have not applied either of
those updates (because we are "lazy"), the previously computed update can
now be discarded. We could still apply both, but it would not be
efficient.

[1] 
https://www.postgresql.org/message-id/flat/CAKLmikP%2BPPB49z8rEEvRjFOD0D2DV72KdqYN7s9fjh9sM_32ZA%40mail.gmail.com
[2] https:

Re: speeding up planning with partitions

2019-01-07 Thread David Rowley
On Fri, 4 Jan 2019 at 04:39, Justin Pryzby  wrote:
> Running 11dev with your v10 patch applied, this takes 2244ms with empty buffer
> cache after postmaster restarted on a totally untuned instance (and a new
> backend, with no cached opened files).
>
> I was curious why it took even 2sec, and why it did so many opens() (but not
> 20k of them that PG11 does):

It would be pretty hard to know that without seeing the query plan.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: A few new options for vacuumdb

2019-01-07 Thread Masahiko Sawada
On Sat, Jan 5, 2019 at 8:50 AM Bossart, Nathan  wrote:
>
> On 12/21/18, 11:14 AM, "Bossart, Nathan"  wrote:
> > On 12/21/18, 10:51 AM, "Robert Haas"  wrote:
> >> On Thu, Dec 20, 2018 at 11:48 AM Bossart, Nathan  
> >> wrote:
> >>> Either way, we'll still have to decide whether to fail or to silently
> >>> skip the option if you do something like specify --min-mxid-age for a
> >>> 9.4 server.
> >>
> >> +1 for fail.
> >
> > Sounds good.  I'll keep all the version checks and will fail for
> > unsupported options in the next patch set.
>
> Here's an updated set of patches with the following changes:
>
>  - 0002 adds the parenthesized syntax for ANALYZE.
>  - 0003 adds DISABLE_PAGE_SKIPPING for VACUUM.
>  - 0003 also ensures SKIP_LOCKED is applied for --analyze-only.
>  - 0004 alters vacuumdb to always use the catalog query to discover
>the list of tables to process.
>  - 0003, 0005, and 0006 now fail in vacuum_one_database() if a
>specified option is not available on the server version.
>  - If tables are listed along with --min-xid-age, --min-mxid-age, or
>--min-relation-size, each table is processed only if it satisfies
>the provided options.
>
> 0004 introduces a slight change to existing behavior.  Currently, if
> you specify a missing table, vacuumdb will process each table until
> it reaches the nonexistent one, at which point it will fail.  After
> 0004 is applied, vacuumdb will fail during the catalog query, and no
> tables will be processed.  I considered avoiding this change in
> behavior by doing an additional catalog lookup for each specified
> table to see whether it satisfies --min-xid-age, etc.  However, this
> introduced quite a bit more complexity and increased the number of
> catalog queries needed.
>
> Nathan
>

0002 and 0003 have been merged and posted by Michael-san, and they look
good to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
are a few review comments.

-
Even if all tables are filtered out by --min-relation-size,
--min-mxid-age or --min-xid-age, the following message appears.

$ vacuumdb --verbose --min-relation-size 100 postgres
vacuumdb: vacuuming database "postgres"

Since no tables are processed in this case, isn't it better not to output
this message, by moving it after the check for ntup == 0?

-
You use pg_total_relation_size() to check the relation size, but it might
be better to use pg_relation_size() instead. pg_total_relation_size()
includes the size of all indexes, but in my experience it's common to
check only the table size (including the TOAST table) when deciding
whether to vacuum it or not.
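
Just to illustrate, on the vacuumdb side the size filter in the catalog
query could then look roughly like this (a hypothetical sketch only; the
function and variable names are invented, and the actual patch may build
the query differently):

#include "postgres_fe.h"
#include "pqexpbuffer.h"

/*
 * Hypothetical sketch: append a size filter to the catalog query using
 * pg_relation_size() as suggested above, instead of
 * pg_total_relation_size() (which also counts all indexes).
 * min_relation_size is taken to be in bytes here.
 */
static void
append_size_filter(PQExpBuffer catalog_query, int64 min_relation_size)
{
	if (min_relation_size > 0)
		appendPQExpBuffer(catalog_query,
						  " AND pg_catalog.pg_relation_size(c.oid) >= " INT64_FORMAT,
						  min_relation_size);
}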

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line

2019-01-07 Thread Andrey Borodin
Hi!

Thanks for working on this feature, I believe it solves actual problem of HA 
systems.

> On 30 Oct 2018, at 8:01, Michael Paquier wrote:
> 
> Another thing I am wondering is: do we actually need something complex?
> What we want to know is what data is necessary to build the file map, so
> we could also add an option to pg_rewind which checks what segments are
> necessary and lets the user know about them?
From my point of view, fetching WALs automatically is a much better
option for automation.

> This also avoids the
> security-related problems of manipulating a command at option-level.
> This kind of options makes folks willing to use more sensitive data on
> command line, which is not always a good idea...

I do not see any new security problems here. I'd be happy if anyone
pointed me to where I can learn about them.

> On 26 Dec 2018, at 19:11, Alexey Kondratov wrote:
> 
> Please find the new version of the patch attached.

The refactoring of guc-file looks sane, but I'm not an expert in
frontend/backend modularity.

Here are some of my notes:
1. RestoreArchivedWAL() shares a lot of code with RestoreArchivedFile(). Is it
possible/viable to refactor them and extract the common part? (See the sketch
after this list for the kind of expansion logic they share.)
2. IMV, pg_rewind with a restore_command containing %r should fail. %r is
designed for cleaning the archive of WALs; nothing should be deleted when
fetching WALs for a rewind. The last restartpoint has no meaning during a
rewind. Or does it? If so, let's add a comment about it.
3. RestoreArchivedFile() checks for signals; is this done by pg_rewind elsewhere?
4. No documentation is updated.
5. -R takes precedence over -r without any notice. Shouldn't we complain? Or
maybe we should take the one from the config, and only if nothing is found
use -R?
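
For point 1, the piece the two functions share is essentially the
restore_command expansion. A rough sketch of that common part
(hypothetical, frontend-style; %r is intentionally not handled, per
point 2):

#include "postgres_fe.h"
#include "pqexpbuffer.h"

/*
 * Expand %f (WAL file name), %p (target path) and %% in restore_command,
 * appending the result to "cmd".  A shared helper like this is what
 * RestoreArchivedFile() and the proposed RestoreArchivedWAL() could both
 * build on.
 */
static void
build_restore_command(PQExpBuffer cmd, const char *restore_command,
					  const char *xlogfname, const char *xlogpath)
{
	const char *sp;

	for (sp = restore_command; *sp; sp++)
	{
		if (sp[0] == '%' && sp[1] == 'f')
		{
			appendPQExpBufferStr(cmd, xlogfname);
			sp++;
		}
		else if (sp[0] == '%' && sp[1] == 'p')
		{
			appendPQExpBufferStr(cmd, xlogpath);
			sp++;
		}
		else if (sp[0] == '%' && sp[1] == '%')
		{
			appendPQExpBufferChar(cmd, '%');
			sp++;
		}
		else
			appendPQExpBufferChar(cmd, *sp);
	}
}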

Thanks!

Best regards, Andrey Borodin.


Re: speeding up planning with partitions

2019-01-07 Thread Amit Langote
Thanks Justin for reporting the results of your testing.

On 2019/01/07 17:40, David Rowley wrote:
> On Fri, 4 Jan 2019 at 04:39, Justin Pryzby  wrote:
>> Running 11dev with your v10 patch applied, this takes 2244ms with empty 
>> buffer
>> cache after postmaster restarted on a totally untuned instance (and a new
>> backend, with no cached opened files).
>>
>> I was curious why it took even 2sec, and why it did so many opens() (but not
>> 20k of them that PG11 does):
> 
> It would be pretty hard to know that without seeing the query plan.

Yeah, I too would be curious to see if the resulting plan really needs to
do those open()s.  If all the open()s being seen here are accounted for by
un-pruned partitions (across the UNION ALL) contained in the plan and
their indexes, then the patch has done enough to help.  If the open()s can
be traced to pruned partitions, then there's something wrong with the patch.

Thanks,
Amit




Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-07 Thread Christoph Berg
Re: Michael Paquier 2019-01-04 <20190104133305.gg2...@paquier.xyz>
> On Fri, Jan 04, 2019 at 11:41:15AM +0100, Peter Eisentraut wrote:
> > Note that pgxs supports PG_CPPFLAGS for adding custom pieces to CPPFLAGS
> > in a safe way.  Maybe we should add an equivalent for CFLAGS?  It's just
> > that it hasn't been needed so far.
> 
> +1.  Wouldn't it make sense to also have PG_LDFLAGS?  In some stuff I
> maintain, I have been abusing of PG_CPPFLAGS to pass some flags, which
> is not really right.  We also have contrib/passwordcheck/Makefile
> abuse of it to set -DUSE_CRACKLIB..

I'll buy whatever version that works. I'm using CFLAGS at the moment,
but could easily switch to other variables.

Note that the missing bit here is the CXX part, adding PG_CFLAGS alone
wouldn't help.

Updated patch attached.

Christoph
-- 
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Our handling of personal data is subject to the following
provisions: https://www.credativ.de/datenschutz
From 51226dc97039b42e7fc8670a895ec51b8d715b12 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 13 Nov 2018 11:35:26 +0100
Subject: [PATCH] Add more flexibility for passing custom compiler flags

The existing machinery for extending CFLAGS and LDFLAGS via COPT and
PROFILE neglected to extend CXXFLAGS as well, causing third party
extensions written in C++ not to get the extra flags.

Also add PG_CFLAGS, PG_CXXFLAGS, and PG_LDFLAGS variables to pgxs.mk
which will append to the corresponding make variables.
---
 src/Makefile.global.in |  6 ++
 src/makefiles/pgxs.mk  | 12 
 2 files changed, 18 insertions(+)

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 41c131412e..8ac511756c 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -679,8 +679,14 @@ ifdef COPT
LDFLAGS += $(COPT)
 endif
 
+ifdef CXXOPT
+   CXXFLAGS += $(CXXOPT)
+   LDFLAGS += $(CXXOPT)
+endif
+
 ifdef PROFILE
CFLAGS += $(PROFILE)
+   CXXFLAGS += $(PROFILE)
LDFLAGS += $(PROFILE)
 endif
 
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index d214cb9cf2..56a8b26183 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -53,6 +53,9 @@
 # tests require special configuration, or don't use pg_regress
 #   EXTRA_CLEAN -- extra files to remove in 'make clean'
 #   PG_CPPFLAGS -- will be added to CPPFLAGS
+#   PG_CFLAGS -- will be added to CFLAGS
+#   PG_CXXFLAGS -- will be added to CXXFLAGS
+#   PG_LDFLAGS -- will be added to LDFLAGS
 #   PG_LIBS -- will be added to PROGRAM link line
 #   PG_LIBS_INTERNAL -- same, for references to libraries within build tree
 #   SHLIB_LINK -- will be added to MODULE_big link line
@@ -119,6 +122,15 @@ endif
 ifdef PG_CPPFLAGS
 override CPPFLAGS := $(PG_CPPFLAGS) $(CPPFLAGS)
 endif
+ifdef PG_CFLAGS
+override CFLAGS := $(PG_CFLAGS) $(CFLAGS)
+endif
+ifdef PG_CXXFLAGS
+override CXXFLAGS := $(PG_CXXFLAGS) $(CXXFLAGS)
+endif
+ifdef PG_LDFLAGS
+override LDFLAGS := $(PG_LDFLAGS) $(LDFLAGS)
+endif
 
 # logic for HEADERS_* stuff
 
-- 
2.19.2



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-07 Thread Christoph Berg
Re: To Michael Paquier 2019-01-07 <20190107091734.ga1...@msg.credativ.de>
> Updated patch attached.

Here's another revision that doesn't add an extra CXXOPT variable but
just extends CXXFLAGS whenever COPT or PROFILE are set, which seems
more usable.

It also updates the documentation.

Christoph
-- 
Senior Consultant, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
Our handling of personal data is subject to the following
provisions: https://www.credativ.de/datenschutz
From 5abf37ae89e1680359e899de46e7ed709fc37125 Mon Sep 17 00:00:00 2001
From: Christoph Berg 
Date: Tue, 13 Nov 2018 11:35:26 +0100
Subject: [PATCH] Add more flexibility for passing custom compiler flags

The existing machinery for extending CFLAGS and LDFLAGS via COPT and
PROFILE neglected to extend CXXFLAGS as well, causing third party
extensions written in C++ not to get the extra flags.

Also add PG_CFLAGS, PG_CXXFLAGS, and PG_LDFLAGS variables to pgxs.mk
which will append to the corresponding make variables.
---
 doc/src/sgml/installation.sgml |  2 +-
 src/Makefile.global.in |  2 ++
 src/makefiles/pgxs.mk  | 12 
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 3c9544cc27..98af5d6f56 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1536,7 +1536,7 @@ su - postgres
  it will break many of configure's built-in tests.  To add
  such flags, include them in the COPT environment variable
  while running make.  The contents of COPT
- are added to both the CFLAGS and LDFLAGS
+ are added to both the CFLAGS, CXXFLAGS, and LDFLAGS
  options set up by configure.  For example, you could do
 
 make COPT='-Werror'
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 41c131412e..665923d9c3 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -676,11 +676,13 @@ endif
 #
 ifdef COPT
CFLAGS += $(COPT)
+   CXXFLAGS += $(COPT)
LDFLAGS += $(COPT)
 endif
 
 ifdef PROFILE
CFLAGS += $(PROFILE)
+   CXXFLAGS += $(PROFILE)
LDFLAGS += $(PROFILE)
 endif
 
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index d214cb9cf2..56a8b26183 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -53,6 +53,9 @@
 # tests require special configuration, or don't use pg_regress
 #   EXTRA_CLEAN -- extra files to remove in 'make clean'
 #   PG_CPPFLAGS -- will be added to CPPFLAGS
+#   PG_CFLAGS -- will be added to CFLAGS
+#   PG_CXXFLAGS -- will be added to CXXFLAGS
+#   PG_LDFLAGS -- will be added to LDFLAGS
 #   PG_LIBS -- will be added to PROGRAM link line
 #   PG_LIBS_INTERNAL -- same, for references to libraries within build tree
 #   SHLIB_LINK -- will be added to MODULE_big link line
@@ -119,6 +122,15 @@ endif
 ifdef PG_CPPFLAGS
 override CPPFLAGS := $(PG_CPPFLAGS) $(CPPFLAGS)
 endif
+ifdef PG_CFLAGS
+override CFLAGS := $(PG_CFLAGS) $(CFLAGS)
+endif
+ifdef PG_CXXFLAGS
+override CXXFLAGS := $(PG_CXXFLAGS) $(CXXFLAGS)
+endif
+ifdef PG_LDFLAGS
+override LDFLAGS := $(PG_LDFLAGS) $(LDFLAGS)
+endif
 
 # logic for HEADERS_* stuff
 
-- 
2.19.2



Re: Using logical replication with older version subscribers

2019-01-07 Thread Magnus Hagander
On Mon, Jan 7, 2019 at 9:01 AM Masahiko Sawada 
wrote:

> Hi,
>
> Logical replication enables us to replicate data changes to a different
> major version of PostgreSQL, as the docs say[1]. However, the current
> logical replication works fine only when replicating to a newer major
> version of PostgreSQL, such as from 10 to 11. When replicating to an
> older major version, say from 11 to 10, replication will stop when the
> subscriber receives a truncate change, because truncate is not supported
> in PostgreSQL 10.  I think there are use cases for using logical
> replication with a subscriber of an older major version of PostgreSQL,
> but I'm not sure whether we should support them.
>
> Of course, in such a case we can create the publication with publish =
> 'insert, update, delete' so that truncate changes are not sent, but that
> requires users to recognize the feature differences between major
> versions, and it will only get more complex in the future. So I think it
> would be better for this to be configured automatically by PostgreSQL.
>
> To fix this, we could make subscribers send the message types they
> support to the publisher at startup, so that the publisher doesn't send
> message types the subscriber cannot handle. Alternatively, we could make
> subscribers ignore unsupported logical replication message types instead
> of raising an error. Feedback is very welcome.
>
> [1] https://www.postgresql.org/docs/devel/logical-replication.html


How would that work in practice?

If an 11 server is sent a message saying "client does not support
truncate", and immediately generates an error, then you can no longer
replicate even if you turn off truncate. And if it delays it until the
actual replication of the item, then you just get the error on the primary
instead of the standby?

I assume you are not suggesting a publication with truncation enabled
should just ignore replicating truncation if the downstream server doesn't
support it? Because if that's the suggestion, then a strong -1 from me on
that.

And definitely -1 for having a subscriber ignore messages it doesn't know
about. That's setting oneself up for getting invalid data on the
subscriber, because it skipped something that the publisher expected to be
done.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2019-01-07 Thread Haozhou Wang
Thanks very much for your comments.

To the best of my knowledge, smgr is a layer that abstracts the storage
operations. Therefore, it is a good place to control, or collect
information about, storage operations without touching the physical
storage layer. Moreover, smgr corresponds to the actual disk I/O
operations (not considering the OS cache) in Postgres, so we do not need
to worry about buffer management in Postgres. That keeps the purpose of
the hook pure: a hook for actual disk I/O.
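
As a purely illustrative sketch (the names below are invented for this
email and are not the patch's actual API), a hook at the smgr layer could
look like this, e.g. inside src/backend/storage/smgr/smgr.c:

/* Hypothetical hook type, invoked right before the actual extend I/O. */
typedef void (*smgr_extend_hook_type) (SMgrRelation reln, ForkNumber forknum,
									   BlockNumber blocknum);

smgr_extend_hook_type smgr_extend_hook = NULL;

void
smgrextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
		   char *buffer, bool skipFsync)
{
	/* let an extension observe the extension (it could ereport to reject it) */
	if (smgr_extend_hook)
		(*smgr_extend_hook) (reln, forknum, blocknum);

	smgrsw[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
										 buffer, skipFsync);
}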

Regards,
Haozhou

On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier  wrote:

> On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
> > +1 for adding some hooks to support this kind of thing, but I think
> > the names you've chosen are not very good.  The hook name should
> > describe the place from which it is called, not the purpose for which
> > one imagines that it will be used, because somebody else might imagine
> > another use.  Both BufferExtendCheckPerms_hook_type and
> > SmgrStat_hook_type are imagining that they know what the hook does -
> > CheckPerms in the first case and Stat in the second case.
>
> I personally don't mind making Postgres more pluggable, but I don't
> think that we actually need the extra ones proposed here at the layer
> of smgr, as smgr is already a layer designed to call an underlying set
> of APIs able to extend, unlink, etc. depending on the storage type.
>
> > For this particular purpose, I don't immediately see why you need a
> > hook in both places.  If ReadBuffer is called with P_NEW, aren't we
> > guaranteed to end up in smgrextend()?
>
> Yes, that's a bit awkward.
> --
> Michael


Re: de-deduplicate code in DML execution hooks in postgres_fdw

2019-01-07 Thread Etsuro Fujita

(2018/11/30 19:55), Etsuro Fujita wrote:

One thing we would need to discuss more here is the name of the new
function added by the patch. IIRC, we have three options so far [1]:

1) perform_one_foreign_dml proposed by Ashutosh
2) execute_dml_single_row proposed by Michael
3) execute_parameterized_dml_stmt proposed by me

I'll withdraw my proposal; I think #1 and #2 would describe the
functionality better than #3. My vote would go to #2 or
execute_dml_stmt_single_row, which moves the function much closer
to execute_dml_stmt. I'd like to hear the opinions of others.


On second thought I'd like to propose another option: 
execute_foreign_modify because I think this would match the existing 
helper functions for updating foreign tables in postgres_fdw.c better, 
such as create_foreign_modify, prepare_foreign_modify and 
finish_foreign_modify.  I don't really think the function name should 
contain "one" or "single_row".  Like the names of the calling APIs (ie, 
ExecForeignInsert, ExecForeignUpdate and ExecForeignDelete), I think 
it's OK to omit such words from the function name.  Here is an updated 
version of the patch.  In addition to the naming, I tweaked the function 
a little bit to match other functions (mainly variable names), moved it 
to the place where the helper functions are defined, fiddled with some 
comments, and removed an extra include file that the original patch added.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 390,395  static PgFdwModifyState *create_foreign_modify(EState *estate,
--- 390,400 
  	  List *target_attrs,
  	  bool has_returning,
  	  List *retrieved_attrs);
+ static TupleTableSlot *execute_foreign_modify(EState *estate,
+ 	   ResultRelInfo *resultRelInfo,
+ 	   CmdType operation,
+ 	   TupleTableSlot *slot,
+ 	   TupleTableSlot *planSlot);
  static void prepare_foreign_modify(PgFdwModifyState *fmstate);
  static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
  		 ItemPointer tupleid,
***
*** 1775,1832  postgresExecForeignInsert(EState *estate,
  		  TupleTableSlot *slot,
  		  TupleTableSlot *planSlot)
  {
! 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
! 	const char **p_values;
! 	PGresult   *res;
! 	int			n_rows;
! 
! 	/* Set up the prepared statement on the remote server, if we didn't yet */
! 	if (!fmstate->p_name)
! 		prepare_foreign_modify(fmstate);
! 
! 	/* Convert parameters needed by prepared statement to text form */
! 	p_values = convert_prep_stmt_params(fmstate, NULL, slot);
! 
! 	/*
! 	 * Execute the prepared statement.
! 	 */
! 	if (!PQsendQueryPrepared(fmstate->conn,
! 			 fmstate->p_name,
! 			 fmstate->p_nums,
! 			 p_values,
! 			 NULL,
! 			 NULL,
! 			 0))
! 		pgfdw_report_error(ERROR, NULL, fmstate->conn, false, fmstate->query);
! 
! 	/*
! 	 * Get the result, and check for success.
! 	 *
! 	 * We don't use a PG_TRY block here, so be careful not to throw error
! 	 * without releasing the PGresult.
! 	 */
! 	res = pgfdw_get_result(fmstate->conn, fmstate->query);
! 	if (PQresultStatus(res) !=
! 		(fmstate->has_returning ? PGRES_TUPLES_OK : PGRES_COMMAND_OK))
! 		pgfdw_report_error(ERROR, res, fmstate->conn, true, fmstate->query);
! 
! 	/* Check number of rows affected, and fetch RETURNING tuple if any */
! 	if (fmstate->has_returning)
! 	{
! 		n_rows = PQntuples(res);
! 		if (n_rows > 0)
! 			store_returning_result(fmstate, slot, res);
! 	}
! 	else
! 		n_rows = atoi(PQcmdTuples(res));
! 
! 	/* And clean up */
! 	PQclear(res);
! 
! 	MemoryContextReset(fmstate->temp_cxt);
! 
! 	/* Return NULL if nothing was inserted on the remote end */
! 	return (n_rows > 0) ? slot : NULL;
  }
  
  /*
--- 1780,1787 
  		  TupleTableSlot *slot,
  		  TupleTableSlot *planSlot)
  {
! 	return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
!   slot, planSlot);
  }
  
  /*
***
*** 1839,1908  postgresExecForeignUpdate(EState *estate,
  		  TupleTableSlot *slot,
  		  TupleTableSlot *planSlot)
  {
! 	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
! 	Datum		datum;
! 	bool		isNull;
! 	const char **p_values;
! 	PGresult   *res;
! 	int			n_rows;
! 
! 	/* Set up the prepared statement on the remote server, if we didn't yet */
! 	if (!fmstate->p_name)
! 		prepare_foreign_modify(fmstate);
! 
! 	/* Get the ctid that was passed up as a resjunk column */
! 	datum = ExecGetJunkAttribute(planSlot,
!  fmstate->ctidAttno,
!  &isNull);
! 	/* shouldn't ever get a null result... */
! 	if (isNull)
! 		elog(ERROR, "ctid is NULL");
! 
! 	/* Convert parameters needed by prepared statement to text form */
! 	p_values = convert_prep_stmt_params(fmstate,
! 		(ItemPointer) DatumGetPointer(datum),
! 		slot);
! 
! 	/*
! 	 * Execute the prepared statement.
! 	

Re: pgsql: Remove WITH OIDS support, change oid catalog column visibility.

2019-01-07 Thread Amit Khandekar
On Sat, 5 Jan 2019 at 02:09, Andres Freund  wrote:
>
> Hi,
>
> On 2019-01-03 13:40:42 -0500, Tom Lane wrote:
> > I noticed that this patch has broken restores of existing dump files:
> >
> > psql:testbed.public.backup:82: ERROR:  unrecognized configuration parameter 
> > "default_with_oids"
> >
> > Quite aside from the fact that this won't work if the user tries to
> > restore in single-transaction or no-error mode, this is really really
> > unhelpful because it's impossible to tell whether the error is
> > ignorable or not, except by digging through the dump file.
> >
> > What you should do is restore that GUC, but add a check-hook that throws
> > an error for an attempt to set it to "true".

Attached is a patch that does this.
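
For reference, the check hook approach looks roughly like this (a sketch
only; see the attached patch for the actual code):

/*
 * Accept the default_with_oids GUC for backward compatibility with old
 * dump files, but reject any attempt to turn it on, since WITH OIDS
 * support has been removed.
 */
static bool
check_default_with_oids(bool *newval, void **extra, GucSource source)
{
	if (*newval)
	{
		GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
		GUC_check_errmsg("tables declared WITH OIDS are not supported");
		return false;
	}

	return true;
}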

>
> Makes sense, I (or a colleague) will look into that next week. Wanna get my
> head out of pluggable storage first...
>
> Greetings,
>
> Andres Freund
>


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


errmsg_default_with_oids.patch
Description: Binary data


Problem with parallel_workers option (Was Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead)

2019-01-07 Thread Nikolay Shaplov
In a message from Thursday, January 3, 2019 17:15:08 MSK, Alvaro Herrera
wrote:

> I would have liked to get a StaticAssert in the definition, but I don't
> think it's possible.  A standard Assert() should be possible, though.

Asserts are a cool thing. I found some unexpected stuff.

The parallel_workers option is claimed to be a heap-only option.

But in src/backend/optimizer/util/plancat.c, in get_relation_info(),
RelationGetParallelWorkers is called for both heap and toast tables (and
not only for them).

Because there are usually no reloptions for toast, it returns the default
-1 value. If some reloptions were set for the toast table,
RelationGetParallelWorkers will return a value from uninitialized memory.

This happens because the StdRdOptions structure is filled with values in
the fillRelOptions function, and fillRelOptions first iterates over the
options list and then over the elems. So if an option is not in the
options list, its value will not be set.

And the options list comes from parseRelOptions, where only options with
the proper relation kind are selected from the [type]RelOpts[] arrays. So
parallel_workers will not be added to the options in the case of a toast
table, because it is claimed to be applicable only to RELOPT_KIND_HEAP.

Thus, if the toast table has some options set and rd_options in the
relation is not NULL, get_relation_info will use a value from
uninitialized memory as the number of parallel workers.

To reproduce this assert, you can change the RelationGetParallelWorkers
macro in src/include/utils/rel.h:

#define IsHeapRelation(relation) \
	(relation->rd_rel->relkind == RELKIND_RELATION || \
	 relation->rd_rel->relkind == RELKIND_MATVIEW)

#define RelationGetParallelWorkers(relation, defaultpw) \
	(AssertMacro(IsHeapRelation(relation)), \
	 ((relation)->rd_options ? \
	  ((StdRdOptions *) (relation)->rd_options)->parallel_workers : \
	  (defaultpw)))

and see how it asserts. It happens as early as the database
initialisation phase of make check.

If you add relation->rd_rel->relkind == RELKIND_TOASTVALUE to the
assertion, it will also assert when get_relation_info is called for a
partitioned table. That case is not a problem for now, because a
partitioned table has no options, so you will always have NULL in
rd_options and get the default value. But it will become a problem when
somebody adds some options, and especially if those options do not use
the StdRdOptions structure. get_relation_info is also called for foreign
tables and sequences.

So my suggestion for a hotfix is to replace
rel->rel_parallel_workers = RelationGetParallelWorkers(relation, -1);
in get_relation_info with the following code:

switch (relation->rd_rel->relkind)
{
	case RELKIND_RELATION:
	case RELKIND_MATVIEW:
		rel->rel_parallel_workers =
			RelationGetParallelWorkers(relation, -1);
		break;
	case RELKIND_TOASTVALUE:
	case RELKIND_PARTITIONED_TABLE:
	case RELKIND_SEQUENCE:
	case RELKIND_FOREIGN_TABLE:
		rel->rel_parallel_workers = -1;
		break;
	default:
		/* Other relkinds are not supported */
		Assert(false);
}

But I am not familiar with the specifics of get_relation_info and
parallel_workers, so I suspect the real fix may be quite different.

I would also suggest fixing it in all supported stable branches that have
the parallel_workers option, because this bug may cause something
unexpected when some toast options are set.




Re: Using logical replication with older version subscribers

2019-01-07 Thread Peter Eisentraut
On 07/01/2019 10:54, Magnus Hagander wrote:
> I assume you are not suggesting a publication with truncation enabled
> should just ignore replicating truncation if the downstream server
> doesn't support it? Because if that's the suggestion, then a strong -1
> from me on that. 

Yes, that's the reason why we intentionally left it as it is now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: START/END line number for COPY FROM

2019-01-07 Thread Peter Eisentraut
On 06/01/2019 12:59, Surafel Temesgen wrote:
> it is not always the case that you have control of the data being
> imported; it may come from an external system

But the problem that David described remains:  If your data loading
requirement is so complicated that you need to load the file in chunks,
then doing it by line numbers will require you to skip over the leading
lines at every subsequent chunk.  That's not going to be good for larger
files.

I think your problem needs a different solution.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Add accumulated statistics

2019-01-07 Thread Adrien NAYRAT

On 1/7/19 6:34 AM, Tsunakawa, Takayuki wrote:

1. Doesn't provide precise data
Sampling could miss intermittent short waits, e.g., buffer content lock waits 
during checkpoints.  This might make it difficult or impossible to solve 
transient performance problems, such as infrequent 100 millisecond response 
times while the normal response time is a few milliseconds.
The proposed wait event collection doesn't miss anything.

2. Overuses resources
We may be able to shorten the sampling interval to 10 ms or even 1 ms to detect 
short periods of problems.  However, the sampled data of active sessions become 
voluminous in memory and storage.  It would take longer to analyze those 
samples.  Also, the background sampling process prevents the CPU core from 
becoming idle to save power, which bgwriter and walwriter tries to avoid by 
hibernation.
The proposed wait event collection just records what actually happened.  No 
waste.  Would it use many resources if waits happen frequently?  That leads to 
our motivation to reduce waits.


FYI, wait events have been added in PoWA by using the pg_wait_sampling
extension:
https://rjuju.github.io/postgresql/2018/07/09/wait-events-support-for-powa.html


pg_wait_sampling samples the wait events in shared memory and PoWA stores
them.




Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-07 Thread Jesper Pedersen

Hi Peter,

On 1/3/19 7:03 AM, Peter Eisentraut wrote:

Perhaps more documentation would be useful here.



Here is v2 that just notes that the -j option isn't passed down.

Best regards,
 Jesper
From fe0be6c1f5cbcaeb2981ff4dcfceae2e2cb286b7 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the -j option isn't passed down to vacuumdb by
 default.

Author: Jesper Pedersen 
---
 doc/src/sgml/ref/pgupgrade.sgml |  3 ++-
 src/bin/pg_upgrade/check.c  | 21 -
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..937e95688d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,8 @@ NET STOP postgresql-&majorversion;
  in parallel;  a good place to start is the maximum of the number of
  CPU cores and tablespaces.  This option can dramatically reduce the
  time to upgrade a multi-database server running on a multiprocessor
- machine.
+ machine. Note, that this option isn't passed to the
+ vacuumdb application by default.
 
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..7bae43a0b1 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,11 +495,22 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s-j %d --all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data, user_opts.jobs,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
 	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
-- 
2.17.2



Re: speeding up planning with partitions

2019-01-07 Thread Justin Pryzby
On Mon, Jan 07, 2019 at 09:40:50PM +1300, David Rowley wrote:
> On Fri, 4 Jan 2019 at 04:39, Justin Pryzby  wrote:
> > Running 11dev with your v10 patch applied, this takes 2244ms with empty 
> > buffer
> > cache after postmaster restarted on a totally untuned instance (and a new
> > backend, with no cached opened files).
> >
> > I was curious why it took even 2sec, and why it did so many opens() (but not
> > 20k of them that PG11 does):
> 
> It would be pretty hard to know that without seeing the query plan.

The issue was this:
> > It turns out 1050 open()s are due to historic data which is no longer being
> > loaded and therefore never converted to relkind=p (but hasn't exceeded the
> > retention interval so not yet DROPped, either).

So there's no evidence of any issue with the patch.

Justin



Re: Using logical replication with older version subscribers

2019-01-07 Thread Masahiko Sawada
On Mon, Jan 7, 2019 at 6:54 PM Magnus Hagander  wrote:
>
> On Mon, Jan 7, 2019 at 9:01 AM Masahiko Sawada  wrote:
>>
>> Hi,
>>
>> Logical replication enables us to replicate data changes to a different
>> major version of PostgreSQL, as the docs say[1]. However, the current
>> logical replication works fine only when replicating to a newer major
>> version of PostgreSQL, such as from 10 to 11. When replicating to an
>> older major version, say from 11 to 10, replication will stop when the
>> subscriber receives a truncate change, because truncate is not supported
>> in PostgreSQL 10.  I think there are use cases for using logical
>> replication with a subscriber of an older major version of PostgreSQL,
>> but I'm not sure whether we should support them.
>>
>> Of course, in such a case we can create the publication with publish =
>> 'insert, update, delete' so that truncate changes are not sent, but that
>> requires users to recognize the feature differences between major
>> versions, and it will only get more complex in the future. So I think it
>> would be better for this to be configured automatically by PostgreSQL.
>>
>> To fix this, we could make subscribers send the message types they
>> support to the publisher at startup, so that the publisher doesn't send
>> message types the subscriber cannot handle. Alternatively, we could make
>> subscribers ignore unsupported logical replication message types instead
>> of raising an error. Feedback is very welcome.
>>
>> [1] https://www.postgresql.org/docs/devel/logical-replication.html
>
>
> How would that work in practice?
>
> If an 11 server is sent a message saying "client does not support truncate", 
> and immediately generates an error, then you can no longer replicate even if 
> you turn off truncate. And if it delays it until the actual replication of 
> the item, then you just get the error on the primary instead of the standby?
>
> I assume you are not suggesting a publication with truncation enabled should 
> just ignore replicating truncation if the downstream server doesn't support 
> it? Because if that's the suggestion, then a strong -1 from me on that.
>

I'm thinking that we can make the pgoutput plugin recognize that the
downstream server doesn't support it and not send it. For example, even
if we create a publication with publish = 'truncate', the pgoutput plugin
would send nothing after checking the supported message types, if the
downstream server is a PostgreSQL server whose version is 10 or older.
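
Just to make the idea concrete, the publisher-side check could look
roughly like this (a purely hypothetical sketch: it assumes the
subscriber has advertised the message types it supports at startup, which
it currently does not do, and the supports_truncate field is invented):

static void
pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
				  int nrelations, Relation relations[],
				  ReorderBufferChange *change)
{
	PGOutputData *data = (PGOutputData *) ctx->output_plugin_private;

	/* hypothetical: filled in from startup-time negotiation */
	if (!data->supports_truncate)
		return;					/* downstream cannot handle it, send nothing */

	/* ... existing code that sends the truncate message ... */
}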

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Use atexit() in initdb and pg_basebackup

2019-01-07 Thread Peter Eisentraut
On 05/01/2019 16:42, Alvaro Herrera wrote:
>> Yeah.  Actually, we already have a solution of this in pg_basebackup,
>> with a bool success variable.  I rewrote it like that.  At least it's
>> better for uniformity.
> 
> Ah, yeah, much better, LGTM.
> 
>> I also added an atexit() conversion in isolationtester.  It's mostly the
>> same thing.
> 
> LGTM in a quick eyeball.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Using logical replication with older version subscribers

2019-01-07 Thread Magnus Hagander
On Mon, Jan 7, 2019 at 3:37 PM Masahiko Sawada 
wrote:

> On Mon, Jan 7, 2019 at 6:54 PM Magnus Hagander 
> wrote:
> >
> > On Mon, Jan 7, 2019 at 9:01 AM Masahiko Sawada 
> wrote:
> >>
> >> Hi,
> >>
> >> Logical replication enables us to replicate data changes to a different
> >> major version of PostgreSQL, as the docs say[1]. However, the current
> >> logical replication works fine only when replicating to a newer major
> >> version of PostgreSQL, such as from 10 to 11. When replicating to an
> >> older major version, say from 11 to 10, replication will stop when the
> >> subscriber receives a truncate change, because truncate is not supported
> >> in PostgreSQL 10.  I think there are use cases for using logical
> >> replication with a subscriber of an older major version of PostgreSQL,
> >> but I'm not sure whether we should support them.
> >>
> >> Of course, in such a case we can create the publication with publish =
> >> 'insert, update, delete' so that truncate changes are not sent, but that
> >> requires users to recognize the feature differences between major
> >> versions, and it will only get more complex in the future. So I think it
> >> would be better for this to be configured automatically by PostgreSQL.
> >>
> >> To fix this, we could make subscribers send the message types they
> >> support to the publisher at startup, so that the publisher doesn't send
> >> message types the subscriber cannot handle. Alternatively, we could make
> >> subscribers ignore unsupported logical replication message types instead
> >> of raising an error. Feedback is very welcome.
> >>
> >> [1] https://www.postgresql.org/docs/devel/logical-replication.html
> >
> >
> > How would that work in practice?
> >
> > If an 11 server is sent a message saying "client does not support
> truncate", and immediately generates an error, then you can no longer
> replicate even if you turn off truncate. And if it delays it until the
> actual replication of the item, then you just get the error on the primary
> instead of the standby?
> >
> > I assume you are not suggesting a publication with truncation enabled
> should just ignore replicating truncation if the downstream server doesn't
> support it? Because if that's the suggestion, then a strong -1 from me on
> that.
> >
>
> I'm thinking that we can make the pgoutput plugin recognize that the
> downstream server doesn't support it and not send it. For example, even
> if we create a publication with publish = 'truncate', the pgoutput plugin
> would send nothing after checking the supported message types, if the
> downstream server is a PostgreSQL server whose version is 10 or older.
>

That's the idea I definitely give a strong -1 to.

Ignoring the truncate message isn't going to make it work. It's just going
to mean that the downstream data is incorrect versus what the publisher
thought. The correct solution here is to not publish the truncate, which we
already have. I can see the point in changing it so the error message
becomes more obvious (already when the subscriber connects, rather than at
a random time later when the first truncate replicates), but *silently*
ignoring it seems like a terrible choice.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2019-01-07 Thread Nikolay Shaplov
In a message from Thursday, January 3, 2019 17:15:08 MSK, Alvaro Herrera
wrote:

> > Can we think about backward compatibility aliases?
.
> > And keep them for as log as needed to avoid #if VERSION in thirdparty
> > code?
> Well, if you do this, at some point you need to tell the extension
> author to change the code.  Or they can just keep the code unchanged
> forever.  I don't think it's worth the bother.
Ok, you are the Boss ;-)

I've reverted all the macro names back to Relation* except those related
to fillfactor.

> I would have liked to get a StaticAssert in the definition, but I don't
> think it's possible.  A standard Assert() should be possible, though.

This is a really good idea. I felt uncomfortable with this
(ViewOptions *) type conversion without checking that it is really a
View. With the Assert I feel much safer.

I've added AssertMacro to all the option-related macros.

Adding the asserts also uncovered a bug with the parallel_workers option,
which is defined as a heap-only option, even though in get_relation_info
in src/backend/optimizer/util/plancat.c the RelationGetParallelWorkers
macro is also called for toast tables and other kinds of relations.
I've started a new thread dedicated to this issue, since it needs to be
fixed regardless of this patch. For now I've added a hotfix here that is
good enough.


I also reworked some comments. BTW, do you know what this comment was
speaking about:

 * RelationGetFillFactor() and RelationGetTargetPageFreeSpace() can only
 * be applied to relations that use this format or a superset for
 * private options data.

Is it speaking about some old times when there could be some other type
of options? I do not understand what it refers to, so I've removed it; I
hope I did not remove anything important.

diff --git a/.gitignore b/.gitignore
index 794e35b..2dfbbe1 100644
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index eece89a..833d084 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -22,7 +22,7 @@
 #include "access/htup_details.h"
 #include "access/nbtree.h"
 #include "access/reloptions.h"
-#include "access/spgist.h"
+#include "access/spgist_private.h"
 #include "access/tuptoaster.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -46,9 +46,8 @@
  * upper and lower bounds (if applicable); for strings, consider a validation
  * routine.
  * (ii) add a record below (or use add__reloption).
- * (iii) add it to the appropriate options struct (perhaps StdRdOptions)
- * (iv) add it to the appropriate handling routine (perhaps
- * default_reloptions)
+ * (iii) add it to the appropriate options struct
+ * (iv) add it to the appropriate handling routine
  * (v) make sure the lock level is set correctly for that operation
  * (vi) don't forget to document the option
  *
@@ -1019,7 +1018,7 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
 		case RELKIND_PARTITIONED_TABLE:
-			options = heap_reloptions(classForm->relkind, datum, false);
+			options = relation_reloptions(classForm->relkind, datum, false);
 			break;
 		case RELKIND_VIEW:
 			options = view_reloptions(datum, false);
@@ -1352,63 +1351,69 @@ fillRelOptions(void *rdopts, Size basesize,
 
 
 /*
- * Option parser for anything that uses StdRdOptions.
+ * Option parsing definition for autovacuum. Used in toast and heap options.
+ */
+
+#define AUTOVACUUM_RELOPTIONS(OFFSET)\
+		{"autovacuum_enabled", RELOPT_TYPE_BOOL, \
+		OFFSET + offsetof(AutoVacOpts, enabled)},\
+		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_threshold)},   \
+		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, analyze_threshold)},  \
+		{"autovacuum_vacuum_cost_delay", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_delay)},  \
+		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_limit)},  \
+		{"autovacuum_freeze_min_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_min_age)}, \
+		{"autovacuum_freeze_max_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_max_age)}, \
+		{"autovacuum_freeze_table_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, freeze_table_age)},   \
+		{"autovacuum_multixact_freeze_min_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_min_age)},   \
+		{"autovacuum_multixact_freeze_max_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_max_age)},   \
+		{"autovacuum_multixact_freeze_table_age", RELOPT_TYPE_INT,   \
+		OFFSET + of

Re: Grant documentation about "all tables"

2019-01-07 Thread Lætitia Avrot
Hi Stephen,

On Sat, Jan 5, 2019 at 20:41, Stephen Frost wrote:

> Greetings Lætitia!
>
> * Lætitia Avrot (laetitia.av...@gmail.com) wrote:
> > When you look at Postgres' SQL reference documentation for `GRANT`, the
> > `ALL TABLES` clause is explained as :
> >
> > > ALL TABLES also affects views and foreign tables, just like
> > the specific-object GRANT command.
> >
> > A colleague of mine was asking himself if it included materialized views
> or
> > not (well, yes it does).
> >
> > I made that tiny patch to add materialized views to the list. It builds
> on
> > my laptop.
> >
> > Then another question crossed my mind... What about partitioned tables ?
> > I'm pretty sure it works for them too (because they're basically tables)
> > but should we add them too ? I couldn't decide whether to add them too or
> > not so I refrain from doing it and am asking you the question.
>
> The question here, at least in my mind, is if we feel it necessary to
> list out all of the specific kinds of "views" (as in, regular views and
> materialized views), and the same question applies to tables- do we list
> out all the specific kinds of "tables" (to include partitioned tables),
> or not?
>
> To figure that out, I'd suggest looking at existing documentation where
> we have similar lists and see what we've done in the past.  If those
> other cases list everything explicitly, then the answer is clear, and if
> they don't, then we can either leave the documentation as-is, or come up
> with a complete list of changes that need to be made.
>

Excellent advice. I looked at several doc pages that could be relevant to
the subject, and I came up with this list:

- [ALTER DEFAULT PRIVILEGES](https://www.postgresql.org/docs/current/sql-alterdefaultprivileges.html):
   This documentation page indicates that:
> Currently, only the privileges for schemas, tables (including views and
foreign tables), sequences, functions, and types (including domains) can be
altered.
-> foreign tables are mentioned but nothing about materialized views or
temporary or partitioned tables...

- [COPY](https://www.postgresql.org/docs/current/sql-copy.html) :
> COPY FROM can be used with plain, foreign, or partitioned tables or with
views that have INSTEAD OF INSERT triggers.
-> different tables are listed, still nothing about materialized views and
temporary tables

- [CREATE PUBLICATION](
https://www.postgresql.org/docs/current/sql-createpublication.html)
> FOR ALL TABLES : Marks the publication as one that replicates changes for
all tables in the database, including tables created in the future.
-> I included this documentation page here because it uses a
`FOR ALL TABLES` clause, but this time only base tables are included. (As
written in [that other page](
https://www.postgresql.org/docs/current/logical-replication-restrictions.html)
"Replication is only possible from base tables to base tables. That is, the
tables on the publication and on the subscription side must be normal
tables, not views, materialized views, partition root tables, or foreign
tables.")

- [sepgsql documentation page](
https://www.postgresql.org/docs/11/sepgsql.html)
> Currently, `sepgsql` allows security labels to be assigned to schemas,
tables, columns, sequences, views, and functions.
-> in this documentation, I suppose (from reading the source code) that
"tables" means all kinds of tables and "views" means all kinds of views,
just as in the `GRANT` documentation page.

So we have this:

| Meaning | Number of documentation pages |
|---|---|
| "tables" meaning all kinds of tables and "views" meaning all kinds of views | 2 |
| ALL TABLES meaning only tables | 1 |
| "tables" meaning all tables and views | 1 |
| "tables" meaning only tables | 1 |

Maybe we need to standardize this. I think it's better to explicitly list
each type of table (plain, partitioned, foreign, temporary) or view
(plain, materialized) where needed, so there is no doubt for anyone. What
do you think?

Cheers,

Lætitia
-- 
*Think! Do you really need to print this email ? *
*There is no Planet B.*


Re: Problem with parallel_workers option (Was Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead)

2019-01-07 Thread Alvaro Herrera
On 2019-Jan-07, Nikolay Shaplov wrote:

> Asserts are a cool thing. I found some unexpected stuff.
> 
> The parallel_workers option is claimed to be a heap-only option.
> 
> But in src/backend/optimizer/util/plancat.c, in get_relation_info(),
> RelationGetParallelWorkers is called for both heap and toast tables (and
> not only for them).

Ugh.

I wonder if it makes sense for a toast table to have parallel_workers.
I suppose it's not useful, since a toast table is not supposed to be
scanned in bulk, only accessed through the tuptoaster interface.  But on
the other hand, you *can* do "select * from pg_toast_NNN", and in almost
all respects a toast table is just like a regular heap table.

> Because usually there are no reloptions for toast,  it returns default -1 
> value. If some reloptions were set for toast table, 
> RelationGetParallelWorkers 
> will return a value from uninitialized memory.

Well, if it returns a negative number or zero, the rest of the server
should behave identically to it returning the -1 that was intended.  And
if it returns a positive number, the worst that will happen is that a
Path structure somewhere will have a positive number of workers, but
since queries on toast tables are not planned in the regular way, most
likely those Paths will never exist anyway.

So while I agree that this is a bug, it seems pretty benign.

Unless I overlook something.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] check for ctags utility in make_ctags

2019-01-07 Thread Nikolay Shaplov
In a message of Sunday, January 6, 2019, 17:50:36 MSK, Andrew Dunstan
wrote:

> > The correct way to code this is to depend on the exit code,
> > not the text output:
> > 
> > if command -v etags >/dev/null
> > then
> >   : ok
> > else
> >   echo etags not found
> >   exit 1
> > fi
> 
> more succinctly,
> command -v etags >/dev/null || { echo etags not found; exit 1;}

If it is good enough for you, then it is good for me for sure...
Imported it to the patch.

diff --git a/.gitignore b/.gitignore
index 794e35b..2dfbbe1 100644
diff --git a/src/tools/make_ctags b/src/tools/make_ctags
index 1609c07..1e71692 100755
--- a/src/tools/make_ctags
+++ b/src/tools/make_ctags
@@ -2,6 +2,9 @@
 
 # src/tools/make_ctags
 
+command -v ctags >/dev/null || \
+	{ echo "'ctags' utility is not found" 1>&2; exit 1;}
+
 trap "rm -f /tmp/$$" 0 1 2 3 15
 rm -f ./tags
 
diff --git a/src/tools/make_etags b/src/tools/make_etags
index 3ce96bc..6dc6710 100755
--- a/src/tools/make_etags
+++ b/src/tools/make_etags
@@ -2,6 +2,9 @@
 
 # src/tools/make_etags
 
+command -v etags >/dev/null || \
+	{ echo "'etags' utility is not found" 1>&2; exit 1;}
+
 rm -f ./TAGS
 
 find `pwd`/ -type f -name '*.[chyl]' -print |


Re: partitioned tables referenced by FKs

2019-01-07 Thread Jesper Pedersen

Hi,

On 11/30/18 2:12 PM, Alvaro Herrera wrote:

Here's a more credible version of this patch series.



The patch series applies with hunks, and passes check-world.

No comments for 0001, 0002, 0003 and 0005.

While I'm still looking at 0004 - should we have a test that updates one 
of the constraints of fk to another partition in pk?


I didn't try this yet with [1], so I'm sending you a flamegraph of 
INSERTs off-list, since data loading takes a while during an OLTP-based 
workload.


Thanks for working on this !

[1] https://commitfest.postgresql.org/21/1897/

Best regards,
 Jesper



Re: Doc client_min_messages patch vis. INFO message severity

2019-01-07 Thread Andrew Gierth
> "Karl" == Karl O Pinc  writes:

 Karl> Hi,
 Karl> Attached is documentation patch: doc_client_min_messages_v1.patch

 Karl> Document that INFO severity messages are always sent
 Karl> to the client.

Pushed, thanks.

-- 
Andrew (irc:RhodiumToad)



Re: ALTER INDEX fails on partitioned index

2019-01-07 Thread Alvaro Herrera
On 2019-Jan-05, Justin Pryzby wrote:

> 12dev and 11.1:
> 
> postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
> postgres=# CREATE INDEX ON t(i) WITH(fillfactor=11);
> postgres=# ALTER INDEX t_i_idx SET (fillfactor=12);
> ERROR:  42809: "t_i_idx" is not a table, view, materialized view, or index
> LOCATION:  ATWrongRelkindError, tablecmds.c:5031
> 
> I can't see that's deliberate,

Well, I deliberately ignored that aspect of the report at the time as it
seemed to me (per discussion in thread [1]) that this behavior was
intentional.  However, if I think in terms of things like
pages_per_range in BRIN indexes, this decision seems to be a mistake,
because surely we should propagate that value to children.

[1] 
https://www.postgresql.org/message-id/flat/CAH2-WzkOKptQiE51Bh4_xeEHhaBwHkZkGtKizrFMgEkfUuRRQg%40mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Problem with parallel_workers option (Was Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead)

2019-01-07 Thread Nikolay Shaplov
In a message of Monday, January 7, 2019, 13:56:48 MSK, Alvaro Herrera
wrote:

> > Asserts are cool thing. I found some unexpected stuff.
> > 
> > parallel_workers option is claimed to be heap-only option.
> > 
> > But in src/backend/optimizer/util/plancat.c in get_relation_info
> > RelationGetParallelWorkers is being called for both heap and toast tables
> > (and not only for them).
> 
> Ugh.
> 
> I wonder if it makes sense for a toast table to have parallel_workers.
> I suppose it's not useful, since a toast table is not supposed to be
> scanned in bulk, only accessed through the tuptoaster interface.  But on
> the other hand, you *can* do "select * from pg_toast_NNN", and it almost
> all respects a toast table is just like a regular heap table.
 
If parallel_workers is not intended to be used anywhere except heap and 
matview, then maybe it is better to make the fix more relaxed. Something like:

	if (relation->rd_rel->relkind == RELKIND_RELATION ||
		relation->rd_rel->relkind == RELKIND_MATVIEW)
		rel->rel_parallel_workers = RelationGetParallelWorkers(relation, -1);
	else
		rel->rel_parallel_workers = -1;

> > Because usually there are no reloptions for toast,  it returns default -1
> > value. If some reloptions were set for toast table,
> > RelationGetParallelWorkers will return a value from uninitialized memory.
> 
> Well, if it returns a negative number or zero, the rest of the server
> should behave identically to it returning the -1 that was intended.  And
> if it returns a positive number, the worst that will happen is that a
> Path structure somewhere will have a positive number of workers, but
> since queries on toast tables are not planned in the regular way, most
> likely those Paths will never exist anyway.
> 
> So while I agree that this is a bug, it seems pretty benign.
It is mild until somebody introduces PartitionedRelOptions. Then a 
PartitionedRelOptions * will be converted to a StdRdOptions * and we will get 
a segmentation fault...

So maybe there is no need for a back-branch fix, but it is better to fix it now :-)
Maybe with the patch for StdRdOptions removal.




Re: ALTER INDEX fails on partitioned index

2019-01-07 Thread Justin Pryzby
On Mon, Jan 07, 2019 at 04:23:30PM -0300, Alvaro Herrera wrote:
> On 2019-Jan-05, Justin Pryzby wrote:
> 
> > 12dev and 11.1:
> > 
> > postgres=# CREATE TABLE t(i int)PARTITION BY RANGE(i);
> > postgres=# CREATE INDEX ON t(i) WITH(fillfactor=11);
> > postgres=# ALTER INDEX t_i_idx SET (fillfactor=12);
> > ERROR:  42809: "t_i_idx" is not a table, view, materialized view, or index
> > LOCATION:  ATWrongRelkindError, tablecmds.c:5031
> > 
> > I can't see that's deliberate,
> 
> Well, I deliberately ignored that aspect of the report at the time as it
> seemed to me (per discussion in thread [1]) that this behavior was
> intentional.  However, if I think in terms of things like
> pages_per_range in BRIN indexes, this decision seems to be a mistake,
> because surely we should propagate that value to children.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAH2-WzkOKptQiE51Bh4_xeEHhaBwHkZkGtKizrFMgEkfUuRRQg%40mail.gmail.com

I don't see any discussion regarding ALTER (?)

Actually, I ran into this while trying to set pages_per_range.
But shouldn't it also work for fillfactor?

Thanks,
Justin



Re: WIP: Avoid creation of the free space map for small tables

2019-01-07 Thread John Naylor
On 1/3/19, Amit Kapila  wrote:
> Yeah, that makes sense, John, can you provide a patch on top of the
> current patch where we check either the last block or every other
> block.

I've attached two patches for testing. Each one applies on top of the
current patch.

Mithun, I'll respond to your other review comments later this week.

-John Naylor
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2ab1cb58b7..d6352aabd9 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -1103,15 +1103,9 @@ fsm_allow_writes(Relation rel, BlockNumber heapblk,
 static void
 fsm_local_set(Relation rel, BlockNumber curr_nblocks)
 {
-	BlockNumber blkno,
-cached_target_block;
+	BlockNumber cached_target_block;
 
-	/*
-	 * Mark blocks available starting after the last block we have mapped,
-	 * and ending at the current last block in the relation.
-	 */
-	for (blkno = fsm_local_map.nblocks; blkno < curr_nblocks; blkno++)
-		fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
+	fsm_local_map.map[curr_nblocks - 1] = FSM_LOCAL_AVAIL;
 
 	/* Cache the number of blocks. */
 	fsm_local_map.nblocks = curr_nblocks;
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 2ab1cb58b7..820a2579ae 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -1110,8 +1110,15 @@ fsm_local_set(Relation rel, BlockNumber curr_nblocks)
 	 * Mark blocks available starting after the last block we have mapped,
 	 * and ending at the current last block in the relation.
 	 */
-	for (blkno = fsm_local_map.nblocks; blkno < curr_nblocks; blkno++)
+	blkno = curr_nblocks - 1;
+	for (;;)
+	{
 		fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
+		if (blkno >= fsm_local_map.nblocks + 2)
+			blkno -= 2;
+		else
+			break;
+	}
 
 	/* Cache the number of blocks. */
 	fsm_local_map.nblocks = curr_nblocks;
diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index 4c44c4bd6e..973b9319e7 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,16 +2,16 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 2 blocks with as many large records as will fit
+-- Fill 3 blocks with as many large records as will fit
 -- No FSM
 INSERT INTO fsm_check_size SELECT g, rpad('', 1024, 'a')
-FROM generate_series(1,7*2) g;
+FROM generate_series(1,7*3) g;
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
 ---+--
- 16384 |0
+ 24576 |0
 (1 row)
 
 -- Clear some space on block 0
@@ -26,7 +26,7 @@ SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
 ---+--
- 16384 |0
+ 24576 |0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index fad99da1c2..a864c7fd88 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,10 +4,10 @@
 
 CREATE TABLE fsm_check_size (num int, str text);
 
--- Fill 2 blocks with as many large records as will fit
+-- Fill 3 blocks with as many large records as will fit
 -- No FSM
 INSERT INTO fsm_check_size SELECT g, rpad('', 1024, 'a')
-FROM generate_series(1,7*2) g;
+FROM generate_series(1,7*3) g;
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
@@ -16,7 +16,7 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 DELETE FROM fsm_check_size where num <= 5;
 VACUUM fsm_check_size;
 
--- Insert small record in block 1 to set the cached smgr targetBlock
+-- Insert small record in block 2 to set the cached smgr targetBlock
 INSERT INTO fsm_check_size values(99, 'b');
 
 -- Insert large record and make sure it goes in block 0 rather than


Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread John Naylor
On 1/6/19, Tom Lane  wrote:
> I've pushed that version (v8 + max_kw_len); if the buildfarm doesn't
> fall over, we can move on with looking at hashing.

Thank you. The API adjustment looks good, and I'm glad that splitting
out the aux info led to even further cleanups.

-John Naylor



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Tom Lane
I wrote:
> I took a quick look through the NetBSD nbperf sources at
> http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/nbperf/
> and I concur with your judgment that we could manage translating
> that into Perl, especially if we only implement the parts we need.

Here's an implementation of that, using the hash functions you showed
upthread.  The speed of the Perl script seems to be pretty acceptable;
less than 100ms to handle the main SQL keyword list, on my machine.
Yeah, the C version might be less than 1ms, but I don't think that
we need to put up with non-Perl build tooling for that.

Using the same test case as before (parsing information_schema.sql),
I get runtimes around 3560 ms, a shade better than my jury-rigged
prototype.

Probably there's a lot to be criticized about the Perl style below;
anybody feel a need to rewrite it?

regards, tom lane

diff --git a/src/common/kwlookup.c b/src/common/kwlookup.c
index d72842e..445b99d 100644
*** a/src/common/kwlookup.c
--- b/src/common/kwlookup.c
***
*** 35,94 
   * receive a different case-normalization mapping.
   */
  int
! ScanKeywordLookup(const char *text,
    const ScanKeywordList *keywords)
  {
! 	int			len,
! i;
! 	char		word[NAMEDATALEN];
! 	const char *kw_string;
! 	const uint16 *kw_offsets;
! 	const uint16 *low;
! 	const uint16 *high;
! 
! 	len = strlen(text);
  
  	if (len > keywords->max_kw_len)
! 		return -1;/* too long to be any keyword */
! 
! 	/* We assume all keywords are shorter than NAMEDATALEN. */
! 	Assert(len < NAMEDATALEN);
  
  	/*
! 	 * Apply an ASCII-only downcasing.  We must not use tolower() since it may
! 	 * produce the wrong translation in some locales (eg, Turkish).
  	 */
! 	for (i = 0; i < len; i++)
! 	{
! 		char		ch = text[i];
! 
! 		if (ch >= 'A' && ch <= 'Z')
! 			ch += 'a' - 'A';
! 		word[i] = ch;
! 	}
! 	word[len] = '\0';
  
  	/*
! 	 * Now do a binary search using plain strcmp() comparison.
  	 */
! 	kw_string = keywords->kw_string;
! 	kw_offsets = keywords->kw_offsets;
! 	low = kw_offsets;
! 	high = kw_offsets + (keywords->num_keywords - 1);
! 	while (low <= high)
  	{
! 		const uint16 *middle;
! 		int			difference;
  
! 		middle = low + (high - low) / 2;
! 		difference = strcmp(kw_string + *middle, word);
! 		if (difference == 0)
! 			return middle - kw_offsets;
! 		else if (difference < 0)
! 			low = middle + 1;
! 		else
! 			high = middle - 1;
  	}
  
! 	return -1;
  }
--- 35,81 
   * receive a different case-normalization mapping.
   */
  int
! ScanKeywordLookup(const char *str,
    const ScanKeywordList *keywords)
  {
! 	size_t		len;
! 	uint32		h;
! 	const char *kw;
  
+ 	/*
+ 	 * Reject immediately if too long to be any keyword.  This saves useless
+ 	 * hashing and downcasing work on long strings.
+ 	 */
+ 	len = strlen(str);
  	if (len > keywords->max_kw_len)
! 		return -1;
  
  	/*
! 	 * Compute the hash function.  We assume it was generated to produce
! 	 * case-insensitive results.  Since it's a perfect hash, we need only
! 	 * match to the specific keyword it identifies.
  	 */
! 	h = keywords->hash(str, len);
  
  	/*
! 	 * Compare character-by-character to see if we have a match, applying an
! 	 * ASCII-only downcasing to the input characters.  We must not use
! 	 * tolower() since it may produce the wrong translation in some locales
! 	 * (eg, Turkish).
  	 */
! 	kw = GetScanKeyword(h, keywords);
! 	while (*str != '\0')
  	{
! 		char		ch = *str++;
  
! 		if (ch >= 'A' && ch <= 'Z')
! 			ch += 'a' - 'A';
! 		if (ch != *kw++)
! 			return -1;
  	}
+ 	if (*kw != '\0')
+ 		return -1;
  
! 	/* Success! */
! 	return h;
  }
diff --git a/src/include/common/kwlookup.h b/src/include/common/kwlookup.h
index 39efb35..a6609ee 100644
*** a/src/include/common/kwlookup.h
--- b/src/include/common/kwlookup.h
***
*** 14,19 
--- 14,22 
  #ifndef KWLOOKUP_H
  #define KWLOOKUP_H
  
+ /* Hash function used by ScanKeywordLookup */
+ typedef uint32 (*ScanKeywordHashFunc) (const void *key, size_t keylen);
+ 
  /*
   * This struct contains the data needed by ScanKeywordLookup to perform a
   * search within a set of keywords.  The contents are typically generated by
*** typedef struct ScanKeywordList
*** 23,28 
--- 26,32 
  {
  	const char *kw_string;		/* all keywords in order, separated by \0 */
  	const uint16 *kw_offsets;	/* offsets to the start of each keyword */
+ 	ScanKeywordHashFunc hash;	/* perfect hash function for keywords */
  	int			num_keywords;	/* number of keywords */
  	int			max_kw_len;		/* length of longest keyword */
  } ScanKeywordList;
diff --git a/src/interfaces/ecpg/preproc/Makefile b/src/interfaces/ecpg/preproc/Makefile
index b5b74a3..abfe3cc 100644
*** a/src/interfaces/ecpg/preproc/Makefile
--- b/src/interfaces/ecpg/preproc/Makefile
*** preproc.y: ../../../backend/parser/gram.
*** 57,63 
  
  # generate keyword headers
  c_kwlist_d.h: c_kwlist.h $(GEN_KEYWORDLIST)
! 	$(PERL) $(GEN_KEYWORDL

Re: [proposal] Add an option for returning SQLSTATE in psql error message

2019-01-07 Thread didier
On Sat, Jan 5, 2019 at 6:30 PM Tom Lane  wrote:

> Peter Eisentraut  writes:
> > Why are you not including a test for \set VERBOSITY verbose?
>
> Stability of the output would be a problem ...
>
Yes it could; moreover, the functionality wasn't tested before.
Should I add one?


GSoC 2019

2019-01-07 Thread Stephen Frost
Greetings -hackers,

Google Summer of Code is back for 2019!  They have a similar set of
requirements, expectations, and timeline as last year.

Now is the time to be working to get together a set of projects we'd
like to have GSoC students work on over the summer.  Similar to last
year, we need to have a good set of projects for students to choose from
in advance of the deadline for mentoring organizations.

The deadline for Mentoring organizations to apply is: February 6.

The list of accepted organization will be published around February 26.

Unsurprisingly, we'll need to have an Ideas page again, so I've gone
ahead and created one (copying last year's):

https://wiki.postgresql.org/wiki/GSoC_2019

Google discusses what makes a good "Ideas" list here:

https://google.github.io/gsocguides/mentor/defining-a-project-ideas-list.html

All the entries are marked with '2018' to indicate they were pulled from
last year.  If the project from last year is still relevant, please
update it to be '2019' and make sure to update all of the information
(in particular, make sure to list yourself as a mentor and remove the
other mentors, as appropriate).

New entries are certainly welcome and encouraged, just be sure to note
them as '2019' when you add it.

Projects from last year which were worked on but have significant
follow-on work to be completed are absolutely welcome as well- simply
update the description appropriately and mark it as being for '2019'.

When we get closer to actually submitting our application, I'll clean
out the '2018' entries that didn't get any updates.

As a reminder, each idea on the page should be in the format that the
other entries are in and should include:

- Project title/one-line description
- Brief, 2-5 sentence, description of the project (remember, these are
  12-week projects)
- Description of programming skills needed and estimation of the
  difficulty level
- List of potential mentors
- Expected Outcomes

As with last year, please consider PostgreSQL to be an "Umbrella"
project and that anything which would be considered "PostgreSQL Family"
per the News/Announce policy [2] is likely to be acceptable as a
PostgreSQL GSoC project.

In other words, if you're a contributor or developer on barman,
pgBackRest, the PostgreSQL website (pgweb), the PgEU/PgUS website code
(pgeu-website), pgAdmin4, PostgresXL, pgbouncer, Citus, pldebugger, the
PG RPMs (pgrpms), the JDBC driver, the ODBC driver, or any of the many
other PG Family projects, please feel free to add a project for
consideration!  If we get quite a few, we can organize the page further
based on which project or maybe what skills are needed or similar.

Let's have another great year of GSoC with PostgreSQL!

Thanks!

Stephen

[1]: https://developers.google.com/open-source/gsoc/timeline
[2]: https://wiki.postgresql.org/wiki/NewsEventsApproval


signature.asc
Description: PGP signature


Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Andres Freund
Hi,

On 2019-01-07 16:11:04 -0500, Tom Lane wrote:
> I wrote:
> > I took a quick look through the NetBSD nbperf sources at
> > http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.bin/nbperf/
> > and I concur with your judgment that we could manage translating
> > that into Perl, especially if we only implement the parts we need.
> 
> Here's an implementation of that, using the hash functions you showed
> upthread.  The speed of the Perl script seems to be pretty acceptable;
> less than 100ms to handle the main SQL keyword list, on my machine.
> Yeah, the C version might be less than 1ms, but I don't think that
> we need to put up with non-Perl build tooling for that.
> 
> Using the same test case as before (parsing information_schema.sql),
> I get runtimes around 3560 ms, a shade better than my jury-rigged
> prototype.
> 
> Probably there's a lot to be criticized about the Perl style below;
> anybody feel a need to rewrite it?

Hm, shouldn't we extract the perfect hash generation into a perl module
or such? It seems that there's plenty other possible uses for it.

Greetings,

Andres Freund



Re: valgrind issues on Fedora 28

2019-01-07 Thread Tomas Vondra
On 12/15/18 12:32 AM, Andrew Dunstan wrote:
> 
> On 12/14/18 4:35 PM, Tomas Vondra wrote:
>> On 12/14/18 4:58 PM, Tom Lane wrote:
>>> ...
>>>
>>> In general, I'm not particularly on board with our valgrind.supp
>>> carrying suppressions for code outside our own code base: I think
>>> that's assuming WAY too much about which version of what is installed
>>> on a particular box.
>>>
>> Fair point.
>>
>>> Maybe we could do something to make it simpler to have custom
>>> suppressions?  Not sure what, though.
>>>
>> I was thinking that perhaps we could allow specifying a path to extra
>> suppressions and pass that to valgrind.
>>
>> But we don't actually invoke valgrind, that's something people do on
>> their own anyway - so we don't have anywhere to pass the path to. And
>> whoever invokes valgrind can simply stick it directly into the command
>> they're using (as it allows specifying multiple --suppressions=
>> options). Or perhaps just put it into ~/.valgrindrc.
>>
>> So perhaps we should simply revert that commit and be done with it.
>>
>> One place that will need to solve it is buildfarm client, but it could
>> pick either of the options I mentioned.
>>
>>
> 
> The buildfarm client has a parameter in the config file for valgrind
> options. All you would have to do is add an extra --suppressions setting
> in there.
> 

OK, makes sense. I'll revert the commit tomorrow.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Displaying and dumping of table access methods

2019-01-07 Thread Andres Freund
Hi,

Over in [1] we're discussing the development of the pluggable storage
patchset, which allows different ways of storing table data.

One thing I'd like to discuss with a wider audience than the
implementation details is psql and pg_dump handling of table access
methods.

Currently the patchset neither dumps nor displays table access
methods. That's clearly not right.

The reason for that however is not that it's hard to dump/display
code-wise, but that to me the correct behaviour is not obvious.

The reason to make table storage pluggable is after all that the table
access method can be changed, and part of developing new table access
methods is being able to run the regression tests.

A patch at [2] adds display of a table's access method to \d+ - but that
means that running the tests with a different default table access
method (e.g. using PGOPTIONS='-c default_table_access_method=...)
there'll be a significant number of test failures, even though the test
results did not meaningfully differ.

Similarly, if pg_dump starts to dump table access methods either
unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
unimportant differences.

A third issue, less important in my opinion, is that specifying the
table access method means that it's harder to dump/restore into a table
with a different AM.


One way to solve this would be for psql/pg_dump to only define the table
access methods for tables that differ from the currently configured
default_table_access_method. That'd mean that only a few tests that
intentionally test AMs would display/dump the access method.  On the
other hand that seems like it's a bit too much magic.
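
For illustration only, here is a rough sketch of what that could look like on
the pg_dump side.  The tbinfo->amname field and the
default_table_access_method string are assumptions of this sketch rather than
existing pg_dump code, and the emitted syntax borrows the USING clause
mentioned elsewhere in this thread:

	/*
	 * Hypothetical fragment in dumpTableSchema(): append the access-method
	 * clause only when the table's AM differs from the configured default,
	 * so dumps taken under the default AM stay unchanged.
	 */
	if (tbinfo->amname != NULL &&
		strcmp(tbinfo->amname, default_table_access_method) != 0)
		appendPQExpBuffer(q, " USING %s", fmtId(tbinfo->amname));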

While I don't like that option, I haven't really come up with something
better.  Having alternative outputs for nearly every test file for
e.g. zheap if/when we merge it, seems unmaintainable. It's less insane
for the pg_dump tests.

An alternative approach based on that would be to hack pg_regress to
magically ignore "Access method: ..." type differences, but that seems
like a bad idea to me.

Greetings,

Andres Freund

[1] https://postgr.es/m/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de
[2] 
https://postgr.es/m/ca+q6zcwmhsblkko7eq95t15m3r1x9fccm0kt3dgs2ggsro9...@mail.gmail.com
[3] https://postgr.es/m/20181215193700.nov7bphxyge4q...@alap3.anarazel.de



Re: Displaying and dumping of table access methods

2019-01-07 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> Over in [1] we're discussing the development of the pluggable storage
> patchset, which allows different ways of storing table data.
> 
> One thing I'd like to discuss with a wider audience than the
> implementation details is psql and pg_dump handling of table access
> methods.
> 
> Currently the patchset neither dumps nor displays table access
> methods . That's clearly not right.

Agreed.

> The reason for that however is not that it's hard to dump/display
> code-wise, but that to me the correct behaviour is not obvious.

While it might be a lot of changes to the regression output results, I
tend to feel that showing the access method is a very important aspect of
the relation and therefore should be in \d output.

> The reason to make table storage pluggable is after all that the table
> access method can be changed, and part of developing new table access
> methods is being able to run the regression tests.

We certainly want people to be able to run the regression tests, but it
feels like we will need more regression tests in the future as we wish
to cover both the built-in heap AM and the new zheap AM, so we should
really have those both run independently.  I don't think we'll be able
to have just one set of regression tests that cover everything
interesting for both, sadly.  Perhaps there's a way to have a set of
regression tests which are run for both, and another set that's run for
each, but building all of that logic is a fair bit of work and I'm not
sure how much it's really saving us.

> A patch at [2] adds display of a table's access method to \d+ - but that
> means that running the tests with a different default table access
> method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> there'll be a significant number of test failures, even though the test
> results did not meaningfully differ.

Yeah, I'm not really thrilled with this approach.

> Similarly, if pg_dump starts to dump table access methods either
> unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> unimportant differences.

In reality, pg_dump already depends on quite a few defaults to be in
place, so I don't see a particular issue with adding this into that set.
New tests would need to have new pg_dump checks, of course, but that's
generally the case as well.

> A third issue, less important in my opinion, is that specifying the
> table access method means that it's harder to dump/restore into a table
> with a different AM.

I understand this concern but I view it as an independent consideration.
There are a lot of transformations which one might wish for when dumping
and restoring data, a number of which are handled through various
options (--no-owner, --no-acls, etc) and it seems like we could do
something similar here.

> One way to solve this would be for psql/pg_dump to only define the table
> access methods for tables that differ from the currently configured
> default_table_access_method. That'd mean that only a few tests that
> intentionally test AMs would display/dump the access method.  On the
> other hand that seems like it's a bit too much magic.

I'm not a fan of depending on the currently set
default_table_access_method..  When would that be set in the pg_restore
process?  Or in the SQL script that's created?  Really though, that does
strike me as quite a bit of magic.

> While I don't like that option, I haven't really come up with something
> better.  Having alternative outputs for nearly every test file for
> e.g. zheap if/when we merge it, seems unmaintainable. It's less insane
> for the pg_dump tests.

I'm thinking less of alternative output files and more of additional
tests for zheap cases...

> An alternative approach based on that would be to hack pg_regress to
> magically ignore "Access method: ..." type differences, but that seems
> like a bad idea to me.

I agree, that's not a good idea.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2019-01-07 Thread Haribabu Kommi
On Tue, Dec 18, 2018 at 3:53 PM Amit Kapila  wrote:

> On Wed, Dec 12, 2018 at 3:54 PM Haribabu Kommi 
> wrote:
> >
> > On Thu, Nov 29, 2018 at 1:57 PM Amit Kapila 
> wrote:
> >> >
> >>
> >> Agreed.  Hari, can you change the patch as per the requirements of
> option-4.
> >
> >
> > Apologies for delay. Thanks to all your opinions.
> >
> > Attached is the updated patch as per the option-4 and added a test also
> to ensure
> > the function strictness.
> >
>
> Can you add a few examples in the documentation?  See if it is
> possible to extend the existing documentation section (F.29.4),
> otherwise, add a new section.
>

Apologies for the delay.
Attached is the updated patch, which adds a few usage examples of the
pg_stat_statements_reset() function with the new parameters to the
existing sample output.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Extend-pg_stat_statements_reset-to-reset-statistics_v15.patch
Description: Binary data


Re: Displaying and dumping of table access methods

2019-01-07 Thread Andres Freund
Hi,

On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > Over in [1] we're discussing the development of the pluggable storage
> > patchset, which allows different ways of storing table data.
> > 
> > One thing I'd like to discuss with a wider audience than the
> > implementation details is psql and pg_dump handling of table access
> > methods.
> > 
> > Currently the patchset neither dumps nor displays table access
> > methods . That's clearly not right.
> 
> Agreed.
> 
> > The reason for that however is not that it's hard to dump/display
> > code-wise, but that to me the correct behaviour is not obvious.
> 
> While it might be a lot of changes to the regression output results, I
> tend to feel that showng the access method is a very important aspect of
> the relation and therefore should be in \d output.

I don't see how that'd be feasible.  Imo it is/was absolutely crucial
for zheap development to be able to use the existing postgres tests.


> > The reason to make table storage pluggable is after all that the table
> > access method can be changed, and part of developing new table access
> > methods is being able to run the regression tests.
> 
> We certainly want people to be able to run the regression tests, but it
> feels like we will need more regression tests in the future as we wish
> to cover both the built-in heap AM and the new zheap AM, so we should
> really have those both run independently.  I don't think we'll be able
> to have just one set of regression tests that cover everything
> interesting for both, sadly.  Perhaps there's a way to have a set of
> regression tests which are run for both, and another set that's run for
> each, but building all of that logic is a fair bit of work and I'm not
> sure how much it's really saving us.

I don't think there's any sort of contradiction here. I don't think it's
feasible to have tests tests for every feature duplicated for each
supported AM, we have enough difficulty maintaining our current
tests. But that doesn't mean it's a problem to have individual test
[files] run with an explicitly assigned AM - the test can just do a SET
default_table_access_method = zheap; or explicitly say USING zheap.

> > A patch at [2] adds display of a table's access method to \d+ - but that
> > means that running the tests with a different default table access
> > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > there'll be a significant number of test failures, even though the test
> > results did not meaningfully differ.
> 
> Yeah, I'm not really thrilled with this approach.
> 
> > Similarly, if pg_dump starts to dump table access methods either
> > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > unimportant differences.
> 
> In reality, pg_dump already depends on quite a few defaults to be in
> place, so I don't see a particular issue with adding this into that set.
> New tests would need to have new pg_dump checks, of course, but that's
> generally the case as well.

I am not sure what you mean here? Are you suggesting that there'd be a
separate set of pg_dump test for zheap and every other possible future
AM?


To me the approach you're suggesting is going to lead to an explosion of
redundant tests, which are really hard to maintain, especially for
out-of-tree AMs. Out of tree AMs with the setup I propose can just
install the AM into the template database and set PGOPTIONS, and they
have pretty good coverage.

Greetings,

Andres Freund



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Tom Lane
I wrote:
> Probably there's a lot to be criticized about the Perl style below;
> anybody feel a need to rewrite it?

Here's a somewhat better version.  I realized that I was being too
slavishly tied to the data structures used in the C version; in Perl
it's easier to manage the lists of edges as hashes.  I can't see any
need to distinguish left and right edge sets, either, so this just
has one such hash per vertex.

Also, it seems to me that we *can* make intelligent use of unused
hashtable entries to exit early on many non-keyword inputs.  The reason
the existing code fails to do so is that it computes the sums and
differences of hashtable entries in unsigned modulo arithmetic; but if
we make the hashtable entries signed, we can set them up as exact
differences and drop the final modulo operation in the hash function.
Then, any out-of-range sum must indicate an input that is not a keyword
(because it is touching a pair of hashtable entries that didn't go
together) and we can exit early from the caller.  This in turn lets us
mark unused hashtable entries with large values to ensure that sums
involving them will be out of range.

A weak spot in that argument is that it's not entirely clear how
large the differences can get --- with an unlucky series of
collisions, maybe they could get large enough to overflow int16?
I don't think that's likely for the size of problem this script
is going to encounter, so I just put in an error check for it.
But it could do with closer analysis before deciding that this
is a general-purpose solution.
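
To make the mechanism concrete, here is a minimal standalone sketch of the
shape of function the script emits, together with the caller-side early
exit.  The table size, multipliers, and table contents below are made-up
placeholders rather than actual generator output; only the structure (two
string hashes indexing a table of signed deltas whose sum is either a valid
keyword index or out of range) follows the scheme described above.

#include <stddef.h>
#include <stdint.h>

#define KW_HASH_TABLE_SIZE 11		/* placeholder size */
#define KW_NUM_KEYWORDS 5			/* placeholder count */

/*
 * Signed deltas; unused slots hold a large value so that any sum touching
 * them falls outside [0, KW_NUM_KEYWORDS).
 */
static const int16_t kw_hash_table[KW_HASH_TABLE_SIZE] = {
	0, 3, 10000, -2, 1, 10000, 4, 10000, -1, 2, 10000
};

static int
kw_hash_func(const void *key, size_t keylen)
{
	const unsigned char *k = (const unsigned char *) key;
	uint32_t	a = 0,
				b = 0;

	while (keylen--)
	{
		unsigned char ch = *k++ | 0x20;	/* crude ASCII case-folding, so the
										 * hash is case-insensitive */

		a = a * 31 + ch;		/* placeholder multipliers */
		b = b * 127 + ch;
	}
	/* entries are exact signed differences: no trailing modulo required */
	return kw_hash_table[a % KW_HASH_TABLE_SIZE] +
		kw_hash_table[b % KW_HASH_TABLE_SIZE];
}

/*
 * Caller side, as in ScanKeywordLookup above: an out-of-range result means
 * the input cannot be any keyword, so we bail out before comparing.
 *
 *		h = kw_hash_func(str, len);
 *		if (h < 0 || h >= KW_NUM_KEYWORDS)
 *			return -1;
 */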

regards, tom lane

diff --git a/src/common/kwlookup.c b/src/common/kwlookup.c
index d72842e..9dc1fee 100644
*** a/src/common/kwlookup.c
--- b/src/common/kwlookup.c
***
*** 35,94 
   * receive a different case-normalization mapping.
   */
  int
! ScanKeywordLookup(const char *text,
    const ScanKeywordList *keywords)
  {
! 	int			len,
! i;
! 	char		word[NAMEDATALEN];
! 	const char *kw_string;
! 	const uint16 *kw_offsets;
! 	const uint16 *low;
! 	const uint16 *high;
! 
! 	len = strlen(text);
  
  	if (len > keywords->max_kw_len)
! 		return -1;/* too long to be any keyword */
! 
! 	/* We assume all keywords are shorter than NAMEDATALEN. */
! 	Assert(len < NAMEDATALEN);
  
  	/*
! 	 * Apply an ASCII-only downcasing.  We must not use tolower() since it may
! 	 * produce the wrong translation in some locales (eg, Turkish).
  	 */
! 	for (i = 0; i < len; i++)
! 	{
! 		char		ch = text[i];
  
! 		if (ch >= 'A' && ch <= 'Z')
! 			ch += 'a' - 'A';
! 		word[i] = ch;
! 	}
! 	word[len] = '\0';
  
  	/*
! 	 * Now do a binary search using plain strcmp() comparison.
  	 */
! 	kw_string = keywords->kw_string;
! 	kw_offsets = keywords->kw_offsets;
! 	low = kw_offsets;
! 	high = kw_offsets + (keywords->num_keywords - 1);
! 	while (low <= high)
  	{
! 		const uint16 *middle;
! 		int			difference;
  
! 		middle = low + (high - low) / 2;
! 		difference = strcmp(kw_string + *middle, word);
! 		if (difference == 0)
! 			return middle - kw_offsets;
! 		else if (difference < 0)
! 			low = middle + 1;
! 		else
! 			high = middle - 1;
  	}
  
! 	return -1;
  }
--- 35,89 
   * receive a different case-normalization mapping.
   */
  int
! ScanKeywordLookup(const char *str,
    const ScanKeywordList *keywords)
  {
! 	size_t		len;
! 	int			h;
! 	const char *kw;
  
+ 	/*
+ 	 * Reject immediately if too long to be any keyword.  This saves useless
+ 	 * hashing and downcasing work on long strings.
+ 	 */
+ 	len = strlen(str);
  	if (len > keywords->max_kw_len)
! 		return -1;
  
  	/*
! 	 * Compute the hash function.  We assume it was generated to produce
! 	 * case-insensitive results.  Since it's a perfect hash, we need only
! 	 * match to the specific keyword it identifies.
  	 */
! 	h = keywords->hash(str, len);
  
! 	/*
! 	 * An out-of-range result implies no match.  (This can happen for
! 	 * non-keyword inputs because the hash function will sum two unrelated
! 	 * hashtable entries.)
! 	 */
! 	if (h < 0 || h >= keywords->num_keywords)
! 		return -1;
  
  	/*
! 	 * Compare character-by-character to see if we have a match, applying an
! 	 * ASCII-only downcasing to the input characters.  We must not use
! 	 * tolower() since it may produce the wrong translation in some locales
! 	 * (eg, Turkish).
  	 */
! 	kw = GetScanKeyword(h, keywords);
! 	while (*str != '\0')
  	{
! 		char		ch = *str++;
  
! 		if (ch >= 'A' && ch <= 'Z')
! 			ch += 'a' - 'A';
! 		if (ch != *kw++)
! 			return -1;
  	}
+ 	if (*kw != '\0')
+ 		return -1;
  
! 	/* Success! */
! 	return h;
  }
diff --git a/src/include/common/kwlookup.h b/src/include/common/kwlookup.h
index 39efb35..dbff367 100644
*** a/src/include/common/kwlookup.h
--- b/src/include/common/kwlookup.h
***
*** 14,19 
--- 14,22 
  #ifndef KWLOOKUP_H
  #define KWLOOKUP_H
  
+ /* Hash function used by ScanKeywordLookup */
+ typedef int (*ScanKeywordHashFunc) (const void *key, size_t keylen);
+ 
  /*
 

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Tom Lane
Andres Freund  writes:
> Hm, shouldn't we extract the perfect hash generation into a perl module
> or such? It seems that there's plenty other possible uses for it.

Such as?  But in any case, that sounds like a task for someone with
more sense of Perl style than I have.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-07 Thread Andres Freund
Hi,

On 2019-01-07 19:37:51 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Hm, shouldn't we extract the perfect hash generation into a perl module
> > or such? It seems that there's plenty other possible uses for it.
>
> Such as?

Builtin functions for one, which we'd swatted down last time round due
to gperf's deficiencies. But I think there's plenty more potential,
e.g. it'd make sense from a performance POV to use a perfect hash
function for locks on builtin objects (the hashtable for lookups therein
shows up prominently in a fair number of profiles, and such locks are a large
percentage of the acquisitions). I'm certain there's plenty more; I've
not thought too much about it.


> But in any case, that sounds like a task for someone with
> more sense of Perl style than I have.

John, any chance you could help out with that... :)

Greetings,

Andres Freund



Re: btree.sgml typo?

2019-01-07 Thread Tatsuo Ishii
>> On Sat, Jan 5, 2019 at 1:35 AM Tatsuo Ishii  wrote:
>>>   PostgreSQL includes an implementation of the
>>>   standard btree (multi-way binary tree) index data
>>>   structure.
>>>
>>> I think the term "btree" here means "multi-way balanced tree", rather
>>> than "multi-way binary tree". In fact in our btree, there could be
>>> more than one key in a node. Patch attached.
>> 
>> +1 for applying this patch. The existing wording is highly confusing,
>> especially because many people already incorrectly think that a B-Tree
>> is just like a self-balancing binary search tree.
>> 
>> There is no consensus on exactly what the "b" actually stands for, but
>> it's definitely not "binary". I suppose that the original author meant
>> that a B-Tree is a generalization of a binary tree, which is basically
>> true -- though that's a very academic point.
> 
> Any objection for this? If not, I will commit the patch to master and
> REL_11_STABLE branches (btree.sgml first appeared in PostgreSQL 11).

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-07 Thread Michael Paquier
On Mon, Jan 07, 2019 at 10:32:20AM +0100, Christoph Berg wrote:
> Here's another revision that doesn't add an extra CXXOPT variable but
> just extends CXXFLAGS whenever COPT or PROFILE are set, which seems
> more usable.
> 
> It also updates the documentation.

The documentation is not fully updated.  It still lacks the
description for each new field.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well

2019-01-07 Thread Andres Freund
Hi,

On 2019-01-07 10:32:20 +0100, Christoph Berg wrote:
> Re: To Michael Paquier 2019-01-07 <20190107091734.ga1...@msg.credativ.de>
> > Updated patch attached.
> 
> Here's another revision that doesn't add an extra CXXOPT variable but
> just extends CXXFLAGS whenever COPT or PROFILE are set, which seems
> more usable.

Why does that seem more usable? How's that supposed to be used for flags
that aren't valid for both languages?

Greetings,

Andres Freund



Re: ALTER INDEX fails on partitioned index

2019-01-07 Thread Michael Paquier
On Mon, Jan 07, 2019 at 01:34:08PM -0600, Justin Pryzby wrote:
> I don't see any discussion regarding ALTER (?)
> 
> Actually, I ran into this while trying to set pages_per_range.
> But shouldn't it also work for fillfactor ?

Like ALTER TABLE, the take for ALTER INDEX is that we are still
lacking an ALTER INDEX ONLY flavor which would apply only to the single
partitioned index instead of applying the change down to the full set of
partitions below the partitioned entry on which the DDL is run.
That would be useful for SET STATISTICS as well.  So Alvaro's decision
looks right to me for what has been done in v11.
--
Michael


signature.asc
Description: PGP signature


Re: Displaying and dumping of table access methods

2019-01-07 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > Over in [1] we're discussing the development of the pluggable storage
> > > patchset, which allows different ways of storing table data.
> > > 
> > > One thing I'd like to discuss with a wider audience than the
> > > implementation details is psql and pg_dump handling of table access
> > > methods.
> > > 
> > > Currently the patchset neither dumps nor displays table access
> > > methods . That's clearly not right.
> > 
> > Agreed.
> > 
> > > The reason for that however is not that it's hard to dump/display
> > > code-wise, but that to me the correct behaviour is not obvious.
> > 
> > While it might be a lot of changes to the regression output results, I
> > tend to feel that showng the access method is a very important aspect of
> > the relation and therefore should be in \d output.
> 
> I don't see how that'd be feasible.  Imo it is/was absolutely crucial
> for zheap development to be able to use the existing postgres tests.

I don't agree with the general assumption that "we did this for
development and therefore it should be done that way forever".

Instead, I would look at adding tests where there's a difference
between the two, or possibly some difference, and make sure that there
isn't, and make sure that the code paths are covered.

> > > The reason to make table storage pluggable is after all that the table
> > > access method can be changed, and part of developing new table access
> > > methods is being able to run the regression tests.
> > 
> > We certainly want people to be able to run the regression tests, but it
> > feels like we will need more regression tests in the future as we wish
> > to cover both the built-in heap AM and the new zheap AM, so we should
> > really have those both run independently.  I don't think we'll be able
> > to have just one set of regression tests that cover everything
> > interesting for both, sadly.  Perhaps there's a way to have a set of
> > regression tests which are run for both, and another set that's run for
> > each, but building all of that logic is a fair bit of work and I'm not
> > sure how much it's really saving us.
> 
> I don't think there's any sort of contradiction here. I don't think it's
> feasible to have tests tests for every feature duplicated for each
> supported AM, we have enough difficulty maintaining our current
> tests. But that doesn't mean it's a problem to have individual test
> [files] run with an explicitly assigned AM - the test can just do a SET
> default_table_access_method = zheap; or explicitly say USING zheap.

I don't mean to suggest that there's a contradiction.  I don't have any
problem with new tests being added which set the default AM to zheap, as
long as it's clear that such is happening for downstream tests.

> > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > means that running the tests with a different default table access
> > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > there'll be a significant number of test failures, even though the test
> > > results did not meaningfully differ.
> > 
> > Yeah, I'm not really thrilled with this approach.
> > 
> > > Similarly, if pg_dump starts to dump table access methods either
> > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > > unimportant differences.
> > 
> > In reality, pg_dump already depends on quite a few defaults to be in
> > place, so I don't see a particular issue with adding this into that set.
> > New tests would need to have new pg_dump checks, of course, but that's
> > generally the case as well.
> 
> I am not sure what you mean here? Are you suggesting that there'd be a
> separate set of pg_dump test for zheap and every other possible future
> AM?

I'm suggesting that pg_dump would have additional tests for zheap, in
addition to the existing tests we already have.  No more, no less,
really.

> To me the approach you're suggesting is going to lead to an explosion of
> redundant tests, which are really hard to maintain, especially for
> out-of-tree AMs. Out of tree AMs with the setup I propose can just
> install the AM into the template database and set PGOPTIONS, and they
> have pretty good coverage.

I'm frankly much less interested in out-of-tree AMs as we aren't going
to have in-core regression tests for them anyway, because we can't as
they aren't in our tree and, ultimately, I don't find them to have
anywhere near the same value that in-core AMs have.

I don't have any problem with out-of-tree AMs hacking things up as they
see fit and then running whatever tests they want, but it is, and always
will be, a very different discussion and ultimately a different result
when we're talking about what will be incorporated and supported as part
of core.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: speeding up planning with partitions

2019-01-07 Thread Amit Langote
On 2019/01/07 23:13, Justin Pryzby wrote:
> The issue was this:
>>> It turns out 1050 open()s are due to historic data which is no longer being
>>> loaded and therefor never converted to relkind=p (but hasn't exceeded the
>>> retention interval so not yet DROPped, either).
> 
> So there's no evidence of any issue with the patch.

Ah, so by this you had meant that the historic data is still an old-style
(9.6-style) inheritance hierarchy, which gets folded under the UNION ALL
whose other children are new-style partitioned tables.

Thanks,
Amit




Re: Displaying and dumping of table access methods

2019-01-07 Thread Andres Freund
Hi,

On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > > * Andres Freund (and...@anarazel.de) wrote:
> > > > Over in [1] we're discussing the development of the pluggable storage
> > > > patchset, which allows different ways of storing table data.
> > > >
> > > > One thing I'd like to discuss with a wider audience than the
> > > > implementation details is psql and pg_dump handling of table access
> > > > methods.
> > > >
> > > > Currently the patchset neither dumps nor displays table access
> > > > methods . That's clearly not right.
> > >
> > > Agreed.
> > >
> > > > The reason for that however is not that it's hard to dump/display
> > > > code-wise, but that to me the correct behaviour is not obvious.
> > >
> > > While it might be a lot of changes to the regression output results, I
> > > tend to feel that showng the access method is a very important aspect of
> > > the relation and therefore should be in \d output.
> >
> > I don't see how that'd be feasible.  Imo it is/was absolutely crucial
> > for zheap development to be able to use the existing postgres tests.
>
> I don't agree with the general assumption that "we did this for
> development and therefore it should be done that way forever".
>
> Instead, I would look at adding tests where there's a difference
> between the two, or possibly some difference, and make sure that there
> isn't, and make sure that the code paths are covered.

I think this approach makes no sense whatsoever.  It's entirely possible
to encounter bugs in table-AM-relevant code in places one would not
expect. But even if one were, foolishly, to exclude those, the pieces
of code we know are highly affected by the way the AM works are a
significant percentage of our tests (at the very least 10-20% of
them). Duplicating them even just between heap and zheap would be
major technical debt.  DML, ON CONFLICT, just about all isolationtester
tests, etc. are absolutely crucial to test different AMs for
correctness.


> > To me the approach you're suggesting is going to lead to an explosion of
> > redundant tests, which are really hard to maintain, especially for
> > out-of-tree AMs. Out of tree AMs with the setup I propose can just
> > install the AM into the template database and set PGOPTIONS, and they
> > have pretty good coverage.
>
> I'm frankly much less interested in out-of-tree AMs as we aren't going
> to have in-core regression tests for them anyway, because we can't as
> they aren't in our tree and, ultimately, I don't find them to have
> anywhere near the same value that in-core AMs have.

I think you must be missing my point: Adding spurious differences due to
"Table Access Method: heap" vs "Table Access Method: blarg" makes it
unnecessarily hard to reuse the in-core tests for any additional AM, be
it in-core or out of core.  I fail to see what us not having explicit
tests for such AMs in core has to do with my point.

Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
better from a maintainability POV than including the AM in the output.
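
For illustration, a rough sketch of how such a variable could be wired up on
the psql side; the pset.hide_tableam flag, the tableinfo.relam field, and the
HIDE_TABLE_AMS name are all assumptions of this sketch, not existing code:

	/*
	 * Hypothetical footer in describe.c's describeOneTableDetails(): show
	 * the access method only when the regression-test variable is not set.
	 */
	if (verbose && tableinfo.relam != NULL && !pset.hide_tableam)
	{
		printfPQExpBuffer(&buf, _("Access method: %s"), tableinfo.relam);
		printTableAddFooter(&cont, buf.data);
	}

pg_regress could then invoke psql with something like -v HIDE_TABLE_AMS=on,
while interactive users would still see the access method by default.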

Andres



Re: A few new options for vacuumdb

2019-01-07 Thread Michael Paquier
On Mon, Jan 07, 2019 at 06:10:21PM +0900, Masahiko Sawada wrote:
> 0002 and 0003 are merged and posted by Michael-san and it looks good
> to me, so I've looked at the 0001, 0004, 0005 and 0006 patches. Here
> is a few review comments.

I have done another round on 0002/0003 (PQfinish was lacking after
error-ing on incompatible options) and committed the thing.

> Even if all tables are filtered by --min-relation-size, min-mxid-age
> or min-xid-age the following message appeared.
> 
> $ vacuumdb --verbose --min-relation-size 100 postgres
> vacuumdb: vacuuming database "postgres"
> 
> Since no tables are processed in this case, isn't it better not to
> output this message by moving the message after checking if ntup ==
> 0?

Hmm.  My take on this one is that we attempt to vacuum relations in
this database, but this may end up doing nothing.  Printing this
message is still important to let the user know that an attempt was
made, so I would keep the message.

> You use pg_total_relation_size() to check the relation size but it
> might be better to use pg_relation_size() instead. The
> pg_total_relation_size() includes all index sizes, but I think it's
> common, based on my experience, to check only the table size
> (including the toast table) when deciding whether to vacuum it or not.

Yes, I am also not completely sure whether the way things are defined in
the patch are completely what we are looking for.  Still, I have also
seen people complain about the total amount of space a table is
physically taking on disk, including its indexes.  So from the user
experience standpoint the direction taken by the patch makes sense, as
much as your argument does.
--
Michael


signature.asc
Description: PGP signature


Re: Displaying and dumping of table access methods

2019-01-07 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On 2019-01-07 19:19:46 -0500, Stephen Frost wrote:
> > > > * Andres Freund (and...@anarazel.de) wrote:
> > > > > Over in [1] we're discussing the development of the pluggable storage
> > > > > patchset, which allows different ways of storing table data.
> > > > >
> > > > > One thing I'd like to discuss with a wider audience than the
> > > > > implementation details is psql and pg_dump handling of table access
> > > > > methods.
> > > > >
> > > > > Currently the patchset neither dumps nor displays table access
> > > > > methods . That's clearly not right.
> > > >
> > > > Agreed.
> > > >
> > > > > The reason for that however is not that it's hard to dump/display
> > > > > code-wise, but that to me the correct behaviour is not obvious.
> > > >
> > > > While it might be a lot of changes to the regression output results, I
> > > > tend to feel that showng the access method is a very important aspect of
> > > > the relation and therefore should be in \d output.
> > >
> > > I don't see how that'd be feasible.  Imo it is/was absolutely crucial
> > > for zheap development to be able to use the existing postgres tests.
> >
> > I don't agree with the general assumption that "we did this for
> > development and therefore it should be done that way forever".
> >
> > Instead, I would look at adding tests where there's a difference
> > between the two, or possibly some difference, and make sure that there
> > isn't, and make sure that the code paths are covered.
> 
> I think this approach makes no sense whatsoever.  It's entirely possible
> to encounter bugs in table-AM-relevant code in places one would not
> expect. But even if one were, foolishly, to exclude those, the pieces
> of code we know are highly affected by the way the AM works are a
> significant (at the very least 10-20% of tests) percentage of our
> tests. Duplicating them even just between heap and zheap would be a
> major technical debt.  DML, ON CONFLICT, just about all isolationtester
> tests, etc. are all absolutely crucial for testing different AMs for
> correctness.

I am generally on board with minimizing the amount of duplicate code
that we have, but we must run those tests independently, so it's really
a question of whether we build a system where we can parameterize a set of
tests to run and then run them for every AM and compare the output to
either the generalized output or the per-AM output, or whether we build on
the existing system and simply have an independent set of tests.  It's
not clear to me, at the current point, which will be the lower level of
ongoing effort, but when it comes to the effort required today, it seems
pretty clear to me that whacking around the current test environment to
rerun tests is a larger amount of effort.  If that's the requirement,
then so be it and I'm on board, but I'm also open to considering a
lesser requirement for a completely new feature.

> > > To me the approach you're suggesting is going to lead to an explosion of
> > > redundant tests, which are really hard to maintain, especially for
> > > out-of-tree AMs. Out of tree AMs with the setup I propose can just
> > > install the AM into the template database and set PGOPTIONS, and they
> > > have pretty good coverage.
> >
> > I'm frankly much less interested in out-of-tree AMs as we aren't going
> > to have in-core regression tests for them anyway, because we can't as
> > they aren't in our tree and, ultimately, I don't find them to have
> > anywhere near the same value that in-core AMs have.
> 
> I think you must be missing my point: Adding spurious differences due to
> "Table Access Method: heap" vs "Table Access Method: blarg" makes it
> unnecessarily hard to reuse the in-core tests for any additional AM, be
> it in-core or out of core.  I fail to see what us not having explicit
> tests for such AMs in core has to do with my point.

I don't think I'm missing your point.  If you believe that we should be
swayed by this argument into agreeing to change what we believe the
user-facing psql \d output should be, then I am very hopeful that you're
completely wrong.  The psql \d output should be driven by what will be
best for our users, not by what's best for external AMs or, really,
anything else.

> Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
> HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
> better from a maintainability POV than including the AM in the output.

I'm pretty sure I said in my last reply that I'm alright with psql and
pg_dump not outputting a result for the default value, provided the
default is understood to always really be the default.  But, again, what
we should be concerned about here is what the end user is dealing with,
and I'm not particularly inclined to support something different even if
it's around a variable of some kind for external AM

Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-07 Thread Michael Paquier
On Mon, Jan 07, 2019 at 05:28:27PM +0900, Amit Langote wrote:
> On 2019/01/07 16:35, Michael Paquier wrote:
>> It seems to me that we may want something more like:
>> Primary: "could not use \"%s.%s\" as logical replication target".
>> Detail: "Relation %s.%s is a foreign table", "not a table", etc.
> 
> I've thought about that before and I tend to agree with you.  Maybe:
> 
> ERROR: cannot use "%s.%s" as logical replication target
> DETAIL: Using partitioned tables as logical replication target is not
> supported.
> 
> Sounds a bit repetitive, but perhaps it's better to use the words "not
> supported" in the DETAIL message.

Or the detailed message could just say "\"%s.%s\" is a foreign table"
and such flavor for other relkinds?  It is redundant to repeat
"logical replication target" for both message parts.  The primary
message to use "cannot" instead of "could" is much better, so that
part sounds fine to me.
--
Michael




Re: Displaying and dumping of table access methods

2019-01-07 Thread Andres Freund
Hi,

On 2019-01-07 21:08:58 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > On 2019-01-07 20:30:13 -0500, Stephen Frost wrote:
> > > I don't agree with the general assumption that "we did this for
> > > development and therefore it should be done that way forever".
> > >
> > > Instead, I would look at adding tests where there's a difference
> > > between the two, or possibly some difference, and make sure that there
> > > isn't, and make sure that the code paths are covered.
> >
> > I think this approach makes no sense whatsoever.  It's entirely possible
> > to encounter bugs in table-AM-relevant code in places one would not
> > expect. But even if one were, foolishly, to exclude those, the pieces
> > of code we know are highly affected by the way the AM works are a
> > significant (at the very least 10-20% of tests) percentage of our
> > tests. Duplicating them even just between heap and zheap would be a
> > major technical debt.  DML, ON CONFLICT, just about all isolationtester
> > tests, etc. are all absolutely crucial for testing different AMs for
> > correctness.
>
> I am generally on board with minimizing the amount of duplicate code
> that we have, but we must run those tests independently, so it's really
> a question of whether we build a system where we can parameterize a set of
> tests to run and then run them for every AM and compare the output to
> either the generalized output or the per-AM output, or whether we build on
> the existing system and simply have an independent set of tests.  It's
> not clear to me, at the current point, which will be the lower level of
> ongoing effort

Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
again with a different default AM. That's precisely why I'm talking
about this. Just setting PGOPTIONS='-c
default_table_access_method=zheap' in the new makefile target (the MSVC
run scripts are similar) is sufficient.  And we don't need to force
everyone to constantly run tests with e.g. both heap and zheap; it's
sufficient to do so on a few buildfarm machines, and whenever changing
AM-level code.  Rerunning all the tests with a different AM is just
setting the same environment variable, but running check-world as the
target.
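
As a minimal illustration of why that is enough (this assumes the
default_table_access_method GUC and a table-level pg_class.relam from the
pluggable-storage patchset, neither of which is in a released version yet;
"am_probe" is just an example name), every AM-less CREATE TABLE in the
existing tests simply picks up the session default, which can be checked
from the catalogs:

    -- in a session started with PGOPTIONS='-c default_table_access_method=zheap'
    SHOW default_table_access_method;       -- would report zheap
    CREATE TABLE am_probe (id int);         -- no explicit access method clause
    SELECT a.amname
      FROM pg_class c
      JOIN pg_am a ON a.oid = c.relam
     WHERE c.relname = 'am_probe';          -- would report zheap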

Obviously that does not preclude a few tests that explicitly test
features specific to an AM. E.g. the zheap branch's tests have an
explicit zheap regression file and a few zheap-specific isolationtester
spec files that a) test zheap-specific behaviour and b) make sure that the
most basic zheap behaviour is tested even when not running the tests
with zheap as the default AM.

And even if you were to successfully argue that it's sufficient during
normal development to only have a few zheap-specific additional tests,
we'd certainly want to make it possible to occasionally explicitly run
the rest of the tests under zheap to see whether additional stuff has
been broken - and that's much harder to sift through if there are a lot of
spurious test failures due to \d[+] outputting additional/differing
data.


> ..., but when it comes to the effort required today, it seems pretty
> clear to me that whacking around the current test environment to rerun
> tests is a larger amount of effort.

How did you come to that conclusion? Adding a makefile and vcregress.pl
target is pretty trivial.


> > Even just having a psql variable that says HIDE_NONDEFAULT_TABLE_AMS or
> > HIDE_TABLE_AMS that's set by default by pg_regress would be *vastly*
> > better from a maintainability POV than including the AM in the output.
>
> I'm pretty sure I said in my last reply that I'm alright with psql and
> pg_dump not outputting a result for the default value

I don't see that anywhere in your replies.  Are you referring to:

> I don't have any problem with new tests being added which set the
> default AM to zheap, as long as it's clear that such is happening for
> downstream tests.

? If so, that's not addressing the reason why I'm suggesting to have
something like HIDE_TABLE_AMS. The point is that that'd allow us to
tailor the default psql output to humans, while still choosing not to
display the AM for regression tests (thereby allowing the tests to be run).


> provided the default is understood to always really be the default

What do you mean by that? Are you arguing that it should be impossible
in test scenarios to override default_table_access_method? Or that
pg_dump/psql should check for a hardcoded 'heap' AM (via a macro or
whatnot)?


> I'm also a bit confused as to why we are spending a good bit of time
> arguing about external AMs without any discussion or definition of what
> they are or what their requirements are.  If such things seriously
> exist, then let us talk about them and try to come up with solutions for
> them; if they don't, then we can talk about them when they come up.

We are working seriously hard on making AMs pluggable. Zheap is not yet,
and won't be that soon, part of core. The concerns for an in-core zheap
(which needs to maintain the test infrastructure during the remainder of
its out-of-core development!) and out-of-core AMs are pretty aligned.  I
don't get your confusion.

Re: could recovery_target_timeline=latest be the default in standby mode?

2019-01-07 Thread Michael Paquier
On Mon, Dec 31, 2018 at 01:21:00PM +0200, David Steele wrote:
> This patch looks good to me.

(Sorry for the delay here)
0001 looks fine to me.

-to the latest timeline found in the archive, which is useful in
-a standby server. Other than that you only need to set this parameter
+to the latest timeline found in the archive.  That is the default.
+   
Isn't it useful to still mention that the default is useful especially
for standby servers?

-the WAL archive. If you plan to have multiple standby servers for high
-availability purposes, set recovery_target_timeline to
-latest, to make the standby server follow the timeline change
-that occurs at failover to another standby.
+the WAL archive.
I think that we should still keep this recommendation, as well
as the one below.

-   RecoveryTargetTimeLineGoal rttg = RECOVERY_TARGET_TIMELINE_CONTROLFILE;
+   RecoveryTargetTimeLineGoal rttg;
Good to remove this noise.

> Yes, that's exactly what I was thinking.

Agreed.

> There don't seem to be any tests for recovery_target_timeline=current. This
> is a preexisting condition, but it probably wouldn't hurt to add one.

Yes, I got to wonder while looking at this patch why we don't have one
yet in 003_recovery_targets.pl.  That's easy enough to do thanks to
the extra rows inserted after doing the stuff for the LSN-based
restart point, and attached is a patch to add the test.  Peter, could
you merge it with 0001?  I am fine to take care of that myself if
necessary.
--
Michael
diff --git a/src/test/recovery/t/003_recovery_targets.pl b/src/test/recovery/t/003_recovery_targets.pl
index b46b318f5a..fe59221b57 100644
--- a/src/test/recovery/t/003_recovery_targets.pl
+++ b/src/test/recovery/t/003_recovery_targets.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 8;
+use Test::More tests => 9;
 
 # Create and test a standby from given backup, with a certain recovery target.
 # Choose $until_lsn later than the transaction commit that causes the row
@@ -92,8 +92,11 @@ $node_master->safe_psql('postgres',
 my $lsn5 = my $recovery_lsn =
   $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
 
+# Rows used to restore up to the end of current timeline.
 $node_master->safe_psql('postgres',
 	"INSERT INTO tab_int VALUES (generate_series(5001,6000))");
+my $lsn6 =
+  $node_master->safe_psql('postgres', "SELECT pg_current_wal_lsn()");
 
 # Force archiving of WAL file
 $node_master->safe_psql('postgres', "SELECT pg_switch_wal()");
@@ -114,6 +117,9 @@ test_recovery_standby('name', 'standby_4', $node_master, \@recovery_params,
 @recovery_params = ("recovery_target_lsn = '$recovery_lsn'");
 test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
 	"5000", $lsn5);
+@recovery_params = ("recovery_target_timeline = 'current'");
+test_recovery_standby('LSN', 'standby_6', $node_master, \@recovery_params,
+	"6000", $lsn6);
 
 # Multiple targets
 #
@@ -126,9 +132,9 @@ test_recovery_standby('LSN', 'standby_5', $node_master, \@recovery_params,
"recovery_target_name = ''",
"recovery_target_time = '$recovery_time'");
 test_recovery_standby('multiple overriding settings',
-   'standby_6', $node_master, \@recovery_params, "3000", $lsn3);
+   'standby_7', $node_master, \@recovery_params, "3000", $lsn3);
 
-my $node_standby = get_new_node('standby_7');
+my $node_standby = get_new_node('standby_8');
 $node_standby->init_from_backup($node_master, 'my_backup', has_restoring => 1);
 $node_standby->append_conf('postgresql.conf', "recovery_target_name = '$recovery_name'
 recovery_target_time = '$recovery_time'");




Re: Displaying and dumping of table access methods

2019-01-07 Thread Michael Paquier
On Mon, Jan 07, 2019 at 06:31:52PM -0800, Andres Freund wrote:
> Huh? It's absolutely *trivial* from a buildsystem POV to run the tests
> again with a different default AM. That's precisely why I'm talking
> about this. Just setting PGOPTIONS='-c
> default_table_access_method=zheap' in the new makefile target (the MSVC
> run scripts are similar) is sufficient.  And we don't need to force
> everyone to constantly run tests with e.g. both heap and zheap; it's
> sufficient to do so on a few buildfarm machines, and whenever changing
> AM-level code.  Rerunning all the tests with a different AM is just
> setting the same environment variable, but running check-world as the
> target.

Another point is that having default_table_access_method facilitates
the restore of tables across AMs similarly to tablespaces, so CREATE
TABLE dumps should not include the AM part.
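
A minimal sketch of the restore-side behaviour being described, mirroring
how default_tablespace works (again assuming the patchset's
default_table_access_method GUC; the table definition is made up):

    -- the restoring session decides the AM for every AM-less CREATE TABLE,
    -- just as default_tablespace decides where relations are placed
    SET default_table_access_method = 'zheap';
    -- a dumped definition carrying no access method clause then ends up on zheap
    CREATE TABLE public.orders (id int, total numeric);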

> And even if you were to successfully argue that it's sufficient during
> normal development to only have a few zheap-specific additional tests,
> we'd certainly want to make it possible to occasionally explicitly run
> the rest of the tests under zheap to see whether additional stuff has
> been broken - and that's much harder to sift through if there are a lot of
> spurious test failures due to \d[+] outputting additional/differing
> data.

The heap-specific tests could be included as an extra module in
src/test/modules easily, so removing from the main tests what is not
completely transparent may make sense.  Most people use a quick "make
check" to test a patch, so we could miss some bugs because of that
during review.  Still, people seem to be better educated lately about the
fact that they need to do an actual check-world when checking a patch
in full.  So personally I can live with a split where it makes sense.
Being able to easily validate an AM implementation would be nice.
Isolation tests may be another matter though for DML.

> We are working seriously hard on making AMs pluggable. Zheap is not yet,
> and won't be that soon, part of core. The concerns for an in-core zheap
> (which needs to maintain the test infrastructure during the remainder of
> its out-of-core development!) and out-of-core AMs are pretty aligned.  I
> don't get your confusion.

I would imagine that a full-fledged AM should be able to maintain
compatibility with the full set of queries that heap is able to
support, so if you can make the tests transparent enough that they
can be run for any AM without alternate input in the core tree, then
that's a goal worth pursuing.  Don't you have plan inconsistencies as well
with zheap?

In short, improving portability incrementally is good for the
long-term perspective.  From that point of view adding the AM to \d+
output may be a bad idea, as there are modules out of core which
rely on psql meta-commands, and it would be nice to be able to run
those tests as well for those plugins with different types of AMs.
--
Michael




Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-07 Thread Amit Langote
On 2019/01/08 11:10, Michael Paquier wrote:
> On Mon, Jan 07, 2019 at 05:28:27PM +0900, Amit Langote wrote:
>> On 2019/01/07 16:35, Michael Paquier wrote:
>>> It seems to me that we may want something more like:
>>> Primary: "could not use \"%s.%s\" as logical replication target".
>>> Detail: "Relation %s.%s is a foreign table", "not a table", etc.
>>
>> I've thought about that before and I tend to agree with you.  Maybe:
>>
>> ERROR: cannot use "%s.%s" as logical replication target
>> DETAIL: Using partitioned tables as logical replication target is not
>> supported.
>>
>> Sounds a bit repetitive, but perhaps it's better to use the words "not
>> supported" in the DETAIL message.
> 
> Or the detailed message could just say "\"%s.%s\" is a foreign table"
> and such flavor for other relkinds?  It is redundant to repeat
> "logical replication target" for both message parts.
Yeah, I think so too.  I also noticed that the patch uses
ERRCODE_WRONG_OBJECT_TYPE as the error code,  whereas we may want to use
ERRCODE_FEATURE_NOT_SUPPORTED.  Thoughts on that?

Attached updated patch, which changes the detail message text as you
suggest and updates the error code.

Thanks,
Amit
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 5bd3bbc35e..c95153b463 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -609,8 +609,24 @@ CheckSubscriptionRelkind(char relkind, const char *nspname,
                          const char *relname)
 {
     /*
-     * We currently only support writing to regular tables.
+     * We currently only support writing to regular tables.  However, give
+     * a more specific error for partitioned and foreign tables.
      */
+    if (relkind == RELKIND_PARTITIONED_TABLE)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("cannot use \"%s.%s\" as logical replication target",
+                        nspname, relname),
+                 errdetail("\"%s.%s\" is a partitioned table.",
+                           nspname, relname)));
+    else if (relkind == RELKIND_FOREIGN_TABLE)
+        ereport(ERROR,
+                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                 errmsg("cannot use \"%s.%s\" as logical replication target",
+                        nspname, relname),
+                 errdetail("\"%s.%s\" is a foreign table.",
+                           nspname, relname)));
+
     if (relkind != RELKIND_RELATION)
         ereport(ERROR,
                 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
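
For reference, a rough way to see the new messages in action on a subscriber
(connection string, publication, subscription and table names below are made
up, it assumes the publisher's publication contains a table of the same name,
and the error is raised by the subscription's table synchronization/apply
worker rather than as direct output of the DDL):

    -- the subscriber-side target of a subscribed table is partitioned
    CREATE TABLE public.measurement (ts timestamptz, val int)
        PARTITION BY RANGE (ts);
    CREATE SUBSCRIPTION mysub
        CONNECTION 'host=publisher dbname=postgres'
        PUBLICATION mypub;
    -- the worker would then fail with something like:
    -- ERROR:  cannot use "public.measurement" as logical replication target
    -- DETAIL: "public.measurement" is a partitioned table.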


Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-07 Thread Michael Paquier
On Tue, Jan 08, 2019 at 01:13:11PM +0900, Amit Langote wrote:
> Yeah, I think so too.  I also noticed that the patch uses
> ERRCODE_WRONG_OBJECT_TYPE as the error code,  whereas we may want to use
> ERRCODE_FEATURE_NOT_SUPPORTED.  Thoughts on that?

ERRCODE_WRONG_OBJECT_TYPE is the right call I think as the feature is
actually supported, just not for all the object types.

> Attached updated patch, which changes the detail message text as you
> suggest and updates the error code.

Another suggestion I would have is also to change the third message of
CheckSubscriptionRelkind() so that its style matches the two others you are
adding, as what's on HEAD is not a model of its kind with its
formulation using a full sentence.
--
Michael




Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-07 Thread Amit Langote
On 2019/01/08 13:44, Michael Paquier wrote:
> On Tue, Jan 08, 2019 at 01:13:11PM +0900, Amit Langote wrote:
>> Yeah, I think so too.  I also noticed that the patch uses
>> ERRCODE_WRONG_OBJECT_TYPE as the error code,  whereas we may want to use
>> ERRCODE_FEATURE_NOT_SUPPORTED.  Thoughts on that?
> 
> ERRCODE_WRONG_OBJECT_TYPE is the right call I think as the feature is
> actually supported, just not for all the object types.

I meant to say that the feature to add relations to a subscription is not
supported for certain relation types, even though it's practically
*possible* to do so.  But maybe I'm misunderstanding the error code
selection policy.

>> Attached updated patch, which changes the detail message text as you
>> suggest and updates the error code.
> 
> Another suggestion I would have is also to change the third message of
> CheckSubscriptionRelkind() so as its style maps the two others you are
> adding, as what's on HEAD is not a model of its kind with its
> formulation using a full sentence.

I'm not totally opposed to doing that while we're here, but note that there
might be many such instances in the existing code.

Thanks,
Amit




GUC parameters accepts values in both octal and hexadecimal formats

2019-01-07 Thread Haribabu Kommi
Hi Hackers,

The server GUC parameters also accept values in both octal and hexadecimal
formats.

postgres=# set max_parallel_workers_per_gather='0x10';
postgres=# show max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 16

postgres=# set max_parallel_workers_per_gather='010';
postgres=# show max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 8

I can see that this behavior has existed for quite some time, but I am not
able to find any documentation related to it.  Can someone point me to the
relevant section if it exists?  If it doesn't, is it fine to add it?

Regards,
Haribabu Kommi
Fujitsu Australia


OpenBSD versus semaphores

2019-01-07 Thread Tom Lane
I've been toying with OpenBSD lately, and soon noticed a seriously
annoying problem for running Postgres on it: by default, its limits
for SysV semaphores are only SEMMNS=60, SEMMNI=10.  Not only does that
greatly constrain the number of connections for a single installation,
it means that our TAP tests fail because you can't start two postmasters
concurrently (cf [1]).

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.

So I looked around for an alternative, and found out that modern
OpenBSD releases support named POSIX semaphores (though not unnamed
ones, at least not shared unnamed ones).  What's more, it appears that
in this implementation, named semaphores don't eat open file descriptors
as they do on macOS, removing our major objection to using them.

I don't have any OpenBSD installation on hardware that I'd take very
seriously for performance testing, but some light testing with
"pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
has just about the same performance as a build with SysV semaphores.

This all leads to the thought that maybe we should be selecting
PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD.  At the very least,
our docs ought to recommend it as a credible alternative for
people who don't want to get into building custom kernels.

I've checked that this works back to OpenBSD 6.0, and scanning
their man pages suggests that the feature appeared in 5.5.
5.5 isn't that old (2014) so possibly people are still running
older versions, but we could easily put in version-specific
default logic similar to what's in src/template/darwin.

Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/e6ecf989-9d5a-9dc5-12de-96985b6e5a5f%40mksoft.nu



Re: could recovery_target_timeline=latest be the default in standby mode?

2019-01-07 Thread David Steele

On 1/8/19 5:37 AM, Michael Paquier wrote:


-to the latest timeline found in the archive, which is useful in
-a standby server. Other than that you only need to set this parameter
+to the latest timeline found in the archive.  That is the default.
+   
Isn't it useful to still mention that the default is useful especially
for standby servers?


Agreed.


-the WAL archive. If you plan to have multiple standby servers for high
-availability purposes, set recovery_target_timeline to
-latest, to make the standby server follow the timeline change
-that occurs at failover to another standby.
+the WAL archive.
I think that we should still keep this recommendation, as well
as the one below.


Agreed.




There don't seem to be any tests for recovery_target_timeline=current. This
is a preexisting condition, but it probably wouldn't hurt to add one.


Yes, I got to wonder while looking at this patch why we don't have one
yet in 003_recovery_targets.pl.  That's easy enough to do thanks to
the extra rows inserted after doing the stuff for the LSN-based
restart point, and attached is a patch to add the test.  Peter, could
you merge it with 0001?  I am fine to take care of that myself if
necessary.


The new test looks good.

Regards,
--
-David
da...@pgmasters.net



Re: OpenBSD versus semaphores

2019-01-07 Thread Thomas Munro
On Tue, Jan 8, 2019 at 7:14 PM Tom Lane  wrote:
> I've been toying with OpenBSD lately, and soon noticed a seriously
> annoying problem for running Postgres on it: by default, its limits
> for SysV semaphores are only SEMMNS=60, SEMMNI=10.  Not only does that
> greatly constrain the number of connections for a single installation,
> it means that our TAP tests fail because you can't start two postmasters
> concurrently (cf [1]).
>
> Raising the annoyance factor considerably, AFAICT the only way to
> increase these settings is to build your own custom kernel.
>
> So I looked around for an alternative, and found out that modern
> OpenBSD releases support named POSIX semaphores (though not unnamed
> ones, at least not shared unnamed ones).  What's more, it appears that
> in this implementation, named semaphores don't eat open file descriptors
> as they do on macOS, removing our major objection to using them.
>
> I don't have any OpenBSD installation on hardware that I'd take very
> seriously for performance testing, but some light testing with
> "pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
> has just about the same performance as a build with SysV semaphores.
>
> This all leads to the thought that maybe we should be selecting
> PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD.  At the very least,
> our docs ought to recommend it as a credible alternative for
> people who don't want to get into building custom kernels.
>
> I've checked that this works back to OpenBSD 6.0, and scanning
> their man pages suggests that the feature appeared in 5.5.
> 5.5 isn't that old (2014) so possibly people are still running
> older versions, but we could easily put in version-specific
> default logic similar to what's in src/template/darwin.
>
> Thoughts?

No OpenBSD here, but I was curious enough to peek at their
implementation.  Like others, they create a tiny file under /tmp for
each one, mmap() and close the fd straight away.  Apparently don't
support shared sem_init() yet (EPERM).  So your plan seems good to me.
CC'ing Pierre-Emmanuel (OpenBSD PostgreSQL port maintainer) in case he
is interested.

Wild speculation:  I wouldn't be surprised if POSIX named semas
perform better than SysV semas on a large enough system, since they'll
live on different pages.  At a glance, their sys_semget apparently
allocates arrays of struct sem without padding and I think they
probably get about 4 to a cacheline; see our experience with an 8
socket box leading to commit 2d306759 where we added our own padding.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Improve selectivity estimate for range queries

2019-01-07 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 21 Dec 2018 11:50:28 -0500, Tom Lane  wrote in 
<28533.1545411...@sss.pgh.pa.us>
> "Yuzuko Hosoya"  writes:
> > From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> >> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" 
> >>  wrote in
> >> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp>
> >>> To handle such cases I've thought up an idea based on a previous
> >>> commit[1] which solved a similar problem related to
> >>> DEFAULT_NUM_DISTINCT.  Just like this modification, I added a flag
> >>> which shows whether the selectivity
> 
> >> The commit [1] added almost no complexity, but this does. It affects many
> >> functions to introduce tighter coupling between operator-selectivity
> >> functions and clause-selectivity functions. isdefault travels alone
> >> too long just to bind remote functions. We might need a more principled
> >> way to handle that.
> 
> Yeah, I don't think that this approach is a reasonable tradeoff to
> eliminate a narrow case.  In particular, what's been done here to
> clause_selectivity is totally unprincipled and poorly thought out:
> you can't add an output parameter that's set in only some cases.
> Yet, there's no good way to decide how to set it in many cases.
> For instance, what would you do for an AND or OR where some of
> the branches are default estimates and some not?
> 
> >> Around [1], there was a discussion about introducing the notion of
> >> uncertainty to selectivity.  The isdefault flag is a kind of uncertainty
> >> value indicating '0/100% uncertain'. So my feeling is that
> >> it's better for Selectivity to have a certainty component and then build
> >> selectivity arithmetic involving uncertainty, though I don't have a
> >> clear picture.
> 
> > I see.  Indeed, if we changed Selectivity so that it includes
> > information about default/non-default, the problems I mentioned
> > would be solved.  But I'm afraid that this modification would have
> > a big impact, since there are lots of functions that calculate
> > selectivity.  I also don't have a concrete idea, so I will think
> > about it.  Besides that, I'd like to ask the community whether such
> > a modification would be acceptable or not.
> 
> The planner definitely has a lot of problems that could be addressed
> with some sort of uncertainty framework ... but it'd be a huge
> undertaking, which is why nobody's tried yet.

Sure.

> A smaller-footprint way to fix the immediate problem might be to
> change the values of DEFAULT_INEQ_SEL and friends so that they're
> even less likely to be matched by accident.  That is, instead of
> 0., use 0.342 or some such.  It might

Sounds reasonable.

> seem that that's just moving the problem around, but I think it
> might be possible to show that such a value couldn't be computed
> by scalarltsel given a histogram with no more than 10000 members.
> (I haven't tried to actually prove that, but it seems intuitive
> that the set of possible results would be quantized with no more
> than about 5 digits precision.)

FWIW, I got the following result in my environment. It seems
different enough if this holds on all supported platforms, though
there is still a case where the result of a sequence of
arithmetic operations makes a false match.

x = 0.1
   d = 1.00e+30 match
   d = 1.00e+31: 0.00 match
x = 0.34197
   d = 0.00e+00 match
   d = 1.00e+00: 0.00 match

 t.c
#include 
#include 

int test(double x)
{
  double d = 1.0;
  double d0 = 0;

  fprintf(stderr, "x = %19.17f\n", x);
  while (d != d0)
  {
    double z = floor(d * 3);
    double z1 = z + 1.0;
    double y = d / z;
    double y1 = d / z1;

    /* Check whether neither side of d * 3 makes a match at this scale */
    if (y != x && y1 != x)
    {
      /* d0 = last scale with a match (0 if none), d = first scale without one */
      fprintf(stderr, "   d = %e match\n", d0);
      fprintf(stderr, "   d = %e: NOT match\n", d);
      return 0;
    }

    d0 = d;
    d = d * 10;
  }
  return 1;
}

int main(void)
{
  test(0.);
  test(0.342);

  return 0;
}


-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: OpenBSD versus semaphores

2019-01-07 Thread Tom Lane
Thomas Munro  writes:
> On Tue, Jan 8, 2019 at 7:14 PM Tom Lane  wrote:
>> So I looked around for an alternative, and found out that modern
>> OpenBSD releases support named POSIX semaphores (though not unnamed
>> ones, at least not shared unnamed ones).  What's more, it appears that
>> in this implementation, named semaphores don't eat open file descriptors
>> as they do on macOS, removing our major objection to using them.

> No OpenBSD here, but I was curious enough to peek at their
> implementation.  Like others, they create a tiny file under /tmp for
> each one, mmap() and close the fd straight away.

Oh, yeah, I can see a bunch of tiny mappings with procmap.  I wonder
whether that scales any better than an open FD per semaphore, when
it comes to forking a bunch of child processes that will inherit
all those mappings or FDs.  I've not tried to benchmark child process
launch as such --- as I said, I'm not running this on hardware that
would support serious benchmarking.

BTW, I just finished finding out that recent NetBSD (8.99.25) has
working code paths for *both* named and unnamed POSIX semaphores.
However, it appears that both code paths involve an open FD per
semaphore, so it's likely not something we want to recommend using.

regards, tom lane



Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-07 Thread Michael Paquier
On Tue, Jan 08, 2019 at 03:05:42PM +0900, Amit Langote wrote:
> I meant to say that the feature to add relations to a subscription is not
> supported for certain relation types, even though it's practically
> *possible* to do so.  But maybe, I'm misunderstanding the error code
> selection policy.

I can see your point, though I would stick with
ERRCODE_WRONG_OBJECT_TYPE for consistency with the existing code and
because the code is intended not to work on anything other than plain
tables, at least for now.

> I'm not totally opposed to do that while we're here, but note that there
> might be many such instances in the existing code.

Sure.  I am not noticing anything in the surrounding area but it is
easy enough to miss something.
--
Michael




Re: Improve selectivity estimate for range queries

2019-01-07 Thread Kyotaro HORIGUCHI
Mmm.

At Tue, 08 Jan 2019 16:26:38 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190108.162638.106314087.horiguchi.kyot...@lab.ntt.co.jp>
> FWIW, I got the following result on my environment. It seems
> different enough if this holds on all supported platforms, though
> there still is a case where the result of a sequence of
> arithmetics makes false match.
> 
> x = 0.1
>d = 1.00e+30 match
>d = 1.00e+31: 0.00 match

Of course the "match" in the last line above is a mistake of "NOT
match".


-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: OpenBSD versus semaphores

2019-01-07 Thread Mikael Kjellström



On 2019-01-08 07:14, Tom Lane wrote:

I've been toying with OpenBSD lately, and soon noticed a seriously
annoying problem for running Postgres on it: by default, its limits
for SysV semaphores are only SEMMNS=60, SEMMNI=10.  Not only does that
greatly constrain the number of connections for a single installation,
it means that our TAP tests fail because you can't start two postmasters
concurrently (cf [1]).

Raising the annoyance factor considerably, AFAICT the only way to
increase these settings is to build your own custom kernel.


You don't need to build a custom kernel to change those settings.

Just add:

kern.seminfo.semmni=20

to /etc/sysctl.conf and reboot

/Mikael



Re: OpenBSD versus semaphores

2019-01-07 Thread Abel Abraham Camarillo Ojeda
On Tue, Jan 8, 2019 at 12:14 AM Tom Lane  wrote:

> I've been toying with OpenBSD lately, and soon noticed a seriously
> annoying problem for running Postgres on it: by default, its limits
> for SysV semaphores are only SEMMNS=60, SEMMNI=10.  Not only does that
> greatly constrain the number of connections for a single installation,
> it means that our TAP tests fail because you can't start two postmasters
> concurrently (cf [1]).
>
> Raising the annoyance factor considerably, AFAICT the only way to
> increase these settings is to build your own custom kernel.
>

This is not accurate; you can change these values via sysctl(1).  Extracted
from the OpenBSD postgresql port:

Tuning for busy servers

=======================
The default sizes in the GENERIC kernel for SysV semaphores are only
just large enough for a database with the default configuration
(max_connections 40) if no other running processes use semaphores.
In other cases you will need to increase the limits. Adding the
following in /etc/sysctl.conf will be reasonable for many systems:

kern.seminfo.semmni=60
kern.seminfo.semmns=1024

To serve a large number of connections (>250), you may need higher
values for the above.



http://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/ports/databases/postgresql/pkg/README-server?rev=1.25&content-type=text/plain


> So I looked around for an alternative, and found out that modern
> OpenBSD releases support named POSIX semaphores (though not unnamed
> ones, at least not shared unnamed ones).  What's more, it appears that
> in this implementation, named semaphores don't eat open file descriptors
> as they do on macOS, removing our major objection to using them.
>
> I don't have any OpenBSD installation on hardware that I'd take very
> seriously for performance testing, but some light testing with
> "pgbench -S" suggests that a build with PREFERRED_SEMAPHORES=NAMED_POSIX
> has just about the same performance as a build with SysV semaphores.
>
> This all leads to the thought that maybe we should be selecting
> PREFERRED_SEMAPHORES=NAMED_POSIX on OpenBSD.  At the very least,
> our docs ought to recommend it as a credible alternative for
> people who don't want to get into building custom kernels.
>
> I've checked that this works back to OpenBSD 6.0, and scanning
> their man pages suggests that the feature appeared in 5.5.
> 5.5 isn't that old (2014) so possibly people are still running
> older versions, but we could easily put in version-specific
> default logic similar to what's in src/template/darwin.
>
> Thoughts?
>
> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/e6ecf989-9d5a-9dc5-12de-96985b6e5a5f%40mksoft.nu
>
>