Re: [HACKERS] Small fix: avoid passing null pointers to memcpy()

2019-05-26 Thread didier
Hi,

On Fri, May 24, 2019 at 6:35 PM Tom Lane  wrote:


> These seem to be down to use of AnyArrayType:
>
> typedef union AnyArrayType
> {
> ArrayType   flt;
> ExpandedArrayHeader xpn;
> } AnyArrayType;
>
> ASAN seems to believe that use of this union entitles the compiler to
> assume 8-byte alignment even when touching fields of a short-header

In my understanding, the union has to be aligned to ExpandedArrayHeader (8
bytes); otherwise it's undefined behavior.

On x64 it could be an issue: if the AnyArrayType alignment is less than 4,
SSE is enabled, and the compiler suddenly chooses to use SSE instructions
with a 16-byte alignment requirement, then it may not emit the right code.

That's not rhetorical: it's a big surprise the first time you get a SIGBUS,
or a SIGFPE doing integer math, on x64.
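
To illustrate with a standalone sketch (the struct members below are
invented stand-ins, not the real PostgreSQL definitions): a union's
alignment requirement is that of its most strictly aligned member, so
the compiler is entitled to assume it.

#include <stdalign.h>
#include <stdio.h>

/* stand-ins mirroring ArrayType (4-byte aligned) and
 * ExpandedArrayHeader (8-byte aligned) */
typedef struct { int vl_len_; int ndim; } ShortArray;
typedef struct { void *eoh_ptr; double d; } ExpandedArray;

typedef union
{
    ShortArray    flt;
    ExpandedArray xpn;
} AnyArrayDemo;

int
main(void)
{
    /* The union's alignment is that of its strictest member (8 on x64),
     * so casting a merely 4-byte-aligned ShortArray pointer to
     * AnyArrayDemo * and touching ->flt is undefined behavior. */
    printf("%zu %zu %zu\n", alignof(ShortArray),
           alignof(ExpandedArray), alignof(AnyArrayDemo));
    return 0;
}

On a typical x64 build this prints "4 8 8".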

Regards
Didier




Re: Read-only access to temp tables for 2PC transactions

2019-05-26 Thread Simon Riggs
On Fri, 24 May 2019 at 18:09, Andres Freund  wrote:

> Hi,
>
> On 2019-05-24 19:37:15 +0300, Konstantin Knizhnik wrote:
> > From my point of view releasing all temporary table locks after
> preparing of
> > 2PC transaction is not technically possible:
> > assume that this transaction has  updated some tuples of temporary table
> - them
> > are not visible to other transactions until 2PC is committed,
> > but since lock is removed, other transactions can update the same tuple.
>
> I don't think tuple-level actions are the problem? Those don't require
> table-level locks to be held.
>
> Generally, I fail to see how locks themselves are the problem.


Agreed


> The
> problem are the catalog entries for the temp table, the relation forks,
> and the fact that a session basically couldn't drop (and if created in
> that transaction, use) etc the temp table after the PREPARE.
>

I don't see a problem here, but I've run out of time to explain more for
a week.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


vacuumdb as server application in v12 release note

2019-05-26 Thread Tatsuo Ishii
In the v12 beta1 release note:

E.1.3.9. Server Applications

Allow vacuumdb to select tables for vacuum based on...

Why is vacuumdb listed in "Server Applications"?  It's in the "PostgreSQL
Client Applications" section in our manual.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: Fix typos for v12

2019-05-26 Thread Amit Kapila
On Sat, May 25, 2019 at 8:36 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > I have taken one pass over it and all fixes seem to be correct and got
> > introduced in v12.  I will re-verify them once again and then commit
> > your patch if I don't find any problem.  In the meantime, if anyone
> > else wants to look at it, that would be great.
>
> FWIW, I'd counsel against applying the changes in imath.h/.c, as that
> is not our code, and unnecessary variations from upstream will just
> make it harder to track upstream.
>

This occurred to me as well while reviewing, but I thought typo fixes
should be fine.  Anyway, I have excluded those before pushing.  So, if
we want to fix these, then maybe one has to get them fixed in upstream
first and then take the changes from there.

>  The rest of this looks fine.
>

Thanks, pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: This seems like very unfriendly behaviour

2019-05-26 Thread Dave Cramer
On Sun, 26 May 2019 at 01:40, Jaime Casanova 
wrote:

> On Sat, 25 May 2019 at 08:35, Dave Cramer  wrote:
> >
> > How do I get rid of this slot ?
> >
> > select pg_drop_replication_slot('mysub');
> > ERROR:  replication slot "mysub" is active for PID 13065
> > test_database=# select * from pg_subscription;
> >  subdbid | subname | subowner | subenabled | subconninfo | subslotname |
> subsynccommit | subpublications
> >
> -+-+--++-+-+---+-
> > (0 rows)
> >
> > test_database=# select * from pg_publication;
> >  pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete |
> pubtruncate
> >
> -+--+--+---+---+---+-
> > (0 rows)
> >
>
> Can you check "select * from pg_stat_replication"?
>
> also, what pid is being reported in pg_replication_slot for this slot?
> do you see a process in pg_stat_activity for that pid? in the os?
>

Well, it turned out it was on the receiver. I did get rid of it, but it's
still not a friendly message.

Thanks

Dave Cramer


Re: GSoD Introductory Resources and Tutorial Projects

2019-05-26 Thread Stephen Frost
Greetings,

* sharon clark (sc...@hotmail.com) wrote:
> I plan to submit a proposal for both the PostgreSQL Introductory Resources 
> and Tutorial projects, but I’m open to learning technologies for ANY other 
> projects listed. Please feel free to contact me with any questions.

Thanks for reaching out.  As part of the proposal, you'll want to
include a detailed description (a great deal more than what you included
in this initial email) of the technical writing project.  I encourage
you to reach out to the pgsql-docs list with your specific ideas and
suggestions around the topics you're interested in, so we can discuss them
and hopefully help you put together a good project proposal.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: This seems like very unfriendly behaviour

2019-05-26 Thread Andres Freund
Hi,

On May 26, 2019 9:49:49 AM EDT, Dave Cramer  wrote:
>On Sun, 26 May 2019 at 01:40, Jaime Casanova
>
>wrote:
>
>> On Sat, 25 May 2019 at 08:35, Dave Cramer 
>wrote:
>> >
>> > How do I get rid of this slot ?
>> >
>> > select pg_drop_replication_slot('mysub');
>> > ERROR:  replication slot "mysub" is active for PID 13065
>> > test_database=# select * from pg_subscription;
>> >  subdbid | subname | subowner | subenabled | subconninfo |
>subslotname |
>> subsynccommit | subpublications
>> >
>>
>-+-+--++-+-+---+-
>> > (0 rows)
>> >
>> > test_database=# select * from pg_publication;
>> >  pubname | pubowner | puballtables | pubinsert | pubupdate |
>pubdelete |
>> pubtruncate
>> >
>>
>-+--+--+---+---+---+-
>> > (0 rows)
>> >
>>
>> Can you check "select * from pg_stat_replication"?
>>
>> also, what pid is being reported in pg_replication_slot for this
>slot?
>> do you see a process in pg_stat_activity for that pid? in the os?
>>
>
>Well, it turned out it was on the receiver. I did get rid of it, but
>it's still not a friendly message.

What behavior would you like? It's similar to how we behave when dropping
databases, roles, etc.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: GSoD Introductory Resources and Tutorial Projects

2019-05-26 Thread sharon clark
Hello Stephen,

Thank you for the information. I'll be in touch.

Best wishes,

Sharon

Get Outlook for Android


From: Stephen Frost 
Sent: Sunday, May 26, 2019 7:33:25 AM
To: sharon clark
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: GSoD Introductory Resources and Tutorial Projects

Greetings,

* sharon clark (sc...@hotmail.com) wrote:
> I plan to submit a proposal for both the PostgreSQL Introductory Resources 
> and Tutorial projects, but I’m open to learning technologies for ANY other 
> projects listed. Please feel free to contact me with any questions.

Thanks for reaching out.  As part of the proposal, you'll want to
include a detailed description (a great deal more than what you included
in this initial email) of the technical writing project.  I encourage
you to reach out to the pgsql-docs list with your specific ideas and
suggestions around the topics you're interested in, so we can discuss them
and hopefully help you put together a good project proposal.

Thanks!

Stephen


Re: Fix typos for v12

2019-05-26 Thread Alexander Lakhin
26.05.2019 16:49, Amit Kapila wrote:
> This occurred to me as well while reviewing, but I thought typo fixes
> should be fine.  Anyway, I have excluded those before pushing.  So, if
> we want to fix these, then maybe one has to get them fixed in upstream
> first and then take the changes from there.
>
>>  The rest of this looks fine.
>>
> Thanks, pushed.
Thank you Amit!
I've filed a Pull Request in the imath project:
https://github.com/creachadair/imath/pull/39

Best regards,
Alexander




Different row estimations on base rels

2019-05-26 Thread Donald Dong
Hi,

I noticed that the estimated rows of the base relations during the join
search are *very* different from the estimations in the final plan.

Join search (rows of the initial_rels):
RELOPTINFO (ct): rows=1 width=4
RELOPTINFO (it): rows=1 width=4
RELOPTINFO (mc): rows=17567 width=32
RELOPTINFO (mi_idx): rows=1380035 width=8
RELOPTINFO (t): rows=2528356 width=25

The final plan:
Seq Scan on company_type ct
  (cost=0.00..1.05 rows=1 width=4)
Seq Scan on info_type it
  (cost=0.00..2.41 rows=1 width=4)
Parallel Seq Scan on movie_companies mc
  (cost=0.00..37814.90 rows=7320 width=32)
Parallel Seq Scan on movie_info_idx mi_idx
  (cost=0.00..13685.15 rows=575015 width=8)
Index Scan using title_pkey on title t
  (cost=0.43..0.58 rows=1 width=25)

By looking at joinrel->rows, I would expect relation t to have the
largest size; however, this is not true at all. I wonder what's causing
this, and how to get estimations close to those in the final plan?

Thank you,
Donald Dong




Re: Fix typos for v12

2019-05-26 Thread David Fetter
On Sun, May 26, 2019 at 06:43:41PM +0300, Alexander Lakhin wrote:
> 26.05.2019 16:49, Amit Kapila wrote:
> > This occurred to me as well while reviewing, but I thought typo fixes
> > should be fine.  Anyway, I have excluded those before pushing.  So, if
> > we want to fix these, then maybe one has to get them fixed in upstream
> > first and then take the changes from there.
> >
> >>  The rest of this looks fine.
> >>
> > Thanks, pushed.
> Thank you Amit!
> I've filed a Pull Request in the imath project:
> https://github.com/creachadair/imath/pull/39

I noticed that it's gone from upstream.  I also noticed that upstream
has done a release (in January) since our previous pull. Is it worth
trying to merge those changes in as they arrive?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Fix inconsistencies for v12

2019-05-26 Thread Amit Kapila
On Sun, May 26, 2019 at 2:20 AM Alexander Lakhin  wrote:
>
> Hello hackers,
>
> Please also consider fixing the following inconsistencies found in new
> v12 code:
>
> 1. AT_AddOids - remove (orphaned after 578b2297)
> 2. BeingModified ->TM_BeingModified (for consistency)
>

/*
- * A tuple is locked if HTSU returns BeingModified.
+ * A tuple is locked if HTSU returns TM_BeingModified.
  */
  if (htsu == TM_BeingModified)

I think the existing comment is quite clear.  I mean we can change it
if we want, but I don't see the dire need to do it.


> 3. copy_relation_data -> remove (orphaned after d25f5191)

- * reason is the same as in tablecmds.c's copy_relation_data(): we're
- * writing data that's not in shared buffers, and so a CHECKPOINT
- * occurring during the rewriteheap operation won't have fsync'd data we
- * wrote before the checkpoint.
+ * reason is that we're writing data that's not in shared buffers, and
+ * so a CHECKPOINT occurring during the rewriteheap operation won't
+ * have fsync'd data we wrote before the checkpoint.

It seems to me that the same thing was moved to storage.c's
RelationCopyStorage() in the commit you mentioned.  So, isn't it better
to change the comment accordingly rather than entirely removing the
reference to a similar comment in another place?

> 4. endblock -> endblk (an internal inconsistency)
> 5. ExecContextForcesOids - not changed, but may be should be removed
> (orphaned after 578b2297)

Yes, we should remove the use of ExecContextForcesOids.  We are using
es_result_relation_info at other places as well, so I think we can
change the comment to indicate the same.

>
> The separate patches for all the defects (except 5) are attached. In
> case a single patch is preferable, I can produce it too.
>

It is okay, we can review the individual patches and then combine them
later.  I haven't had a chance to review all of them yet.  Thanks for
working on this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Rearranging ALTER TABLE to avoid multi-operations bugs

2019-05-26 Thread Tom Lane
We've had numerous bug reports complaining about the fact that ALTER TABLE
generates subsidiary commands that get executed unconditionally, even
if they should be discarded due to an IF NOT EXISTS or other condition;
see e.g. #14827, #15180, #15670, #15710.  In [1] I speculated about
fixing this by having ALTER TABLE maintain an array of flags that record
the results of initial tests for column existence, and then letting it
conditionalize execution of subcommands on those flags.  I started to
fool around with that concept today, and quickly realized that my
original thought of just adding execute-if-this-flag-is-true markers to
AlterTableCmd subcommands was insufficient.  Most of the problems are with
independent commands that execute before or after the main AlterTable,
and would not have any easy connection to state maintained by AlterTable.

The way to fix this, I think, is to provide an AlterTableCmd subcommand
type that just wraps an arbitrary utility statement, and then we can
conditionalize execution of such subcommands using the flag mechanism.
So instead of generating independent "before" and "after" statements,
transformAlterTableStmt would just produce a single AlterTable with
everything in its list of subcommands --- but we'd still use the generic
ProcessUtility infrastructure to execute subcommands that correspond
to existing standalone statements.
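
To make that concrete, such a wrapper could look something like the
sketch below (a hypothetical illustration only; the node name and fields
are invented here, not taken from the attached patch):

/*
 * Hypothetical subcommand wrapping an arbitrary utility statement, so that
 * transformAlterTableStmt can emit a single AlterTable whose subcommand
 * list carries everything, while execution still goes through the generic
 * ProcessUtility() infrastructure.
 */
typedef struct AlterTableWrappedCmd
{
    NodeTag     type;
    Node       *stmt;       /* utility statement to hand to ProcessUtility() */
    int         exec_cond;  /* index of a flag recorded during execution
                             * (e.g. "column exists at this point"); the
                             * wrapped statement runs only if it is true */
} AlterTableWrappedCmd;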

Looking into parse_utilcmd.c with an eye to making it do that, I almost
immediately ran across bugs we hadn't even known were there in ALTER TABLE
ADD/DROP GENERATED.  These have got a different but arguably-related
flavor of bug: they are making decisions inside transformAlterTableStmt
that might be wrong by the time we get to execution.  Thus for example

regression=# create table t1 (f1 int);
CREATE TABLE
regression=# alter table t1 add column f2 int not null,
alter column f2 add generated always as identity;
ALTER TABLE
regression=# insert into t1 values(0);
ERROR:  no owned sequence found

This happens because transformAlterTableStmt thinks it can generate
the sequence creation commands for the AT_AddIdentity subcommand,
and also figures it's okay to just ignore the case where the column
doesn't exist.  So we create the column but then we don't make the
sequence.  There are similar bugs in AT_SetIdentity processing, and
I rather suspect that it's also unsafe for AT_AlterColumnType to be
looking at the column's attidentity state --- though I couldn't
demonstrate a bug in that path, because of the fact that 
AT_AlterColumnType executes in a pass earlier than anything that
could change attidentity.

This can't be fixed just by conditionalizing execution of subcommands,
because we need to know the target column's type in order to set up the
sequence correctly.  So what has to happen to fix these things is to
move the decisions, and the creation of the subcommand parsetrees,
into ALTER TABLE execution.

That requires pretty much the same support for recursively calling
ProcessUtility() from AlterTable() that we'd need for the subcommand
wrapper idea.  So I went ahead and tackled it as a separate project,
and attached is the result.

I'm not quite sure if I'm satisfied with the approach shown here.
I made a struct containing the ProcessUtility parameters that need
to be passed down through the recursion, originally with the idea
that this struct might be completely opaque outside utility.c.
However, there's a good deal of redundancy in that approach ---
the relid and stmt parameters of AlterTable() are really redundant
with stuff in the struct.  So now I'm wondering if it would be better
to merge all that stuff and just have the struct as AlterTable's sole
argument.  I'm also not very sure whether AlterTableInternal() ought
to be modified so that it uses or at least creates a valid struct;
it doesn't *need* to do so today, but maybe someday it will.
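
For reference, the struct looks something like this (a sketch for
illustration; the attached patch has the real definition):

typedef struct AlterTableUtilityContext
{
    PlannedStmt *pstmt;         /* PlannedStmt for the outer ALTER TABLE */
    const char  *queryString;   /* its query string */
    Oid          relid;         /* OID of ALTER TABLE's target table */
    ParamListInfo params;       /* any parameters available to ALTER TABLE */
    QueryEnvironment *queryEnv; /* execution environment for ALTER TABLE */
} AlterTableUtilityContext;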

And the whole thing has a faint air of grottiness about it too.
This makes the minimum changes to what we've got now, but I can't
help thinking it'd look different if we'd designed from scratch.
The interactions with event triggers seem particularly ad-hoc.
It's also ugly that CreateTable's recursion is handled differently
from AlterTable's.

Anybody have thoughts about a different way to approach it?

regards, tom lane

[1] https://postgr.es/m/7824.1525200...@sss.pgh.pa.us

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index e34d4cc..73c56ed 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -86,6 +86,7 @@
 #include "storage/lock.h"
 #include "storage/predicate.h"
 #include "storage/smgr.h"
+#include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -340,12 +341,15 @@ static void validateForeignKeyConstraint(char *conname,
 		 Relation rel, Relation pkrel,
 		 Oid pkindOid, Oid constraintOid);
 static void ATController(AlterTableStmt *parse

Re: Fix inconsistencies for v12

2019-05-26 Thread Tom Lane
Amit Kapila  writes:
> On Sun, May 26, 2019 at 2:20 AM Alexander Lakhin  wrote:
>> 5. ExecContextForcesOids - not changed, but may be should be removed
>> (orphaned after 578b2297)

> Yes, we should remove the use of ExecContextForcesOids.

Unless grep is failing me, ExecContextForcesOids is in fact gone.
All that's left is one obsolete mention in a comment, which should
certainly be cleaned up.

However, the full context of the mention is

/*
 * call ExecInitNode on each of the plans to be executed and save the
 * results into the array "mt_plans".  This is also a convenient place to
 * verify that the proposed target relations are valid and open their
 * indexes for insertion of new index entries.  Note we *must* set
 * estate->es_result_relation_info correctly while we initialize each
 * sub-plan; ExecContextForcesOids depends on that!
 */

which makes one wonder if the code to twiddle
estate->es_result_relation_info during subplan init is dead code.  If so
we probably ought to remove it, as it's surely confusing.  If it's not
dead, then this comment ought to be updated to explain the surviving
reason(s), not simply deleted.

regards, tom lane




Re: Confusing error message for REINDEX TABLE CONCURRENTLY

2019-05-26 Thread Michael Paquier
On Sat, May 25, 2019 at 02:42:59PM +1200, David Rowley wrote:
> Also, I think people probably will care more about the fact that
> nothing was done for that table rather than if the table happens to
> have no indexes. For the non-concurrently case, that just happened to
> be the same thing.

This is equally confusing for plain REINDEX, no?  Taking your
previous example:
=#  REINDEX TABLE listp;
WARNING:  0A000: REINDEX of partitioned tables is not yet implemented,
skipping "listp"
LOCATION:  reindex_relation, index.c:3513
NOTICE:  0: table "listp" has no indexes
LOCATION:  ReindexTable, indexcmds.c:2452
REINDEX

In this case the relation has partitioned indexes, not indexes, so
that's actually correct.  Still it seems to me that some users could
get confused by the current wording.

For invalid indexes you would get that:
=# create table aa (a int);
CREATE TABLE
=# insert into aa values (1),(1);
INSERT 0 2
=# create unique index concurrently aai on aa(a);
ERROR:  23505: could not create unique index "aai"
DETAIL:  Key (a)=(1) is duplicated.
SCHEMA NAME:  public
TABLE NAME:  aa
CONSTRAINT NAME:  aai
LOCATION:  comparetup_index_btree, tuplesort.c:405
=# reindex table concurrently aa;
WARNING:  0A000: cannot reindex invalid index "public.aai"
concurrently, skipping
LOCATION:  ReindexRelationConcurrently, indexcmds.c:2772
NOTICE:  0: table "aa" has no indexes
LOCATION:  ReindexTable, indexcmds.c:2452
REINDEX

As you mention, for reindex_relation() "no indexes" <=> "nothing to do";
still, let's not rely on that.  Instead of making the error message
specific to concurrent operations, I would suggest changing it to
"table foo has no indexes to reindex".  What do you think about the
attached?
--
Michael
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 40ea629ffe..ac4a32953a 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2447,7 +2447,7 @@ ReindexTable(RangeVar *relation, int options, bool concurrent)
 
 	if (!result)
 		ereport(NOTICE,
-(errmsg("table \"%s\" has no indexes",
+(errmsg("table \"%s\" has no indexes to reindex",
 		relation->relname)));
 
 	return heapOid;
@@ -2676,6 +2676,9 @@ ReindexMultipleTables(const char *objectName, ReindexObjectType objectKind,
  * The locks taken on parent tables and involved indexes are kept until the
  * transaction is committed, at which point a session lock is taken on each
  * relation.  Both of these protect against concurrent schema changes.
+ *
+ * Returns true if any indexes have been rebuilt (including toast table's
+ * indexes when relevant).
  */
 static bool
 ReindexRelationConcurrently(Oid relationOid, int options)
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index c8bc6be061..4b62c9045a 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1944,9 +1944,9 @@ DROP TABLE reindex_verbose;
 CREATE TABLE concur_reindex_tab (c1 int);
 -- REINDEX
 REINDEX TABLE concur_reindex_tab; -- notice
-NOTICE:  table "concur_reindex_tab" has no indexes
+NOTICE:  table "concur_reindex_tab" has no indexes to reindex
 REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
-NOTICE:  table "concur_reindex_tab" has no indexes
+NOTICE:  table "concur_reindex_tab" has no indexes to reindex
 ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
 -- Normal index with integer column
 CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
@@ -2043,10 +2043,10 @@ ERROR:  REINDEX is not yet implemented for partitioned indexes
 -- REINDEX is a no-op for partitioned tables
 REINDEX TABLE concur_reindex_part_10;
 WARNING:  REINDEX of partitioned tables is not yet implemented, skipping "concur_reindex_part_10"
-NOTICE:  table "concur_reindex_part_10" has no indexes
+NOTICE:  table "concur_reindex_part_10" has no indexes to reindex
 REINDEX TABLE CONCURRENTLY concur_reindex_part_10;
 WARNING:  REINDEX of partitioned tables is not yet implemented, skipping "concur_reindex_part_10"
-NOTICE:  table "concur_reindex_part_10" has no indexes
+NOTICE:  table "concur_reindex_part_10" has no indexes to reindex
 SELECT relid, parentrelid, level FROM pg_partition_tree('concur_reindex_part_index')
   ORDER BY relid, level;
  relid | parentrelid | level 
@@ -2150,7 +2150,7 @@ DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
 -- The invalid index is not processed when running REINDEX TABLE.
 REINDEX TABLE CONCURRENTLY concur_reindex_tab4;
 WARNING:  cannot reindex invalid index "public.concur_reindex_ind5" concurrently, skipping
-NOTICE:  table "concur_reindex_tab4" has no indexes
+NOTICE:  table "concur_reindex_tab4" has no indexes to reindex
 \d concur_reindex_tab4
 Table "public.concur_reindex_tab4"
  Column |  Type   | Collation | Nullable | Default 


signature.asc
Description: PGP signature


Re: Why does pg_checksums -r not have a long option?

2019-05-26 Thread Michael Paquier
On Sun, May 26, 2019 at 08:35:30AM +0200, Fabien COELHO wrote:
> Probably? Attached a patch.

No objections to adding a long option for that stuff.  But I do have
an objection to the naming, because we have another tool able to work
on relfilenodes:
$ oid2name --help | grep FILE
  -f, --filenode=FILENODEshow info for table with given file node

In this case, long options are new as of 1aaf532 which is recent, but
-f is around for a much longer time.

Perhaps we should use the same mapping for consistency?
pg_verify_checksums has been using -r for whatever reason, but since we
are renaming the binary for v12 we could just fix that inconsistency as
well.  Hence I would suggest this for the option
description:
"-f, --filenode=FILENODE"

I would also switch to the long option name in the tests at the same
time, this makes the perl scripts easier to follow.
--
Michael


signature.asc
Description: PGP signature


BEFORE UPDATE trigger on postgres_fdw table not work

2019-05-26 Thread Shohei Mochizuki

Hi,

I noticed that returning a modified record in a row-level BEFORE UPDATE trigger
on postgres_fdw foreign tables does not work. The attached patch fixes this issue.

Below is a scenario, similar to the postgres_fdw tests, that reproduces the issue.

postgres=# CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS 
(dbname 'postgres',port '5432');
postgres=# CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
postgres=# create table loc1 (f1 serial, f2 text);
postgres=# create foreign table rem1 (f1 serial, f2 text)
postgres-#   server loopback options(table_name 'loc1');

postgres=# CREATE FUNCTION trig_row_before_insupdate() RETURNS TRIGGER AS $$
postgres$#   BEGIN
postgres$# NEW.f2 := NEW.f2 || ' triggered !';
postgres$# RETURN NEW;
postgres$#   END
postgres$# $$ language plpgsql;

postgres=# CREATE TRIGGER trig_row_before_insupd BEFORE INSERT OR UPDATE ON rem1
postgres-# FOR EACH ROW EXECUTE PROCEDURE trig_row_before_insupdate();

-- insert trigger is OK
postgres=# INSERT INTO rem1 values(1, 'insert');
postgres=# SELECT * FROM rem1;
 f1 |         f2
----+--------------------
  1 | insert triggered !
(1 row)

-- update trigger is OK if we update f2
postgres=# UPDATE rem1 set f2 = 'update';
postgres=# SELECT * FROM rem1;
 f1 |         f2
----+--------------------
  1 | update triggered !


Without attached patch:

postgres=# UPDATE rem1 set f1 = 10;
postgres=# SELECT * FROM rem1;
 f1 |         f2
----+--------------------
 10 | update triggered !
(1 row)

f2 should have been updated by the trigger, but it was not.
This is because the current FDW code adds to the remote SQL only the
columns that were explicitly targets of the UPDATE, as follows.

postgres=# EXPLAIN (verbose, costs off)
UPDATE rem1 set f1 = 10;
                              QUERY PLAN
-----------------------------------------------------------------------
 Update on public.rem1
   Remote SQL: UPDATE public.loc1 SET f1 = $2 WHERE ctid = $1  <--- not set f2
   ->  Foreign Scan on public.rem1
 Output: 10, f2, ctid, rem1.*
 Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
(5 rows)

With the attached patch, f2 is updated by the trigger and "f2 = $3" is
added to the remote SQL, as follows.

postgres=# UPDATE rem1 set f1 = 10;
postgres=# select * from rem1;
 f1 |               f2
----+--------------------------------
 10 | update triggered ! triggered !
(1 row)

postgres=# EXPLAIN (verbose, costs off)
postgres-# UPDATE rem1 set f1 = 10;
                              QUERY PLAN
-----------------------------------------------------------------------
 Update on public.rem1
   Remote SQL: UPDATE public.loc1 SET f1 = $2, f2 = $3 WHERE ctid = $1
   ->  Foreign Scan on public.rem1
 Output: 10, f2, ctid, rem1.*
 Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
(5 rows)

My patch adds all columns to the target list of the remote UPDATE query,
as in the INSERT case, if a BEFORE UPDATE trigger exists.

I tried to add only the columns modified by the trigger to the target
list of the remote UPDATE query, but I could not find a simple way to do
that: the UPDATE query is built during the planning phase, in
postgresPlanForeignModify, while which columns a trigger modifies is
hard to decide until query execution.
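
Roughly, the planning-side check has the following shape (an abridged
sketch of the idea, not the exact patch text; trig_update_before_row is
the existing TriggerDesc flag for row-level BEFORE UPDATE triggers):

/* Transmit all columns for an UPDATE, as we already do for INSERT, when
 * a row-level BEFORE UPDATE trigger may change the new tuple. */
if (operation == CMD_INSERT ||
    (operation == CMD_UPDATE &&
     rel->trigdesc &&
     rel->trigdesc->trig_update_before_row))
{
    TupleDesc   tupdesc = RelationGetDescr(rel);
    int         attnum;

    for (attnum = 1; attnum <= tupdesc->natts; attnum++)
    {
        Form_pg_attribute attr = TupleDescAttr(tupdesc, attnum - 1);

        if (!attr->attisdropped)
            targetAttrs = lappend_int(targetAttrs, attnum);
    }
}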

Regards,

--
Shohei Mochizuki
TOSHIBA CORPORATION
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index e034b03..546d97b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -6558,6 +6558,14 @@ SELECT * from loc1;
   2 | skidoo triggered !
 (2 rows)
 
+UPDATE rem1 set f1 = 10;
+SELECT * from loc1;
+ f1 |   f2   
+----+--------------------------------
+ 10 | skidoo triggered ! triggered !
+ 10 | skidoo triggered ! triggered !
+(2 rows)
+
 DELETE FROM rem1;
 -- Add a second trigger, to check that the changes are propagated correctly
 -- from trigger to trigger
@@ -6670,7 +6678,7 @@ NOTICE:  trig_row_after(23, skidoo) AFTER ROW INSERT ON 
rem1
 NOTICE:  NEW: (13,"test triggered !")
   ctid  
 
- (0,27)
+ (0,29)
 (1 row)
 
 -- cleanup
@@ -6774,10 +6782,10 @@ BEFORE UPDATE ON rem1
 FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
 EXPLAIN (verbose, costs off)
 UPDATE rem1 set f2 = '';  -- can't be pushed down
- QUERY PLAN  
--
+  QUERY PLAN   
+---
  Update on public.rem1
-   Remote SQL: UPDATE public.loc1 SET f2 = $2 WHERE ctid = $1
+   Remote SQL: UPDATE public.loc1 SET f1 = $2, f2 = $3 WHERE ctid = $1
->  Foreign Scan on public.rem1
  Output: f1, ''::text, ctid, rem1.*
  Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
@@ -7404,12 +7412,12 @@ AFTER UPDATE OR DELETE ON bar2
 FOR EACH ROW EXECUTE PROCEDURE trigger_

Re: Inconsistent error message wording for REINDEX CONCURRENTLY

2019-05-26 Thread Michael Paquier
On Wed, May 15, 2019 at 11:17:51AM +0900, Michael Paquier wrote:
> Perhaps something like the attached for the REINDEX portion would make
> the world a better place?  What do you think?

Alvaro, do you have extra thoughts about this patch improving the
error message consistency for REINDEX CONCURRENTLY.  I quite like the
suggestions you made and this makes the error strings more
project-like, so I would like to apply it.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Unlogged tables cleanup

2019-05-26 Thread Michael Paquier
On Thu, May 23, 2019 at 09:14:59AM -0400, Robert Haas wrote:
> Suppose we create an unlogged table and then crash. The main fork
> makes it to disk, and the init fork does not.  Before WAL replay, we
> remove any main forks that have init forks, but because the init fork
> was lost, that does not happen.  Recovery recreates the init fork.
> After WAL replay, we try to copy_file() each _init fork to the
> corresponding main fork. That fails, because copy_file() expects to be
> able to create the target file, and here it can't do that because it
> already exists.

Ah, sorry.  Now I understand your point.  I read your previous message
wrong: I was thinking about the fact that init forks are WAL-logged,
hence after a crash you should still have them at the end of recovery,
and I got confused when you wrote that init forks just disappear.  However
you were referring to the state of a cluster before recovery begins,
while I was thinking about the state after the cluster has run recovery
and reached consistency.

If we should do something about such cases, then doesn't this bring us
back to doing (INIT | CLEANUP) at the end of recovery anyway?
--
Michael


signature.asc
Description: PGP signature


Re: initdb recommendations

2019-05-26 Thread Michael Paquier
On Fri, May 24, 2019 at 08:23:57AM -0700, Noah Misch wrote:
> Our sspi auth is a more-general version of peer auth, and it works over TCP.
> It would be a simple matter of programming to support "peer" on Windows,
> consisting of sspi auth with an implicit pg_ident map.

I am not sure that it is worth complicating the HBA rules with an extra
alias, knowing that it is already possible to define a pg_ident map that
uses a regex matching pattern.
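
For illustration only (the map name and domain below are made up), an
sspi entry can already be given a peer-like identity mapping:

# pg_hba.conf
host    all    all    127.0.0.1/32    sspi    map=winlocal

# pg_ident.conf: strip a hypothetical EXAMPLE domain suffix so each
# Windows account maps onto the same-named database role
# MAPNAME   SYSTEM-USERNAME     PG-USERNAME
winlocal    /^(.*)@EXAMPLE$     \1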

> Nonetheless, I agree password would be fine.

Fine for me.
--
Michael


signature.asc
Description: PGP signature


Re: docs about FKs referencing partitioned tables

2019-05-26 Thread Michael Paquier
On Thu, May 23, 2019 at 12:02:56AM -0700, Paul A Jungwirth wrote:
> The section in the docs (5.10) just before the one I changed has
> similar warnings:
> 
>> Other types of constraints (unique, primary key, and foreign key
>> constraints) are not inherited. 
> 
> and
> 
>> A serious limitation of the inheritance feature is that indexes
>> (including unique constraints) and foreign key constraints only
>> apply to single tables, not to their inheritance children.

Yes.

> I moved the paragraph to a section describing inheritance as an
> alternative partitioning solution to declarative partitions. Since
> using inheritance to partition a table requires giving up foreign
> keys, it seems worthwhile to include that among the other caveats. (It
> wasn't necessary to include it before because declarative partitions
> had the same drawback, and it was already expressed in the paragraph I
> took out.) In my opinion mentioning this limitation would be helpful
> to people.

Well, the point I would like to outline is that section 5.11.2 about
declarative partitioning and 5.11.3 about partitioning with
inheritance cover two separate, independent partitioning
methods.  So removing the paragraph from the declarative partitioning
section mentioning foreign keys referencing partitioned tables is
fine, because that's not the case anymore...

> I was trying to make a minimal change by keeping most of the original
> wording, but I agree that different language would be more accurate.
> What do you think of something like this?:
> 
> + 
> +  
> +   While foreign keys may be defined that reference a parent
> +   table, they will not see records from its child tables. Since
> +   the parent table is typically empty, adding any record (with a
> +   non-null foreign key) to the referencing table will raise an error.
> +  
> + 

...  However you are adding a paragraph for something which is
completely unrelated to the issue we are trying to fix.  If I were to
add something, I think that I would be more general than what you are
trying here and just mention a link to the previous paragraph about
the caveats of inheritance as they apply to single table members of an
inheritance tree and not a full set:
"Indexes and foreign key constraint apply to single tables and not
their inheritance children, hence they have some caveats to
be aware of."
Still this is a duplicate of a sentence which is just a couple of
paragraphs back.
--
Michael


signature.asc
Description: PGP signature


Re: Inconsistent error message wording for REINDEX CONCURRENTLY

2019-05-26 Thread Alvaro Herrera
On 2019-May-27, Michael Paquier wrote:

> On Wed, May 15, 2019 at 11:17:51AM +0900, Michael Paquier wrote:
> > Perhaps something like the attached for the REINDEX portion would make
> > the world a better place?  What do you think?
> 
> Alvaro, do you have extra thoughts about this patch improving the
> error message consistency for REINDEX CONCURRENTLY.  I quite like the
> suggestions you made and this makes the error strings more
> project-like, so I would like to apply it.

I wonder if we really want to abolish all distinction between "cannot do
X" and "Y is not supported".  I take the former to mean that the
operation is impossible to do for some reason, while the latter means we
just haven't implemented it yet and it seems likely to get implemented
in a reasonable timeframe.  See some excellent commentary about
the "can not" wording at
https://postgr.es/m/CA+TgmoYS8jKhETyhGYTYMcbvGPwYY=qa6yyp9b47mx7mwee...@mail.gmail.com

I notice your patch changes "catalog relations" to "system catalogs".
I think we predominantly prefer the latter, so that part of your change
seems OK.  (In passing, I noticed we have a couple of places using
"system catalog tables", which is weird.)

I think reindexing system catalogs concurrently is a complex enough
undertaking that implementing it is far enough in the future that the
"cannot" wording is okay; but reindexing partitioned tables is not so
obviously out of the question.  We do have "is not yet implemented" in a
couple of other places, so all things considered I'm not so sure about
changing that one to "cannot".

-- 
Álvaro Herrera             https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2019-05-26 Thread Kyotaro HORIGUCHI
Thanks for the comment!

At Fri, 24 May 2019 19:33:32 -0700, Noah Misch  wrote in 
<20190525023332.ge1624...@rfd.leadboat.com>
> On Mon, May 20, 2019 at 03:54:30PM +0900, Kyotaro HORIGUCHI wrote:
> > Following this direction, the attached PoC works *at least for*
> > the wal_optimization TAP tests, but doing pending flush not in
> > smgr but in relcache.
> 
> This task, syncing files created in the current transaction, is not the kind
> of task normally assigned to a cache.  We already have a module, storage.c,
> that maintains state about files created in the current transaction.  Why did
> you use relcache instead of storage.c?

The reason was that at-commit sync needs a buffer flush beforehand. But
FlushRelationBufferWithoutRelCache() in v11 can do that, so storage.c is
reasonable as the place.

> On Tue, May 21, 2019 at 09:29:48PM +0900, Kyotaro HORIGUCHI wrote:
> > This is a tidier version of the patch.
> 
> > - Move the substantial work to table/index AMs.
> > 
> >   Each AM can decide whether to support WAL skip or not.
> >   Currently heap and nbtree support it.
> 
> Why would an AM find it important to disable WAL skip?

The reason is that currently it's the AM's responsibility to decide
whether to skip WAL or not.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





RE: Why does not subquery pruning conditions inherit to parent query?

2019-05-26 Thread Kato, Sho
Friday, May 24, 2019 5:10 PM, David Rowley wrote:
> The planner can only push quals down into a subquery, it cannot pull quals
> from a subquery into the outer query.
> 
> If you write the query like:
> 
> explain select * from jta, (select a, max(b) from jtb group by a ) c1
> where jta.a = c1.a and c1.a = 1;
> 
> you should get the plan that you want.

Thank you for your replay.

You are right; I should do that.
However, in the following query it looks like the subquery qual is pushed
down into the outer query.

postgres=# explain select * from jta, (select a from jtb where a = 1) c1 where 
jta.a = c1.a;
                            QUERY PLAN
------------------------------------------------------------------
 Nested Loop  (cost=0.00..81.94 rows=143 width=8)
   ->  Seq Scan on jta0  (cost=0.00..41.88 rows=13 width=4)
 Filter: (a = 1)
   ->  Materialize  (cost=0.00..38.30 rows=11 width=4)
 ->  Seq Scan on jtb0  (cost=0.00..38.25 rows=11 width=4)
   Filter: (a = 1)
(6 rows)

So, I think this behavior could be improved.
Why does such a difference occur?

regards,

Sho Kato
> -Original Message-
> From: David Rowley [mailto:david.row...@2ndquadrant.com]
> Sent: Friday, May 24, 2019 5:10 PM
> To: Kato, Sho/加藤 翔 
> Cc: pgsql-hack...@postgresql.org
> Subject: Re: Why does not subquery pruning conditions inherit to parent
> query?
> 
> On Fri, 24 May 2019 at 19:44, Kato, Sho  wrote:
> > I execute following query to the partitioned table, but the plan is
> different from my assumption, so please tell me the reason.
> >
> > postgres=# explain select * from jta, (select a, max(b)  from jtb where
> a = 1 group by a ) c1 where jta.a = c1.a;
> >QUERY PLAN
> >
> 
> --
> > --  Hash Join  (cost=38.66..589.52 rows=1402 width=12)
> >Hash Cond: (jta0.a = jtb0.a)
> >->  Append  (cost=0.00..482.50 rows=25500 width=4)
> >  ->  Seq Scan on jta0  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta1  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta2  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta3  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta4  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta5  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta6  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta7  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta8  (cost=0.00..35.50 rows=2550 width=4)
> >  ->  Seq Scan on jta9  (cost=0.00..35.50 rows=2550 width=4)
> >->  Hash  (cost=38.53..38.53 rows=11 width=8)
> >  ->  GroupAggregate  (cost=0.00..38.42 rows=11 width=8)
> >Group Key: jtb0.a
> >->  Seq Scan on jtb0  (cost=0.00..38.25 rows=11
> width=8)
> >  Filter: (a = 1)
> > (18 rows)
> >
> > I assume that subquery aggregate only pruned table and parent query
> joins pruned table and subquery results.
> > However, parent query scan all partitions and join.
> > In my investigation, because is_simple_query() returns false if
> subquery contains GROUP BY, parent query does not prune.
> > Is it possible to improve this?
> 
> The planner can only push quals down into a subquery, it cannot pull quals
> from a subquery into the outer query.
> 
> If you write the query like:
> 
> explain select * from jta, (select a, max(b) from jtb group by a ) c1
> where jta.a = c1.a and c1.a = 1;
> 
> you should get the plan that you want.
> 
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
> 
> 



Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-05-26 Thread Amit Langote
On 2019/05/24 23:28, Tom Lane wrote:
> So my thought, if we want to do something about this, is not "find
> some things we can pfree at the end of planning" but "find a way
> to use a separate context for each statement in the query string".
> Maybe multi-query strings could be handled by setting up a child
> context of MessageContext (after we've done the raw parsing there
> and determined that indeed there are multiple queries), running
> parse analysis and planning in that context, and resetting that
> context after each query.

Maybe like the attached?  I'm not sure if we need to likewise be concerned
about exec_sql_string() being handed multi-query strings.

Thanks,
Amit
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..0165f794b0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -984,7 +984,8 @@ static void
 exec_simple_query(const char *query_string)
 {
CommandDest dest = whereToSendOutput;
-   MemoryContext oldcontext;
+   MemoryContext oldcontext,
+ stmtcontext;
List   *parsetree_list;
ListCell   *parsetree_item;
boolsave_log_statement_stats = log_statement_stats;
@@ -1132,10 +1133,14 @@ exec_simple_query(const char *query_string)
 		/*
 		 * OK to analyze, rewrite, and plan this query.
 		 *
-		 * Switch to appropriate context for constructing querytrees (again,
-		 * these must outlive the execution context).
+		 * Switch to appropriate context for constructing querytrees.
+		 * Memory allocated during this construction is released before
+		 * the generated plan is executed.
 		 */
-		oldcontext = MemoryContextSwitchTo(MessageContext);
+		stmtcontext = AllocSetContextCreate(MessageContext,
+											"statement parse/plan context",
+											ALLOCSET_DEFAULT_SIZES);
+		oldcontext = MemoryContextSwitchTo(stmtcontext);
 
 		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
 												NULL, 0, NULL);
@@ -1143,6 +1148,14 @@ exec_simple_query(const char *query_string)
 		plantree_list = pg_plan_queries(querytree_list,
 										CURSOR_OPT_PARALLEL_OK, NULL);
 
+		/*
+		 * Copy the plan trees into the longer-lived MessageContext and delete
+		 * the temporary context used to generate them.
+		 */
+		MemoryContextSwitchTo(MessageContext);
+		plantree_list = copyObject(plantree_list);
+		MemoryContextDelete(stmtcontext);
+
 		/* Done with the snapshot used for parsing/planning */
 		if (snapshot_set)
 			PopActiveSnapshot();


some questions about fast-path-lock

2019-05-26 Thread Alex
I got some ideas from the README under storage/lmgr and read some of the
code of LockAcquireExtended, but I still have some questions now.

LWLockAcquire(&MyProc->backendLock, LW_EXCLUSIVE);
if (FastPathStrongRelationLocks->count[fasthashcode] != 0)
    acquired = false;
else
    acquired = FastPathGrantRelationLock(locktag->locktag_field2,
                                         lockmode);

1. In the README, it says: "A key point of this algorithm is that it
must be possible to verify the absence of possibly conflicting locks
without fighting over a shared LWLock or spinlock.  Otherwise, this
effort would simply move the contention bottleneck from one place to
another."

But in the code above there is an LWLockAcquire.  Actually, I can't
figure out how we could proceed without a lock.

2. Why does the MyProc->backendLock work?  It is per-backend (MyProc),
not a global lock.

3. For the line "acquired =
FastPathGrantRelationLock(locktag->locktag_field2, lockmode);", I think
it should be replaceable with "acquired = true" (but obviously I'm
wrong).  I read FastPathGrantRelationLock but can't understand it.


Any hint will be helpful.  Thanks!


Re: Why does pg_checksums -r not have a long option?

2019-05-26 Thread Fabien COELHO


Hello Michael-san,


No objections with adding a long option for that stuff.  But I do have
an objection with the naming because we have another tool able to work
on relfilenodes:
$ oid2name --help | grep FILE
 -f, --filenode=FILENODEshow info for table with given file node

In this case, long options are new as of 1aaf532 which is recent, but
-f is around for a much longer time.

Perhaps we should use the same mapping for consistency?

pg_verify_checksums has been using -r for whatever reason, but as we
do a renaming of the binary for v12 we could just fix that
inconsistency as well.  Hence I would suggest that for the option
description:
"-f, --filenode=FILENODE"


Fine with me. I was not particularly happy with "relfilenode", but just
picked it up for consistency with -r.



I would also switch to the long option name in the tests at the same
time, this makes the perl scripts easier to follow.


Ok. Attached.

I've used both -f and --filenode in the test to check that the renaming
was ok. I have reordered the options in the documentation so that they
appear in alphabetical order, as for some reason --progress was out of
order.


--
Fabien.diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index a0ffeb0ab0..6d8dd6be29 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -100,6 +100,16 @@ PostgreSQL documentation
   
  
 
+ 
+  -f filenode
+  --filenode=filenode
+  
+   
+Only validate checksums in the relation with specified relation file node.
+   
+  
+ 
+
  
   -N
   --no-sync
@@ -116,25 +126,6 @@ PostgreSQL documentation
   
  
 
- 
-  -v
-  --verbose
-  
-   
-Enable verbose output. Lists all checked files.
-   
-  
- 
-
- 
-  -r relfilenode
-  
-   
-Only validate checksums in the relation with specified relfilenode.
-   
-  
- 
-
  
   -P
   --progress
@@ -146,6 +137,16 @@ PostgreSQL documentation
   
  
 
+ 
+  -v
+  --verbose
+  
+   
+Enable verbose output. Lists all checked files.
+   
+  
+ 
+
  
-V
--version
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 37fe20bb75..cd621e5417 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -36,7 +36,7 @@ static int64 blocks = 0;
 static int64 badblocks = 0;
 static ControlFileData *ControlFile;
 
-static char *only_relfilenode = NULL;
+static char *only_filenode = NULL;
 static bool do_sync = true;
 static bool verbose = false;
 static bool showprogress = false;
@@ -83,7 +83,7 @@ usage(void)
 	printf(_("  -N, --no-sync  do not wait for changes to be written safely to disk\n"));
 	printf(_("  -P, --progress show progress information\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
-	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
+	printf(_(" [-f, --filenode]=NODE   check only relation with specified file node\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
 	printf(_("\nIf no data directory (DATADIR) is specified, "
@@ -318,7 +318,7 @@ scan_directory(const char *basedir, const char *subdir, bool sizeonly)
 			/*
 			 * Cut off at the segment boundary (".") to get the segment number
 			 * in order to mix it into the checksum. Then also cut off at the
-			 * fork boundary, to get the relfilenode the file belongs to for
+			 * fork boundary, to get the relation file node the file belongs to for
 			 * filtering.
 			 */
 			strlcpy(fnonly, de->d_name, sizeof(fnonly));
@@ -339,7 +339,7 @@ scan_directory(const char *basedir, const char *subdir, bool sizeonly)
 			if (forkpath != NULL)
 *forkpath++ = '\0';
 
-			if (only_relfilenode && strcmp(only_relfilenode, fnonly) != 0)
+			if (only_filenode && strcmp(only_filenode, fnonly) != 0)
 /* Relfilenode not to be included */
 continue;
 
@@ -373,6 +373,7 @@ main(int argc, char *argv[])
 		{"enable", no_argument, NULL, 'e'},
 		{"no-sync", no_argument, NULL, 'N'},
 		{"progress", no_argument, NULL, 'P'},
+		{"filenode", required_argument, NULL, 'f'},
 		{"verbose", no_argument, NULL, 'v'},
 		{NULL, 0, NULL, 0}
 	};
@@ -400,7 +401,7 @@ main(int argc, char *argv[])
 		}
 	}
 
-	while ((c = getopt_long(argc, argv, "cD:deNPr:v", long_options, &option_index)) != -1)
+	while ((c = getopt_long(argc, argv, "cD:deNPf:v", long_options, &option_index)) != -1)
 	{
 		switch (c)
 		{
@@ -422,13 +423,13 @@ main(int argc, char *argv[])
 			case 'D':
 DataDir = optarg;
 break;
-			case 'r':
+			case 'f':
 if (atoi(optarg) == 0)
 {
-	pg_log_error("invalid relfilenode specification, must be numeric: %s", optarg);
+	pg_log_error(

Re: Excessive memory usage in multi-statement queries w/ partitioning

2019-05-26 Thread Amit Langote
Hi,

On 2019/05/24 21:18, Joe Conway wrote:
> On 5/24/19 1:47 AM, Amit Langote wrote:
>> I too have had similar thoughts on the matter.  If the planner had built
>> all its subsidiary data structures in its own private context (or tree of
>> contexts) which is reset once a plan for a given query is built and passed
>> on, then there wouldn't be an issue of all of that subsidiary memory
>> leaking into MessageContext.  However, the problem may really be that
>> we're subjecting the planner to use cases that it wasn't perhaps designed
>> to perform equally well under -- running it many times while handling the
>> same message.  It is worsened by the fact that the query in question is
>> something that ought to have been documented as not well supported by the
>> planner; David has posted a documentation patch for that [1].  PG 12 has
>> alleviated the situation to a large degree, so you won't see the OOM
>> occurring for this query, but not for all queries unfortunately.
> 
> I admittedly haven't followed this thread too closely, but if having 100
> partitions causes out of memory on pg11, that sounds like a massive
> regression to me.

You won't run out of memory if you are running just one query per message,
but that's not the case in this discussion.  With multi-query submissions
like in this case, memory taken up by parsing and planning of *all*
queries adds up in a single MessageContext, which can lead to OOM if there
are enough queries to load up MessageContext beyond the limit.  The only point
I was trying to make in what I wrote is that reaching OOM of this sort is
easier with partitioning, because of the age-old behavior that planning
UPDATE/DELETE queries on inherited tables (and so partitioned tables)
needs tons of memory that grows as the number of child tables / partitions
increases.

We fixed things in PG 12, at least for partitioning, so that as long as a
query needs to affect only a small number of partitions of the total
present, its planning will use only a fixed amount of CPU and memory, so
increasing the number of partitions won't lead to explosive growth in
memory used.  You might be able to tell, however, that that effort had
nothing to do with improving the situation with multi-query submissions.

Thanks,
Amit