Re: slight tweaks to documentation about runtime pruning

2018-12-09 Thread David Rowley
On Wed, 5 Dec 2018 at 20:24, Amit Langote  wrote:
> Documentation of run-time pruning tells readers to inspect the "nloops"
> property of the EXPLAIN ANALYZE output, but I think that's a typo of
> "loops", which is what's actually output (the internal variable used to
> track that property is indeed named nloops).

I agree. The 'n' should be dropped there.

> However, for pruned partitions' subplans, what's actually shown is the
> string "(never executed)", not loops.  So, wouldn't it be better to tell
> the readers to look for that instead of "loops"?

I don't really see any issues with the existing documentation here.
Remember that pruning can be performed multiple times: when a parameter
that was found to match the partition key changes, the
Append/MergeAppend is rescanned and pruning runs again. For example:

create table listp (a int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
create index on listp(a);
set enable_bitmapscan=0;
explain analyze select * from (values(1),(1),(2)) a(a) inner join
listp l on a.a = l.a;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..91.50 rows=76 width=8) (actual time=0.013..0.013 rows=0 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=4) (actual time=0.001..0.002 rows=3 loops=1)
   ->  Append  (cost=0.15..30.23 rows=26 width=4) (actual time=0.003..0.003 rows=0 loops=3)
         ->  Index Only Scan using listp1_a_idx on listp1 l  (cost=0.15..15.05 rows=13 width=4) (actual time=0.002..0.002 rows=0 loops=2)
               Index Cond: (a = "*VALUES*".column1)
               Heap Fetches: 0
         ->  Index Only Scan using listp2_a_idx on listp2 l_1  (cost=0.15..15.05 rows=13 width=4) (actual time=0.003..0.003 rows=0 loops=1)
               Index Cond: (a = "*VALUES*".column1)
               Heap Fetches: 0
 Planning Time: 0.158 ms
 Execution Time: 0.042 ms
(11 rows)

listp1 was scanned twice (loops=2), listp2 was scanned just once.

Now it is true that if the subplan was executed 0 times it will appear
as "(never executed)", but do we really need to explain in this area
that "(never executed)" means loops=0?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Should new partitions inherit their tablespace from their parent?

2018-12-09 Thread David Rowley
Thank you for looking at this.

On Fri, 7 Dec 2018 at 20:15, Michael Paquier  wrote:
> -ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace)
> +ATExecSetTableSpaceNoStorage(Relation rel, Oid newTableSpace)
> NoStorage looks strange as routine name for this case.  Would something
> like ATExecPartedRelSetTableSpace be more adapted perhaps?

I'd say it's better to name the function in a more general-purpose
way.  Perhaps we'll get some relkind in the future that's not a
partitioned table or index but still needs reltablespace set.  If that
happens and we've named the function the way you're proposing, then we
might end up with some other function that does the same thing, or the
patch adding the new relkind would have to rename the function back to
something like what I have now.  To save either of those from having
to happen, would it not be better to give it the generic name now?

Instead of renaming the function, I've altered the Assert() to allow
all relkinds which don't have storage and also updated the comment to
remove the mention of partitioned tables and indexes.

> +   else if (stmt->partbound)
> +   {
> +   RangeVar   *parent;
> +   Relationparentrel;
> +
> +   /*
> +* For partitions, when no other tablespace is specified, we default
> +* the tablespace to the parent partitioned table's.
> +*/
> Okay, so the direct parent is what counts, and not the top-most parent.
> Could you add a test with multiple level of partitioned tables, like:
> - parent is in tablespace 1.
> - child is created in tablespace 2.
> - grandchild is created, which should inherit tablespace 2 from the
> child, but not tablespace 1 from the parent.  In the existing example,
> as one partition is used to test the top-most parent and another for the
> new default, it looks cleaner to create a third partition which would be
> itself a partitioned table.

Sounds good. I've modified the existing test just to do it that way. I
don't think there's a need to have multiple tests for that.
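In SQL terms, the scenario now covered is roughly this (tablespace names
are placeholders, and the inheritance behavior assumes the patch):

create table p (a int, b int) partition by list (a) tablespace ts1;
create table p1 partition of p for values in (1)
  partition by list (b) tablespace ts2;
create table p1_1 partition of p1 for values in (1);
-- p1_1 should end up in ts2 (its direct parent's tablespace), not ts1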

I've attached an updated patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v2-0001-Allow-newly-created-partitions-to-inherit-their-p.patch
Description: Binary data


Re: slight tweaks to documentation about runtime pruning

2018-12-09 Thread Amit Langote
On Sun, Dec 9, 2018 at 8:13 PM David Rowley
 wrote:
> listp1 was scanned twice (loops=2), listp2 was scanned just once.
>
> Now it is true that if the subplan was executed 0 times that it will
> appear as "(never executed)", but do we really need to explain in this
> area that "(never executed)" means loops=0?

Ah, I see what you mean.  So, "(never executed)" is not the only sign
of run-time pruning occurring.  The value of loops can be different
for different subplans / partitions, and it being lower for a given
subplan means that the subplan has been pruned a greater number of
times.

Thanks,
Amit



Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:
> > I wonder if we maybe should have a regression test for every such
> > function which just queries the catalog in a way to force the function
> > to be called for every relation defined in the regression tests, to
> > ensure that it doesn't segfault or throw an error..
> 
> Like sqlsmith?  It looks hard to me to make something like that part of
> the main regression test suite, as that's going to be costly and hard to
> scale.

No, I mean something like:

with x as (select pg_partition_tree(relname) from pg_class)
select 1 from x limit 1;

or whatever it takes to make sure that the function is run against every
entry in pg_class (or whatever is appropriate) while not returning the
results (since we don't actually care about the output, we just want to
make sure it doesn't ERROR or crash).

sqlsmith, as I recall, doesn't care about ERROR cases; it's just looking
for crashes, so it's not quite the same thing.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Sun, Dec 09, 2018 at 08:15:07AM +0900, Michael Paquier wrote:
> > On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:
> >> We should really have a more clearly defined policy around this, but my
> >> recollection is that we often prefer to return NULL rather than throwing
> >> an error for the convenience of people doing things like querying
> >> pg_class using similar functions.
> > 
> > Yes, that's visibly right.  At least that's what I can see from the
> > various pg_get_*def and pg_*_is_visible.  Returning NULL would indeed
> > be more consistent.
> 
> Thinking more about your argument, scanning pg_class fully is quite
> sensible as well because there is no need to apply an extra qual on
> relkind, so let's change the function as you suggest, by returning NULL
> for invalid relation types.  Any opinions about the attached, then,
> which does the switch?

Looks alright on a quick glance, but shouldn't you also update the
comment..?

Thanks!

Stephen


signature.asc
Description: PGP signature


Collatability of type "name"

2018-12-09 Thread Tom Lane
I've been experimenting with the task proposed in [1] of expanding
the text_ops operator family to include type "name" as well as
cross-type text vs. name operators.  These operators would need to
offer collation-aware sorting, since that's exactly the difference
between text_ops and the non-collation-aware name_ops opfamily.
I ran into a nasty stumbling block almost immediately: the proposed
name vs. name comparison operators fail, because the parser sees
that both inputs are of noncollatable types so it doesn't assign
any collation to the operator node.

I experimented with leaving out the name vs. name operators and
just adding cross-type text vs. name and name vs. text operators.
That turns out not to work well at all.  Aside from the fact that
opr_sanity whines about an incomplete operator family, I found
various situations where the planner fails, complaining about
things like "missing operator 1(19,19) in opfamily 1994".  The
root of that mess seems to be that we've supposed that if an
equality operator is marked mergejoinable then it is mergejoinable
in every opfamily that it's a member of.  But that isn't true in
an opfamily structure like this.  For instance "text = name" should
be mergejoinable in the name_ops opclass, since we know how to sort
both text and name in non-collation-aware ways.  But it's not
mergejoinable in the text_ops opclass if text_ops doesn't provide
collation-aware name vs. name operators to sort the name input with.

We could probably fix that, at the cost of about tripling the work
needed to detect whether an operator is really mergejoinable, but
I have little confidence that there aren't more problems lurking
behind it.  There are a lot of aspects of EquivalenceClass processing
that look pretty questionable if we're trying to support operators
that act this way.  For instance, if we derive "a = c" given "a = b"
and "b = c", the equality operator in "a = c" might be mergejoinable
in a different set of opclasses than the other two operators are,
making it debatable whether it can be thought to belong to the same
EquivalenceClass at all.

So the other approach I'm contemplating is to mark type name as
collatable (with "C" as its typcollation, probably).  There are
two plausible sub-approaches:

1. The regular name comparison operators remain non-collation-aware.
This would be the least invasive way but it'd have the odd side-effect
that expressions like "namecoll1 < namecoll2 COLLATE something"
would be accepted but the collation would be ignored.  Also, we'd
have to invent some new names for the collation-aware name-vs-name
operators, and I don't see any obvious candidate for that.

2. Upgrade the name comparison operators to be collation-aware,
with (probably) all the same optimizations for C collation as we
have for text.  This'd be a cleaner end result but it seems like
there are a lot of potential side-effects, e.g. syscache lookups
would have to be prepared to pass the right collation argument
to name comparisons.

I feel like #2 is probably really the Right Thing, but it's also
sounding like significantly more work than I thought this was going
to involve.  Not sure if it's worth the effort right now.

Also, I think that either solution would lead to some subtle changes
in semantics.  For example, right now if you compare a name column
to a text value, you get a text (collation-aware) comparison using
the database's default collation.  It looks like if name columns
are marked with attcollation = 'C', that would win and the comparison
would now have 'C' collation unless you explicitly override it with
a COLLATE clause.  I'm not sure this is a bad thing --- it'd be more
likely to match the sort order of the index on the column --- but it
could surprise people.
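To illustrate (a sketch of the expected behavior, assuming name columns
get attcollation = 'C'; not output from an actual patched build):

-- today this fails with "collations are not supported by type name",
-- but with a collatable name type it would be accepted:
select relname from pg_class order by relname collate "C";

-- and a bare comparison against text, which today uses the database's
-- default collation, would silently switch to 'C' semantics:
select relname from pg_class where relname < 'm'::text;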

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/5978.1544030...@sss.pgh.pa.us



Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Tom Lane
Stephen Frost  writes:
> * Michael Paquier (mich...@paquier.xyz) wrote:
>> On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:
>>> I wonder if we maybe should have a regression test for every such
>>> function which just queries the catalog in a way to force the function
>>> to be called for every relation defined in the regression tests, to
>>> ensure that it doesn't segfault or throw an error..

> No, I mean something like:
> with x as (select pg_partition_tree(relname) from pg_class)
> select 1 from x limit 1;
> or whatever it takes to make sure that the function is run against every
> entry in pg_class (or whatever is appropriate) while not returning the
> results (since we don't actually care about the output, we just want to
> make sure it doesn't ERROR or crash).

I'm going to object to that on cost grounds.  It is not reasonable to
run moderately-expensive functions like this on more than a thousand
pg_class entries in order to test what are just a few distinct cases,
especially in code that's highly unlikely to break once written.
Or at least, it's not reasonable to do that every time anybody runs
the regression tests for the rest of our lives.

Moreover, this would only help check a specific new function if the
author or reviewer remembered to add a test case that does it.
Since the whole problem here is "I forgot to consider this", I don't
have much faith in that happening.

Maybe we should have some sort of checklist of things to think about
when adding new SQL-visible functions, rather than trying to
memorialize every such consideration as a regression test no matter
how expensive.  Admittedly, "I forgot to go over the checklist" is
still a problem; but at least it's not penalizing every developer and
every buildfarm run till kingdom come.

regards, tom lane



Re: Collatability of type "name"

2018-12-09 Thread Pavel Stehule
Hi

On Sun, 9 Dec 2018 at 18:50, Tom Lane wrote:

> I've been experimenting with the task proposed in [1] of expanding
> the text_ops operator family to include type "name" as well as
> cross-type text vs. name operators.  These operators would need to
> offer collation-aware sorting, since that's exactly the difference
> between text_ops and the non-collation-aware name_ops opfamily.
> I ran into a nasty stumbling block almost immediately: the proposed
> name vs. name comparison operators fail, because the parser sees
> that both inputs are of noncollatable types so it doesn't assign
> any collation to the operator node.
>
> I experimented with leaving out the name vs. name operators and
> just adding cross-type text vs. name and name vs. text operators.
> That turns out not to work well at all.  Aside from the fact that
> opr_sanity whines about an incomplete operator family, I found
> various situations where the planner fails, complaining about
> things like "missing operator 1(19,19) in opfamily 1994".  The
> root of that mess seems to be that we've supposed that if an
> equality operator is marked mergejoinable then it is mergejoinable
> in every opfamily that it's a member of.  But that isn't true in
> an opfamily structure like this.  For instance "text = name" should
> be mergejoinable in the name_ops opclass, since we know how to sort
> both text and name in non-collation-aware ways.  But it's not
> mergejoinable in the text_ops opclass if text_ops doesn't provide
> collation-aware name vs. name operators to sort the name input with.
>
> We could probably fix that, at the cost of about tripling the work
> needed to detect whether an operator is really mergejoinable, but
> I have little confidence that there aren't more problems lurking
> behind it.  There are a lot of aspects of EquivalenceClass processing
> that look pretty questionable if we're trying to support operators
> that act this way.  For instance, if we derive "a = c" given "a = b"
> and "b = c", the equality operator in "a = c" might be mergejoinable
> in a different set of opclasses than the other two operators are,
> making it debatable whether it can be thought to belong to the same
> EquivalenceClass at all.
>
> So the other approach I'm contemplating is to mark type name as
> collatable (with "C" as its typcollation, probably).  There are
> two plausible sub-approaches:
>
> 1. The regular name comparison operators remain non-collation-aware.
> This would be the least invasive way but it'd have the odd side-effect
> that expressions like "namecoll1 < namecoll2 COLLATE something"
> would be accepted but the collation would be ignored.  Also, we'd
> have to invent some new names for the collation-aware name-vs-name
> operators, and I don't see any obvious candidate for that.
>
> 2. Upgrade the name comparison operators to be collation-aware,
> with (probably) all the same optimizations for C collation as we
> have for text.  This'd be a cleaner end result but it seems like
> there are a lot of potential side-effects, e.g. syscache lookups
> would have to be prepared to pass the right collation argument
> to name comparisons.
>
> I feel like #2 is probably really the Right Thing, but it's also
> sounding like significantly more work than I thought this was going
> to involve.  Not sure if it's worth the effort right now.
>
> Also, I think that either solution would lead to some subtle changes
> in semantics.  For example, right now if you compare a name column
> to a text value, you get a text (collation-aware) comparison using
> the database's default collation.  It looks like if name columns
> are marked with attcollation = 'C', that would win and the comparison
> would now have 'C' collation unless you explicitly override it with
> a COLLATE clause.  I'm not sure this is a bad thing --- it'd be more
> likely to match the sort order of the index on the column --- but it
> could surprise people.
>

Sorting by table names is not a very common operation. I don't see C
collation for names as any risk.

Regards

Pavel

>
> Thoughts?
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/5978.1544030...@sss.pgh.pa.us
>
>


Re: automatically assigning catalog toast oids

2018-12-09 Thread Tom Lane
John Naylor  writes:
> Commit 96cdeae07 added toast tables to most catalogs. One disadvantage
> is that the toast declarations require hard-coded oids, even though
> only shared catalogs actually need stable oids. Now that we can assign
> oids on the fly, it makes sense to do so for toast tables as well, as
> in the attached.

I'm a bit dubious that this is a good idea.  It's handy, at least for
forensic situations, that the system catalogs have stable OIDs.

regards, tom lane



Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> >> On Sat, Dec 08, 2018 at 08:46:08AM -0500, Stephen Frost wrote:
> >>> I wonder if we maybe should have a regression test for every such
> >>> function which just queries the catalog in a way to force the function
> >>> to be called for every relation defined in the regression tests, to
> >>> ensure that it doesn't segfault or throw an error..
> 
> > No, I mean something like:
> > with x as (select pg_partition_tree(relname) from pg_class)
> > select 1 from x limit 1;
> > or whatever it takes to make sure that the function is run against every
> > entry in pg_class (or whatever is appropriate) while not returning the
> > results (since we don't actually care about the output, we just want to
> > make sure it doesn't ERROR or crash).
> 
> I'm going to object to that on cost grounds.  It is not reasonable to
> run moderately-expensive functions like this on more than a thousand
> pg_class entries in order to test what are just a few distinct cases,
> especially in code that's highly unlikely to break once written.

Yeah, I had been wondering if it would be too expensive also.

I don't entirely buy the argument that it's code that's 'highly
unlikely to break once written' though; we do add new relkinds from time
to time, for example.  Perhaps we could have these functions run just
once per relkind.

> Moreover, this would only help check a specific new function if the
> author or reviewer remembered to add a test case that does it.

We could possibly write a test which would run every function that
accepts the typical data types (oid/regclass/name/etc) instead of
depending on the author to remember to add it.

> Since the whole problem here is "I forgot to consider this", I don't
> have much faith in that happening.

Yeah, I agree that we'd want something more automated as, otherwise, it
would definitely be likely to be forgotten.

> Maybe we should have some sort of checklist of things to think about
> when adding new SQL-visible functions, rather than trying to
> memorialize every such consideration as a regression test no matter
> how expensive.  Admittedly, "I forgot to go over the checklist" is
> still a problem; but at least it's not penalizing every developer and
> every buildfarm run till kingdom come.

This seems like something we should do regardless.  Moreover, I'd
suggest that we start documenting some of these policies in the way that
we have a style guide for errors and such; we need a "system function
style guide" that could start with something like "prefer to return NULL
instead of ERROR on unexpected but otherwise valid inputs, and test that
the code doesn't segfault when given a variety of inputs".
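As a strawman, the relkind check in such a function might look like this
(a sketch with hypothetical names; demo_partition_info is not a real
function):

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_class.h"
#include "utils/lsyscache.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(demo_partition_info);

/* Return NULL, not ERROR, for missing relations and unsupported relkinds. */
Datum
demo_partition_info(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	char		relkind = get_rel_relkind(relid);

	/* relation has gone away, or is of a kind we don't handle */
	if (relkind == '\0' ||
		(relkind != RELKIND_RELATION &&
		 relkind != RELKIND_PARTITIONED_TABLE))
		PG_RETURN_NULL();

	/* ... the function's real work would go here ... */
	PG_RETURN_NULL();
}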

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2018-12-09 Thread Sergei Kornilov
Hello

I reviewed the code and documentation and I have a few notes. Did you
register this patch in the CF app?

I found one error in phase 4. Simple reproducer:

> create table test (i int);
> create index this_is_very_large_exactly_maxnamelen_index_name_wink_wink_wink 
> on test (i);
> create index this_is_very_large_exactly_maxnamelen_index_name_wink_winkccold 
> on test (i);
> reindex table CONCURRENTLY test;

This fails with error

> ERROR:  duplicate key value violates unique constraint 
> "pg_class_relname_nsp_index"
> DETAIL:  Key (relname, 
> relnamespace)=(this_is_very_large_exactly_maxnamelen_index_name_wink_win_ccold,
>  2200) already exists.

CommandCounterIncrement() in (or after) index_concurrently_swap will fix this 
issue.

> ReindexPartitionedIndex(Relation parentIdx)
>   ereport(ERROR,
>   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>errmsg("REINDEX is not yet implemented for partitioned 
> indexes")));

I think we need to add an errhint("You can reindex each partition
separately.") or something similar.
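For clarity, the full call would then become something like this (just a
sketch of the suggestion; exact wording up to you):

ereport(ERROR,
		(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
		 errmsg("REINDEX is not yet implemented for partitioned indexes"),
		 errhint("You can reindex each partition separately.")));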
Also, can we omit this warning for REINDEX DATABASE? All partitions must be 
in the same database, and the warning in that case is useless: we emit a 
warning, but reindexing each partition means the partitioned table is 
reindexed correctly anyway.

Another behavior issue I found with reindex (verbose) schema/database: the 
INFO ereport is printed twice for each table.

> INFO:  relation "measurement_y2006m02" was reindexed
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s.
> INFO:  table "public.measurement_y2006m02" was reindexed

One from ReindexMultipleTables and another (with pg_rusage_show) from 
ReindexRelationConcurrently.

> ReindexRelationConcurrently
> if (!indexRelation->rd_index->indisvalid)

Would it be better to use the IndexIsValid macro here? And the same question 
about the added indexform->indisvalid in src/backend/commands/tablecmds.c.

>   
>   An index build with the CONCURRENTLY option failed, 
> leaving
>   an invalid index. Such indexes are useless but it can be
>-  convenient to use REINDEX to rebuild them. Note that
>-  REINDEX will not perform a concurrent build. To 
>build the
>-  index without interfering with production you should drop the index and
>-  reissue the CREATE INDEX CONCURRENTLY command.
>+  convenient to use REINDEX to rebuild them.
>  

This documentation change seems wrong to me: reindex concurrently does not 
rebuild invalid indexes. To fix invalid indexes we still need a reindex that 
locks the table, or to recreate the index concurrently.

> +   A first pass to build the index is done for each new index entry.
> +   Once the index is built, its flag pg_class.isready 
> is
> +   switched to true
> +   At this point pg_class.indisvalid is switched to
> +   true for the new index and to false for 
> the old, and
> +   Old indexes have pg_class.isready switched to 
> false

Should be pg_index.indisvalid and pg_index.indisready, right?

regards, Sergei



Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> ... especially in code that's highly unlikely to break once written.

> I don't entirely buy the argument that it's code that's 'highly
> unlikely to break once written' though; we do add new relkinds from time
> to time, for example.  Perhaps we could have these functions run just
> once per relkind.

Well, the relevant code is likely to be "if relkind is not x, y, or z,
then PG_RETURN_NULL".  If we add a new relkind and forget to consider the
function, the outcome is a NULL result that perhaps should not have been
NULL ... but a test like this won't help us notice that.

regards, tom lane



[PATCH] Minor cleanups of BRIN code

2018-12-09 Thread Marti Raudsepp
Hi list,

I was messing around a bit with BRIN code and found some cleanups to be made:
1. Remove dead code in brin_form_tuple
2. Add missing Datum conversion macros
3. Use AttrNumber type in places using 1-based attribute numbers

Though it's not hard to find cases all over the source code similar to (2)
and (3), are these kinds of cleanups considered useful?

Regards,
Marti Raudsepp
From 002067288d7d8db0c2107e6266f95c53a1614b1d Mon Sep 17 00:00:00 2001
From: Marti Raudsepp 
Date: Wed, 25 Oct 2017 22:30:14 +0300
Subject: [PATCH] brin: Minor cleanups of BRIN code

* Remove dead code in brin_form_tuple
* Add missing Datum conversion macros
* Use AttrNumber type in places using 1-based attribute numbers
---
 src/backend/access/brin/README   |  2 +-
 src/backend/access/brin/brin.c   |  4 ++--
 src/backend/access/brin/brin_inclusion.c | 11 ++-
 src/backend/access/brin/brin_minmax.c| 16 +---
 src/backend/access/brin/brin_tuple.c |  1 -
 5 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/brin/README b/src/backend/access/brin/README
index 636d96545b..3e80bad62f 100644
--- a/src/backend/access/brin/README
+++ b/src/backend/access/brin/README
@@ -21,7 +21,7 @@ possible to support the amgettuple interface.  Instead, we only provide
 amgetbitmap support.  The amgetbitmap routine returns a lossy TIDBitmap
 comprising all pages in those page ranges that match the query
 qualifications.  The recheck step in the BitmapHeapScan node prunes tuples
-that are not visible according to the query qualifications.
+that are not visible according to the query predicates.
 
 An operator class must have the following entries:
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index e95fbbcea7..0e8f4e0d08 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -252,7 +252,7 @@ brininsert(Relation idxRel, Datum *values, bool *nulls,
 	   PointerGetDatum(bdesc),
 	   PointerGetDatum(bval),
 	   values[keyno],
-	   nulls[keyno]);
+	   BoolGetDatum(nulls[keyno]));
 			/* if that returned true, we need to insert the updated tuple */
 			need_insert |= DatumGetBool(result);
 		}
@@ -647,7 +647,7 @@ brinbuildCallback(Relation index,
 		  attr->attcollation,
 		  PointerGetDatum(state->bs_bdesc),
 		  PointerGetDatum(col),
-		  values[i], isnull[i]);
+		  values[i], BoolGetDatum(isnull[i]));
 	}
 }
 
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 6ce355c6a9..4918138014 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -81,10 +81,11 @@ typedef struct InclusionOpaque
 	FmgrInfo	strategy_procinfos[RTMaxStrategyNumber];
 } InclusionOpaque;
 
-static FmgrInfo *inclusion_get_procinfo(BrinDesc *bdesc, uint16 attno,
+static FmgrInfo *inclusion_get_procinfo(BrinDesc *bdesc, AttrNumber attno,
 	   uint16 procnum);
-static FmgrInfo *inclusion_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno,
-Oid subtype, uint16 strategynum);
+static FmgrInfo * inclusion_get_strategy_procinfo(BrinDesc *bdesc,
+AttrNumber attno, Oid subtype,
+uint16 strategynum);
 
 
 /*
@@ -588,7 +589,7 @@ brin_inclusion_union(PG_FUNCTION_ARGS)
  * or null if it is not exists.
  */
 static FmgrInfo *
-inclusion_get_procinfo(BrinDesc *bdesc, uint16 attno, uint16 procnum)
+inclusion_get_procinfo(BrinDesc *bdesc, AttrNumber attno, uint16 procnum)
 {
 	InclusionOpaque *opaque;
 	uint16		basenum = procnum - PROCNUM_BASE;
@@ -646,7 +647,7 @@ inclusion_get_procinfo(BrinDesc *bdesc, uint16 attno, uint16 procnum)
  * made here, see that function too.
  */
 static FmgrInfo *
-inclusion_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno, Oid subtype,
+inclusion_get_strategy_procinfo(BrinDesc *bdesc, AttrNumber attno, Oid subtype,
 uint16 strategynum)
 {
 	InclusionOpaque *opaque;
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 0f6aa33a45..5f28cf362c 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -29,7 +29,7 @@ typedef struct MinmaxOpaque
 	FmgrInfo	strategy_procinfos[BTMaxStrategyNumber];
 } MinmaxOpaque;
 
-static FmgrInfo *minmax_get_strategy_procinfo(BrinDesc *bdesc, uint16 attno,
+static FmgrInfo *minmax_get_strategy_procinfo(BrinDesc *bdesc, AttrNumber attno,
 			 Oid subtype, uint16 strategynum);
 
 
@@ -68,7 +68,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
 	BrinDesc   *bdesc = (BrinDesc *) PG_GETARG_POINTER(0);
 	BrinValues *column = (BrinValues *) PG_GETARG_POINTER(1);
 	Datum		newval = PG_GETARG_DATUM(2);
-	bool		isnull = PG_GETARG_DATUM(3);
+	bool		isnull = PG_GETARG_BOOL(3);
 	Oid			colloid = PG_GET_COLLATION();
 	FmgrInfo   *cmpFn;
 	Datum		compar;
@@ -281,8 +281,9 @@ brin_minmax_union(PG_FUNCTION_ARGS)
 	/* Adjust minimum, if B's min is less than A

Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2018-12-09 Thread Alexander Korotkov
On Sat, Dec 8, 2018 at 12:48 PM Andrey Borodin  wrote:
> > On 8 Dec 2018, at 6:54, Alexander Korotkov wrote:
> >
> > Yep, please find attached draft patch.
>
> The patch seems good to me; I'll check it in more detail.
> The patch gets the posting item at FirstOffsetNumber instead of 
> btree->getLeftMostChild(). This seems OK, since dataGetLeftMostPage() is doing 
> just the same, but with a few Assert()s.

I'd like to avoid creating a GinBtree just to call
getLeftMostChild().  Also, a few more places in ginvacuum.c do the same.
We have the same number of Assert()s in ginVacuumPostingTreeLeaves().
So, let's keep it uniform.

I would also like Peter Geoghegan to take a look at this patch before
committing it.

> > BTW, it seems that I've another bug in GIN.  README says that
> >
> > "However, posting trees are only
> > fully searched from left to right, starting from the leftmost leaf. (The
> > tree-structure is only needed by insertions, to quickly find the correct
> > insert location). So as long as we don't delete the leftmost page on each
> > level, a search can never follow a downlink to page that's about to be
> > deleted."
> >
> > But that's not really true since we taught GIN to skip parts of posting
> > trees in PostgreSQL 9.4.  So, there might be a risk of visiting a page
> > about to be deleted via its downlink.  But in order to hit a real problem,
> > vacuum would have to get past the cleanup stage and someone else would
> > have to reuse this page before we traverse the downlink.  Thus, the risk
> > of a real problem is very low.  But it still should be addressed.
>
> There's a patch above in this thread 
> (0001-Use-correct-locking-protocol-during-GIN-posting-tree.patch) where I 
> propose stamping every deleted page with GinPageSetDeleteXid(page, 
> ReadNewTransactionId()); and avoiding reuse of the page before 
> TransactionIdPrecedes(GinPageGetDeleteXid(page), RecentGlobalDataXmin).
> Should we leave this bug alone for future fixes, to keep the current fix 
> noninvasive?

I think this is a separate bug introduced in PostgreSQL 9.4, which
should be backpatched with a separate commit.  Could you please extract
the part dealing with GinPageSetDeleteXid() and GinPageGetDeleteXid()?
The rest of the work in your patch should be considered for master.

BTW, what do you think about locking order in ginRedoDeletePage()?  Do
you have any explanation of current behavior?  I'll try to reach
Teodor tomorrow with this question.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Why does TupleDescInitBuiltinEntry lack a "default: error" case?

2018-12-09 Thread Tom Lane
If the passed-in type OID isn't one of the very short list that
TupleDescInitBuiltinEntry supports, it will silently hand back
a broken TupleDesc, rather than throwing an error.  How can
this possibly be considered good code?

(So far as I can see, none of the extant callers could hit
such an error, but it's still bad code.)
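For reference, the fix could be as simple as adding something like this
to the switch over the supported type OIDs (a sketch; variable naming
assumed to follow tupdesc.c):

		default:
			elog(ERROR, "unsupported type %u", oidtypeid);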

regards, tom lane



Re: Why does TupleDescInitBuiltinEntry lack a "default: error" case?

2018-12-09 Thread Andres Freund
Hi,

On 2018-12-09 14:49:35 -0500, Tom Lane wrote:
> If the passed-in type OID isn't one of the very short list that
> TupleDescInitBuiltinEntry supports, it will silently hand back
> a broken TupleDesc, rather than throwing an error.  How can
> this possibly be considered good code?

+1

Greetings,

Andres Freund



Re: automatically assigning catalog toast oids

2018-12-09 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> John Naylor  writes:
> > Commit 96cdeae07 added toast tables to most catalogs. One disadvantage
> > is that the toast declarations require hard-coded oids, even though
> > only shared catalogs actually need stable oids. Now that we can assign
> > oids on the fly, it makes sense to do so for toast tables as well, as
> > in the attached.
> 
> I'm a bit dubious that this is a good idea.  It's handy, at least for
> forensic situations, that the system catalogs have stable OIDs.

I tend to agree...  What's the advantage of assigning them on the fly?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: automatically assigning catalog toast oids

2018-12-09 Thread Andres Freund
Hi,

On 2018-12-09 15:42:57 -0500, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > John Naylor  writes:
> > > Commit 96cdeae07 added toast tables to most catalogs. One disadvantage
> > > is that the toast declarations require hard-coded oids, even though
> > > only shared catalogs actually need stable oids. Now that we can assign
> > > oids on the fly, it makes sense to do so for toast tables as well, as
> > > in the attached.
> > 
> > I'm a bit dubious that this is a good idea.  It's handy, at least for
> > forensic situations, that the system catalogs have stable OIDs.

Hm, but won't they have that for major versions anyway? We ought not to
change the .bki generation in a way that results in differing oids after
a release, no?


> I tend to agree...  What's the advantage of assigning them on the fly?

No idea if that's John's reasoning, but I do like not having to do yet
another manual step that you need to remember/figure out when adding a
new catalog relation.

Greetings,

Andres Freund



Re: automatically assigning catalog toast oids

2018-12-09 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-09 15:42:57 -0500, Stephen Frost wrote:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> I'm a bit dubious that this is a good idea.  It's handy, at least for
> forensic situations, that the system catalogs have stable OIDs.

> Hm, but won't they have that for major versions anyway? We ought not to
> change the .bki generation in a way that results in differing oids after
> a release, no?

Well, that's just a different very-easily-broken assumption.  There are
a lot of things that make auto-assigned OIDs unstable, and I do not think
that we want to guarantee that they'll hold still across a release series.

BTW, now that I'm looking at it, I very much dislike the way commit
578b2297 handled auto-assignment of OIDs in catalogs.  Not only do I not
want to assign more OIDs that way, I don't think we can safely ship v12
like this.  There's basically nothing at all that guarantees
genbki-assigned OIDs won't overlap initdb-assigned OIDs.  In fact,
I think it's about guaranteed to blow up in the face of anybody who
inserts manually-assigned OIDs above around 9000.

What we probably need to do is restructure the FirstBootstrapObjectId
business so that there are separate, well-defined ranges for genbki.pl
and initdb to use.

regards, tom lane



Re: automatically assigning catalog toast oids

2018-12-09 Thread Andres Freund
Hi,

On 2018-12-09 17:14:42 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-12-09 15:42:57 -0500, Stephen Frost wrote:
> >> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > I'm a bit dubious that this is a good idea.  It's handy, at least for
> > forensic situations, that the system catalogs have stable OIDs.
>
> > Hm, but won't they have that for major versions anyway? We ought not to
> > change the .bki generation in a way that results in differing oids after
> > a release, no?
>
> Well, that's just a different very-easily-broken assumption.  There are
> a lot of things that make auto-assigned OIDs unstable, and I do not think
> that we want to guarantee that they'll hold still across a release series.

Why wouldn't they be, for genbki (rather than initdb) assigned oids?  I
don't think it's reasonable to add new functions or such post-release
in a way that would move oid assignments for other objects.


> BTW, now that I'm looking at it, I very much dislike the way commit
> 578b2297 handled auto-assignment of OIDs in catalogs.  Not only do I not
> want to assign more OIDs that way

Why do you not want that?


> , I don't think we can safely ship v12 like this.  There's basically
> nothing at all that guarantees genbki-assigned OIDs won't overlap
> initdb-assigned OIDs.  In fact, I think it's about guaranteed to blow
> up in the face of anybody who inserts manually-assigned OIDs above
> around 9000.

I'm not sure I really see a pressing problem with that. I think it'd not
be insane to treat this as a "well, don't do that then".


> What we probably need to do is restructure the FirstBootstrapObjectId
> business so that there are separate, well-defined ranges for genbki.pl
> and initdb to use.

I'm fine with adding a distinct range; the earlier version of the patch
had that.  I'd asked for comments on whether anybody felt a need to keep
it, and nobody replied...  I alternatively proposed that we could just
start at FirstNormalObjectId for those and update the server's oid start
value to the maximum genbki-assigned oid.  Do you have preferences
around that?

Greetings,

Andres Freund



Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Michael Paquier
On Sun, Dec 09, 2018 at 02:07:29PM -0500, Tom Lane wrote:
> Stephen Frost  writes:
>> I don't entirely buy the argument that it's code that's 'highly
>> unlikely to break once written' though; we do add new relkinds from time
>> to time, for example.  Perhaps we could have these functions run just
>> once per relkind.
> 
> Well, the relevant code is likely to be "if relkind is not x, y, or z,
> then PG_RETURN_NULL".  If we add a new relkind and forget to consider the
> function, the outcome is a NULL result that perhaps should not have been
> NULL ... but a test like this won't help us notice that.

Yes, in order to prevent problems with newly-introduced relkinds, I
think that the checks within functions should be careful to check only
for relkinds that they support, and not list those they do not support.
--
Michael


signature.asc
Description: PGP signature


Re: pg_partition_tree crashes for a non-defined relation

2018-12-09 Thread Michael Paquier
On Sun, Dec 09, 2018 at 12:49:18PM -0500, Stephen Frost wrote:
> Looks alright on a quick glance, but shouldn't you also update the
> comment..?

The comment, on HEAD and with the patch, is this:
/* Only allow relation types that can appear in partition trees. */

This still looks adequate to me.  Or would you reword it because "allow"
rather implies that an ERROR is returned?  Would you prefer changing it
to something like this, perhaps:
"Return NULL for relation types that do not appear in partition trees."
--
Michael


signature.asc
Description: PGP signature


Re: automatically assigning catalog toast oids

2018-12-09 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-09 17:14:42 -0500, Tom Lane wrote:
>> Well, that's just a different very-easily-broken assumption.  There are
>> a lot of things that make auto-assigned OIDs unstable, and I do not think
>> that we want to guarantee that they'll hold still across a release series.

> Why wouldn't they be for genbki (rather than initdb) assigned oids?  I
> don't think it's reasonable to add new functions or such post release
> that would have move oid assignments for other objects?

As you've got this set up, we couldn't change *anything* for fear of
it moving auto-assignments; there's no isolation between catalogs.

Another thing I seriously dislike is that this allows people to omit OIDs
from .dat entries in catalogs where we traditionally hand-assign OIDs.
That's not a good idea because it would mean those entries don't have
stable OIDs, whereas the whole point of hand assignment is to ensure
all built-in objects of a particular type have stable OIDs.  Now, you
could argue about the usefulness of that policy for any given catalog;
but if we decide that catalog X doesn't need stable OIDs then that should
be an intentional policy change, not something that can happen because
one lazy hacker didn't follow the policy.

(This is, btw, another reason why I don't much like allowing toast
tables to not have stable OIDs; it confuses what the policy is for
pg_class.)

>> In fact, I think it's about guaranteed to blow
>> up in the face of anybody who inserts manually-assigned OIDs above
>> around 9000.

> I'm not sure I really see a pressing problem with that. I think it'd not
> be insane to treat this as a "well, don't do that then".

I can name more than one company that will be pretty damn unhappy when
we break their local patches that make use of 9K-range OIDs.  For better
or worse, the policy for the last 20 years has been that OIDs up to 9999
are available for hand assignment.  What you've just done is to break
that for OIDs above some boundary that's not even very well defined
(though it seems to be somewhere around 8500 as of HEAD).

>> What we probably need to do is restructure the FirstBootstrapObjectId
>> business so that there are separate, well-defined ranges for genbki.pl
>> and initdb to use.

> I'm fine with adding a distinct range, the earlier version of the patch
> had that. I'd asked for comments if anybody felt a need to keep that,
> nobody replied...  I alternatively proposed that we could just start at
> FirstNormalObjectId for those and update the server's oid start value to
> the maximum genbki assigned oid.  Do you have preferences around that?

Yeah, I thought about the latter as well.  But it adds complexity to the
bootstrap process and makes it harder to tell what assigned a particular
OID, so I'd rather go with the former, at least until the OID situation
gets too tight to allow for daylight between the ranges.

It looks to me like as of HEAD, genbki.pl is auto-assigning about 1470
OIDs.  Meanwhile, on my RHEL6 machine, initdb is auto-assigning about
1740 OIDs (what a coincidence); of those, 872 are collation entries
that are absorbed from the system environment.  So the second number is
likely to vary a lot from platform to platform.  (I don't have ICU
enabled; I wonder how many that typically adds.)

I'd be inclined to allow say 2000 OIDs for genbki.pl, with 4384 therefore
available for initdb (the auto-assignment range runs from
FirstBootstrapObjectId, 10000, up to FirstNormalObjectId, 16384).  We
could expect to have to raise the boundary from time to time, but not
very often.
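In terms of the constants in access/transam.h, that could look something
like this (a sketch; names and exact values still to be settled):

/* OIDs 1-9999 are reserved for manual assignment in the .dat files */
#define FirstGenbkiObjectId		10000	/* genbki.pl assigns 10000..11999 */
#define FirstBootstrapObjectId	12000	/* initdb assigns 12000..16383 */
#define FirstNormalObjectId		16384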

regards, tom lane



RE: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)

2018-12-09 Thread Tsunakawa, Takayuki
From: Magnus Hagander [mailto:mag...@hagander.net]
> In this particular case, it seems it *was* helpful, no? That's how you found
> out the customer used a broken antivirus product, which may certainly also
> cause *other* issues.
> 
> Some sort of rate limiting to reduce the volume might help, but the message
> itself seems to have clearly been useful.

+1
We can change pgwin32_ReserveSharedMemoryRegion() to take an argument "int 
loglevel".  Then the caller first calls it with LOG, and with DEBUGx 
afterwards.  It may also be helpful for the caller to output "LOG:  tried %d 
times to reserve shared memory region" when the caller ran the function twice 
or more before success.  That would explain possibly long connection 
establishment times or CPU spikes.
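A rough sketch of that change (only the failure path is shown; the rest of 
the function would stay as-is):

static bool
pgwin32_ReserveSharedMemoryRegion(HANDLE hChild, int elevel)
{
	void	   *address;

	address = VirtualAllocEx(hChild, UsedShmemSegAddr, UsedShmemSegSize,
							 MEM_RESERVE, PAGE_READWRITE);
	if (address == NULL)
	{
		elog(elevel, "could not reserve shared memory region (addr=%p) for child %p: error code %lu",
			 UsedShmemSegAddr, hChild, GetLastError());
		return false;
	}
	/* ... remaining address checks unchanged ... */
	return true;
}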


Regards
Takayuki Tsunakawa




RE: Statement-level rollback

2018-12-09 Thread Tsunakawa, Takayuki
From: Andres Freund [mailto:and...@anarazel.de]
> Isn't the realistic scenario for migrations that people will turn this
> feature on on a more global basis? If they need to do per transaction choices,
> that makes this much less useful for easy migrations.

Agreed.  My approach of per-transaction choice may be overkill.  Actually, I 
didn't think per-transaction change was really necessary in practice.  But I 
didn't see any reason to prohibit per-transaction change, so I just wanted to 
allow flexibility.

I think per database/user change (ALTER DATABASE/USER) can be useful, in cases 
where applications are migrated from other DBMSs to a PostgreSQL instance.  
That is, database consolidation for easier data analytics and database 
management.  One database/schema holds data for a PostgreSQL application, and 
another one stores data for a migrated application.
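For illustration, the per-database/user scoping would ride on the usual GUC 
mechanics, something like this (the GUC name here is a placeholder, not the 
one proposed in the patch):

ALTER DATABASE migrated_app SET statement_rollback = on;
ALTER ROLE migrated_app_user SET statement_rollback = on;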

Or, should we recommend a separate PostgreSQL instance on a different VM or 
container, and just introduce the parameter only in postgresql.conf?


Regards
Takayuki Tsunakawa



RE: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)

2018-12-09 Thread Takahashi, Ryohei
Hi Noah, Magnus and Tsunakawa-san,


Thank you for replying.


> Can you adopt pgbouncer, to reduce
> the frequency of starting new backend processes?  That should improve your
> performance, too.

Actually, before I found that F-Secure causes this message,
I had recommended that my customer use connection pooling to reduce the 
number of connection attempts.


> Could you collect the information
> http://postgr.es/m/20181203053506.gb2860...@rfd.leadboat.com requests?
> That may help us understand your system's unusual behavior.  (The issue in 
> that
> thread is related but not identical.)

Sorry. Since my customer uses PostgreSQL in a production environment,
I cannot deploy debug modules.


> In this particular case, it seems it *was* helpful, no? That's how you found
> out the customer used a broken antivirus product, which may certainly also
> cause *other* issues.
> 
> Some sort of rate limiting to reduce the volume might help, but the message
> itself seems to have clearly been useful.

Yes. You are right.
The message itself was useful.
Your idea to reduce the volume seems good.


> We can change pgwin32_ReserveSharedMemoryRegion() to take an argument "int
> loglevel".  Then the caller first calls it with LOG, and with DEBUGx
> afterwards.  It may also be helpful for the caller to output "LOG:  tried %d
> times to reserve shared memory region" when the caller ran the function twice
> or more before success.  That would explain possibly long connection
> establishment times or CPU spikes.

It seems good idea for me.


Regards,
Ryohei Takahashi



Re: Should new partitions inherit their tablespace from their parent?

2018-12-09 Thread Michael Paquier
On Mon, Dec 10, 2018 at 02:05:29AM +1300, David Rowley wrote:
> On Fri, 7 Dec 2018 at 20:15, Michael Paquier  wrote:
>> -ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace)
>> +ATExecSetTableSpaceNoStorage(Relation rel, Oid newTableSpace)
>> NoStorage looks strange as routine name for this case.  Would something
>> like ATExecPartedRelSetTableSpace be more adapted perhaps?
> 
> I'd say it's better to name the function in a more general-purpose
> way.  Perhaps we'll get some relkind in the future that's not a
> partitioned table or index but still needs reltablespace set.  If that
> happens and we've named the function the way you're proposing, then we
> might end up with some other function that does the same thing, or the
> patch adding the new relkind would have to rename the function back to
> something like what I have now.  To save either of those from having
> to happen, would it not be better to give it the generic name now?
> 
> Instead of renaming the function, I've altered the Assert() to allow
> all relkinds which don't have storage and also updated the comment to
> remove the mention of partitioned tables and indexes.

-   Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);
+   Assert(rel->rd_rel->relkind == RELKIND_VIEW ||
+  rel->rd_rel->relkind == RELKIND_COMPOSITE_TYPE ||
+  rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+  rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ||
+  rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);

Alvaro has begun a thread about that recently, so it seems to me that
this assertion could become invalid (per se my comments down-thread):
https://www.postgresql.org/message-id/20181206215552.fm2ypuxq6nhpwjuc@alvherre.pgsql

Of course, it depends on the conclusion of this other thread.  It will
perhaps make sense at some point to review on which relkinds this
function can work on, but I would recommend to review that behavior when
it actually makes sense and keep the current interface simple.  So your
first version looked fine to me on those grounds.  Another important
thing which would help with a restricted assertion is that if this code
path begins to be taken for other relkinds then the developer who
implements a new feature depending on it will need to look at this code
and consider the use-case behind it, instead of assuming that it will
hopefully just work.

>> +   else if (stmt->partbound)
>> +   {
>> +   RangeVar   *parent;
>> +   Relationparentrel;
>> +
>> +   /*
>> +* For partitions, when no other tablespace is specified, we default
>> +* the tablespace to the parent partitioned table's.
>> +*/
>> Okay, so the direct parent is what counts, and not the top-most parent.
>> Could you add a test with multiple level of partitioned tables, like:
>> - parent is in tablespace 1.
>> - child is created in tablespace 2.
>> - grandchild is created, which should inherit tablespace 2 from the
>> child, but not tablespace 1 from the parent.  In the existing example,
>> as one partition is used to test the top-most parent and another for the
>> new default, it looks cleaner to create a third partition which would be
>> itself a partitioned table.
> 
> Sounds good. I've modified the existing test just to do it that way. I
> don't think there's a need to have multiple tests for that.
> 
> I've attached an updated patch.

Thanks for the new version.  The new test case looks fine to me.
--
Michael


signature.asc
Description: PGP signature


Re: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)

2018-12-09 Thread Michael Paquier
On Mon, Dec 10, 2018 at 02:15:07AM +, Takahashi, Ryohei wrote:
>> We can change pgwin32_ReserveSharedMemoryRegion() to take an argument "int
>> loglevel".  Then the caller first calls it with LOG, and with DEBUGx
>> afterwards.  It may also be helpful for the caller to output "LOG:  tried %d
>> times to reserve shared memory region" when the caller ran the function twice
>> or more before success.  That would explain possibly long connection
>> establishment times or CPU spikes.
> 
> It seems good idea for me.

Like a couple of others on this thread, I doubt that lowering the elevel
here would be a good thing, as keeping it noisy is what has allowed us to
know that something has gone wrong, no?  The current behavior is
useful.
--
Michael


signature.asc
Description: PGP signature


Re: min_parallel_table_size and inheritance

2018-12-09 Thread Amit Kapila
On Sun, Dec 9, 2018 at 6:24 AM Justin Pryzby  wrote:
>
> The docs say:
> https://www.postgresql.org/docs/current/runtime-config-query.html
> |min_parallel_table_scan_size Sets the minimum amount of table data that must 
> be scanned in order for a parallel scan to be considered. [...]
>
> I'd like to set min_parallel_table_scan_size=32MB, but it looks like that won't do
> what I intend for at least one of our tables using inheritance.
>
> It seems to me that an individual table should not be scanned in parallel if
> its size is below the threshold, even if it's a child and has siblings which
> are larger and scanned in parallel.
>
> I found that the current behavior seems to be more or less deliberate,

Yes, you are right.

> but
> maybe should be revisited following implementation of "parallel append" node,
> as previously discussed.
>

Why?  How does parallel append help for an individual table/partition
scan?  If you want a parallel scan to be forced for any individual
table, you might want to try setting the 'parallel_workers' parameter for
that table.  For example 'Alter Table tbl Set (parallel_workers=2);'

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-12-09 Thread Michael Paquier
On Thu, Dec 06, 2018 at 11:18:12PM +, Bossart, Nathan wrote:
> That seems reasonable to me.

Thanks, committed after renaming a couple of variables, after adding a
comment about the use of unlink() and also adding a comment explaining
what the different retry counters are here for.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Log PostgreSQL version number on startup

2018-12-09 Thread Michael Paquier
On Wed, Nov 21, 2018 at 11:32:46AM -0500, Stephen Frost wrote:
> * Christoph Berg (christoph.b...@credativ.de) wrote:
>> it has bugged me for a long time that there's no clear "PostgreSQL is
>> starting" message in the server log file. I'd like to change that for
>> two reasons:
> 
> +1

+1.  One complaint which could be formulated is that this makes the
logs at startup noisier.  Now your patch has an issue if you want to
ensure that this information gets added to the log files, because at
this stage of startup the GUCs are not loaded, hence this is sent to
stderr, and perhaps not to the log files.
--
Michael


signature.asc
Description: PGP signature