Re: [HACKERS] path toward faster partition pruning

2018-03-05 Thread Amit Langote
On 2018/03/03 0:47, Robert Haas wrote:
> On Fri, Mar 2, 2018 at 6:21 AM, Amit Langote wrote:
>> Given the patch's implementation, we'll have to make the structure of that
>> Node tree a bit more complex than a simple List.  For one thing, the patch
>> handles OR clauses by performing pruning separately for each arm and then
>> combining the partitions selected across OR arms using set union.  By
>> "performing pruning" in the last sentence I meant following steps similar
>> to the ones you wrote in your message:
>>
>> 1. Segregating pruning clauses into per-partition-key Lists, that is,
>>generate_partition_clauses() producing a PartitionClauseInfo,
>>
>> 2. Removing redundant clauses from each list, that is,
>>remove_redundant_clauses() to produce lists with just one member per
>>operator strategy for each partition key,
>>
>> 3. Extracting Datum values from the clauses to form equal/min/max tuples
>>and setting null or not null bits for individual keys, that is,
>>extract_bounding_datums() producing a PartScanKeyInfo, and
>>
>> 4. Finally pruning with those Datum tuples and null/not null info, that
>>is, get_partitions_for_keys().
>>
>> Steps 2-4 are dependent on clauses providing Datums, which all the clauses
>> may or may not do.  Depending on whether or not they do, we'll have to
>> defer those steps to run time.
> 
> I don't see that there's a real need to perform step 2 at all.  I
> mean, if you have x > $1 and x > $2 in the query, you can just compute
> the set of partitions for the first clause, compute the set of
> partitions for the second clause, and then intersect.  That doesn't
> seem obviously worse than deciding which of $1 and $2 is greater and
> then pruning only based on whichever one is greater in this case.

If we can accommodate this step with your "pruning program" idea below, I
think we should still try to keep the remove_redundant_clauses() step.

If we can keep this step, x > $1 and x > $2 will require 1 comparison + 1
bsearch over bounds, whereas without it, it'd be 2 bsearches over bounds +
1 Bitmapset operation.  I think David expressed a similar concern about
the performance in his reply down-thread.

That holds only as long as we implement this step such that having it in
the pipeline doesn't restrict the representation that we need for the
"pruning steps".  Let's see.

>> * What do we encode into the Node tree attached to the plan?  Clauses that
>>   haven't gone through steps 2 and 3 (something like PartitionClauseInfo)
>>   or the product of step 3 (something like PartScanKeyInfo)?
>>
>> * How do we account for OR clauses?  Perhaps by having the aforementioned
>>   Node trees nested inside the top-level one, wherein there will be one
>>   nested node per arm of an OR clause.
> 
> Suppose we define the notion of a pruning program.  A pruning program
> can use any number of registers, which have integer numbers starting
> with 0 and counting upward as high as necessary.  Each register holds
> a Bitmapset.  The result of a pruning program is the value of register
> 0 when the program completes.  A pruning program consists of a list of
> steps, each of which is either a PruningBaseStep or a
> PruningCombineStep.  A PruningCombineStep modifies the contents of the
> target register based on the contents of a source register in one of
> the following three ways: (1) UNION -- all bits set in source become
> set in target; (2) INTERSECT -- all bits clear in source become clear
> in target; (3) DIFFERENCE -- all bits set in source become clear in
> target.  A PruningBaseStep consists of a strategy (equality,
> less-than, etc.), an output register, and list of expressions --
> either as many as there are partition keys, or for range partitioning
> perhaps fewer; it prunes based on the strategy and the expressions and
> overwrites the output register with the partitions that would be
> selected.
> 
> Example #1.  Table is hash-partitioned on a and b.  Given a query like
> SELECT * FROM tab WHERE a = 100 AND b = 233, we create a single-step
> program:
> 
> 1. base-step (strategy =, register 0, expressions 100, 233)
> 
> If there were an equality constraint on only one of the two columns, we
> would not create a pruning program at all, because no pruning is
> possible.
> 
> Example #2. Table is list-partitioned on a.  Given a query like SELECT
> * FROM tab WHERE (a = $1 OR a = $2) AND a != $3, we create this
> program:
> 
> 1. base-step (strategy =, register 0, expressions $1)
> 2. base-step (strategy =, register 1, expressions $2)
> 3. base-step (strategy !=, register 2, expressions $3)
> 4. combine-step (target-register 0, source-register 1, strategy union)
> 5. combine-step (target-register 0, source-register 2, strategy difference)
> 
> (This is unoptimized -- one could do better by reversing steps 3 and 4
> and reusing register 1 instead of needing register 2, but that
> kind of optimization is probably not too important.)
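
To make this a bit more concrete, here is a rough sketch of what such step
nodes and the combine-step evaluation over Bitmapset registers could look
like (these type and field names are purely illustrative, not taken from any
posted patch):

/*
 * Illustrative sketch only -- hypothetical names, assuming the backend
 * headers nodes/pg_list.h and nodes/bitmapset.h.
 */
typedef enum PruneCombineOp
{
    PRUNE_COMBINE_UNION,
    PRUNE_COMBINE_INTERSECT,
    PRUNE_COMBINE_DIFFERENCE
} PruneCombineOp;

typedef struct PruningBaseStep
{
    int         strategy;       /* =, <, <=, >, >=, <> */
    int         output_reg;     /* register this step overwrites */
    List       *exprs;          /* one expr per (leading) partition key */
} PruningBaseStep;

typedef struct PruningCombineStep
{
    PruneCombineOp op;
    int         target_reg;
    int         source_reg;
} PruningCombineStep;

/* Applying a combine step to an array of Bitmapset registers: */
static void
execute_combine_step(Bitmapset **regs, PruningCombineStep *step)
{
    switch (step->op)
    {
        case PRUNE_COMBINE_UNION:
            regs[step->target_reg] = bms_add_members(regs[step->target_reg],
                                                     regs[step->source_reg]);
            break;
        case PRUNE_COMBINE_INTERSECT:
            regs[step->target_reg] = bms_int_members(regs[step->target_reg],
                                                     regs[step->source_reg]);
            break;
        case PRUNE_COMBINE_DIFFERENCE:
            regs[step->target_reg] = bms_del_members(regs[step->target_reg],
                                                     regs[step->source_reg]);
            break;
    }
}

Under that representation, Example #2 above would simply be three base steps
followed by two combine steps, with register 0 holding the final set of
surviving partitions.
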
> 
> Example #3. Table is range-partitioned on a a

Re: [PATCH] Verify Checksums during Basebackups

2018-03-05 Thread Michael Banck
Hi,

On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:
> So sure, if we go with WARNING + exit with an errorcode, that is perhaps
> the best combination of the two.

I had a look at how to go about this, but it appears to be a bit
complicated; the first problem is that sendFile() and sendDir() don't
have status return codes that could be set on checksum verification
failure. So I added a global variable and threw an ereport(ERROR) at the
end of perform_base_backup(), but then I realized that `pg_basebackup'
the client program purges the datadir it created if it gets an error:

|pg_basebackup: final receive failed: ERROR:  Checksum mismatch during
|basebackup
|
|pg_basebackup: removing data directory "data2"

So I guess this would have to be sent back via the replication protocol,
but I don't see an off-hand way to do this easily?

Another option would be to see whether it is possible to verify the
checksum on the client side, but then only pg_basebackup (and not other
external tools using BASE_BACKUP) would benefit.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Vik Fearing
On 03/04/2018 08:06 PM, Tom Lane wrote:
> Edmund Horner  writes:
>> On 26 January 2018 at 13:44, Vik Fearing  wrote:
>>> On 01/26/2018 01:28 AM, Edmund Horner wrote:
 The patch mentioned attempts to put savepoints around the tab
 completion query where appropriate.
> 
>>> I am -1 on this idea.
> 
>> May I ask why?  It doesn't stop psql working against older versions,
>> as it checks that the server supports savepoints.
> 
> I looked into this patch and was disappointed to discover that it had
> only a very ad-hoc solution to the problem of version-dependent tab
> completion queries.  We need something better --- in particular, the
> recent prokind changes mean that there needs to be a way to make
> SchemaQuery queries version-dependent.
> 
> So ... here is a modest proposal.  It invents a VersionedQuery concept
> and also extends the SchemaQuery infrastructure to allow those to be
> versioned.  I have not taken this nearly as far as it could be taken,
> since it's mostly just proposing mechanism.  To illustrate the
> VersionedQuery infrastructure, I fixed it so it wouldn't send
> publication/subscription queries to pre-v10 servers, and to illustrate
> the versioned SchemaQuery infrastructure, I fixed the prokind problems.
> 
> If people like this approach, I propose to commit this more or less
> as-is.  The select-tab-completion patch would then need to be rewritten
> to use this infrastructure, but I think that should be straightforward.
> As a separate line of work, the infrastructure could be applied to fix
> the pre-existing places where tab completion fails against old servers.
> But that's probably work for v12 or beyond, unless somebody's really
> motivated to do it right now.

Tab completion on functions in SELECT (a subset of this thread's patch)
is quite important to me personally.  I will work on this in the coming
days.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-05 Thread Jeevan Chalke
On Fri, Mar 2, 2018 at 3:22 AM, Robert Haas  wrote:

> On Thu, Mar 1, 2018 at 5:34 AM, Jeevan Chalke
>  wrote:
> > Attached new patchset after rebasing my changes over these changes and on
> > latest HEAD.
>
> +     * We have already created a Gather or Gather Merge path atop cheapest
> +     * partial path. Thus the partial path referenced by the Gather node
> +     * needs to be preserved as adding new partial paths in same rel may
> +     * delete this referenced path. To do this we need to clear the
> +     * partial_pathlist from the partially_grouped_rel as we may add
> +     * partial paths again while doing partitionwise aggregation. Keeping
> +     * older partial path intact seems reasonable too as it might possible
> +     * that the final path chosen which is using it wins, but the
> +     * underneath partial path is not the cheapest one.
>
> This isn't a good design.  You shouldn't create a Gather or Gather
> Merge node until all partial paths have been added.  I mean, the point
> is to put a Gather node on top of the cheapest path, not the path that
> is currently the cheapest but might not actually be the cheapest once
> we've added them all.
>

To be honest, I didn't know that we should not generate Gather or Gather
Merge until we have all possible partial paths in place. I realized it
recently while debugging an issue reported by Rajkumar off-list. While
working on that fix, what I observed is:
- I have the cheapest partial path with cost, say, 10; a Gather on it
brings the cost to 11.
- Later I add a partial path with cost, say, 9, but a Gather on it
results in a total cost of 12.
This means the first Gather path is the cheapest one, but its underlying
partial path is not, and unfortunately that path got removed when my new
partial path was added to the partial_pathlist.

Due to this, I thought it better to keep both paths valid, and to avoid
deleting the earlier cheapest partial path I chose to reset
partially_grouped_rel->partial_pathlist.

But yes, per the comment in generate_gather_paths() and as you said, we
should add Gather or Gather Merge only after we are done with all partial
path creation. Sorry for not knowing this before.


>
> +add_gather_or_gather_merge(PlannerInfo *root,
>
> Please stop picking generic function names for functions that have
> very specific purposes.  I don't really think that you need this to be
> a separate function at all, but it it is certainly NOT a
> general-purpose function for adding a Gather or Gather Merge node.
>

OK. Got it now.


>
> +/*
> + * Collect statistics about aggregates for estimating costs of
> + * performing aggregation in parallel or in partial, if not
> already
> + * done. We use same cost for all the children as they will be
> same
> + * anyways.
> + */
>
> If it only needs to be done once, do we really have to have it inside
> the loop?  I see that you're using the varno-translated
> partial_target->exprs and target->exprs, but if the specific varnos
> don't matter, why not just use the untranslated version of the targets
> before entering the loop?  And if the specific varnos do matter, then
> presumably you need to do it every time.
>

Yes. It can be pulled outside the loop.


>
> This is not a full review, but I'm out of time for right now.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Partition-wise aggregation/grouping

2018-03-05 Thread Jeevan Chalke
On Sat, Mar 3, 2018 at 12:12 AM, Robert Haas  wrote:

> On Thu, Mar 1, 2018 at 4:52 PM, Robert Haas  wrote:
> > This is not a full review, but I'm out of time for right now.
>
> Another thing I see here now is that create_grouping_paths() and
> create_child_grouping_paths() are extremely similar.  Isn't there some
> way we can refactor things so that we can reuse the code instead of
> duplicating it?
>

Yes. I too observed the same after our re-design.

To avoid code duplication, I am now calling create_grouping_paths() for the
child relations too.

However, to perform Gather or Gather Merge only once we have all partial
paths ready, and to avoid rearranging too much existing code, I am calling
try_partitionwise_grouping() before we do any aggregation/grouping on the
whole relation. By doing this, we will have all partial paths in
partially_grouped_rel, and the existing code will then do the required
finalization along with any Gather or Gather Merge, if required.

Please have a look over attached patch-set and let me know if it needs
further changes.

Thanks


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


partition-wise-agg-v15.tar.gz
Description: application/gzip


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-03-05 Thread Pavan Deolasee
On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera 
wrote:

>
>
> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
> expand_targetlist() runs *after*, not before, so how could it have
> affected the result?
>

If I understand correctly, the planner must have called expand_targetlist()
once for the parent relation's descriptor and added any dropped columns
from the parent relation. So we may not find mapped attributes for those
dropped columns in the parent. I haven't actually tested this case though.

I wonder if it makes sense to actually avoid expanding the on-conflict-set
targetlist when the target is a partitioned table and deal with it during
execution, like we are doing now.


> I'm obviously confused about what
> expand_targetlist call this comment is talking about.  Anyway I wanted
> to make it use resjunk entries instead, but that broke some other case
> that I didn't have time to research yesterday.  I'll get back to this
> soon, but in the meantime, here's what I have.
>

+   conv_setproj =
+       adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+                                         RelationGetDescr(partrel),
+                                         RelationGetRelationName(partrel),
+                                         firstVarno,
+                                         conv_setproj);

Aren't we adding back Vars referring to the parent relation, after
having converted the existing ones to use the partition relation's varno?
Maybe that works because missing attributes are already added during
planning and expand_targetlist() here only adds dropped columns, which are
just NULL constants.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: chained transactions

2018-03-05 Thread Simon Riggs
On 2 March 2018 at 07:18, Andres Freund  wrote:
> Hi,
>
> On 2018-02-28 22:35:52 -0500, Peter Eisentraut wrote:
>> This feature is meant to help manage transaction isolation in
>> procedures.
>
> This is a major new feature, submitted the evening before the last CF
> for v11 starts. Therefore I think it should just be moved to the next
> fest, it doesn't seems realistic to attempt to get this into v11.

Looks like a useful patch that adds fairly minor syntax that follows
the SQL Standard.

It introduces no new internal infrastructure, so I can't call this a
major feature.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GSOC 2018 ideas

2018-03-05 Thread Aleksander Alekseev
Hello Charles,

> Went through the documents listed by you, and they are helpful!
> It seems the main purpose of the extension pg_protobuf is to parse
> a protobuf struct and return the decoded field. May I ask how these kinds
> of extensions are used in PostgreSQL (or in other words, the scenarios to
> use these plugins)?

There are a few ideas behind all of this.

1) Sometimes people are not quite happy with a strict relational schema for
various reasons and prefer something more agile, like XML or JSON. These
formats are indeed more convenient under certain circumstances, for
instance in terms of ease of changing and migrating the schema.

2) One drawback of JSON is redundancy. For instance, you have to store
the names of all document fields. These names don't carry much
information but consume disk space and RAM, thus affecting the overall
performance. The ZSON extension [1] partially solves this issue. However, I
wouldn't call it particularly convenient, and the whole approach of
compressing JSON seems to me more like a dirty hack than a solution. The
problem appeared because the wrong data format was used in the first
place.

3) Unlike JSON, formats like Protobuf or Thrift are binary formats and,
most importantly, don't store any field names. Thus they don't create the
problem described above. However, PostgreSQL is not capable of accessing
Protobuf fields out of the box, for instance to index these fields. This
is what pg_protobuf is for.

Hopefully this answers your question. If you have other questions please
don't hesitate to ask!

[1]: https://github.com/postgrespro/zson


-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: Online enabling of checksums

2018-03-05 Thread Michael Banck
Hi,

Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:
> Agreed.  Looking at our current error messages, “in file” is conventionally
> followed by the filename.  I do however think “calculated” is better than
> “expected” since it conveys clearly that the compared checksum is calculated 
> by
> pg_verify_checksum and not read from somewhere.
> 
> How about something like this?
> 
> _("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found %X"),
>   progname, fn, blockno, csum, header->pd_checksum);

I still find that confusing, but maybe it's just me. I thought the one
in the page header is the "expected" checksum, and we compare the "found"
or "computed/calculated" one (computed from the page itself) against it.

I had the same conversation with an external tool author, by the way:

https://github.com/uptimejp/postgres-toolkit/issues/48



Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: Online enabling of checksums

2018-03-05 Thread Magnus Hagander
On Mon, Mar 5, 2018 at 10:43 AM, Michael Banck 
wrote:

> Hi,
>
> Am Sonntag, den 04.03.2018, 23:30 +0100 schrieb Daniel Gustafsson:
> > Agreed.  Looking at our current error messages, “in file” is
> conventionally
> > followed by the filename.  I do however think “calculated” is better than
> > “expected” since it conveys clearly that the compared checksum is
> calculated by
> > pg_verify_checksum and not read from somewhere.
> >
> > How about something like this?
> >
> > _("%s: checksum mismatch in file \"%s\", block %d: calculated %X, found
> %X"),
> >   progname, fn, blockno, csum, header->pd_checksum);
>
> I still find that confusing, but maybe it's just me. I thought the one
> in the page header is the "expected" checksum, and we compare the "found"
> or "computed/calculated" one (computed from the page itself) against it.
>
> I had the same conversation with an external tool author, by the way:
>

Maybe we should just say "on disk" for the one that's on disk; would that
clear up the confusion? So "calculated %X, found %X on disk"?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Better Upgrades

2018-03-05 Thread Daniel Gustafsson
> On 02 Mar 2018, at 12:59, Greg Stark  wrote:

> My feeling is that worrying about in-place binary upgrades today is
> wasted effort. Already the window for installations where this is
> useful is narrow -- you have to be big enough that the resources for
> deploying a second instance is significant but not so big that the
> downtime and risk is untenable.

I might be colorblind from $dayjob, but I don’t think that these installations
(data warehouses et al.) are that uncommon.  They are also installations that
risk staying on an old version due to upgrades being non-trivial (not saying
that in-place is trivial, just that there are places where it may make sense).

> I have the feeling that in-place
> binary upgrades are going to end up sapping developer time

Having worked on supporting the 8.2->8.3 on-disk format change in pg_upgrade
for GPDB, I am not arguing against that.  Not at all.

cheers ./daniel


Re: [HACKERS] [PATCH] Incremental sort

2018-03-05 Thread Tomas Vondra
Hi,

I have started reviewing the patch and doing some testing, and I pretty
quickly ran into a segfault. Attached is a simple reproducer and a
backtrace. AFAICS the bug seems to be somewhere in the tuplesort
changes, likely resetting a memory context too soon or something like
that. I haven't investigated it further, but it matches my hunch that
tuplesort is likely where the bugs will be.

Otherwise the patch seems fairly complete. A couple of minor things that
I noticed while eyeballing the changes in a diff editor.


1) On a couple of places the new code has this comment

/* even when not parallel-aware */

while all the immediately preceding blocks use

/* even when not parallel-aware, for EXPLAIN ANALYZE */

I suggest using the same comment, otherwise it kinda suggests it's not
because of EXPLAIN ANALYZE.


2) I think the purpose of sampleSlot should be explicitly documented
(and I'm not sure "sample" is a good term here, as it suggests some sort
of sampling; for example nodeAgg uses grp_firstTuple).


3) skipCols/SkipKeyData seems a bit strange too, I think. I'd use
PresortedKeyData or something like that.


4) In cmpSortSkipCols, when checking if the columns changed, the patch
does this:

n = ((IncrementalSort *) node->ss.ps.plan)->skipCols;

for (i = 0; i < n; i++)
{
... check i-th key ...
}

My hunch is that checking the keys from the last one, i.e.

for (i = (n-1); i >= 0; i--)
{

}

would be faster. The reasoning is that with "ORDER BY a,b" the column
"b" changes more often. But I've been unable to test this because of the
segfault crashes.


5) The changes from

if (pathkeys_contained_in(...))

to

n = pathkeys_common(pathkeys, subpath->pathkeys);


if (n == 0)

seem rather inconvenient to me, as they make the code unnecessarily
verbose. I wonder if there's a better way to deal with this.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
test2=# EXPLAIN SELECT * FROM (SELECT * FROM t ORDER BY a) foo ORDER BY a, b;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Incremental Sort  (cost=147027.04..269419.25 rows=1000000 width=20)
   Sort Key: t.a, t.b
   Presorted Key: t.a
   ->  Sort  (cost=136537.84..139037.84 rows=1000000 width=20)
         Sort Key: t.a
         ->  Seq Scan on t  (cost=0.00..16370.00 rows=1000000 width=20)
(6 rows)


crash.sql
Description: application/sql
Core was generated by `postgres: tomas test2 [local] SELECT 
  '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  MemoryContextResetOnly (context=0x0) at mcxt.c:158
158 if (!context->isReset)
(gdb) bt
#0  MemoryContextResetOnly (context=0x0) at mcxt.c:158
#1  0x560f622f07b3 in tuplesort_reset (state=0x560f632e56a0) at 
tuplesort.c:1391
#2  0x560f620a2d17 in ExecIncrementalSort (pstate=0x560f632e1890) at 
nodeIncrementalSort.c:280
#3  0x560f6207d5da in ExecProcNode (node=0x560f632e1890) at 
../../../src/include/executor/executor.h:239
#4  ExecutePlan (execute_once=, dest=0x560f632daf28, 
direction=, numberTuples=0, sendTuples=, 
operation=CMD_SELECT, use_parallel_mode=, 
planstate=0x560f632e1890, estate=0x560f632e1680) at execMain.c:1721
#5  standard_ExecutorRun (queryDesc=0x560f63233f60, direction=, 
count=0, execute_once=) at execMain.c:361
#6  0x560f621b8cfc in PortalRunSelect (portal=portal@entry=0x560f63277e40, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x560f632daf28) at pquery.c:932
#7  0x560f621ba136 in PortalRun (portal=portal@entry=0x560f63277e40, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x560f632daf28, 
altdest=altdest@entry=0x560f632daf28, 
completionTag=0x7fffdd875ac0 "") at pquery.c:773
#8  0x560f621b60cb in exec_simple_query (query_string=0x560f63213820 
"SELECT * FROM (SELECT * FROM t ORDER BY a) foo ORDER BY a, b;") at 
postgres.c:1120
#9  0x560f621b7df4 in PostgresMain (argc=, 
argv=argv@entry=0x560f63240678, dbname=, username=) at postgres.c:4144
#10 0x560f61ef56d1 in BackendRun (port=0x560f63236130) at postmaster.c:4412
#11 BackendStartup (port=0x560f63236130) at postmaster.c:4084
#12 ServerLoop () at postmaster.c:1757
#13 0x560f62148242 in PostmasterMain (argc=3, argv=0x560f6320e3e0) at 
postmaster.c:1365
#14 0x560f61ef66b7 in main (argc=3, argv=0x560f6320e3e0) at main.c:228


Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Edmund Horner
On 5 March 2018 at 08:06, Tom Lane  wrote:
> I looked into this patch and was disappointed to discover that it had
> only a very ad-hoc solution to the problem of version-dependent tab
> completion queries.  We need something better --- in particular, the
> recent prokind changes mean that there needs to be a way to make
> SchemaQuery queries version-dependent.

Thanks for the review Tom.

I was avoiding changing things that already worked and hence ended up
with the ad-hoc code.  It's much better to have a unified approach to
handling this, though.

> So ... here is a modest proposal.  It invents a VersionedQuery concept
> and also extends the SchemaQuery infrastructure to allow those to be
> versioned.  I have not taken this nearly as far as it could be taken,
> since it's mostly just proposing mechanism.  To illustrate the
> VersionedQuery infrastructure, I fixed it so it wouldn't send
> publication/subscription queries to pre-v10 servers, and to illustrate
> the versioned SchemaQuery infrastructure, I fixed the prokind problems.

(Hmm, I guess the new prokind column may be germane to my query for
callable functions.)

> If people like this approach, I propose to commit this more or less
> as-is.  The select-tab-completion patch would then need to be rewritten
> to use this infrastructure, but I think that should be straightforward.

The SELECT completion patch was definitely ugly.  I think the
VersionedQuery is a nice way of making it less ad-hoc, but the work of
writing the numerous query versions, where necessary, remains.

My patch had 4 versions: PRE_81, PRE_90, PRE_96, and current.  This
could be reformulated as

static const VersionedQuery
Query_for_list_of_selectable_functions_or_attributes[] = {
{90600, ... },
{90000, ... },
{80100, ... },
{70300, ... },
{0, NULL}
};

I do think the version indicators are nicer when they mean "supported
as of this server version" rather than my "fallback if the server
version is prior to this".
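
Under that "supported as of this server version" convention, picking the
query to send might look roughly like this (a hypothetical sketch; the field
names are assumed from the array layout above, not taken from Tom's actual
patch):

/*
 * Hypothetical sketch: entries are ordered newest-first and terminated by
 * {0, NULL}; return the first query whose minimum server version is met.
 */
static const char *
resolve_versioned_query(const VersionedQuery *vq, int server_version)
{
    for (; vq->query != NULL; vq++)
    {
        if (server_version >= vq->min_server_version)
            return vq->query;
    }
    return NULL;                /* nothing usable: skip this completion */
}

With that, a 9.4 server would fall through to the 90000 entry, an 8.0 server
to the 70300 one, and anything older than 7.3 would get no completion at all.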

Is it overkill to have so many variations?

I am still just slightly unclear on where we are in relation to the
SAVEPOINT patch -- is that redundant now?

For SELECT support, I would be happy with applying:

  a. Your patch
  b. A reworked simple SELECT patch (possibly with fewer query versions)
  c. A SAVEPOINT patch *if* it's deemed useful [1]

And perhaps later,
  d. A cleverer SELECT patch (supporting additional items after the
first comma, for instance)

> As a separate line of work, the infrastructure could be applied to fix
> the pre-existing places where tab completion fails against old servers.
> But that's probably work for v12 or beyond, unless somebody's really
> motivated to do it right now.
>
> regards, tom lane

Edmund

[1] There is more complexity with tab completions and transactions
than I want to tackle just for SELECT; see
https://www.postgresql.org/message-id/flat/CAMyN-kAyFTC4Xavp%2BD6XYOoAOZQW2%3Dc79htji06DXF%2BuF6StOQ%40mail.gmail.com#CAMyN-kAyFTC4Xavp+D6XYOoAOZQW2=c79htji06dxf+uf6s...@mail.gmail.com



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Edmund Horner
On 5 March 2018 at 21:46, Vik Fearing  wrote:
> Tab completion on functions in SELECT (a subset of this thread's patch)
> is quite important to me personally.  I will work on this in the coming
> days.

It's something I've missed numerous times in the last few months at
work.  I guess I should really be running a psql with my own
preliminary patches applied; but I'm only a novice pgsql-hacker, and
easily default to using the official one.

If you are going to work on a patch just for functions, you should
probably make it a SchemaQuery.  I did not find a way to support
schema-qualified functions in a query that also returned column names.

Cheers,
Edmund



Re: Better Upgrades

2018-03-05 Thread Daniel Gustafsson
> On 02 Mar 2018, at 01:03, Bruce Momjian  wrote:
> 
> On Tue, Feb  6, 2018 at 01:51:09PM +0100, Daniel Gustafsson wrote:
>>> On 06 Feb 2018, at 01:09, David Fetter  wrote:
>> 
>>> - pg_upgrade is very much a blocker for on-disk format changes.
>> 
>> I wouldn’t call it a blocker, but pg_upgrade across an on-disk format change
>> would be a very different experience from what we have today since it would
>> need to read and rewrite data rather than hardlink/copy.  Definitely not a
>> trivial change though, that I completely agree with.
> 
> Uh, not necessarily.  To allow for on-disk format changes, pg_upgrade
> _could_ rewrite the data files as it copies them (not link), or we could
> modify the backend to be able to read the old format.  We have already
> done that for some changes to data and index types.

Right, that is another option.  I guess we’ll have to wait and see what the
impact will be for the available options when we get there; until there is an
actual on-disk change to reason about, it’s a fairly academic discussion.

cheers ./daniel


Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-03-05 Thread Daniel Gustafsson
> On 05 Mar 2018, at 02:41, Michael Paquier  wrote:
> On Sun, Mar 04, 2018 at 11:15:58PM +0100, Daniel Gustafsson wrote:

>> If no new TLS library is supported in v11, we still got cleaner SSL support 
>> out
>> of it due to the work performed to further remove our dependency on OpenSSL, 
>> so
>> we still come out on top IMO. Thanks Peter et al.!
> 
> I am definitely interested in plugging in more generic APIs for this
> release, so that people can also fork Postgres and implement their own SSL
> implementation more easily.  One patch that still applies in this area
> is, I think, this one to allow SSL implementations to let the backend know
> more easily whether channel binding needs to be implemented or not:
> https://commitfest.postgresql.org/17/1491/

Right.  Regardless of the state of this patch I hope that can make it into 11
to further make 11 a good base for hacking on SSL code.

>> So, TL;DR: both frontend and backend API are implemented except for SCRAM
>> channel binding and CRL file support, it needs tests and documentation, it 
>> does
>> not implement pgcrypto, it will be fixed to support whichever GUC strategy 
>> the
>> project decides on for multiple TLS library support.
> 
> One bit of conclusion in this area is that Peter E has argued in
> favor of having separate configure switches for each SSL
> implementation instead of reworking things into a single, generic
> switch.  The GUC portion is also pointing in the direction of having one
> set of parameters per implementation so that assigning default values is a
> no-brainer.

FWIW I completely agree with this approach.  With a single set of GUCs for all
implementations I fear that we risk ending up with configurations that require
shoehorning, and that will no doubt open the risk of vulnerable servers due to
the configuration being hard to get right.  Either approach does introduce a
problem for moving a hand-edited postgresql.conf file between clusters running
different libraries, but that’s hard to avoid I think.

cheers ./daniel


Re: [PATCH] Verify Checksums during Basebackups

2018-03-05 Thread Stephen Frost
Michael,

* Michael Banck (michael.ba...@credativ.de) wrote:
> On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:
> > So sure, if we go with WARNING + exit with an errorcode, that is perhaps
> > the best combination of the two.
> 
> I had a look at how to go about this, but it appears to be a bit
> complicated; the first problem is that sendFile() and sendDir() don't
> have status return codes that could be set on checksum verification
> failure. So I added a global variable and threw an ereport(ERROR) at the
> end of perform_base_backup(), but then I realized that `pg_basebackup'
> the client program purges the datadir it created if it gets an error:
> 
> |pg_basebackup: final receive failed: ERROR:  Checksum mismatch during
> |basebackup
> |
> |pg_basebackup: removing data directory "data2"

Oh, ugh.

> So I guess this would have to be sent back via the replication protocol,
> but I don't see an off-hand way to do this easily?

The final ordinary result set could be extended to include the
information about checksum failures..?  I'm a bit concerned about what
to do when there are a lot of checksum failures though..  Ideally, you'd
identify all of the pages in all of the files where a checksum failed
(just throwing an error such as the one proposed above is really rather
terrible since you have no idea what block, or even what table, failed
the checksum...).

I realize this is moving the goalposts a long way, but I had actually
always envisioned having file-by-file pg_basebackup being put in at some
point, instead of tablespace-by-tablespace, which would allow for both
an ordinary result set being returned for each file that could contain
information such as the checksum failure and what pages failed, and be a
stepping stone for parallel pg_basebackup..

> Another option would be to see whether it is possible to verify the
> checksum on the client side, but then only pg_basebackup (and not other
> external tools using BASE_BACKUP) would benefit.

Reviewing the original patch and considering this issue, I believe there
may be a larger problem: while very unlikely, there's been concern that
it's possible to read a half-written page (and possibly only the second
half) and end up with a checksum failure due to that.  In pgBackRest, we
address that by doing another read of the page and by checking the LSN
vs. where we started the backup (if the LSN is more recent than when the
backup started then we don't have to care about the page; it'll be in
the WAL).
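
For illustration, that re-read-plus-LSN check could look something like the
sketch below (this is neither pgBackRest's code nor the patch; the helper
name and the fd/offset interface are assumptions):

/*
 * Sketch only: verify one page, tolerating a concurrently-written page.
 * Assumes the backend headers storage/bufpage.h and storage/checksum.h.
 */
static bool
page_verifies_or_is_newer(char *page, BlockNumber blkno,
                          XLogRecPtr backup_start_lsn, int fd, off_t offset)
{
    PageHeader  phdr = (PageHeader) page;

    if (PageIsNew(page) || pg_checksum_page(page, blkno) == phdr->pd_checksum)
        return true;

    /* Re-read once: we may have caught a torn, in-flight write. */
    if (pread(fd, page, BLCKSZ, offset) == BLCKSZ &&
        (PageIsNew(page) ||
         pg_checksum_page(page, blkno) == ((PageHeader) page)->pd_checksum))
        return true;

    /*
     * Still failing: if the page was written after the backup started, it
     * will be replayed from WAL anyway, so the mismatch can be ignored.
     */
    return PageGetLSN(page) >= backup_start_lsn;
}

The backup start LSN is already at hand in perform_base_backup() (it is the
startptr returned by do_pg_start_backup()), which is one more reason this
check is easier to do server-side.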

If we're going to solve that issue the same way pgBackRest does, then
you'd really have to do it server-side, I'm afraid..  Either that, or
add a way for the client to request individual blocks be re-sent, but
that would be awful difficult for pg_basebackup to update in the
resulting tar file if it was compressed..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Verify Checksums during Basebackups

2018-03-05 Thread Michael Banck
Hi,

Am Montag, den 05.03.2018, 06:36 -0500 schrieb Stephen Frost:
> Michael,
> 
> * Michael Banck (michael.ba...@credativ.de) wrote:
> > On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:
> > > So sure, if we go with WARNING + exit with an errorcode, that is perhaps
> > > the best combination of the two.
> > 
> > I had a look at how to go about this, but it appears to be a bit
> > complicated; the first problem is that sendFile() and sendDir() don't
> > have status return codes that could be set on checksum verifcation
> > failure. So I added a global variable and threw an ereport(ERROR) at the
> > end of perform_base_backup(), but then I realized that `pg_basebackup'
> > the client program purges the datadir it created if it gets an error:
> > 
> > > pg_basebackup: final receive failed: ERROR:  Checksum mismatch during
> > > basebackup
> > > 
> > > pg_basebackup: removing data directory "data2"
> 
> Oh, ugh.

I came up with the attached patch, which sets a checksum_failure
variable in both basebackup.c and pg_basebackup.c, and emits an ereport
with (for now) ERRCODE_DATA_CORRUPTED at the end of
perform_base_backup(), which gets caught in pg_basebackup and then used
not to clean up the datadir, but to exit with a non-zero exit code.

Does that seem feasible?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuerdiff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 361cde3291..11fb009b59 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -100,6 +100,9 @@ static TimeOffset elapsed_min_unit;
 /* The last check of the transfer rate. */
 static TimestampTz throttled_last;
 
+/* Whether a checksum failure occurred. */
+static bool checksum_failure;
+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves are
@@ -208,6 +211,8 @@ perform_base_backup(basebackup_options *opt)
 	labelfile = makeStringInfo();
 	tblspc_map_file = makeStringInfo();
 
+	checksum_failure = false;
+
 	startptr = do_pg_start_backup(opt->label, opt->fastcheckpoint, &starttli,
   labelfile, &tablespaces,
   tblspc_map_file,
@@ -566,6 +571,12 @@ perform_base_backup(basebackup_options *opt)
 		pq_putemptymessage('c');
 	}
 	SendXlogRecPtrResult(endptr, endtli);
+
+	if (checksum_failure == true)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			errmsg("Checksum mismatch during basebackup\n")));
+
 }
 
 /*
@@ -1295,10 +1306,13 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 checksum = pg_checksum_page((char *) page, blkno + segmentno*RELSEG_SIZE);
 phdr = (PageHeader) page;
 if (phdr->pd_checksum != checksum)
+{
 	ereport(WARNING,
-			(errmsg("checksum mismatch in file \"%s\", block %d: "
-			"expected %x, found %x", readfilename, blkno,
+			(errmsg("checksum mismatch in file \"%s\", segment %d, block %d: "
+			"expected %x, found %x", readfilename, segmentno, blkno,
 			phdr->pd_checksum, checksum)));
+	checksum_failure = true;
+}
 blkno++;
 			}
 		}
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1b32592063..e1b364b4ef 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -39,6 +39,7 @@
 #include "replication/basebackup.h"
 #include "streamutil.h"
 
+#define ERRCODE_DATA_CORRUPTED	"XX001"
 
 typedef struct TablespaceListCell
 {
@@ -81,6 +82,7 @@ static char *xlog_dir = NULL;
 static char format = 'p';		/* p(lain)/t(ar) */
 static char *label = "pg_basebackup base backup";
 static bool noclean = false;
+static bool checksum_failure = false;
 static bool showprogress = false;
 static int	verbose = 0;
 static int	compresslevel = 0;
@@ -155,7 +157,7 @@ cleanup_directories_atexit(void)
 	if (success || in_log_streamer)
 		return;
 
-	if (!noclean)
+	if (!noclean && !checksum_failure)
 	{
 		if (made_new_pgdata)
 		{
@@ -195,7 +197,7 @@ cleanup_directories_atexit(void)
 	}
 	else
 	{
-		if (made_new_pgdata || found_existing_pgdata)
+		if ((made_new_pgdata || found_existing_pgdata) && !checksum_failure)
 			fprintf(stderr,
 	_("%s: data directory \"%s\" not removed at user's request\n"),
 	progname, basedir);
@@ -1970,8 +1972,17 @@ BaseBackup(void)
 	res = PQgetResult(conn);
 	if (PQresultStatus(res) != PGRES_COMMAND_OK)
 	{
-		fprintf(stderr, _("%s: final receive failed: %s"),
-progname, PQerrorMessage(conn));
+		const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+		if (sqlstate &&
+			strcmp(sqlstate, ERRCODE_DATA_CORRUPTED) == 0)
+		{
+			fprintf(stderr, _("%s: checksum

Re: [PATCH] Verify Checksums during Basebackups

2018-03-05 Thread Stephen Frost
Michael,

* Michael Banck (michael.ba...@credativ.de) wrote:
> Am Montag, den 05.03.2018, 06:36 -0500 schrieb Stephen Frost:
> > * Michael Banck (michael.ba...@credativ.de) wrote:
> > > On Sun, Mar 04, 2018 at 06:19:00PM +0100, Magnus Hagander wrote:
> > > > So sure, if we go with WARNING + exit with an errorcode, that is perhaps
> > > > the best combination of the two.
> > > 
> > > I had a look at how to go about this, but it appears to be a bit
> > > complicated; the first problem is that sendFile() and sendDir() don't
> > > have status return codes that could be set on checksum verification
> > > failure. So I added a global variable and threw an ereport(ERROR) at the
> > > end of perform_base_backup(), but then I realized that `pg_basebackup'
> > > the client program purges the datadir it created if it gets an error:
> > > 
> > > > pg_basebackup: final receive failed: ERROR:  Checksum mismatch during
> > > > basebackup
> > > > 
> > > > pg_basebackup: removing data directory "data2"
> > 
> > Oh, ugh.
> 
> I came up with the attached patch, which sets a checksum_failure
> variable in both basebackup.c and pg_basebackup.c, and emits an ereport
> with (for now) ERRCODE_DATA_CORRUPTED at the end of
> perform_base_backup(), which gets caught in pg_basebackup and then used
> to not cleanup the datadir, but exit with a non-zero exit code.
> 
> Does that seem feasible?

Ah, yes, I had thought about using a WARNING or NOTICE or similar also
to pass back the info about the checksum failure during the backup; that
seems like it would work as long as pg_basebackup captures that
information and puts it into a log or on stdout or similar.

I'm a bit on the fence about whether we shouldn't just have pg_basebackup
always return a non-zero exit code on a WARNING being seen during the
backup instead.  Given that there's a pretty clear SQL code for this
case, perhaps throwing an ERROR and then checking the SQL code isn't
an issue though.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgbench randomness initialization

2018-03-05 Thread Fabien COELHO


Hello Chapman,

Here is v9.


> It needs s/explicitely/explicitly/ in the docs.


Done.

> The parsing of the seed involves matters of taste, I guess: if it were a
> signed int, then sscanf's built-in %i would do everything those three
> explicit hex/octal/decimal branches do, but there's no unsigned version
> of %i. Then there's strtoul(..., base=0), which accepts the same choice
> of bases, but there's no unsigned-int-width version of that. Maybe it
> would still look cleaner to use strtoul and just check that the result
> fits in unsigned int? As I began, it comes down to taste ... this code
> does work.


I must admit that I'm not too happy with the result either, so I dropped
the octal/hexadecimal parsing.
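
For reference, the strtoul(..., base = 0) variant sketched in the review
above would look roughly like this (illustrative only, not the code in the
attached v9 patch):

#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Illustrative only: accept decimal, octal ("0...") and hex ("0x...")
 * seeds via strtoul with base 0, then range-check against unsigned int.
 */
static bool
parse_unsigned_seed(const char *str, unsigned int *seed)
{
    char       *end;
    unsigned long val;

    errno = 0;
    val = strtoul(str, &end, 0);
    if (errno != 0 || end == str || *end != '\0' || val > UINT_MAX)
        return false;

    *seed = (unsigned int) val;
    return true;
}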



> I am not sure about the "idem for :random_seed" part: Does this mean that a
> value could be given with -Drandom_seed on the command line, and become the
> value of :random_seed, possibly different from the value given to
> --random-seed?  Is that intended? (Perhaps it is; I'm merely asking.)


The "idem" is about setting the variable but not overwritting it if it 
already exists. The intention is that :random_seed is the random seed, 
unless the user set it to something else in which case it is the user's 
value. I've improved the variable description in the doc to point out that 
the value may be overwritten with -D.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 5f28023..4ef0adc 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -680,6 +680,43 @@ pgbench  options 
 d
  
 
  
+  
--random-seed=SEED
+  
+   
+Set random generator seed.  Seeds the system random number generator,
+which then produces a sequence of initial generator states, one for
+each thread.
+Values for SEED may be:
+time (the default, the seed is based on the current 
time),
+rand (use a strong random source, failing if none
+is available), or an unsigned integer value.
+The random generator is invoked explicitly from a pgbench script
+(random... functions) or implicitly (for instance 
option
+--rate uses it to schedule transactions).
+When explicitly set, the value used for seeding is shown on the 
terminal.
+Any value allowed for SEED may also be
+provided through the environment variable
+PGBENCH_RANDOM_SEED.
+To ensure that the provided seed impacts all possible uses, put this 
option
+first or use the environment variable.
+  
+  
+Setting the seed explicitly allows to reproduce a 
pgbench
+run exactly, as far as random numbers are concerned.
+As the random state is managed per thread, this means the exact same
+pgbench run for an identical invocation if there is 
one
+client per thread and there are no external or data dependencies.
+From a statistical viewpoint reproducing runs exactly is a bad idea 
because
+it can hide the performance variability or improve performance unduly,
+e.g. by hitting the same pages as a previous run.
+However, it may also be of great help for debugging, for instance
+re-running a tricky case which leads to an error.
+Use wisely.
+   
+  
+ 
+
+ 
   
--sampling-rate=rate
   

@@ -874,14 +911,19 @@ pgbench  options 
 d
 
  
   
-scale 
-   current scale factor
-  
-
-  
 client_id 
unique number identifying the client session (starts from 
zero)
   
+
+  
+random_seed 
+   random generator seed (unless overwritten with 
-D)
+  
+
+  
+scale 
+   current scale factor
+  
  
 

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5c07dd9..b516fb3 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -146,6 +146,9 @@ int64   latency_limit = 0;
 char  *tablespace = NULL;
 char  *index_tablespace = NULL;
 
+/* random seed used when calling srandom() */
+int64 random_seed = -1;
+
 /*
  * end of configurable parameters
  */
@@ -561,6 +564,7 @@ usage(void)
   "  --log-prefix=PREFIX  prefix for transaction time log 
file\n"
   "   (default: \"pgbench_log\")\n"
   "  --progress-timestamp use Unix epoch timestamps for 
progress\n"
+  "  --random-seed=SEED   set random seed (\"time\", 
\"rand\", integer)\n"
   "  --sampling-rate=NUM  fraction of transactions to log 
(e.g., 0.01 for 1%%)\n"
   "\nCommon options:\n"
   "  -d, --debug  print debugging output\n"
@@ -4353,6 +4357,49 @@ printResults(TState *threads, StatsData *total, 
instr_time total_time,
}
 }
 
+/* call srandom base

Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2018-03-05 Thread Fabien COELHO


Minor rebase after vpath fix (e94f2bc809a0c684185666f19d81f6496e732a3a).

--
Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 5c07dd9..1fd2451 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5163,6 +5163,96 @@ main(int argc, char **argv)
return 0;
 }
 
+/* display the progress report */
+static void
+doProgressReport(TState *thread, StatsData *plast, int64 *plast_report,
+int64 now, int64 thread_start)
+{
+   StatsData   cur;
+   int64   run = now - *plast_report,
+   ntx;
+   double  tps,
+   total_run,
+   latency,
+   sqlat,
+   lag,
+   stdev;
+   chartbuf[64];
+   int i;
+
+   /*
+* Add up the statistics of all threads.
+*
+* XXX: No locking. There is no guarantee that we get an
+* atomic snapshot of the transaction count and latencies, so
+* these figures can well be off by a small amount. The
+* progress report's purpose is to give a quick overview of
+* how the test is going, so that shouldn't matter too much.
+* (If a read from a 64-bit integer is not atomic, you might
+* get a "torn" read and completely bogus latencies though!)
+*/
+   initStats(&cur, 0);
+   for (i = 0; i < nthreads; i++)
+   {
+   mergeSimpleStats(&cur.latency, &thread[i].stats.latency);
+   mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
+   cur.cnt += thread[i].stats.cnt;
+   cur.skipped += thread[i].stats.skipped;
+   }
+
+   /* we count only actually executed transactions */
+   ntx = (cur.cnt - cur.skipped) - (plast->cnt - plast->skipped);
+   total_run = (now - thread_start) / 1000000.0;
+   tps = 1000000.0 * ntx / run;
+   if (ntx > 0)
+   {
+   latency = 0.001 * (cur.latency.sum - plast->latency.sum) / ntx;
+   sqlat = 1.0 * (cur.latency.sum2 - plast->latency.sum2) / ntx;
+   stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+   lag = 0.001 * (cur.lag.sum - plast->lag.sum) / ntx;
+   }
+   else
+   {
+   latency = sqlat = stdev = lag = 0;
+   }
+
+   if (progress_timestamp)
+   {
+   /*
+* On some platforms the current system timestamp is
+* available in now_time, but rather than get entangled
+* with that, we just eat the cost of an extra syscall in
+* all cases.
+*/
+   struct timeval tv;
+
+   gettimeofday(&tv, NULL);
+   snprintf(tbuf, sizeof(tbuf), "%ld.%03ld s",
+(long) tv.tv_sec, (long) (tv.tv_usec / 1000));
+   }
+   else
+   {
+   /* round seconds are expected, but the thread may be late */
+   snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
+   }
+
+   fprintf(stderr,
+   "progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
+   tbuf, tps, latency, stdev);
+
+   if (throttle_delay)
+   {
+   fprintf(stderr, ", lag %.3f ms", lag);
+   if (latency_limit)
+   fprintf(stderr, ", " INT64_FORMAT " skipped",
+   cur.skipped - plast->skipped);
+   }
+   fprintf(stderr, "\n");
+
+   *plast = cur;
+   *plast_report = now;
+}
+
 static void *
 threadRun(void *arg)
 {
@@ -5422,89 +5512,8 @@ threadRun(void *arg)
now = INSTR_TIME_GET_MICROSEC(now_time);
if (now >= next_report)
{
-   /* generate and show report */
-   StatsData   cur;
-   int64   run = now - last_report,
-   ntx;
-   double  tps,
-   total_run,
-   latency,
-   sqlat,
-   lag,
-   stdev;
-   chartbuf[64];
-
-   /*
-* Add up the statistics of all threads.
-*
-* XXX: No locking. There is no guarantee that 
we get an
-* atomic snapshot of the transaction count and 
latencies, so
-* these figures can w

Re: All Taxi Services need Index Clustered Heap Append

2018-03-05 Thread Aleksander Alekseev
Hello Darafei,

> After my talk at pgconf.ru Alexander Korotkov encouraged me to share my
> story and thoughts in -hackers.
> [...]
>  - An append-only table of shape (id uuid, ts timestamp, geom geometry,
> heading speed accuracy float, source text).
> A btree index on (id, ts).
> [...]
> select * from positions where id = :id and ts between :start_ts and :end_ts;

Thank you for sharing the detailed description of the problem!

Frankly, considering your requirements, including the one regarding data
durability and consistency, I would advise writing a microservice that
stores data in a non-transactional, append-only, one-file-per-driver-id
fashion. For serializing the data I would suggest something like
Thrift or Protobuf. It would also be a good idea to store a CRC for every
record. It will take about one workday for any Go/Java developer and will
solve your problem entirely.
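
To make the suggestion concrete, the per-record framing could be as simple
as the sketch below (shown in C purely for illustration; the advice above
assumes a Go/Java implementation, and every name here is made up):

#include <stdint.h>
#include <stdio.h>
#include <zlib.h>               /* crc32() */

/*
 * Sketch only: append one length-prefixed, CRC32-protected record to an
 * append-only per-driver file; a torn or corrupted tail is detected on
 * read by re-checking the length and CRC.
 */
static int
append_record(FILE *f, const void *payload, uint32_t len)
{
    uint32_t    crc = (uint32_t) crc32(0L, payload, len);

    if (fwrite(&len, sizeof(len), 1, f) != 1 ||
        fwrite(&crc, sizeof(crc), 1, f) != 1 ||
        fwrite(payload, 1, len, f) != len)
        return -1;

    return fflush(f);           /* add fsync(fileno(f)) if durability matters */
}

A query for one driver's track over a time range then becomes a sequential
read of a single file, which is the access pattern the btree on (id, ts) is
being used for today.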

Granted, PostgreSQL is a powerful database. But knowing its development
process and current resource limitations (for instance, the number of people
willing to do code review), objectively it will take a few years before
this problem is solved in the vanilla code, plus some time before
AWS deploys it. (Assuming there is a PostgreSQL core developer
interested in this particular task and he or she has time to spare.)

It is my understanding that waiting a few years is not an option in your
case. This is why I'm proposing an alternative solution.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: All Taxi Services need Index Clustered Heap Append

2018-03-05 Thread Komяpa
>
> This approach mixes well with hash
> partitioning. It would be neat indeed if PostgreSQL did something
> equivalent on its own, and the pluggable storage work being done could
> enable index-organized tables that would help. But you probably need
> something right now.
>

Fixing the glaring issues (no vacuum and thus no Index-Only Scan on
append-only tables, vacuum processing all of the eternity of the btree) by 11
will get most of the spike-nails out of the microservice code, and we can
probably live with them until 11 gets to RDS.

I also don't see why pluggable storage is a must for the clustered write.
Postgres does have a mechanism for selecting the next page to write a tuple
to; right now it just looks at the FSM - but what if it just peeked at an
existing index that already has enough data to route the tuple to the
correct page on write?



> I guess I don't have to tell you that it looks like your needs have
> outgrown what RDS works well with and you are in for a painful move
> sooner or later.
>

Painful move where to? If we just run a Postgres instance without RDS we'll
get the pain of setting up Postgres and replication and backups and
autofailover, with no visible gain except if we get some private /
unaccepted patches applied to it. If we can get these things right upstream
why would we want to switch?

Per my colleagues, MySQL offers clustered index, also MySQL is available on
RDS without the need of "painful move", which is doable by writing to two
locations for a day and then pointing readers to new DB. But if we can
instead do no move and be sure the issues are gone upstream before we hit
the limit of spike-nails we're running on currently, wouldn't that be
better? :)

Darafei Praliaskouski,
GIS Engineer / Juno Minsk


Re: Server Crash while executing pg_replication_slot_advance (second time)

2018-03-05 Thread tushar

Hi,

There is another similar issue where I am getting an error -

postgres=# select * from pg_replication_slots ;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot2 | test_decoding | logical   |  13223 | postgres | 
f | f  |    |  |  569 | 0/2B000370  | 
0/2B0003A8

(1 row)

postgres=#  SELECT * FROM 
pg_replication_slot_advance('regression_slot2',pg_switch_wal());

    slot_name |  end_lsn
--+
 regression_slot2 | 0/2B0003C0
(1 row)

postgres=#  SELECT * FROM 
pg_replication_slot_advance('regression_slot2',pg_switch_wal());

ERROR:  invalid record length at 0/2B0003C0: wanted 24, got 0
postgres=#

regards,

On 02/16/2018 04:04 PM, tushar wrote:


Hi,

Getting another server crash against the latest sources of v11 while 
executing pg_replication_slot_advance a second time.  This issue is 
also reproducible with the patch given at 



Test-Case scenario-

postgres=#  SELECT * FROM 
pg_create_logical_replication_slot('regression_slot1', 
'test_decoding', true);

    slot_name |    lsn
--+---
 regression_slot1 | 0/4000258
(1 row)

postgres=# select * from pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |   8756 |  |  557 | 0/4000220   | 
0/4000258

(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/4000270
(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/5000000
(1 row)

postgres=# SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/5000000');

  end_lsn
---
 0/5000000
(1 row)

postgres=# select * from pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |   8756 |  |  557 | 0/4000220   | 
0/500

(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/5000078
(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/6000000
(1 row)

postgres=# SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/6000000');

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

Stack trace -

Core was generated by `postgres: centos postgres [local] 
SELECT  '.

Program terminated with signal 6, Aborted.
#0  0x003746e325e5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 
krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 
libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-48.el6_8.4.x86_64 
zlib-1.2.3-29.el6.x86_64

(gdb) bt
#0  0x003746e325e5 in raise () from /lib64/libc.so.6
#1  0x003746e33dc5 in abort () from /lib64/libc.so.6
#2  0x00a11f8a in ExceptionalCondition (conditionName=0xad5f00 
"!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + 
((8) - 1)) & ~((uintptr_t) ((8) - 1)",
    errorType=0xad5eed "FailedAssertion", fileName=0xad5ee0 
"xlogreader.c", lineNumber=243) at assert.c:54
#3  0x0055df1d in XLogReadRecord (state=0x2d859c0, 
RecPtr=83886080, errormsg=0x7ffd6a4359f8) at xlogreader.c:243
#4  0x008516bc in pg_logical_replication_slot_advance 
(startlsn=83886080, moveto=100663296) at slotfuncs.c:370
#5  0x00851a21 in pg_replication_slot_advance 
(fcinfo=0x7ffd6a435bb0) at slotfuncs.c:488
#6  0x006defdc in ExecMakeTableFunctionResult 
(setexpr=0x2d5e260, econtext=0x2d5df58, argContext=0x2da97f0, 
expectedDesc=0x2d5f780, randomAccess=0 '\000') at execSRF.c:231
#7  0x006f1782 in FunctionNext (node=0x2d5de40) at 
nodeFunctionscan.c:95
#8  0x006de80d in ExecScanFetch (node=0x2d5de40, 
accessMtd=0x6f16cb , recheckMtd=0x6f1abd 
) at execScan.c:95
#9 

Re: [PATCH] Find additional connection service files in pg_service.conf.d directory

2018-03-05 Thread Andreas 'ads' Scherbaum
On Thu, Mar 1, 2018 at 7:40 PM, Curt Tilmes  wrote:

> On Thu, Mar 1, 2018 at 1:13 PM, Andres Freund  wrote:
>


> > I'd also insist that the file ending is ".conf".
>
> New patch limits files to ".conf".
>

Looked over this patch and found that you limit the directory entries to
files only.
Would it make sense to allow links as well? That would allow other
programs/distributions to place a link in the config directory and point to
a service file in their own directory.


Regards,

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-05 Thread Etsuro Fujita

Hi Arthur,

(2018/03/03 18:51), Arthur Zakirov wrote:

On Wed, Feb 28, 2018 at 05:22:42PM +0900, Etsuro Fujita wrote:

I rebased the patch over HEAD and revised comments/docs a little bit. Please
find attached a new version of the patch.


I've reviewed the patch.

The code is good, clear and it is pretty small. There are documentation
fixes and additional regression tests.

Unfortunately the patch is outdated and it needs rebasing. Outdated
files are regression tests files.

After rebasing regression tests they pass.


I rebased the patch over HEAD.  Please find attached an updated patch.

Thank you for the review!

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 140,145  static void deparseSubqueryTargetList(deparse_expr_cxt *context);
--- 140,146 
  static void deparseReturningList(StringInfo buf, PlannerInfo *root,
  	 Index rtindex, Relation rel,
  	 bool trig_after_row,
+ 	 List *withCheckOptionList,
  	 List *returningList,
  	 List **retrieved_attrs);
  static void deparseColumnRef(StringInfo buf, int varno, int varattno,
***
*** 1645,1658  deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
   * deparse remote INSERT statement
   *
   * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by RETURNING (if any), which is returned
!  * to *retrieved_attrs.
   */
  void
  deparseInsertSql(StringInfo buf, PlannerInfo *root,
   Index rtindex, Relation rel,
   List *targetAttrs, bool doNothing,
!  List *returningList, List **retrieved_attrs)
  {
  	AttrNumber	pindex;
  	bool		first;
--- 1646,1660 
   * deparse remote INSERT statement
   *
   * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by WITH CHECK OPTION or RETURNING (if any),
!  * which is returned to *retrieved_attrs.
   */
  void
  deparseInsertSql(StringInfo buf, PlannerInfo *root,
   Index rtindex, Relation rel,
   List *targetAttrs, bool doNothing,
!  List *withCheckOptionList, List *returningList,
!  List **retrieved_attrs)
  {
  	AttrNumber	pindex;
  	bool		first;
***
*** 1701,1720  deparseInsertSql(StringInfo buf, PlannerInfo *root,
  
  	deparseReturningList(buf, root, rtindex, rel,
  		 rel->trigdesc && rel->trigdesc->trig_insert_after_row,
! 		 returningList, retrieved_attrs);
  }
  
  /*
   * deparse remote UPDATE statement
   *
   * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by RETURNING (if any), which is returned
!  * to *retrieved_attrs.
   */
  void
  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
   Index rtindex, Relation rel,
!  List *targetAttrs, List *returningList,
   List **retrieved_attrs)
  {
  	AttrNumber	pindex;
--- 1703,1723 
  
  	deparseReturningList(buf, root, rtindex, rel,
  		 rel->trigdesc && rel->trigdesc->trig_insert_after_row,
! 		 withCheckOptionList, returningList, retrieved_attrs);
  }
  
  /*
   * deparse remote UPDATE statement
   *
   * The statement text is appended to buf, and we also create an integer List
!  * of the columns being retrieved by WITH CHECK OPTION or RETURNING (if any),
!  * which is returned to *retrieved_attrs.
   */
  void
  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
   Index rtindex, Relation rel,
!  List *targetAttrs,
!  List *withCheckOptionList, List *returningList,
   List **retrieved_attrs)
  {
  	AttrNumber	pindex;
***
*** 1743,1749  deparseUpdateSql(StringInfo buf, PlannerInfo *root,
  
  	deparseReturningList(buf, root, rtindex, rel,
  		 rel->trigdesc && rel->trigdesc->trig_update_after_row,
! 		 returningList, retrieved_attrs);
  }
  
  /*
--- 1746,1752 
  
  	deparseReturningList(buf, root, rtindex, rel,
  		 rel->trigdesc && rel->trigdesc->trig_update_after_row,
! 		 withCheckOptionList, returningList, retrieved_attrs);
  }
  
  /*
***
*** 1836,1842  deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
    &context);
  	else
  		deparseReturningList(buf, root, rtindex, rel, false,
! 			 returningList, retrieved_attrs);
  }
  
  /*
--- 1839,1845 
    &context);
  	else
  		deparseReturningList(buf, root, rtindex, rel, false,
! 			 NIL, returningList, retrieved_attrs);
  }
  
  /*
***
*** 1858,1864  deparseDeleteSql(StringInfo buf, PlannerInfo *root,
  
  	deparseReturningList(buf, root, rtindex, rel,
  		 rel->trigdesc && rel->trigdesc->trig_delete_after_row,
! 		 returningList, retrieved_attrs);
  }
  
  /*
--- 1861,1867 
  
  	deparseReturningList(buf, root, rtindex, rel,
  		 rel->trigdesc && rel->trigdesc->trig_delete_after_row,
! 		 NIL, returningList, retrieved_attrs);
  }
  
  /*
***
*** 1919,1925 **

Re: [PoC PATCH] Parallel dump to /dev/null

2018-03-05 Thread Michael Banck
Hi Alvaro,

On Thu, Mar 01, 2018 at 04:00:52PM -0300, Alvaro Herrera wrote:
> strdup -> pg_strdup()

Fixed.

> I wonder if, instead of doing strcmp() all over the place, we should
> give this behavior a separate boolean flag in lclContext.

I mused a bit about what to name that flag and came up with `discard',
but maybe somebody has a better idea?

In any case, new patch attached which does this.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..8657ac30e6 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -512,7 +512,10 @@ cfopen_write(const char *path, const char *mode, int compression)
 #ifdef HAVE_LIBZ
 		char	   *fname;
 
-		fname = psprintf("%s.gz", path);
+		if (strcmp(path, NULL_DEVICE) != 0)
+			fname = psprintf("%s.gz", path);
+		else
+			fname = pg_strdup(path);
 		fp = cfopen(fname, mode, compression);
 		free_keep_errno(fname);
 #else
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 8dd1915998..ab17adbc62 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -62,6 +62,13 @@ typedef struct _z_stream
 typedef z_stream *z_streamp;
 #endif
 
+/* Null device */
+#ifdef WIN32
+#define NULL_DEVICE "NUL"
+#else
+#define NULL_DEVICE "/dev/null"
+#endif
+
 /* Data block types */
 #define BLK_DATA 1
 #define BLK_BLOBS 3
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 4aabb40f59..5936a799c7 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -54,6 +54,7 @@ typedef struct
 
 	cfp		   *blobsTocFH;		/* file handle for blobs.toc */
 	ParallelState *pstate;		/* for parallel backup / restore */
+	int			discard;		/* target is NULL_DEVICE */
 } lclContext;
 
 typedef struct
@@ -158,6 +159,11 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 
 	ctx->directory = AH->fSpec;
 
+	if (strcmp(ctx->directory, NULL_DEVICE) == 0)
+		ctx->discard = 1;
+	else
+		ctx->discard = 0;
+
 	if (AH->mode == archModeWrite)
 	{
 		struct stat st;
@@ -192,9 +198,15 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 			}
 		}
 
-		if (!is_empty && mkdir(ctx->directory, 0700) < 0)
-			exit_horribly(modulename, "could not create directory \"%s\": %s\n",
-		  ctx->directory, strerror(errno));
+		/*
+		 * Create the output directory, unless it exists already and is empty.
+		 * If we are discarding output to the null device, we must not
+		 * create it.
+		 */
+		if (!is_empty && !ctx->discard)
+			if (mkdir(ctx->directory, 0711) < 0)
+				exit_horribly(modulename, "could not create directory \"%s\": %s\n",
+							  ctx->directory, strerror(errno));
 	}
 	else
 	{			/* Read Mode */
@@ -602,8 +614,10 @@ _CloseArchive(ArchiveHandle *AH)
 		/*
 		 * In directory mode, there is no need to sync all the entries
 		 * individually. Just recurse once through all the files generated.
+		 * If we are discarding output to the null device, it does not
+		 * need syncing and would emit a warning if we did.
 		 */
-		if (AH->dosync)
+		if (AH->dosync && !ctx->discard)
 			fsync_dir_recurse(ctx->directory, progname);
 	}
 	AH->FH = NULL;
@@ -659,7 +673,14 @@ _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
 	lclContext *ctx = (lclContext *) AH->formatData;
 	char		fname[MAXPGPATH];
 
-	snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
+	/*
+	 * If we are discarding output to the null device, just use that as
+	 * fname.
+	 */
+	if (ctx->discard)
+		snprintf(fname, MAXPGPATH, NULL_DEVICE);
+	else
+		snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
 
 	ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression);
 
@@ -721,9 +742,18 @@ setFilePath(ArchiveHandle *AH, char *buf, const char *relativeFilename)
 	if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)
 		exit_horribly(modulename, "file name too long: \"%s\"\n", dname);
 
-	strcpy(buf, dname);
-	strcat(buf, "/");
-	strcat(buf, relativeFilename);
+	/*
+	 * If we are discarding output to the null device, we cannot set a file
+	 * path and just set the buffer to the null device.
+	 */
+	if (ctx->discard)
+	{
+		strcpy(buf, NULL_DEVICE);
+	} else {
+		strcpy(buf, dname);
+		strcat(buf, "/");
+		strcat(buf, relativeFilename);
+	}
 }
 
 /*


Re: new function for tsquery creartion

2018-03-05 Thread Aleksander Alekseev
It seems that this patch doesn't apply anymore, see 
http://commitfest.cputube.org/

The new status of this patch is: Waiting on Author


Re: Cache lookup errors with functions manipulation object addresses

2018-03-05 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hello Michael,

It looks like that this patch doesn't apply anymore: 
http://commitfest.cputube.org/

The new status of this patch is: Waiting on Author


Re: Better Upgrades

2018-03-05 Thread David Fetter
On Mon, Mar 05, 2018 at 11:18:20AM +0100, Daniel Gustafsson wrote:
> > On 02 Mar 2018, at 12:59, Greg Stark  wrote:
> 
> > My feeling is that worrying about in-place binary upgrades today
> > is wasted effort. Already the window for installations where this
> > is useful is narrow -- you have to be big enough that the
> > resources for deploying a second instance is significant but not
> > so big that the downtime and risk is untenable.
> 
> I might be colorblind from $dayjob,

I agree.

> but I don’t think that these installations (data warehouses et.al)
> are that uncommon.

Data warehouses are by no means rare. They also need backups, just as
any other system does, and that means at the very least duplicate
storage on at least one separate node, even if that node can't
actually bring the warehouse back up in the case of a catastrophic
failure of the original warehouse.

> They are also installations that risk staying on an old version due
> to upgrades being non-trivial (not saying that in-place is trivial,
> just that there are places where it may make sense).

I see in-place upgrades as the riskiest of the possible options, and
that's not just in the case of PostgreSQL.  Every other system with
that feature has had catastrophic failures that were impossible to
predict in advance.  That reality turns on the fundamentals of
constructing such systems, to wit:

- They take enormous amounts of deep knowledge to get right.
- The people who have that knowledge are not inclined to doing what
  amounts to an invisible feature.
- They are perforce done as the last thing before a release, often
  blocking other features and holding up the release process as a
  consequence.
- They can never really be tested for corner cases--see above for one
  of the reasons.
- There can be no realistic back-out plan when something goes wrong.

> > I have the feeling that in-place binary upgrades are going to end
> > up sapping developer time
> 
> Having worked on supporting the 8.2->8.3 on-disk format change in
> pg_upgrade for GPDB, I am not arguing against that.  Not at all.

Your experience reflects one of the fundamental problems with those
systems. It wasn't a one-off.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Flexible configuration for full-text search

2018-03-05 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Unfortunately this patch doesn't apply anymore: http://commitfest.cputube.org/

Re: inserts into partitioned table may cause crash

2018-03-05 Thread Etsuro Fujita

(2018/03/01 21:40), Etsuro Fujita wrote:

(2018/02/28 17:36), Amit Langote wrote:

Attached a patch to fix that, which would need to be back-patched to 10.


Good catch! Will review.


I started reviewing this.  I think the analysis you mentioned upthread 
would be correct, but I'm not sure the patch is the right way to go 
because I think that exception handling added by the patch throughout 
ExecInsert such as:


@@ -408,7 +408,10 @@ ExecInsert(ModifyTableState *mtstate,
slot = ExecBRInsertTriggers(estate, resultRelInfo, slot);

if (slot == NULL)   /* "do nothing" */
-   return NULL;
+   {
+   result = NULL;
+   goto restore_estate_result_rel;
+   }

would reduce the code readability.

An alternative fix for this would be to handle the set/reset of 
estate->es_result_relation_info at a higher level, ie, in ExecModifyTable, 
like the attached: (1) before calling ExecInsert, we do the preparation 
work for tuple routing of one tuple (eg, determine the partition for the 
tuple and convert the format of the tuple) in a separate function, which 
also sets estate->es_result_relation_info to the partition so that 
ExecInsert works on it; then (2) we call ExecInsert, which just 
inserts into the partition; then (3) we reset 
estate->es_result_relation_info back to the root partitioned table.  One 
good thing about the alternative is that it returns ExecInsert to roughly 
its PG 9.6 shape, which would make maintenance easier.  COPY has the same 
issue, so the attached also fixes that.  (Maybe we could do some 
refactoring to use the same code in both cases, but I'm not sure we should 
do that as a fix.)  What do you think about the alternative?
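
To make that control flow concrete, here is a rough sketch (simplified and
untested, for illustration only) of how the CMD_INSERT arm of ExecModifyTable
could look under this scheme, using the ExecPrepareTupleRouting declaration
from the attached patch; variable names such as rootResultRelInfo are just
illustrative:

case CMD_INSERT:
	/* (1) route the tuple to its partition; this also points
	 * estate->es_result_relation_info at that partition */
	if (proute)
		slot = ExecPrepareTupleRouting(node, proute, estate,
									   rootResultRelInfo, slot);

	/* (2) ExecInsert now simply inserts into the current result relation */
	slot = ExecInsert(node, slot, planSlot, estate, node->canSetTag);

	/* (3) switch back to the root partitioned table */
	if (proute)
		estate->es_result_relation_info = rootResultRelInfo;
	break;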


Best regards,
Etsuro Fujita
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***
*** 2768,2779  CopyFrom(CopyState cstate)
  			 * tuples inserted by an INSERT command.
  			 */
  			processed++;
  
! 			if (saved_resultRelInfo)
! 			{
! resultRelInfo = saved_resultRelInfo;
! estate->es_result_relation_info = resultRelInfo;
! 			}
  		}
  	}
  
--- 2768,2780 
  			 * tuples inserted by an INSERT command.
  			 */
  			processed++;
+ 		}
  
! 		/* Restore the saved ResultRelInfo */
! 		if (saved_resultRelInfo)
! 		{
! 			resultRelInfo = saved_resultRelInfo;
! 			estate->es_result_relation_info = resultRelInfo;
  		}
  	}
  
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
***
*** 67,72  static void ExecSetupChildParentMapForTcs(ModifyTableState *mtstate);
--- 67,77 
  static void ExecSetupChildParentMapForSubplan(ModifyTableState *mtstate);
  static TupleConversionMap *tupconv_map_for_subplan(ModifyTableState *node,
  		int whichplan);
+ static TupleTableSlot *ExecPrepareTupleRouting(ModifyTableState *mtstate,
+ 		PartitionTupleRouting *proute,
+ 		EState *estate,
+ 		ResultRelInfo *rootRelInfo,
+ 		TupleTableSlot *slot);
  
  /*
   * Verify that the tuples to be produced by INSERT or UPDATE match the
***
*** 265,271  ExecInsert(ModifyTableState *mtstate,
  {
  	HeapTuple	tuple;
  	ResultRelInfo *resultRelInfo;
- 	ResultRelInfo *saved_resultRelInfo = NULL;
  	Relation	resultRelationDesc;
  	Oid			newId;
  	List	   *recheckIndexes = NIL;
--- 270,275 
***
*** 282,381  ExecInsert(ModifyTableState *mtstate,
  	 * get information on the (current) result relation
  	 */
  	resultRelInfo = estate->es_result_relation_info;
- 
- 	/* Determine the partition to heap_insert the tuple into */
- 	if (mtstate->mt_partition_tuple_routing)
- 	{
- 		int			leaf_part_index;
- 		PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
- 
- 		/*
- 		 * Away we go ... If we end up not finding a partition after all,
- 		 * ExecFindPartition() does not return and errors out instead.
- 		 * Otherwise, the returned value is to be used as an index into arrays
- 		 * proute->partitions[] and proute->partition_tupconv_maps[] that will
- 		 * get us the ResultRelInfo and TupleConversionMap for the partition,
- 		 * respectively.
- 		 */
- 		leaf_part_index = ExecFindPartition(resultRelInfo,
- 			proute->partition_dispatch_info,
- 			slot,
- 			estate);
- 		Assert(leaf_part_index >= 0 &&
- 			   leaf_part_index < proute->num_partitions);
- 
- 		/*
- 		 * Save the old ResultRelInfo and switch to the one corresponding to
- 		 * the selected partition.  (We might need to initialize it first.)
- 		 */
- 		saved_resultRelInfo = resultRelInfo;
- 		resultRelInfo = proute->partitions[leaf_part_index];
- 		if (resultRelInfo == NULL)
- 		{
- 			resultRelInfo = ExecInitPartitionInfo(mtstate,
-   saved_resultRelInfo,
-   proute, estate,
-   leaf_part_index);
- 			Assert(resultRelInfo != NULL);
- 		}
- 
- 		/* We do not yet have a way to insert into a foreign partition */
- 		if (r

Re: Transform for pl/perl

2018-03-05 Thread Pavel Stehule
Hi

I am looking at this patch. I found a few issues:

1. compile warning

I../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2
-I/usr/lib64/perl5/CORE  -c -o jsonb_plperl.o jsonb_plperl.c
jsonb_plperl.c: In function ‘SV_FromJsonbValue’:
jsonb_plperl.c:69:9: warning: ‘result’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
  return result;
 ^~
jsonb_plperl.c: In function ‘SV_FromJsonb’:
jsonb_plperl.c:142:9: warning: ‘result’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
  return result;
 ^~

2. bad comment

/*
 * SV_ToJsonbValue
 *
 * Transform Jsonb into SV --- probably the reverse direction
 */


/*
 * HV_ToJsonbValue
 *
 * Transform Jsonb into SV
 */

/*
 * plperl_to_jsonb(SV *in)
 *
 * Transform Jsonb into SV
 */

3. Do we need two identical tests for PLPerl and PLPerlu? Why?

Regards

Pavel


Re: [PATCH] Find additional connection service files in pg_service.conf.d directory

2018-03-05 Thread Curt Tilmes
On Mon, Mar 5, 2018 at 7:45 AM, Andreas 'ads' Scherbaum
 wrote:
> On Thu, Mar 1, 2018 at 7:40 PM, Curt Tilmes  wrote:
>> On Thu, Mar 1, 2018 at 1:13 PM, Andres Freund  wrote:
> Looked over this patch and found that you limit the directory entries to
> files only.
> Would it make sense to allow links as well? That would allow other
> programs/distributions to place a link in the config directory and point to
> a service file in their own directory.

I call stat(2), which, as I understand it, will follow a link and return
the information for the file linked to rather than for the link itself
(which is what lstat(2) would do).

I tried it out and it seems to work fine for links.
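
A minimal standalone illustration of that difference (made-up file names, not
part of the patch): stat(2) reports on the link target, lstat(2) on the link
itself.

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
	struct stat st;

	(void) symlink("/etc/hosts", "link.conf");	/* hypothetical link */

	if (stat("link.conf", &st) == 0)
		printf("stat:  regular file? %d\n", S_ISREG(st.st_mode) != 0);
	if (lstat("link.conf", &st) == 0)
		printf("lstat: symlink?      %d\n", S_ISLNK(st.st_mode) != 0);

	(void) unlink("link.conf");
	return 0;
}

So a symlinked service file looks like a regular file after stat(), which
would explain why links work without any extra code.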

Curt



Re: Incorrect use of "an" and "a" in code comments and docs

2018-03-05 Thread Alvaro Herrera
Michael Paquier wrote:
> On Mon, Mar 05, 2018 at 07:42:40PM +1300, Thomas Munro wrote:
> > $ git grep ' a SQL ' | wc -l
> >  642
> > $ git grep ' an SQL ' | wc -l
> >  219
> > 
> > /me grabs popcorn
> 
> ess-queue-el, sir.

Yeah, but
http://patorjk.com/blog/2012/01/26/pronouncing-sql-s-q-l-or-sequel/

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: perltidy version

2018-03-05 Thread Alvaro Herrera
Magnus Hagander wrote:

> For example, Debian ships with 20140328, which produces the attached diff.
> I'm not sure if we want to go to whatever is a "common version on most
> platforms" today, or just "whatever is latest" if we do upgrade. AFAICT
> RHEL 7 seems to be on 20121207, RHEL 6 on 20090616. And in Ubuntu, 14.04
> has 20120701, 16.04 has 20140328, and current devel has 20140328. In
> general there seems to be very little overlap there, except Debian and
> Ubuntu covers the same versions.

here's the changelog 
https://metacpan.org/source/SHANCOCK/Perl-Tidy-20180220/CHANGES

The wikipedia page claims that the latest stable release is 20160302,
but that seems to be just because the page is out of date (last update
is before the following 2017-05 release)

It's hard to form an opinion based on this.  I don't think picking one
because of its availability in some distribution is useful, since almost
everyone is going to have to download a custom one anyway, whichever
distribution we pick -- unless it's mine, of course, hah.

I think we should just pick some recent one and use it for X years; use
that one for all backbranches.  I propose X=3.  I propose 20170521
(newer ones seem to cater for stuff that I think we mostly don't use).

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Creating backup history files for backups taken from standbys

2018-03-05 Thread David Steele
On 3/5/18 1:06 AM, Michael Paquier wrote:
> On Fri, Mar 02, 2018 at 03:41:57PM -0500, David Steele wrote:
>> On 3/2/18 1:03 PM, Fujii Masao wrote:
>>> On Fri, Mar 2, 2018 at 1:07 PM, Michael Paquier  wrote:
 We would talk about two backups running
 simultaneously on a standby, which would overlap with each other to
 generate a file aimed only at being helpful for debugging purposes, and
 we provide no information now for backups taken from standbys.  We could
 of course make that logic a bit smarter by checking if there is an
 existing file with the same name and create a new file with a different
 name.  But is that worth the complication? That's where I am not
 convinced, and that's the reason why this patch is doing things this
 way.
> 
> Sorry for my late reply, I was thinking more about the whole thing.
> 
>>> What about including the backup history file in the base backup instead of
>>> creating it in the standby's pg_wal and archiving it? As the good side 
>>> effect
>>> of this approach, we can use the backup history file for debugging purpose
>>> even when WAL archiving is not enabled.
> 
> That's not going to be much helpful for tar'ed base backups as the
> backup history file would be at the end of the stream :(  For a debug
> file, that's not really helpful.
> 
>> I don't think this is a good idea, even if it does solve the race
>> condition.  First, it would be different from the way primary-style
>> backups generate this file.  Second, these files would not be excluded
>> from future restore / backup cycles so they could build up after a while
>> and cause confusion.  I guess we could teach PG to delete them or
>> pg_basebackup to ignore them, but that doesn't seem worth the effort.
> 
> Something I did not consider though is that this causes archive commands
> to choke on those identical file names, so any archive command which is
> not able to handle that would cause WAL to bloat on the standby.  For
> this reason I would rather drop the current approach taken.  Remember
> that the docs tell to use a plain "cp" for example, so this would cause
> standbys to go silently down.

That's a good point.  The pgBackRest archive command would definitely
not like this.

> Something that I would find way more portable though is the addition of
> the backup history file in the output of pg_stop_backup(), which would
> map as well with what Fujii-san's idea to add this file to the backup
> data itself, and do the same for backups taken on a primary.  However
> this would mean moving the 'c' marker marking the end of the tar stream
> within do_pg_stop_backup(), and I am in no way a fan of that: the
> handling of the tar stream should be out of reach of the start and stop
> phases.

If the file is stored with the backup instead of the archive then I
don't think it adds much.  Why not just add the stop time and stop LSN to
backup_label and get rid of the history file entirely?

I've been working closely with backup for nearly five years now and I've
never had need to look at a .backup file.  Other than writing code to
handle it in archiving, I've barely given it a thought.

> Another idea that I have here as well would be to change a bit the
> history backup file name so as the backup name is added to it, but that
> does not fly high as pg_basebackup uses its own name with spaces
> (Yeah!), or even better the PID of the backend taking the backup.

This could work, but seems complicated for little benefit.  It would
also require some archive commands to be updated to understand the new
format.

> The renaming is more appealing, still not perfect.  So my mood of the
> day would be to just drop the patch entirely.

I think that's the best idea for now.  It doesn't seem worth using
cycles in the last CF coming up with a new approach.

Regards,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: perltidy version

2018-03-05 Thread Magnus Hagander
On Mon, Mar 5, 2018 at 2:53 PM, Alvaro Herrera 
wrote:

> Magnus Hagander wrote:
>
> > For example, Debian ships with 20140328, which produces the attached
> diff.
> > I'm not sure if we want to go to whatever is a "common version on most
> > platforms" today, or just "whatever is latest" if we do upgrade. AFAICT
> > RHEL 7 seems to be on 20121207, RHEL 6 on 20090616. And in Ubuntu, 14.04
> > has 20120701, 16.04 has 20140328, and current devel has 20140328. In
> > general there seems to be very little overlap there, except Debian and
> > Ubuntu covers the same versions.
>
> here's the changelog
> https://metacpan.org/source/SHANCOCK/Perl-Tidy-20180220/CHANGES
>
> The wikipedia page claims that the latest stable release is 20160302,
> but that seems to be just because the page is out of date (last update
> is before the following 2017-05 release)
>
> It's hard to form an opinion based on this.  I don't think picking one
> because of its availability in some distribution is useful, since almost
> everyone is going to have to download a custom one anyway, whichever
> distribution we pick -- unless it's mine, of course, hah.
>
> I think we should just pick some recent one and use it for X years; use
> that one for all backbranches.  I propose X=3.  I propose 20170521
> (newer ones seem to cater for stuff that I think we mostly don't use).
>

20140328 seems to cover *most* versions. Another argument for that one
would be that it's the one we have on Borka, which is where we build the
official release tarballs, so we can use it as a stable fallback.

Those are both fairly weak arguments though. As long as we have good
instructions for how to make a local install of it that doesn't affect the
rest of the system, then that should not matter. And we need such
instructions anyway, since it won't be on every distribution.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: constraint exclusion and nulls in IN (..) clause

2018-03-05 Thread Emre Hasegeli
>> Shouldn't we check if we consumed all elements (state->next_elem >=
>> state->num_elems) inside the while loop?
>
> You're right.  Fixed.

I don't think the fix is correct.  arrayconst_next_fn() can still
execute state->next_elem++ without checking if we consumed all
elements.  I couldn't manage to crash it though.

I am also not sure if it is proper to skip some items inside the
"next_fn", but I don't know the code well enough to suggest a better alternative.

> +   /* skip nulls if ok to do so */
> +   if (state->opisstrict && state->next)

Do we need && state->next in here?  It is checked for (state->next ==
NULL) 2 lines above.

> +   {
> +   Node   *node = (Node *) lfirst(state->next);
> +
> +   while (node && IsA(node, Const) && ((Const *) node)->constisnull)

Should we spell the first check as node != NULL, as the rest of the code does?

> +   {
> +   state->next = lnext(state->next);
> +   if (state->next)
> +   node = (Node *) lfirst(state->next);
> +   }
> +   }

I think this function is also not correct as it can call
lnext(state->next) without checking.
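
For what it's worth, here is one possible reworking of that loop that guards
every dereference and advance (just a sketch of the shape I have in mind, not
a tested patch):

if (state->opisstrict)
{
	while (state->next != NULL)
	{
		Node	   *node = (Node *) lfirst(state->next);

		if (node != NULL && IsA(node, Const) && ((Const *) node)->constisnull)
			state->next = lnext(state->next);	/* skip the NULL constant */
		else
			break;
	}
}

That way lnext() is never applied to a NULL cell, and the loop stops as soon
as it hits a non-null item or the end of the list.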

> So far, I hadn't either.  I figured one out and added it to inherit.sql.
> Basically, I needed to write the query such that an IN () expression
> doesn't get const-simplified to a Const containing array, but rather
> remains an ArrayExpr.  So, arrayexpr_*() functions in predtest.c are now
> exercised.

Nice.  I noticed that this part of the code was not covered at all by
the tests before.



Re: [PoC PATCH] Parallel dump to /dev/null

2018-03-05 Thread Andreas 'ads' Scherbaum
On Mon, Mar 5, 2018 at 1:49 PM, Michael Banck 
wrote:

>
> In any case, new patch attached which does this.
>

The null device is already defined in port.h, as DEVNULL. No need to
redefine it as NULL_DEVICE.
Alternatively paths.h defines _PATH_DEVNULL.


Regards,

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


Re: MCV lists for highly skewed distributions

2018-03-05 Thread Dean Rasheed
On 7 February 2018 at 15:58, Dean Rasheed  wrote:
> On 7 February 2018 at 15:25, Robert Haas  wrote:
>> Do you plan to press forward with this, then, or what's
>> the next step?
>
> I plan to do more testing

TL;DR: I tested this new algorithm using 9 different relative standard
error cutoffs (10%, 15%, ... 50%) against a variety of different data
distributions, and it looks like a definite improvement over the
current algorithm, for a wide range of cutoff values. Overall, it
isn't that sensitive to the exact cutoff chosen, but it looks like a
value of 20% is a good general-purpose choice.

One of the properties of the new algorithm is that the size of the MCV
list responds better to changing the stats target, so I don't think
it's worth having a separate knob to control the cutoff percentage.
Such a knob would have a similar, but weaker effect than the stats
target knob, and thus doesn't seem worthwhile.

Attached is an updated patch. I decided to move the code that
determines the minimum number of occurrences for an MCV list item out
to a separate function. It's not much code, but there's a large
comment to explain its statistical justification, and I didn't want to
duplicate that in 2 different places (possibly to become 3, with the
multivariate MCV stats patch).
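
For anyone following along, here is a back-of-the-envelope version of that
cutoff rule as a tiny standalone C program.  It ignores the finite-population
correction and the other refinements in the patch's comment, so treat it as an
approximation only:

#include <stdio.h>

/*
 * For a value seen c times in a sample of n rows, the estimated frequency
 * p = c/n has standard error ~ sqrt(p*(1-p)/n), so the relative standard
 * error satisfies RSE^2 = 1/c - 1/n.  Requiring RSE <= cutoff gives
 * c >= 1 / (cutoff^2 + 1/n).
 */
static double
min_mcv_count(double n, double cutoff)
{
	return 1.0 / (cutoff * cutoff + 1.0 / n);
}

int
main(void)
{
	/* stats target 100 => 30000 sampled rows; 20% cutoff => ~25 occurrences */
	printf("%.1f\n", min_mcv_count(30000.0, 0.20));
	return 0;
}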

I think this is ready for commit now, so barring objections, I will do
so in the next day or so.

---

I tested using the attached python script, which tests a large number
of different data distributions, comparing the results with the patch
vs those from HEAD. It uses EXPLAIN ANALYSE to compare the estimates
against actual row counts, both for values in the MCV list, and
non-MCV values, making it possible to tell whether it would have been
better if the MCV list had been bigger or smaller.

There's a lot of random variation between tests, but by running a lot
of them, the general pattern can be seen. I ran the test 9 times, with
different MCV cutoff values in the patched code of 10%, 15%, ... 50%,
then collected all the results from the test runs into a single CSV
file to analyse using SQL. The columns in the CSV are:

  test_name - A textual description of the data distribution used in the test.

  mcv_cutoff - The relative standard error cutoff percentage used (10,
15, 20, ... 50), or -10, -15, ... -50 for the corresponding tests
against HEAD.

  stats_tgt - The stats target used (100 or 1000).

  num_mcvs - The size of the resulting MCV list.

  avg_mcv_err - The average percentage difference between estimated
and actual row counts for values in the MCV list. (There's a bit of a
fudge factor in calculating these percentages, to avoid them becoming
too large in cases where the row counts are small, but I think it's
still useful for comparison purposes.)

  max_mcv_err - The maximum percentage difference between estimated
and actual row counts for values in the MCV list.

  avg_non_mcv_err - The average percentage difference between
estimated and actual row counts for a bunch of non-MCV values.

  max_non_mcv_err - The maximum percentage difference between
estimated and actual row counts for a bunch of non-MCV values.

  avg_err - The average percentage difference between estimated and
actual row counts for all values tested.

  max_err - The maximum percentage difference between estimated and
actual row counts for all values tested.

From this, it's possible to look for cases where the MCV list is too
large or too small. For example, looking for too-many-mcv cases
(generally the more uniform data distributions):

SELECT mcv_cutoff, count(*)
  FROM results
 WHERE max_mcv_err - max_non_mcv_err > 50
 GROUP BY 1
 ORDER BY 2 DESC;

 mcv_cutoff | count
------------+-------
        -40 |    41
        -50 |    40
        -25 |    39
        -15 |    38
        -20 |    38
        -30 |    38
        -35 |    38
        -45 |    37
        -10 |    37
         50 |    35
         40 |    35
         45 |    34
         35 |    33
         30 |    33
         25 |    25
         20 |    13
         15 |     6
         10 |     3
(18 rows)

SELECT mcv_cutoff, count(*)
  FROM results
 WHERE max_mcv_err - max_non_mcv_err > 100
 GROUP BY 1
 ORDER BY 2 DESC;

 mcv_cutoff | count
------------+-------
        -30 |    17
        -40 |    16
        -15 |    15
        -25 |    14
        -10 |    14
        -35 |    14
        -45 |    13
        -50 |    13
        -20 |    12
         35 |    12
         45 |    11
         40 |    10
         30 |    10
         50 |    10
         25 |     6
         10 |     2
(16 rows)
( and implicitly:
         15 |     0
         20 |     0 )

So the patched code is generally better at avoiding the too-many-mcvs
problem, but an mcv_cutoff of 30% or more may be too high.

Looking for too-few-mcv cases:

SELECT mcv_cutoff, count(*)
  FROM results
 WHERE (max_non_mcv_err - max_mcv_err > 50 AND num_mcvs < stats_tgt)
 GROUP BY 1
 ORDER BY 2 DESC;

 mcv_cutoff | count
+---
-20 |   120
-5

Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Tom Lane
Edmund Horner  writes:
> On 5 March 2018 at 08:06, Tom Lane  wrote:
>> If people like this approach, I propose to commit this more or less
>> as-is.  The select-tab-completion patch would then need to be rewritten
>> to use this infrastructure, but I think that should be straightforward.

> My patch had 4 versions: PRE_81, PRE_90, PRE_96, and current.  This
> could be reformulated as
> static const VersionedQuery
> Query_for_list_of_selectable_functions_or_attributes[] = {
> {90600, ... },
> {9, ... },
> {80100, ... },
> {70300, ... },
> {0, NULL}
> };

Right.

> Is it overkill to have so many variations?

Well, it's whatever you need for the purpose.  We could discuss what the
support cutoff is, but I doubt we'd make it any later than 8.0, so some
types of catalog queries are going to need a lot of variations.

> I am still just slightly unclear on where we are in relation to the
> SAVEPOINT patch -- is that redundant now?

I'm inclined to think it's a bit pointless, if the direction we're
heading is to make the queries actually work on every server version.
Issuing a savepoint would just mask failures.

What would be actually useful is to be able to tab-complete even in
the midst of a failed transaction block ... but savepoints as such
won't get us there, and I have no good ideas about what would.

regards, tom lane



Re: [PoC PATCH] Parallel dump to /dev/null

2018-03-05 Thread Michael Banck
Hi Andreas,

On Mon, Mar 05, 2018 at 03:15:19PM +0100, Andreas 'ads' Scherbaum wrote:
> The null device is already defined in port.h, as DEVNULL. No need to
> redefine it as NULL_DEVICE.

Thanks for pointing that out, a new patch removing NULL_DEVICE is
attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..0953290069 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -512,7 +512,10 @@ cfopen_write(const char *path, const char *mode, int compression)
 #ifdef HAVE_LIBZ
 		char	   *fname;
 
-		fname = psprintf("%s.gz", path);
+		if (strcmp(path, DEVNULL) != 0)
+			fname = psprintf("%s.gz", path);
+		else
+			fname = pg_strdup(path);
 		fp = cfopen(fname, mode, compression);
 		free_keep_errno(fname);
 #else
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 4aabb40f59..6aee04dad3 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -54,6 +54,7 @@ typedef struct
 
 	cfp		   *blobsTocFH;		/* file handle for blobs.toc */
 	ParallelState *pstate;		/* for parallel backup / restore */
+	int			discard;		/* target is DEVNULL */
 } lclContext;
 
 typedef struct
@@ -158,6 +159,11 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 
 	ctx->directory = AH->fSpec;
 
+	if (strcmp(ctx->directory, DEVNULL) == 0)
+		ctx->discard = 1;
+	else
+		ctx->discard = 0;
+
 	if (AH->mode == archModeWrite)
 	{
 		struct stat st;
@@ -192,9 +198,15 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 			}
 		}
 
-		if (!is_empty && mkdir(ctx->directory, 0700) < 0)
-			exit_horribly(modulename, "could not create directory \"%s\": %s\n",
-		  ctx->directory, strerror(errno));
+		/*
+		 * Create the output directory, unless it exists already and is empty.
+		 * If we are discarding output to the null device, we must not
+		 * create it.
+		 */
+		if (!is_empty && !ctx->discard)
+			if (mkdir(ctx->directory, 0711) < 0)
+				exit_horribly(modulename, "could not create directory \"%s\": %s\n",
+							  ctx->directory, strerror(errno));
 	}
 	else
 	{			/* Read Mode */
@@ -602,8 +614,10 @@ _CloseArchive(ArchiveHandle *AH)
 		/*
 		 * In directory mode, there is no need to sync all the entries
 		 * individually. Just recurse once through all the files generated.
+		 * If we are discarding output to the null device, it does not
+		 * need syncing and would emit a warning if we did.
 		 */
-		if (AH->dosync)
+		if (AH->dosync && !ctx->discard)
 			fsync_dir_recurse(ctx->directory, progname);
 	}
 	AH->FH = NULL;
@@ -659,7 +673,14 @@ _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
 	lclContext *ctx = (lclContext *) AH->formatData;
 	char		fname[MAXPGPATH];
 
-	snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
+	/*
+	 * If we are discarding output to the null device, just use that as
+	 * fname.
+	 */
+	if (ctx->discard)
+		snprintf(fname, MAXPGPATH, DEVNULL);
+	else
+		snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
 
 	ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression);
 
@@ -721,9 +742,18 @@ setFilePath(ArchiveHandle *AH, char *buf, const char *relativeFilename)
 	if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)
 		exit_horribly(modulename, "file name too long: \"%s\"\n", dname);
 
-	strcpy(buf, dname);
-	strcat(buf, "/");
-	strcat(buf, relativeFilename);
+	/*
+	 * If we are discarding output to the null device, we cannot set a file
+	 * path and just set the buffer to the null device.
+	 */
+	if (ctx->discard)
+	{
+		strcpy(buf, DEVNULL);
+	} else {
+		strcpy(buf, dname);
+		strcat(buf, "/");
+		strcat(buf, relativeFilename);
+	}
 }
 
 /*


Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-03-05 Thread Tomas Vondra
Hi,

I've done a bit of benchmarking on the last patch version (from 22/8),
using a simple workload:

1) 8 clients doing

  SELECT SUM(abalance) FROM pgbench_accounts

with the table truncated to only 10k rows

2) variable number (8, 16, 32, ..., 512) of write clients, doing this

  \set aid random(1, 1 * :scale)
  BEGIN;
  UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;
  SELECT pg_sleep(0.005);
  COMMIT;

The idea is to measure tps of the read-only clients, with many throttled
write clients (idle in transaction for 5ms after each update). This
should generate snapshots with many XIDs. Scripts attached, of course.

The results look really promising (see the attached chart and .ods):

 write clients    master    patched
             8      7048       7089
            16      6601       6614
            32      5862       5944
            64      5327       5349
           128      4574       4952
           256      4132       4956
           512      2196       4930

That being said, I think Stephen is right that there's a bug in
CopySnapshot, and I agree with his other comments too. I think the patch
should have been in WoA, so I'll mark it like that.

simplehash is great, but I'm not sure it's a good fit for this use case.
Seems a bit overkill to me in this case, but maybe I'm wrong.

Snapshots are static (we don't really add new XIDs into existing ones,
right?), so why don't we simply sort the XIDs and then use bsearch to
lookup values? That should fix the linear search, without need for any
local hash table.
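
In code, the idea would be roughly this (standalone toy example with made-up
XIDs; the real thing would need TransactionId-aware comparisons to cope with
wraparound, but it shows the sort-once-then-bsearch shape):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

static int
xid_cmp(const void *a, const void *b)
{
	TransactionId xa = *(const TransactionId *) a;
	TransactionId xb = *(const TransactionId *) b;

	return (xa > xb) - (xa < xb);
}

static int
xid_in_snapshot(TransactionId xid, const TransactionId *xip, size_t xcnt)
{
	return bsearch(&xid, xip, xcnt, sizeof(TransactionId), xid_cmp) != NULL;
}

int
main(void)
{
	TransactionId xip[] = {905, 557, 573, 561, 900};	/* in-progress XIDs */
	size_t		xcnt = sizeof(xip) / sizeof(xip[0]);

	qsort(xip, xcnt, sizeof(TransactionId), xid_cmp);	/* once, at snapshot build time */

	printf("573 in snapshot? %d\n", xid_in_snapshot(573, xip, xcnt));	/* 1 */
	printf("600 in snapshot? %d\n", xid_in_snapshot(600, xip, xcnt));	/* 0 */
	return 0;
}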

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


xid-select.sql
Description: application/sql


xid-update.sql
Description: application/sql


xid-tests.sh
Description: application/shellscript


xid-hash.ods
Description: application/vnd.oasis.opendocument.spreadsheet


tests.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread David G. Johnston
On Mon, Mar 5, 2018 at 7:41 AM, Tom Lane  wrote:

> What would be actually useful is to be able to tab-complete even in
> the midst of a failed transaction block ... but savepoints as such
> won't get us there, and I have no good ideas about what would.
>

Why not have psql open two sessions to the backend, one with
application_name 'psql_user' and one with application name "psql_meta" (or
some such) and have all these queries executed on the psql_meta connection?

David J.


Re: 2018-03 CFM

2018-03-05 Thread David Steele
Hi Aleksander,

On 3/2/18 7:18 AM, Aleksander Alekseev wrote:
> 
>> You do realize we have the actual source database available, I hope? Since
>> it's our own system... There is no need to scrape the data back out -- if
>> we can just define what kind of reports we want, we can trivially run it on
>> the source database. Or if we want it more often, we can easily make a
>> webview for it. It's basically just a "map this URL to a SQL query"...
> 
> I don't think commitfest.cputube.org has the SQL data on whether patch
> pass the tests. It just displays SVG images from travis-ci.org. Also
> unfortunately both commitfest.postgresql.org and commitfest.cputube.org
> currently don't have any kind of public API and don't allow to export
> data, e.g. in CSV or JSON.
> 
> I guess it would be nice if both services supported export, in any
> format, so anyone could build any kind of reports or automation tools
> without parsing HTML with regular expressions or depending on other
> people.

Yes, that would be good.  I just had a chance to look through the data
and the thing I was most hoping to do with it would be a bit complicated.

I would like to get a list of each submitter's total patches vs. the total
number of patches they are reviewing.  In the past I have done this by
eyeball.

Do you think you could put something like that together?

> If I'm not mistaken, there was a discussion regarding public APIs.
> I wonder what prevents adding it, at least a simple export of everything.
> After all, it is indeed just mapping URL to a SQL query. For instance,
> this one:
> 
> select array_to_json(array_agg(row_to_json(tbl))) from tbl;

I would be happy with a page that is simply a json dump of the
publicly-viewable fields in each table.  Then I can do whatever I want
with the data.

Thanks,
-- 
-David
da...@pgmasters.net



get_actual_variable_range vs idx_scan/idx_tup_fetch, again

2018-03-05 Thread Marko Tiikkaja
(Sorry for not continuing the thread in 54418d75.2000...@joh.to , but I
don't have the original email anymore.)

So I'm in the same pickle again.  According to pg_stat_user_indexes an
index is being used all the time.  However, it's only being used by
mergejoinscansel() to compare these two plans:

=> explain analyze
select *
from orders child
join orders parent on (parent.orderid = child.parentorderid)
where child.orderid = 1161771612;
 QUERY PLAN

 Nested Loop  (cost=0.00..15.56 rows=1 width=2910) (actual
time=0.401..0.402 rows=1 loops=1)
   ->  Index Scan using orders_pkey on orders child  (cost=0.00..7.78
rows=1 width=1455) (actual time=0.367..0.367 rows=1 loops=1)
 Index Cond: (orderid = 1161771612)
   ->  Index Scan using orders_pkey on orders parent  (cost=0.00..7.78
rows=1 width=1455) (actual time=0.027..0.028 rows=1 loops=1)
 Index Cond: (orderid = child.parentorderid)
 Total runtime: 0.852 ms
(6 rows)

=> set enable_nestloop to false; set enable_hashjoin to false;
SET
SET
=> explain
select *
from orders child
join orders parent on (parent.orderid = child.parentorderid)
where child.orderid = 1161771612;
   QUERY PLAN
-
 Merge Join  (cost=1804805.57..97084775.33 rows=1 width=2910)
   Merge Cond: (parent.orderid = child.parentorderid)
   ->  Index Scan using orders_pkey on orders parent
(cost=0.00..96776686.40 rows=123232448 width=1455)
   ->  Sort  (cost=7.79..7.79 rows=1 width=1455)
 Sort Key: child.parentorderid
 ->  Index Scan using orders_pkey on orders child  (cost=0.00..7.78
rows=1 width=1455)
   Index Cond: (orderid = 1161771612)
(7 rows)

The merge join plan is pretty obviously shit and the fact that the planner
got a better estimate for it by peeking through the index had zero effect.

I think it would be really important to have a way to turn off
get_actual_variable_range() for a specific index during runtime.  Would a C
level hook be acceptable for this?
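
To be clear about what I mean by a hook, something as simple as this would do
(names are entirely made up; nothing like this exists today):

/* hypothetical: lets an extension veto the index probe per index */
typedef bool (*actual_range_index_filter_hook_type) (Oid indexoid);
extern PGDLLIMPORT actual_range_index_filter_hook_type
			actual_range_index_filter_hook;

/* inside get_actual_variable_range(), while looping over candidate indexes */
if (actual_range_index_filter_hook != NULL &&
	!actual_range_index_filter_hook(index->indexoid))
	continue;			/* skip probing this index */

An extension could then install a filter that blacklists specific index OIDs
at runtime.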


.m


Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-03-05 Thread Tom Lane
Tomas Vondra  writes:
> Snapshots are static (we don't really add new XIDs into existing ones,
> right?), so why don't we simply sort the XIDs and then use bsearch to
> lookup values? That should fix the linear search, without need for any
> local hash table.

+1 for looking into that, since it would avoid adding any complication
to snapshot copying.  In principle, with enough XIDs in the snap, an
O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
dubious that we are often in the range where that would matter.
We do need to worry about the cost of snapshot copying, too.

regards, tom lane



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Tom Lane
"David G. Johnston"  writes:
> On Mon, Mar 5, 2018 at 7:41 AM, Tom Lane  wrote:
>> What would be actually useful is to be able to tab-complete even in
>> the midst of a failed transaction block ... but savepoints as such
>> won't get us there, and I have no good ideas about what would.

> Why not have psql open two sessions to the backend, one with
> application_name 'psql_user' and one with application name "psql_meta" (or
> some such) and have all these queries executed on the psql_meta connection?

If we did it like that, tab completion would fail to see the session's
temp tables, or objects created in the current open transaction.

People might bitch about using twice as many connections, too, although
likely you could finesse that by only opening the second connection if
tab completion actually happens (so that only interactive sessions have
one).  Still, the local-objects problem seems like a fatal objection.

regards, tom lane



Re: get_actual_variable_range vs idx_scan/idx_tup_fetch, again

2018-03-05 Thread Tom Lane
Marko Tiikkaja  writes:
> So I'm in the same pickle again.  According to pg_stat_user_indexes an
> index is being used all the time.  However, it's only being used by
> mergejoinscansel() to compare these two plans:

If it's not being used otherwise, could you drop it?

> I think it would be really important to have a way to turn off
> get_actual_variable_range() for a specific index during runtime.  Would a C
> level hook be acceptable for this?

You haven't really made a case for why you (or anyone else) should care.
As long as the planner makes the right choice, having investigated a wrong
choice doesn't seem like a bug to me.

regards, tom lane



Re: [HACKERS] [PATCH] Vacuum: Update FSM more frequently

2018-03-05 Thread Claudio Freire
On Sun, Mar 4, 2018 at 10:25 PM, Masahiko Sawada  wrote:
> On Mon, Mar 5, 2018 at 10:21 AM, Masahiko Sawada  
> wrote:
>> On Fri, Mar 2, 2018 at 10:50 PM, Claudio Freire  
>> wrote:
>>> On Fri, Mar 2, 2018 at 10:47 AM, Claudio Freire  
>>> wrote:
 On Fri, Mar 2, 2018 at 7:38 AM, Masahiko Sawada  
 wrote:
> Thank you for updating the patches!
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES 1048576
>
> Is it better if we vacuum fsm every 8GB regardless of the block size?
> Because an installation that uses a >8kB block size is likely to have
> fewer pages than an 8kB block size installation has, the
> current patch might lead to delayed fsm vacuuming. What do you think? If
> it's better, we can define it as follows.
>
> #define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)

 I honestly don't know. I've never tweaked page size, and know nobody
 that did, so I have no experience with those installations.

 Lets go with your proposal.

>
> --
> @@ -470,7 +484,9 @@ lazy_scan_heap(Relation onerel, int options,
> LVRelStats *vacrelstats,
>  TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
>  TransactionId relminmxid = onerel->rd_rel->relminmxid;
>  BlockNumber empty_pages,
> -vacuumed_pages;
> +vacuumed_pages,
> +fsm_updated_pages,
> +vacuum_fsm_every_pages;
>  doublenum_tuples,
>  tups_vacuumed,
>  nkeep,
>
> Regarding fsm_updated_pages variable name, I think it's better to be
> freespace_updated_pages because we actually counts the page updated
> its freespace, not fsm.

 Good point.

 New version attached.
>>>
>>> Sorry, forgot to actually squash. Now the real new version is attached.
>>
>> Thank you for updating. I think the patches has enough discussion and
>> quality, so can go to the committer's review. I've marked them as
>> Ready for Committer.
>>
>
> Oops, the following comment needs to be updated. Since it would be
> better to defer the decision on this behavior to committers, I'll keep the
> status of this patch as Ready for Committer.
>
> +/*
> + * When a table has no indexes, vacuum the FSM at most every this
> + * many dirty pages. With a default page size of 8kb, this value
> + * basically means 8GB of dirtied pages.
> + */
> +#define VACUUM_FSM_EVERY_PAGES ((8 * 1024 * 1024) / BLCKSZ)
> +

I just noticed that it actually computes 8MB (not GB) of pages.

Attached a fix for that and the comment update.
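
For reference, a 64-bit-safe way to spell the 8GB version would be something
like the following (the attached patch may differ; note that
8 * 1024 * 1024 * 1024 overflows a 32-bit int, so the arithmetic has to be
forced to 64 bits before dividing):

#define VACUUM_FSM_EVERY_PAGES \
	((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))

With the default BLCKSZ of 8192 this yields 1048576 pages, matching the
original constant, whereas the 8MB variant yields only 1024.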
From 4f6d694dd2bc8e0c1185682716175ecb08ca6c04 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Fri, 28 Jul 2017 21:31:59 -0300
Subject: [PATCH 1/4] Vacuum: Update FSM more frequently

Make Vacuum update the FSM more frequently, to avoid the case where
autovacuum fails to reach the point where it updates the FSM in
highly contended tables.

Introduce a tree pruning threshold to FreeSpaceMapVacuum that avoids
recursing into branches that already contain enough free space, to
avoid having to traverse the whole FSM and thus induce quadratic
costs. Intermediate FSM vacuums are only supposed to make enough
free space visible to avoid extension until the final (non-partial)
FSM vacuum.

Run partial FSM vacuums after each index pass, which is a point
at which whole ranges of the heap have been thoroughly cleaned, and
we can expect no further updates to those ranges of the FSM save
for concurrent activity. When there are no indexes, and thus no
index passes, run partial FSM vacuums every 8GB of dirtied pages
or 1/8th of the relation, whichever is highest. This allows some
partial work to be made visible without incurring quadratic cost.

In any case, FSM are small in relation to the table, so even when
quadratic cost is involved, it should not be problematic. Index
passes already incur quadratic cost, and the addition of the FSM
is unlikely to be measurable.

Run a partial FSM vacuum with a low pruning threshold right at
the beginning of the VACUUM to finish up any work left over from
prior canceled vacuum runs, something that is common in highly
contended tables when running only autovacuum.
---
 src/backend/access/brin/brin.c|  2 +-
 src/backend/access/brin/brin_pageops.c| 10 ++--
 src/backend/commands/vacuumlazy.c | 80 ---
 src/backend/storage/freespace/freespace.c | 27 +--
 src/backend/storage/freespace/indexfsm.c  |  2 +-
 src/include/storage/freespace.h   |  2 +-
 6 files changed, 104 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 68b3371..24d6df7 100644
--- a/src/backen

Re: Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-05 Thread David Steele
Hi Tomas,

On 1/8/18 3:28 PM, Tomas Vondra wrote:
> 
> 
> On 01/08/2018 08:39 PM, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> As I already mentioned, Tom's updated patch is better than what I
>>> posted initially, and I agree with his approach to the remaining
>>> issues he pointed out. But I somehow assumed that he's already
>>> looking into that. Tom, do you plan to look into this patch soon,
>>> or should I?
>>
>> No, I thought you were going to run with those ideas. I have a lot
>> of other stuff on my plate ...
>>
> 
> OK, will do.

What is the status of this patch?  It's been waiting on author since last
November, though I can see there was some confusion over who was working
on it.

Given that it's a bug fix it would be good to see a patch for this CF,
or very soon after.

Thanks,
-- 
-David
da...@pgmasters.net



Re: get_actual_variable_range vs idx_scan/idx_tup_fetch, again

2018-03-05 Thread Marko Tiikkaja
On Mon, Mar 5, 2018 at 5:08 PM, Tom Lane  wrote:

> Marko Tiikkaja  writes:
> > So I'm in the same pickle again.  According to pg_stat_user_indexes an
> > index is being used all the time.  However, it's only being used by
> > mergejoinscansel() to compare these two plans:
>
> If it's not being used otherwise, could you drop it?
>

Yes.  I want to drop it, as I think it's useless, but it's hard to be 100%
sure.


> > I think it would be really important to have a way to turn off
> > get_actual_variable_range() for a specific index during runtime.  Would
> a C
> > level hook be acceptable for this?
>
> You haven't really made a case for why you (or anyone else) should care.
> As long as the planner makes the right choice, having investigated a wrong
> choice doesn't seem like a bug to me.
>

Because I'm certain the planner would make the right choice even without
the index, and I want it gone.
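To make the request concrete, a rough sketch of what such a C-level switch
could look like; the hook name, its signature, and the filter-by-OID idea are
all hypothetical here, not an existing or proposed API:

/*
 * Hypothetical sketch only: no such hook exists today.  The name, the
 * signature, and the idea of filtering by index OID are assumptions about
 * what a C-level switch for get_actual_variable_range() could look like.
 */
typedef bool (*actual_range_index_filter_hook_type) (Oid indexoid);

extern actual_range_index_filter_hook_type actual_range_index_filter_hook;

/* An extension could then install a filter that vetoes one index: */
static bool
skip_suspect_index(Oid indexoid)
{
	return indexoid != 12345;	/* 12345 is a made-up OID, for illustration */
}

/* ... and in _PG_init():  actual_range_index_filter_hook = skip_suspect_index; */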


.m


Re: Re: WIP Patch: Pgbench Serialization and deadlock errors

2018-03-05 Thread David Steele
Hello Marina,

On 1/12/18 12:01 PM, Marina Polyakova wrote:
>>
>> Yep. I'm trying to suggest an incremental path with simple but yet
>> quite useful things first.
> 
> This question ("if there's a failure what savepoint we should rollback
> to and start the execution again? ...") mostly concerns the basic idea
> of how to maintain the savepoints in this feature, rather than the exact
> architecture of the code, so we can discuss this now :)

This patch was marked Waiting on Author on Jan 8 and no new patch was
submitted before this commitfest.

I think we should mark this patch as Returned with Feedback.

Thanks,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2018-03-05 Thread Tomas Vondra


On 03/05/2018 04:12 PM, David Steele wrote:
> Hi Tomas,
> 
> On 1/8/18 3:28 PM, Tomas Vondra wrote:
>>
>>
>> On 01/08/2018 08:39 PM, Tom Lane wrote:
>>> Tomas Vondra  writes:
 As I already mentioned, Tom's updated patch is better than what I
 posted initially, and I agree with his approach to the remaining
 issues he pointed out. But I somehow assumed that he's already
 looking into that. Tom, do you plan to look into this patch soon,
 or should I?
>>>
>>> No, I thought you were going to run with those ideas. I have a lot
>>> of other stuff on my plate ...
>>>
>>
>> OK, will do.
> 
> What is the status of this patch?  It's been waiting on author since last
> November, though I can see there was some confusion over who was working
> on it.
> 
> Given that it's a bug fix it would be good to see a patch for this CF,
> or very soon after.
> 

Yeah, it's the next thing on my TODO.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Too much memory allocated for ReorderBufferDiskChange

2018-03-05 Thread Antonin Houska
I suggest the following change to avoid excessive
allocation. sizeof(ReorderBufferDiskChange) should already be contained in
ondisk->size.

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
new file mode 100644
index c72a611..84f6e05
*** a/src/backend/replication/logical/reorderbuffer.c
--- b/src/backend/replication/logical/reorderbuffer.c
*** ReorderBufferRestoreChanges(ReorderBuffe
*** 2366,2373 
  
ondisk = (ReorderBufferDiskChange *) rb->outbuf;
  
! 	ReorderBufferSerializeReserve(rb,
! 								  sizeof(ReorderBufferDiskChange) + ondisk->size);
ondisk = (ReorderBufferDiskChange *) rb->outbuf;
  
pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_READ);
--- 2366,2372 
  
ondisk = (ReorderBufferDiskChange *) rb->outbuf;
  
!   ReorderBufferSerializeReserve(rb, ondisk->size);
ondisk = (ReorderBufferDiskChange *) rb->outbuf;
  
pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_READ);

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers

2018-03-05 Thread David Steele
Hi Craig,

On 1/21/18 5:45 PM, Craig Ringer wrote:
> On 6 January 2018 at 08:28, Alvaro Herrera wrote:
> 
> I think this should use ReadDirExtended with an elevel less than ERROR,
> and do nothing.
> 
> Why have strcmp(".") and strcmp("..")?  These are going to be skipped by the
> comparison to the "xid" prefix anyway.  Looks like a waste of strcmp
> processing power.
> 
> Please don't use bare sprintf() -- upgrade to snprintf.
> 
> With this coding, if I put a root-owned file "xidfoo" in a replslot
> directory, it will PANIC the server.  Is that okay?  Why not read the
> file name with sscanf(), since we know the precise format it has?  Then
> we don't need to bother with random crap left around.  Maybe a good time
> to put the "xid-%u-lsn-%X-%X.snap" literal in a macro?   I guess the
> rationale is that if you let random people put "xidfoo" files in your
> replication slot dirs, you deserve a PANIC anyway, but I'm not sure.
> 
> I'm happy to address those comments.
> 
> The PANIC probably made sense when it was only done on startup, but not
> now that it happens at runtime.
> 
> The rest is mainly retained from the prior code, but it's a good chance
> to make those changes.
This patch was marked Waiting on Author last December.  Do you know when
you'll have a chance to provide an updated patch?

Given that it's a bug fix it would be good to see a patch for this CF,
or very soon after.

Thanks,
-- 
-David
da...@pgmasters.net



planner failure with ProjectSet + aggregation + parallel query

2018-03-05 Thread Robert Haas
While trying to track down a bug today, I found a different bug.

As of 6946280cded903b6f5269fcce105f8ab1d455d33:

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# set min_parallel_table_scan_size = 0;
SET
rhaas=# set parallel_setup_cost = 0;
SET
rhaas=# set parallel_tuple_cost = 0;
SET
rhaas=# select generate_series(1, a) from foo group by a;
ERROR:  ORDER/GROUP BY expression not found in targetlist

Without the SET commands, or without the GROUP BY, or without the SRF,
it successfully constructs a plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PoC PATCH] Parallel dump to /dev/null

2018-03-05 Thread Alvaro Herrera
Please make ctx->discard a bool, not an int.

You can initialize it like this:
ctx->discard = strcmp(ctx->directory, DEVNULL) == 0;

Why do you change the mode to create directory to 0711 from 0700?

You have a cuddled "else" in the last hunk.  Please uncuddle.



-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Vik Fearing
On 03/05/2018 11:33 AM, Edmund Horner wrote:
> On 5 March 2018 at 21:46, Vik Fearing  wrote:
>> Tab completion on functions in SELECT (a subset of this thread's patch)
>> is quite important to me personally.  I will work on this in the coming
>> days.
> 
> It's something I've missed numerous times in the last few months at
> work.  I guess I should really be running a psql with my own
> preliminary patches applied; but I'm only a novice pgsql-hacker, and
> easily default to using the official one.
> 
> If you are going to work on a patch just for functions, you should
> probably make it a SchemaQuery.  I did not find a way to support
> schema-qualified functions in a query that also returned column names.

I meant that I was going to work on your patch, with you, to quickly get
this in v11.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Re: [HACKERS] PoC: custom signal handler for extensions

2018-03-05 Thread David Steele
Hello Maksim,

On 1/27/18 2:19 PM, Arthur Zakirov wrote:
> On Mon, Jan 22, 2018 at 02:34:58PM +0300, Maksim Milyutin wrote:
> 
> The patch is applied and build.
> 
>> +/*
>> + * UnregisterCustomProcSignal
>> + *  Release the slot of a specific custom signal.
>> + *
>> + * This function has to be called in the _PG_init or _PG_fini functions of
>> + * extensions at the stage of loading shared preload libraries. Otherwise it
>> + * throws a fatal error. Also it throws a fatal error if the argument is not
>> + * a valid custom signal.
>> + */
>> +void
>> +UnregisterCustomProcSignal(ProcSignalReason reason)
>> +{
>> +	if (!process_shared_preload_libraries_in_progress)
>> +		ereport(FATAL, (errcode(ERRCODE_INTERNAL_ERROR),
>> +				errmsg("cannot unregister custom signal after startup")));
> 
> Is there an actual need for UnregisterCustomProcSignal() within _PG_init()?
> An extension registers a handler and then unregisters it, doing
> nothing. It seems useless.
> 
> Also, process_shared_preload_libraries_in_progress will be false within
> _PG_fini(), I think. _PG_fini() won't be called anyway, because the
> implementation of internal_unload_library() is disabled.
> 
> Actually, is there a need for UnregisterCustomProcSignal() at all? It
> unregisters a handler only in the current backend; for actual unregistering
> we would need to call it everywhere, if I'm not mistaken.

This patch has been in Waiting on Author state for almost three weeks.
Have you had a chance to consider Arthur's suggestions?

Do you know when you'll have an updated patch available?

Thanks,
-- 
-David
da...@pgmasters.net



Re: Too much memory allocated for ReorderBufferDiskChange

2018-03-05 Thread Euler Taveira
2018-03-05 12:26 GMT-03:00 Antonin Houska :
> ! 	ReorderBufferSerializeReserve(rb,
> ! 								  sizeof(ReorderBufferDiskChange) + ondisk->size);
> 	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
>
ReorderBufferSerializeReserve() always does a repalloc here (the
ReorderBufferDiskChange size is always greater than zero), so the
allocation size is correct.
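To make the disagreement concrete, here is a small standalone illustration of
the size-header convention Antonin describes; the struct and values are
stand-ins, not the real ReorderBufferDiskChange, and whether the convention
actually holds is exactly the point under discussion:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for a serialized change: "size" is the total
 * on-disk size, header included, which is what Antonin's reading implies. */
typedef struct DiskRecord
{
	size_t		size;			/* sizeof(DiskRecord) + payload length */
	char		payload[];		/* flexible array member */
} DiskRecord;

int
main(void)
{
	const char *data = "some change data";
	size_t		payload_len = strlen(data) + 1;
	DiskRecord *rec;

	/* Serialization side: size already accounts for the header. */
	rec = malloc(sizeof(DiskRecord) + payload_len);
	rec->size = sizeof(DiskRecord) + payload_len;
	memcpy(rec->payload, data, payload_len);

	/*
	 * Restore side: if the convention above holds, reserving rec->size bytes
	 * is enough; reserving sizeof(DiskRecord) + rec->size asks for
	 * sizeof(DiskRecord) extra bytes -- harmless, but wasteful.
	 */
	printf("needed: %zu, over-reserved: %zu\n",
		   rec->size, sizeof(DiskRecord) + rec->size);
	free(rec);
	return 0;
}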


-- 
   Euler Taveira                                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [Patch] Checksums for SLRU files

2018-03-05 Thread Andrey Borodin
Hi, Ivan!

I've found that there are a few more places with SLRU items per page where you
have to update the usable page size. Please find the diff attached.
I agree that there is little chance of getting this committable quickly, but
the feature is still worth working on, from my point of view.

Best regards, Andrey Borodin.


moar_slru.diff
Description: Binary data


Re: PATCH: psql tab completion for SELECT

2018-03-05 Thread Vik Fearing
On 03/05/2018 11:21 AM, Edmund Horner wrote:
> I am still just slightly unclear on where we are in relation to the
> SAVEPOINT patch -- is that redundant now?

My vote is to reject that patch.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: INOUT parameters in procedures

2018-03-05 Thread Pavel Stehule
Hi

2018-02-28 23:28 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> This patch set adds support for INOUT parameters to procedures.
> Currently, INOUT and OUT parameters are not supported.
>
> A top-level CALL returns the output parameters as a result row.  In
> PL/pgSQL, I have added special support to pass the output back into the
> variables, as one would expect.
>
> These patches apply on top of the "prokind" patch set v2.  (Tom has
> submitted an updated version of that, which overlaps with some of the
> changes I've made here.  I will work on consolidating that soon.)
>
>
> So ... no OUT parameters, though.  I'm struggling to find a way to make
> this compatible with everything else.  For functions, the OUT parameters
> don't appear in the signature.  But that is not how this is specified in
> the SQL standard for procedures (I think).  In PL/pgSQL, you'd expect that
>
> CREATE PROCEDURE foo(a int, OUT b int) ...
>
> could be called like
>
> CALL foo(x, y);
>
> but that would require a different way of parsing function invocation.
>
> At the top-level, it's even more dubious.  In DB2, apparently you write
>
> CALL foo(123, ?);
>
> with a literal ? for the OUT parameters.
>
> In Oracle, I've seen CALL ... INTO syntax.
>
> Anyway, I'm leaving this out for now.  It can be worked around by using
> INOUT parameters.  Future improvements would be mainly syntax/parsing
> adjustments; the guts that I'm implementing here would remain valid.
>

I am looking at the attached code, and it looks pretty good. It would be really
nice if this code were part of release 11, because it is a very interesting,
important feature.

Regards

p.s. can be nice, if we allow same trick with calling of OUT variables
functions in plpgsql

fx(in a, out x, out y) return int -- but requires some special mark

do $$
declare x int, y int, z int;
begin
  z := fx(10, x, y);
  raise notice '% 

Then migration from Oracle can be really easy and friendly





Pavel

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Alvaro Herrera
Thomas Munro wrote:
> On Sun, Mar 4, 2018 at 10:46 PM, Magnus Hagander  wrote:
> > Um. Have you actually seen the "mail archive app" cut long threads off in
> > other cases? Because it's certainly not supposed to do that...
> 
> Hi Magnus,
> 
> I mean the "flat" thread view:
> 
> https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com
> 
> The final message on that page is not the final message that appears
> in my mail client for the thread.  I guessed that might have been cut
> off due to some hard-coded limit, but perhaps there is some other
> reason (different heuristics for thread following?)

You're thinking of message
https://www.postgresql.org/message-id/cafjfprfa6_n10cn3vxjn9hdtqneh6a1rfnlxy0pncp63t2p...@mail.gmail.com
but that is not the same thread -- it doesn't have the References or
In-Reply-To headers (see "raw"; user/pwd is archives/antispam).  Don't
know why though -- maybe Gmail trimmed References because it no longer
fit in the DKIM signature?  Yours had a long one:
https://www.postgresql.org/message-id/raw/CAEepm%3D0VCrC-WfzZkq3YSvJXf225rDnp1ypjv%2BrjKO5d0%3DXqFg%40mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PoC PATCH] Parallel dump to /dev/null

2018-03-05 Thread Michael Banck
Hi Alvaro,

Am Montag, den 05.03.2018, 12:48 -0300 schrieb Alvaro Herrera:
> Please make ctx->discard a bool, not an int.

Ok, done so now.  I forgot to mention that in the earlier resubmission,
but I had a look at the other pg_backup_*.c files and isSpecialScript
and hasSeek were ints as well, so I opted for that.

> You can initialize it like this:
>   ctx->discard = strcmp(ctx->directory, DEVNULL) == 0;

Thanks, changed it thusly.

> Why do you change the mode to create directory to 0711 from 0700?

Oh wow, that was either a mistype in vim or a very old test hack, thanks
for spotting it.

> You have a cuddled "else" in the last hunk.  Please uncuddle.

Done so now, hopefully.

Thanks again for the review, a new version is attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..0953290069 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -512,7 +512,10 @@ cfopen_write(const char *path, const char *mode, int compression)
 #ifdef HAVE_LIBZ
 		char	   *fname;
 
-		fname = psprintf("%s.gz", path);
+		if (strcmp(path, DEVNULL) != 0)
+			fname = psprintf("%s.gz", path);
+		else
+			fname = pg_strdup(path);
 		fp = cfopen(fname, mode, compression);
 		free_keep_errno(fname);
 #else
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 4aabb40f59..3b77023d77 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -54,6 +54,7 @@ typedef struct
 
 	cfp		   *blobsTocFH;		/* file handle for blobs.toc */
 	ParallelState *pstate;		/* for parallel backup / restore */
+	bool		discard;		/* target is DEVNULL */
 } lclContext;
 
 typedef struct
@@ -158,6 +159,8 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 
 	ctx->directory = AH->fSpec;
 
+	ctx->discard = strcmp(ctx->directory, DEVNULL) == 0;
+
 	if (AH->mode == archModeWrite)
 	{
 		struct stat st;
@@ -192,9 +195,15 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 			}
 		}
 
-		if (!is_empty && mkdir(ctx->directory, 0700) < 0)
-			exit_horribly(modulename, "could not create directory \"%s\": %s\n",
-		  ctx->directory, strerror(errno));
+		/*
+		 * Create the output directory, unless it exists already and is empty.
+		 * If we are discarding output to the null device, we must not
+		 * create it.
+		 */
+		if (!is_empty && !ctx->discard)
+			if (mkdir(ctx->directory, 0700) < 0)
+exit_horribly(modulename, "could not create directory \"%s\": %s\n",
+			  ctx->directory, strerror(errno));
 	}
 	else
 	{			/* Read Mode */
@@ -602,8 +611,10 @@ _CloseArchive(ArchiveHandle *AH)
 		/*
 		 * In directory mode, there is no need to sync all the entries
 		 * individually. Just recurse once through all the files generated.
+		 * If we are discarding output to the null device, it does not
+		 * need syncing and would emit a warning if we did.
 		 */
-		if (AH->dosync)
+		if (AH->dosync && !ctx->discard)
 			fsync_dir_recurse(ctx->directory, progname);
 	}
 	AH->FH = NULL;
@@ -659,7 +670,14 @@ _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
 	lclContext *ctx = (lclContext *) AH->formatData;
 	char		fname[MAXPGPATH];
 
-	snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
+	/*
+	 * If we are discarding output to the null device, just use that as
+	 * fname.
+	 */
+	if (ctx->discard)
+		snprintf(fname, MAXPGPATH, DEVNULL);
+	else
+		snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
 
 	ctx->dataFH = cfopen_write(fname, PG_BINARY_W, AH->compression);
 
@@ -721,9 +739,18 @@ setFilePath(ArchiveHandle *AH, char *buf, const char *relativeFilename)
 	if (strlen(dname) + 1 + strlen(relativeFilename) + 1 > MAXPGPATH)
 		exit_horribly(modulename, "file name too long: \"%s\"\n", dname);
 
-	strcpy(buf, dname);
-	strcat(buf, "/");
-	strcat(buf, relativeFilename);
+	/*
+	 * If we are discarding output to the null device, we cannot set a file
+	 * path and just set the buffer to the null device.
+	 */
+	if (ctx->discard)
+		strcpy(buf, DEVNULL);
+	else
+	{
+		strcpy(buf, dname);
+		strcat(buf, "/");
+		strcat(buf, relativeFilename);
+	}
 }
 
 /*


Re: [HACKERS] PoC: custom signal handler for extensions

2018-03-05 Thread Maksim Milyutin

Hello David,


On 05.03.2018 18:50, David Steele wrote:

Hello Maksim,

On 1/27/18 2:19 PM, Arthur Zakirov wrote:


Is there an actual need for UnregisterCustomProcSignal() within _PG_init()?
An extension registers a handler and then unregisters it, doing
nothing. It seems useless.

Also, process_shared_preload_libraries_in_progress will be false within
_PG_fini(), I think. _PG_fini() won't be called anyway, because the
implementation of internal_unload_library() is disabled.

Actually, is there a need for UnregisterCustomProcSignal() at all? It
unregisters a handler only in the current backend; for actual unregistering
we would need to call it everywhere, if I'm not mistaken.

This patch has been in Waiting on Author state for almost three weeks.
Have you had a chance to consider Arthur's suggestions?


Yes, I want to rework my patch to enable setup of custom signals on a
working backend without preload initialization.
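For readers following the thread, a rough sketch of the extension-side flow
being discussed; only UnregisterCustomProcSignal() is quoted above, so the
registration call, its signature, and the handler type below are hypothetical
stand-ins, not the patch's actual API:

/* Hypothetical extension code; names and signatures are assumptions. */
#include "postgres.h"
#include "fmgr.h"
#include "storage/procsignal.h"

PG_MODULE_MAGIC;

static ProcSignalReason my_reason;

/* Hypothetical handler invoked when the custom signal is received. */
static void
my_custom_signal_handler(void)
{
	elog(LOG, "custom signal received");
}

void
_PG_init(void)
{
	/*
	 * Hypothetical registration; per the quoted comments it would only be
	 * allowed while shared_preload_libraries are being loaded.
	 */
	my_reason = RegisterCustomProcSignal(my_custom_signal_handler);
}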



Do you know when you'll have an updated patch available?


I want to resume work on this patch for the next release (12). Sorry,
for now I cannot keep up with the current release.


--
Regards,
Maksim Milyutin




Re: All Taxi Services need Index Clustered Heap Append

2018-03-05 Thread Ants Aasma
On Mon, Mar 5, 2018 at 2:11 PM, Darafei "Komяpa" Praliaskouski
 wrote:
>> This approach mixes well with hash
>> partitioning. It would be neat indeed if PostgreSQL did something
>> equivalent on its own, and pluggable storage work being done could
>> enable index organized tables that would help. But you probably need
>> something right now.
>
>
> Fixing glaring issues (no vacuum and thus no Index-Only Scan on append-only
> tables, vacuum processing all of the eternity of btree) by 11 will get most
> of spike-nails out of the microservice code, and we can probably live with
> them until 11 gets to RDS.
>
> I also don't see why a pluggable storage is a must for the clustered write.
> Postgres does have a mechanism for selecting the next page to write tuple
> to, right now it's just looking at the FSM - but what if it just peeked at an
> existing index that already has enough of the data to route the tuple to the
> correct page on write?

The mechanism you outlined would likely work for your use case, but it
has many issues that prevent it from being universally useful. Off
the top of my head:

* One extra index descent per insertion (I/O for this is necessary
anyway, but CPU work is duplicated).
* We don't currently track the amount of bloat. A mechanism that does
this needs to be added.
* If the table hits the bloat limit there will be a sudden change in
behavior. This is pretty nasty from an operations point of view.
* With your (id,ts) clustering and data coming in mostly ordered by
timestamp, after initial warmup, each page will contain rows from a
single id, but different ids are arbitrarily interleaved. This is
better than current state, but people might want to have an
interleaving step bigger than 8kB to better utilize storage hardware.
* It seems that with a common (ts) clustering and age of timestamp
coming from an exponential distribution, this will quickly bloat to
threshold and then insert data in a rather arbitrary order. This is
much worse than the default behavior.

At least in my opinion these problems make it a special case
optimization that is hard to justify in core. A decent alternative
would be a plugin mechanism for locating free space for a tuple where
you can write your extension to find a suitable location for the row.
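A rough sketch of what such a plugin mechanism could look like; the hook name,
signature, and the fallback behaviour are assumptions for illustration, not an
existing or agreed-upon API:

/*
 * Hypothetical hook: let an extension pick the target block for a new
 * tuple, falling back to the regular FSM lookup when it declines by
 * returning InvalidBlockNumber.  Purely illustrative.
 */
typedef BlockNumber (*get_insert_location_hook_type) (Relation rel, Size len);

extern get_insert_location_hook_type get_insert_location_hook;

/* Sketch of how the heap insertion path could consult it: */
static BlockNumber
choose_target_block_sketch(Relation rel, Size len)
{
	BlockNumber targetBlock = InvalidBlockNumber;

	if (get_insert_location_hook != NULL)
		targetBlock = get_insert_location_hook(rel, len);
	if (targetBlock == InvalidBlockNumber)
		targetBlock = GetPageWithFreeSpace(rel, len);	/* existing FSM path */
	return targetBlock;
}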

>> I guess I don't have to tell you that it looks like your needs have
>> outgrown what RDS works well with and you are in for a painful move
>> sooner or later.
>
>
> Painful move where to? If we just run a Postgres instance without RDS we'll
> get the pain of setting up Postgres and replication and backups and
> autofailover, with no visible gain except if we get some private /
> unaccepted patches applied to it. If we can get these things right upstream
> why would we want to switch?

EC2, for example. Mainly because I3 instances and ephemeral storage provide an
order of magnitude or two of performance improvement while costing
less. Being able to run custom extensions and patches if necessary is
a nice bonus. Yes, setting up replication, autofailover and backups is
extra work that you have to weigh against the benefits. But don't
overestimate the effort - there are some pretty nice tools available
that make a proper cluster relatively simple to set up.

> Per my colleagues, MySQL offers clustered index, also MySQL is available on
> RDS without the need of "painful move", which is doable by writing to two
> locations for a day and then pointing readers to new DB. But if we can
> instead do no move and be sure the issues are gone upstream before we hit
> the limit of spike-nails we're running on currently, wouldn't that be
> better? :)

The move off of RDS is painful because getting data out of RDS
involves either downtime or building an ad-hoc logical replication
solution. You need to solve that regardless of where you move to.

Providing an out-of-the-box solution in core PostgreSQL would of
course be best, but realistically you will be waiting at least 2 years
to get it on RDS. In the meanwhile either the buffer partition
approach I described, or a buffering microservice in front of
PostgreSQL like Aleksander recommended should fix data locality for
you. If you weren't running on RDS I would even propose using Redis as
the buffer with one key per driver and redis_fdw to make the data
accessible from within PostgreSQL.

Regards,
Ants  Aasma
--
+43-670-6056265
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: 2018-03 CFM

2018-03-05 Thread Aleksander Alekseev
Hello David,

> I would like to get a list of each submitter's total patches vs. the total
> number of patches they are reviewing.  In the past I have done this by
> eyeball.
> 
> Do you think you could put something like that together?

Sure, no problem. It's pretty late in my timezone (UTC+3) right now,
though, so I'm going to do it tomorrow.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [PoC PATCH] Parallel dump to /dev/null

2018-03-05 Thread Alvaro Herrera
I made a few amendments (here's v5) and was ready to push, when I
noticed that _StartBlobs() does not seem to be doing the right thing.
Did you test this with a few large objects?

The reason I noticed is that I wondered about the number of open() calls
(plus zlib function calls) we would save by keeping an FD open to
/dev/null, rather than opening it over and over for each object -- ie.,
maybe not touch setFilePath() at all, if possible.  That looks perhaps
too invasive, so maybe not.  But do audit other callsites that may open
files.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 5627c18f5a26ad8314f0590ce8ccf3b4093efb74 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 5 Mar 2018 13:42:29 -0300
Subject: [PATCH v5] Allow parallel dump to go to /dev/null

This is useful to quickly determine whether a dump would finish
without errors.

Author: Michael Banck
Discussion: https://postgr.es/m/20180201132446.ga13...@nighthawk.caipicrew.dd-dns.de
---
 src/bin/pg_dump/compress_io.c |  8 +--
 src/bin/pg_dump/pg_backup_directory.c | 43 ---
 2 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..a83b1e2ef5 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -496,7 +496,8 @@ cfopen_read(const char *path, const char *mode)
  *
  * If 'compression' is non-zero, a gzip compressed stream is opened, and
  * 'compression' indicates the compression level used. The ".gz" suffix
- * is automatically added to 'path' in that case.
+ * is automatically added to 'path' in that case (unless it's the null
+ * device).
  *
  * On failure, return NULL with an error code in errno.
  */
@@ -512,7 +513,10 @@ cfopen_write(const char *path, const char *mode, int compression)
 #ifdef HAVE_LIBZ
char   *fname;
 
-   fname = psprintf("%s.gz", path);
+   if (strcmp(path, DEVNULL) != 0)
+   fname = psprintf("%s.gz", path);
+   else
+   fname = pg_strdup(path);
fp = cfopen(fname, mode, compression);
free_keep_errno(fname);
 #else
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 4aabb40f59..3b77023d77 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -54,6 +54,7 @@ typedef struct
 
cfp*blobsTocFH; /* file handle for blobs.toc */
ParallelState *pstate;  /* for parallel backup / restore */
+   booldiscard;/* target is DEVNULL */
 } lclContext;
 
 typedef struct
@@ -158,6 +159,8 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
 
ctx->directory = AH->fSpec;
 
+   ctx->discard = strcmp(ctx->directory, DEVNULL) == 0;
+
if (AH->mode == archModeWrite)
{
struct stat st;
@@ -192,9 +195,15 @@ InitArchiveFmt_Directory(ArchiveHandle *AH)
}
}
 
-		if (!is_empty && mkdir(ctx->directory, 0700) < 0)
-			exit_horribly(modulename, "could not create directory \"%s\": %s\n",
-						  ctx->directory, strerror(errno));
+		/*
+		 * Create the output directory, unless it exists already and is empty.
+		 * If we are discarding output to the null device, we must not
+		 * create it.
+		 */
+		if (!is_empty && !ctx->discard)
+			if (mkdir(ctx->directory, 0700) < 0)
+				exit_horribly(modulename, "could not create directory \"%s\": %s\n",
+							  ctx->directory, strerror(errno));
 	}
 	else
 	{			/* Read Mode */
@@ -602,8 +611,10 @@ _CloseArchive(ArchiveHandle *AH)
 		/*
 		 * In directory mode, there is no need to sync all the entries
 		 * individually. Just recurse once through all the files generated.
+		 * If we are discarding output to the null device, it does not
+		 * need syncing and would emit a warning if we did.
 		 */
-		if (AH->dosync)
+		if (AH->dosync && !ctx->discard)
 			fsync_dir_recurse(ctx->directory, progname);
 	}
 	AH->FH = NULL;
@@ -659,7 +670,14 @@ _StartBlob(ArchiveHandle *AH, TocEntry *te, Oid oid)
lclContext *ctx = (lclContext *) AH->formatData;
charfname[MAXPGPATH];
 
-   snprintf(fname, MAXPGPATH, "%s/blob_%u.dat", ctx->directory, oid);
+   /*
+* If we are discarding output to the null device, just use that as
+  

Re: [HACKERS] Another oddity in handling of WCO constraints in postgres_fdw

2018-03-05 Thread Arthur Zakirov
On Mon, Mar 05, 2018 at 09:44:37PM +0900, Etsuro Fujita wrote:
> I rebased the patch over HEAD.  Please find attached an updated patch.

Thank you!

IMHO, it is worth adding a more detailed comment to
deparseReturningList explaining why it is necessary to merge WCO attributes
into the RETURNING clause. You already noted it in the thread. I think it
could confuse someone who is not very familiar with how RETURNING is related
to WITH CHECK OPTION.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: All Taxi Services need Index Clustered Heap Append

2018-03-05 Thread Alex Kane
https://aws.amazon.com/dms/

DMS might be helpful if you need to move off of RDS


Alex Kane

On Mon, Mar 5, 2018 at 11:48 AM, Ants Aasma  wrote:

> On Mon, Mar 5, 2018 at 2:11 PM, Darafei "Komяpa" Praliaskouski
>  wrote:
> >> This approach mixes well with hash
> >> partitioning. It would be neat indeed if PostgreSQL do something
> >> equivalent on its own, and pluggable storage work being done could
> >> enable index organized tables that would help. But you probably need
> >> something right now.
> >
> >
> > Fixing glaring issues (no vacuum and thus no Index-Only Scan on
> append-only
> > tables, vacuum processing all of the eternity of btree) by 11 will get
> most
> > of spike-nails out of the microservice code, and we can probably live
> with
> > them until 11 gets to RDS.
> >
> > I also don't see why a pluggable storage is a must for the clustered
> write.
> > Postgres does have a mechanism for selecting the next page to write tuple
> > to, right now it's just looking at FSM - but what if it just peeked at
> > existing index that already has enough the data to route tuple to correct
> > page on write?
>
> The mechanism you outlined would likely work for your use case, but it
> has many issues that prevent it from being universally useful. From
> the top of my head:
>
> * One extra index descent per insertion (I/O for this is necessary
> anyway, but CPU work is duplicated).
> * We don't currently track the amount of bloat. A mechanism that does
> this needs to be added.
> * If table hits the bloat limit there will be a sudden change in
> behavior. This is pretty nasty from an operations point of view.
> * With your (id,ts) clustering and data coming in mostly ordered by
> timestamp, after initial warmup, each page will contain rows from a
> single id, but different ids are arbitrarily interleaved. This is
> better than current state, but people might want to have an
> interleaving step bigger than 8kB to better utilize storage hardware.
> * It seems that with a common (ts) clustering and age of timestamp
> coming from an exponential distribution, this will quickly bloat to
> threshold and then insert data in a rather arbitrary order. This is
> much worse than the default behavior.
>
> At least in my opinion these problems make it a special case
> optimization that is hard to justify in core. A decent alternative
> would be a plugin mechanism for locating free space for a tuple where
> you can write your extension to find a suitable location for the row.
>
> >> I guess I don't have to tell you that it looks like your needs have
> >> outgrown what RDS works well with and you are in for a painful move
> >> sooner or later.
> >
> >
> > Painful move where to? If we just run a Postgres instance without RDS
> we'll
> > get the pain of setting up Postgres and replication and backups and
> > autofailover, with no visible gain except if we get some private /
> > unaccepted patches applied to it. If we can get these things right
> upstream
> > why would we want to switch?
>
> EC2 for example. Mainly because I3 instances and ephemeral provide an
> order of magnitude or two of performance improvement while costing
> less. Being able to run custom extensions and patches if necessary is
> a nice bonus. Yes, setting up replication, autofailover and backups is
> extra work that you have to weigh against the benefits. But don't
> overestimate the effort - there are some pretty nice tools available
> that make a proper cluster relatively simple to set up.
>
> > Per my colleagues, MySQL offers clustered index, also MySQL is available
> on
> > RDS without the need of "painful move", which is doable by writing to two
> > locations for a day and then pointing readers to new DB. But if we can
> > instead do no move and be sure the issues are gone upstream before we hit
> > the limit of spike-nails we're running on currently, wouldn't that be
> > better? :)
>
> The move off of RDS is painful because getting data out of RDS
> involves either downtime or building an ad-hoc logical replication
> solution. You need to solve that regardless of where you move to.
>
> Providing an out-of-the-box solution in core PostgreSQL would of
> course be best, but realistically you will be waiting at least 2 years
> to get it on RDS. In the meanwhile either the buffer partition
> approach I described, or a buffering microservice in front of
> PostgreSQL like Aleksander recommended should fix data locality for
> you. If you weren't running on RDS I would even propose using Redis as
> the buffer with one key per driver and redis_fdw to make the data
> accessible from within PostgreSQL.
>
> Regards,
> Ants  Aasma
> --
> +43-670-6056265
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26, A-2700 Wiener Neustadt
> Web: https://www.cybertec-postgresql.com
>
>


Re: [bug fix] pg_rewind takes long time because it mistakenly copies data files

2018-03-05 Thread Fujii Masao
On Wed, Feb 28, 2018 at 3:58 PM, Tsunakawa, Takayuki
 wrote:
> From: Michael Paquier [mailto:mich...@paquier.xyz]
>> So I would propose to just do that later.  I have looked a second time at
>> your patch, attached is the set of tests I have run:
>
> Thanks so much, that has helped me a lot!
>
>> I have one small comment though.  The comment block at the beginning of
>> isRelDataFile() refers to "pg_tblspc//PG_9.4_201403261/".
>> Could you switch that to "pg_tblspc//"?
>> That's not directly the fault of your patch, but as long as we are on it..
>
> Done, thanks again for marking the CF entry.

Thanks for the patch! Pushed.

Regards,

-- 
Fujii Masao



Re: INOUT parameters in procedures

2018-03-05 Thread Douglas Doole
>
> At the top-level, it's even more dubious.  In DB2, apparently you write
>>
>> CALL foo(123, ?);
>>
>> with a literal ? for the OUT parameters.
>>
>
That's not actually as scary as it seems.

DB2 has two cases where you can use a ? like that:

1) In CLP (DB2's equivalent to psql)

DB2 draws a distinct line between procedures and functions, and you have to
invoke procedures with CALL FOO(...). Since CLP doesn't support variables
(and SQL variables didn't exist in DB2 when the CALL statement was
introduced), they needed a way to say "there's an output parameter here" so
they settled on using ? as the placeholder. (? was chosen because it ties
nicely into the next point.)

2) In dynamic SQL

DB2 has traditionally used ? as a parameter marker (placeholder for a
variable) in dynamic SQL. So the usage would look something like:

DECLARE res INTEGER;
DECLARE text VARCHAR(50);

SET text = 'CALL foo(123, ?)';
PREPARE stmt FROM text;
EXECUTE stmt INTO res; -- This invokes the statement and maps the ? into
the variable "res"

If you didn't need/want to use dynamic SQL, then you could have simply
written:

CALL foo(123, res);

- Doug Doole
Salesforce


Re: [HACKERS] Creating backup history files for backups taken from standbys

2018-03-05 Thread Fujii Masao
On Mon, Mar 5, 2018 at 11:01 PM, David Steele  wrote:
> On 3/5/18 1:06 AM, Michael Paquier wrote:
>> On Fri, Mar 02, 2018 at 03:41:57PM -0500, David Steele wrote:
>>> On 3/2/18 1:03 PM, Fujii Masao wrote:
 On Fri, Mar 2, 2018 at 1:07 PM, Michael Paquier  
 wrote:
> We would talk about two backups running
> simultaneously on a standby, which would overlap with each other to
> generate a file aimed only at being helpful for debugging purposes, and
> we provide no information now for backups taken from standbys.  We could
> of course make that logic a bit smarter by checking if there is an
> extsing file with the same name and create a new file with a different
> name.  But is that worth the complication? That's where I am not
> convinced, and that's the reason why this patch is doing things this
> way.
>>
>> Sorry for my late reply, I was thinking more about the whole thing.
>>
 What about including the backup history file in the base backup instead of
 creating it in the standby's pg_wal and archiving it? As the good side 
 effect
 of this approach, we can use the backup history file for debugging purpose
 even when WAL archiving is not enabled.
>>
>> That's not going to be much helpful for tar'ed base backups as the
>> backup history file would be at the end of the stream :(  For a debug
>> file, that's not really helpful.
>>
>>> I don't think this is a good idea, even if it does solve the race
>>> condition.  First, it would be different from the way primary-style
>>> backups generate this file.  Second, these files would not be excluded
>>> from future restore / backup cycles so they could build up after a while
>>> and cause confusion.  I guess we could teach PG to delete them or
>>> pg_basebackup to ignore them, but that doesn't seem worth the effort.
>>
>> Something I did not consider though is that this causes archive commands
>> to choke on those identical file names, so any archive command which is
>> not able to handle that would cause WAL to bloat on the standby.  For
>> this reason I would rather drop the current approach taken.  Remember
>> that the docs tell to use a plain "cp" for example, so this would cause
>> standbys to go silently down.
>
> That's a good point.  The pgBackRest archive command would definitely
> not like this.
>
>> Something that I would find way more portable though is the addition of
>> the backup history file in the output of pg_stop_backup(), which would
>> map as well with what Fujii-san's idea to add this file to the backup
>> data itself, and do the same for backups taken on a primary.  However
>> this would mean moving the 'c' marker marking the end of the tar stream
>> within do_pg_stop_backup(), and I am in no way a fan of that: the
>> handling of the tar stream should be out of reach of the start and stop
>> phases.
>
> If the file is stored with the backup instead of the archive then I
> don't think it adds much.  Why not just add stop time and stop LSN to
> backup_label and get rid of the history file entirely.
>
> I've been working closely with backup for nearly five years now and I've
> never had need to look at a .backup file.  Other than writing code to
> handle it in archiving, I've barely given it a thought.
>
>> Another idea that I have here would be to change the backup history file
>> name a bit so that the backup name is added to it, but that does not fly
>> as pg_basebackup uses its own name with spaces (yeah!); or, even better,
>> the PID of the backend taking the backup could be used.
>
> This could work, but seems complicated for little benefit.  It would
> also require some archive commands to be updated to understand the new
> format.
>
>> The renaming is more appealing, still not perfect.  So my mood of the
>> day would be to just drop the patch entirely.
>
> I think that's the best idea for now.  It doesn't seem worth using
> cycles in the last CF coming up with a new approach.

I have no objection to mark the patch "returned with feedback".

Regards,

-- 
Fujii Masao



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Alvaro Herrera
David G. Johnston wrote:

> This bug has an obvious if annoying work-around, and fixing the bug will
> likely cause people's code that uses said work-around to fail.  Breaking
> people's working code in stable release branches is generally a no-no.
> 
> However, given that this was discovered 4 months after the feature was
> released suggests to me that we are justified, and community-serving, to
> back-patch this.  Put more bluntly, we can ask for more leeway in the first
> few patch releases of a new feature since more people will benefit from 5
> years of a fully-baked feature than may be harmed by said change.  We
> shouldn't abuse that but an obvious new feature bug/oversight like this
> seems reasonable.

I agree with this argumentation and in my opinion we should back-patch
this as a bugfix.  Certainly if I had remembered about CREATE TABLE LIKE
I would have stalled the ext.stats patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Unlogged tables re-initialization tests

2018-03-05 Thread David Steele
Hi,

On 3/1/18 11:59 PM, Michael Paquier wrote:
> On Thu, Mar 01, 2018 at 01:12:19PM -0500, David Steele wrote:
>> But your point is well-taken.  No symlinks are used in this test so it
>> *should* work.
>>
>> Michael, what do you think?
> 
> Perl's symlink() does not work on Windows.  It does not fly higher than
> that, and that's the reason why a good chunk of the tests are skipped
> for pg_basebackup.  If Perl were to have a version of symlink() which
> works, say with junction points, or if Windows were to have a sane
> symlink implementation (or with [1]?), or if it were possible to create
> junction points using an in-core implementation in Perl, then those
> tests would not need to be skipped. But it seems that none of those
> scenarios have happened yet.
> 
> From what I read in your patch, it seems to me that this test should
> work.  If it doesn't for whatever reason, your patch then does not give
> a correct justification explaining why it should be skipped.

Thanks for the input, Michael.

Attached is a new version that does not skip tests on Windows.  I don't
have any way to test it, but if one of you can, that would be much
appreciated.

Thanks!
-- 
-David
da...@pgmasters.net
From 04550fb532dc4d3894a92d397f6303fddcaf75b2 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Mon, 5 Mar 2018 12:21:53 -0500
Subject: [PATCH 1/1] Reinit.c regression tests.

Add regression tests for reinit.c.

These were originally written for a patch that refactored reinit.c.  That 
refactor ended up not being needed, but it seems like a good idea to add the 
tests.
---
 src/test/recovery/t/014_unlogged_reinit.pl | 96 ++
 1 file changed, 96 insertions(+)
 create mode 100644 src/test/recovery/t/014_unlogged_reinit.pl

diff --git a/src/test/recovery/t/014_unlogged_reinit.pl 
b/src/test/recovery/t/014_unlogged_reinit.pl
new file mode 100644
index 00..290196291f
--- /dev/null
+++ b/src/test/recovery/t/014_unlogged_reinit.pl
@@ -0,0 +1,96 @@
+# Tests that unlogged tables are properly reinitialized after a crash.
+#
+# The behavior should be the same when restoring from a backup but that is not
+# tested here (yet).
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Initialize node without replication settings
+my $node = get_new_node('main');
+
+$node->init;
+$node->start;
+my $pgdata = $node->data_dir;
+
+# Create an unlogged table to test that forks other than init are not copied
+$node->safe_psql('postgres', 'CREATE UNLOGGED TABLE base_unlogged (id int)');
+
+my $baseUnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('base_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+
+# Create unlogged tables in a tablespace
+my $tablespaceDir = undef;
+my $ts1UnloggedPath = undef;
+
+$tablespaceDir = TestLib::tempdir . "/ts1";
+
+mkdir($tablespaceDir)
+   or die "unable to mkdir \"$tablespaceDir\"";
+
+$node->safe_psql('postgres',
+   "CREATE TABLESPACE ts1 LOCATION '$tablespaceDir'");
+$node->safe_psql('postgres',
+   'CREATE UNLOGGED TABLE ts1_unlogged (id int) TABLESPACE ts1');
+
+   $ts1UnloggedPath = $node->safe_psql('postgres',
+   q{select pg_relation_filepath('ts1_unlogged')});
+
+# Make sure main and init forks exist
+ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ok(-f "$pgdata/$ts1UnloggedPath", 'main fork in tablespace');
+
+# Crash the postmaster
+$node->stop('immediate');
+
+# Write forks to test that they are removed during recovery
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_vm"],
+   'touch vm fork in base');
+$node->command_ok(['touch', "$pgdata/${baseUnloggedPath}_fsm"],
+   'touch fsm fork in base');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${baseUnloggedPath}")
+   or die "unable to remove \"${baseUnloggedPath}\"";
+
+# Write forks to test that they are removed by recovery
+$node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_vm"],
+   'touch vm fork in tablespace');
+$node->command_ok(['touch', "$pgdata/${ts1UnloggedPath}_fsm"],
+   'touch fsm fork in tablespace');
+
+# Remove main fork to test that it is recopied from init
+unlink("$pgdata/${ts1UnloggedPath}")
+   or die "unable to remove \"${ts1UnloggedPath}\"";
+
+# Start the postmaster
+$node->start;
+
+# Check unlogged table in base
+ok(-f "$pgdata/${baseUnloggedPath}_init", 'init fork in base');
+ok(-f "$pgdata/$baseUnloggedPath", 'main fork in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_vm", 'vm fork not in base');
+ok(!-f "$pgdata/${baseUnloggedPath}_fsm", 'fsm fork not in base');
+
+# Drop unlogged table
+$node->safe_psql('postgres', 'DROP TABLE base_unlogged');
+
+# Check unlogged table in tablespace
+ok(-f "$pgdata/${ts1UnloggedPath}_init", 'init fork in tablespace');
+ok(-f "$pgdata/$ts1UnloggedPath", 

Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative

2018-03-05 Thread Andres Freund
On 2018-03-05 10:41:01 +0900, Michael Paquier wrote:
> > If no new TLS library is supported in v11, we still got cleaner SSL support
> > out of it due to the work performed to further remove our dependency on
> > OpenSSL, so we still come out on top IMO. Thanks Peter et al.!
> 
> I am definitely interested in plugging in more generic APIs for this
> release, so as people can also fork Postgres and implement their own SSL
> implementation more easily.

I don't think that should be a goal. A positive side effect, yes, but IMO
we shouldn't let it guide us as a goal.

Greetings,

Andres Freund



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Tomas Vondra
Hi,

On 03/03/2018 11:20 AM, David Rowley wrote:
> I've attached a rebased patch which fixes up the conflicts caused by 8b08f7d.
> 

The patch seems fine to me. A couple of minor points:

1) I wonder if we should make the list in create_table.sgml alphabetically
sorted as well.

2) I think the code in parse_utilcmd.c was missing a bunch of comments.
Nothing particularly serious, but the surrounding code has them ...

3) The transformExtendedStatistics call in transformCreateStmt was a bit
too high, interleaving with the transforms of various constraints. I
think we should move it a couple of lines down, to keep those calls
together (transformAlterTableStmt also does the call after handling all
the constraint-related stuff).

4) I see you've added generateClonedExtStatsStmt to parse_utilcmd.h, but
it was only really called in parse_utilcmd.c, so I've made it static. I
don't think we need to expose stuff unnecessarily.

The attached patch does those changes - feel free to pick only those you
like / agree with.


BTW the last point got me thinking, because parse_utilcmd.h also
exposes generateClonedIndexStmt. That is however necessary, because it's
called from DefineRelation when copying indexes from partitioned table
to partitions. I'm wondering - shouldn't we do the same thing for
extended statistics?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 51804d7..3851546 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -82,7 +82,7 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
 
 and like_option is:
 
-{ INCLUDING | EXCLUDING } { DEFAULTS | CONSTRAINTS | IDENTITY | INDEXES | STORAGE | STATISTICS | COMMENTS | ALL }
+{ INCLUDING | EXCLUDING } { COMMENTS | CONSTRAINTS | DEFAULTS | IDENTITY | INDEXES | STATISTICS | STORAGE | ALL }
 
 and partition_bound_spec is:
 
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 21fb6db..9b109d7 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -139,6 +139,8 @@ static void transformPartitionCmd(CreateStmtContext *cxt, PartitionCmd *cmd);
 static void validateInfiniteBounds(ParseState *pstate, List *blist);
 static Const *transformPartitionBoundValue(ParseState *pstate, A_Const *con,
 			 const char *colName, Oid colType, int32 colTypmod);
+static CreateStatsStmt *generateClonedExtStatsStmt(RangeVar *heapRel, Oid heapRelid,
+		   Oid source_statsid);
 
 
 /*
@@ -328,8 +330,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	 */
 	transformIndexConstraints(&cxt);
 
-	transformExtendedStatistics(&cxt);
-
 	/*
 	 * Postprocess foreign-key constraints.
 	 */
@@ -341,6 +341,11 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
 
 	/*
+	 * Postprocess extended statistics.
+	 */
+	transformExtendedStatistics(&cxt);
+
+	/*
 	 * Output results.
 	 */
 	stmt->tableElts = cxt.columns;
@@ -1195,6 +1200,9 @@ transformTableLikeClause(CreateStmtContext *cxt, TableLikeClause *table_like_cla
 		}
 	}
 
+	/*
+	 * Likewise, copy extended statistics if requested
+	 */
 	if (table_like_clause->options & CREATE_TABLE_LIKE_STATISTICS)
 	{
 		List		   *parent_extstats;
@@ -1608,7 +1616,7 @@ generateClonedIndexStmt(RangeVar *heapRel, Oid heapRelid, Relation source_idx,
  * extended statistic "source_statsid", for the rel identified by heapRel and
  * heapRelid.
  */
-CreateStatsStmt *
+static CreateStatsStmt *
 generateClonedExtStatsStmt(RangeVar *heapRel, Oid heapRelid,
 		   Oid source_statsid)
 {
@@ -2246,13 +2254,16 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
 	return index;
 }
 
+/*
+ * transformExtendedStatistics
+ *		handle extended statistics
+ *
+ * Right now, there's nothing to do here, so we just copy the list.
+ */
 static void
 transformExtendedStatistics(CreateStmtContext *cxt)
 {
-	ListCell *lc;
-
-	foreach(lc, cxt->extstats)
-		cxt->alist = lappend(cxt->alist, lfirst(lc));
+	cxt->alist = list_copy(cxt->extstats);
 }
 
 /*
@@ -3094,6 +3105,7 @@ transformAlterTableStmt(Oid relid, AlterTableStmt *stmt,
 		newcmds = lappend(newcmds, newcmd);
 	}
 
+	/* Postprocess extended statistics */
 	transformExtendedStatistics(&cxt);
 
 	/* Close rel */
diff --git a/src/include/parser/parse_utilcmd.h b/src/include/parser/parse_utilcmd.h
index cdc98d8..35ac979 100644
--- a/src/include/parser/parse_utilcmd.h
+++ b/src/include/parser/parse_utilcmd.h
@@ -31,8 +31,5 @@ extern IndexStmt *generateClonedIndexStmt(RangeVar *heapRel, Oid heapOid,
 		Relation source_idx,
 		const AttrNumber *attmap, int attmap_length,
 		Oid *constraintOid);
-extern CreateStatsStmt *generateClonedExtStatsStmt(RangeVar *heapRel,
-		

Re: INOUT parameters in procedures

2018-03-05 Thread Peter Eisentraut
On 3/5/18 11:00, Pavel Stehule wrote:
> I am looking at the attached code, and it looks pretty good. It would be
> really nice if this code were part of release 11, because it is a very
> interesting, important feature.

Here is an updated patch, rebased on top of several recent changes, also
added more documentation and tests in other PLs.

> p.s. can be nice, if we allow same trick with calling of OUT variables
> functions in plpgsql
> 
> fx(in a, out x, out y) return int -- but requires some special mark
> 
> do $$
> declare x int, y int, z int;
> begin
>   z := fx(10, x, y);
>   raise notice '% 
> 
> Then migration from Oracle can be really easy and friendly

This would require some changes to how routines are looked up, because
we currently ignore OUT parameters there.  That code does not exist yet.
 But it's certainly a plausible extension for the future.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 8ce8ae9e59611e1a01f7507a6595f50416b761cc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 5 Mar 2018 12:45:33 -0500
Subject: [PATCH v2] Support INOUT parameters in procedures

In a top-level CALL, the values of INOUT parameters will be returned as
a result row.  In PL/pgSQL, the values are assigned back to the input
parameters.  In other languages, the same convention as for return a
record from a function is used.  That does not require any code changes
in the PL implementations.
---
 doc/src/sgml/plperl.sgml   |  14 +++
 doc/src/sgml/plpgsql.sgml  |  16 +++
 doc/src/sgml/plpython.sgml |  11 ++
 doc/src/sgml/pltcl.sgml|  12 +++
 doc/src/sgml/ref/create_procedure.sgml |   5 +-
 src/backend/catalog/pg_proc.c  |   3 +-
 src/backend/commands/functioncmds.c|  48 +++--
 src/backend/tcop/utility.c |   3 +-
 src/backend/utils/fmgr/funcapi.c   |  11 +-
 src/include/commands/defrem.h  |   3 +-
 src/include/funcapi.h  |   3 +-
 src/pl/plperl/expected/plperl_call.out |  25 +
 src/pl/plperl/sql/plperl_call.sql  |  22 
 src/pl/plpgsql/src/expected/plpgsql_call.out   |  71 +
 .../plpgsql/src/expected/plpgsql_transaction.out   |   2 +-
 src/pl/plpgsql/src/pl_comp.c   |  10 +-
 src/pl/plpgsql/src/pl_exec.c   | 118 +
 src/pl/plpgsql/src/pl_funcs.c  |  25 +
 src/pl/plpgsql/src/pl_gram.y   |  38 +--
 src/pl/plpgsql/src/pl_scanner.c|   1 +
 src/pl/plpgsql/src/plpgsql.h   |  12 +++
 src/pl/plpgsql/src/sql/plpgsql_call.sql|  66 
 src/pl/plpython/expected/plpython_call.out |  23 
 src/pl/plpython/plpy_exec.c|  24 ++---
 src/pl/plpython/sql/plpython_call.sql  |  20 
 src/pl/tcl/expected/pltcl_call.out |  26 +
 src/pl/tcl/sql/pltcl_call.sql  |  23 
 27 files changed, 588 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml
index cff7a847de..9295c03db9 100644
--- a/doc/src/sgml/plperl.sgml
+++ b/doc/src/sgml/plperl.sgml
@@ -278,6 +278,20 @@ PL/Perl Functions and Arguments
hash will be returned as null values.
   
 
+  
+   Similarly, output parameters of procedures can be returned as a hash
+   reference:
+
+
+CREATE PROCEDURE perl_triple(INOUT a integer, INOUT b integer) AS $$
+my ($a, $b) = @_;
+return {a => $a * 3, b => $b * 3};
+$$ LANGUAGE plperl;
+
+CALL perl_triple(5, 10);
+
+  
+
   
 PL/Perl functions can also return sets of either scalar or
 composite types.  Usually you'll want to return rows one at a
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index c1e3c6a19d..6c25116538 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1870,6 +1870,22 @@ Returning From a Procedure
  then NULL must be returned.  Returning any other value
  will result in an error.
 
+
+
+ If a procedure has output parameters, then the output values can be
+ assigned to the parameters as if they were variables.  For example:
+
+CREATE PROCEDURE triple(INOUT x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+x := x * 3;
+END;
+$$;
+
+CALL triple(5);
+
+

 

diff --git a/doc/src/sgml/plpython.sgml b/doc/src/sgml/plpython.sgml
index ba79beb743..3b7974690e 100644
--- a/doc/src/sgml/plpython.sgml
+++ b/doc/src/sgml/plpython.sgml
@@ -649,6 +649,17 @@ Composite Types
 $$ LANGUAGE plpythonu;
 
 SELECT * FROM multiout_simple();
+
+   
+
+   
+Output parameters of procedures are passed back the same way.  For example:
+
+CREATE PROCEDURE python_triple(INOUT a integ

Re: INOUT parameters in procedures

2018-03-05 Thread Pavel Stehule
2018-03-05 19:38 GMT+01:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 3/5/18 11:00, Pavel Stehule wrote:
> > I am looking on attached code, and it looks pretty well. Can be really
> > nice if this code will be part of release 11, because it is very
> > interesting, important feature.
>
> Here is an updated patch, rebased on top of several recent changes, also
> added more documentation and tests in other PLs.
>
> > p.s. can be nice, if we allow same trick with calling of OUT variables
> > functions in plpgsql
> >
> > fx(in a, out x, out y) return int -- but requires some special mark
> >
> > do $$
> > declare x int, y int, z int;
> > begin
> >   z := fx(10, x, y);
> >   raise notice '% 
> >
> > Then migration from Oracle can be really easy and friendly
>
> This would require some changes to how routines are looked up, because
> we currently ignore OUT parameters there.  That code does not exist yet.
>  But it's certainly a plausible extension for the future.
>

Sure - this is a topic for release 12. But it can fix more than one issue
when PL/SQL code is migrated.

Note: in this case we would need to return one more parameter: the OUT
parameters plus the RETURN expression result.

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Optimize Arm64 crc32c implementation in Postgresql

2018-03-05 Thread Heikki Linnakangas

On 02/03/18 06:42, Andres Freund wrote:

On 2018-03-02 11:37:52 +1300, Thomas Munro wrote:

So... that stuff probably needs either a configure check for the
getauxval function and/or those headers, or an OS check?


It'd probably be better to not rely on os specific headers, and instead
directly access the capabilities.


Anyone got an idea on how to do that? I googled around a bit, but 
couldn't find any examples. All the examples I could find were 
Linux-specific, and used getauxval(), except for this in the FreeBSD 
kernel itself: 
https://github.com/freebsd/freebsd/blob/master/sys/libkern/crc32.c#L775. 
I'm no expert on FreeBSD, but that doesn't seem suitable for use in a 
user program.


In any case, I reworked this patch to follow the example of the existing 
code more closely. Notable changes:


* Use compiler intrinsics instead of inline assembly.

* If the target architecture has them, use the CRC instructions without 
a runtime check. You'll get that if you use "CFLAGS=-march=armv8.1-a", for 
example, as the CRC Extension was made mandatory in ARMv8.1. This 
should work even on FreeBSD or other non-Linux systems, where 
getauxval() is not available.


* I removed the loop that handled two uint64s at a time using the LDP 
instruction. I couldn't find a compiler intrinsic for that, and it was 
actually slower, at least on the system I have access to, than a 
straightforward loop that processes 8 bytes at a time (see the sketch 
after this list).


* I tested this on Linux, with gcc and clang, on an ARM64 virtual 
machine that I had available (not an emulator, but a VM on a shared 
ARM64 server).
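
For illustration only, here is a minimal sketch of the straightforward
8-bytes-at-a-time loop mentioned above, using the ACLE intrinsics that the
configure test below probes for. The function name and the buffer handling
are mine, not the patch's, and it compiles only with the CRC extension
enabled (e.g. -march=armv8-a+crc):

#include <arm_acle.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not the patch's actual code: CRC-32C over a buffer
 * using the ARMv8 CRC32C instructions, 8 bytes per iteration.
 */
static uint32_t
crc32c_armv8_sketch(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	/* main loop: one 64-bit CRC32C instruction per 8 input bytes */
	while (len >= 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, p, 8);		/* avoid unaligned loads */
		crc = __crc32cd(crc, chunk);
		p += 8;
		len -= 8;
	}

	/* tail: finish the remaining 0-7 bytes one at a time */
	while (len > 0)
	{
		crc = __crc32cb(crc, *p++);
		len--;
	}

	return crc;
}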


- Heikki
diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 689bb7f181..d530cf92c0 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -627,3 +627,34 @@ if test x"$Ac_cachevar" = x"yes"; then
 fi
 undefine([Ac_cachevar])dnl
 ])# PGAC_SSE42_CRC32_INTRINSICS
+
+
+# PGAC_ARM64CE_CRC32C_INTRINSICS
+# ---
+# Check if the compiler supports the ARM64CE CRC32C instructions added in XXX
+# using the __crc32cb, __crc32ch, __crc32cw, and __crc32cd intrinsic functions.
+#
+# An optional compiler flag can be passed as argument (e.g. -march=+crc). If the
+# intrinsics are supported, sets pgac_arm64ce_crc32c_intrinsics, and CFLAGS_ARM64CE_CRC32C.
+AC_DEFUN([PGAC_ARM64CE_CRC32C_INTRINSICS],
+[define([Ac_cachevar], [AS_TR_SH([pgac_cv_arm64ce_crc32c_intrinsics_$1])])dnl
+AC_CACHE_CHECK([for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=$1], [Ac_cachevar],
+[pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS $1"
+AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ],
+  [unsigned int crc = 0;
+   crc = __crc32cb(crc, 0);
+   crc = __crc32ch(crc, 0);
+   crc = __crc32cw(crc, 0);
+   crc = __crc32cd(crc, 0);
+   /* return computed value, to prevent the above being optimized away */
+   return crc == 0;])],
+  [Ac_cachevar=yes],
+  [Ac_cachevar=no])
+CFLAGS="$pgac_save_CFLAGS"])
+if test x"$Ac_cachevar" = x"yes"; then
+  CFLAGS_ARM64CE_CRC32C="$1"
+  pgac_arm64ce_crc32c_intrinsics=yes
+fi
+undefine([Ac_cachevar])dnl
+])# PGAC_ARM64CE_CRC32C_INTRINSICS
diff --git a/configure b/configure
index 1242e310b4..9b1389df92 100755
--- a/configure
+++ b/configure
@@ -646,6 +646,7 @@ MSGMERGE
 MSGFMT_FLAGS
 MSGFMT
 PG_CRC32C_OBJS
+CFLAGS_ARM64CE_CRC32C
 CFLAGS_SSE42
 have_win32_dbghelp
 HAVE_IPV6
@@ -15509,28 +15510,175 @@ if ac_fn_c_try_compile "$LINENO"; then :
 fi
 rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 
+# Check for ARM64 CRC Extensions intrinsics to do CRC calculations.
+#
+# First check if __crc32c* intrinsics can be used with the default compiler
+# flags. If not, check if adding -march=v8-a+crc flag helps.
+# CFLAGS_ARM64CE_CRC32C is set if that's required.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=" >&5
+$as_echo_n "checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... " >&6; }
+if ${pgac_cv_arm64ce_crc32c_intrinsics_+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS "
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+int
+main ()
+{
+unsigned int crc = 0;
+   crc = __crc32cb(crc, 0);
+   crc = __crc32ch(crc, 0);
+   crc = __crc32cw(crc, 0);
+   crc = __crc32cd(crc, 0);
+   /* return computed value, to prevent the above being optimized away */
+   return crc == 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_arm64ce_crc32c_intrinsics_=yes
+else
+  pgac_cv_arm64ce_crc32c_intrinsics_=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+CFLAGS="$pgac_save_CFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_arm64ce_crc32c_intrinsics_" >&5
+$as_echo "$pgac_cv_arm64ce_crc32c_intrinsics_" >&6; }
+if test x"$pgac_cv_arm64ce_crc32c_intrinsics_" = x"yes"; then
+  CFLAGS_ARM64CE_CRC32C=""
+  pgac_arm64ce_crc32c_intrinsics=yes
+fi
+
+if 

Re: IndexJoin memory problem using spgist and boxes

2018-03-05 Thread Alexander Kuzmenkov

On 04.03.2018 20:20, Anton Dignös wrote:

The better alternative may be to have two temporary memory contexts,
one per-tuple for calling the inner consistent method and one
per-index-scan for the traversal memory.


Yes, this seems to be a better way of fixing the problem without 
introducing the regressions mentioned by Tom. We'd keep a separate traversal 
context in ScanOpaque and run most of spgWalk in it, except for calling 
storeRes in the query context and the inner consistent method in the 
short-lived context.
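
Roughly, the arrangement described above might look like the following
sketch; so->traversalCxt is an assumed field name, and the spgWalk/storeRes
wiring is only hinted at in comments:

/* at scan start: a per-scan context for traversal values (assumed field) */
so->traversalCxt = AllocSetContextCreate(CurrentMemoryContext,
										 "SP-GiST traversal-value context",
										 ALLOCSET_DEFAULT_SIZES);

/* in spgWalk(): keep traversal state in the per-scan context ... */
oldCxt = MemoryContextSwitchTo(so->traversalCxt);
/* ... push/pop scan stack entries, keep traversalValues alive here ... */
MemoryContextSwitchTo(oldCxt);

/* ... but call the inner consistent method in a short-lived context that is
 * reset between tuples, and call storeRes in the query context as before */
MemoryContextReset(so->tempCxt);
oldCxt = MemoryContextSwitchTo(so->tempCxt);
/* ... invoke the opclass inner_consistent function here ... */
MemoryContextSwitchTo(oldCxt);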


Also, I think it would be worthwhile to test the resulting patch with 
valgrind. The allocations are not very apparent in the code, so it's 
easy to miss something.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: parallel append vs. simple UNION ALL

2018-03-05 Thread Robert Haas
On Tue, Feb 27, 2018 at 6:21 AM, Rajkumar Raghuwanshi
 wrote:
> I have applied 0001 on pg-head, and while playing with regression tests, it
> crashed with below test case.
>
> postgres=# SET min_parallel_table_scan_size=0;
> SET
> postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options
> ORDER BY 1, 2, 3;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Hmm, nice catch.  I think part of the problem here is that commit
69f4b9c85f168ae006929eec44fc44d569e846b9, wherein Andres introduced
the ProjectSet node, didn't really do the right thing in terms of
testing parallel-safety.  Before that commit, is_parallel_safe(root,
(Node *) scanjoin_target->exprs)) was really testing whether the tlist
produced during the scan/join phase was parallel-safe.  However, after
that commit, scanjoin_target->exprs wasn't really the final target for
the scan/join phase any more; instead, it was the first of possibly
several targets computed by split_pathtarget_at_srfs().  Really, the
right thing to do is to test the *last* element in that list for
parallel-safety, but as the code stands we end up testing the *first*
element.  So, if there's a parallel-restricted item in the target list
(this query ends up putting a CoerceToDomain in the target list, which
we currently consider parallel-restricted), it looks like we can
nevertheless end up trying to project it in what is supposed to be a
partial path.
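
Conceptually, testing the last element amounts to something like the
following hedged sketch (assuming the list of targets produced by
split_pathtarget_at_srfs() is available as scanjoin_targets, and that rel is
the final scan/join rel; the actual code may be structured differently):

/*
 * Sketch only, not the committed fix: check the *last* scan/join target,
 * since that is what the scan/join phase ultimately projects, rather than
 * the first one.
 */
PathTarget *final_target = llast_node(PathTarget, scanjoin_targets);

if (!is_parallel_safe(root, (Node *) final_target->exprs))
	rel->consider_parallel = false;	/* partial paths must not project it */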

There are a large number of cases where this doesn't end up mattering
because the next upper_rel created will not get marked
consider_parallel because its target list will also contain the same
parallel-restricted construct, and therefore the partial paths
generated for the scan/join rel will never get used -- except to
generate Gather/Gather Merge paths, which has already happened; but
that step didn't know about the rest of the scan/join targets either,
so it won't have used them.  However, both create_distinct_paths() and
the code in grouping_planner that populates final_rel assume that they
don't need to retest the target for parallel-safety because no
projection is done at those levels; they just inherit the
parallel-safety marking of the input rel, so in those cases, if the
input rel's marking is wrong, the error is propagated upward.

There's another way final_rel->consider_parallel can be wrong, too: if
the FROM-list is empty, then we create a join rel and set its
consider_parallel flag without regard to the parallel-safety of the
target list.  There is a comment in query_planner() that says this
will be dealt with "later", but this seems not to be true. (Before
Tom's commit da1c91631e3577ea5818f855ebb5bd206d559006, the comments
simply ignored the question of whether a check was needed here, but
Tom seems to have inserted an incorrect justification for the
already-wrong code.)

I'm not sure to what degree, if at all, any of these problems are
visible given that we don't use final_rel->consider_parallel for much
of anything.  Certainly, it gets much easier to trigger a problem with
0001 applied, as the test case shows.  But I'm not entirely convinced
that there's no problem even without that.  It seems like every upper
rel that is setting its consider_parallel flag based on the first
element of some list of targets rather than the last is potentially
vulnerable to ending up with the wrong answer, and I'm afraid that
might have some adverse consequence that I haven't quite pinned down
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Tom Lane
I wrote:
> The thing that I find curious, now that we've shut off autovacuum
> altogether on those tables, is that we *still* aren't getting stable
> results.  How can that be?

I spent some time trying to figure out what's going on here.  I've still
not been able to replicate the failure on any of my machines, but I have
learned a thing or two.

On the first query that's still failing on rhinoceros (and has also been
seen to fail on other machines, before the last round of changes), which
is at line 1142 in postgres_fdw.sql in current HEAD:

EXPLAIN (verbose, costs off)
UPDATE ft2 SET c3 = 'baz'
  FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
  WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
  RETURNING ft2.*, ft4.*, ft5.*;
-- can't be pushed down

(call this Q1), we are expecting to get a plan of the shape of

->  Nested Loop
      ->  Foreign Scan on public.ft2
            Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
      ->  Foreign Scan
            Relations: (public.ft4) INNER JOIN (public.ft5)

Now, there is no possible way that the planner would pick that plan if its
estimate of the number of rows out of the ft2 scan was more than 1.
Re-executing the foreign join would have high enough overhead to push the
plan to some other shape.  In fact, probably the shape we see in the
actual plan choice, on the failing machines:

->  Nested Loop
      ->  Foreign Scan
            Relations: (public.ft4) INNER JOIN (public.ft5)
      ->  Materialize
            ->  Foreign Scan on public.ft2
                  Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE

I instrumented costsize.c, and determined that the estimated size of
"S 1"."T 1" WHERE (("C 1" > 2000)), before the clamp_row_est() rounding,
is only 0.1159 rows on my machine.  So there's no way to explain the plan
change as the result of small platform-specific roundoff differences ---
we need something more than a 12X change in the selectivity estimate
before the plan shape would change like this.  There are a bunch of things
going on under the hood, such as that the table's raw rowcount estimate
gets scaled up from the original 1000 rows because it's now got more pages
than before, but none come close to explaining 12X.

However, the planner is working with quite old statistics, dating back to
the manual ANALYZE at postgres_fdw.sql line 93.  If I stick in another
manual ANALYZE just before Q1, I get exactly the same plan change reported
by rhinoceros.  (And underneath, the rowcount estimate for ft2 has gone
from 1 row to 8 rows, which is much closer to the true value of 10 rows,
so the plan change is not surprising.)  What's more, doing this also
reproduces the one other plan change seen later in rhinoceros' output.

It is, therefore, very hard to avoid the conclusion that something is
causing an ANALYZE to happen while the script runs, despite our fooling
about with the table's reloptions.  I'm not sure that that something is
autovacuum.  A platform-specific bug in reloptions handling doesn't seem
out of the question, but poking around in the code didn't spot anything
obvious.

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
rhinoceros' extra_config options, temporarily?  Correlating that log
output with the log_statement output from the test proper would let
us confirm or deny whether it's autovacuum.

Another thing I'd like to do is temporarily add

select relpages, reltuples from pg_class where relname = 'T 1';

to the test script, both just after the manual ANALYZE and just before Q1.
If we see a change between those reports on any of the affected machines,
we'll know that *something* is changing the stats.  Now the problem with
doing that is that the expected value of relpages is platform-dependent
(I see 11 on 64-bit and 10 on 32-bit on my machines).  We can work around
that, perhaps by capturing the initial value in a temp table and printing
only the delta, but I'm not sure if it's worth the effort as opposed to
just letting it fail on 32-bit critters for a day or two.

regards, tom lane



Re: [HACKERS] PoC: custom signal handler for extensions

2018-03-05 Thread David Steele
Hi Maksim,

On 3/5/18 11:24 AM, Maksim Milyutin wrote:
> Hello David,
> 
> 
> On 05.03.2018 18:50, David Steele wrote:
>> Hello Maksim,
>>
>> On 1/27/18 2:19 PM, Arthur Zakirov wrote:
>>
>>> Is there actual need in UnregisterCustomProcSignal() within _PG_init()?
>>> An extension registers a handler and then unregister it doing
>>> nothing. It seems useless.
>>>
>>> Also process_shared_preload_libraries_in_progress within _PG_fini() will
>>> be false I think. _PG_fini() won't be called though, because
>>> implementation of internal_unload_library() is disabled.
>>>
>>> Actually, is there need in UnregisterCustomProcSignal() at all? It
>>> unregisters a handler only in the current backend; for actual unregistering
>>> we need to call it everywhere, if I'm not mistaken.
>> This patch has been in Waiting on Author state for almost three weeks.
>> Have you had a chance to consider Arthur's suggestions?
> 
> Yes, I want to rework my patch to enable setup of custom signals on
> working backend without preload initialization.
> 
>> Do you know when you'll have an updated patch available?
> 
> I want to resume work on this patch for the next release (12). Sorry,
> for now I cannot keep up with the current release.
Understood.  I'll mark it Returned with Feedback and you can enter it in
a CF when you have a new patch.

Regards,
-- 
-David
da...@pgmasters.net



Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type

2018-03-05 Thread Tom Lane
Tomas Vondra  writes:
> On 03/04/2018 09:46 PM, Tom Lane wrote:
>> Well, I think the existing bytea bug is a counterexample to that.  If
>> someone were to repeat that mistake with, say, UUID, these tests would not
>> catch it, because none of them would exercise UUID-vs-something-else.
>> For that matter, your statement is false on its face, because even if
>> somebody tried to add say uuid-versus-int8, these tests would not catch
>> lack of support for that in convert_to_scalar unless the specific case of
>> uuid-versus-int8 were added to the tests.

> I suspect we're simply having different expectations what the tests
> could/should protect against - my intention was to make sure someone
> does not break convert_to_scalar for the currently handled types.

I concur that we could use better test coverage for the existing
code --- the coverage report is pretty bleak there.  But I think we
could improve that just by testing with the existing operators.  I do
not see much use in checking that unsupported cross-type cases fail
cleanly, because there isn't a practical way to have full coverage
for such a concern.

regards, tom lane



Re: Kerberos test suite

2018-03-05 Thread Peter Eisentraut
On 2/27/18 00:21, Michael Paquier wrote:
> Thanks.  Could you document that on the README please?  krb5-user and
> krb5-kdc is a split from Debian.  For macOS, are you using MacPorts or
> Homebrew?  I would assume the latter, and it would be nice to make that
> precise in the README as well.  On Debian you also need to install
> krb5-admin-server, as it includes kadmin.local, which the test needs.
> Once I understood that I have been able to run the tests.

Added to README.

> You have forgotten to update ALWAYS_SUBDIRS in src/test/Makefile.

Fixed, and updated to reflect recent changes with PG_TEST_EXTRA etc.

> +my ($stdout, $krb5_version);
> +IPC::Run::run [ 'krb5-config', '--version' ], '>', \$stdout or die
> "could not execute krb5-config";
> +$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/ or die "could not get
> Kerberos version";
> +$krb5_version = $1;
> Time for a new routine command_log which executes the command, then
> returns stdout and stderr to the caller?

I didn't do anything about this.  Do we have any more places where that
would be needed right now?

> +system_or_bail 'echo secret1 | kinit test1';
> Using IPC::Run stuff would be better here.

done

> @@ -1153,6 +1152,11 @@ sub psql
> $params{on_error_stop} = 1 unless defined $params{on_error_stop};
> $params{on_error_die}  = 0 unless defined $params{on_error_die};
> 
> +   $connstr .= ' host=localhost' if defined $params{tcpip};
> +
> +   my @psql_params =
> + ('psql', '-XAtq', '-d', $connstr, '-f', '-');
> This bit I don't like.  Wouldn't it be enough to abuse extra_params
> and use a custom connection string?  The last value wins in a psql
> command. 

done

Also included feedback from Thomas Munro.

New patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From f694b67401d93051c3442f60c06f8327391239cf Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 5 Mar 2018 14:42:11 -0500
Subject: [PATCH v2] Tests for Kerberos/GSSAPI authentication

---
 configure   |   4 +
 configure.in|   2 +
 doc/src/sgml/regress.sgml   |  12 ++-
 src/Makefile.global.in  |   2 +
 src/test/Makefile   |   7 +-
 src/test/kerberos/.gitignore|   2 +
 src/test/kerberos/Makefile  |  25 ++
 src/test/kerberos/README|  35 
 src/test/kerberos/t/001_auth.pl | 177 
 9 files changed, 264 insertions(+), 2 deletions(-)
 create mode 100644 src/test/kerberos/.gitignore
 create mode 100644 src/test/kerberos/Makefile
 create mode 100644 src/test/kerberos/README
 create mode 100644 src/test/kerberos/t/001_auth.pl

diff --git a/configure b/configure
index 1242e310b4..3943711283 100755
--- a/configure
+++ b/configure
@@ -709,7 +709,9 @@ with_systemd
 with_selinux
 with_openssl
 with_ldap
+with_krb_srvnam
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5788,6 +5790,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
@@ -5815,6 +5818,7 @@ fi
 
 
 
+
 cat >>confdefs.h <<_ACEOF
 #define PG_KRB_SRVNAM "$with_krb_srvnam"
 _ACEOF
diff --git a/configure.in b/configure.in
index aee3ab0867..1babdbb755 100644
--- a/configure.in
+++ b/configure.in
@@ -638,6 +638,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
@@ -650,6 +651,7 @@ PGAC_ARG_REQ(with, krb-srvnam,
  [NAME], [default service principal name in Kerberos (GSSAPI) 
[postgres]],
  [],
  [with_krb_srvnam="postgres"])
+AC_SUBST(with_krb_srvnam)
 AC_DEFINE_UNQUOTED([PG_KRB_SRVNAM], ["$with_krb_srvnam"],
[Define to the name of the default PostgreSQL service 
principal in Kerberos (GSSAPI). (--with-krb-srvnam=NAME)])
 
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 3c448dc5bc..673a8c2164 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -220,10 +220,20 @@ Additional Test Suites
PG_TEST_EXTRA to a whitespace-separated list, for
example:
 
-make check-world PG_TEST_EXTRA='ldap ssl'
+make check-world PG_TEST_EXTRA='kerberos ldap ssl'
 
The following values are currently supported:

+
+ kerberos
+ 
+  
+   Runs the test suite under src/test/kerberos.  This
+   requires an MIT Kerberos installation and opens TCP/IP listen sockets.
+  
+ 
+
+
 
  ldap
  
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index dcb8dc5d90..15c14951e8 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -186,6 +186,8 @@ with_tcl= @with_tcl@
 with_openssl   = @with_openssl@
 with_selinux   = @with_selinux@
 with_systemd   = @with_systemd@
+with_gssapi= @with_gssapi@
+with_krb_srvnam= @with_krb_srvnam@
 with_ldap

Re: [HACKERS] generated columns

2018-03-05 Thread Peter Eisentraut
On 2/1/18 21:25, Michael Paquier wrote:
> On Thu, Feb 01, 2018 at 09:29:09AM -0500, Peter Eisentraut wrote:
>> That would be nice.  I'm going to study this some more to see what can
>> be done.
> 
> Thanks for the update.  Note: Peter has moved the patch to next CF.

I didn't get to updating this patch, so I'm closing it in this CF.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: chained transactions

2018-03-05 Thread Andres Freund
On 2018-03-05 09:21:33 +, Simon Riggs wrote:
> On 2 March 2018 at 07:18, Andres Freund  wrote:
> > Hi,
> >
> > On 2018-02-28 22:35:52 -0500, Peter Eisentraut wrote:
> >> This feature is meant to help manage transaction isolation in
> >> procedures.
> >
> > This is a major new feature, submitted the evening before the last CF
> > for v11 starts. Therefore I think it should just be moved to the next
> > fest, it doesn't seems realistic to attempt to get this into v11.
> 
> Looks like a useful patch that adds fairly minor syntax that follows
> the SQL Standard.
> 
> It introduces no new internal infrastructure, so I can't call this a
> major feature.

You can avoid calling it new infrastructure, but it certainly modifies
existing infrastructure. And it adds quite a bit of new user interface,
which certainly makes it important to get it right.

 doc/src/sgml/plpgsql.sgml   |9 
 doc/src/sgml/ref/abort.sgml |   14 +
 doc/src/sgml/ref/commit.sgml|   14 +
 doc/src/sgml/ref/end.sgml   |   14 +
 doc/src/sgml/ref/rollback.sgml  |   14 +
 doc/src/sgml/spi.sgml   |   14 -
 src/backend/access/transam/xact.c   |  210 +++-
 src/backend/commands/cluster.c  |2 
 src/backend/commands/dbcommands.c   |2 
 src/backend/commands/discard.c  |2 
 src/backend/commands/portalcmds.c   |2 
 src/backend/commands/subscriptioncmds.c |4 
 src/backend/commands/typecmds.c |2 
 src/backend/commands/vacuum.c   |4 
 src/backend/commands/variable.c |   57 -
 src/backend/executor/spi.c  |   25 ++
 src/backend/nodes/copyfuncs.c   |2 
 src/backend/nodes/equalfuncs.c  |2 
 src/backend/parser/gram.y   |   34 +--
 src/backend/replication/walsender.c |6 
 src/backend/tcop/utility.c  |   58 ++---
 src/backend/utils/misc/guc.c|   35 +--
 src/backend/utils/time/snapmgr.c|2 
 src/bin/psql/common.c   |2 
 src/include/access/xact.h   |   18 -
 src/include/commands/variable.h |4 
 src/include/executor/spi.h  |4 
 src/include/nodes/parsenodes.h  |4 
 src/pl/plperl/plperl.c  |4 
 src/pl/plpgsql/src/expected/plpgsql_transaction.out |   31 ++
 src/pl/plpgsql/src/pl_exec.c|   10 
 src/pl/plpgsql/src/pl_funcs.c   |   10 
 src/pl/plpgsql/src/pl_gram.y|   18 +
 src/pl/plpgsql/src/pl_scanner.c |2 
 src/pl/plpgsql/src/plpgsql.h|2 
 src/pl/plpgsql/src/sql/plpgsql_transaction.sql  |   23 ++
 src/pl/plpython/plpy_plpymodule.c   |4 
 src/pl/tcl/pltcl.c  |4 
 src/test/regress/expected/transactions.out  |  141 +
 src/test/regress/sql/transactions.sql   |   49 
 40 files changed, 596 insertions(+), 262 deletions(-)


Greetings,

Andres Freund



Re: STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?

2018-03-05 Thread Alvaro Herrera
Tomas Vondra wrote:

> 4) I see you've added generateClonedExtStatsStmt to parse_utilcmd.h, but
> it was only really called in parse_utilcmd.c, so I've made it static. I
> don't think we need to expose stuff unnecessarily.

> BTW the last point made me thinking, because parse_utilcmd.h also
> exposes generateClonedIndexStmt. That is however necessary, because it's
> called from DefineRelation when copying indexes from partitioned table
> to partitions. I'm wondering - shouldn't we do the same thing for
> extended statistics?

Maybe, but that would not be a bugfix anymore.  So if we do want that,
that is definitely a new feature, so it should be its own patch; the
copying of indexes to partitions is a new feature in pg11.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Thomas Munro
On Tue, Mar 6, 2018 at 5:04 AM, Alvaro Herrera  wrote:
> Thomas Munro wrote:
>> On Sun, Mar 4, 2018 at 10:46 PM, Magnus Hagander  wrote:
>> > Um. Have you actually seen the "mail archive app" cut long threads off in
>> > other cases? Because it's certainly not supposed to do that...
>>
>> Hi Magnus,
>>
>> I mean the "flat" thread view:
>>
>> https://www.postgresql.org/message-id/flat/CAFjFpRfQ8GrQvzp3jA2wnLqrHmaXna-urjm_UY9BqXj=ead...@mail.gmail.com
>>
>> The final message on that page is not the final message that appears
>> in my mail client for the thread.  I guessed that might have been cut
>> off due to some hard-coded limit, but perhaps there is some other
>> reason (different heuristics for thread following?)
>
> You're thinking of message
> https://www.postgresql.org/message-id/cafjfprfa6_n10cn3vxjn9hdtqneh6a1rfnlxy0pncp63t2p...@mail.gmail.com
> but that is not the same thread -- it doesn't have the References or
> In-Reply-To headers (see "raw"; user/pwd is archives/antispam).  Don't
> know why though -- maybe Gmail trimmed References because it no longer
> fit in the DKIM signature?  Yours had a long one:
> https://www.postgresql.org/message-id/raw/CAEepm%3D0VCrC-WfzZkq3YSvJXf225rDnp1ypjv%2BrjKO5d0%3DXqFg%40mail.gmail.com

Huh.  Interesting.  It seems that Gmail uses fuzzier heuristics, not
just "In-Reply-To", explaining why I considered that to be the same
thread but our archive didn't:

http://www.sensefulsolutions.com/2010/08/how-does-email-threading-work-in-gmail.html

I wonder why it dropped the In-Reply-To header when Ashutosh replied...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
On 2/28/18 2:28 AM, Michael Paquier wrote:
> On Tue, Feb 27, 2018 at 03:52:32PM -0500, David Steele wrote:
>> On 1/30/18 3:01 AM, Michael Paquier wrote:
> 
>>> +command_ok(
>>> +   ['chmod', "-R", 'g+rX', "$pgdata"],
>>> +   'add group perms to PGDATA');
>>>
>>> That would blow up on Windows.  You should replace that by perl's chmod
>>> and look at its return result for sanity checks, likely with ($cnt != 0).
>>> And let's not use icacls please...
>>
>> Fixed with a new function, add_pg_data_group_perm().
> 
> That's basically a recursive chmod, so chmod_recursive is more adapted?
> I could imagine that this is useful as well for removing group
> permissions, so the new mode could be specified as an argument.

The required package (File::chmod::Recursive) for chmod_recursive is not
in use anywhere else and was not installed when I installed build
dependencies.

I'm not sure what the protocol for introducing a new Perl module is?  I
couldn't find packages for the major OSes.  Are we OK with using CPAN?

>>> +if ($params{has_group_access})
>>> +{
>>> +push(@{$params{extra}}, '-g');
>>> +}
>>> No need to introduce a new parameter here, please use directly extra =>
>>> [ '-g' ] in the routines calling PostgresNode::init.
>>
>> Other code needs to know if group access is enabled on the node (see
>> RewindTest.pm) so it's not just a matter of passing the option.
> 
> I don't quite understand here.  I have no objection into extending
> setup_cluster() with a group_access argument so as the tests run both
> the group access and without it, but I'd rather keep the low-level API
> clean of anything fancy if we can use the facility which already
> exists.

I'm not sure what you mean by, "facility that already exists".  I think
I implemented this the same as the allows_streaming and has_archiving
flags.  The only difference is that this option must be set at initdb
time rather than in postgresql.conf.

Could you be more specific?

>> New patches are attached.  The end result is almost identical to v6 but
>> code was moved from 03 to 02 as per Michael's suggestion.
> 
> Thanks for the updated versions.
> 
>> 1) 01-pgresetwal-test
>>
>> Adds a very basic test suite for pg_resetwal.
> 
> Okay.  This one looks fine to me.  It could be passed down to a
> committer.

Agreed.

>> 2) 02-file-perm
>>
>> Adds a new header (file_perm.h) and makes all front-end utilities use
>> the new constants for setting umask.  Converts mkdir() calls in the
>> backend to MakeDirectoryDefaultPerm() if the original
>> call used default permissions.  Adds tests to make sure permissions in
>> PGDATA are set correctly by the front-end tools.
> 
> I had a more simple (complicated?) thing in mind for the split, which
> would actually cause this patch to be split into two more:
> 1) Introduce file_perm.h and replace all the existing values for modes
> and masks to what the header introduces.  This allows to check if the
> new header is not forgotten anywhere, especially for the frontends.
> 2) Introduce MakeDirectoryDefaultPerm, which switches the backend calls
> of mkdir() to the new API.
> 3) Add the grouping facilities, which updates the default masks and
> modes of file_perm.h.
> 
> Grouping 1) and 2) together as you do makes sense as well I guess, as in
> my proposal the mkdir calls in the backend would be touched twice, still
> things get more incremental.

I think I prefer grouping 1 and 2.  It produces less churn, as you say,
and I don't think MakeDirectoryDefaultPerm really needs its own patch.
Based on comments so far, nobody has an objection to it.

In theory, the first two patches could be applied just for refactoring
without adding group permissions at all.  There are a lot of new tests
to make sure permissions are set correctly so it seems like a win right off.
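
For reference, the MakeDirectoryDefaultPerm() wrapper amounts to little more
than the following sketch (the constant comes from the patch's file_perm.h,
without group bits in the first cut as discussed just below; the exact
signature may differ):

#include <sys/stat.h>
#include <sys/types.h>

/* Default mode for directories, per file_perm.h (first-cut value). */
#define PG_DIR_MODE_DEFAULT		S_IRWXU

/*
 * Create a directory with the default permissions, so backend callers do
 * not hard-code a mode and the default can later be relaxed to allow
 * group access.
 */
int
MakeDirectoryDefaultPerm(const char *directoryName)
{
	return mkdir(directoryName, PG_DIR_MODE_DEFAULT);
}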

> 
> +/*
> + * Default mode for created files.
> + */
> +#define PG_FILE_MODE_DEFAULT   (S_IRUSR | S_IWUSR | S_IRGRP)
> +
> +/*
> + * Default mode for directories.
> + */
> +#define PG_DIR_MODE_DEFAULT(S_IRWXU | S_IRGRP | S_IXGRP)
> For the first cut, this should not use S_IRGRP in the first declaration,
> and (S_IRGRP | S_IXGRP) in the second declaration.

Done.

> 
> -if (script == NULL && (script = fopen_priv(output_path, "w")) == NULL)
> +if (script == NULL && (script = fopen(output_path, "w")) == NULL)
> Why are those calls in pg_upgrade changed?

Done.

Those should have been removed already.  Originally I had removed
fopen_priv() because it didn't serve a purpose anymore.  In the
meantime, somebody else made it a macro to fopen().  I decided my patch
would be less noisy if I reverted to fopen_priv() but I missed a few places.

> TestLib.pm does not need the group-related extensions, right?

Done.

> check_pg_data_perm() looks useful even without $allow_group even if the
> grouping facility is reverted at the end.  S_ISLNK should be considered
> as well for tablespaces or symlinks of pg_wal?  We have such cases in
> the regressio

Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly

2018-03-05 Thread Joe Conway
On 03/05/2018 11:19 AM, Tom Lane wrote:
> Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to
> rhinoceros' extra_config options, temporarily?  Correlating that log
> output with the log_statement output from the test proper would let
> us confirm or deny whether it's autovacuum.


Done just now. Do you want me to force a run?


Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: select_parallel test failure: gather sometimes losing tuples (maybe during rescans)?

2018-03-05 Thread Robert Haas
On Sun, Mar 4, 2018 at 4:46 PM, Thomas Munro
 wrote:
> Thanks!  Here are a couple of patches.  I'm not sure which I prefer.
> The "pessimistic" one looks simpler and is probably the way to go, but
> the "optimistic" one avoids doing an extra read until it has actually
> run out of data and seen mq_detached == true.
>
> I realised that the pg_write_barrier() added to
> shm_mq_detach_internal() from the earlier demonstration/hack patch was
> not needed... I had a notion that SpinLockAcquire() might not include
> a strong enough barrier (unlike SpinLockRelease()), but after reading
> s_lock.h I think it's not needed (since you get either TAS() or a
> syscall-based slow path, both expected to include a full fence).  I
> haven't personally tested this on a weak memory order system.

The optimistic approach seems a little bit less likely to slow this
down on systems where barriers are expensive, so I committed that one.
Thanks for debugging this; I hope this fixes it, but I guess we'll
see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: JIT compiling with LLVM v11

2018-03-05 Thread Andres Freund
Hi,

On 2018-03-04 21:00:06 -0800, Andres Freund wrote:
> > Looking at llvm_get_function(), the function that raises that error, I
> > see that there are a few different paths here.  I don't have
> > HAVE_DECL_LLVMORCGETSYMBOLADDRESSIN defined, and I don't have LLVM <
> > 5, so I should be getting the symbol address with
> > LLVMOrcGetSymbolAddress(llvm_opt0_orc, &addr, mangled) or
> > LLVMOrcGetSymbolAddress(llvm_opt3_orc, &addr, mangled), but clearly
> > those are returning NULL.
> 
> Yep. I wonder if this is some symbol naming issue or such, because
> emitting and relocating the object worked without an error.

Thanks to Thomas helping get access to an OSX machine I was able to
discover what the issue is.  OSX prepends, for reasons unbeknownst to me,
a leading underscore to all function names.  That led to two issues:
First, JITed functions do not have that underscore (making us look up a
non-existing symbol, because llvm_get_function applied
mangling). Secondly, llvm_resolve_symbol failed looking up symbol names,
because for $reason dlsym() etc do *not* have the names prefixed by the
underscore.   Easily enough fixed.
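
A minimal sketch of the second half of that fix, i.e. stripping the Mach-O
underscore before the dlsym() lookup (assumed shape only; the real change
lives in llvm_resolve_symbol and may differ in detail):

#include <dlfcn.h>

static void *
resolve_symbol_sketch(const char *symname)
{
#if defined(__darwin__)
	/* Mach-O spells C symbols with a leading '_', but dlsym() expects the
	 * unprefixed name, so strip it before the lookup. */
	if (symname[0] == '_')
		symname++;
#endif
	return dlsym(RTLD_DEFAULT, symname);
}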

After that I discovered another problem, the bitcode files for core pg /
contrib modules weren't installed. That turned out to be a make version
issue, I'd used
define install_llvm_module =
# body
but older versions of make only like
define install_llvm_module
# body

Writing up a patch that I can actually push.  Thanks both to Thomas and
Peter for pointing me towards this issue!

Greetings,

Andres Freund



RE: [HACKERS] Client Connection redirection support for PostgreSQL

2018-03-05 Thread Satyanarayana Narlapuram
Please see the attached patch with the comments.

Changes in the patch:
A client-side PGREDIRECTLIMIT parameter has been introduced to control 
the maximum number of retries. 
BE_v3.1 sends a ProtocolNegotiation message. FE_v3.1 downgrades to v3.0 
upon receipt of this message.
FE falls back to v3.0 if 3.1 is not supported by the server.


>> I hadn't really thought deeply about whether redirection should happen
>> before or after authentication.  For the most part, before seems better,
>> because it seems a bit silly to force people to authenticate just so that
>> you can tell them to go someplace else.  Also, that would lead to double
>> authentication, which might for example result in multiple password
>> prompts, which users might either dislike or find confusing.

Yes, redirection before authentication would avoid multiple password 
prompts.

Thanks,
Satya


redirection_v2.patch
Description: redirection_v2.patch


Re: pg_get_functiondef forgets about most GUC_LIST_INPUT GUCs

2018-03-05 Thread Pavel Stehule
2018-02-21 8:35 GMT+01:00 Michael Paquier :

> On Tue, Feb 20, 2018 at 06:46:57PM +0300, Arthur Zakirov wrote:
> > Just 2 cents from me. It seems that there is a problem with extensions
> > GUCs.
> >
> > [...]
> >
> > =# SELECT pg_get_functiondef('func_with_set_params'::regproc);
> > ERROR:  unrecognized configuration parameter "plpgsql.extra_errors"
>
> You have an excellent point here, and I did not consider it.
> For pg_dump and pg_dumpall, something like what I described in
> https://www.postgresql.org/message-id/20180112012440.ga2...@paquier.xyz
> to extend pg_settings so as GUC types are visible via SQL could make
> sense, now it is really a lot for just being able to quote parameters
> correctly.  On top of that, what I suggested previously would not work
> reliably except if we have pg_get_functiondef load the library
> associated to a given parameter.  Even in this case there could
> perfectly be a custom parameter from another plugin and not a parameter
> associated to the function language itself.
>
> It seems to me that this brings us back to a best-effort for the backend
> and the frontend by maintaining a list of hardcoded parameter names,
> which could be refined a maximum by considering lists for in-core
> extensions and perhaps the most famous extensions around :(
>

I'm afraid there is no good solution. Is it possible to store the options in
their original form? When the function is displayed, the original value
would be displayed. The current patch (with known extensions) can be used as
a bug fix and back-patched. In a new version we would store the original
value with quotes.
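
The hardcoded-list best effort described in the quoted message could look
roughly like this sketch; the list below is illustrative, not authoritative:

#include <stdbool.h>
#include <strings.h>

/* illustrative subset of parameters known to be GUC_LIST_INPUT */
static const char *const known_list_params[] = {
	"DateStyle",
	"search_path",
	"session_preload_libraries",
	"shared_preload_libraries",
	"temp_tablespaces",
	NULL
};

/* Should this proconfig entry be quoted as a comma-separated list? */
static bool
is_known_list_param(const char *name)
{
	for (int i = 0; known_list_params[i] != NULL; i++)
	{
		if (strcasecmp(name, known_list_params[i]) == 0)
			return true;
	}
	return false;
}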

Regards

Pavel



>
>
> Thoughts?
> --
> Michael
>


Re: PATCH: Configurable file mode mask

2018-03-05 Thread David Steele
On 3/1/18 11:18 PM, Michael Paquier wrote:
> 
> Based on my recent lookup at code level for this feature, the patch for
> pg_resetwal (which could have been discussed on its own thread as well),
> would be fine for commit.  The thing could be extended a bit more but
> there is nothing opposing even a basic test suite to be in.  

There are no core changes, so it doesn't seem like the tests can hurt
anything.

> Then you
> have a set of refactoring patches, which still need some work.

New patches posted today, hopefully those address most of your concerns.

> And
> finally there is a rather invasive patch on top of the whole thing.  

I'm not sure if I would call it invasive since it's an optional feature
that is off by default.  Honestly, I think the refactor in 02 is more
likely to cause problems even if the goal there is *not* to change the
behavior.

> The
> refactoring work shows much more value only after the main feature is
> in; still, I think that unifying the default permissions allowed for
> files and directories, as well as the mkdir() calls, has some value in
> itself as a mergeable, independent change.

I agree.

> I think that
> it would be hard to get the whole patch set into the tree by the end of
> the CF though

I hope it does make it, it's a pretty big win for security.

> but cutting the refactoring pieces would be doable.  At
> least it would provide some base for integration in v12.  And the
> refactoring patch has some pieces that would be helpful for TAP tests as
> well.

I've gone pretty big on tests in this patch because I recognize it is a
pretty fundamental behavior change.

Thanks,
-- 
-David
da...@pgmasters.net





  1   2   >