Re: Parallel threads in query

2018-11-01 Thread Konstantin Knizhnik




On 31.10.2018 22:07, Darafei "Komяpa" Praliaskouski wrote:

Hi,

I've tried porting some of PostGIS algorithms to utilize multiple 
cores via OpenMP to return faster.


Question is, what's the best policy to allocate cores so we can play 
nice with rest of postgres?


What I'd like to see is some function that I can call and get a number 
of threads I'm allowed to run, that will also advise rest of postgres 
to not use them, and a function to return the cores back (or do it 
automatically at the end of query). Is there an infrastructure for that?


I do not completely understand which PostGIS algorithms you are going
to parallelize, so maybe you should clarify that first.
There are currently three options for parallel execution of a single
query in Postgres:


1. Use the existing Postgres parallel capabilities. For example, if there is
some expensive function f() which you are going to execute concurrently,
then you do not need to do anything: parallel seq scan will do it for
you. You can configure an arbitrary number of parallel workers and so
control the level of concurrency.
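As a minimal sketch of this option (the function, the table t and the setting
value are placeholders, not a definitive recipe):

CREATE FUNCTION f(x int) RETURNS int
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    AS 'SELECT x * x';

SET max_parallel_workers_per_gather = 8;   -- cap on workers per Gather node

-- With a big enough table t, the plan should contain Gather + Parallel Seq
-- Scan, so f() is evaluated concurrently in several worker processes.
EXPLAIN (COSTS OFF) SELECT sum(f(i)) FROM t;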
The restrictions of the current Postgres parallel query processing
implementation are that:

- parallel workers are started for each query
- a lot of state has to be serialized and passed from the coordinator to
the parallel workers
- in case of seq scan, workers compete for pages to scan, so the
effective number of workers should be < 10, while the most powerful
modern servers have hundreds of CPU cores.


2. Implement your own parallel plan nodes using the existing Postgres
parallel infrastructure. This approach has the best chance of being
committed to Postgres core, but its disadvantages are mostly the same as
in 1). Exchanging data between different processes is much more complex
and expensive than access to common memory in the case of threads. Most
likely you will have to use the shared message queue and dynamic shared
memory, implemented in Postgres specifically for the interaction of
parallel workers.


3. Use multithreading to provide concurrent execution of your particular
algorithm (spawn threads within the backend). You should be very careful
with this approach, because Postgres code is not thread safe. So you
should not try to execute any subplan in a thread or call any Postgres
functions from it (unless you are 100% sure that they are thread safe).
This approach may be easy to implement and provide better performance
than 1), but please note its limitations. I have used such an approach in
my IMCS extension (In-Memory Columnar Store).
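A minimal sketch of that pattern (not taken from IMCS; the function name and
the work it does are invented for illustration): all interaction with the
backend stays in the main thread, and the worker threads do pure computation
on private memory only.

#include "postgres.h"
#include "fmgr.h"
#include <pthread.h>

PG_MODULE_MAGIC;

typedef struct WorkerArg
{
	int64		start;
	int64		end;
	double		partial;
} WorkerArg;

static void *
worker_main(void *arg)
{
	WorkerArg  *w = (WorkerArg *) arg;
	double		sum = 0.0;
	int64		i;

	/* Pure computation only: no palloc, no elog, no catalog access here. */
	for (i = w->start; i < w->end; i++)
		sum += (double) i * (double) i;
	w->partial = sum;
	return NULL;
}

PG_FUNCTION_INFO_V1(threaded_sum_of_squares);

Datum
threaded_sum_of_squares(PG_FUNCTION_ARGS)
{
	int64		n = PG_GETARG_INT64(0);
	pthread_t	tid[4];
	WorkerArg	args[4];
	double		total = 0.0;
	int64		chunk = (n + 3) / 4;
	int			i;

	for (i = 0; i < 4; i++)
	{
		args[i].start = i * chunk;
		args[i].end = Min((i + 1) * chunk, n);
		args[i].partial = 0.0;
		/* error handling simplified: already started threads are not joined */
		if (pthread_create(&tid[i], NULL, worker_main, &args[i]) != 0)
			elog(ERROR, "could not create worker thread");
	}

	/* elog/ereport and datum construction stay in the backend's main thread */
	for (i = 0; i < 4; i++)
	{
		pthread_join(tid[i], NULL);
		total += args[i].partial;
	}

	PG_RETURN_FLOAT8(total);
}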


You can look at the pg_strom extension as an example of providing parallel
query execution (in this case using the parallel capabilities of video cards).


--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: row filtering for logical replication

2018-11-01 Thread Erik Rijkers

On 2018-11-01 01:29, Euler Taveira wrote:

Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
 escreveu:

The attached patches add support for filtering rows in the publisher.



I ran pgbench over logical replication with a WHERE clause and could not
get it to replicate correctly.  Below is the output of the
attached test program.



$ ./logrep_rowfilter.sh
--
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb 
--pgdata=/tmp/cascade/instance1/data --encoding=UTF8 --pwfile=/tmp/bugs

--
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb 
--pgdata=/tmp/cascade/instance2/data --encoding=UTF8 --pwfile=/tmp/bugs

--
/home/aardvark/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast/initdb 
--pgdata=/tmp/cascade/instance3/data --encoding=UTF8 --pwfile=/tmp/bugs

sleep 3s
dropping old tables...
creating tables...
generating data...
10 of 10 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
create publication pub_6515_to_6516;
alter publication pub_6515_to_6516 add table pgbench_accounts where (aid 
between 4 and 6-1) ; --> where 1

alter publication pub_6515_to_6516 add table pgbench_branches;
alter publication pub_6515_to_6516 add table pgbench_tellers;
alter publication pub_6515_to_6516 add table pgbench_history;
create publication pub_6516_to_6517;
alter publication pub_6516_to_6517 add table pgbench_accounts ; -- where 
(aid between 4 and 6-1) ; --> where 2

alter publication pub_6516_to_6517 add table pgbench_branches;
alter publication pub_6516_to_6517 add table pgbench_tellers;
alter publication pub_6516_to_6517 add table pgbench_history;

create subscription pub_6516_from_6515 connection 'port=6515 
application_name=rowfilter'

   publication pub_6515_to_6516 with(enabled=false);
alter subscription pub_6516_from_6515 enable;
create subscription pub_6517_from_6516 connection 'port=6516 
application_name=rowfilter'

   publication pub_6516_to_6517 with(enabled=false);
alter subscription pub_6517_from_6516 enable;
-- pgbench -p 6515 -c 16 -j 8 -T 5 -n postgres#  scale 1
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 16
number of threads: 8
duration: 5 s
number of transactions actually processed: 80
latency average = 1178.106 ms
tps = 13.581120 (including connections establishing)
tps = 13.597443 (excluding connections establishing)

   accounts  branches   tellers   history
   - - - -
6515   6546b1f0f 2d328ed28 7406473b0 7c1351523e8c07347b
6516   6546b1f0f 2d328ed28 d41d8cd98 d41d8cd98e7235f541
6517   f7c0791c8 d9c63e471 d41d8cd98 d41d8cd9830892eea1   NOK

6515   6546b1f0f 2d328ed28 7406473b0 7c1351523e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5191ae1af3   NOK

6515   6546b1f0f 2d328ed28 7406473b0 7c1351523e8c07347b
6516   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5191ae1af3
6517   6546b1f0f 2d328ed28 7406473b0 5a54cf7c5191ae1af3   NOK

[...]

I let that run for 10 minutes or so, but the pgbench_history md5 values
(of ports 6516 and 6517) do not change anymore, which shows that the
table is, and remains, different from the original pgbench_history table
in 6515.



When there is a WHERE clause this *always* goes wrong.

Without a WHERE clause all logical replication tests were OK.  Perhaps
the error is not in the patch itself but somewhere in logical replication.


Attached is the test program (it will need some tweaking of PATHs,
PG variables (PGPASSFILE), etc).  This is the same program I used in
March when you first posted a version of this patch, although the error
is different.



thanks,


Erik Rijkers





#!/bin/bash

# postgres binary compiled with 
#
# 20181101
#   0001-Remove-unused-atttypmod-column-from-initial-table-sy.patch
#   0002-Store-number-of-tuples-in-WalRcvExecResult.patch
#   0003-Refactor-function-create_estate_for_relation.patch
#   0004-Rename-a-WHERE-node.patch
#   0005-Row-filtering-for-logical-replication.patch
#   0006-Print-publication-WHERE-condition-in-psql.patch
#   0007-Publication-where-condition-support-for-pg_dump.patch
#   0008-Debug-for-row-filtering.patch
#

unset PGDATABASE PGPORT PGSERVICE
export PGDATABASE=postgres

scale=1

root_dir=/tmp/cascade

BIN=$HOME/pg_stuff/pg_installations/pgsql.logrep_rowfilter/bin.fast

export PATH=$BIN:$PATH

  initdb=$BIN/initdb
postgres=$BIN/postgres
  pg_ctl=$BIN/pg_ctl
baseport=6515
 appname=rowfilter

if [[ -d $root_dir/instance1 ]]; then rm -rf $root_dir/instance1; fi
if [[ -d $root_dir/instance2 ]]; then rm -rf $root_dir/instance2; fi
if [[ -d $root_dir/instance3 ]]; then rm -rf $root_dir/instance3; fi
if [[ -d $root_dir/instance1 ]]; then exit ; fi
if [[ -d $root_dir/instance2 ]]; then exit ; fi
if [[ -d $root_dir/instance3 ]]; then exit ; fi

devel_file=/tmp/bugs
echo filterbug > $devel_file

Hooks to Modify Execution Flow and Query Planner

2018-11-01 Thread Vincent Mirian
Hi all,

I would like to create a library with UDFs written in C that implements
different Query Planner tasks (e.g. scan, hash, join, etc...). I am looking
for a document that provides an overview of execution flow within postgres
and the query planner. I am also looking for a description of the software
data structures and interfaces used.

Any references, suggestions or guidance would be appreciated.

Thank you in advance,
-- 
Vincent Mirian


RE: [Proposal] Add accumulated statistics for wait event

2018-11-01 Thread Yotsunaga, Naoki
On Mon, Oct 29, 2018 at 1:52 AM, Phil Florent wrote:

Hi, thank you for comments.

>Yes you will be able to solve bottlenecks with sampling. In interactive mode, 
>a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is 
>normally OK.

With the tool you are using, can you sample at intervals shorter than 1 second?
If so, you can collect enough samples and also catch short
events.

>Since grafana is now able to connect directly to a postgresql source, I use it 
>to display the information collected from pg_stat_activity and psutil ( e.g 
>https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but 
>panels are in english)

Being able to visualize it like that is wonderful,
especially for beginners like me.


>Other DB have accumulated statistics but you can notice that sampling is also 
>their most modern method.
>E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . 
>Then you had "statspack". Those tools were based on accumulated statistics and 
>the reports were based on differences between 2 points. It was useful to solve 
>major problems but it was limited and not precise enough in many cases.

>The preferred feature to identify bottlenecks in the Oracle world is now ASH 
>(active session history). It can help with major problems, specific problems 
>AND it can identify short blockages.
>Too bad it is licensed as an option of their Enterprise Edition but similar 
>tools exist and they are also based on sampling of the activity.

>With the "official" ASH, sampling and archiving are done internally and you 
>have a circular memory zone dedicated to the feature. Hence the overhead is 
>lower but that's all.

>The most advanced interactive tool is called "snapper" and it is also based on 
>sampling.

Thanks. I will check it.

So the current bottleneck-survey method is: from sampling, I can learn the
number (ratio) of wait events,
and then investigate those with a high count (ratio).
Do you agree with this understanding?
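As a minimal sketch of one such sampling pass (repeat it at a short interval,
e.g. every 10-100 ms, and accumulate the counts to see which wait events
dominate):

SELECT wait_event_type, wait_event, count(*) AS sessions
  FROM pg_stat_activity
 WHERE state = 'active' AND wait_event IS NOT NULL
 GROUP BY wait_event_type, wait_event
 ORDER BY sessions DESC;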


---

Naoki, Yotsunaga.


Re: PostgreSQL Limits and lack of documentation about them.

2018-11-01 Thread Andrew Gierth
> "Nasby," == Nasby, Jim  writes:

 >> I did try a table with 1600 text columns then inserted values of
 >> several kB each. Trying with BIGINT columns the row was too large
 >> for the page. I've never really gotten a chance to explore these
 >> limits before, so I guess this is about the time.

 Nasby> Hmm… 18 bytes doesn’t sound right, at least not for the Datum.
 Nasby> Offhand I’d expect it to be the small (1 byte) varlena header +
 Nasby> an OID (4 bytes). Even then I don’t understand how 1600 text
 Nasby> columns would work; the data area of a tuple should be limited
 Nasby> to ~2000 bytes, and 2000/5 = 400.

1600 text columns won't work unless the values are very short or null.

A toast pointer is indeed 18 bytes: 1 byte varlena header flagging it as
a toast pointer, 1 byte type tag, raw size, saved size, toast value oid,
toast table oid.

A tuple can be almost as large as a block; the block/4 threshold is only
the point at which the toaster is run, not a limit on tuple size.

So (with 8k blocks) the limit on the number of non-null external-toasted
columns is about 450, while you can have the full 1600 columns if they
are integers or smaller, or just over 1015 bigints. But you can have
1600 text columns if they average 4 bytes or less (excluding length
byte).

If you push too close to the limit, it may even be possible to overflow
the tuple size by setting fields to null, since the null bitmap is only
present if at least one field is null. So you can have 1010 non-null
bigints, but if you try and do 1009 non-null bigints and one null, it
won't fit (and nor will 999 non-nulls and 11 nulls, if I calculated
right).
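A quick way to probe these numbers empirically (a sketch assuming the default
8 kB block size; the table name and the DO-block approach are just for
illustration):

DO $$
DECLARE
  n    int := 1010;   -- number of non-null bigint columns to try
  cols text;
  vals text;
BEGIN
  SELECT string_agg(format('c%s bigint', i), ', '),
         string_agg('1', ', ')
    INTO cols, vals
    FROM generate_series(1, n) AS i;
  EXECUTE 'DROP TABLE IF EXISTS width_probe';
  EXECUTE format('CREATE TABLE width_probe (%s)', cols);
  -- succeeds at n = 1010; pushing n past the limit fails with "row is too big"
  EXECUTE format('INSERT INTO width_probe VALUES (%s)', vals);
END
$$;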

(Note also that dropped columns DO count against the 1600 limit, and
also that they are (for new row versions) set to null and thus force the
null bitmap to be present.)

-- 
Andrew (irc:RhodiumToad)



Re: row filtering for logical replication

2018-11-01 Thread Erik Rijkers

On 2018-11-01 08:56, Erik Rijkers wrote:

On 2018-11-01 01:29, Euler Taveira wrote:

Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
 escreveu:

The attached patches add support for filtering rows in the publisher.



I ran pgbench-over-logical-replication with a WHERE-clause and could
not get this to do a correct replication.  Below is the output of the
attached test program.


$ ./logrep_rowfilter.sh


I have noticed that the failure to replicate correctly can be avoided by 
putting a wait state of (on my machine) at least 3 seconds between the 
setting up of the subscription and the start of pgbench.  See the bash 
program I attached in my previous mail.  The bug can be avoided by a 
'sleep 5' just before the start of the actual pgbench run.
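In the script that amounts to something like this (exact placement inside
logrep_rowfilter.sh assumed):

# give the subscriptions time to finish the initial sync before generating load
sleep 5
pgbench -p 6515 -c 16 -j 8 -T 5 -n postgres   # scale 1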


So it seems this bug is due to some timing error in your patch (or 
possibly in logical replication itself).



Erik Rijkers





Re: Hooks to Modify Execution Flow and Query Planner

2018-11-01 Thread Amit Langote
Hi,

On 2018/11/01 16:58, Vincent Mirian wrote:
> Hi all,
> 
> I would like to create a library with UDFs written in C that implements
> different Query Planner tasks (e.g. scan, hash, join, etc...). I am looking
> for a document that provides an overview of execution flow within postgres
> and the query planner. I am also looking for a description of the software
> data structures and interfaces used.

Maybe, the following chapter in Postgres documentation will help as a start:

https://www.postgresql.org/docs/11/static/overview.html

For studying internal data structures and interfaces, you can also read
the comments in the source code and the README files that describe
various data structures and interfaces, which is an often-recommended
method.

Specifically, if you just want to inject alternative plan nodes for the
individual scan, hash, join operators needed to compute a query, but want
the Postgres query planner to take care of the whole planning itself, you
might consider looking into the Custom Scan Provider facility:

https://www.postgresql.org/docs/current/static/custom-scan.html

With it, you can write C code that gets invoked at certain points during
planning and execution, where you can add your special/custom node to a
plan and do execution related tasks on those nodes, respectively.  With
this approach, Postgres planner and executor take care of most of the
details of planning and execution, whereas your code implements the
specialized logic you developed for, say, scanning a disk file, joining
two or more tables, building a hash table from the data read from a table,
etc.

You can alternatively formulate your code as a foreign data wrapper if all
you want do is model a non-Postgres data source as regular Postgres tables.

https://www.postgresql.org/docs/11/static/fdwhandler.html


If you don't intend to add new plan nodes or define a new type of foreign
table, but want to alter the planning or execution itself (or parts
thereof), you may want to look at various planner and executor hooks.  For
example, if you want to replace the whole planner, which takes a parsed
query (the Query struct) and returns a plan (the PlannedStmt struct whose
internals you'll need to figure out if your alternative planning code can
produce a valid one), you can use the following hook:

/* Hook for plugins to get control in planner() */
typedef PlannedStmt *(*planner_hook_type) (Query *parse,
   int cursorOptions,
   ParamListInfo boundParams);

But that may be too complex a hook to implement on your own, so you can
look at more granular hooks which allow certain points within the
planning, such as:

/* Hook for plugins to get control in set_rel_pathlist() */
typedef void (*set_rel_pathlist_hook_type) (PlannerInfo *root,
RelOptInfo *rel,
Index rti,
RangeTblEntry *rte);

/* Hook for plugins to get control in add_paths_to_joinrel() */
typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
 RelOptInfo *joinrel,
 RelOptInfo *outerrel,
 RelOptInfo *innerrel,
 JoinType jointype,
 JoinPathExtraData *extra);

/* Hook for plugins to replace standard_join_search() */
typedef RelOptInfo *(*join_search_hook_type) (PlannerInfo *root,
  int levels_needed,
  List *initial_rels);

/* Hook for plugins to get control in get_relation_info() */
typedef void (*get_relation_info_hook_type) (PlannerInfo *root,
 Oid relationObjectId,
 bool inhparent,
 RelOptInfo *rel);

On the executor side, you got:

/* Hook for plugins to get control in ExecutorStart() */
typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);

/* Hook for plugins to get control in ExecutorRun() */
typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
   ScanDirection direction,
   uint64 count,
   bool execute_once);

/* Hook for plugins to get control in ExecutorFinish() */
typedef void (*ExecutorFinish_hook_type) (QueryDesc *queryDesc);

/* Hook for plugins to get control in ExecutorEnd() */
typedef void (*ExecutorEnd_hook_type) (QueryDesc *queryDesc);

/* Hook for plugins to get control in ExecCheckRTPerms() */
typedef bool (*ExecutorCheckPerms_hook_type) (List *, bool);
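As a minimal sketch of how an extension installs one of these hooks (module
and function names are invented for illustration; this mirrors the pattern
used by contrib modules such as auto_explain: save the previous hook value
and chain to it):

#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* custom logic goes here; this just logs the query text */
	elog(LOG, "about to execute: %s", queryDesc->sourceText);

	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);
}

void
_PG_init(void)
{
	prev_ExecutorStart = ExecutorStart_hook;
	ExecutorStart_hook = my_ExecutorStart;
}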


If you can be more specific about what exactly you're trying to do,
someone can give even 

Re: Ordered Partitioned Table Scans

2018-11-01 Thread Antonin Houska
David Rowley  wrote:

> On 1 November 2018 at 04:01, Antonin Houska  wrote:
> > * As for the logic, I found generate_mergeappend_paths() to be the most
> >   interesting part:
> >
> > Imagine table partitioned by "i", so "partition_pathkeys" is {i}.
> >
> > partition 1:
> >
> > i | j
> > --+--
> > 0 | 0
> > 1 | 1
> > 0 | 1
> > 1 | 0
> >
> > partition 2:
> >
> > i | j
> > --+--
> > 3 | 0
> > 2 | 0
> > 2 | 1
> > 3 | 1
> >
> > Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", 
> > the
> > ordering of the subpaths should not change the way tuples are split into
> > partitions.
> >
> > ...
>
> I understand what you're saying. I just don't understand what you
> think is wrong with the patch in this area.

I think these conditions are too restrictive:

/*
 * Determine if these pathkeys match the partition order, or reverse
 * partition order.  It can't match both, so only go to the trouble of
 * checking the reverse order when it's not in ascending partition
 * order.
 */
partition_order = pathkeys_contained_in(pathkeys,
partition_pathkeys);
partition_order_desc = !partition_order &&
pathkeys_contained_in(pathkeys,

partition_pathkeys_desc);


In the example above I wanted to show that your new feature should work even
if "pathkeys" is not contained in "partition_pathkeys".

> > Another problem I see is that build_partition_pathkeys() continues even if 
> > it
> > fails to create a pathkey for some partitioning column. In the example above
> > it would mean that the table can have "partition_pathkeys" equal to {j}
> > instead of {i, j} on some EC-related conditions. However such a key does not
> > correspond to reality - this is easier to imagine if another partition is
> > considered.
> >
> > partition 3:
> >
> >  i | j | k
> > ---+---+---
> >  1 | 0 | 1
> >  1 | 0 | 0
> >
> > So I think no "partition_pathkeys" should be generated in that case. On the
> > other hand, if the function returned the part of the list it could construct
> > so far, it'd be wrong because such incomplete pathkeys could pass the 
> > checks I
> > proposed above for reasons unrelated to the partitioning scheme.
> 
> Oops. That's a mistake. We should return what we have so far if we
> can't make one of the pathkeys. Will fix.

I think no "partition_pathkeys" should be created in this case, but before we
can discuss this in detail there needs to be an agreement on the evaluation of
the relationship between "pathkeys" and "partition_pathkeys", see above.

> > The following comments are mostly on coding:
> >
> > * Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() 
> > have
> >   this sentence in the header comment:
> >
> > Note: If changing this, see build_partition_pathkeys()
> >
> > However I could not find other use besides that in
> > RelationBuildPartitionDesc().
> 
> While the new code does not call those directly, the new code does
> depend on the sort order of the partitions inside the PartitionDesc,
> which those functions are responsible for. Perhaps there's a better
> way to communicate that.

I pointed this out because I suspect that changes of these functions would
affect more features, not only the one you're trying to implement.

> > * If pathkeys is passed, shouldn't create_append_path() include the
> >   cost_sort() of subpaths just like create_merge_append_path() does?  And if
> >   so, then create_append_path() and create_merge_append_path() might
> >   eventually have some common code (at least for the subpath processing) to 
> > be
> >   put into a separate function.
> 
> It does. It's just done via the call to cost_append().

ok, I missed that.

> > * Likewise, create_append_plan() / create_merge_append_plan() are going to 
> > be
> >   more similar than before, so some refactoring could also make sense.
> >
> > Although it's not too much code, I admit the patch is not trivial, so I'm
> > curious about your opinion.
> 
> I think the costing code is sufficiently different to warrant not
> sharing more. For example, the startup costing is completely
> different. Append can start on the startup cost of the first subpath,
> but MergeAppend takes the sum of the startup cost of all subpaths.

ok

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: Is there way to detect uncommitted 'new table' in pg_class?

2018-11-01 Thread Hubert Zhang
Thanks

On Thu, Nov 1, 2018 at 8:38 AM Michael Paquier  wrote:

> On Wed, Oct 31, 2018 at 01:30:52PM -0400, Robert Haas wrote:
> > In theory, at least, you could write C code to scan the catalog tables
> > with SnapshotDirty to find the catalog entries, but I don't think that
> > helps a whole lot.  You couldn't necessarily rely on those catalog
> > entries to be in a consistent state, and even if they were, they might
> > depend on committed types or functions or similar whose definitions
> > your backend can't see.  Moreover, the creating backend will have an
> > AccessExclusiveLock on the table -- if you write C code to bypass that
> > and read the data anyway, then you will probably destabilize the
> > entire system for complicated reasons that I don't feel like
> > explaining right now.
>
> One take here is that we cannot give any guarantee that a single DDL
> will create only one consistent version of the tuple added in system
> catalogs.  In those cases a new version is made visible by using
> CommandCounterIncrement() so as the follow-up processing can see it.
>
> > You should try very hard to find some way of solving this problem that
> > doesn't require reading data from a table that hasn't been committed
> > yet, because you are almost certainly not going to be able to make
> > that work reliably even if you are willing to write code in C.
>
> +1.
> --
> Michael
>


-- 
Thanks

Hubert Zhang


Re: Speeding up INSERTs and UPDATEs to partitioned tables

2018-11-01 Thread Amit Langote
On 2018/11/01 10:30, David Rowley wrote:
> It's great to know the patch is now so perfect that we've only the
> macro naming left to debate ;-)

I looked over v12 again and noticed a couple minor issues.

+ *  table then we store the index into parenting
+ *  PartitionTupleRouting 'partition_dispatch_info' array.  An

s/PartitionTupleRouting/PartitionTupleRouting's/g

Also, I got a bit concerned about "parenting".  Does it mean something
like "enclosing", because the PartitionDispatch is a member of
PartitionTupleRouting?  I got concerned because using "parent" like this
may be confusing as this is the partitioning code we're talking about,
where "parent" is generally used to mean "parent" table.

+ * the partitioned table that's the target of the command.  If we must
+ * route tuple via some sub-partitioned table, then the PartitionDispatch
+ * for those is only built the first time it's required.

... via some sub-partitioned table"s"

Or perhaps rewrite a bit as:

If we must route the tuple via some sub-partitioned table, then its
PartitionDispatch is built the first time it's required.


The macro naming discussion got me thinking today about the macro itself.
It encapsulates access to the various PartitionTupleRouting arrays
containing the maps, but maybe we've got the interface of tuple routing a
bit (maybe a lot given this thread!) wrong to begin with.  Instead of
ExecFindPartition returning indexes into various PartitionTupleRouting
arrays and its callers then using those indexes to fetch various objects
from those arrays, why doesn't it return those objects itself?  Although
we made sure that the callers don't need to worry about the meaning of
these indexes changing with this patch, it still seems a bit odd for them
to have to go back to those arrays to get various objects.

How about we change ExecFindPartition's interface so that it returns the
ResultRelInfo, the two maps, and the partition slot?  So, the arrays
simply become a cache for ExecFindPartition et al and are no longer
accessed outside execPartition.c.  Although that makes the interface of
ExecFindPartition longer, I think it reduces overall complexity.
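Roughly, the shape I have in mind is the following (the exact parameter list
and names here are only a sketch, not necessarily what the attached patch
ends up with):

extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
										ResultRelInfo *rootResultRelInfo,
										PartitionTupleRouting *proute,
										TupleTableSlot *slot,
										EState *estate,
										TupleConversionMap **parent_to_child_map,
										TupleConversionMap **child_to_parent_map,
										TupleTableSlot **partition_slot);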

I've implemented that in the attached patch
1-revise-ExecFindPartition-interface.patch.

Also, since all members of PartitionTupleRouting are only accessed within
execPartition.c save root_tuple_slot, we can move it to execPartition.c to
make its internals private, after doing something about root_tuple_slot.
Looking at the code related to root_tuple_slot, it seems the field really
belongs in ModifyTableState, because it has nothing to do with routing.
Attached 2-make-PartitionTupleRouting-private.patch does that.

These patches 1 and 2 apply on top of v12-0001.. patch.

Thanks,
Amit

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0b0696e61e..b45972682f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2316,6 +2316,7 @@ CopyFrom(CopyState cstate)
bool   *nulls;
ResultRelInfo *resultRelInfo;
ResultRelInfo *target_resultRelInfo;
+   ResultRelInfo *prev_part_rel = NULL;
EState *estate = CreateExecutorState(); /* for ExecConstraints() */
ModifyTableState *mtstate;
ExprContext *econtext;
@@ -2331,7 +2332,6 @@ CopyFrom(CopyState cstate)
CopyInsertMethod insertMethod;
uint64  processed = 0;
int nBufferedTuples = 0;
-   int prev_leaf_part_index = -1;
boolhas_before_insert_row_trig;
boolhas_instead_insert_row_trig;
boolleafpart_use_multi_insert = false;
@@ -2685,19 +2685,24 @@ CopyFrom(CopyState cstate)
/* Determine the partition to heap_insert the tuple into */
if (proute)
{
-   int leaf_part_index;
-   TupleConversionMap *map;
+   TupleTableSlot *partition_slot = NULL;
+   TupleConversionMap *child_to_parent_map,
+  *parent_to_child_map;
 
/*
 * Attempt to find a partition suitable for this tuple.
 * ExecFindPartition() will raise an error if none can 
be found.
+* This replaces the original target ResultRelInfo with
+* partition's.
 */
-   leaf_part_index = ExecFindPartition(mtstate, 
target_resultRelInfo,
-   
proute, slot, estate);
-   Assert(leaf_part_index >= 0 &&
-  leaf_part_index < proute->num_partitions);
+   resultRelInfo = ExecFindPartition(mtstate, 
target_resultRelInfo,
+

Re: Super PathKeys (Allowing sort order through precision loss functions)

2018-11-01 Thread Tomas Vondra

On 11/01/2018 02:40 AM, David Rowley wrote:

On 1 November 2018 at 12:24, Andres Freund  wrote:

FWIW, I kind of wonder if we built proper infrastructure to allow to
make such inferrences from function calls, whether it could also be made
to support the transformation of LIKEs into indexable <= >= clauses.


Perhaps, but I doubt it would be the same function to do both.  Surely
I need something that accepts details about the function call as
arguments and returns an Expr * of the argument that we can derive the
order of the return value from, or NULL.  I think the transformation
you need might be more along the lines of returning a List * of quals
that can substitute an OpExpr containing a function call. I'm not that
clear on how we'd know the new quals were better than the existing
ones.  For example extract('year', dt)  = 2018 could be transformed to
dt >= '2018-01-01' AND dt < '2019-01-01', but how would we know that
was better. There might be an index on extract('year', dt).



IMHO there is only a handful of "useful" transformations of this kind, 
depending on the operator class of an index. So when building index 
paths and checking which quals may be used as index conditions, we'd try
transforming incompatible quals and leave the rest up to the existing
existing create_index_path machinery (which already makes the decision 
about which quals to use for index search etc.)
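Written out, the example transformation discussed above looks like this
(events and dt are placeholder names; both predicates select the same rows,
but only the second can be used as a btree index condition on dt):

SELECT * FROM events WHERE extract(year FROM dt) = 2018;

SELECT * FROM events
 WHERE dt >= '2018-01-01' AND dt < '2019-01-01';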


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] generated columns

2018-11-01 Thread John Naylor
On 10/30/18, Peter Eisentraut  wrote:
> 3. Radical alternative: Collapse everything into one new column.  We
> could combine atthasdef and attgenerated and even attidentity into a new
> column.  (Only one of the three can be the case.)  This would give
> client code a clean break, which may or may not be good.  The
> implementation would be uglier than #1 but probably cleaner than #2.  We
> could also get 4 bytes back per pg_attribute row.

Thinking about the distinction between 'stored' and 'virtual', I
immediately thought of atthasmissing. In a way it indicates that there
is a virtual default value. It seems the level of materialization is
an orthogonal concept to the kind of value we have. What if
attgenerated had

d = normal default value
i = identity default
a = identity always
c = generated column

and in addition there was an attmaterialized column with

s = stored
v = virtual

In this scheme,
-Normal attribute: '\0' + 's'
-Default value: 'd' + 's'
-Fast default: 'd' + 'v' until the table is rewritten
-Identity column: 'i'/'a' + 's'
-Generated column: 'c' + 's'/'v'

This way, we'd still end up with 1 fewer column (2 new ones minus
atthasdef, attidentity, and atthasmissing).

A bit crazier, what if "d = dropped" was another allowed value in
attmaterialized -- we could then get rid of attisdropped as well. That
has obvious disadvantages, but the broader idea is that this design
may have use cases we haven't thought of yet.

Thoughts?

-John Naylor



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-01 Thread Amit Kapila
On Thu, Nov 1, 2018 at 6:14 AM Michael Paquier  wrote:
>
> On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote:
> > There's no rule whatsoever that a parallel worker can't write to the
> > disk.  pg_start_backup and pg_stop_backup have to be
> > parallel-restricted because, when used in non-exclusive mode, they
> > establish backend-local state that wouldn't be synchronized with the
> > state in the workers -- namely the information that a non-exclusive
> > backup is in progress.
>
> Okay, but likely we would not want to signal the postmaster
> unnecessarily, no?

Right, but I think the question here is whether it is safe to execute
this function in parallel workers.  I don't see any meaningful use
cases where anyone would want to run this via parallel workers even if it
is safe to execute via them, but I think that is not how we decide the
parallel-safety property of a function.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: FETCH FIRST clause WITH TIES option

2018-11-01 Thread Tomas Vondra

On 10/31/2018 06:17 PM, Robert Haas wrote:

On Mon, Oct 29, 2018 at 12:48 PM Andrew Gierth
 wrote:

Then FETCH FIRST N WITH TIES becomes "stop when the expression
   rank() over (order by ...) <= N
is no longer true" (where the ... is the existing top level order by)


Wouldn't that be wicked expensive compared to something hard-coded
that does the same thing?



Not sure, but that was one of my concerns too, particularly for the 
simple FETCH FIRST N WITH TIES case. But I think Andrew has a point that
it would make it much easier to implement the PERCENT case.
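For reference, the rank()-based formulation looks roughly like this (a sketch
against a hypothetical employees(name, salary) table): WITH TIES for the top
3 rows means keeping rows while rank() over the same ORDER BY is <= 3.

SELECT name, salary
  FROM (SELECT name, salary,
               rank() OVER (ORDER BY salary DESC) AS rnk
          FROM employees) AS ranked
 WHERE rnk <= 3
 ORDER BY salary DESC;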


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Connection slots reserved for replication

2018-11-01 Thread Alexander Kukushkin
Hi,

Attached is the patch rebased onto the current HEAD; I have also created a commitfest entry.
On Fri, 21 Sep 2018 at 13:43, Alexander Kukushkin  wrote:
>
> Hi,
>
> On 20 September 2018 at 08:18, Kyotaro HORIGUCHI
>  wrote:
>
> >
> > Instead, we can literally "reserve" connection slots for the
> > specific use by providing ProcGlobal->walsenderFreeProcs. The
> > slots are never stolen for other usage. Allowing additional
> > walsenders comes after all the slots are filled to grab an
> > available "normal" slot; it works the same as the current
> > behavior when walsender_reserved_connections = 0.
> >
> > What do you think about this?
>
> Sounds reasonable, please see the updated patch.
>
> Regards,
> --
> Alexander Kukushkin
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7554cba3f9..d9ddcb22b7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3060,6 +3060,27 @@ include_dir 'conf.d'

   
 
+ 
+  replication_reserved_connections
+  (integer)
+  
+   replication_reserved_connections configuration parameter
+  
+  
+  
+   
+Determines the number of connection slots that
+are reserved for replication connections.
+   
+
+   
+The default value is zero. The value should not exceed max_wal_senders.
+This parameter can only be set at server start.
+   
+  
+ 
+
   
max_replication_slots (integer)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2683385ca6..3b6c636077 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -122,6 +122,8 @@ int			max_wal_senders = 0;	/* the maximum number of concurrent
 int			wal_sender_timeout = 60 * 1000; /* maximum time to send one WAL
 			 * data message */
 bool		log_replication_commands = false;
+int			replication_reserved_connections = 0; /* the number of connection slots
+   * reserved for replication connections */
 
 /*
  * State for WalSndWakeupRequest
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6f9aaa52fa..2d04a8204a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -43,6 +43,7 @@
 #include "postmaster/autovacuum.h"
 #include "replication/slot.h"
 #include "replication/syncrep.h"
+#include "replication/walsender.h"
 #include "storage/condition_variable.h"
 #include "storage/standby.h"
 #include "storage/ipc.h"
@@ -180,6 +181,7 @@ InitProcGlobal(void)
 	ProcGlobal->freeProcs = NULL;
 	ProcGlobal->autovacFreeProcs = NULL;
 	ProcGlobal->bgworkerFreeProcs = NULL;
+	ProcGlobal->walsenderFreeProcs = NULL;
 	ProcGlobal->startupProc = NULL;
 	ProcGlobal->startupProcPid = 0;
 	ProcGlobal->startupBufferPinWaitBufId = -1;
@@ -253,13 +255,20 @@ InitProcGlobal(void)
 			ProcGlobal->autovacFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->autovacFreeProcs;
 		}
-		else if (i < MaxBackends)
+		else if (i < MaxBackends - replication_reserved_connections)
 		{
 			/* PGPROC for bgworker, add to bgworkerFreeProcs list */
 			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->bgworkerFreeProcs;
 			ProcGlobal->bgworkerFreeProcs = &procs[i];
 			procs[i].procgloballist = &ProcGlobal->bgworkerFreeProcs;
 		}
+		else if (i < MaxBackends)
+		{
+			/* PGPROC for walsender, add to walsenderFreeProcs list */
+			procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
+			ProcGlobal->walsenderFreeProcs = &procs[i];
+			procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
+		}
 
 		/* Initialize myProcLocks[] shared memory queues. */
 		for (j = 0; j < NUM_LOCK_PARTITIONS; j++)
@@ -304,6 +313,8 @@ InitProcess(void)
 		procgloballist = &ProcGlobal->autovacFreeProcs;
 	else if (IsBackgroundWorker)
 		procgloballist = &ProcGlobal->bgworkerFreeProcs;
+	else if (am_walsender)
+		procgloballist = &ProcGlobal->walsenderFreeProcs;
 	else
 		procgloballist = &ProcGlobal->freeProcs;
 
@@ -318,6 +329,14 @@ InitProcess(void)
 
 	set_spins_per_delay(ProcGlobal->spins_per_delay);
 
+	/*
+	* Try to use ProcGlobal->freeProcs as a fallback when
+	* all reserved walsender slots are already busy.
+	*/
+	if (am_walsender && replication_reserved_connections < max_wal_senders
+			&& *procgloballist == NULL)
+		procgloballist = &ProcGlobal->freeProcs;
+
 	MyProc = *procgloballist;
 
 	if (MyProc != NULL)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4f1d2a0d28..bb90f53b1e 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -527,7 +527,7 @@ InitializeMaxBackends(void)
 
 	/* the extra unit accounts for the autovacuum launcher */
 	MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
-		max_worker_processes;
+		max_worker_processes + replication_reserved_connections;
 
 	/* internal error because the values were all checked previously */
 	if (MaxBackends > MAX_BACKENDS)

Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-01 Thread Amit Kapila
On Sun, Oct 21, 2018 at 7:12 PM Michael Paquier  wrote:
>
> Hi all,
>
> This is a follow-up of the following thread:
> https://www.postgresql.org/message-id/20181012010411.re53cwcistcpi...@alap3.anarazel.de
>
> In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and
> the buildfarm has immediately complained about files generated by
> EXEC_BACKEND missing, causing pg_verify_checksums to fail for such
> builds.  An extra issue has been noticed by Andres in the tool: it
> misses to check for temporary files, hence files like
> base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the
> tool to fail.  After a crash, those files would not be removed, and
> stopping the instance would still let them around.
>
> pg_verify_checksums used first a blacklist to decide if files have
> checksums or not.  So with this approach all files should have checksums
> except files like pg_control, pg_filenode.map, PG_VERSION, etc.
>
> d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
> builds.  After discussion, Andres has pointed out that some extensions
> may want to drop files within global/ or base/ as part of their logic.
> cstore_fdw was mentioned but if you look at their README that's not the
> case.  However, I think that Andres argument is pretty good as with
> pluggable storage we should allow extensions to put custom files for
> different tablespaces.  So this commit has changed the logic of
> pg_verify_checksums to use a whitelist, which assumes that only normal
> relation files have checksums:
> <digits>
> <digits>.<segment>
> <digits>_<forkname>
> <digits>_<forkname>.<segment>
> After more discussion on the thread mentioned above, Stephen has pointed
> out that base backups use the same blacklist logic when verifying
> checksums.  This has the same problem with EXEC_BACKEND-specific files,
> resulting in spurious warnings which are confusing for the user (here is
> an example):
> $ pg_basebackup -D popo
> WARNING:  cannot verify checksum in file "./global/config_exec_params",
> block 0: read buffer size 5 and page size 8192 differ
>
> Folks on this thread agreed that both pg_verify_checksums and checksum
> verification for base backups should use the same logic.  It seems to me
> that using a whitelist-based approach for both is easier to maintain as
> the patterns of files supporting checksums is more limited than files
> which do not support checksums.  So this way we allow extensions
> bundling custom files to still work with pg_verify_checksums and
> checksum verification in base backups.
>

This sounds like a good argument for having a whitelist approach, but
is it really a big problem if a user gets a warning for files that the
utility is not able to verify checksums for?  I think in some sense
this message can be useful to the user, as it lets him know
which files the utility did not check for some form of
corruption.  I guess one can say that users might not be interested in
this information, in which case such a check could be optional, as you
seem to be suggesting in the following paragraph.

> Something else which has been discussed on this thread is that we should
> have a dedicated API to decide if a file has checksums or not, and if it
> should be skipped in both cases.  That's work only for HEAD though, so
> we need to do something for HEAD and v11, which is simple.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: FETCH FIRST clause WITH TIES option

2018-11-01 Thread Surafel Temesgen
hi,

The attached patch includes all the comments given by Tomas. I checked the SQL
standard regarding LIMIT and this feature;

it does not say anything about it, but I think it is a good idea to include it
for LIMIT too, and I will add it if we have consensus on that.
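With the patch applied, the feature would be used roughly like this (employees
is just a placeholder table; the ORDER BY clause is required for WITH TIES):

SELECT name, salary
  FROM employees
 ORDER BY salary DESC
 FETCH FIRST 3 ROWS WITH TIES;

This returns the three highest salaries plus any further rows tied with the
third one.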

regards

surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..b649b1ca7a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1408,6 +1408,9 @@ FETCH { FIRST | NEXT } [ count ] {
 If count is
 omitted in a FETCH clause, it defaults to 1.
 ROW
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index bb28cf7d1d..dbc060e9a9 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
 
 	/*
 	 * Initialize result slot and type. (XXX not actually used, but upper
@@ -387,6 +438,22

Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2018-11-01 Thread Dilip Kumar
On Mon, Oct 29, 2018 at 2:53 PM Amit Langote
 wrote:
>
> Thank you for creating the patch.
>
> On 2018/10/28 20:35, Dilip Kumar wrote:
> > On Sat, Oct 27, 2018 at 10:07 AM Dilip Kumar  wrote:
> >> On Fri, Oct 26, 2018 at 1:12 PM Amit Langote wrote:
> >>> On 2018/10/25 19:54, Dilip Kumar wrote:
>  Basically, if the relation is RELOPT_OTHER_MEMBER_REL, we can
>  recursively fetch its parent until we reach to the base relation
>  (which is actually named in the query). And, once we have the base
>  relation we can check ACL on that and set vardata->acl_ok accordingly.
>  Additionally, for getting the parent RTI we need to traverse
>  "root->append_rel_list". Another alternative could be that we can add
>  parent_rti member in RelOptInfo structure.
> >>>
> >>> Adding parent_rti would be a better idea [1].  I think that traversing
> >>> append_rel_list every time would be inefficient.
> >>>
> >>> [1] I've named it inh_root_parent in one of the patches I'm working on
> >>> where I needed such a field (https://commitfest.postgresql.org/20/1778/)
> >>>
> >> Ok, Make sense. I have written a patch by adding this variable.
> >> There is still one FIXME in the patch, basically, after getting the
> >> baserel rte I need to convert child varattno to parent varattno
> >> because in case of inheritance that can be different.  Do we already
> >> have any mapping from child attno to parent attno or we have to look
> >> up the cache.
>
> Sorry I forgot to cc you, but I'd posted a patch on the "speeding up
> planning with partitions" thread, that's extracted from the bigger patch,
> which adds inh_root_parent member to RelOptInfo [1].  Find it attached
> with this email.
>
> One of the differences from your patch is that it makes inh_root_parent
> work not just for partitioning, but to inheritance in general.  Also, it
> codes the changes to build_simple_rel to set inh_root_parent's value a bit
> differently than your patch.
>
> > Attached patch handles this issue.
>
> I noticed a typo in your patch:
>
> transalate_varattno -> translate_varattno
>
> +static int
> +transalate_varattno(Oid oldrelid, Oid newrelid, int old_attno)
> +{
> +   Relationoldrelation = heap_open(oldrelid, NoLock);
>
> It doesn't seem nice that it performs a heap_open on the parent relation.
>
> +   att = TupleDescAttr(old_tupdesc, old_attno - 1);
> +   attname = NameStr(att->attname);
> +
> +   newtup = SearchSysCacheAttName(newrelid, attname);
> +   if (!newtup)
> +   {
> +   heap_close(oldrelation, NoLock);
> +   return InvalidAttrNumber;
> +   }
>
> and
>
> +   varattno = transalate_varattno(relid, rte->relid, var->varattno);
> +   if (AttributeNumberIsValid(varattno))
> +   relid = rte->relid;
> +   else
> +   varattno = var->varattno;
>
> It's not possible for varattno to be invalid here, because the query on
> inheritance parent only allows to select parent's columns, so we'd error
> out before getting here if a column not present in the parent were selected.

Make sense to me.
>
>
> Anyway, why don't we just use the child table's AppendRelInfo to get the
> parent's version of varattno instead of creating a new function?  It can
> be done as shown in the attached revised version of the portion of the
> patch changing selfuncs.c.  Please take a look.

+1

>
> [1]
> https://www.postgresql.org/message-id/f06a398a-40a9-efb4-fab9-784400fecf13%40lab.ntt.co.jp



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Antonin Houska
Review of [1] made me think of this optimization, currently used only in
create_merge_append_path():

/*
 * Apply query-wide LIMIT if known and path is for sole base relation.
 * (Handling this at this low level is a bit klugy.)
 */
if (bms_equal(rel->relids, root->all_baserels))
pathnode->limit_tuples = root->limit_tuples;
else
pathnode->limit_tuples = -1.0;

Currently it's not a problem because the output of a MergeAppend plan is not
likely to be re-sorted, but I don't think data correctness should depend on
cost evaluation. Instead, -1 should be set here if there's any chance that the
output will be sorted again.

I tried to reproduce the issue by applying the "Incremental sort" [2] patch
and running the following example:

CREATE TABLE t(i int, j int);
CREATE TABLE t1() INHERITS (t);
CREATE INDEX ON t1(i, j);
INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
CREATE TABLE t2() INHERITS (t);
CREATE INDEX ON t2(i, j);
INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);

ANALYZE;

SELECT * FROM t ORDER BY i, j DESC LIMIT 1;

I expected the MergeAppend plan to apply the limit and thus prevent the
incremental sort node from receiving the first tuple that it should emit;
however, the query yielded a correct result. I think the reason is that the
MergeAppendPath.limit_tuples field is only used for cost estimates, but not
enforced during execution. Is that intended?

I thought this could be a better approach to the limit push-down:

if (root->limit_tuples > 0 && root->parse->sortClause == NIL &&
bms_equal(rel->relids, root->all_baserels))
pathnode->limit_tuples = root->limit_tuples;
else
pathnode->limit_tuples = -1.0;

however it will stop working as soon as the incremental sort patch (which can
be used below the upper planner) gets applied.

Finally, I think we should be able to apply the limit to a generic path, not
only to MergeAppendPath. I just don't know how to check when it's correct. Does
anyone have an idea?

[1] https://commitfest.postgresql.org/20/1850/

[2] https://commitfest.postgresql.org/20/1124/

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: PG vs macOS Mojave

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 04:17, Tom Lane  wrote:

> and on Xcode 10.0 I get

Odd.  I don’t actually get -isysroot with XCode 10.0 on my 10.13.6 installation;
on 10.12 with XCode 8.3.3, however, I do get -isysroot.

> Right now I think the only plausible
> fix is to go back to adding "-isysroot $PG_SYSROOT" to CPPFLAGS.

+1. That does seem like the safe option.

cheers ./daniel


bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
Hi

The processing of named parameters inside a CALL statement is not correct.

It is my code :-/ I am sorry.

Attached patch fix it.

This still doesn't fix INOUT variables with a default value, so it is not fully
correct, but at this moment it shows where the core of this issue is.

Regards

Pavel
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..781963e32c 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2171,7 +2171,6 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2210,45 +2209,60 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named args. Before we check the INOUT parameters and prepare the
+			 * target variables, we have to reorder the arguments first.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   **args;
+int		i = 0;
+int		nargs = list_length(funcexpr->args);
+
+args = palloc0(sizeof(Node *) * FUNC_MAX_ARGS);
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
+	Node	   *n = lfirst(lc);
+
+	if (IsA(n, NamedArgExpr))
 	{
-		Param	   *param = castNode(Param, n);
+		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
+
+		Assert(nexpr->argnumber < nargs);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
+		args[nexpr->argnumber] = (Node *) nexpr->arg;
 	}
-	else if (IsA(n, NamedArgExpr))
+	else
+		args[i++] = n;
+}
+
+for (i = 0; i < nargs; i++)
+{
+	Node	   *n = args[i];
+
+	Assert(n != NULL);
+
+	if (argmodes[i] == PROARGMODE_INOUT)
 	{
-		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
+		if (IsA(n, Param))
+		{
+			Param	   *param = castNode(Param, n);
+
+			/* paramid is offset by 1 (see make_datum_param()) */
 
-		if (!IsA(nexpr->arg, Param))
+			row->varnos[nfields++] = param->paramid - 1;
+		}
+		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
 	 errmsg("argument %d is an output argument but is not writable", i + 1)));
-
-		param = castNode(Param, nexpr->arg);
-
-		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
-		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
 	}
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("argument %d is an output argument but is not writable", i + 1)));
 }
-i++;
+
+pfree(args);
 			}
 
 			row->nfields = nfields;


Commitfest 2018-11

2018-11-01 Thread Daniel Gustafsson
We are now in November and the 2018-11 commitfest has started (or rather, will
once someone pushes the button), but AFAICT there hasn’t been anyone volunteering
to be CFM this time apart from Dmitry Dolgov, who showed an interest (in
CA+q6zcWPs5aFUQxZ5qi=wRhbesUSRGD3_r4HSgQKq+G6=ux...@mail.gmail.com) when the
2018-09 CF started.

Dmitry: Are you still interested in running this commitfest?

If not, I volunteer to do it for 2018-11.

If someone has already volunteered and I completely missed that then sorry for
the noise.

cheers ./daniel


Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
čt 1. 11. 2018 v 13:00 odesílatel Pavel Stehule 
napsal:

> Hi
>
> The processing of named parameters inside CALL statement is not correct.
>
> It is my code :-/. I am sorry
>
> Attached patch fix it.
>
> This still doesn't fix INOUT variables with default value - so is not
> fully correct, but in this moment, it can show, where is a core of this
> issue.
>

This version disallows INOUT variables with default values from plpgsql.



> Regards
>
> Pavel
>
>
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..21c8c398e0 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2171,7 +2171,6 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2210,45 +2209,62 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named args. Before we check the INOUT parameters and prepare
+			 * the target variables, we have to reorder the arguments first.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   **args;
+int		i = 0;
+int		nargs = list_length(funcexpr->args);
+
+args = palloc0(sizeof(Node *) * FUNC_MAX_ARGS);
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
-	{
-		Param	   *param = castNode(Param, n);
+	Node	   *n = lfirst(lc);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
-	}
-	else if (IsA(n, NamedArgExpr))
+	if (IsA(n, NamedArgExpr))
 	{
 		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
 
-		if (!IsA(nexpr->arg, Param))
+		args[nexpr->argnumber] = (Node *) nexpr->arg;
+	}
+	else
+		args[i++] = n;
+}
+
+for (i = 0; i < nargs; i++)
+{
+	Node	   *n = args[i];
+
+	if (argmodes[i] == PROARGMODE_INOUT)
+	{
+		if (!n)
+		{
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("argument %d is an output argument but is not writable", i + 1)));
+	 errmsg("argument %d is an output argument but is not passed", i + 1)));
+		}
+		else if (IsA(n, Param))
+		{
+			Param	   *param = castNode(Param, n);
 
-		param = castNode(Param, nexpr->arg);
+			/* paramid is offset by 1 (see make_datum_param()) */
 
-		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
-		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
+			row->varnos[nfields++] = param->paramid - 1;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_SYNTAX_ERROR),
+	 errmsg("argument %d is an output argument but is not writable", i + 1)));
 	}
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("argument %d is an output argument but is not writable", i + 1)));
 }
-i++;
+
+pfree(args);
 			}
 
 			row->nfields = nfields;


Re: Commitfest 2018-11

2018-11-01 Thread Dmitry Dolgov
> On Thu, 1 Nov 2018 at 13:05, Daniel Gustafsson  wrote:
>
> Dmitry: Are you still interested in running this commitfest?

Yes, I'm still interested. Do I need to have any special permissions in CF app
for that (e.g. now I can't "push the button" to start the current one)?



Re: Commitfest 2018-11

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 13:19, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> 
>> On Thu, 1 Nov 2018 at 13:05, Daniel Gustafsson  wrote:
>> 
>> Dmitry: Are you still interested in running this commitfest?
> 
> Yes, I'm still interested.

Great!

> Do I need to have any special permissions in CF app
> for that (e.g. now I can't "push the button" to start the current one)?

Yes, I believe either Magnus or Stephen can set the correct permissions (or at
least point to whomever can).

cheers ./daniel


Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
čt 1. 11. 2018 v 13:10 odesílatel Pavel Stehule 
napsal:

>
>
> čt 1. 11. 2018 v 13:00 odesílatel Pavel Stehule 
> napsal:
>
>> Hi
>>
>> The processing of named parameters inside CALL statement is not correct.
>>
>> It is my code :-/. I am sorry
>>
>> Attached patch fix it.
>>
>> This still doesn't fix INOUT variables with default value - so is not
>> fully correct, but in this moment, it can show, where is a core of this
>> issue.
>>
>
> This version disallows INOUT default variables from plpgsql
>

Probably the correct solution is to save the transformed arguments after
expand_function_arguments and iterate over the transformed argument list,
not over the original argument list.
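
For illustration, a rough sketch of that idea (assuming the surrounding
exec_stmt_call() context, i.e. funcexpr, tuple and argmodes are still in
scope and the syscache tuple has not been released yet; this is not a
tested patch):

/*
 * Let the core machinery resolve named and defaulted arguments first,
 * so that the i-th element of the resulting list lines up with
 * argmodes[i].
 */
List	   *expanded_args;
ListCell   *lc;
int			i = 0;

expanded_args = expand_function_arguments(funcexpr->args,
										  funcexpr->funcresulttype,
										  tuple);

foreach(lc, expanded_args)
{
	Node	   *n = lfirst(lc);

	if (argmodes && argmodes[i] == PROARGMODE_INOUT)
	{
		/* same Param handling and error reporting as today */
	}
	i++;
}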


>
>
>> Regards
>>
>> Pavel
>>
>>


Re: Commitfest 2018-11

2018-11-01 Thread Stephen Frost
Greetings,

* Daniel Gustafsson (dan...@yesql.se) wrote:
> > On 1 Nov 2018, at 13:19, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > Do I need to have any special permissions in CF app
> > for that (e.g. now I can't "push the button" to start the current one)?
> 
> Yes, I believe either Magnus or Stephen can set the correct permissions (or at
> least point to whomever can).

I've added you to the 'cf admins' group, so please give it a shot now
and let me know if you run into any troubles.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: INSTALL file

2018-11-01 Thread Andreas 'ads' Scherbaum

On 01.11.18 07:26, Michael Paquier wrote:

On Thu, Nov 01, 2018 at 01:32:09AM +0100, Andreas 'ads' Scherbaum wrote:

Picking up on this idea, attached is a first draft for changing the
README.

Why don't you add it to the upcoming commit fest?  It would be good to
get some traction with a formal review.


I plan to do that, once I have gathered some feedback here.



It includes links to the website, as well as the short version of the
installation instructions.

+The installation instructions are listed on the website:
+
+https://www.postgresql.org/docs/current/static/install-short.html
+
+Short version:
+
+./configure
+make
+su
+make install
+adduser postgres
+mkdir /usr/local/pgsql/data
+chown postgres /usr/local/pgsql/data
+su - postgres
+/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
+/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start

Adding a section about installation and another one about documentation
are good things.  Now for the installation section I disagree about
adding this detailed way of doing things, and just adding a URL looks
enough.

Was thinking about this, but then decided to add it as an example,
and see what people think.




Pointing to the global installation recommendations would be a better
fit also as a lot of things are platform-dependent.  So this URL looks
better:
https://www.postgresql.org/docs/current/static/installation.html

Now there is also a problem, the README would point out to the
development version of the documentation.  As this is made at working
using git, I could personally live with having stable branches also
refer to the development version, but it could also make sense to have
each stable branch point to the URL of the versions they work on.


That is a bit problematic. The README lives on git first, and therefore
should point to the development version. The release process
can replace this with links to the current version.


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project




Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-01 Thread Pavel Stehule
Cleaned patch with regress tests
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index 547ca22a55..8762e1335c 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -276,3 +276,43 @@ DROP PROCEDURE test_proc1;
 DROP PROCEDURE test_proc3;
 DROP PROCEDURE test_proc4;
 DROP TABLE test1;
+CREATE TABLE test_call_proc(key serial, name text);
+CREATE OR REPLACE PROCEDURE p1(v_cnt int, v_ResultSet inout refcursor = NULL)
+AS $$
+BEGIN
+  INSERT INTO test_call_proc(name) VALUES('name test');
+  OPEN v_ResultSet FOR SELECT * FROM test_call_proc;
+END
+$$ LANGUAGE plpgsql;
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(v_cnt:=v_cnt, v_ResultSet := v_ResultSet);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(10, v_ResultSet := v_ResultSet);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DO $$
+DECLARE
+  v_ResultSet refcursor;
+  v_cnt   integer;
+BEGIN
+  CALL p1(v_ResultSet := v_ResultSet, v_cnt:=v_cnt);
+  RAISE NOTICE '%', v_ResultSet;
+END;
+$$;
+NOTICE:  
+DROP PROCEDURE p1;
+DROP TABLE test_call_proc;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 45526383f2..8589c62ce1 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2168,10 +2168,10 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 		 */
 		if (!stmt->target)
 		{
+			Form_pg_proc funcform;
 			Node	   *node;
 			ListCell   *lc;
 			FuncExpr   *funcexpr;
-			int			i;
 			HeapTuple	tuple;
 			Oid		   *argtypes;
 			char	  **argnames;
@@ -2179,6 +2179,7 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			MemoryContext oldcontext;
 			PLpgSQL_row *row;
 			int			nfields;
+			int			pronargs;
 
 			/*
 			 * Get the original CallStmt
@@ -2196,6 +2197,8 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			if (!HeapTupleIsValid(tuple))
 elog(ERROR, "cache lookup failed for function %u", funcexpr->funcid);
 			get_func_arg_info(tuple, &argtypes, &argnames, &argmodes);
+			funcform = (Form_pg_proc) GETSTRUCT(tuple);
+			pronargs = funcform->pronargs;
 			ReleaseSysCache(tuple);
 
 			/*
@@ -2210,45 +2213,80 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 			row->varnos = palloc(sizeof(int) * FUNC_MAX_ARGS);
 
 			nfields = 0;
-			i = 0;
-			foreach(lc, funcexpr->args)
+
+			/*
+			 * The argmodes can be in a different order than funcexpr->args due
+			 * to named args. Before we check the INOUT parameters and prepare
+			 * the target variables, we have to reorder the arguments first.
+			 * This is similar to the code in reorder_function_arguments. In
+			 * this part default values are not necessary.
+			 */
+			if (argmodes)
 			{
-Node	   *n = lfirst(lc);
+Node	   *argarray[FUNC_MAX_ARGS];
+int		nargsprovided = list_length(funcexpr->args);
+int		i = 0;
+
+MemSet(argarray, 0, pronargs * sizeof(Node *));
 
-if (argmodes && argmodes[i] == PROARGMODE_INOUT)
+foreach(lc, funcexpr->args)
 {
-	if (IsA(n, Param))
-	{
-		Param	   *param = castNode(Param, n);
+	Node	   *n = lfirst(lc);
 
-		/* paramid is offset by 1 (see make_datum_param()) */
-		row->varnos[nfields++] = param->paramid - 1;
-	}
-	else if (IsA(n, NamedArgExpr))
+	if (IsA(n, NamedArgExpr))
 	{
 		NamedArgExpr *nexpr = castNode(NamedArgExpr, n);
-		Param	   *param;
+		argarray[nexpr->argnumber] = (Node *) nexpr->arg;
+	}
+	else
+		argarray[i++] = n;
+}
 
-		if (!IsA(nexpr->arg, Param))
-			ereport(ERROR,
-	(errcode(ERRCODE_SYNTAX_ERROR),
-	 errmsg("argument %d is an output argument but is not writable", i + 1)));
+Assert(nargsprovided <= pronargs);
 
-		param = castNode(Param, nexpr->arg);
+for (i = 0; i < pronargs; i++)
+{
+	Node	   *n = argarray[i];
 
+	if (argmodes[i] == PROARGMODE_INOUT)
+	{
 		/*
-		 * Named arguments must be after positional arguments,
-		 * so we can increase nfields.
+		 * Empty positions correspond to default values. An INOUT default
+		 * is allowed only if no other parameter follows it.
 		 */
-		row->varnos[nexpr->argnumber] = param->paramid - 1;
-		nfields++;
+		if (!n)
+		{
+			int		j;
+			bool	found = false;
+
+			for (j = i + 1; j < pronargs; j++)
+if (argarray[j])
+{
+	found = true;
+	break;
+}
+
+			if (found)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("argument %i with default values is output argument but it is not writeable", i + 1)));
+
+			/* there is no other parameter after this one */
+			break;
+		}
+		else if (IsA(n, Param))
+		{
+			Param	   *param = castNode(P

CF app feature request

2018-11-01 Thread Andrew Dunstan



Yesterday Fabien and I submitted the same item to the Commitfest (1859 
and 1860). Unfortunately there doesn't seem to be any way for one of 
these to be withdrawn. "Rejected" and "Returned with Feedback" seem 
wrong. Ideally, there would be a way for someone who submits an item in 
error to withdraw it, at which point it should be as if it had never 
been submitted.


Meanwhile, what's the advice about how to deal with these?


cheers


andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Commitfest 2018-11

2018-11-01 Thread Dmitry Dolgov
> On Thu, 1 Nov 2018 at 14:11, Stephen Frost  wrote:
> I've added you to the 'cf admins' group

Thanks.

> so please give it a shot now and let me know if you run into any troubles.

Hm...I don't see any difference in CF app, what should be changed?



Re: zheap: a new storage format for PostgreSQL

2018-11-01 Thread Tomas Vondra

On 11/01/2018 07:43 AM, Amit Kapila wrote:


You can find the latest code at https://github.com/EnterpriseDB/zheap



Seems valgrind complains about a couple of places in the code - nothing 
major, might be noise, but probably worth a look.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
==7569== Conditional jump or move depends on uninitialised value(s)
==7569==at 0x5914C8: ZHeapDetermineModifiedColumns (zheapam.c:)
==7569==by 0x587453: zheap_update (zheapam.c:2176)
==7569==by 0x7577D3: ExecUpdate (nodeModifyTable.c:1426)
==7569==by 0x759C0B: ExecModifyTable (nodeModifyTable.c:2629)
==7569==by 0x729687: ExecProcNodeFirst (execProcnode.c:445)
==7569==by 0x71D8E9: ExecProcNode (executor.h:241)
==7569==by 0x720194: ExecutePlan (execMain.c:1711)
==7569==by 0x71DF11: standard_ExecutorRun (execMain.c:367)
==7569==by 0x71DD3F: ExecutorRun (execMain.c:310)
==7569==by 0x9113CA: ProcessQuery (pquery.c:161)
==7569==by 0x912CE8: PortalRunMulti (pquery.c:1286)
==7569==by 0x9122C1: PortalRun (pquery.c:799)
==7569==by 0x90C1C2: exec_simple_query (postgres.c:1215)
==7569==by 0x9104D9: PostgresMain (postgres.c:4244)
==7569==by 0x86A911: BackendRun (postmaster.c:4388)
==7569==by 0x86A0DF: BackendStartup (postmaster.c:4079)
==7569==by 0x8665F7: ServerLoop (postmaster.c:1711)
==7569==by 0x865EA3: PostmasterMain (postmaster.c:1384)
==7569==by 0x78E3AB: main (main.c:228)
==7569==  Uninitialised value was created by a stack allocation
==7569==at 0x59043D: znocachegetattr (zheapam.c:6206)
==7569== 
{
   
   Memcheck:Cond
   fun:ZHeapDetermineModifiedColumns
   fun:zheap_update
   fun:ExecUpdate
   fun:ExecModifyTable
   fun:ExecProcNodeFirst
   fun:ExecProcNode
   fun:ExecutePlan
   fun:standard_ExecutorRun
   fun:ExecutorRun
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==8811== Conditional jump or move depends on uninitialised value(s)
==8811==at 0x5914C8: ZHeapDetermineModifiedColumns (zheapam.c:)
==8811==by 0x587453: zheap_update (zheapam.c:2176)
==8811==by 0x7577D3: ExecUpdate (nodeModifyTable.c:1426)
==8811==by 0x759C0B: ExecModifyTable (nodeModifyTable.c:2629)
==8811==by 0x729687: ExecProcNodeFirst (execProcnode.c:445)
==8811==by 0x71D8E9: ExecProcNode (executor.h:241)
==8811==by 0x720194: ExecutePlan (execMain.c:1711)
==8811==by 0x71DF11: standard_ExecutorRun (execMain.c:367)
==8811==by 0x71DD3F: ExecutorRun (execMain.c:310)
==8811==by 0x9113CA: ProcessQuery (pquery.c:161)
==8811==by 0x912CE8: PortalRunMulti (pquery.c:1286)
==8811==by 0x9122C1: PortalRun (pquery.c:799)
==8811==by 0x90C1C2: exec_simple_query (postgres.c:1215)
==8811==by 0x9104D9: PostgresMain (postgres.c:4244)
==8811==by 0x86A911: BackendRun (postmaster.c:4388)
==8811==by 0x86A0DF: BackendStartup (postmaster.c:4079)
==8811==by 0x8665F7: ServerLoop (postmaster.c:1711)
==8811==by 0x865EA3: PostmasterMain (postmaster.c:1384)
==8811==by 0x78E3AB: main (main.c:228)
==8811==  Uninitialised value was created by a stack allocation
==8811==at 0x59043D: znocachegetattr (zheapam.c:6206)
==8811== 
{
   
   Memcheck:Cond
   fun:ZHeapDetermineModifiedColumns
   fun:zheap_update
   fun:ExecUpdate
   fun:ExecModifyTable
   fun:ExecProcNodeFirst
   fun:ExecProcNode
   fun:ExecutePlan
   fun:standard_ExecutorRun
   fun:ExecutorRun
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==8811== Conditional jump or move depends on uninitialised value(s)
==8811==at 0x574A76: InsertUndoRecord (undorecord.c:135)
==8811==by 0x56F405: InsertPreparedUndo (undoinsert.c:840)
==8811==by 0x5848E0: zheap_insert (zheapam.c:896)
==8811==by 0x755F21: ExecInsert (nodeModifyTable.c:692)
==8811==by 0x759BB2: ExecModifyTable (nodeModifyTable.c:2622)
==8811==by 0x729687: ExecProcNodeFirst (execProcnode.c:445)
==8811==by 0x71D8E9: ExecProcNode (executor.h:241)
==8811==by 0x720194: ExecutePlan (execMain.c:1711)
==8811==by 0x71DF11: standard_ExecutorRun (execMain.c:367)
==8811==by 0x71DD3F: ExecutorRun (execMain.c:310)
==8811==by 0x9113CA: ProcessQuery (pquery.c:161)
==8811==by 0x912CE8: PortalRunMulti (pquery.c:1286)
==8811==by 0x9122C1: PortalRun (pquery.c:799)
==8811==by 0x90C1C2: exec_simple_query (postgres.c:1215)
==8811==by 0x9104D9: PostgresMain (postgres.c:4244)
==8811==by 0x86A911: BackendRun (postmaster.c:4388)
==8811==by 0x86A0DF: BackendStartup (postmaster.c:4079)
==8811==by 0x8665F7: ServerLoop (postmaster.c:1711)
==8811==by 0x865EA3: PostmasterMain (postmaster.c

Re: Commitfest 2018-11

2018-11-01 Thread Stephen Frost
Greetings,

* Dmitry Dolgov (9erthali...@gmail.com) wrote:
> > On Thu, 1 Nov 2018 at 14:11, Stephen Frost  wrote:
> > I've added you to the 'cf admins' group
> 
> Thanks.
> 
> > so please give it a shot now and let me know if you run into any troubles.
> 
> Hm...I don't see any difference in CF app, what should be changed?

Hmm...  Can you try it again?  If it still doesn't show up, please let
me know what your community account is that you're logging in with..

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Commitfest 2018-11

2018-11-01 Thread Dmitry Dolgov
> On Thu, 1 Nov 2018 at 15:11, Stephen Frost  wrote:
> Hmm...  Can you try it again?

Yep, now I see the administration menu, thanks.



Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
Daniel Gustafsson  writes:
> Odd.  I don’t actually get -isysroot on XCode 10.0 on my 10.13.6 installation,
> on 10.12 with XCode 8.3.3 I do however get -isysroot.

Wow ... could it be that it actually varies depending on the combination
of compiler and OS versions?  That would be weird.

regards, tom lane



Re: PG vs macOS Mojave

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 15:14, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Odd.  I don’t actually get -isysroot on XCode 10.0 on my 10.13.6 
>> installation,
>> on 10.12 with XCode 8.3.3 I do however get -isysroot.
> 
> Wow ... could it be that it actually varies depending on the combination
> of compiler and OS versions?  That would be weird.

Or the version of XCode and the set of installed SDKs?  I only have a single
SDK installed on both of these systems, if you have multiple ones on the 10.0
installation that might explain something.  Or not.

cheers ./daniel


Re: POC: Cleaning up orphaned files using undo logs

2018-11-01 Thread Robert Haas
While I've been involved in the design discussions for this patch set,
I haven't looked at any of the code personally in a very long time.  I
certainly don't claim to be an independent reviewer, and I encourage
others to review this work also.  That said, here are some review
comments.

I decided to start with 0005, as that has the user-facing
documentation for this feature. There is a spurious whitespace-only
hunk in monitoring.sgml.

+ Process ID of the backend currently attached to this undo log
+  for writing.

or NULL/0/something if none?

+   each undo log that exists.  Undo logs are extents within a contiguous
+   addressing space that have their own head and tail pointers.

This sentence seems to me to have so little detail that it's not going
to help anyone, and it also seems somewhat out-of-place here.  I think
it would be better to link to the longer explanation in the new
storage section instead.

+   Each backend that has written undo data is associated with one or more undo

extra space

+
+Undo logs hold data that is used for rolling back and for implementing
+MVCC in access managers that are undo-aware (currently "zheap").  The storage
+format of undo logs is optimized for reusing existing files.
+

I think the mention of zheap should be removed here since the hope is
that the undo stuff can be committed independently of and prior to
zheap.

I think you mean access methods, not access managers.  I suggest
making that an xref.

Maybe add a little more detail, e.g.

Undo logs provide a place for access methods to store data that can be
used to perform necessary cleanup operations after a transaction
abort.  The data will be retained after a transaction abort until the
access method successfully performs the required cleanup operations.
After a transaction commit, undo data will be retained until the
transaction is all-visible.  This makes it possible for access
methods to use undo data to implement MVCC.  Since in most cases undo
data is discarded very quickly, the undo system has been optimized to
minimize writes to disk and to reuse existing files efficiently.

+
+Undo data exists in a 64 bit address space broken up into numbered undo logs
+that represent 1TB extents, for efficient management.  The space is further
+broken up into 1MB segment files, for physical storage.  The name of each file
+is the address of of the first byte in the file, with a period inserted after
+the part that indicates the undo log number.
+

I cannot read this section and know what an undo filename is going to
look like.  Also, the remarks about efficient management seem like they
might be unclear to someone not already familiar with how this works.
Maybe something like:

Undo data exists in a 64-bit address space divided into 2^34 undo
logs, each with a theoretical capacity of 1TB.  The first time a
backend writes undo, it attaches to an existing undo log whose
capacity is not yet exhausted and which is not currently being used by
any other backend; or if no suitable undo log already exists, it
creates a new one.  To avoid wasting space, each undo log is further
divided into 1MB segment files, so that segments which are no longer
needed can be removed (possibly recycling the underlying file by
renaming it) and segments which are not yet needed do not need to be
physically created on disk.  An undo segment file has a name like
, where  is the undo log number and  is the
segment number.

I think it's good to spell out the part about attaching to undo logs
here, because when people look at pg_undo, the number of files will be
roughly proportional to the number of backends, and we should try to
help them understand - at least in general terms - why that happens.

+
+Just as relations can have one of the three persistence levels permanent,
+unlogged or temporary, the undo data that is generated by modifying them must
+be stored in an undo log of the same persistence level.  This enables the
+undo data to be discarded at appropriate times along with the relations that
+reference it.
+

This is not quite general, because we're not necessarily talking about
modifications to the files.  In fact, in this POC, we're explicitly
talking about the cleanup of the files themselves.  Also, it's not
technically correct to say that the persistence level has to match.
You could put everything in permanent undo logs.  It would just suck.

Moving on to 0003, the developer documentation:

+The undo log subsystem provides a way to store data that is needed for
+a limited time.  Undo data is generated whenever zheap relations are
+modified, but it is only useful until (1) the generating transaction
+is committed or rolled back and (2) there is no snapshot that might
+need it for MVCC purposes.  See src/backend/access/zheap/README for
+more information on zheap.  The undo log subsystem is concerned with

Again, I think this should be rewritten to make it independent of
zheap.  We hope that this facility is not only usable by but 

Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 1 Nov 2018, at 15:14, Tom Lane  wrote:
>> Wow ... could it be that it actually varies depending on the combination
>> of compiler and OS versions?  That would be weird.

> Or the version of XCode and the set of installed SDKs?  I only have a single
> SDK installed on both of these systems, if you have multiple ones on the 10.0
> installation that might explain something.  Or not.

Nope, I just have the one:

$ ls -l 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/
total 0
drwxr-xr-x  7 root  wheel  224 Oct 31 11:22 MacOSX.sdk/
lrwxr-xr-x  1 root  wheel   10 Sep 19 19:28 MacOSX10.14.sdk@ -> MacOSX.sdk

regards, tom lane



Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-01 Thread Robert Haas
On Wed, Mar 7, 2018 at 10:23 AM Nikolay Shaplov  wrote:
> > I see you lost the Oxford comma:
> >
> > -DETAIL:  Valid values are "on", "off", and "auto".
> > +DETAIL:  Valid values are "auto", "on" and "off".
> >
> > Please put these back.
> Actually that's me who have lost it. The code with  oxford comma would be a
> bit more complicated. We should put such coma when we have 3+ items and do not
> put it when we have 2.
>
> Does it worth it?

In my opinion, it is worth it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PG vs macOS Mojave

2018-11-01 Thread Daniel Gustafsson
> On 1 Nov 2018, at 15:53, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 1 Nov 2018, at 15:14, Tom Lane  wrote:
>>> Wow ... could it be that it actually varies depending on the combination
>>> of compiler and OS versions?  That would be weird.
> 
>> Or the version of XCode and the set of installed SDKs?  I only have a single
>> SDK installed on both of these systems, if you have multiple ones on the 10.0
>> installation that might explain something.  Or not.
> 
> Nope, I just have the one:

Then I’m inclined to say that it probably depends on the combination of OS
version, XCode version and potentially the SDK version.  Either way, passing
the -isysroot explicitly as in your suggestion should still work unless I’m
missing something.

cheers ./daniel


Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-01 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 7, 2018 at 10:23 AM Nikolay Shaplov  wrote:
>>> I see you lost the Oxford comma:
>>> 
>>> -DETAIL:  Valid values are "on", "off", and "auto".
>>> +DETAIL:  Valid values are "auto", "on" and "off".
>>> 
>>> Please put these back.

>> Actually that's me who have lost it. The code with  oxford comma would be a
>> bit more complicated. We should put such coma when we have 3+ items and do 
>> not
>> put it when we have 2.
>> 
>> Does it worth it?

> In my opinion, it is worth it.

Uh ... I've not been paying attention to this thread, but this exchange
seems to be about somebody constructing a message like that piece-by-piece
in code.  This has got to be a complete failure from the translatability
standpoint.  See

https://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

regards, tom lane



Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-01 Thread Nikolay Shaplov
In a message of 1 November 2018 11:10:14, Tom Lane wrote:

> >>> I see you lost the Oxford comma:
> >>> 
> >>> -DETAIL:  Valid values are "on", "off", and "auto".
> >>> +DETAIL:  Valid values are "auto", "on" and "off".
> >>> 
> >>> Please put these back.
> >> 
> >> Actually that's me who have lost it. The code with  oxford comma would be
> >> a
> >> bit more complicated. We should put such coma when we have 3+ items and
> >> do not put it when we have 2.
> >> 
> >> Does it worth it?
> > 
> > In my opinion, it is worth it.
> 
> Uh ... I've not been paying attention to this thread, but this exchange
> seems to be about somebody constructing a message like that piece-by-piece
> in code.  This has got to be a complete failure from the translatability
> standpoint.  See
> 
> https://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELI
> NES

It's a very good reason...

In this case the only solution I can see is 

DETAIL:  Valid values are: "value1", "value2", "value3".

Where the list '"value1", "value2", "value3"' is built at runtime but has no
bindings to any specific language. And the rest of the message is
'DETAIL:  Valid values are: %s' which can be properly translated.
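
For illustration, a minimal sketch of how the list part could be built
(optname, values and nvalues are made-up names for whatever the reloption
code has at hand; only the outer format strings go through the translation
machinery):

StringInfoData buf;
int			i;

initStringInfo(&buf);
for (i = 0; i < nvalues; i++)
	appendStringInfo(&buf, "%s\"%s\"", (i > 0) ? ", " : "", values[i]);

ereport(ERROR,
		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
		 errmsg("invalid value for enum option \"%s\"", optname),
		 /* translators see only this format string, not the built list */
		 errdetail("Valid values are: %s.", buf.data)));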


-- 
Do code for fun.

signature.asc
Description: This is a digitally signed message part.


Re: New vacuum option to do only freezing

2018-11-01 Thread Bossart, Nathan
Hi,

On 10/1/18, 5:23 AM, "Masahiko Sawada"  wrote:
> Attached patch adds a new option FREEZE_ONLY to VACUUM command. This
> option is same as FREEZE option except for it disables reclaiming dead
> tuples. That is, with this option vacuum does pruning HOT chain,
> freezing live tuples and maintaining both visibility map and freespace
> map but does not collect dead tuples and invoke neither heap vacuum
> nor index vacuum. This option will be useful if user wants to prevent
> XID wraparound a table as quick as possible, especially when table is
> quite large and is about to XID wraparound. I think this usecase was
> mentioned in some threads but I couldn't find them.

I've thought about this a bit myself.  One of the reasons VACUUM can
take so long is because of all the index scans needed.  If you're in a
potential XID wraparound situation and just need a quick way out, it
would be nice to have a way to do the minimum amount of work necessary
to reclaim transaction IDs.  At a high level, I think there are some
improvements to this design we should consider.

1. Create a separate FREEZE command instead of adding a new VACUUM
   option

The first line of the VACUUM documentation reads, "VACUUM reclaims
storage occupied by dead tuples," which is something that we would
explicitly not be doing with FREEZE_ONLY.  I think it makes sense to
reuse many of the VACUUM code paths to implement this feature, but
from a user perspective, it should be separate.

2. We should reclaim transaction IDs from dead tuples as well

Unless we also have a way to freeze XMAX like we do XMIN, I doubt this
feature will be useful for the imminent-XID-wraparound use-case.  In
short, we won't be able to advance relfrozenxid and relminmxid beyond
the oldest XMAX value for the relation.  IIUC the idea of freezing
XMAX doesn't really exist yet.  Either the XMAX is aborted/invalid and
can be reset to InvalidTransactionId, or it is committed and the tuple
can be removed if it beyond the freezing threshold.  So, we probably
also want to look into adding a way to freeze XMAX, either by setting
it to FrozenTransactionId or by setting the hint bits to
(HEAP_XMAX_COMMITTED | HEAP_XMIN_INVALID) as is done for XMIN.

Looking closer, I see that the phrase "freezing XMAX" is currently
used to refer to setting it to InvalidTransactionId if it is aborted
or invalid (e.g. lock-only).

> Currently this patch just adds the new option to VACUUM command but it
> might be good to make autovacuum use it when emergency vacuum is
> required.

This also seems like a valid use-case, but it should definitely be
done as a separate effort after this feature has been committed.

> This is a performance-test result for FREEZE option and FREEZE_ONLY
> option. I've tested them on the table which is about 3.8GB table
> without indexes and randomly modified.
>
> * FREEZE
> ...
> Time: 50301.262 ms (00:50.301)
>
> * FREEZE_ONLY
> ...
> Time: 44589.794 ms (00:44.590)

I'd be curious to see the improvements you get when there are several
indexes on the relation.  The ability to skip the index scans is
likely how this feature will really help speed things up.

Nathan



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-01 Thread Robert Haas
On Wed, Oct 31, 2018 at 8:43 PM Michael Paquier  wrote:
> Okay, but likely we would not want to signal the postmaster
> unnecessarily, no?  FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if
> promotion is triggered more than once, but that does not like a sane
> thing to do if not necessary.

Uh, what in the world does that have to do with the topic at hand?
Parallel-safety has nothing to do with preventing functions from being
called "unnecessarily" or "more than once".

I don't like the fact that Tom keeps arguing for marking things
parallel-unsafe without any real justification.  During the
development of parallel query, he argued that no such concept should
exist, because everything should work in parallel mode just like it
does otherwise.  I eventually convinced him (or so it seemed) that we
had to have the concept because it was impractical to eliminate all
the corner cases where things were parallel-unsafe in the short/medium
term.  However, in the long term, the fact that we have
parallel-restricted and parallel-unsafe is an implementation
restriction that we should be trying to eliminate.  Making the list of
parallel-unsafe things longer makes that goal harder to achieve,
especially when the marking is applied not out of any real
justification based on what the code does, as was my intent, but based
on an ill-defined feeling of unease.

Eventually, after some future round of infrastructure improvements,
we're going to end up having discussions about changing things that
are marked parallel-unsafe or parallel-restricted to parallel-safe,
and when somebody can't find any reason why the existing markings are
like that in the first place, they're going to use that as
justification for not changing them ("there must've been a reason!").
Parallel-safety shouldn't be some kind of quasi-religious thing where
markings bleed from a function that is properly marked to one with a
similar name, or irrational values are applied in the name of
preventing unspecified or only-vaguely-connected evils.  It should be
based on things that can be determined scientifically, like "hey, does
this code access any global variables other than those which are
automatically synchronized between the master and the workers?" and
"hey, does this code ever attempt
heap_insert/heap_update/heap_delete?" and "hey, does this ever call
other user-defined code that might itself be parallel-unsafe?".

The right way to figure out the appropriate parallel-safety marking
for a function is to read the code and see what it does.  Now you
might miss something, as with any code-reading exercise, but if you
don't, then you should come up with the right answer.  In the case of
pg_promote, it calls out to the following functions:

RecoveryInProgress - checks shared memory state.  equally available to
a worker as to the leader.
AllocateFile/FreeFile - no problem here; these work fine in workers
just like any other backend.
kill, unlink - same thing
ResetLatch, WaitLatch, CHECK_FOR_INTERRUPTS() - all fine in a worker;
in fact, used as part of the worker machinery
ereport - works anywhere, worker machinery propagates errors back to leader

That's all.  There are no obvious references to global variables or
anything here.  So, it looks to be parallel-safe.

If you do the same analysis for pg_start_backup(), you'll immediately
notice that it calls get_backup_status(), and if you look at that
function, you'll see that it returns the value of a global variable.
If you check parallel.c, you'll find that that global variable is not
one of the special ones that gets synchronized between a leader and
the workers.  Therefore, if you ran this function in a worker, it
might do the wrong thing, because it might get the wrong value of that
global variable.  So it's at least (and in fact exactly)
parallel-restricted.  It doesn't need to be parallel-restricted
because it's scary in some ill-defined way or any of these other
reasons advanced on this thread -- it needs to be parallel-restricted
because it *relies on a global variable that is not synchronized to
parallel workers*.  If somebody writes code to move that state out of
a global variables and into the main shared memory segment or to a DSM
shared between the leader and the workers, then it can be marked
parallel-safe (unless, for example, it also depends on OTHER global
variables that are NOT moved into shared storage).  Otherwise not.
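
To make that concrete, here is a contrived sketch of the pattern (the
function and the variable are invented, not actual backend code):

#include "postgres.h"
#include "fmgr.h"

/* backend-local state; a parallel worker gets its own, unsynchronized copy */
static bool my_backup_in_progress = false;

PG_FUNCTION_INFO_V1(my_backup_status);

Datum
my_backup_status(PG_FUNCTION_ARGS)
{
	/*
	 * Run in a worker, this would return the worker's (default) value
	 * rather than the leader's, so the pg_proc entry for this function
	 * must be marked parallel-restricted, not parallel-safe.
	 */
	PG_RETURN_BOOL(my_backup_in_progress);
}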

I hope I'm making sense here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: New vacuum option to do only freezing

2018-11-01 Thread Robert Haas
On Mon, Oct 1, 2018 at 6:23 AM Masahiko Sawada  wrote:
> Attached patch adds a new option FREEZE_ONLY to VACUUM command. This
> option is same as FREEZE option except for it disables reclaiming dead
> tuples. That is, with this option vacuum does pruning HOT chain,
> freezing live tuples and maintaining both visibility map and freespace
> map but does not collect dead tuples and invoke neither heap vacuum
> nor index vacuum. This option will be useful if user wants to prevent
> XID wraparound a table as quick as possible, especially when table is
> quite large and is about to XID wraparound. I think this usecase was
> mentioned in some threads but I couldn't find them.

The feature seems like a reasonable one, but the name doesn't seem
like a good choice.  It doesn't only freeze - e.g. it HOT-prunes, as
it must.  Maybe consider something like (without_index_cleanup true)
or (index_cleanup false).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel threads in query

2018-11-01 Thread Paul Ramsey
On Wed, Oct 31, 2018 at 2:11 PM Tom Lane  wrote:

> =?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= 
> writes:
> > Question is, what's the best policy to allocate cores so we can play nice
> > with rest of postgres?
>


> There is not, because we do not use or support multiple threads inside
> a Postgres backend, and have no intention of doing so any time soon.
>

As a practical matter though, if we're multi-threading  a heavy PostGIS
function, presumably simply grabbing *every* core is not a recommended or
friendly practice. My finger-in-the-wind guess would be that the value
of max_parallel_workers_per_gather would be the most reasonable value to
use to limit the number of cores a parallel PostGIS function should use.
Does that make sense?
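
The naive version of that would be something like the following (just a
sketch; it assumes the GUC's backing C variable
max_parallel_workers_per_gather is importable from the optimizer headers,
and says nothing about whether that is actually the right budget):

#include "postgres.h"
#include "optimizer/cost.h"		/* max_parallel_workers_per_gather */

#include <omp.h>

/* Clamp the OpenMP team size to the planner's per-gather worker budget. */
static void
limit_threads_to_gather_budget(void)
{
	omp_set_num_threads(Max(1, max_parallel_workers_per_gather));
}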

P


Re: Parallel threads in query

2018-11-01 Thread Andres Freund
On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane  wrote:
> 
> > =?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= 
> > writes:
> > > Question is, what's the best policy to allocate cores so we can play nice
> > > with rest of postgres?
> >
> 
> 
> > There is not, because we do not use or support multiple threads inside
> > a Postgres backend, and have no intention of doing so any time soon.
> >
> 
> As a practical matter though, if we're multi-threading  a heavy PostGIS
> function, presumably simply grabbing *every* core is not a recommended or
> friendly practice. My finger-in-the-wind guess would be that the value
> of max_parallel_workers_per_gather would be the most reasonable value to
> use to limit the number of cores a parallel PostGIS function should use.
> Does that make sense?

I'm not sure that's a good approximation.  Postgres' infrastructure
prevents every query from using max_parallel_workers_per_gather
processes due to the global max_worker_processes limit.  I think you
probably would want something very very roughly approximating a global
limit - otherwise you'll either need to set the per-process limit way
too low, or overwhelm machines with context switches.

Greetings,

Andres Freund



Re: INSTALL file

2018-11-01 Thread Stephen Frost
Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:
> On 01.11.18 07:26, Michael Paquier wrote:
> >>It includes links to the website, as well as the short version of the
> >>installation instructions.
> >+The installation instructions are listed on the website:
> >+
> >+https://www.postgresql.org/docs/current/static/install-short.html

I don't think we should link to the "Short version" directly here, the
above URL should be the "installation.html" one.  With a caveat, see
below.

> >+Short version:
> >+
> >+./configure
> >+make
> >+su
> >+make install
> >+adduser postgres
> >+mkdir /usr/local/pgsql/data
> >+chown postgres /usr/local/pgsql/data
> >+su - postgres
> >+/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
> >+/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start
> >
> >Adding a section about installation and another one about documentation
> >are good things.  Now for the installation section I disagree about
> >adding this detailed way of doing things, and just adding a URL looks
> >enough.
> Was thinking about this, but then decided to add it as an example,
> and see what people think.

I like having the detail, but I'd go farther, really.  Here's my
thinking:

BUILDING ON UNIX


Detailed instructions for many Unix platforms are available here:
https://www.postgresql.org/docs/current/static/installation.html

To build PostgreSQL on most Unix variants, the following are required:

GNU make, version 3.8 or newer
ISO/ANSI C compiler (at least C99-compliant)
Flex 2.5.31 or later, and Bison 1.875 or later (for building from git)
Perl 5.8.3 (for building from git)

PostgreSQL has many additional capabilities which can be enabled using
configure --enable switches but many of those also depend on additional
libraries.  See the installation instructions for details.

To build PostgreSQL, run the following commands:

./configure
make

PostgreSQL can then be installed using 'make install', which will
require being a superuser to install into the default directory.
The installation location can be changed by passing '--prefix' to
'configure'.  Run './configure --help' for additional options.


BUILDING ON WINDOWS


Detailed instructions for building on Windows are available here:
https://www.postgresql.org/docs/current/static/install-windows.html

To build PostgreSQL on Windows, either Visual Studio Express 2017
for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
installed.  PostgreSQL can also be built with MinGW or Cygwin using
the Unix instructions.

... some details about how to actually use VSE 2017 would be good

CREATING YOUR FIRST DATABASE


Once the PostgreSQL software is installed, the first step to having a
running database is to initialize a PostgreSQL database using the
'initdb' command, eg:

initdb -D mydatabase

After the database system has been initialized, PostgreSQL can be
started by using the pg_ctl command:

pg_ctl -D mydatabase -l logfile start 

Once PostgreSQL is running, you can connect to it using the psql
command-line client.  A default database called 'postgres' was created
by 'initdb'.

BUILDING THE POSTGRESQL DOCUMENTATION
=

Full documentation for PostgreSQL is available online here:
https://www.postgresql.org/docs/current/static/index.html

To build the PostgreSQL documentation on most Unix variants, the
following tools are required:

 whatever these are

To build PostgreSQL's documentation on Unix, run:

./configure
make docs

The documentation, once built by 'make docs', will be available in
various formats in the 'doc/src/sgml' directory.

> >Now there is also a problem, the README would point out to the
> >development version of the documentation.  As this is made at working
> >using git, I could personally live with having stable branches also
> >refer to the development version, but it could also make sense to have
> >each stable branch point to the URL of the versions they work on.
> 
> That is a bit problematic. The README lives on git first, and therefore
> should point to the development version. The release process
> can replace this with links to the current version.

I'm not sure that I'm really following this, because we aren't pointing
to the development documentation, just the 'current' documentation, and
that seems fine, and there's links on that page to the other versions of
the page for each major version of PostgreSQL, in case someone pulled an
older branch or such.

In short, I think we're fine to just use the 'current' URLs in this
README.

I'd also update the 'Makefile' bit we have at the root to refer to the
README instead of the INSTALL file.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: replication_slots usability issue

2018-11-01 Thread Andres Freund
On 2018-11-01 09:34:05 +0900, Michael Paquier wrote:
> HI Andres,
> 
> On Wed, Oct 31, 2018 at 03:48:02PM -0700, Andres Freund wrote:
> > And done.  Thanks for the report JD.
> 
> Shouldn't we also switch the PANIC to a FATAL in
> RestoreSlotFromDisk()?

That has absolutely nothing to do with the issue at hand though, so I
don't think it'd have made much sense to do it at the same time. Nor do
I think it's particularly important.


> I don't mind doing so myself if you agree with the change, only on
> HEAD as you seemed to disagree about changing that on back-branches.

Cool. And yes, I don't think a cosmetic log level adjustment that could
affect people's scripts should be backpatched without need. Even if not
particularly likely to break something.


> Also, from 691d79a which you just committed:
> +   ereport(FATAL,
> +   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +errmsg("logical replication slots \"%s\" exists, but 
> wal_level < logical",
> +   NameStr(cp.slotdata.name)),
> I can see one grammar mistake here, as you refer to only one slot here.
> The error messages should read:
> "logical replication slot \"%s\" exists, but wal_level < logical"
> and:
> "physical replication slot \"%s\" exists, but wal_level < replica"

Darnit. Fixed. Thanks.

Greetings,

Andres Freund



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-01 Thread Andres Freund
On 2018-10-27 08:18:12 +0200, Fabien COELHO wrote:
> 
> Hello Jeff,
> 
> > > I suspect the easiest thing to narrow it down would be to bisect the
> > > problem in gcc :(
> > 
> > Their commit r265241 is what broke the PostgreSQL build.  It also broke the
> > compiler itself--at that commit it was no longer possible to build itself.
> > I had to --disable-bootstrap in order to get a r265241 compiler to test
> > PostgreSQL on.
> 
> It seems they have done a API change around some kind of "range" analysis,
> which must have been incomplete.
> 
> > Their commit r265375 fixed the ability to compile itself, but built
> > PostgreSQL binaries remain broken there and thereafter.
> > 
> > |...]
> 
> Thanks a lot for this investigation! I can fill in a gcc bug report. There
> would be a enormous work to narrow it down to a small test case, it is
> unclear how they can act about it, but at least they would know.

Have you done so? If so, what's the bug number?

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra



On 11/01/2018 06:15 PM, Andres Freund wrote:
> On 2018-11-01 10:10:33 -0700, Paul Ramsey wrote:
>> On Wed, Oct 31, 2018 at 2:11 PM Tom Lane  wrote:
>>
>>> =?UTF-8?Q?Darafei_=22Kom=D1=8Fpa=22_Praliaskouski?= 
>>> writes:
 Question is, what's the best policy to allocate cores so we can play nice
 with rest of postgres?
>>>
>>
>>
>>> There is not, because we do not use or support multiple threads inside
>>> a Postgres backend, and have no intention of doing so any time soon.
>>>
>>
>> As a practical matter though, if we're multi-threading  a heavy PostGIS
>> function, presumably simply grabbing *every* core is not a recommended or
>> friendly practice. My finger-in-the-wind guess would be that the value
>> of max_parallel_workers_per_gather would be the most reasonable value to
>> use to limit the number of cores a parallel PostGIS function should use.
>> Does that make sense?
> 
> I'm not sure that's a good approximation.  Postgres' infrastructure
> prevents every query from using max_parallel_workers_per_gather
> processes due to the global max_worker_processes limit.  I think you
> probably would want something very very roughly approximating a global
> limit - otherwise you'll either need to set the per-process limit way
> too low, or overwhelm machines with context switches.
> 

Yeah. Without a global limit it would be fairly trivial to create way
too many threads - say when a query gets parallelized, and each worker
creates a bunch of private threads. And then a bunch of such queries
executed concurrently, and it's getting bad pretty fast.

In theory, simulating such global limit should be possible using a bit
of shared memory for the current total, per-process counter and probably
some simple abort handling (say, just like contrib/openssl does using
ResourceOwner).
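
For illustration, a minimal sketch of such a counter on top of the
existing atomics API (the struct, the names and the shared-memory hookup
are made up, and the abort handling discussed here is left out):

#include "postgres.h"
#include "port/atomics.h"

/* lives in shared memory, e.g. allocated from a shmem_startup_hook */
typedef struct ThreadBudget
{
	pg_atomic_uint32 used;		/* threads currently running, cluster-wide */
	uint32		limit;			/* configured global cap */
} ThreadBudget;

static ThreadBudget *budget;	/* set up at shared-memory init (not shown) */

/* try to reserve nthreads from the global budget, return how many we got */
static int
thread_budget_acquire(int nthreads)
{
	uint32		old = pg_atomic_fetch_add_u32(&budget->used, nthreads);

	if (old + nthreads > budget->limit)
	{
		/* over the cap: give back the excess and settle for fewer */
		uint32		granted = (old >= budget->limit) ? 0 : budget->limit - old;

		pg_atomic_fetch_sub_u32(&budget->used, nthreads - granted);
		return (int) granted;
	}
	return nthreads;
}

static void
thread_budget_release(int nthreads)
{
	pg_atomic_fetch_sub_u32(&budget->used, nthreads);
}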

A better solution might be to start a bgworker managing a connection
pool and forward the requests to it using IPC (and enforce the thread
count limit there).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Andres Freund
Hi,

On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).

Right.  I don't think you even need something resowner like, given that
anything using threads better make it absolutely absolutely impossible
that an error can escape.


> A better solution might be to start a bgworker managing a connection
> pool and forward the requests to it using IPC (and enforce the thread
> count limit there).

That doesn't really seem feasible for cases like this - after all, you'd
only use threads to work on individual rows if you wanted to parallelize
relatively granular per-row work or such. Adding cross-process IPC seems
like it'd make that perform badly.

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Komяpa
>
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).
>

I would expect that this limit is already available and it's parallel
worker limit. Basically, when starting a new thread I would like to somehow
consume a part of parallel worker limit - a thread is a kind of parallel
worker, from user's perspective. If I have 4 cores and Postgres already
started 4 parallel workers, I don't really want to start 4 threads for each
of them, or 4 for one of them and 1 for each of the rest, if I manage that
separately from parallel worker limit.

IPC and co - that's another question and out of scope for this one. Since
OpenMP allows writing multithreaded code by just adding more #pragma
around loops, I don't want to reinvent that part of the infrastructure.
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 07:40 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
>> In theory, simulating such global limit should be possible using a bit
>> of shared memory for the current total, per-process counter and probably
>> some simple abort handling (say, just like contrib/openssl does using
>> ResourceOwner).
> 
> Right.  I don't think you even need something resowner like, given that
> anything using threads better make it absolutely absolutely impossible
> that an error can escape.
> 

True. Still, I wonder if the process can die in a way that would fail to
update the counter.

> 
>> A better solution might be to start a bgworker managing a connection
>> pool and forward the requests to it using IPC (and enforce the thread
>> count limit there).
> 
> That doesn't really seem feasible for cases like this - after all, you'd
> only use threads to work on individual rows if you wanted to parallelize
> relatively granular per-row work or such. Adding cross-process IPC seems
> like it'd make that perform badly.
> 

I think that very much depends on how expensive the tasks handled by the
threads are. It may still be cheaper than a reasonable IPC, and if you
don't create/destroy threads, that also saves quite a bit of time.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 07:43 PM, Darafei "Komяpa" Praliaskouski wrote:
> In theory, simulating such global limit should be possible using a bit
> of shared memory for the current total, per-process counter and probably
> some simple abort handling (say, just like contrib/openssl does using
> ResourceOwner).
> 
> 
> I would expect that this limit is already available and it's parallel
> worker limit. Basically, when start a new thread I would like to somehow
> consume a part of parallel worker limit - a thread is a kind of parallel
> worker, from user's perspective. If I have 4 cores and Postgres already
> started 4 parallel workers, I don't really want to start 4 threads for
> each of them, or 4 for one of them and 1 for each of the rest, if I
> manage that separately from parallel worker limit.
> 

Well, PostgreSQL does that, but only for the process-based parallelism.
It has no idea about threads, so it can't work out of the box. Also, the
max_worker_processes limit determines various shared memory we need to
manage those processes, so it's really not about threads.

If you need something like that for threads, feel free to do that, but
I'd strongly suggest using a separate counter (perhaps using a m_w_p as
an initial value).

> IPC and co - that's another question and out of scope for this one.
> Since OpenMP allows to write multithreaded code by just adding more
> #pragma around loops, I don't want to reinvent that part of infrastructure.

Maybe. I don't know OpenMP that well, so I can't really say if that's a
good idea or not.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Andres Freund
Hi,

On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
> On 11/01/2018 07:40 PM, Andres Freund wrote:
> > On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
> >> In theory, simulating such global limit should be possible using a bit
> >> of shared memory for the current total, per-process counter and probably
> >> some simple abort handling (say, just like contrib/openssl does using
> >> ResourceOwner).
> > 
> > Right.  I don't think you even need something resowner like, given that
> > anything using threads better make it absolutely absolutely impossible
> > that an error can escape.
> > 
> 
> True. Still, I wonder if the process can die in a way that would fail to
> update the counter.

You'd better make that case a panic restart.


> >> A better solution might be to start a bgworker managing a connection
> >> pool and forward the requests to it using IPC (and enforce the thread
> >> count limit there).
> > 
> > That doesn't really seem feasible for cases like this - after all, you'd
> > only use threads to work on individual rows if you wanted to parallelize
> > relatively granular per-row work or such. Adding cross-process IPC seems
> > like it'd make that perform badly.
> > 
> 
> I think that very much depends on how expensive the tasks handled by the
> threads are. It may still be cheaper than a reasonable IPC, and if you
> don't create/destroy threads, that also saves quite a bit of time.

I'm not following. How can you have a pool *and* threads? Those seem to
be contradictory in PG's architecture? You need full blown IPC with your
proposal afaict?

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 07:50 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-11-01 19:44:54 +0100, Tomas Vondra wrote:
>> On 11/01/2018 07:40 PM, Andres Freund wrote:
>>> On 2018-11-01 19:33:39 +0100, Tomas Vondra wrote:
 In theory, simulating such global limit should be possible using a bit
 of shared memory for the current total, per-process counter and probably
 some simple abort handling (say, just like contrib/openssl does using
 ResourceOwner).
>>>
>>> Right.  I don't think you even need something resowner like, given that
>>> anything using threads better make it absolutely absolutely impossible
>>> that an error can escape.
>>>
>>
>> True. Still, I wonder if the process can die in a way that would fail to
>> update the counter.
> 
> You'd better make that case a panic restart.
> 
> 
 A better solution might be to start a bgworker managing a connection
 pool and forward the requests to it using IPC (and enforce the thread
 count limit there).
>>>
>>> That doesn't really seem feasible for cases like this - after all, you'd
>>> only use threads to work on individual rows if you wanted to parallelize
>>> relatively granular per-row work or such. Adding cross-process IPC seems
>>> like it'd make that perform badly.
>>>
>>
>> I think that very much depends on how expensive the tasks handled by the
>> threads are. It may still be cheaper than a reasonable IPC, and if you
>> don't create/destroy threads, that also saves quite a bit of time.
> 
> I'm not following. How can you have a pool *and* threads? Those seem to
> be contradictory in PG's architecture? You need full blown IPC with your
> proposal afaict?
> 

My suggestion was to create a bgworker, which would then internally
allocate and manage a pool of threads. It could then open some sort of
IPC (say, as dumb as unix socket). The backends could then send
requests to it, and it would respond to them. Not sure why/how would
this contradict PG's architecture?
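
To sketch just the thread-pool half of that idea (illustration only - names
are hypothetical, and the unix-socket front end, thread creation at bgworker
startup and all error handling are omitted):

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct Task
    {
        void        (*run) (void *arg);
        void         *arg;
        struct Task  *next;
    } Task;

    static pthread_mutex_t  queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t   queue_nonempty = PTHREAD_COND_INITIALIZER;
    static Task            *pending = NULL;     /* pending-task list */

    /* Each pool thread runs this loop (threads created once, at startup). */
    static void *
    pool_worker(void *unused)
    {
        for (;;)
        {
            Task   *t;

            pthread_mutex_lock(&queue_lock);
            while (pending == NULL)
                pthread_cond_wait(&queue_nonempty, &queue_lock);
            t = pending;
            pending = t->next;
            pthread_mutex_unlock(&queue_lock);

            t->run(t->arg);         /* must not call back into PostgreSQL code */
            free(t);
        }
        return NULL;
    }

    /* Called from the bgworker's main loop after reading a request. */
    static void
    pool_submit(void (*run) (void *), void *arg)
    {
        Task   *t = malloc(sizeof(Task));

        t->run = run;
        t->arg = arg;
        pthread_mutex_lock(&queue_lock);
        t->next = pending;
        pending = t;
        pthread_cond_signal(&queue_nonempty);
        pthread_mutex_unlock(&queue_lock);
    }

The bgworker's main loop would then read requests off the socket, call
pool_submit(), and send the results back to the requesting backend.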

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Andres Freund
On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
> >> I think that very much depends on how expensive the tasks handled by the
> >> threads are. It may still be cheaper than a reasonable IPC, and if you
> >> don't create/destroy threads, that also saves quite a bit of time.
> > 
> > I'm not following. How can you have a pool *and* threads? Those seem to
> > be contradictory in PG's architecture? You need full blown IPC with your
> > proposal afaict?
> > 
> 
> My suggestion was to create a bgworker, which would then internally
> allocate and manage a pool of threads. It could then open some sort of
> IPC (say, as dumb as unix socket). The backends could then send
> requests to it, and it would respond to them. Not sure why/how would
> this contradict PG's architecture?

Because you said "faster than reasonable IPC" - which to me implies that
you don't do full blown IPC. Which using threads in a bgworker is very
strongly implying. What you're proposing strongly implies multiple
context switches just to process a few results. Even before, but
especially after, spectre that's an expensive proposition.

Greetings,

Andres Freund



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Tomas Vondra
On 11/01/2018 12:48 PM, Antonin Houska wrote:
> Review of [1] made me think of this optimization, currently used only in
> create_merge_append_path():
> 
>   /*
>* Apply query-wide LIMIT if known and path is for sole base relation.
>* (Handling this at this low level is a bit klugy.)
>*/
>   if (bms_equal(rel->relids, root->all_baserels))
>   pathnode->limit_tuples = root->limit_tuples;
>   else
>   pathnode->limit_tuples = -1.0;
> 
> Currently it's not a problem because the output of MergeAppend plan is not
> likely to be re-sorted, but I don't think data correctness should depend on
> cost evaluation. Instead, -1 should be set here if there's any chance that the
> output will be sorted again.
> 

So you're saying we might end up with a plan like this:

Limit
-> Sort
-> MergeAppend
   -> SeqScan on t

In which case we'd pass the wrong limit_tuples to the MergeAppend?

Hmmm, that would depend on us building MergeAppend node that does not
match the expected pathkeys, and pick it instead of plain Append node.
I'm not sure that's actually possible, but maybe it is ...

> I tried to reproduce the issue by applying the "Incremental sort" [2] patch
> and running the following example:
> 
> CREATE TABLE t(i int, j int);
> CREATE TABLE t1() INHERITS (t);
> CREATE INDEX ON t1(i, j);
> INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
> CREATE TABLE t2() INHERITS (t);
> CREATE INDEX ON t2(i, j);
> INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
> 
> ANALYZE;
> 
> SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
> 

So, what query plan did this use?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Parallel threads in query

2018-11-01 Thread Komяpa
>
>
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.
>
>
To have some idea of what it could be:

a)
PostGIS has an ST_ClusterKMeans window function. It collects all geometries
passed to it into memory, re-packs them into a more compact buffer and starts a
loop that goes over it several (let's say 10..100) times. Then it spits out all
the assigned cluster numbers for each of the input rows.

It's all great when you need to calculate KMeans of 200-5 rows, but for
a million input rows even a hundred passes on a single core are painful.

b)
PostGIS has ST_Subdivide function. It takes a single row of geometry
(usually super-large, like a continent or the wholeness of Russia) and
splits it into many rows that have more simple shape, by performing a
horizontal or vertical split recursively. Since it's a tree traversal, it
can be paralleled efficiently, with one task becoming to follow the right
subpart of geometry and other - to follow left part of it.

Both seem to be a standard thing for OpenMP, which has compiler support in
GCC, clang and MSVC. For an overview of how it works, have a look here:
https://web.archive.org/web/20180828151435/https://bisqwit.iki.fi/story/howto/openmp/
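
To give an idea of how little code change is involved for case (a), here is a
sketch of how the cluster-assignment pass could be annotated (hypothetical
code, not the actual PostGIS implementation; compile with -fopenmp or the
compiler's equivalent):

    #include <float.h>

    void
    assign_clusters(const double *x, const double *y, int n,
                    const double *cx, const double *cy, int k, int *cluster)
    {
        int     i;

        #pragma omp parallel for schedule(static)
        for (i = 0; i < n; i++)
        {
            int     best = 0;
            double  bestd = DBL_MAX;
            int     c;

            for (c = 0; c < k; c++)
            {
                double  dx = x[i] - cx[c];
                double  dy = y[i] - cy[c];
                double  d = dx * dx + dy * dy;

                if (d < bestd)
                {
                    bestd = d;
                    best = c;
                }
            }
            cluster[i] = best;      /* each iteration writes a distinct element */
        }
    }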


So, do I understand correctly that I need to start a parallel worker that
does nothing for each thread I launch to consume the parallel worker limit?
-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: Parallel threads in query

2018-11-01 Thread Andres Freund
Hi,

On 2018-11-01 09:17:56 -1000, Darafei "Komяpa" Praliaskouski wrote:
> So, do I understand correctly that I need to start a parallel worker that
> does nothing for each thread I launch to consume the parallel worker limit?

No, I don't think that'd be reasonable. I think what we're saying is
that there's no way to reasonably use the parallel worker limit as the
limitation for what you're trying to do. You need custom infrastructure.

Greetings,

Andres Freund



Re: Parallel threads in query

2018-11-01 Thread Tomas Vondra
On 11/01/2018 08:03 PM, Andres Freund wrote:
> On 2018-11-01 19:57:17 +0100, Tomas Vondra wrote:
 I think that very much depends on how expensive the tasks handled by the
 threads are. It may still be cheaper than a reasonable IPC, and if you
 don't create/destroy threads, that also saves quite a bit of time.
>>>
>>> I'm not following. How can you have a pool *and* threads? Those seem to
>>> be contradictory in PG's architecture? You need full blown IPC with your
>>> proposal afaict?
>>>
>>
>> My suggestion was to create a bgworker, which would then internally
>> allocate and manage a pool of threads. It could then open some sort of
>> IPC (say, as dumb as unix socket). The backends could then send
>> requests to it, and it would respond to them. Not sure why/how would
>> this contradict PG's architecture?
> 
> Because you said "faster than reasonable IPC" - which to me implies that
> you don't do full blown IPC. Which using threads in a bgworker is very
> strongly implying. What you're proposing strongly implies multiple
> context switches just to process a few results. Even before, but
> especially after, spectre that's an expensive proposition.
> 

Gah! I meant to write "faster with reasonable IPC" - i.e. faster/cheaper
than a solution that would create threads ad-hoc.

My assumption is that the tasks are fairly large, and may take quite a
bit of time to process (say, a couple of seconds?). In which case the
extra context switches are not a major issue. But maybe I'm wrong.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Tomas Vondra



On 11/01/2018 08:11 PM, Tomas Vondra wrote:
> On 11/01/2018 12:48 PM, Antonin Houska wrote:
>> Review of [1] made me think of this optimization, currently used only in
>> create_merge_append_path():
>>
>>  /*
>>   * Apply query-wide LIMIT if known and path is for sole base relation.
>>   * (Handling this at this low level is a bit klugy.)
>>   */
>>  if (bms_equal(rel->relids, root->all_baserels))
>>  pathnode->limit_tuples = root->limit_tuples;
>>  else
>>  pathnode->limit_tuples = -1.0;
>>
>> Currently it's not a problem because the output of MergeAppend plan is not
>> likely to be re-sorted, but I don't think data correctness should depend on
>> cost evaluation. Instead, -1 should be set here if there's any chance that
>> the output will be sorted again.
>>
> 
> So you're saying we might end up with a plan like this:
> 
> Limit
> -> Sort
> -> MergeAppend
>-> SeqScan on t
> 
> In which case we'd pass the wrong limit_tuples to the MergeAppend?
> 
> Hmmm, that would depend on us building MergeAppend node that does not
> match the expected pathkeys, and pick it instead of plain Append node.
> I'm not sure that's actually possible, but maybe it is ...
> 

OK, so the reason is that when building child paths, we don't keep the
pathkeys unless it matches the "interesting" pathkeys.

So for example we may have an IndexPath, but with pathkeys=NIL if the
index does not match the ORDER BY we need. So we don't even build the
MergeAppend paths, as generate_mergeappend_paths iterates over child
pathkeys (and we don't have any).

We might have two child paths, one with pathkeys (matching the ORDER BY)
and one without pathkeys. But at that point the optimization does not
apply, because it checks for "single base relation".

At least that's my understanding of add_paths_to_append_rel().

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Antonin Houska
Tomas Vondra  wrote:

> On 11/01/2018 12:48 PM, Antonin Houska wrote:
> > Review of [1] made me think of this optimization, currently used only in
> > create_merge_append_path():
> > 
> > /*
> >  * Apply query-wide LIMIT if known and path is for sole base relation.
> >  * (Handling this at this low level is a bit klugy.)
> >  */
> > if (bms_equal(rel->relids, root->all_baserels))
> > pathnode->limit_tuples = root->limit_tuples;
> > else
> > pathnode->limit_tuples = -1.0;
> > 
> > Currently it's not a problem because the output of MergeAppend plan is not
> > likely to be re-sorted, but I don't think data correctness should depend on
> > cost evaluation. Instead, -1 should be set here if there's any chance that
> > the output will be sorted again.
> > 
> 
> So you're saying we might end up with a plan like this:
> 
> Limit
> -> Sort
> -> MergeAppend
>-> SeqScan on t
> 
> In which case we'd pass the wrong limit_tuples to the MergeAppend?
> 
> Hmmm, that would depend on us building MergeAppend node that does not
> match the expected pathkeys, and pick it instead of plain Append node.
> I'm not sure that's actually possible, but maybe it is ...

Yes, if there should be a Sort node above the MergeAppend, then it'll probably be
cheaper to use Append. So the problem should not happen in practice, but in
theory it seems to be possible.

> 
> > I tried to reproduce the issue by applying the "Incremental sort" [2] patch
> > and running the following example:
> > 
> > CREATE TABLE t(i int, j int);
> > CREATE TABLE t1() INHERITS (t);
> > CREATE INDEX ON t1(i, j);
> > INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
> > CREATE TABLE t2() INHERITS (t);
> > CREATE INDEX ON t2(i, j);
> > INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
> > 
> > ANALYZE;
> > 
> > SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
> > 
> 
> So, what query plan did this use?

 Limit
   ->  Incremental Sort
 Sort Key: t.i, t.j DESC
 Presorted Key: t.i
 ->  Merge Append
   Sort Key: t.i
   ->  Sort
 Sort Key: t.i
 ->  Seq Scan on t
   ->  Index Only Scan using t1_i_j_idx on t1
   ->  Index Only Scan using t2_i_j_idx on t2

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: Remove obsolete pg_attrdef.adsrc column

2018-11-01 Thread Peter Eisentraut
On 27/10/2018 23:19, Daniel Gustafsson wrote:
>> On 27 Oct 2018, at 12:57, Peter Eisentraut 
>>  wrote:
>>
>> On 23/10/2018 19:48, Daniel Gustafsson wrote:
 On 23 Oct 2018, at 15:17, Peter Eisentraut 
  wrote:

 I propose the attached patch to remove the long-unused catalog column
 pg_attrdef.adsrc.
>>>
>>> +1, I ran into a bug in an app as recently as today where adsrc was used
>>> instead of pg_get_expr().
>>>
>>> Patch looks good.  I probably would’ve opted for mentioning how to get a
>>> human readable version on the page, along the lines of the attached version,
>>
>> Agreed.  I have integrated your suggestion.
>>
>> Also, let's go nuts and remove pg_constraint.consrc as well.
> 
> No objections from me.
> 
>> Updated patches attached.
> 
> +1, applies and works as intended.

Committed, thanks.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-01 Thread Tomas Vondra



On 11/01/2018 08:51 PM, Antonin Houska wrote:
> Tomas Vondra  wrote:
> 
>> On 11/01/2018 12:48 PM, Antonin Houska wrote:
>>> Review of [1] made me think of this optimization, currently used only in
>>> create_merge_append_path():
>>>
>>> /*
>>>  * Apply query-wide LIMIT if known and path is for sole base relation.
>>>  * (Handling this at this low level is a bit klugy.)
>>>  */
>>> if (bms_equal(rel->relids, root->all_baserels))
>>> pathnode->limit_tuples = root->limit_tuples;
>>> else
>>> pathnode->limit_tuples = -1.0;
>>>
>>> Currently it's not a problem because the output of MergeAppend plan is not
>>> likely to be re-sorted, but I don't think data correctness should depend on
>>> cost evaluation. Instead, -1 should be set here if there's any chance that
>>> the output will be sorted again.
>>>
>>
>> So you're saying we might end up with a plan like this:
>>
>> Limit
>> -> Sort
>> -> MergeAppend
>>-> SeqScan on t
>>
>> In which case we'd pass the wrong limit_tuples to the MergeAppend?
>>
>> Hmmm, that would depend on us building MergeAppend node that does not
>> match the expected pathkeys, and pick it instead of plain Append node.
>> I'm not sure that's actually possible, but maybe it is ...
> 
> Yes, if there should be a Sort node above the MergeAppend, then it'll probably be
> cheaper to use Append. So the problem should not happen in practice, but in
> theory it seems to be possible.
> 

Actually no - see my other response. We don't build child paths with
pathkeys that are not useful (do not match the expected ordering), and
we only build MergeAppend for pathkeys from child paths.

So I believe we currently won't build a MergeAppend that would then
require an additional Sort node, irrespective of the costing.

>>
>>> I tried to reproduce the issue by applying the "Incremental sort" [2] patch
>>> and running the following example:
>>>
>>> CREATE TABLE t(i int, j int);
>>> CREATE TABLE t1() INHERITS (t);
>>> CREATE INDEX ON t1(i, j);
>>> INSERT INTO t1(i, j) VALUES (1, 0), (1, 1);
>>> CREATE TABLE t2() INHERITS (t);
>>> CREATE INDEX ON t2(i, j);
>>> INSERT INTO t2(i, j) VALUES (1, 0), (1, 1);
>>>
>>> ANALYZE;
>>>
>>> SELECT * FROM t ORDER BY i, j DESC LIMIT 1;
>>>
>>
>> So, what query plan did this use?
> 
>  Limit
>->  Incremental Sort
>  Sort Key: t.i, t.j DESC
>  Presorted Key: t.i
>  ->  Merge Append
>Sort Key: t.i
>->  Sort
>  Sort Key: t.i
>  ->  Seq Scan on t
>->  Index Only Scan using t1_i_j_idx on t1
>->  Index Only Scan using t2_i_j_idx on t2
> 

Interesting. That might be broken, not sure if the LIMIT optimization
triggered or not.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Compressed TOAST Slicing

2018-11-01 Thread Paul Ramsey
Currently, PG_DETOAST_DATUM_SLICE when run on a compressed TOAST entry will
first decompress the whole object, then extract the relevant slice.

When the desired slice is at or near the front of the object, this is
obviously non-optimal.

The attached patch adds in a code path to do a partial decompression of the
TOAST entry, when the requested slice is at the start of the object.
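
The core of the change, sketched conceptually (this is not the literal pglz
code):

    /*
     * The decompression loop already stops once the output pointer reaches
     * destend, so decompressing only a leading slice mostly amounts to
     * pointing destend at the end of the requested prefix rather than at
     * the full uncompressed size.
     */
    destend = dest + slice_length;      /* instead of dest + rawsize */

    while (sp < srcend && dp < destend)
    {
        /* decode the next literal byte or back-reference, as today */
    }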

For an example of the improvement possible, this trivial example:

create table slicingtest (
  id serial primary key,
  a text
);

insert into slicingtest (a) select repeat('xyz123', 1) as a from
generate_series(1,1);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;

On master, in the current state on my wee laptop, I get

Time: 1426.737 ms (00:01.427)

With the patch, on my wee laptop, I get

Time: 46.886 ms

As usual, doing less work is faster.

Interesting note to motivate a follow-on patch: the substr() function does
attempt to slice, but the left() function does not. So, if this patch is
accepted, next patch will be to left() to add slicing behaviour.

If nobody lights me on fire, I'll submit to commitfest shortly.

P.


compressed-datum-slicing-20190101a.patch
Description: Binary data


Re: Hash Joins vs. Bloom Filters / take 2

2018-11-01 Thread Thomas Munro
On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty  wrote:
> I'm very interested in this patch, and particularly in possible
> extensions to push the Bloom filter down on the probe side of the join.  I
> made a few small edits to the patch to enable it to compile on PG11, and can
> send it to you if you're interested.

Hi Jim,

Would you compute the hash for the outer tuples in the scan, and then
again in the Hash Join when probing, or would you want to (somehow)
attach the hash to emitted tuples for later reuse by the higher node?
Someone pointed out to me off-list that a popular RDBMS emanating from
the bicycle capital of the North-West pushes down Bloom filters to
scans, but only when the key is a non-nullable integer; I wonder if
that is because they hash in both places, but consider that OK only
when it's really cheap to do so.  (Along the same lines, if we could
attach extra data to tuples, I wonder if it would make sense to
transmit sort support information to higher nodes, so that (for
example) GatherMerge could use it to avoid full key comparison when
dealing with subplans that already did a sort and computed integers
for fast inequality checks.)
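
For anyone following along, a minimal Bloom filter sketch (purely
illustrative, not the patch's implementation) showing why a single hash of
the join key is enough for both probe positions, so a hash computed once
could in principle be reused:

    #include <stdbool.h>
    #include <stdint.h>

    #define BLOOM_BITS  (1 << 20)           /* 1 Mbit filter, for illustration */

    typedef struct BloomFilter
    {
        uint8_t bits[BLOOM_BITS / 8];       /* assumed zero-initialized */
    } BloomFilter;

    static inline void
    bloom_set(BloomFilter *f, uint32_t pos)
    {
        f->bits[(pos % BLOOM_BITS) / 8] |= (uint8_t) (1 << (pos % 8));
    }

    static inline bool
    bloom_probe(const BloomFilter *f, uint32_t pos)
    {
        return (f->bits[(pos % BLOOM_BITS) / 8] >> (pos % 8)) & 1;
    }

    /* Both positions are derived from one 64-bit hash of the join key. */
    static void
    bloom_add(BloomFilter *f, uint64_t hash)
    {
        bloom_set(f, (uint32_t) hash);
        bloom_set(f, (uint32_t) (hash >> 32));
    }

    static bool
    bloom_might_contain(const BloomFilter *f, uint64_t hash)
    {
        return bloom_probe(f, (uint32_t) hash) &&
            bloom_probe(f, (uint32_t) (hash >> 32));
    }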

> It is currently in the list of patches for the current commitfest, but
> based on your previous post I'm not sure if you're planning to get back to
> this patch just now.  If you plan to resume work on it, I'll sign up as a
> reviewer.

I'm also signed up to review.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Hash Joins vs. Bloom Filters / take 2

2018-11-01 Thread Tomas Vondra
On 11/01/2018 10:06 PM, Thomas Munro wrote:
> On Fri, Nov 2, 2018 at 9:23 AM Jim Finnerty  wrote:
>> I'm very interested in this patch, and particularly in possible
>> extensions to push the Bloom filter down on the probe side of the join.  I
>> made a few small edits to the patch to enable it to compile on PG11, and can
>> send it to you if you're interested.
> 
> Hi Jim,
> 
> Would you compute the hash for the outer tuples in the scan, and then
> again in the Hash Join when probing, or would you want to (somehow)
> attach the hash to emitted tuples for later reuse by the higher node?
> Someone pointed out to me off-list that a popular RDBMS emanating from
> the bicycle capital of the North-West pushes down Bloom filters to
> scans, but only when the key is a non-nullable integer; I wonder if
> that is because they hash in both places, but consider that OK only
> when it's really cheap to do so.  (Along the same lines, if we could
> attach extra data to tuples, I wonder if it would make sense to
> transmit sort support information to higher nodes, so that (for
> example) GatherMerge could use it to avoid full key comparison when
> dealing with subplans that already did a sort and computed integers
> for fast inequality checks.)
> 
>> It is currently in the list of patches for the current commitfest, but
>> based on your previous post I'm not sure if you're planning to get back to
>> this patch just now.  If you plan to resume work on it, I'll sign up as a
>> reviewer.
> 
> I'm also signed up to review.
> 

I haven't really planned to work on this anytime soon, unfortunately,
which is why I proposed to mark it as RwF at the end of the last CF. I
already have a couple other patches there, and (more importantly) I
don't have a very clear idea how to implement the filter pushdown.

That being said I still think it'd be an interesting improvement, and if
someone wants to take over I'm available as a reviewer / tester ...


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
So it seems like there are two ways we could go about this.  One is
to go back to the scheme of adding an -isysroot switch to CPPFLAGS,
where it'd have global effects.  We could make this slightly less
painful for scenarios like Jakob's if we set things up in Makefile.global
this way:

CPPFLAGS = -isysroot $(PG_SYSROOT)  
PG_SYSROOT = 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk

and then, if you need to build on a different SDK version without
reconfiguring, you can do something like "make PG_SYSROOT=/proper/path".
I coded this up, as attached, and it seems to work but it's still not all
that friendly for such cases.

The other idea that's occurred to me is to go back to the scheme of
commit 68fc227dd, where we inject the sysroot path into just the -I
switches used for PL/Perl and PL/Tcl.  We could improve on that
commit by injecting it symbolically similar to what I did here, ie
what ends up in the configure output is

PG_SYSROOT = 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk

perl_includespec = -I 
$(PG_SYSROOT)/System/Library/Perl/5.18/darwin-thread-multi-2level/CORE

Then somebody who wants to build on a different SDK version still needs
to do "make PG_SYSROOT=/proper/path", but only if they're trying to
build PL/Perl or related extensions.  So this second way seems uglier
in some sense but less likely to cause problems for most people.

Either way I guess we'd need to document it rather than just hoping
it's invisible.

Thoughts?

regards, tom lane

diff --git a/configure b/configure
index 43ae8c8..0686941 100755
*** a/configure
--- b/configure
*** ac_includes_default="\
*** 627,632 
--- 627,633 
  
  ac_subst_vars='LTLIBOBJS
  vpath_build
+ PG_SYSROOT
  PG_VERSION_NUM
  PROVE
  FOP
*** _ACEOF
*** 18815,18820 
--- 18816,18830 
  
  
  
+ # If we are inserting PG_SYSROOT into CPPFLAGS, do so symbolically not
+ # literally, so that it's possible to override it at build time using
+ # a command like "make ... PG_SYSROOT=path".  This has to be done after
+ # we've finished all configure checks that depend on CPPFLAGS.
+ if test x"$PG_SYSROOT" != x; then
+   CPPFLAGS=`echo "$CPPFLAGS" | sed -e "s| $PG_SYSROOT | \\\$(PG_SYSROOT) |"`
+ fi
+ 
+ 
  
  # Begin output steps
  
diff --git a/configure.in b/configure.in
index 519ecd5..7586deb 100644
*** a/configure.in
--- b/configure.in
*** $AWK '{printf "%d%04d", $1, $2}'`"]
*** 2357,2362 
--- 2357,2371 
  AC_DEFINE_UNQUOTED(PG_VERSION_NUM, $PG_VERSION_NUM, [PostgreSQL version as a number])
  AC_SUBST(PG_VERSION_NUM)
  
+ # If we are inserting PG_SYSROOT into CPPFLAGS, do so symbolically not
+ # literally, so that it's possible to override it at build time using
+ # a command like "make ... PG_SYSROOT=path".  This has to be done after
+ # we've finished all configure checks that depend on CPPFLAGS.
+ if test x"$PG_SYSROOT" != x; then
+   CPPFLAGS=`echo "$CPPFLAGS" | sed -e "s| $PG_SYSROOT | \\\$(PG_SYSROOT) |"`
+ fi
+ AC_SUBST(PG_SYSROOT)
+ 
  
  # Begin output steps
  
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index bdf394b..218c65a 100644
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
*** BITCODE_CXXFLAGS = @BITCODE_CXXFLAGS@
*** 241,246 
--- 241,247 
  
  CPP = @CPP@
  CPPFLAGS = @CPPFLAGS@
+ PG_SYSROOT = @PG_SYSROOT@
  
  override CPPFLAGS := $(ICU_CFLAGS) $(CPPFLAGS)
  
diff --git a/src/template/darwin b/src/template/darwin
index 159d8bb..c05adca 100644
*** a/src/template/darwin
--- b/src/template/darwin
***
*** 3,16 
  # Note: Darwin is the original code name for macOS, also known as OS X.
  # We still use "darwin" as the port name, partly because config.guess does.
  
! # Some configure tests require explicit knowledge of where the Xcode "sysroot"
! # is.  We try to avoid having this leak into configure's results, though.
  if test x"$PG_SYSROOT" = x"" ; then
PG_SYSROOT=`xcodebuild -version -sdk macosx Path 2>/dev/null`
  fi
  # Old xcodebuild versions may produce garbage, so validate the result.
  if test x"$PG_SYSROOT" != x"" ; then
!   if test \! -d "$PG_SYSROOT" ; then
  PG_SYSROOT=""
fi
  fi
--- 3,17 
  # Note: Darwin is the original code name for macOS, also known as OS X.
  # We still use "darwin" as the port name, partly because config.guess does.
  
! # Select where system include files should be sought.
  if test x"$PG_SYSROOT" = x"" ; then
PG_SYSROOT=`xcodebuild -version -sdk macosx Path 2>/dev/null`
  fi
  # Old xcodebuild versions may produce garbage, so validate the result.
  if test x"$PG_SYSROOT" != x"" ; then
!   if test -d "$PG_SYSROOT" ; then
! CPPFLAGS="-isysroot $PG_SYSROOT $CPPFLAGS"
!   else
  PG_SYSROOT=""
fi
  fi


Re: Compressed TOAST Slicing

2018-11-01 Thread Stephen Frost
Greetings,

* Paul Ramsey (pram...@cleverelephant.ca) wrote:
> The attached patch adds in a code path to do a partial decompression of the
> TOAST entry, when the requested slice is at the start of the object.

Neat!

> As usual, doing less work is faster.

Definitely.

> Interesting note to motivate a follow-on patch: the substr() function does
> attempt to slice, but the left() function does not. So, if this patch is
> accepted, next patch will be to left() to add slicing behaviour.

Makes sense to me.

There are two things that I wonder about in the patch- if it would be of any
use to try and allocate on a need basis instead of just allocating the
whole chunk up to the toast size, and secondly, why we wouldn't consider
handling a non-zero offset.  A non-zero offset would, of course, still
require decompressing from the start and then just throwing away what we
skip over, but we're going to be doing that anyway, aren't we?  Why not
stop when we get to the end, at least, and save ourselves the trouble of
decompressing the rest and then throwing it away.

> If nobody lights me on fire, I'll submit to commitfest shortly.

Sounds like a good idea to me.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: CF app feature request

2018-11-01 Thread Magnus Hagander
On Thu, Nov 1, 2018 at 2:44 PM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
> Yesterday Fabien and I submitted the same item to the Commitfest (1859
> and 1860). Unfortunately there doesn't seem to be any way for one of
> these to be withdrawn. "Rejected" and "Returned with Feedback" seem
> wrong. Ideally, there would be a way for someone who submits an item in
> error to withdraw it, at which point it should be as if it had never
> been submitted.
>
> Meanwhile, what's the advice about how to deal with these?
>
>
Are you thinking basically another status that's "Withdrawn", but keeping
it, or actually removing the records completely?

//Magnus


Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-01 Thread Fabien COELHO




Their commit r265375 fixed gcc's ability to compile itself, but PostgreSQL
binaries built with it remain broken there and thereafter.

[...]


Thanks a lot for this investigation! I can file a gcc bug report. It would
be an enormous amount of work to narrow it down to a small test case, and it
is unclear how they can act on it, but at least they would know.


Have you done so? If so, what's the bug number?


Not yet. There was no answer to my suggestion, so I did not feel it was that
urgent. I'll try to do it over the next week.


--
Fabien.



Re: CF app feature request

2018-11-01 Thread Andrew Dunstan




On 11/01/2018 05:50 PM, Magnus Hagander wrote:



On Thu, Nov 1, 2018 at 2:44 PM Andrew Dunstan 
> wrote:



Yesterday Fabien and I submitted the same item to the Commitfest
(1859
and 1860). Unfortunately there doesn't seem to be any way for one of
these to be withdrawn. "Rejected" and "Returned with Feedback" seem
wrong. Ideally, there would be a way for someone who submits an
item in
error to withdraw it, at which point it should be as if it had never
been submitted.

Meanwhile, what's the advice about how to deal with these?


Are you thinking basically another status that's "Withdrawn", but 
keeping it, or actually removing the records completely?






I don't know enough about the app internals to comment. But it probably 
shouldn't appear in the stats, or else should have its own category in 
the stats.


cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG vs macOS Mojave

2018-11-01 Thread Peter Eisentraut
On 01/11/2018 22:17, Tom Lane wrote:
> The other idea that's occurred to me is to go back to the scheme of
> commit 68fc227dd, where we inject the sysroot path into just the -I
> switches used for PL/Perl and PL/Tcl.  We could improve on that
> commit by injecting it symbolically similar to what I did here, ie
> what ends up in the configure output is

How does that work when building against a non-system Perl or Tcl?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
Peter Eisentraut  writes:
> On 01/11/2018 22:17, Tom Lane wrote:
>> The other idea that's occurred to me is to go back to the scheme of
>> commit 68fc227dd, where we inject the sysroot path into just the -I
>> switches used for PL/Perl and PL/Tcl.  We could improve on that
>> commit by injecting it symbolically similar to what I did here, ie
>> what ends up in the configure output is

> How does that work when building against a non-system Perl or Tcl?

It does nothing, because configure will not find that it needs to
inject any sysroot reference in order to find such a Perl or Tcl's
headers.

regards, tom lane



Re: CF app feature request

2018-11-01 Thread Tom Lane
Andrew Dunstan  writes:
> On 11/01/2018 05:50 PM, Magnus Hagander wrote:
>> Are you thinking basically another status that's "Withdrawn", but 
>> keeping it, or actually removing the records completely?

> I don't know enough about the app internals to comment. But it probably 
> shouldn't appear in the stats, or else should have its own category in 
> the stats.

A separate "Withdrawn" status seems like it'd cover more cases than
a "make like it never happened" action.

regards, tom lane



Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> On 01/11/2018 22:17, Tom Lane wrote:
>>> The other idea that's occurred to me is to go back to the scheme of
>>> commit 68fc227dd, where we inject the sysroot path into just the -I
>>> switches used for PL/Perl and PL/Tcl.  We could improve on that
>>> commit by injecting it symbolically similar to what I did here, ie
>>> what ends up in the configure output is

>> How does that work when building against a non-system Perl or Tcl?

> It does nothing, because configure will not find that it needs to
> inject any sysroot reference in order to find such a Perl or Tcl's
> headers.

Here's a lightly-tested patch for that approach.

regards, tom lane

diff --git a/configure b/configure
index 43ae8c8..db50751 100755
*** a/configure
--- b/configure
*** CPP
*** 731,736 
--- 731,737 
  BITCODE_CXXFLAGS
  BITCODE_CFLAGS
  CFLAGS_VECTOR
+ PG_SYSROOT
  LLVM_BINPATH
  LLVM_CXXFLAGS
  LLVM_CFLAGS
*** unset CXXFLAGS
*** 5261,5266 
--- 5262,5270 
  #
  . "$srcdir/src/template/$template" || exit
  
+ # Record PG_SYSROOT in Makefile.global, if set by user or template.
+ 
+ 
  # C[XX]FLAGS are selected so:
  # If the user specifies something in the environment, that is used.
  # else:  If the template file set something, that is used.
*** PL/Perl." "$LINENO" 5
*** 9776,9786 
fi
# On most platforms, archlibexp is also where the Perl include files live ...
perl_includespec="-I$perl_archlibexp/CORE"
!   # ... but on newer macOS versions, we must use -iwithsysroot to look
!   # under $PG_SYSROOT
if test \! -f "$perl_archlibexp/CORE/perl.h" ; then
  if test -f "$PG_SYSROOT$perl_archlibexp/CORE/perl.h" ; then
!   perl_includespec="-iwithsysroot $perl_archlibexp/CORE"
  fi
fi
  
--- 9780,9789 
fi
# On most platforms, archlibexp is also where the Perl include files live ...
perl_includespec="-I$perl_archlibexp/CORE"
!   # ... but on newer macOS versions, we must look under $PG_SYSROOT instead
if test \! -f "$perl_archlibexp/CORE/perl.h" ; then
  if test -f "$PG_SYSROOT$perl_archlibexp/CORE/perl.h" ; then
!   perl_includespec="-I$PG_SYSROOT$perl_archlibexp/CORE"
  fi
fi
  
*** eval TCL_SHARED_BUILD=\"$TCL_SHARED_BUIL
*** 18115,18120 
--- 18118,18128 
as_fn_error $? "cannot build PL/Tcl because Tcl is not a shared library
  Use --without-tcl to disable building PL/Tcl." "$LINENO" 5
  fi
+ # Some macOS versions report an include spec that uses -iwithsysroot.
+ # We don't really want to use -isysroot, so translate that if we can.
+ if test x"$PG_SYSROOT" != x; then
+ TCL_INCLUDE_SPEC="`echo "$TCL_INCLUDE_SPEC" | sed "s|-iwithsysroot */|-I$PG_SYSROOT/|"`"
+ fi
  # now that we have TCL_INCLUDE_SPEC, we can check for 
  ac_save_CPPFLAGS=$CPPFLAGS
  CPPFLAGS="$TCL_INCLUDE_SPEC $CPPFLAGS"
*** fi
*** 18127,18132 
--- 18135,18147 
  
  
  CPPFLAGS=$ac_save_CPPFLAGS
+ # If we are inserting PG_SYSROOT into TCL_INCLUDE_SPEC, do so symbolically
+ # not literally, so that it's possible to override it at build time using a
+ # command like "make ... PG_SYSROOT=path".  This has to be done after we've
+ # finished all configure checks that depend on TCL_INCLUDE_SPEC.
+ if test x"$PG_SYSROOT" != x; then
+   TCL_INCLUDE_SPEC=`echo "$TCL_INCLUDE_SPEC" | sed -e "s|-I$PG_SYSROOT/|-I\\\$(PG_SYSROOT)/|"`
+ fi
  fi
  
  # check for 
*** rm -f core conftest.err conftest.$ac_obj
*** 18176,18181 
--- 18191,18203 
  conftest$ac_exeext conftest.$ac_ext
LIBS=$pgac_save_LIBS
CPPFLAGS=$ac_save_CPPFLAGS
+   # If we are inserting PG_SYSROOT into perl_includespec, do so symbolically
+   # not literally, so that it's possible to override it at build time using a
+   # command like "make ... PG_SYSROOT=path".  This has to be done after we've
+   # finished all configure checks that depend on perl_includespec.
+   if test x"$PG_SYSROOT" != x; then
+ perl_includespec=`echo "$perl_includespec" | sed -e "s|-I$PG_SYSROOT/|-I\\\$(PG_SYSROOT)/|"`
+   fi
  fi
  
  # check for 
diff --git a/configure.in b/configure.in
index 519ecd5..b895f7d 100644
*** a/configure.in
--- b/configure.in
*** unset CXXFLAGS
*** 404,409 
--- 404,412 
  #
  . "$srcdir/src/template/$template" || exit
  
+ # Record PG_SYSROOT in Makefile.global, if set by user or template.
+ AC_SUBST(PG_SYSROOT)
+ 
  # C[XX]FLAGS are selected so:
  # If the user specifies something in the environment, that is used.
  # else:  If the template file set something, that is used.
*** PL/Perl.])
*** 1046,1056 
fi
# On most platforms, archlibexp is also where the Perl include files live ...
perl_includespec="-I$perl_archlibexp/CORE"
!   # ... but on newer macOS versions, we must use -iwithsysroot to look
!   # under $PG_SYSROOT
if test \! -f

Re: Compressed TOAST Slicing

2018-11-01 Thread Paul Ramsey
On Thu, Nov 1, 2018 at 2:29 PM Stephen Frost  wrote:

> Greetings,
>
> * Paul Ramsey (pram...@cleverelephant.ca) wrote:
> > The attached patch adds in a code path to do a partial decompression of the
> > TOAST entry, when the requested slice is at the start of the object.
>
> There are two things that I wonder about in the patch- if it would be of any
> use to try and allocate on a need basis instead of just allocating the
> whole chunk up to the toast size,


I'm not sure what I was thinking when I rejected allocating the slice size
in favour of the whole uncompressed size... I cannot see why that wouldn't
work.


> and secondly, why we wouldn't consider
> handling a non-zero offset.  A non-zero offset would, of course, still
> require decompressing from the start and then just throwing away what we
> skip over, but we're going to be doing that anyway, aren't we?  Why not
> stop when we get to the end, at least, and save ourselves the trouble of
> decompressing the rest and then throwing it away.
>

I was worried about changing the pg_lz code too much because it scared me,
but debugging some stuff made me read it more closely so I fear it less
now, and doing interior slices seems not unreasonable, so I will give it a
try.

P


Re: PG vs macOS Mojave

2018-11-01 Thread Tom Lane
I wrote:
> Then somebody who wants to build on a different SDK version still needs
> to do "make PG_SYSROOT=/proper/path", but only if they're trying to
> build PL/Perl or related extensions.  So this second way seems uglier
> in some sense but less likely to cause problems for most people.

> Either way I guess we'd need to document it rather than just hoping
> it's invisible.

Here's a proposed doc patch for this second approach.  It'd still mostly
work for the first approach, if we drop the sentence about it only
mattering for PL/Perl and PL/Tcl.

I failed to resist the temptation to mutter something about SIP while
at it.  AFAIK that problem isn't mentioned anywhere else.

regards, tom lane

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 4487d0c..1c87eef 100644
*** a/doc/src/sgml/installation.sgml
--- b/doc/src/sgml/installation.sgml
*** PHSS_30849  s700_800 u2comp/be/plugin li
*** 2513,2518 
--- 2513,2559 
 

  
+   
+macOS
+ 
+
+ macOS
+ installation on
+
+ 
+
+ On recent macOS releases, it's necessary to
+ embed the sysroot path in the include switches used to
+ find some system header files.  This results in the outputs of
+ the configure script varying depending on
+ which SDK version was used during configure.
+ That shouldn't pose any problem in simple scenarios, but if you are
+ trying to do something like building an extension on a different machine
+ than the server code was built on, you may need to force use of a
+ different sysroot path.  To do that, set PG_SYSROOT,
+ for example
+ 
+ make PG_SYSROOT=/desired/path
+ 
+ To find out the appropriate path on your machine, run
+ 
+ xcodebuild -version -sdk macosx Path
+ 
+ Currently, this only affects Perl and Tcl header files, so it should
+ only matter if you're building PL/Perl, PL/Tcl, or a related extension.
+
+ 
+
+ macOS's System Integrity
+ Protection (SIP) feature breaks make check,
+ because it prevents passing the needed setting
+ of DYLD_LIBRARY_PATH down to the executables being
+ tested.  You can work around that by doing make
+ install before make check.
+ Most Postgres developers just turn off SIP, though.
+
+   
+ 

 MinGW/Native Windows
  

Re: replication_slots usability issue

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 10:54:23AM -0700, Andres Freund wrote:
> On 2018-11-01 09:34:05 +0900, Michael Paquier wrote:
> That has absolutely nothing to do with the issue at hand though, so I
> don't think it'd have made much sense to do it at the same time. Nor do
> I think it's particularly important.

Thanks for the confirmation.

>> I don't mind doing so myself if you agree with the change, only on
>> HEAD as you seemed to disagree about changing that on back-branches.
> 
> Cool. And yes, I don't think a cosmetic log level adjustment that could
> affect people's scripts should be backpatched without need. Even if not
> particularly likely to break something.

No issues with your arguments.  I did the change this way.

>> Also, from 691d79a which you just committed:
>> +   ereport(FATAL,
>> +   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +errmsg("logical replication slots \"%s\" exists, but 
>> wal_level < logical",
>> +   NameStr(cp.slotdata.name)),
>> I can see one grammar mistake here, as you refer to only one slot here.
>> The error messages should read:
>> "logical replication slot \"%s\" exists, but wal_level < logical"
>> and:
>> "physical replication slot \"%s\" exists, but wal_level < replica"
> 
> Darnit. Fixed. Thanks.

And thanks for fixing that.
--
Michael


signature.asc
Description: PGP signature


Re: Compressed TOAST Slicing

2018-11-01 Thread Tom Lane
Paul Ramsey  writes:
> On Thu, Nov 1, 2018 at 2:29 PM Stephen Frost  wrote:
>> and secondly, why we wouldn't consider
>> handling a non-zero offset.  A non-zero offset would, of course, still
>> require decompressing from the start and then just throwing away what we
>> skip over, but we're going to be doing that anyway, aren't we?  Why not
>> stop when we get to the end, at least, and save ourselves the trouble of
>> decompressing the rest and then throwing it away.

> I was worried about changing the pg_lz code too much because it scared me,
> but debugging some stuff made me read it more closely so I fear it less
> now, and doing interior slices seems not unreasonable, so I will give it a
> try.

I think Stephen was just envisioning decompressing from offset 0 up to
the end of what's needed, and then discarding any data before the start
of what's needed; at least, that's what'd occurred to me.  It sounds like
you were thinking about hacking pg_lz to not write the leading data
anywhere.  While that'd likely be a win for cases where there was leading
data to discard, I'm worried about adding any cycles to the inner loops
of the decompressor.  We don't want to pessimize every other use of pg_lz
to buy a little bit for these cases.

regards, tom lane



Re: INSTALL file

2018-11-01 Thread Andreas 'ads' Scherbaum

On 01.11.18 18:41, Stephen Frost wrote:

Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:

On 01.11.18 07:26, Michael Paquier wrote:

It includes links to the website, as well as the short version of the
installation instructions.

+The installation instructions are listed on the website:
+
+https://www.postgresql.org/docs/current/static/install-short.html

I don't think we should link to the "Short version" directly here, the
above URL should be the "installation.html" one..  With a caveat, see
below.


How about the attached one? Picked up your draft, and cleaned it up a bit.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

diff --git a/README b/README
index 12de3f1d73..1f33303d80 100644
--- a/README
+++ b/README
@@ -13,14 +13,94 @@ PostgreSQL has many language interfaces, many of which are listed here:
 
 	https://www.postgresql.org/download
 
-See the file INSTALL for instructions on how to build and install
-PostgreSQL.  That file also lists supported operating systems and
-hardware platforms and contains information regarding any other
-software packages that are required to build or run the PostgreSQL
-system.  Copyright and license information can be found in the
-file COPYRIGHT.  A comprehensive documentation set is included in this
-distribution; it can be read as described in the installation
-instructions.
+
+Building on UNIX
+
+
+Detailed instructions for many Unix platforms are available here:
+https://www.postgresql.org/docs/current/static/installation.html
+
+To build PostgreSQL on most Unix variants, the following are required:
+
+GNU make, version 3.8 or newer
+ISO/ANSI C compiler (at least C99-compliant)
+Flex 2.5.31 or later, and Bison 1.875 or later (for building from git)
+Perl 5.8.3 (for building from git)
+
+PostgreSQL has many additional capabilities which can be enabled using
+configure --enable switches but many of those also depend on additional
+libraries.  See the installation instructions for details.
+
+To build PostgreSQL, run the following commands:
+
+./configure
+make
+
+PostgreSQL can then be installed using 'make install', which will
+require being a superuser to install into the default directory.
+The installation location can be changed by passing '--prefix' to
+'configure'.  Run './configure --help' for additional options.
+
+
+Building on Windows
+===
+
+Detailed instructions for building on Windows are available here:
+https://www.postgresql.org/docs/current/static/install-windows.html
+
+To build PostgreSQL on Windows, either Visual Studio Express 2017
+for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
+installed.  PostgreSQL can also be built with MinGW or Cygwin using
+the Unix instructions.
+
+There are different requirements for building on a 32-bit or 64-bit
+environment, check the documentation for details.
+
+
+Initializing your Database
+==
+
+Once the PostgreSQL software is installed, the first step to having a
+running database is to initialize a PostgreSQL database using the
+'initdb' command, eg:
+
+initdb -D /path/to/mydatabase
+
+Where '/path/to/mydatabase' is the directory where the database is
+going to be installed. This directory can exist, but must be empty.
+If it does not exist, 'initdb' will create it.
+
+After the database system has been initialized, PostgreSQL can be
+started by using the pg_ctl command:
+
+pg_ctl -D /path/to/mydatabase -l logfile start
+
+Once PostgreSQL is running, you can connect to it using the psql
+command-line client.  A default database called 'postgres' was created
+by 'initdb'.
+
+
+Building the PostgreSQL Documentation
+=
+
+Full documentation for PostgreSQL is available online here:
+https://www.postgresql.org/docs/current/static/index.html
+
+PostgreSQL uses DocBook to build the documentation. Therefore the
+DocBook tools must be installed. In addition, a working Java
+installation is required.
+
+To build PostgreSQL's documentation on Unix, run:
+
+./configure
+make docs
+
+The documentation, once built by 'make docs', will be available in
+various formats in the 'doc/src/sgml' directory.
+
+
+Download
+
 
 The latest version of this software may be obtained at
 https://www.postgresql.org/download/.  For more information look at our
PostgreSQL Database Management System
=

This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains C language bindings.

PostgreSQL has many language interfaces, many of which a

Re: INSTALL file

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:41:33PM -0400, Stephen Frost wrote:
> I'm not sure that I'm really following this, because we aren't pointing
> to the development documentation, just the 'current' documentation, and
> that seems fine, and there's links on that page to the other versions of
> the page for each major version of PostgreSQL, in case someone pulled an
> older branch or such.

Imagine for example that Postgres switches from ./configure to cmake: this
makes the current version of the documentation invalid for back-branches,
yet they would still refer to it.  If we target that stuff for
beginners, they may just look into those docs, without thinking that
they actually need the version of the docs with the branch checked out.

My 2c.
--
Michael


signature.asc
Description: PGP signature


Re: Pluggable Storage - Andres's take

2018-11-01 Thread Haribabu Kommi
On Wed, Oct 31, 2018 at 9:34 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Mon, 29 Oct 2018 at 05:56, Haribabu Kommi 
> wrote:
> >
> >> This problem couldn't be reproduced on the master branch, so I've tried to
> >> investigate it. It comes from nodeModifyTable.c:1267, when we've got
> >> HeapTupleInvisible as a result, and this value in turn comes from
> >> table_lock_tuple. Everything points to the new way of handling
> >> HeapTupleUpdated result from heap_update, when the table_lock_tuple call was
> >> introduced. Since I don't see anything similar in the master branch, can
> >> anyone clarify why this lock is necessary here?
> >
> >
> > In the master branch code also, there is a tuple lock that is happening in
> > the EvalPlanQual() function, but in the pluggable-storage code the lock is
> > kept outside, and there are also function call rearrangements, to make it
> > easier for the table access methods to provide their own MVCC implementation.
>
> Yes, now I see it, thanks. Also I can confirm that the attached patch solves
> this issue.
>

Thanks for the testing and confirmation.


> FYI, alongside reviewing the code changes I've run a few performance tests
> (that's why I hit this issue with pgbench in the first place). In case of high
> concurrency so far I see a small performance degradation in comparison with the
> master branch (about 2-5% of average latency, depending on the level of
> concurrency), but can't really say why exactly (perf just shows barely
> noticeable overhead here and there, maybe what I see is actually a cumulative
> impact).
>

Thanks for sharing your observation, I will also analyze and try to find out
the performance bottlenecks that are causing the overhead.

Here I attached the cumulative fixes of the patches, new API additions for
zheap and
basic outline of the documentation.

Regards,
Haribabu Kommi
Fujitsu Australia


0003-First-draft-of-pluggable-storage-documentation.patch
Description: Binary data


0002-New-API-s-are-added.patch
Description: Binary data


0001-Further-fixes-and-cleanup.patch
Description: Binary data


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 12:59:47PM -0400, Robert Haas wrote:
> If you do the same analysis for pg_start_backup(), you'll immediately
> notice that it calls get_backup_status(), and if you look at that
> function, you'll see that it returns the value of a global variable.
> If you check parallel.c, you'll find that that global variable is not
> one of the special ones that gets synchronized between a leader and
> the workers.  Therefore, if you ran this function in a worker, it
> might do the wrong thing, because it might get the wrong value of that
> global variable.  So it's at least (and in fact exactly)
> parallel-restricted.  It doesn't need to be parallel-restricted
> because it's scary in some ill-defined way or any of these other
> reasons advanced on this thread -- it needs to be parallel-restricted
> because it *relies on a global variable that is not synchronized to
> parallel workers*.  If somebody writes code to move that state out of
> a global variable and into the main shared memory segment or to a DSM
> shared between the leader and the workers, then it can be marked
> parallel-safe (unless, for example, it also depends on OTHER global
> variables that are NOT moved into shared storage).  Otherwise not.

My name is all over the commit logs for this area, so I kind of heard
about what those things rely on..  Both exclusive and non-exclusive
backup interfaces are definitely unsafe to run across multiple processes
in the same query.  Well, unsafe is incorrect for the pg_proc
definition, that's restricted :)

> I hope I'm making sense here.

You actually do make a lot of sense, moving at least this one person with MP
as initials to consider marking the function as parallel-safe.  Thanks for the
points you raised, what needs to be done looks clear now.
--
Michael


signature.asc
Description: PGP signature


Re: Commitfest 2018-11

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:19:20PM +0100, Dmitry Dolgov wrote:
> On Thu, 1 Nov 2018 at 13:05, Daniel Gustafsson  wrote:
>>
>> Dmitry: Are you still interested in running this commitfest?
> 
> Yes, I'm still interested. Do I need to have any special permissions
> in CF app for that (e.g. now I can't "push the button" to start the
> current one)?

Thanks Daniel for bringing up the topic, and to the person who changed the
CF status in the app.  Happy hacking to all!
--
Michael


signature.asc
Description: PGP signature


partitioned indexes and tablespaces

2018-11-01 Thread Alvaro Herrera
Hello

A customer reported to us that partitioned indexes are not working
consistently with tablespaces:

1. When a CREATE INDEX specifies a tablespace, existing partitions get
the index in the correct tablespace; however, the parent index itself
does not record the tablespace.  So when new partitions are created
later, they get the index in the default tablespace instead of the
specified tablespace.  Fix by saving the tablespace in the pg_class row
for the parent index.

2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
raise an error indicating that it's not the correct relation kind.  In
order for this to actually work, we need bespoke code for ATExecCmd();
the code for all other relation kinds wants to move storage (and runs in
Phase 3, later), but these indexes do not have that.  Therefore, write a
cut-down version which is invoked directly in ATExecCmd instead.

3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
the above change.
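
A minimal reproduction of the first problem, assuming a tablespace ts1
already exists (object names made up):

    CREATE TABLE parted (a int) PARTITION BY LIST (a);
    CREATE INDEX ON parted (a) TABLESPACE ts1;

    -- the parent index does not record ts1, so the index created for this
    -- new partition lands in the default tablespace rather than in ts1
    CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

    SELECT c.relname, t.spcname
      FROM pg_class c LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
     WHERE c.relname LIKE 'parted%idx';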

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 819dd64baf04d39f02785cde72b6a4c33fd10f3b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 1 Nov 2018 21:29:56 -0300
Subject: [PATCH] Fix tablespace handling for partitioned indexes

When creating partitioned indexes, the tablespace was not being saved
for the parent index. This meant that subsequently created partitions
would not use the right tablespace for their indexes.

ALTER INDEX SET TABLESPACE and ALTER INDEX ALL IN TABLESPACE also raised
errors when tried; fix them too.
---
 src/backend/catalog/heap.c| 10 -
 src/backend/commands/tablecmds.c  | 61 +--
 src/test/regress/input/tablespace.source  | 10 +
 src/test/regress/output/tablespace.source | 19 +-
 4 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 8c52a1543d..61ce44830b 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -297,7 +297,6 @@ heap_create(const char *relname,
 		case RELKIND_COMPOSITE_TYPE:
 		case RELKIND_FOREIGN_TABLE:
 		case RELKIND_PARTITIONED_TABLE:
-		case RELKIND_PARTITIONED_INDEX:
 			create_storage = false;
 
 			/*
@@ -306,6 +305,15 @@ heap_create(const char *relname,
 			 */
 			reltablespace = InvalidOid;
 			break;
+
+		case RELKIND_PARTITIONED_INDEX:
+			/*
+			 * Preserve tablespace so that it's used as tablespace for indexes
+			 * on future partitions.
+			 */
+			create_storage = false;
+			break;
+
 		case RELKIND_SEQUENCE:
 			create_storage = true;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 357c73073d..514c1c9d49 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -446,6 +446,8 @@ static bool ATPrepChangePersistence(Relation rel, bool toLogged);
 static void ATPrepSetTableSpace(AlteredTableInfo *tab, Relation rel,
 	const char *tablespacename, LOCKMODE lockmode);
 static void ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode);
+static void ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace);
+
 static void ATExecSetRelOptions(Relation rel, List *defList,
 	AlterTableType operation,
 	LOCKMODE lockmode);
@@ -3845,7 +3847,8 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_DROP;
 			break;
 		case AT_SetTableSpace:	/* SET TABLESPACE */
-			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX);
+			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX |
+ATT_PARTITIONED_INDEX);
 			/* This command never recurses */
 			ATPrepSetTableSpace(tab, rel, cmd->name, lockmode);
 			pass = AT_PASS_MISC;	/* doesn't actually matter */
@@ -4181,10 +4184,13 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			 */
 			break;
 		case AT_SetTableSpace:	/* SET TABLESPACE */
-
 			/*
-			 * Nothing to do here; Phase 3 does the work
+			 * Only do this for partitioned indexes, for which this is just
+			 * a catalog change.  Other relation types are handled by Phase 3.
 			 */
+			if (rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX)
+ATExecPartedIdxSetTableSpace(rel, tab->newTableSpace);
+
 			break;
 		case AT_SetRelOptions:	/* SET (...) */
 		case AT_ResetRelOptions:	/* RESET (...) */
@@ -11050,6 +11056,55 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 }
 
 /*
+ * Special handling of ALTER TABLE SET TABLESPACE for partitioned indexes,
+ * which have no storage (so not handled in Phase 3 like other relation types)
+ */
+static void
+ATExecPartedIdxSetTableSpace(Relation rel, Oid newTableSpace)
+{
+	HeapTuple	tuple;
+	Oid			oldTableSpace;
+	Relation	pg_class;
+	Form_pg_class rd_rel;
+	Oid			indexOid = RelationGetRelid(rel);
+
+	Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_INDEX);
+
+	/*
+	 * No work if no change in tablesp

Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 04:44:40PM +0530, Amit Kapila wrote:
> This sounds like a good argument for having a whitelist approach, but
> is it really a big problem if a user gets warning for files that the
> utility is not able to verify checksums for?  I think in some sense
> this message can be useful to the user as it can allow him to know
> which files are not verified by the utility for some form of
> corruption.  I guess one can say that users might not be interested in
> this information in which case such a check could be optional as you
> seem to be suggesting in the following paragraph.

The replication protocol supports NOVERIFY_CHECKSUMS to avoid the
warnings, so they are enabled by default and can be disabled at will.
pg_basebackup supports the same interface.
--
Michael




Re: INSTALL file

2018-11-01 Thread Stephen Frost
Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:
> How about the attached one? Picked up your draft, and cleaned it up a bit.

(unsurprisingly) this is looking pretty good to me.

A few additional notes:

> PostgreSQL Database Management System
> =
> 
> This directory contains the source code distribution of the PostgreSQL
> database management system.
> 
> PostgreSQL is an advanced object-relational database management system
> that supports an extended subset of the SQL standard, including
> transactions, foreign keys, subqueries, triggers, user-defined types
> and functions.  This distribution also contains C language bindings.
> 
> PostgreSQL has many language interfaces, many of which are listed here:
> 
>   https://www.postgresql.org/download

Honestly, I don't think the above link makes sense when we're saying
"here's where to go to download language interfaces."  We could maybe
link to https://www.postgresql.org/download/products/2-drivers-and-interfaces/
instead, but that link is really pretty messy, imv.

Instead, I'd just move the 'Download' section that's at the end up to
here.

> Building on Windows
> ===
> 
> Detailed instructions for building on Windows is available here:
> https://www.postgresql.org/docs/current/static/install-windows.html
> 
> To build PostgreSQL on Windows, either Visual Studio Express 2017
> for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
> installed.  PostgreSQL can also be build using MinGW or Cygwin using
> the Unix instructions.

The above should say "can also be built", not "build".

> Initializing your Database
> ==
> 
> Once the PostgreSQL software is installed, the first step to having a
> running database is to initialize a PostgreSQL database using the
> 'initdb' command, eg:
> 
> initdb -D /path/to/mydatabase

We probably shouldn't say 'eg' but should use language similar to what
we say above, which would be "run this command" or so.

> Where '/path/to/mydatabase' is the directory where the database is
> going to be installed. This directory can exist, but must be empty.
> If it does not exist, 'initdb' will create it.
> 
> After the database system has been initialized, PostgreSQL can be
> started by using the pg_ctl command:
> 
> pg_ctl -D /path/to/mydatabase -l logfile start
> 
> Once PostgreSQL is running, you can connect to it using the psql
> command-line client.  A default database called 'postgres' was created
> by 'initdb'.

I didn't include a full path intentionally, simply because paths look
different between Unix systems and Windows.  That said, I don't think
it's a big deal either way.

> Building the PostgreSQL Documentation
> =

I've been going back and forth on this, but it seems like building the
documentation should maybe be above the Initializing a database..?  I
could go either way on it.

> Download
> 
> 
> The latest version of this software may be obtained at
> https://www.postgresql.org/download/.  For more information look at our
> web site located at https://www.postgresql.org/.

I'd suggest we change this to be something like:

The latest version of this software, both in source code form and as
binary packages for many platforms, may be obtained at
https://www.postgresql.org/download/.  For more information please visit
https://www.postgresql.org/.

Thanks!

Stephen




Re: INSTALL file

2018-11-01 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Thu, Nov 01, 2018 at 01:41:33PM -0400, Stephen Frost wrote:
> > I'm not sure that I'm really following this, because we aren't pointing
> > to the development documentation, just the 'current' documentation, and
> > that seems fine, and there's links on that page to the other versions of
> > the page for each major version of PostgreSQL, in case someone pulled an
> > older branch or such.
> 
> Imagine for example that Postgres switches from ./configure to cmake,
> this makes the current version of the documentation invalid on
> back-branches, still they refer to it.  If we target that stuff for
> beginners, they may just look into those docs, without thinking that
> they actually need the version of the docs with the branch checked out.

I'd say we're targeting beginner developers, not beginner computer
users.

That said, I suppose it wouldn't be all that hard to have a different
version of the README for each branch and to keep it updated as we make
releases.

One point I'll make though is that we do need to have the README checked
into git, since that's where most people are getting the source.

If we go down this route, the master branch should probably link to the
regularly built devel documentation, so that if/when we do make such a
change, we'll point people at the updated documentation too.

Thanks!

Stephen




Re: CF app feature request

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 05:56:24PM -0400, Andrew Dunstan wrote:
> On 11/01/2018 05:50 PM, Magnus Hagander wrote:
>> Are you thinking basically another status that's "Withdrawn", but
>> keeping it, or actually removing the records completely?
> 
> I don't know enough about the app internals to comment. But it probably
> shouldn't appear in the stats, or else should have its own category in the
> stats.

Or that's closer to "Rejected by the author himself"?  "Withdrawn"
sounds like a good term for that, we surely don't want to simply remove
the thing entirely either.  What's actually the issue with not tracking
such things in the stats?
--
Michael




Re: INSTALL file

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 08:42:34PM -0400, Stephen Frost wrote:
> If we go down this route, the master branch should probably link to the
> regularly built devel documentation, so that if/when we do make such a
> change, we'll point people at the updated documentation too.

I don't know how others feel about such things, so I may be
overengineering a simple problem as well :)

Also, I have not looked in detail at the perl tools used to change the
version number in the tree, but we'd likely want a solution which also
updates the README automatically based on the version number, perhaps
while making sure that a git repository with the master branch in use
does not point to the development version of the docs.
--
Michael




Re: partitioned indexes and tablespaces

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
> A customer reported to us that partitioned indexes are not working
> consistently with tablespaces:

Let's see...

> 1. When a CREATE INDEX specifies a tablespace, existing partitions get
> the index in the correct tablespace; however, the parent index itself
> does not record the tablespace.  So when new partitions are created
> later, they get the index in the default tablespace instead of the
> specified tablespace.  Fix by saving the tablespace in the pg_class row
> for the parent index.

I may be missing something of course...  But partitioned tables don't
register the tablespace they were created with either, so it cannot be
used by partitions created on them later:
=# create tablespace popo location '/home/ioltas/data/tbspace';
CREATE TABLESPACE
=# create table aa (a int) partition by list (a) tablespace popo;
CREATE TABLE
=# create table aa_1 partition of aa for values in (1) tablespace popo;
CREATE TABLE
=# create table aa_2 partition of aa for values in (2);
CREATE TABLE
=# select t.spcname, c.relname from pg_class c, pg_tablespace t
where c.oid > 16000 and c.reltablespace = t.oid;
 spcname | relname
---------+---------
 popo    | aa_1
(1 row)

It seems to me that the current behavior is intended in this case, because
partitioned tables and partitioned indexes have no physical storage.

> 2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
> raise an error indicating that it's not the correct relation kind.  In
> order for this to actually work, we need bespoke code for ATExecCmd();
> the code for all other relation kinds wants to move storage (and runs in
> Phase 3, later), but these indexes do not have that.  Therefore, write a
> cut-down version which is invoked directly in ATExecCmd instead.

Using the previous example, this does not raise an error:
alter table aa set tablespace popo;
However, the reference to reltablespace in pg_class is not changed.  So I
would agree with your point that this should not raise an error, and with
back-patching that, but I don't agree with changing reltablespace for a
partitioned index, if that's what you mean.

> 3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
> the above change.

Reproducible with just the following stuff on top of the previous
example:
create index aai on aa(a);
alter index all in tablespace pg_default set tablespace popo;

In this case also raising an error is a bug, it seems to me that
partitioned indexes should just be ignored.

Could you add an entry in the next CF to not lose track of what is
discussed here?
--
Michael




Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-11-01 Thread Michael Paquier
On Thu, Nov 01, 2018 at 01:04:43PM +0900, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 12:39:16PM +0900, Amit Langote wrote:
>> Rajkumar pointed out off-list that the patch still remains to be applied.
>> Considering that there is a planned point release on Nov 8, maybe we
>> should do something about this?
> 
> Yes doing something about that very soon would be a good idea.  Tom,
> are you planning to look at it or should I jump in?

And so I am looking at v3 now...

Adding a test case in temp.sql would be nice.

Would it make sense to support TRUNCATE on a materialized view as well in
the future?  It seems to me that it is dangerous to assume that only plain
relations make use of heap_truncate_one_rel() anyway, as modules or
external code could perfectly well call it.  And the function is documented
to work on a relation, including materialized views, not just the ordinary
tables that RELKIND_RELATION covers.  By contrast, we know that
heap_truncate() works only on temporary relations; it is documented to do
so and does so.

So it seems to me that Tom was right to suggest adding the check in
heap_truncate(), not heap_truncate_one_rel(), and v3 looks incorrect to
me.
--
Michael




Re: CF app feature request

2018-11-01 Thread Andrew Dunstan




On 11/01/2018 08:40 PM, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 05:56:24PM -0400, Andrew Dunstan wrote:
>> On 11/01/2018 05:50 PM, Magnus Hagander wrote:
>>> Are you thinking basically another status that's "Withdrawn", but
>>> keeping it, or actually removing the records completely?
>>
>> I don't know enough about the app internals to comment. But it probably
>> shouldn't appear in the stats, or else should have its own category in the
>> stats.
>
> Or that's closer to "Rejected by the author himself"?  "Withdrawn"
> sounds like a good term for that, we surely don't want to simply remove
> the thing entirely either.  What's actually the issue with not tracking
> such things in the stats?



I don't have a strong opinion. It seemed to me that where something had 
been created in error it would be best simply to be able to undo that. 
But I can see arguments against.


cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: row filtering for logical replication

2018-11-01 Thread Euler Taveira
On Thu, 1 Nov 2018 at 05:30, Erik Rijkers  wrote:
> > I ran pgbench-over-logical-replication with a WHERE-clause and could
> > not get this to do a correct replication.  Below is the output of the
> > attached test program.
> >
> >
> > $ ./logrep_rowfilter.sh
>
Erik, thanks for testing.

> So it seems this bug is due to some timing error in your patch (or
> possibly in logical replication itself).
>
It is a bug in the new synchronization code. I'm doing some code
cleanup/review and will post a new patchset after I finish it. If you
want to give it a try again, apply the following patch.

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index e0eb73c..4797e0b 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -757,7 +757,7 @@ fetch_remote_table_info(char *nspname, char *relname,
 
 	/* Fetch row filtering info */
 	resetStringInfo(&cmd);
-	appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter, prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (", MyLogicalRepWorker->relid);
+	appendStringInfo(&cmd, "SELECT pg_get_expr(prrowfilter, prrelid) FROM pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE pr.prrelid = %u AND p.pubname IN (", lrel->remoteid);
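
For context, the filter being fetched here is the one attached to a
publication; with this patchset the publisher side would be declared
roughly as below (syntax as proposed earlier in the thread and still
subject to change; table and predicate are illustrative):

create table t (id int primary key, active boolean);
create publication pub_visible for table t where (active is true);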


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: partitioned indexes and tablespaces

2018-11-01 Thread Amit Langote
Hi,

On 2018/11/02 10:27, Michael Paquier wrote:
> On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:
>> A customer reported to us that partitioned indexes are not working
>> consistently with tablespaces:
> 
> Let's see...
> 
>> 1. When a CREATE INDEX specifies a tablespace, existing partitions get
>> the index in the correct tablespace; however, the parent index itself
>> does not record the tablespace.  So when new partitions are created
>> later, they get the index in the default tablespace instead of the
>> specified tablespace.  Fix by saving the tablespace in the pg_class row
>> for the parent index.
> 
> I may be missing something of course...  But partitioned tables don't
> register the tablespace they are on either so as it cannot be used by
> any partitions created on it:
> =# create tablespace popo location '/home/ioltas/data/tbspace';
> CREATE TABLESPACE
> =# create table aa (a int) partition by list (a) tablespace popo;
> CREATE TABLE
> =# create table aa_1 partition of aa for values in (1) tablespace popo;
> CREATE TABLE
> =# create table aa_2 partition of aa for values in (2);
> CREATE TABLE
> =# select t.spcname, c.relname from pg_class c, pg_tablespace t
> where c.oid > 16000 and c.reltablespace = t.oid;
>  spcname | relname
> ---------+---------
>  popo    | aa_1
> (1 row)
> 
> It seems to me that the current behavior is wanted in this case, because
> partitioned tables and partitioned indexes have no physical storage.

Keith Fiske complained about this behavior for partitioned *tables* a few
months ago, which led to the following discussion:

https://www.postgresql.org/message-id/flat/CAKJS1f9PXYcT%2Bj%3DoyL-Lquz%3DScNwpRtmD7u9svLASUygBdbN8w%40mail.gmail.com

It's Michael's message that was the last one on that thread. :)

https://www.postgresql.org/message-id/20180413224007.GB27295%40paquier.xyz

By the way, if we decide to do something about this, I think we should do
the same for partitioned tables.  More than one interesting behavior is
mentioned in the above thread for when the parent's reltablespace is set
or changed.  For example, when it is set, the existing partitions are not
moved; the new value only applies to subsequently created partitions.
Considering various such behaviors, this seems like new feature work,
which I'd suppose would only be considered for 12.

>> 2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
>> raise an error indicating that it's not the correct relation kind.  In
>> order for this to actually work, we need bespoke code for ATExecCmd();
>> the code for all other relation kinds wants to move storage (and runs in
>> Phase 3, later), but these indexes do not have that.  Therefore, write a
>> cut-down version which is invoked directly in ATExecCmd instead.
>
> Using the previous example, this does not raise an error:
> alter table aa set tablespace popo;
> However the reference to reltablespace in pg_class is not changed.  So I
> would agree with your point to not raise an error and back-patch that,
> but I don't agree with the point of changing reltablespace for a
> partitioned index if that's what you mean.
>
>> 3. ALTER INDEX ALL IN TABLESPACE, identical problem, is also fixed by
>> the above change.
> 
> Reproducible with just the following stuff on top of the previous
> example:
> create index aai on aa(a);
> alter index all in tablespace pg_default set tablespace popo;
> 
> In this case also raising an error is a bug, it seems to me that
> partitioned indexes should just be ignored.

Same argument here too.

IOW, I agree with Michael that if something will be back-patched to 11, it
should be a small patch to make the unsupported relkind error go away.

Thanks,
Amit




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-01 Thread Christian Ohler
On Wed, Oct 31, 2018 at 7:22 AM Tom Lane  wrote:

> If we're going to expose the
> internal format, let's just change the definition of the type's binary
> I/O format, thereby getting a win for purposes like COPY BINARY as well.
> We'd need to ensure that jsonb_recv could tell whether it was seeing the
> old or new format, but at worst that'd require prepending a header of
> some sort.  (In practice, I suspect we'd end up with a wire-format
> definition that isn't exactly the bits-on-disk, but something easily
> convertible to/from that and more easily verifiable by jsonb_recv.
> Numeric subfields, for instance, ought to match the numeric wire
> format, which IIRC isn't exactly the bits-on-disk either.)
>

How would this work from the driver's and application's perspective?  What
does the driver do when receiving JSONB data?

Applications currently receive JSON strings when reading JSONB data, and
the driver presumably has to stay compatible with that.  Does that mean the
driver transparently converts JSONB to JSON before handing it to the
application?  That scales better than doing it in Postgres itself, but is
still the kind of inefficiency we're trying to avoid.

We want to convert JSONB directly to language-native objects.  This
shouldn't be the responsibility of the Postgres driver, since the
conversion is complex and can be done in different ways, such as
instantiating objects from a class hierarchy vs instantiating generic
containers, or eager vs lazy conversion.  Applications that are sensitive
to JSONB performance likely want control over these aspects.  Postgres
drivers aren't coupled to specific JSON parsers; they shouldn't be coupled
to specific JSONB parsers either.

So, AFAICS, when the application requests JSONB data, the driver has to
hand it the raw JSONB bytes.  But that's incompatible with what currently
happens.  To preserve compatibility, does the application have to opt in by
setting some out-of-band per-query per-result-column flags to tell the
driver how it wants the JSONB data returned?  That's workable in principle
but bloats every driver's API with some rarely-used performance feature.
Seems much simpler to put this into the query.

The idea behind the proposal is to improve efficiency by avoiding
conversions, and the most straightforward way to do that is for every layer
to pass through the raw bytes.  With an explicit conversion to BYTEA in the
query, this is automatic without any changes to drivers, since every layer
already knows to leave BYTEAs untouched.
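
As a concrete sketch of that: assuming a function that hands back the raw
jsonb bytes as bytea (jsonb_raw_send below is a made-up name standing in
for whatever this patch ends up adding, and the events table is equally
illustrative), the opt-in lives entirely in the query text:

-- hypothetical function; today the column would come back as json text
select id, jsonb_raw_send(payload) as payload_raw
  from events
 where payload @> '{"type": "click"}';

The driver then sees an ordinary bytea column and passes the bytes through
untouched, and the application parses them with whichever JSONB parser it
prefers.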

I don't have an argument against _also_ adding a binary format version 2
for JSONB once we define a portable JSONB format; but I am not sure it
alone solves the problem we have.


Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

2018-11-01 Thread Steve Singer

On Thu, 1 Nov 2018, David Rowley wrote:



I ended up going with:

   However, this consideration only applies when wal_level is minimal for
   non-partitioned tables as all commands must write WAL otherwise.





I've changed this to:

 Rows will be frozen only if the table being loaded has been created
 or truncated in the current subtransaction, there are no cursors
 open and there are no older snapshots held by this transaction.  It is
 currently not possible to perform a COPY FREEZE on
 a partitioned table.

Which is just the original text with the new sentence tagged onto the end.
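
For reference, the pattern that paragraph describes looks roughly like
this (table and file names are purely illustrative):

begin;
truncate measurements;   -- table created or truncated in this (sub)transaction
copy measurements from '/tmp/measurements.csv' with (format csv, freeze);
commit;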


Other than that the patch looks fine and works as expected.

The new status of this patch is: Waiting on Author


Thanks for looking at this. I've attached an updated patch.


I am happy with the changes.
I think the patch is ready for a committer.




--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Steve



