Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-06-21 Thread Heikki Linnakangas

On 06/16/2017 01:24 PM, Shubham Barai wrote:

@@ -497,6 +499,13 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 		for (ptr = dist->next; ptr; ptr = ptr->next)
 			UnlockReleaseBuffer(ptr->buffer);
 	}
+
+	for (ptr = dist; ptr; ptr = ptr->next)
+		PredicateLockPageSplit(rel,
+							   BufferGetBlockNumber(buffer),
+							   BufferGetBlockNumber(ptr->buffer));
+
+


I think this new code needs to go before the UnlockReleaseBuffer() calls 
above. Calling BufferGetBlockNumber() on an already-released buffer is 
not cool.
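
Concretely, something along these lines (a sketch only, reusing the loops
from the hunk above):

	/* transfer predicate locks while the sibling buffers are still valid */
	for (ptr = dist; ptr; ptr = ptr->next)
		PredicateLockPageSplit(rel,
							   BufferGetBlockNumber(buffer),
							   BufferGetBlockNumber(ptr->buffer));

	/* ...and only then unlock and release them */
	for (ptr = dist->next; ptr; ptr = ptr->next)
		UnlockReleaseBuffer(ptr->buffer);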


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-06-21 Thread Heikki Linnakangas

On 06/21/2017 10:41 AM, Heikki Linnakangas wrote:

On 06/16/2017 01:24 PM, Shubham Barai wrote:

@@ -497,6 +499,13 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 		for (ptr = dist->next; ptr; ptr = ptr->next)
 			UnlockReleaseBuffer(ptr->buffer);
 	}
+
+	for (ptr = dist; ptr; ptr = ptr->next)
+		PredicateLockPageSplit(rel,
+							   BufferGetBlockNumber(buffer),
+							   BufferGetBlockNumber(ptr->buffer));
+
+


I think this new code needs to go before the UnlockReleaseBuffer() calls
above. Calling BufferGetBlockNumber() on an already-released buffer is
not cool.


.. and that's exactly what you fixed in your updated patch. Sorry for 
the noise :-)


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-21 Thread Mahendranath Gurram
Hi Thomas,



I like the whole idea.

In fact, I understood this from your very first response itself.

The only thing is that every time I have to check whether the DSA is attached
or not, i.e. whether get_my_shared_state() is NULL or not.

To avoid that check, I tried creating it in _PG_init() (in the postmaster
process itself) and all that stuff :(

Anyway, the solution you suggested will work for me. I'll live with that
if-check.



Thanks a lot for all your help.



I'll implement this and update the status.



Cheers :)



Thanks & Best Regards,

-Mahi
Teamwork divides the task and multiplies the success.








 On Wed, 21 Jun 2017 12:02:30 +0530 Thomas Munro wrote 

On Wed, Jun 21, 2017 at 5:27 PM, Mahendranath Gurram wrote:
> Initially I tried to design the same way.
> I mean, I created a background worker and created the dsa in it.
> I tried to attach/detach to the same dsa/dsm by all the backends (postgres
> clients/connections) during backend (client/connection) init/destroy.
> I didn't find any triggers or callbacks during backend init/close to
> attach/detach the dsa/dsm. Hence, I took this approach.
> If postgres has any such triggers/callbacks available, please let me know;
> that would be of great help to me.
>
> Anyway, now I understand I have taken a wrong approach to using dsa. I'll
> try to figure out some other way to build my in-memory index over postgres.

You definitely can use DSA or DSM for this.  As a matter of fact,
shared in-memory indexing was one of our target use cases.  You just
have to jump through a few hoops...  Here's one approach:

1.  Use _PG_init and the shmem hook to reserve a little bit of
traditional shared memory and initialise it to zero.  This will be
used just to share the DSA handle, but you can't actually create the
DSA area in postmaster.  In other words, this little bit of shared
memory is for "discovery", since it can be looked up by name from any
backend.

2.  In each backend that wants to use your new in-memory index system,
you need to be able to attach or create the DSA area on demand.
Perhaps you could have a get_my_shared_state() function (insert better
name) that uses a static local variable to hold a pointer to some
state.  If it's NULL, you know you need to create the state.  That
should happen only once in each backend, the first time through the
function.  In that case you need to create or attach to the DSA area
as appropriate, which you should wrap in
LWLockAcquire(AddinShmemInitLock,
LW_EXCLUSIVE)/LWLockRelease(AddinShmemInitLock) to serialise the code
block.  First, look up the bit of traditional shared memory to see if
there is a DSA handle published in it already.  If there is, you can
attach.  If there isn't, you are the first, so you need to create the
area and publish the handle for others to attach to.  Remember whatever
state you need to remember, such as the dsa_area, in static local
variables so that all future calls to get_my_shared_state() in that
backend will be fast.

If you do it that way, then it doesn't matter whether it's background
workers or foreground processes: whoever is first to call
get_my_shared_state() will create the DSA area (and do whatever else
you need to set things up).  All later callers will attach to the
existing one.  But each backend only ever has to enter the locked code
path once, and from then on it's fast and there's no lock.

Note that you could skip step 1 and not require a preload shared
library.  Then you'd be creating the 'discovery' shmem region on
demand too!  Just make sure it's in the locked region and pay
attention to the 'found' output variable to decide whether you're
first and need to initialise it.  Skipping step 1 is very slightly
against the rules though, because you'd be using a little piece of
memory that you didn't tell the postmaster to reserve space for with
RequestAddinShmemSpace.  It's very small though, so you might decide
that's OK...

If you don't like the idea of creating that shmem stuff on demand, you
could look into using session_preload_libraries as a way to get a
'backend init' hook.
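
To make the "skip step 1" variant concrete, here is a minimal sketch.
MySharedState, my_ext_discovery and get_my_shared_state are hypothetical
names, and error handling and LWLock tranche-name registration are omitted
for brevity:

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/dsa.h"

typedef struct MySharedState
{
	dsa_handle	handle;			/* published handle of the shared DSA area */
} MySharedState;

static dsa_area *my_area = NULL;	/* cached per-backend attachment */

static dsa_area *
get_my_shared_state(void)
{
	if (my_area == NULL)
	{
		bool		found;
		MySharedState *state;

		LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
		state = ShmemInitStruct("my_ext_discovery",
								sizeof(MySharedState), &found);
		if (!found)
		{
			/* We are first: create the area and publish its handle. */
			my_area = dsa_create(LWLockNewTrancheId());
			dsa_pin(my_area);	/* keep the area alive after we exit */
			state->handle = dsa_get_handle(my_area);
		}
		else
			my_area = dsa_attach(state->handle);
		dsa_pin_mapping(my_area);	/* stay mapped for the backend's life */
		LWLockRelease(AddinShmemInitLock);
	}
	return my_area;
}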

 

-- 
Thomas Munro
http://www.enterprisedb.com

 

 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers








Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-06-21 Thread Shubham Barai
Hi,

On 21 June 2017 at 13:11, Heikki Linnakangas  wrote:

> On 06/16/2017 01:24 PM, Shubham Barai wrote:
>
>> @@ -497,6 +499,13 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
>> 		for (ptr = dist->next; ptr; ptr = ptr->next)
>> 			UnlockReleaseBuffer(ptr->buffer);
>> 	}
>> +
>> +	for (ptr = dist; ptr; ptr = ptr->next)
>> +		PredicateLockPageSplit(rel,
>> +							   BufferGetBlockNumber(buffer),
>> +							   BufferGetBlockNumber(ptr->buffer));
>> +
>> +
>>
>
> I think this new code needs to go before the UnlockReleaseBuffer() calls
> above. Calling BufferGetBlockNumber() on an already-released buffer is not
> cool.
>
> - Heikki

I know that. This is an old version of the patch; I sent an updated patch
later. Please have a look at the updated patch.

Regards,
Shubham






Predicate-Locking-in-Gist-index_2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ASOF join

2017-06-21 Thread Thomas Munro
On Mon, Jun 19, 2017 at 11:57 PM, Konstantin Knizhnik
 wrote:
> I attached a simple patch adding ASOF join to Postgres. Right now it supports
> only outer join and requires a USING clause (consequently it is not possible
> to join two tables whose join keys have different names). Maybe it is also
> possible to support an ON clause with a condition written like o.k1 = i.k1 AND
> o.k2 = i.k2 AND ... AND o.kN >= i.kN
> But such notation can be confusing, because the join result includes only one
> matching inner record with kN smaller than or equal to the kN of the outer
> record, and not all such records.
> As alternative we can add specia

Hmm.  Yeah, I see the notational problem.  It's hard to come up with a
new syntax that has SQL nature.  What if... we didn't use a new syntax
at all, but recognised existing queries that are executable with this
strategy?  Queries like this:

WITH ticks(time, price) AS
   (VALUES ('2017-07-20 12:00:00'::timestamptz, 100.00),
   ('2017-07-21 11:00:00'::timestamptz, 150.00)),
 times(time) AS
   (VALUES ('2017-07-19 12:00:00'::timestamptz),
   ('2017-07-20 12:00:00'::timestamptz),
   ('2017-07-21 12:00:00'::timestamptz),
   ('2017-07-22 12:00:00'::timestamptz))

SELECT times.time, previous_tick.price
  FROM times
  LEFT JOIN LATERAL (SELECT * FROM ticks
  WHERE ticks.time <= times.time
  ORDER BY ticks.time DESC LIMIT 1) previous_tick ON true
 ORDER BY times.time;

          time          | price
------------------------+--------
 2017-07-19 12:00:00+12 |
 2017-07-20 12:00:00+12 | 100.00
 2017-07-21 12:00:00+12 | 150.00
 2017-07-22 12:00:00+12 | 150.00
(4 rows)

I haven't used LATERAL much myself but I've noticed that it's often
used to express this type of thing.  "Get me the latest ... as of time
...".

It'd be a bit like the way we recognise EXISTS (...) as a semi-join and
execute it with a join operator instead of having a SEMI JOIN syntax.
On the other hand it's a bit more long-winded, extreme and probably
quite niche.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Etsuro Fujita
Commit d3cc37f1d801a6b5cad9bf179274a8d767f1ee50 added this to 
ExecInitModifyTable:


+	/* The root table RT index is at the head of the partitioned_rels list */
+	if (node->partitioned_rels)
+	{
+		Index		root_rti;
+		Oid			root_oid;
+
+		root_rti = linitial_int(node->partitioned_rels);
+		root_oid = getrelid(root_rti, estate->es_range_table);
+		rel = heap_open(root_oid, NoLock);	/* locked by InitPlan */
+	}

but I noticed that that function doesn't use the relation descriptor at
all.  Since partitioned_rels is given in the case of an UPDATE/DELETE on a
partitioned table, the relation is opened in that case, but the relation
descriptor isn't referenced at all when initializing WITH CHECK OPTION
constraints and/or RETURNING projections.  (The mtstate->resultRelInfo
array is referenced in those initializations instead.)  So, I'd like to
propose removing this from that function.  Attached is a patch for that.


Best regards,
Etsuro Fujita
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
***************
*** 45,51 ****
  #include "foreign/fdwapi.h"
  #include "miscadmin.h"
  #include "nodes/nodeFuncs.h"
- #include "parser/parsetree.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "utils/builtins.h"
--- 45,50 ----
***************
*** 1767,1786 **** ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
  
  	estate->es_result_relation_info = saved_resultRelInfo;
  
- 	/* The root table RT index is at the head of the partitioned_rels list */
- 	if (node->partitioned_rels)
- 	{
- 		Index		root_rti;
- 		Oid			root_oid;
- 
- 		root_rti = linitial_int(node->partitioned_rels);
- 		root_oid = getrelid(root_rti, estate->es_range_table);
- 		rel = heap_open(root_oid, NoLock);	/* locked by InitPlan */
- 	}
- 	else
- 		rel = mtstate->resultRelInfo->ri_RelationDesc;
- 
  	/* Build state for INSERT tuple routing */
  	if (operation == CMD_INSERT &&
  		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
  	{
--- 1766,1773 ----
  
  	estate->es_result_relation_info = saved_resultRelInfo;
  
  	/* Build state for INSERT tuple routing */
+ 	rel = mtstate->resultRelInfo->ri_RelationDesc;
  	if (operation == CMD_INSERT &&
  		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
  	{
***************
*** 1959,1968 **** ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
  		mtstate->ps.ps_ExprContext = NULL;
  	}
  
- 	/* Close the root partitioned rel if we opened it above. */
- 	if (rel != mtstate->resultRelInfo->ri_RelationDesc)
- 		heap_close(rel, NoLock);
- 
  	/*
  	 * If needed, Initialize target list, projection and qual for ON CONFLICT
  	 * DO UPDATE.
--- 1946,1951 ----

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSOC][weekly report 3] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-21 Thread Mengxing Liu

> From: "Heikki Linnakangas" 
>
> Hmm. The hash table ought to speed up the RWConflictExists() function 
> right? Where in the flame graph is RWConflictExists()? If it only 
> accounts for a small amount of the overall runtime, even drastic speedup 
> there won't make much difference to the total.
>

Yes. It is much smaller than we had expected. I did a small test for HTAB and
linked list:
when the set size is smaller than 10, the linked list is as quick as HTAB.
Here is the result
(times in microseconds for 10 million searches):

setSize: 5, LIST: 423569, HTAB: 523681
setSize: 10, LIST: 540579, HTAB: 430111
setSize: 15, LIST: 752879, HTAB: 429519
setSize: 20, LIST: 940792, HTAB: 431388
setSize: 25, LIST: 1163760, HTAB: 446043
setSize: 30, LIST: 1352990, HTAB: 429057
setSize: 35, LIST: 1524097, HTAB: 431547
setSize: 40, LIST: 1714976, HTAB: 429023

As we can see, the hash table only benefits in a very extreme situation.
However, even with 100 concurrent connections, the average length of the
conflict list is about 10. The linked list is not the bottleneck.

> 
> Nope, there is no such function, I'm afraid.
> 

Oh, that's really bad news.

> > In a previous email, I reported that many backends wait for the lock 
> > “SerializableFinishedListLock”;
> > If we don't implement functions like “ReleaseOneSerializableXact” quickly, 
> > they will be the bottleneck.
> 
> Yeah, that's probably what's causing the decrease in throughput you're 
> seeing.
> 

New evidence: I used "SELECT wait_event_type, wait_event FROM
pg_stat_activity;" and summed by event type to analyze the wait events.

The result of original version:
SerializableXactHashLock 27
predicate_lock_manager 512
WALWriteLock 3
SerializableFinishedListLock 402

The result of new version:
SerializableXactHashLock 38
predicate_lock_manager 473
WALWriteLock 3
SerializableFinishedListLock 1068

Obviously, more backends are blocked by SerializableFinishedListLock. 

>
> You might need to also keep the linked lists, and use the hash table to 
> only look up particular items in the linked list faster.
> 
> I was surprised to see that you're creating a lot of smallish hash 
> tables, three for each PredXact entry. I would've expected all the 
> PredXact entries to share the same hash tables, i.e. have only three 
> hash tables in total. That would be more flexible in allocating 
> resources among entries. (It won't help with the quick-release, though)
>

Yes, it would look more elegant and require less memory
(because hash table objects also consume memory).
But purely for performance it would be less efficient than my patch,
because it has to maintain linked lists besides hash tables.  In other words,
it does more work than my patch.

Another point is that removing the linked list may offer more opportunities to
reduce lock contention. Otherwise, we need a global lock to protect the linked
list.

--
Sincerely


Mengxing Liu










-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Comment typo in execMain.c

2017-06-21 Thread Etsuro Fujita
Here is a patch to fix a typo in a comment in ExecWithCheckOptions(): 
s/as as/as/.


Best regards,
Etsuro Fujita
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7f460bd..9dbe175 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2093,8 +2093,8 @@ ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo,
 				 * the permissions on the relation (that is, if the user
 				 * could view it directly anyway).  For RLS violations, we
 				 * don't include the data since we don't know if the user
-				 * should be able to view the tuple as as that depends on
-				 * the USING policy.
+				 * should be able to view the tuple as that depends on the
+				 * USING policy.
 				 */
 			case WCO_VIEW_CHECK:
 				insertedCols = GetInsertedColumns(resultRelInfo, estate);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-06-21 Thread Heikki Linnakangas

(I'm cleaning up my inbox, hence the delayed reply)

On 08/02/2016 10:51 PM, Robert Haas wrote:

On Tue, Aug 2, 2016 at 2:33 PM, Bruce Momjian  wrote:

On Tue, Jul 26, 2016 at 05:42:43PM -0400, Chapman Flack wrote:

Even so, I'd be curious whether it would break anything to have
xlp_pageaddr simply set to InvalidXLogRecPtr in the dummy zero
pages written to fill out a segment. At least until it's felt
that archive_timeout has been so decidedly obsoleted by streaming
replication that it is removed, and the log-tail zeroing code
with it.

That at least would eliminate the risk of anyone else repeating
my astonishment. :)  I had read that 9.4 added built-in log-zeroing
code, and my first reaction was "cool! that may make the compression
technique we're using unnecessary, but certainly can't make it worse"
only to discover that it did, by ~ 300x, becoming now 3x *worse* than
plain gzip, which itself is ~ 100x worse than what we had.


My guess is that the bytes are there to detect problems where a 512-byte
disk sector is zeroed by a disk failure.  I don't see use changing that
for the use-case you have described.


Is there actually any code that makes such a check?

I'm inclined to doubt that was the motivation, though admittedly we're
both speculating about the contents of Heikki's brain, a tricky
proposition on a good day.


Given that we used to just leave them as garbage, it seems pretty safe 
to zero them out now.


It's kind of nice that all the XLOG pages have valid page headers. One 
way to think of the WAL switch record is that it's a very large WAL 
record that just happens to consume the rest of the WAL segment. Except 
that it's not actually represented like that; the xl_tot_len field of an 
XLOG switch record does not include the zeroed out portion. Instead, 
there's special handling in the reader code, that skips to the end of 
the segment when it sees a switch record. So that point is moot.


When I wrote that code, I don't remember if I realized that we're 
initializing the page headers, or if I thought that it's good enough 
even if we do. I guess I didn't realize it, because a comment would've 
been in order if it was intentional.


So +1 on fixing that; a patch would be welcome. In the meantime, have
you tried using a different compression program? Something other than
gzip might do a better job on the almost-zero pages.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Comment typo in execMain.c

2017-06-21 Thread Heikki Linnakangas

On 06/21/2017 11:35 AM, Etsuro Fujita wrote:

Here is a patch to fix a typo in a comment in ExecWithCheckOptions():
s/as as/as/.


Fixed, thanks!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Amit Langote
Fujita-san,

On 2017/06/21 16:59, Etsuro Fujita wrote:
> Commit d3cc37f1d801a6b5cad9bf179274a8d767f1ee50 added this to
> ExecInitModifyTable:
> 
> +   /* The root table RT index is at the head of the partitioned_rels list */
> +   if (node->partitioned_rels)
> +   {
> +   Index   root_rti;
> +   Oid root_oid;
> +
> +   root_rti = linitial_int(node->partitioned_rels);
> +   root_oid = getrelid(root_rti, estate->es_range_table);
> +   rel = heap_open(root_oid, NoLock);  /* locked by InitPlan */
> +   }
> 
> but I noticed that that function doesn't use the relation descriptor at
> all.  Since partitioned_rels is given in case of an UPDATE/DELETE on a
> partitioned table, the relation is opened in that case, but the relation
> descriptor isn't referenced at all in initializing WITH CHECK OPTION
> constraints and/or RETURNING projections.  (The mtstate->resultRelinfo
> array is referenced in those initialization, instead.)  So, I'd like to
> propose to remove this from that function.  Attached is a patch for that.

Thanks for cleaning that up.  I cannot see any problem in applying the patch.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Amit Langote
On 2017/06/21 3:53, Robert Haas wrote:
> On Tue, Jun 20, 2017 at 2:54 AM, Amit Khandekar  
> wrote:
>>> I guess I don't see why it should work like this.  In the INSERT case,
>>> we must build withCheckOption objects for each partition because those
>>> partitions don't appear in the plan otherwise -- but in the UPDATE
>>> case, they're already there, so why do we need to build anything at
>>> all?  Similarly for RETURNING projections.  How are the things we need
>>> for those cases not already getting built, associated with the
>>> relevant resultRelInfos?  Maybe there's a concern if some children got
>>> pruned - they could turn out later to be the children into which
>>> tuples need to be routed. But the patch makes no distinction
>>> between possibly-pruned children and any others.
>>
>> Yes, only a subset of the partitions appear in the UPDATE subplans. I
>> think typically for updates, a very small subset of the total leaf
>> partitions will be there in the plans, others would get pruned. IMHO,
>> it would not be worth having an optimization where it opens only those
>> leaf partitions which are not already there in the subplans. Without
>> the optimization, we are able to re-use the INSERT infrastructure
>> without additional changes.
> 
> Well, that is possible, but certainly not guaranteed.  I mean,
> somebody could do a whole-table UPDATE, or an UPDATE that hits a
> smattering of rows in every partition; e.g. the table is partitioned
> on order number, and you do UPDATE lineitem SET product_code = 'K372B'
> WHERE product_code = 'K372'.
> 
> Leaving that aside, the point here is that you're rebuilding
> withCheckOptions and returningLists that have already been built in
> the planner.  That's bad for two reasons.  First, it's inefficient,
> especially if there are many partitions.  Second, it will amount to a
> functional bug if you get a different answer than the planner did.
> Note this comment in the existing code:
> 
> /*
>  * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
>  * that we didn't build the withCheckOptionList for each partition within
>  * the planner, but simple translation of the varattnos for each partition
>  * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
>  * cases are handled above.
>  */
> 
> The comment "UPDATE/DELETE cases are handled above" is referring to
> the code that initializes the WCOs generated by the planner.  You've
> modified the comment in your patch, but the associated code: your
> updated comment says that only "DELETEs and local UPDATES are handled
> above", but in reality, *all* updates are still handled above.  And
> then they are handled again here.  Similarly for returning lists.
> It's certainly not OK for the comment to be inaccurate, but I think
> it's also bad to redo the work which the planner has already done,
> even if it makes the patch smaller.

I guess this has to do with the UPDATE turning into DELETE+INSERT.  So, it
seems like WCOs are being initialized for the leaf partitions
(ResultRelInfos in the mt_partitions array) that are in turn
initialized for the aforementioned INSERT.  Hence the term "...local
UPDATEs" in the new comment text.

If that's true, I wonder if it makes sense to apply what would be
WCO_RLS_UPDATE_CHECK to a leaf partition that the tuple will be moved into
by calling ExecInsert()?

> Also, I feel like it's probably not correct to use the first result
> relation as the nominal relation for building WCOs and returning lists
> anyway.  I mean, if the first result relation has a different column
> order than the parent relation, isn't this just broken?  If it works
> for some reason, the comments don't explain what that reason is.

Yep, it's more appropriate to use
ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
is, if answer to the question I raised above is positive.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-06-21 Thread sanyam jain
Hi,
After changing
sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
to
sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;

I was facing another issue:
on promotion of a cascaded server, ThisTimeLineID in the standby server holding
the logical slot becomes 0.
So I added a call to GetStandbyFlushRecPtr in
StartLogicalReplication, which updates ThisTimeLineID.

After the above two changes, timeline following is working, but I'm not sure
whether this is correct. In any case, could someone please clarify?
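
For reference, the two changes described above amount to roughly this (a
sketch; the return value of GetStandbyFlushRecPtr() is discarded because it
is called only for its side effect of updating ThisTimeLineID):

	/* while reading WAL for decoding: historic = not the current timeline */
	sendTimeLineIsHistoric = (state->currTLI != ThisTimeLineID);

	/* in StartLogicalReplication(), before decoding starts on a
	 * freshly promoted standby */
	(void) GetStandbyFlushRecPtr();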

Thanks,
Sanyam Jain


Re: [HACKERS] ASOF join

2017-06-21 Thread Konstantin Knizhnik



On 21.06.2017 11:00, Thomas Munro wrote:

Hmm.  Yeah, I see the notational problem.  It's hard to come up with a
new syntax that has SQL nature.  What if... we didn't use a new syntax
at all, but recognised existing queries that are executable with this
strategy?  Queries like this:

WITH ticks(time, price) AS
(VALUES ('2017-07-20 12:00:00'::timestamptz, 100.00),
('2017-07-21 11:00:00'::timestamptz, 150.00)),
  times(time) AS
(VALUES ('2017-07-19 12:00:00'::timestamptz),
('2017-07-20 12:00:00'::timestamptz),
('2017-07-21 12:00:00'::timestamptz),
('2017-07-22 12:00:00'::timestamptz))

SELECT times.time, previous_tick.price
   FROM times
   LEFT JOIN LATERAL (SELECT * FROM ticks
   WHERE ticks.time <= times.time
   ORDER BY ticks.time DESC LIMIT 1) previous_tick ON true
  ORDER BY times.time;

          time          | price
------------------------+--------
 2017-07-19 12:00:00+12 |
 2017-07-20 12:00:00+12 | 100.00
 2017-07-21 12:00:00+12 | 150.00
 2017-07-22 12:00:00+12 | 150.00
(4 rows)

I haven't used LATERAL much myself but I've noticed that it's often
used to express this type of thing.  "Get me the latest ... as of time
...".

It'd be a bit like the way we recognise EXISTS (...) as a semi-join and
execute it with a join operator instead of having a SEMI JOIN syntax.
On the other hand it's a bit more long-winded, extreme and probably
quite niche.

Thank you for this idea. I agree that it is the best way of implementing
ASOF join - just as an optimization of a standard SQL query.
But do you think it would still be a good idea to extend the SQL syntax
with an ASOF JOIN ... USING ... clause? It would significantly simplify
writing queries like the above,
and IMHO doesn't introduce any confusion with standard SQL syntax. My
primary idea in suggesting ASOF join for Postgres was not just building a
more efficient plan (using merge join instead of nested loop) but also
simplifying the writing of such queries. Or do you think that nobody will be
interested in non-standard SQL extensions?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rules on table partitions

2017-06-21 Thread Dean Rasheed
On 20 June 2017 at 03:01, Amit Langote  wrote:
> Hmm, yes.  The following exercise convinced me.
>
> create table r (a int) partition by range (a);
> create table r1 partition of r for values from (1) to (10);
> create rule "_RETURN" as on select to r1 do instead select * from r;
>
> insert into r values (1);
> ERROR:  cannot insert into view "r1"
> HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT
> trigger or an unconditional ON INSERT DO INSTEAD rule.
>
> The error is emitted by CheckValidResultRel() that is called on individual
> leaf partitions when setting up tuple-routing in ExecInitModifyTable.
>
> I agree that we should forbid this case, so please find attached a patch.
>

Committed.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for CSN based snapshots

2017-06-21 Thread Alexander Kuzmenkov

On 21.06.2017 04:48, Michael Paquier wrote:

There has not been much activity on this thread for some time, and I
mentioned my intentions to some developers at the last PGCon. But I am
planning to study more the work that has been done here, with as
envisaged goal to present a patch for the first CF of PG11. Lots of
fun ahead.


Hi Michael,

Glad to see you working on this! I've been studying this topic too. 
Attached you can find a recently rebased version of Heikki's v4 patch.
I also fixed a bug that appeared in the report-receipts isolation test:
XidIsConcurrent should report that a transaction is concurrent to ours 
when its csn equals our snapshotcsn.
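
If I read that right, the fix presumably turns a strict comparison into an
inclusive one, along these lines (a guess at the shape of the change; the
actual code is in the attached patch):

	/* concurrent if it committed at or after our snapshot was taken */
	return csn >= snapshot->snapshotcsn;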


There is another bug where multiple tuples with the same primary key
appear after a pgbench read-write test. Querying pgbench_branches with
index scans disabled, I see several tuples with the same 'bid' value.
Usually they do not touch the index, so there are no errors. But
sometimes a HOT update is not possible, and it errors out with a violation of
the unique constraint, which is how I noticed it. I can't reproduce it on my
machine and have to use a 72-core one.


For now I can conclude that oldestActiveXid is not always updated
correctly. In TransactionIdAsyncCommitTree, just before the transaction
sets its clog status, its xid can be less than oldestActiveXid. Other
transactions see it as aborted for some time before it writes to
clog (see TransactionIdGetCommitSeqNo). They can update a tuple it
deleted, and that leads to duplication. Unfortunately, I haven't had
time yet to investigate this further.

--

Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



csn-5.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Yugo Nagata
Hi,

I have found that we can cancel/terminate autovacuum launchers and
background worker processes with the pg_cancel/terminate_backend functions.
I'm wondering whether this behavior is expected, and if not, I want to fix it.


The current pg_stat_activity shows background workers and the autovacuum
launcher as below, which is what made me ask the question.

postgres=# select pid, wait_event, backend_type from pg_stat_activity ;
  pid  |     wait_event      |    backend_type
-------+---------------------+---------------------
 30902 | LogicalLauncherMain | background worker
 30900 | AutoVacuumMain      | autovacuum launcher
 30923 |                     | client backend
 30898 | BgWriterMain        | background writer
 30897 | CheckpointerMain    | checkpointer
 30899 | WalWriterMain       | walwriter
(6 rows)

We cannot use pg_terminate/cancel_backend for most processes
except client backends. For example, when I tried to terminate
the background writer, I got a warning and failed.

postgres=# select pg_terminate_backend(30899);
WARNING:  PID 30899 is not a PostgreSQL server process
 pg_terminate_backend 
----------------------
 f
(1 row)

However, we can terminate background workers by pg_terminate_backend.
In the following example, I terminated the logical replication launcher,
and this process did not appear again[1]. 

postgres=# select pg_terminate_backend(30902);
 pg_terminate_backend 
----------------------
 t
(1 row)

postgres=# select pid, wait_event, backend_type from pg_stat_activity ;
  pid  |    wait_event     |    backend_type
-------+-------------------+---------------------
 30900 | AutoVacuumMain    | autovacuum launcher
 30923 |                   | client backend
 30898 | BgWriterHibernate | background writer
 30897 | CheckpointerMain  | checkpointer
 30899 | WalWriterMain     | walwriter
(5 rows)

Similarly, we can terminate the autovacuum launcher with pg_terminate_backend,
but in this case a new process is restarted by the postmaster.[2]

postgres=# select pg_terminate_backend(30900);
 pg_terminate_backend 
----------------------
 t
(1 row)

postgres=# select pid, wait_event, backend_type from pg_stat_activity ;
  pid  |    wait_event     |    backend_type
-------+-------------------+---------------------
 32483 | AutoVacuumMain    | autovacuum launcher
 30923 |                   | client backend
 30898 | BgWriterHibernate | background writer
 30897 | CheckpointerMain  | checkpointer
 30899 | WalWriterMain     | walwriter
(5 rows)


My question is whether the behavior of pg_terminate/cancel_backend is
expected. If these functions should succeed only for client backends,
we need to fix the behavior. Attached is a patch to fix it in that case.

In my patch, the process type is checked in pg_signal_backend(), and if it is
a background worker or the autovacuum launcher, we throw a warning and fail.
BackendPidGetProc() returns a valid PGPROC for processes that are initialized
by InitPostgres(), and, in my understanding, all such processes are client
backends, background workers, and the autovacuum launcher. So, if a process is
neither a background worker nor the autovacuum launcher, it should be
a normal client backend. For this check, I added a new field,
isAutoVacuumLauncher, to PGPROC.
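
The shape of the check would be roughly this (a hypothetical sketch following
the description above; isAutoVacuumLauncher is the new PGPROC field the patch
adds, while isBackgroundWorker already exists):

	PGPROC	   *proc = BackendPidGetProc(pid);

	if (proc != NULL &&
		(proc->isBackgroundWorker || proc->isAutoVacuumLauncher))
	{
		ereport(WARNING,
				(errmsg("PID %d is not a client backend", (int) pid)));
		return SIGNAL_BACKEND_ERROR;
	}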

Any comments would be appreciated.

-
[1]
AFAIK, we have to restart the server to enable logical replication after this.
I'm not sure this is expected, but I found the following comment in
ProcessInterrupts(). Does "can be stopped at any time" mean that we can
drop this process completely?

	else if (IsLogicalLauncher())
	{
		ereport(DEBUG1,
				(errmsg("logical replication launcher shutting down")));

		/* The logical replication launcher can be stopped at any time. */
		proc_exit(0);
	}

When the logical replication launcher receives SIGTERM, it exits with exit
status 0, so it is not restarted by the postmaster.

[2]
On the other hand, when we use pg_cancel_backend for autovacuum launcher,
it causes the following error. I'll report the detail in another thread.

 ERROR:  can't attach the same segment more than once

-

Regards,

-- 
Yugo Nagata 


pg_terminate_backend.pach
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Yugo Nagata
Hi,

As I reported in another thread[1], I found that the autovacuum launcher raises
the following error in PG 10 when it receives SIGINT. I can reproduce
this with pg_cancel_backend or `kill -2 <pid>`.

2017-06-21 13:56:07.010 JST [32483] ERROR:  canceling statement due to user request
2017-06-21 13:56:08.022 JST [32483] ERROR:  can't attach the same segment more than once
2017-06-21 13:56:09.034 JST [32483] ERROR:  can't attach the same segment more than once
2017-06-21 13:56:10.045 JST [32483] ERROR:  can't attach the same segment more than once
...

These errors continue until the process is terminated or the server is restarted.

When SIGINT is issued, the process exits from the main loop and returns
to sigsetjmp, then calls dsa_attach() before entering the loop again,
which causes the error.

We can fix it by calling dsa_attach() before sigsetjmp. Attached is the patch. 
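
In outline, the failure mode and the fix look like this (a simplified sketch,
not the literal autovacuum.c code):

	/* runs exactly once, before the error-recovery point */
	AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
	dsa_pin_mapping(AutoVacuumDSA);

	if (sigsetjmp(local_sigjmp_buf, 1) != 0)
	{
		/*
		 * Error recovery lands here and falls through to the loop below.
		 * Any dsa_attach() placed after this point is executed again after
		 * every caught error and fails with "can't attach the same segment
		 * more than once".
		 */
	}

	while (!got_SIGTERM)
	{
		/* main loop; a cancel interrupt longjmps back to the sigsetjmp */
	}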

Regards,

[1] 
https://www.postgresql.org/message-id/20170621205657.61d90605.nagata%40sraoss.co.jp

-- 
Yugo Nagata 
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 89dd3b3..8a41a98 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -502,6 +502,28 @@ AutoVacLauncherMain(int argc, char *argv[])
 	MemoryContextSwitchTo(AutovacMemCxt);
 
 	/*
+	 * Set up our DSA so that backends can install work-item requests.  It may
+	 * already exist as created by a previous launcher.
+	 */
+	if (!AutoVacuumShmem->av_dsa_handle)
+	{
+		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
+		AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
+		/* make sure it doesn't go away even if we do */
+		dsa_pin(AutoVacuumDSA);
+		dsa_pin_mapping(AutoVacuumDSA);
+		AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
+		/* delay array allocation until first request */
+		AutoVacuumShmem->av_workitems = InvalidDsaPointer;
+		LWLockRelease(AutovacuumLock);
+	}
+	else
+	{
+		AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
+		dsa_pin_mapping(AutoVacuumDSA);
+	}
+
+	/*
 	 * If an exception is encountered, processing resumes here.
 	 *
 	 * This code is a stripped down version of PostgresMain error recovery.
@@ -610,28 +632,6 @@ AutoVacLauncherMain(int argc, char *argv[])
 	 */
 	rebuild_database_list(InvalidOid);
 
-	/*
-	 * Set up our DSA so that backends can install work-item requests.  It may
-	 * already exist as created by a previous launcher.
-	 */
-	if (!AutoVacuumShmem->av_dsa_handle)
-	{
-		LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
-		AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
-		/* make sure it doesn't go away even if we do */
-		dsa_pin(AutoVacuumDSA);
-		dsa_pin_mapping(AutoVacuumDSA);
-		AutoVacuumShmem->av_dsa_handle = dsa_get_handle(AutoVacuumDSA);
-		/* delay array allocation until first request */
-		AutoVacuumShmem->av_workitems = InvalidDsaPointer;
-		LWLockRelease(AutovacuumLock);
-	}
-	else
-	{
-		AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
-		dsa_pin_mapping(AutoVacuumDSA);
-	}
-
 	/* loop until shutdown request */
 	while (!got_SIGTERM)
 	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andrew Dunstan


On 06/20/2017 08:30 PM, Tom Lane wrote:
> Michael Paquier  writes:
>> On Wed, Jun 21, 2017 at 8:43 AM, Tom Lane  wrote:
>>> Yeah, I thought it would work fine with Makefile-using Windows toolchains.
>>> But people who use MSVC need something else, no?
>> Are there that many anyway who care?
> Well, *I* don't care, but I thought we wanted to support Windows-using
> developers as reasonably first-class citizens.
>
>   



I imagine we can rig up something fairly simple based on Craig Ringer's
recent work in building extensions on Windows using cmake.

I'll take a look when I get a chance, but I don't think it's a high
priority.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-21 Thread Jeevan Ladhe
Hi,

Sorry for being away from here.
I had some issues with my laptop, and I have resumed working on this.

On Thu, Jun 15, 2017 at 1:21 AM, Robert Haas  wrote:

> On Wed, Jun 14, 2017 at 8:02 AM, Jeevan Ladhe
>  wrote:
> > Here are the details of the patches in attached zip.
> > 0001. refactoring existing ATExecAttachPartition  code so that it can be
> > used for
> > default partitioning as well
> > 0002. support for default partition with the restriction of preventing
> > addition
> > of any new partition after default partition.
> > 0003. extend default partitioning support to allow addition of new
> > partitions.
> > 0004. extend default partitioning validation code to reuse the refactored
> > code
> > in patch 0001.
>
> I think the core ideas of this patch are pretty solid now.  It's come
> a long way in the last month.  High-level comments:
>

Thanks Robert for looking into this.


> - Needs to be rebased over b08df9cab777427fdafe633ca7b8abf29817aa55.
>

Will rebase.


> - Still no documentation.
> - Should probably be merged with the patch to add default partitioning
> for ranges.
>
Will try to get this soon.

Regards,
Jeevan Ladhe


[HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Yugo Nagata
Hi,

In the documentation[1], there is the following description:

 "pg_stat_activity does not show an entry for the Startup process"

However, the current pg_stat_activity shows the startup process's entry.

postgres=# select pid, backend_type from pg_stat_activity ;
  pid  |   backend_type
-------+-------------------
 27314 | client backend
 27103 | startup
 27113 | background writer
 27112 | checkpointer
 27115 | walreceiver
(5 rows)

Attached is a patch for the documentation fix.

Regards,

[1]
26.5.3. Administrator's Overview
https://www.postgresql.org/docs/10/static/hot-standby.html#hot-standby-admin


-- 
Yugo Nagata 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-21 Thread Jeevan Ladhe
Hi Amit,

On Thu, Jun 15, 2017 at 12:31 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> Oops, I meant to send one more comment.
>
> On 2017/06/15 15:48, Amit Langote wrote:
> > BTW, I noticed the following in 0002
> +errmsg("there exists a default
> partition for table \"%s\", cannot
> add a new partition",
>
> This error message style seems novel to me.  I'm not sure about the best
> message text here, but maybe: "cannot add new partition to table \"%s\"
> with default partition"
>

This sounds confusing to me; what about something like:
"\"%s\" has a default partition, cannot add a new partition."

Note that this comment belongs to patch 0002, and it will go away
if we go with the extended functionality, i.e. patch 0003,
since that patch allows the user to create a new partition even
when a default partition exists.

Regards,
Jeevan Ladhe


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-21 Thread Jeevan Ladhe
Thanks Ashutosh and Kyotaro for reviewing further.
I shall address your comments in next version of my patch.

Regards,
Jeevan Ladhe

On Fri, Jun 16, 2017 at 1:46 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello, I'd like to review this but it doesn't apply to master, as
> Robert said. In particular, the interface of predicate_implied_by is
> changed by the suggested commit.
>
> Anyway, I have some comments on this patch with fresh eyes.  I
> believe in the basic design, so my comments below are from a rather
> micro viewpoint.
>
> At Thu, 15 Jun 2017 16:01:53 +0900, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote in  34ff801bf...@lab.ntt.co.jp>
> > Oops, I meant to send one more comment.
> >
> > On 2017/06/15 15:48, Amit Langote wrote:
> > > BTW, I noticed the following in 0002
> > +  errmsg("there exists a default
> partition for table \"%s\", cannot
> > add a new partition",
> >
> > This error message style seems novel to me.  I'm not sure about the best
> > message text here, but maybe: "cannot add new partition to table \"%s\"
> > with default partition"
> >
> > Note that the comment applies to both DefineRelation and
> > ATExecAttachPartition.
>
> - Considering how canSkipPartConstraintValidation is called, I
>   *think* that RelationProvenValid() would be better.  (But this
>   would disappear after rebasing..)
>
> - 0002 changes the interface of get_qual_for_list, but leaves
>   get_qual_for_range alone.  Anyway, get_qual_for_range will have
>   to do a similar thing soon.
>
> - In check_new_partition_bound, "overlap" and "with" are
>   completely correlated with each other: "with > -1" means
>   "overlap = true". So overlap is redundant. ("with" would be
>   better named "overlap_with" or something if we remove
>   "overlap".)
>
> - The error message of check_default_allows_bound is below.
>
>   "updated partition constraint for default partition \"%s\"
>    would be violated by some row"
>
>   This looks like an analog of validateCheckConstraint, but to my
>   understanding this function is called only when a new partition
>   is added, which would be difficult to recognize from this message
>   in that situation.
>
>   "the default partition contains rows that should be in
>    the new partition: \"%s\""
>
>   or something?
>
> - In check_default_allows_bound, the iteration over partitions is
>   quite similar to what validateCheckConstraint does. Can we
>   somehow share validateCheckConstraint with this function?
>
> - In the same function, skipping RELKIND_PARTITIONED_TABLE is
>   okay, but silently ignoring RELKIND_FOREIGN_TABLE doesn't seem
>   good. I think at least some warning should be emitted.
>
>   "Skipping foreign tables in the default partition. They might
>    contain rows that should be in the new partition."  (This needs
>    to prevent multiple warnings in a single call, maybe.)
>
> - In the same function, the following condition seems somewhat
>   strange in comparison to validateCheckConstraint.
>
> > if (partqualstate && ExecCheck(partqualstate, econtext))
>
>   partqualstate won't be null as long as partition_constraint is
>   valid. Anyway, an invalid constraint (I believe) results in an
>   error from ExecPrepareExpr. Therefore the 'if (partqualstate'
>   is useless.
>
> - In gram.y, the nonterminal for the list spec clause is still
>   "ForValues". It seems somewhat strange. partition_spec or
>   something would be better.
>
> - This is not a part of this patch, but in ruleutils.c, the error
>   for an unknown partitioning strategy is emitted as follows.
>
> >   elog(ERROR, "unrecognized partition strategy: %d",
> >        (int) strategy);
>
>   The cast is added because the strategy is a char. I suppose
>   this is because the strategy can be unprintable. I'd like to see
>   a comment if that is correct.
>
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>


Re: [HACKERS] An attempt to reduce WALWriteLock contention

2017-06-21 Thread jasrajd
We are also seeing contention on WALWriteLock, and repeated writes to the
same offset if we move the flush outside the lock, in the Azure environment.
pgbench doesn't scale beyond ~8 cores without saturating the IOPS or
bandwidth. Is there more work being done in this area?



--
View this message in context: 
http://www.postgresql-archive.org/An-attempt-to-reduce-WALWriteLock-contention-tp5935907p5967786.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Default Partition for Range

2017-06-21 Thread Dilip Kumar
On Tue, Jun 20, 2017 at 6:57 PM, Dilip Kumar  wrote:
> This is basically crashing in RelationBuildPartitionDesc, so I think
> we don't have any test case for a DEFAULT range partition where
> the partition key has more than one attribute.  So I suggest we add
> such a test case.

Some more comments.


- bound->datums = (Datum *) palloc0(key->partnatts * sizeof(Datum));
- bound->content = (RangeDatumContent *) palloc0(key->partnatts *
-   sizeof(RangeDatumContent));
+ bound->datums = datums ? (Datum *) palloc0(key->partnatts * sizeof(Datum))
+ : NULL;
+ bound->content = (RangeDatumContent *) palloc0(
+ (datums ? key->partnatts : 1) * sizeof(RangeDatumContent));
  bound->lower = lower;

  i = 0;
+
+ /* If default, datums are NULL */
+ if (datums == NULL)
+ bound->content[i] = RANGE_DATUM_DEFAULT;


For the default partition we are only setting bound->content[0] to
default; the content for the other key attributes is not initialized.
Later in the code, if the content of the first key is
RANGE_DATUM_DEFAULT, then the next content should not be accessed, but I
see some exceptions that can access the uninitialized values.

For example, see the code below:


for (i = 0; i < key->partnatts; i++)
  {
+ if (rb_content[i] == RANGE_DATUM_DEFAULT)   --> why does this go on to the
content for the next partition attribute? We never initialized that.
+ continue;
+
  if (rb_content[i] != RANGE_DATUM_FINITE)
  return rb_content[i] == RANGE_DATUM_NEG_INF ? -1 : 1;


Also
In RelatiobBuildPartitionDesc


for (j = 0; j < key->partnatts; j++)
{
-- Suppose the first is RANGE_DATUM_DEFAULT; then we should not check the next
   because that is never initialized.  I think this is also the cause of
   the crash I reported above.

if (rbounds[i]->content[j] == RANGE_DATUM_FINITE)
boundinfo->datums[i][j] =
datumCopy(rbounds[i]->datums[j],
 key->parttypbyval[j],
 key->parttyplen[j]);
/* Remember, we are storing the tri-state value. */
boundinfo->content[i][j] = rbounds[i]->content[j];


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-21 Thread Albe Laurenz
While playing with HEAD as of d14c85ed,
I ran into the following:

CREATE DATABASE source;
CREATE DATABASE recipient;

\c source
CREATE TABLE repli(id integer PRIMARY KEY, val text NOT NULL);
INSERT INTO repli VALUES (1, 'one');
CREATE PUBLICATION repsend FOR TABLE repli;
SELECT pg_create_logical_replication_slot('reprec', 'pgoutput');

\c recipient
CREATE TABLE repli(id integer PRIMARY KEY, val text NOT NULL);
CREATE SUBSCRIPTION reprec CONNECTION 'port=5432 dbname=source user=postgres'
   PUBLICATION repsend WITH (create_slot='false', slot_name='reprec');
SELECT * FROM repli;
┌┬─┐
│ id │ val │
├┼─┤
│  1 │ one │
└┴─┘
(1 row)

Looks good.
Now I try to produce a replication conflict:

INSERT INTO repli VALUES (2, 'block');

\c source
INSERT INTO repli VALUES (2, 'two');

After a while the server crashes, and the log looks like this:

2017-06-21 14:55:07.002 CEST [23096] ERROR:  duplicate key value violates 
unique constraint "repli_pkey"
2017-06-21 14:55:07.002 CEST [23096] DETAIL:  Key (id)=(2) already exists.
2017-06-21 14:55:07.003 CEST [23070] LOG:  worker process: logical replication 
worker for subscription 28681 (PID 23096) exited with exit code 1
2017-06-21 14:55:07.006 CEST [23117] LOG:  logical replication apply worker for 
subscription "reprec" has started
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:55:07.008 CEST [23118] LOG:  starting logical decoding for slot 
"reprec"
2017-06-21 14:55:07.008 CEST [23118] DETAIL:  streaming transactions committing 
after 0/51A67B8, reading WAL from 0/51A66C8
LOG:  starting logical decoding for slot "reprec"
DETAIL:  streaming transactions committing after 0/51A67B8, reading WAL from 
0/51A66C8
2017-06-21 14:55:07.008 CEST [23118] LOG:  logical decoding found consistent 
point at 0/51A66C8
2017-06-21 14:55:07.008 CEST [23118] DETAIL:  There are no running transactions.
DEBUG:  sending replication keepalive
LOG:  logical decoding found consistent point at 0/51A66C8
DETAIL:  There are no running transactions.
2017-06-21 14:55:07.009 CEST [23117] ERROR:  duplicate key value violates 
unique constraint "repli_pkey"
2017-06-21 14:55:07.009 CEST [23117] DETAIL:  Key (id)=(2) already exists.
2017-06-21 14:55:07.010 CEST [23070] LOG:  worker process: logical replication 
worker for subscription 28681 (PID 23117) exited with exit code 1
2017-06-21 14:55:12.019 CEST [23122] LOG:  logical replication apply worker for 
subscription "reprec" has started
DEBUG:  received replication command: IDENTIFY_SYSTEM
2017-06-21 14:55:12.021 CEST [23124] LOG:  starting logical decoding for slot 
"reprec"
2017-06-21 14:55:12.021 CEST [23124] DETAIL:  streaming transactions committing 
after 0/51A67B8, reading WAL from 0/51A66C8
DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:55:12.022 CEST [23124] LOG:  logical decoding found consistent 
point at 0/51A66C8
2017-06-21 14:55:12.022 CEST [23124] DETAIL:  There are no running transactions.
LOG:  starting logical decoding for slot "reprec"
DETAIL:  streaming transactions committing after 0/51A67B8, reading WAL from 
0/51A66C8
DEBUG:  sending replication keepalive
LOG:  logical decoding found consistent point at 0/51A66C8
DETAIL:  There are no running transactions.
2017-06-21 14:55:12.023 CEST [23122] ERROR:  duplicate key value violates 
unique constraint "repli_pkey"
2017-06-21 14:55:12.023 CEST [23122] DETAIL:  Key (id)=(2) already exists.
2017-06-21 14:55:12.024 CEST [23070] LOG:  worker process: logical replication 
worker for subscription 28681 (PID 23122) exited with exit code 1
2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
Broken pipe
2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker for 
subscription "reprec" has started
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
ReplicationSlotRelease, slot.c:394
2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
terminated by signal 6: Aborted
2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active server 
processes
2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
Broken pipe
2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; 
reinitializing

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 5:45 PM, Yugo Nagata  wrote:
> Hi,
>
> As I report in another thread[1], I found the autovacuum launcher occurs
> the following error in PG 10 when this received SIGINT. I can repuroduce
> this by pg_cancel_backend or `kill -2 `.
>
> 2017-06-21 13:56:07.010 JST [32483] ERROR:  canceling statement due to user 
> request
> 2017-06-21 13:56:08.022 JST [32483] ERROR:  can't attach the same segment 
> more than once
> 2017-06-21 13:56:09.034 JST [32483] ERROR:  can't attach the same segment 
> more than once
> 2017-06-21 13:56:10.045 JST [32483] ERROR:  can't attach the same segment 
> more than once
> ...
>
> This errors continue until this process is terminated or the server is 
> restarted.
>
> When SIGINT is issued, the process exits from the main loop and returns
> to sigsetjmp, and calls dsa_attach() before entering into the loop again,
> this causes the error.
>
> We can fix it by calling dsa_attach() before sigsetjmp. Attached is the patch.
>
I think we can just check dsm_find_mapping() to check whether the dsm
handle is already attached. Something like,

}
-   else
+   else if(!dsm_find_mapping(AutoVacuumShmem->av_dsa_handle))
{
AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
dsa_pin_mapping(AutoVacuumDSA);

Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Dilip Kumar
On Wed, Jun 21, 2017 at 6:50 PM, Kuntal Ghosh
 wrote:
> I think we can just check dsm_find_mapping() to check whether the dsm
> handle is already attached. Something like,
>
> }
> -   else
> +   else if(!dsm_find_mapping(AutoVacuumShmem->av_dsa_handle))
> {
> AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
> dsa_pin_mapping(AutoVacuumDSA);
>
> Thoughts?

IMHO, it's not a good idea to use a DSM call to verify the DSA handle.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata  wrote:
>
> Attached is a patch for the documentation fix.
>
Please attach the patch as well. :-)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-21 Thread Yugo Nagata
Hi,

Here are some comments for the patch.

+Datum
+pg_cancel_backend(PG_FUNCTION_ARGS)
+{
+   PG_RETURN_BOOL(pg_cancel_backend_internal(PG_GETARG_INT32(0), NULL));
+}
+Datum
+pg_cancel_backend_msg(PG_FUNCTION_ARGS)
+{
+   pid_t   pid = PG_GETARG_INT32(0);
+   char   *msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+   PG_RETURN_BOOL(pg_cancel_backend_internal(pid, msg));
+}

It would be better to insert a blank line between these functions.

+/*
+ * Sets a cancellation message for the backend with the specified pid and
+ * returns the length of message actually created. If the returned length
+ * is equal to the length of the message parameter, truncation may have
+ * occurred. If the backend wasn't found and no message was set, -1 is
+ * returned.
+ */

It seems to me that this comment is incorrect.

 "If the returned length is not equal to the length of the message parameter,"

is right, isn't it?

In addition, the last statement would be 

 "If the backend wasn't found, -1 is returned. Otherwize, if no message was set,
  0 is returned."


+   strlcpy(slot->message, message, sizeof(slot->message));
+   slot->len = strlen(message);
+
+   LWLockRelease(BackendCancelLock);
+   return slot->len;

If SetBackendCancelMessage() has to return the length of the message actually
stored, slot->len should be strlen(slot->message) instead of strlen(message).
In the current code, the return value and slot->len are always set
to the length of the passed message parameter.
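
A minimal sketch of the suggested fix, based only on the snippet quoted
above (the surrounding declarations are assumptions):

    strlcpy(slot->message, message, sizeof(slot->message));
    slot->len = strlen(slot->message);  /* length actually stored */

    LWLockRelease(BackendCancelLock);
    return slot->len;

This way the return value reflects any truncation, as the comment
promises.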


Regards,

On Mon, 19 Jun 2017 20:24:43 +0200
Daniel Gustafsson  wrote:

> When terminating, or cancelling, a backend it’s currently not possible to let
> the signalled session know *why* it was dropped.  This has nagged me in the
> past and now it happened to come up again, so I took a stab at this.  The
> attached patch implements the ability to pass an optional text message to the
> signalled session which is included in the error message:
> 
>   SELECT pg_terminate_backend( [, message]);
>   SELECT pg_cancel_backend( [, message]);
> 
> Right now the message is simply appended on the error message, not sure if
> errdetail or errhint would be better? Calling:
> 
>   select pg_terminate_backend(, 'server rebooting');
> 
> ..leads to:
> 
>   FATAL:  terminating connection due to administrator command: "server 
> rebooting"
> 
> Omitting the message invokes the command just like today.
> 
> The message is stored in a new shmem area which is checked when the session is
> aborted.  To keep things simple a small buffer is kept per backend for the
> message.  If deemed too costly, keeping a central buffer from which slabs are
> allocated can be done (but seemed rather complicated for little gain compared
> to the quite moderate memory spend.) 
> 
> cheers ./daniel
> 


-- 
Yugo Nagata 




Re: [HACKERS] Logical replication in the same cluster

2017-06-21 Thread Albe Laurenz
Tom Lane wrote:
> Petr Jelinek  writes:
>> On 26/04/17 18:59, Bruce Momjian wrote:
>>> ... it just hangs.  My server logs say:
> 
>> Yes that's result of how logical replication slots work, the transaction
>> that needs to finish is your transaction. It can be worked around by
>> creating the slot manually via the SQL interface for example and create
>> the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
> 
> If that's a predictable deadlock, I think a minimum expectation is that
> the system should notice it and throw an error, not just hang.  (Then
> the error could give a hint about how to work around it.)  But the case
> Bruce has in mind doesn't seem like a crazy use-case to me.  Can't we
> make it "just work"?

+1

I think that many people who start experimenting with logical replication
will run into this (like I did).

If nothing else, it's bad PR.
People will get the first impression that logical replication doesn't
work well.

Yours,
Laurenz Albe



Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Yugo Nagata
On Wed, 21 Jun 2017 19:08:35 +0530
Kuntal Ghosh  wrote:

> On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata  wrote:
> >
> > Attached is a patch for the documentation fix.
> >
> Please attach the patch as well. :-)

I'm sorry, I forgot it. I attached it.

> 
> 
> -- 
> Thanks & Regards,
> Kuntal Ghosh
> EnterpriseDB: http://www.enterprisedb.com
> 
> 


-- 
Yugo Nagata 
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 72eb073..1188f07 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2203,12 +2203,12 @@ LOG:  database system is ready to accept read only connections
 pg_cancel_backend()
 and pg_terminate_backend() will work on user backends,
 but not the Startup process, which performs
-recovery. pg_stat_activity does not show an
-entry for the Startup process, nor do recovering transactions show
-as active. As a result, pg_prepared_xacts
-is always empty during recovery. If you wish to resolve in-doubt
-prepared transactions, view pg_prepared_xacts on the
-primary and issue commands to resolve transactions there.
+recovery. pg_stat_activity does not show
+recovering transactions as active. As a result,
+pg_prepared_xacts is always empty during
+recovery. If you wish to resolve in-doubt prepared transactions, view
+pg_prepared_xacts on the primary and issue commands to
+resolve transactions there.

 




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 7:07 PM, Dilip Kumar  wrote:
> On Wed, Jun 21, 2017 at 6:50 PM, Kuntal Ghosh
>  wrote:
>> I think we can just check dsm_find_mapping() to check whether the dsm
>> handle is already attached. Something like,
>>
>> }
>> -   else
>> +   else if(!dsm_find_mapping(AutoVacuumShmem->av_dsa_handle))
>> {
>> AutoVacuumDSA = dsa_attach(AutoVacuumShmem->av_dsa_handle);
>> dsa_pin_mapping(AutoVacuumDSA);
>>
>> Thoughts?
>
> IMHO, It's not a good idea to use DSM call to verify the DSA handle.
>
Okay. Is there any particular scenario you have in mind where this may fail?



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Simon Riggs
On 21 June 2017 at 16:15, Yugo Nagata  wrote:
> On Wed, 21 Jun 2017 19:08:35 +0530
> Kuntal Ghosh  wrote:
>
>> On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata  wrote:
>> >
>> > Attached is a patch for the documentation fix.
>> >
>> Please attach the patch as well. :-)
>
> I'm sorry, I forgot it. I attached it.

Thanks, will apply.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Broken hint bits (freeze)

2017-06-21 Thread Amit Kapila
On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila  wrote:
> On Tue, Jun 20, 2017 at 7:05 PM, Sergey Burladyan  wrote:
>> Amit Kapila  writes:
>>
>>> On Tue, Jun 20, 2017 at 3:40 PM, Sergey Burladyan  
>>> wrote:
>> I use pg 9.2 and "skipping restartpoint, already performed at" is from
>> src/backend/access/transam/xlog.c:8643
>> after this statement it returns from CreateRestartPoint() and does not run
>>8687 CheckPointGuts(lastCheckPoint.redo, flags);
>>
>
> You are right, so it will skip restartpoint in such a case.
>
>>> >> > Uh, as I understand it the rsync is going to copy the missing WAL file
>>> >> > from the new master to the standby, right, and I think pg_controldata
>>> >> > too, so it should be fine.  Have you tested to see if it fails?
>>> >
>>> > It need old WAL files from old version for correct restore heap
>>> > files. New WAL files from new version does not have this information.
>>> >
>>>
>>> So in such a case can we run rsync once before pg_upgrade?
>>
>> I just copy last WAL from stopped old master into running old standby
>> before it shutdown and wait till it replayed. After that standby can
>> issue restartpoint at the same location as in stopped master.
>>
>
> Hmm.  I think we need something that works with lesser effort because
> not all users will be as knowledgeable as you are, so if they make any
> mistakes in copying the file manually, it can lead to problems.  How
> about issuing a notification (XLogArchiveNotifySeg) in shutdown
> checkpoint if archiving is enabled?
>

I have thought more about the above solution and it seems risky to
notify the archiver for incomplete WAL segments (which will be possible
in this case, as there is no guarantee that the checkpoint record will
fill the segment).  So it seems to me we should update the documentation
unless you or someone else has a solution to this problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Dilip Kumar
On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
 wrote:
>> IMHO, it's not a good idea to use a DSM call to verify the DSA handle.
>>
> Okay. Is there any particular scenario you have in mind where this may fail?

It's not about failure, but about the abstraction.  When we are using
the DSA we should not directly access the DSM which is under DSA.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-21 Thread Yugo Nagata
On Wed, 21 Jun 2017 12:06:33 +0900
Michael Paquier  wrote:

> On Tue, Jun 20, 2017 at 3:24 AM, Daniel Gustafsson  wrote:
> > The message is stored in a new shmem area which is checked when the session 
> > is
> > aborted.  To keep things simple a small buffer is kept per backend for the
> > message.  If deemed too costly, keeping a central buffer from which slabs 
> > are
> > allocated can be done (but seemed rather complicated for little gain 
> > compared
> > to the quite moderate memory spend.)
> 
> I think that you are right to take the approach with a per-backend
> slot. This will avoid complications related to entry removals and
> locking issues. There would be scaling issues as well if things get
> very signaled for a lot of backends.
> 
> +#define MAX_CANCEL_MSG 128
> That looks enough.
> 
> +   LWLockAcquire(BackendCancelLock, LW_EXCLUSIVE);
> +
> +   strlcpy(slot->message, message, sizeof(slot->message));
> +   slot->len = strlen(message);
> Why not using one spin lock per slot and save it in BackendCancelShmemStruct?

+1

I found an example where a spinlock is used around strlcpy() in WalReceiverMain().
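
For illustration, the copy could be guarded the same way with a per-slot
spinlock (the "mutex" field is made up for this sketch, not taken from
the patch):

    SpinLockAcquire(&slot->mutex);  /* hypothetical per-slot slock_t */
    strlcpy(slot->message, message, sizeof(slot->message));
    slot->len = strlen(slot->message);
    SpinLockRelease(&slot->mutex);

A spinlock is acceptable here because the critical section is a single
bounded strlcpy().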

> 
> +   pid_t   pid = PG_GETARG_INT32(0);
> +   char   *msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
> +
> +   PG_RETURN_BOOL(pg_terminate_backend_internal(pid, msg));
> It would be more solid to add some error handling for messages that
> are too long, or at least truncate the message if too long.

I agree that error handling for too long messages is needed.
Although long messages are truncated in SetBackendCancelMessage(),
it would be better to inform users that the message they read was
a truncated one. Or maybe we should prohibit passing too long a
message to pg_terminate_backend().


> -- 
> Michael
> 
> 


-- 
Yugo Nagata 




Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Tom Lane
Amit Langote  writes:
> On 2017/06/21 16:59, Etsuro Fujita wrote:
>> but I noticed that that function doesn't use the relation descriptor at
>> all.  Since partitioned_rels is given in case of an UPDATE/DELETE on a
>> partitioned table, the relation is opened in that case, but the relation
>> descriptor isn't referenced at all in initializing WITH CHECK OPTION
>> constraints and/or RETURNING projections.  (The mtstate->resultRelinfo
>> array is referenced in those initialization, instead.)  So, I'd like to
>> propose to remove this from that function.  Attached is a patch for that.

> Thanks for cleaning that up.  I cannot see any problem in applying the patch.

Actually, isn't ModifyTable.partitioned_rels *always* NIL?  I can't
see anyplace in the planner that sets it differently.  I think we should
flush the field altogether.  Anybody who doesn't want that should at least
provide the missing comment for create_modifytable_path's partitioned_rels
argument (and yes, the fact that that is missing isn't making me any
more favorably disposed...)

regards, tom lane




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andrew Dunstan


On 06/21/2017 08:20 AM, Andrew Dunstan wrote:
>
> On 06/20/2017 08:30 PM, Tom Lane wrote:
>> Michael Paquier  writes:
>>> On Wed, Jun 21, 2017 at 8:43 AM, Tom Lane  wrote:
 Yeah, I thought it would work fine with Makefile-using Windows toolchains.
 But people who use MSVC need something else, no?
>>> Are there that many anyway who care?
>> Well, *I* don't care, but I thought we wanted to support Windows-using
>> developers as reasonably first-class citizens.
>>
>>  
>
>
> I imagine we can rig up something fairly simple based on Craig Ringer's
> recent work in building extensions on Windows using cmake.
>
> I'll take a look when I get a chance, but I don't think it's a high
> priority.
>


Unfortunately this seems precluded by the use of the non-standard
"err.h". It looks like that will trip us up on mingw also.

cheers

andrew



-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Default Partition for Range

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 8:57 AM, Dilip Kumar  wrote:
> For the default partition we are only setting bound->content[0] to
> default, but the content for the other key
> attributes is not initialized.  But later in the code, if the content
> of the first key is RANGE_DATUM_DEFAULT then it should not access the
> next content, but I see there are some exceptions, which can access
> an uninitialized value.

I think somebody should do some testing of the existing code with
valgrind.  And then apply the list-partitioning patch and this patch,
and do some more testing with valgrind.  It seems to be really easy to
miss these uninitialized access problems during code review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-21 Thread Daniel Gustafsson
> On 21 Jun 2017, at 16:30, Yugo Nagata  wrote:
> 
> On Wed, 21 Jun 2017 12:06:33 +0900
> Michael Paquier  wrote:
> 
>> On Tue, Jun 20, 2017 at 3:24 AM, Daniel Gustafsson  wrote:
>>> The message is stored in a new shmem area which is checked when the session 
>>> is
>>> aborted.  To keep things simple a small buffer is kept per backend for the
>>> message.  If deemed too costly, keeping a central buffer from which slabs 
>>> are
>>> allocated can be done (but seemed rather complicated for little gain 
>>> compared
>>> to the quite moderate memory spend.)
>> 
>> I think that you are right to take the approach with a per-backend
>> slot. This will avoid complications related to entry removals and
>> locking issues. There would be scaling issues as well if things get
>> very signaled for a lot of backends.
>> 
>> +#define MAX_CANCEL_MSG 128
>> That looks enough.
>> 
>> +   LWLockAcquire(BackendCancelLock, LW_EXCLUSIVE);
>> +
>> +   strlcpy(slot->message, message, sizeof(slot->message));
>> +   slot->len = strlen(message);
>> Why not using one spin lock per slot and save it in BackendCancelShmemStruct?
> 
> +1
> 
> I found an example that a spin lock is used during strlcpy in 
> WalReceiverMain().

Yeah I agree as well, will fix.

>> +   pid_t   pid = PG_GETARG_INT32(0);
>> +   char   *msg = text_to_cstring(PG_GETARG_TEXT_PP(1));
>> +
>> +   PG_RETURN_BOOL(pg_terminate_backend_internal(pid, msg));
>> It would be more solid to add some error handling for messages that
>> are too long, or at least truncate the message if too long.
> 
> I agree that error handling for too long messages is needed.
> Although long messages are truncated in SetBackendCancelMessage(),
> it would be better to inform users that the message they read was
> a truncated one. Or maybe we should prohibit passing too long a
> message to pg_terminate_backend().

The message is truncated in SetBackendCancelMessage() for safety, but
pg_{cancel|terminate}_backend() could throw an error for a too-long message,
or warn about truncation, to the caller as well.  Personally I think a warning
is the appropriate response, but I don’t really have a strong opinion.

cheers ./daniel



Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 5:28 AM, Amit Langote
 wrote:
>> The comment "UPDATE/DELETE
>> cases are handled above" is referring to
>> the code that initializes the WCOs generated by the planner.  You've
>> modified the comment in your patch, but the associated code: your
>> updated comment says that only "DELETEs and local UPDATES are handled
>> above", but in reality, *all* updates are still handled above.  And
>> then they are handled again here.  Similarly for returning lists.
>> It's certainly not OK for the comment to be inaccurate, but I think
>> it's also bad to redo the work which the planner has already done,
>> even if it makes the patch smaller.
>
> I guess this has to do with the UPDATE turning into DELETE+INSERT.  So, it
> seems like WCOs are being initialized for the leaf partitions
> (ResultRelInfos in the mt_partitions array) that in turn are
> initialized for the aforementioned INSERT.  That's why the term "...local
> UPDATEs" in the new comment text.
>
> If that's true, I wonder if it makes sense to apply what would be
> WCO_RLS_UPDATE_CHECK to a leaf partition that the tuple will be moved into
> by calling ExecInsert()?

I think we probably should apply the insert policy, just as we're
executing the insert trigger.

>> Also, I feel like it's probably not correct to use the first result
>> relation as the nominal relation for building WCOs and returning lists
>> anyway.  I mean, if the first result relation has a different column
>> order than the parent relation, isn't this just broken?  If it works
>> for some reason, the comments don't explain what that reason is.
>
> Yep, it's more appropriate to use
> ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
> is, if answer to the question I raised above is positive.

The questions appear to me to be independent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 10:33 AM, Tom Lane  wrote:
> Amit Langote  writes:
>> On 2017/06/21 16:59, Etsuro Fujita wrote:
>>> but I noticed that that function doesn't use the relation descriptor at
>>> all.  Since partitioned_rels is given in case of an UPDATE/DELETE on a
>>> partitioned table, the relation is opened in that case, but the relation
>>> descriptor isn't referenced at all in initializing WITH CHECK OPTION
>>> constraints and/or RETURNING projections.  (The mtstate->resultRelinfo
>>> array is referenced in those initialization, instead.)  So, I'd like to
>>> propose to remove this from that function.  Attached is a patch for that.
>
>> Thanks for cleaning that up.  I cannot see any problem in applying the patch.
>
> Actually, isn't ModifyTable.partitioned_rels *always* NIL?  I can't
> see anyplace in the planner that sets it differently.  I think we should
> flush the field altogether.  Anybody who doesn't want that should at least
> provide the missing comment for create_modifytable_path's partitioned_rels
> argument (and yes, the fact that that is missing isn't making me any
> more favorably disposed...)

It appears to me that create_modifytable_path() has a partitioned_rels
argument and it looks like inheritance_planner not only passes it but
guarantees that it's non-empty when the target is a partitioned table.
You're right that there is a comment missing from the function header,
but it's not too hard to figure out what it should be.  Just consult
the definition of ModifyTable itself:

/* RT indexes of non-leaf tables in a partition tree */
List   *partitioned_rels;

The point here is that if we have a partitioned table with a bunch of
descendent tables, the non-leaf partitions are never actually scanned;
there's no file on disk to scan.  Despite the lack of a scan, we still
need to arrange for those relations to be locked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-06-21 Thread Masahiko Sawada
On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada  wrote:
> On Wed, May 17, 2017 at 1:30 AM, Robert Haas  wrote:
>> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila  wrote:
>>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane  wrote:
 Robert Haas  writes:
> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  
> wrote:
>> ... I'd like to propose to change relation
>> extension lock management so that it works using LWLock instead.

> That's not a good idea because it'll make the code that executes while
> holding that lock noninterruptible.

 Is that really a problem?  We typically only hold it over one kernel call,
 which ought to be noninterruptible anyway.
>>>
>>> During parallel bulk load operations, I think we hold it over multiple
>>> kernel calls.
>>
>> We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
>> kernel call, no?  Nor is vm_extend.
>
> Yeah, these functions could make more than one kernel call while
> holding the extension lock.
>
>> Also, it's not just the backend doing the filesystem operation that's
>> non-interruptible, but also any waiters, right?
>>
>> Maybe this isn't a big problem, but it does seem to be that it would
>> be better to avoid it if we can.
>>
>
> I agree to change it to be interruptible for more safety.
>

Attached is an updated version of the patch. To use a lock mechanism
similar to LWLock but interruptible, I introduced a new lock manager for
extension locks. A lot of the code, especially locking and unlocking, is
inspired by LWLock, but it uses condition variables to wait when
acquiring the lock. The other parts are unchanged from the previous
patch. This is still a PoC patch and lacks documentation. The following
are the measurement results with the same test scripts I used before.

* Copy test script
        HEAD    Patched
4       436.6   436.1
8       561.8   561.8
16      580.7   579.4
32      588.5   597.0
64      596.1   599.0

* Insert test script
        HEAD    Patched
4       156.5   156.0
8       167.0   167.9
16      176.2   175.6
32      181.1   181.0
64      181.5   183.0

Since I replaced the heavyweight lock with a lightweight lock, I expected
performance to improve slightly from HEAD, but the results were almost
the same. I'll continue to look at it in more detail.
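
For reference, the general acquire/release pattern with a PostgreSQL
condition variable looks roughly like this (a sketch of the pattern
only; the lock struct with its "locked" flag and the wait-event name
are made up, not taken from the patch):

    /* acquire: loop until we get the lock; the sleep is interruptible */
    for (;;)
    {
        if (pg_atomic_test_set_flag(&lock->locked))
            break;              /* got the lock */
        ConditionVariableSleep(&lock->cv, WAIT_EVENT_EXTENSION_LOCK);
    }
    ConditionVariableCancelSleep();

    /* release: clear the flag and wake up any waiters */
    pg_atomic_clear_flag(&lock->locked);
    ConditionVariableBroadcast(&lock->cv);

Since ConditionVariableSleep() services interrupts while waiting,
waiters are not stuck uninterruptibly the way LWLock waiters would be.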

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Moving_extension_lock_out_of_heavyweight_lock_v2.patch
Description: Binary data



Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 7:56 AM, Yugo Nagata  wrote:
> I have found that we can cancel/terminate autovacuum launchers and
> background worker processes by pg_cancel/terminate_backend function.
> I'm wondering whether this behavior is expected and, if not, I want to fix it.

I think it is expected.  Even if we blocked it, those processes have
to cope gracefully with SIGTERM, because anyone with access to the OS
user can kill them that way by hand.

> However, we can terminate background workers by pg_terminate_backend.
> In the following example, I terminated the logical replication launcher,
> and this process did not appear again[1].
>
> postgres=# select pg_terminate_backend(30902);
>  pg_terminate_backend
> --
>  t
> (1 row)

That seems to be a bug in logical replication.

> Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> but a new process is restarted by postmaster in this case.[2]
>
> postgres=# select pg_terminate_backend(30900);
>  pg_terminate_backend
> --
>  t
> (1 row)

That is as I would expect.

> [2]
> On the other hand, when we use pg_cancel_backend for autovacuum launcher,
> it causes the following error. I'll report the detail in another thread.
>
>  ERROR:  can't attach the same segment more than once

I think that's a bug.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_terminate_backend can terminate background workers and autovacuum launchers

2017-06-21 Thread Alvaro Herrera
Yugo Nagata wrote:

> However, we can terminate background workers by pg_terminate_backend.
> In the following example, I terminated the logical replication launcher,
> and this process did not appear again[1]. 
> 
> postgres=# select pg_terminate_backend(30902);
>  pg_terminate_backend 
> --
>  t
> (1 row)

I think failing to restart the replication launcher after it stops is
probably a bug, and should be fixed by having postmaster start another
one if it dies.

> Similarly, we can terminate autovacuum launcher by pg_terminate_backend,
> but a new process is restarted by postmaster in this case.[2]

Yeah, this is operating as intended.  You can turn autovacuum off and
the launcher should go away, and turn it back on and the launcher should
start.  So we expect the autovac launcher to respond to signals.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Shortened URLs for commit messages

2017-06-21 Thread Alvaro Herrera
Bruce Momjian wrote:

> Oh, here is a fixed version that requires an @ sign, which all message
> id's have:
> 
>   sed '/http/!s;^\(Discussion: *\)\(.*@.*\)$;\1https://postgr.es/m/\2;'

So how do you actually use this?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Default Partition for Range

2017-06-21 Thread Dilip Kumar
On Wed, Jun 21, 2017 at 8:08 PM, Robert Haas  wrote:
> I think somebody should do some testing of the existing code with
> valgrind.  And then apply the list-partitioning patch and this patch,
> and do some more testing with valgrind.  It seems to be really easy to
> miss these uninitialized access problems during code review.

I think it will help, but it may not help in all scenarios, because in
most places we allocate memory with palloc0 (initialized to 0), and the
content array is never initialized with RANGE_DATUM_DEFAULT except for
the first element (in the case of a DEFAULT partition).  Later, the
remaining elements may be treated as RANGE_DATUM_FINITE (because their
value is 0).

One solution would be: if the bound is DEFAULT, initialize the complete
content array for that partition bound with RANGE_DATUM_DEFAULT instead
of just the first element.  Otherwise, we need to be careful to exit
early wherever we loop over the content array of the DEFAULT bound.
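
A sketch of that first option (names like is_default and partnatts are
illustrative, not taken from the patch):

    /* hypothetical: mark every key column of a DEFAULT bound */
    if (is_default)
    {
        int         i;

        for (i = 0; i < partnatts; i++)
            bound->content[i] = RANGE_DATUM_DEFAULT;
    }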

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Tom Lane
Andrew Dunstan  writes:
> Unfortunately this seems precluded by the use of the non-standard
> "err.h". It looks like that will trip us up on mingw also.

Hm, why?  Doesn't the -I search path get the right thing?

regards, tom lane




Re: [HACKERS] Useless code in ExecInitModifyTable

2017-06-21 Thread Tom Lane
Robert Haas  writes:
> It appears to me that create_modifytable_path() has a partitioned_rels
> argument and it looks like inheritance_planner not only passes it but
> guarantees that it's non-empty when the target is a partitioned table.

Oh, somehow I missed the call in inheritance_planner.  Right.

regards, tom lane




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andres Freund
On 2017-06-20 20:30:34 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Wed, Jun 21, 2017 at 8:43 AM, Tom Lane  wrote:
> >> Yeah, I thought it would work fine with Makefile-using Windows toolchains.
> >> But people who use MSVC need something else, no?
> 
> > Are there that many anyway who care?
> 
> Well, *I* don't care, but I thought we wanted to support Windows-using
> developers as reasonably first-class citizens.

The old one didn't have windows support either, right?

- Andres




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-06-20 20:30:34 -0400, Tom Lane wrote:
>> Well, *I* don't care, but I thought we wanted to support Windows-using
>> developers as reasonably first-class citizens.

> The old one didn't have windows support either, right?

Not that I was aware of, but a large part of the point here is to
make things better than they were before.

regards, tom lane




Re: [HACKERS] Shortened URLs for commit messages

2017-06-21 Thread Bruce Momjian
On Wed, Jun 21, 2017 at 11:11:57AM -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Oh, here is a fixed version that requires an @ sign, which all message
> > id's have:
> > 
> > sed '/http/!s;^\(Discussion: *\)\(.*@.*\)$;\1https://postgr.es/m/\2;'
> 
> So how do you actually use this?

My commit script is here:

https://momjian.us/main/writings/pgsql/src/pgcommit

It basically runs some checks and then creates a temp file with lines
labeled by their purpose.  It edits the temp file, then runs the temp
file through a number of filters, and one of those does the URL
shortening via pgurl at:

https://momjian.us/main/writings/pgsql/src/pgurl

It also changes bare message-ids on the 'Discussion' line to shortened
URLs.  It also removes empty labeled lines, and exits if nothing has
been changed in the editor.  It then does the commit (potentially to
multiple branches), and then the push.

The script calls many other custom scripts, but you can get an idea of
how it works.  I am happy to supply more tools as desired.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-21 Thread Kuntal Ghosh
On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar  wrote:
> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>  wrote:
>>> IMHO, it's not a good idea to use a DSM call to verify the DSA handle.
>>>
>> Okay. Is there any particular scenario you have in mind where this may fail?
>
> It's not about failure, but about the abstraction.  When we are using
> the DSA we should not directly access the DSM which is under DSA.
>
Okay. I thought that I'd found at least one usage of
dsm_find_mapping() in the code. :-)

But I have some more doubts.
1. When should we use dsm_find_mapping()? (The first few lines of
dsm_attach() are the same as dsm_find_mapping().)
2. As a user of dsa, how should we check whether a dsa handle is
already attached? I guess this is required because, if a user tries to
re-attach a dsa handle, they are punished with an error, and the user
wants to avoid such errors.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Broken hint bits (freeze)

2017-06-21 Thread Bruce Momjian
On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
> On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila  wrote:
> > Hmm.  I think we need something that works with lesser effort because
> > not all users will be as knowledgeable as you are, so if they make any
> > mistakes in copying the file manually, it can lead to problems.  How
> > about issuing a notification (XLogArchiveNotifySeg) in shutdown
> > checkpoint if archiving is enabled?
> >
> 
> I have thought more about the above solution and it seems risky to
> notify the archiver for incomplete WAL segments (which will be possible
> in this case, as there is no guarantee that the checkpoint record will
> fill the segment).  So it seems to me we should update the documentation
> unless you or someone else has a solution to this problem.

The over-arching question is how do we tell users to verify that the WAL
has been replayed on the standby?  I am thinking we would say that for
streaming replication, the "Latest checkpoint location" should match on
the primary and standby, while for log shipping, the standbys should be
exactly one WAL file less than the primary.

As far as I know this is the only remaining open issue.  Sergey, please
verify.  I appreciate the work everyone has done to improve this, and
all the existing fixes have been pushed to all supported branches.  :-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Incorrect documentation about pg_stat_activity

2017-06-21 Thread Yugo Nagata
On Wed, 21 Jun 2017 16:18:50 +0200
Simon Riggs  wrote:

> On 21 June 2017 at 16:15, Yugo Nagata  wrote:
> > On Wed, 21 Jun 2017 19:08:35 +0530
> > Kuntal Ghosh  wrote:
> >
> >> On Wed, Jun 21, 2017 at 6:05 PM, Yugo Nagata  wrote:
> >> >
> >> > Attached is a patch for the documentation fix.
> >> >
> >> Please attach the patch as well. :-)
> >
> > I'm sorry, I forgot it. I attached it.
> 
> Thanks, will apply.

Thanks, too.

> 
> -- 
> Simon Riggs    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-21 Thread Marco Atzeri

On 20/06/2017 17:37, Tom Lane wrote:

I wrote:

Marco Atzeri  writes:

Building on Cygwin latest 10 beta1 or head source,
make check fails as:
...
performing post-bootstrap initialization ... 2017-05-31 23:23:22.214
CEST [16860] FATAL:  collation "ja_JP" for encoding "EUC_JP" already exists



Hmph.  Could we see the results of "locale -a | grep ja_JP" ?


Despite the lack of followup from the OP, I'm pretty troubled by this
report.  It shows that the reimplementation of OS collation data import
as pg_import_system_collations() is a whole lot more fragile than the
original coding.  We have never before trusted "locale -a" to not produce
duplicate outputs, not since the very beginning in 414c5a2e.  AFAICS,
the current coding has also lost the protections we added very shortly
after that in 853c1750f; and it has also lost the admittedly rather
arbitrary, but at least deterministic, preference order for conflicting
short aliases that was in the original initdb code.


Hi Tom,
I raised the duplication issue on the cygwin mailing list,
and one of the core developers reported that
they had seen the same issue on Linux in the past.

https://cygwin.com/ml/cygwin/2017-06/msg00253.html



regards, tom lane


Regards
Marco






[HACKERS] Logical replication launcher never been restarted when terminated

2017-06-21 Thread Yugo Nagata
Hi,

As I reported in another thread[1], when the logical replication launcher
is terminated by SIGTERM, it is never restarted, and we need to restart
the server to enable logical replication again.

This is because the logical replication launcher exits with exit status 0;
if it exits with status 1 instead, it is restarted by the postmaster
normally. Attached is a simple patch to fix it in this way.

However, I'm not sure this is the best way. For example, with this approach
we get the following log message when the process is terminated, which we
don't get when it exits with status 0.

 LOG:  worker process: logical replication launcher (PID 11526) exited with 
exit code 1

If we don't want to get this message, we need more fixes in
CleanupBackgroundWorker() or around it.

[1]
https://www.postgresql.org/message-id/20170621205657.61d90605.nagata%40sraoss.co.jp

-- 
Yugo Nagata 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f99dd0a..a833dc5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2855,7 +2855,7 @@ ProcessInterrupts(void)
 	(errmsg("logical replication launcher shutting down")));
 
 			/* The logical replication launcher can be stopped at any time. */
-			proc_exit(0);
+			proc_exit(1);
 		}
 		else if (RecoveryConflictPending && RecoveryConflictRetryable)
 		{



Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Amit Khandekar
On 21 June 2017 at 00:23, Robert Haas  wrote:
> On Tue, Jun 20, 2017 at 2:54 AM, Amit Khandekar  
> wrote:
>>> I guess I don't see why it should work like this.  In the INSERT case,
>>> we must build withCheckOption objects for each partition because those
>>> partitions don't appear in the plan otherwise -- but in the UPDATE
>>> case, they're already there, so why do we need to build anything at
>>> all?  Similarly for RETURNING projections.  How are the things we need
>>> for those cases not already getting built, associated with the
>>> relevant resultRelInfos?  Maybe there's a concern if some children got
>>> pruned - they could turn out later to be the children into which
>>> tuples need to be routed. But the patch makes no distinction
>>> between possibly-pruned children and any others.
>>
>> Yes, only a subset of the partitions appear in the UPDATE subplans. I
>> think typically for updates, a very small subset of the total leaf
>> partitions will be there in the plans, others would get pruned. IMHO,
>> it would not be worth having an optimization where it opens only those
>> leaf partitions which are not already there in the subplans. Without
>> the optimization, we are able to re-use the INSERT infrastructure
>> without additional changes.
>
> Well, that is possible, but certainly not guaranteed.  I mean,
> somebody could do a whole-table UPDATE, or an UPDATE that hits a
> smattering of rows in every partition;

I am not saying that it's guaranteed to be a small subset. I am saying
that it would typically be a small subset for the
update-of-partition-key case. It seems weird for a user to cause an
update-row-movement across multiple partitions at the same time.
Generally it would be an administrative task where some or all of the
rows of a partition need to have their partition key updated in a way
that causes them to change their partition, and so there would probably
be a WHERE clause narrowing the update down to that particular
partition, because without the WHERE clause the update is slower anyway
and it's redundant to scan all the other partitions.

But, point taken, there can always be certain cases involving
partition-key updates that touch multiple partitions.

> e.g. the table is partitioned on order number, and you do UPDATE
> lineitem SET product_code = 'K372B' WHERE product_code = 'K372'.

This query does not update the order number, so here there is no
partition-key update. Are you thinking that the patch is generating
the per-leaf-partition WCO expressions even for an update not involving
a partition key?

>
> Leaving that aside, the point here is that you're rebuilding
> withCheckOptions and returningLists that have already been built in
> the planner.  That's bad for two reasons.  First, it's inefficient,
> especially if there are many partitions.

Yeah, I agree that this becomes more and more redundant if the update
involves more partitions.

> Second, it will amount to a functional bug if you get a
> different answer than the planner did.

Actually, the per-leaf WCOs are meant to be executed on the
destination partitions where the tuple is moved, while the WCOs
belonging to the per-subplan resultRelInfo are meant for the
resultRelInfo used for the UPDATE plans. So actually it should not
matter whether they look the same or different, because they are fired
at different objects. Now these objects can happen to be the same
relations though.

But in any case, it's not clear to me how the mapped WCO and the
planner's WCO would yield a different answer if they are both for the
same relation. I am possibly missing something. The planner has already
generated the withCheckOptions for each of the resultRelInfos. And then
we are using one of those to re-generate the WCO for a leaf partition
by only adjusting the attnos. If there is already a WCO generated in
the planner for that leaf partition (because that partition was
present in mtstate->resultRelInfo), then the re-built WCO should look
exactly the same as the earlier one, because they are the same
relations, and so the attnos generated in them would be the same since
the Relation TupleDesc is the same.

> Note this comment in the existing code:
>
> /*
>  * Build WITH CHECK OPTION constraints for each leaf partition rel. Note
>  * that we didn't build the withCheckOptionList for each partition within
>  * the planner, but simple translation of the varattnos for each partition
>  * will suffice.  This only occurs for the INSERT case; UPDATE/DELETE
>  * cases are handled above.
>  */
>
> The comment "UPDATE/DELETE cases are handled above" is referring to
> the code that initializes the WCOs generated by the planner.  You've
> modified the comment in your patch, but the associated code: your
> updated comment says that only "DELETEs and local UPDATES are handled
> above", but in reality, *all* updates are still handled above.  And

Actually I meant, "above works for only local updates. For
row-movement-updates, we need per-leaf partition WCOs, because when
th

Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Amit Khandekar
On 21 June 2017 at 20:14, Robert Haas  wrote:
> On Wed, Jun 21, 2017 at 5:28 AM, Amit Langote
>  wrote:
>>> The comment "UPDATE/DELETE
>>> cases are handled above" is referring to
>>> the code that initializes the WCOs generated by the planner.  You've
>>> modified the comment in your patch, but the associated code: your
>>> updated comment says that only "DELETEs and local UPDATES are handled
>>> above", but in reality, *all* updates are still handled above.  And
>>> then they are handled again here.  Similarly for returning lists.
>>> It's certainly not OK for the comment to be inaccurate, but I think
>>> it's also bad to redo the work which the planner has already done,
>>> even if it makes the patch smaller.
>>
>> I guess this has to do with the UPDATE turning into DELETE+INSERT.  So, it
>> seems like WCOs are being initialized for the leaf partitions
>> (ResultRelInfos in the mt_partitions array) that in turn are
>> initialized for the aforementioned INSERT.  That's why the term "...local
>> UPDATEs" in the new comment text.
>>
>> If that's true, I wonder if it makes sense to apply what would be
>> WCO_RLS_UPDATE_CHECK to a leaf partition that the tuple will be moved into
>> by calling ExecInsert()?
>
> I think we probably should apply the insert policy, just as we're
> executing the insert trigger.

Yes, the RLS quals should execute during tuple routing according to
whether it is an update or whether it has been converted to an insert. I
think the tests don't quite test the insert part. Will check.

>
>>> Also, I feel like it's probably not correct to use the first result
>>> relation as the nominal relation for building WCOs and returning lists
>>> anyway.  I mean, if the first result relation has a different column
>>> order than the parent relation, isn't this just broken?  If it works
>>> for some reason, the comments don't explain what that reason is.
>>
>> Yep, it's more appropriate to use
>> ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
>> is, if answer to the question I raised above is positive.

From what I had checked earlier when coding that part,
rootResultRelInfo is NULL in case of inserts, unless something has
changed in later commits. That's the reason I decided to use the first
resultRelInfo.


Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] archive_timeout ignored directly after promotion

2017-06-21 Thread Andres Freund
Hi,

When promoting a standby without using the "fast promotion" logic,
e.g. by using recovery targets (which doesn't use fast promotion, for
reasons unknown to me), the checkpointer doesn't necessarily respect
archive_timeout for the first cycle after promotion.  The reason for
that is that it determines the sleep times like
    if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
    {
        elapsed_secs = now - last_xlog_switch_time;
        if (elapsed_secs >= XLogArchiveTimeout)
            continue;           /* no sleep for us ... */
        cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
    }
and RecoveryInProgress() will still return true.  We just fudge that
when creating the end-of-recovery checkpoint:

    /*
     * The end-of-recovery checkpoint is a real checkpoint that's
     * performed while we're still in recovery.
     */
    if (flags & CHECKPOINT_END_OF_RECOVERY)
        do_restartpoint = false;

One easy way to fix that would be to just wake up the checkpointer from
the startup process once at the end of recovery, but it'd not be
pretty.   I think it'd be better to change the
    do_restartpoint = RecoveryInProgress();

    /*
     * The end-of-recovery checkpoint is a real checkpoint that's
     * performed while we're still in recovery.
     */
    if (flags & CHECKPOINT_END_OF_RECOVERY)
        do_restartpoint = false;

into having a per-loop 'local_in_recovery' variable that we turn off
once a CHECKPOINT_END_OF_RECOVERY checkpoint is requested.
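
Roughly like this (a sketch of the idea, not a tested patch):

    bool        local_in_recovery = RecoveryInProgress();

    /* the end-of-recovery checkpoint behaves as if recovery were over */
    if (flags & CHECKPOINT_END_OF_RECOVERY)
        local_in_recovery = false;

    do_restartpoint = local_in_recovery;

    /* ... and the sleep-time logic later tests the local flag ... */
    if (XLogArchiveTimeout > 0 && !local_in_recovery)
    {
        /* same elapsed-time handling as today */
    }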

Comments?

Greetings,

Andres Freund




[HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

2017-06-21 Thread Andres Freund
Hi,

When doing a PITR-style recovery with a recovery target set, we're
currently not doing a fast promotion, in contrast to the handling when
doing a pg_ctl or trigger-file based promotion. That can delay making
the server available for writes.

I can't really see a reason for this?

Greetings,

Andres Freund




Re: [HACKERS] CREATE SUBSCRIPTION log noise

2017-06-21 Thread Peter Eisentraut
On 6/20/17 22:54, Jeff Janes wrote:
> I think this should go away:
> 
> ereport(NOTICE,
>   (errmsg("created replication slot \"%s\" on
> publisher",
>   slotname)));
> 
> It doesn't appear to be contingent on anything other than the content of
> the command you just gave it.  I don't think we need a NOTICE saying
> that it did that thing I just told it to do--that should be implicit by
> the lack of an ERROR.

I'm appreciative of this complaint.  The point is that we are allocating
resources on a remote host that should not be forgotten.  I could go
either way.

Other opinions?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andrew Dunstan


On 06/21/2017 11:30 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Unfortunately this seems precluded by the use of the non-standard
>> "err.h". It looks like that will trip us up on mingw also.
> Hm, why?  Doesn't the -I search path get the right thing?
>
>   


The file isn't present at all. On my Linux workstation "man errx" says:

CONFORMING TO
   These functions are nonstandard BSD extensions.


cheers

andrew


-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/21/2017 11:30 AM, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Unfortunately this seems precluded by the use of the non-standard
>>> "err.h". It looks like that will trip us up on mingw also.

>> Hm, why?  Doesn't the -I search path get the right thing?

> The file isn't present at all.

Uh, what?  It's in the repo.

regards, tom lane




Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-21 Thread Peter Eisentraut
On 6/21/17 09:02, Albe Laurenz wrote:
> 2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
> Broken pipe
> 2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
> 2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker 
> for subscription "reprec" has started
> DEBUG:  received replication command: IDENTIFY_SYSTEM
> DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
> 0/0 (proto_version '1', publication_names '"repsend"')
> 2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
> ReplicationSlotRelease, slot.c:394
> 2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
> terminated by signal 6: Aborted
> 2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active 
> server processes
> 2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
> Broken pipe
> 2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; 
> reinitializing

I can't reproduce that.  I let it loop around for about 10 minutes and
it was fine.

I notice that you have some debug settings on.  Could you share your
exact setup steps from initdb, as well as configure options, just in
case one of these settings is causing a problem?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_bsd_indent 2.0 is available from git.postgresql.org

2017-06-21 Thread Andrew Dunstan


On 06/21/2017 02:25 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 06/21/2017 11:30 AM, Tom Lane wrote:
>>> Andrew Dunstan  writes:
 Unfortunately this seems precluded by the use of the non-standard
 "err.h". It looks like that will trip us up on mingw also.
>>> Hm, why?  Doesn't the -I search path get the right thing?
>> The file isn't present at all.
> Uh, what?  It's in the repo.
>
>   


Oh, sorry, my bad, brain fade while multitasking.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





[HACKERS] GSoC 2017 Proposal for predicate locking in hash index

2017-06-21 Thread Shubham Barai
Hi,

Now that hash indexes support write-ahead logging, it would be great to add
support for predicate locking to them.
The implementation of predicate locking in hash indexes seems very simple.
I have already made the changes in the code and am currently working on testing.

Here is my approach

1) PredicateLockPage()

->_hash_first()

During a scan operation, acquire a predicate lock on the primary page of a
bucket.

2) CheckForSerializableConflictIn()

->_hash_doinsert()

During an insert operation, check if there is any predicate lock on the
primary page of a bucket.


3) PredicateLockPageSplit()

In case of a bucket split, copy predicate lock from the primary page of an
old bucket to the primary page of a new bucket.
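
For illustration, a rough sketch of where those calls would sit, modeled on
the existing btree/GiST code (the exact placement and the variable names here
are my assumptions, not necessarily what the patch does):

/* 1) In _hash_first(), once the scan has settled on a bucket: */
PredicateLockPage(rel, BufferGetBlockNumber(buf), scan->xs_snapshot);

/* 2) In _hash_doinsert(), before the tuple is added: */
CheckForSerializableConflictIn(rel, NULL, buf);

/* 3) On a bucket split, carry locks from the old bucket's primary
 * page over to the new one: */
PredicateLockPageSplit(rel,
                       BufferGetBlockNumber(obuf),
                       BufferGetBlockNumber(nbuf));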

Any suggestions or corrections will be appreciated.

Regards,
Shubham





Re: [HACKERS] Adding connection id in the startup message

2017-06-21 Thread Robert Haas
On Thu, Jun 15, 2017 at 3:11 AM, Satyanarayana Narlapuram
 wrote:
> The proposal is to tweak the connectivity wire protocol, and add a
> connection id (GUID) filed in the startup message. We can trace the
> connection using this GUID and investigate further on where the connection
> failed.

Wire protocol changes are scary, because they can result in a need to
update every client connector and every bit of middleware that speaks
PostgreSQL, not just the server.  If we thought this feature had
enough value to justify adding it, we could add it as an optional
feature so that existing clients don't break.  But if we added it and
only libpq and the server ended up supporting it, that would be a
disappointing outcome.

> Client adds a connection id in the startup message and send it to the server
> it is trying to connect to. Proxy logs the connection id information in its
> logs, and passes it to the server. Server logs the connection Id in the
> server log, and set it in the GUC variable (ConnectionId).

Are you imagining that this would be done by the client or by the
driver the client is using?  If the latter, it's transparent but maybe
useless, because the client won't even know that this ID got created.
If it's done by the client code proper, then it only works if not only
the server and every bit of middleware and every connector but also
every client application in the world is updated, which does not seem
like a thing that is likely to occur.

> This field can be an optional field driven by the connection parameters for
> psql (-C or --include-clientId).

This makes it sound like you're imagining it being added by an
explicit action on the part of the client, which means we're in
update-every-PostgreSQL-application-in-the-world territory.

> P.S: I am looking for initial feedback on this idea and can provide more
> design details if needed.

I think you need to build a more compelling case for why this would be
useful, why application_name isn't the right answer, and how we avoid
forcing everybody in the world to worry about this new thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding connection id in the startup message

2017-06-21 Thread Robert Haas
On Thu, Jun 15, 2017 at 9:50 AM, Tom Lane  wrote:
> Can you give a concrete example where this would have
> helped above and beyond knowing, eg, the source and time of the connection
> attempt?

I can imagine that in really high-volume use cases (such as the OP
apparently has) the number of client connections might be so large
that identification by timestamp isn't useful, and the source IP will
be obscured after the first hop through a connection pooler or other
middleware.  If you've got 100 connections per second coming in,
matching things up by timestamp across different machines is going to
be tough.

But I agree with your other concerns.  I think the problem is real,
but I'm not sure that this is the best solution.  On the other hand,
I'm also not entirely sure I understand the proposal yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical replication launcher never been restarted when terminated

2017-06-21 Thread Peter Eisentraut
On 6/21/17 13:03, Yugo Nagata wrote:
> As I report in another thread[1], when the logical replication launcher 
> is terminated by SIGTERM, it never been restarted and we need to restart
> the server to enable logical replication again.
> 
> This is because the logical replication launcher exits with exitstatus 0,
> so if it exits with status 1 it is restarted by the postmaster normally.
> Attached is a simple patch to fix it in this way.

Fixed, thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Making server name part of the startup message

2017-06-21 Thread Satyanarayana Narlapuram
> I should think that in such cases, the end client is exactly not what you
> want to be choosing which server it gets redirected to.  You'd be wanting to
> base that on policies defined at the pooler.  There are already plenty of
> client-supplied attributes you could use as inputs for such policies (user
> name and application name, for instance).
Pooler would be the end client for the Postgres database cluster, and 
connection string changes are required at the pooler. There is no change in the 
connection string format in such cases.

>Why do we need to incur a protocol break to add another one?
This is optional and is not a protocol break. It doesn't make the cluster
name field mandatory in the startup message. The server name is included in
the startup message only if the client specifies the extra connection-string
parameter; otherwise it is not. In a proxy scenario, the end client's startup
message doesn't need to include the server name, and it is optional for the
proxy to include this field when sending the startup message to the server.
For the Azure PostgreSQL service, setting this field is preferred over
appending the cluster name to the user name.

Proposed LibPQ connection string format would be:

host=localhost port=5432 dbname=mydb connect_timeout=10 
include_cluster_name=true 

include_cluster_name is an optional parameter; setting it to true includes
cluster_name in the startup message, which is omitted otherwise.

Thanks,
Satya

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Thursday, June 15, 2017 8:58 AM
To: Alvaro Herrera 
Cc: Satyanarayana Narlapuram ; 
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Making server name part of the startup message

Alvaro Herrera  writes:
> Tom Lane wrote:
>> This makes no sense at all.  The client is telling the server what 
>> the server's name is?

> I think for instance you could have one pgbouncer instance (or 
> whatever
> pooler) pointing to several different servers.  So the client connects 
> to the pooler and indicates which of the servers to connect to.

I should think that in such cases, the end client is exactly not what you want 
to be choosing which server it gets redirected to.  You'd be wanting to base 
that on policies defined at the pooler.  There are already plenty of 
client-supplied attributes you could use as inputs for such policies (user name 
and application name, for instance).
Why do we need to incur a protocol break to add another one?

regards, tom lane




Re: [HACKERS] Making server name part of the startup message

2017-06-21 Thread Satyanarayana Narlapuram
PgBouncer, for example, assumes that database names are unique across the
database clusters it is serving. Our front-end gateways can serve tens of
thousands of Postgres servers spanning multiple customers and organizations,
and enforcing unique database names is not possible for the users of the
service.

> For the original idea in this thread, using something like dbname@server 
> seems a more logical choice than username@server.


We considered this option, but connecting to the database from GUI tools is
not very intuitive (or even possible). Also, the \c command in psql requires
including the full cluster_name every time the user connects to a different
database.


Thanks,
Satya
From: Magnus Hagander [mailto:mag...@hagander.net]
Sent: Thursday, June 15, 2017 9:24 AM
To: Tom Lane 
Cc: Alvaro Herrera ; Satyanarayana Narlapuram 
; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Making server name part of the startup message

On Thu, Jun 15, 2017 at 5:57 PM, Tom Lane  wrote:
Alvaro Herrera  writes:
> Tom Lane wrote:
>> This makes no sense at all.  The client is telling the server what the
>> server's name is?

> I think for instance you could have one pgbouncer instance (or whatever
> pooler) pointing to several different servers.  So the client connects
> to the pooler and indicates which of the servers to connect to.

I should think that in such cases, the end client is exactly not what
you want to be choosing which server it gets redirected to.  You'd
be wanting to base that on policies defined at the pooler.  There are
already plenty of client-supplied attributes you could use as inputs
for such policies (user name and application name, for instance).
Why do we need to incur a protocol break to add another one?

The normal one to use for pgbouncer today is, well, "database name". You can 
then have pgbouncer map different databases to different backend servers. It's 
fairly common in my experience to have things like "dbname" and "dbname-ro" 
(for example) as different database names with one mapping to the master and 
one mapping to a load-balanced set of standbys, and things like that. ISTM that 
using the database name is a good choice for that.
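
For concreteness, that kind of mapping might look like this in a pgbouncer
configuration (hostnames here are made up, just to illustrate the pattern):

[databases]
; clients choose a server simply by picking a database name
dbname    = host=master.example.com  port=5432 dbname=dbname
dbname-ro = host=standby.example.com port=5432 dbname=dbname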

For the original idea in this thread, using something like dbname@server seems 
a more logical choice than username@server.

TBH, so maybe I'm misunderstanding the original issue?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Re: [HACKERS] Missing comment for ResultRelInfo in execnodes.h

2017-06-21 Thread Peter Eisentraut
On 6/20/17 22:37, Etsuro Fujita wrote:
> On 2017/06/21 3:30, Peter Eisentraut wrote:
>> On 6/20/17 05:50, Etsuro Fujita wrote:
>>> Here is a small patch to add a comment on its new member PartitionRoot.
>>
>> The existing comment style is kind of unusual.  How about the attached
>> to clean it up a bit?
> 
> +1 for that change.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Adding connection id in the startup message

2017-06-21 Thread David G. Johnston
On Wed, Jun 21, 2017 at 12:15 PM, Robert Haas  wrote:
> I think the problem is real,
> but I'm not sure that this is the best solution.  On the other hand,
> I'm also not entirely sure I understand the proposal yet.

Given the problems with changing the protocol it does seem like
generalizing away from "connection id" to "arbitrary payload" would be
useful.

Like, add a new "attachment map" area where any application-aware node
that encounters the message can attach data.  If some node further
downstream sends a response message the contents of the attachment map
would be sent back as-is.

The advantage of "connection id" is that the messages are still of
fixed size whereas the more flexible arrangement would necessitate
that message sizes vary.  In return middleware gets a place to store
session information that can be used more broadly than a simple
externally generated connection id.

David J.




Re: [HACKERS] Pluggable storage

2017-06-21 Thread Robert Haas
On Mon, Jun 12, 2017 at 9:50 PM, Haribabu Kommi
 wrote:
> Open Items:
>
> 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
> HeapTuple and HeapScanDesc, So these scans are directly operating
> on those structures and providing the result.
>
> These scan types may not be applicable to different storage formats.
> So how to handle them?

I think that BitmapHeapScan, at least, is applicable to any table AM
that has TIDs.   It seems to me that in general we can imagine three
kinds of table AMs:

1. Table AMs where a tuple can be efficiently located by a real TID.
By a real TID, I mean that the block number part is really a block
number and the item ID is really a location within the block.  These
are necessarily quite similar to our current heap, but they can change
the tuple format and page format to some degree, and it seems like in
many cases it should be possible to plug them into our existing index
AMs without too much heartache.  Both index scans and bitmap index
scans ought to work.

2. Table AMs where a tuple has some other kind of locator.  For
example, imagine an index-organized table where the locator is the
primary key, which is a bit like what Alvaro had in mind for indirect
indexes.  If the locator is 6 bytes or less, it could potentially be
jammed into a TID, but I don't think that's a great idea.  For things
like int8 or numeric, it won't work at all.  Even for other things,
it's going to cause problems because the bit patterns won't be what
the code is expecting; e.g. bitmap scans care about the structure of
the TID, not just how many bits it is.  (Due credit: Somebody, maybe
Alvaro, pointed out this problem before, at PGCon.)  For these kinds
of tables, larger modifications to the index AMs are likely to be
necessary, at least if we want a really general solution, or maybe we
should have separate index AMs - e.g. btree for traditional TID-based
heaps, and generic_btree or indirect_btree or key_btree or whatever
for heaps with some other kind of locator.  It's not too hard to see
how to make index scans work with this sort of structure but it's very
unclear to me whether, or how, bitmap scans can be made to work.
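
For reference, the TID that such a locator would have to fit into is only six
bytes (slightly simplified from src/include/storage/itemptr.h):

typedef struct ItemPointerData
{
    BlockIdData  ip_blkid;    /* 4 bytes: physical block number */
    OffsetNumber ip_posid;    /* 2 bytes: line-pointer slot in the block */
} ItemPointerData;

which is why an int8 or numeric key can't simply be jammed into one.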

3. Table AMs where a tuple doesn't really have a locator at all.  In
these cases, we can't support any sort of index AM at all.  When the
table is queried, there's really nothing the core system can do except
ask the table AM for a full scan, supply the quals, and hope the table
AM has some sort of smarts that enable it to optimize somehow.  For
example, you can imagine converting cstore_fdw into a table AM of this
sort - ORC has a sort of inbuilt BRIN-like indexing that allows whole
chunks to be proven uninteresting and skipped.  (You could use chunk
number + offset to turn this into a table AM of the previous type if
you wanted to support secondary indexes; not sure if that'd be useful,
but it'd certainly be harder.)

I'm more interested in #1 than in #3, and more interested in #3 than
#2, but other people may have different priorities.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Tom Lane
I wrote:
> Barring objections, I'd like to reindent HEAD with the new version
> of pg_bsd_indent (and correspondingly updated pgindent script)
> tomorrow, say around 1800 UTC.

... and it's done.

regards, tom lane




Re: [HACKERS] postgresql transactons not fully isolated

2017-06-21 Thread Merlin Moncure
On Tue, Jun 20, 2017 at 2:58 PM, Merlin Moncure  wrote:
> On Tue, Jun 20, 2017 at 2:34 PM, David G. Johnston
>  wrote:
>> On Tue, Jun 20, 2017 at 12:22 PM, Chapman Flack  
>> wrote:
>>> I get the reported result (DELETE 0 and a table containing 2 and 3)
>>> in both 'read committed' and 'read uncommitted'.
>>
>> Practically speaking those are a single transaction isolation mode.
>>
>> https://www.postgresql.org/docs/10/static/transaction-iso.html
>>
>> I think Merlin has mis-read the article he linked to.  The example
>> being used there never claims to be done under serialization and seems
>> to describe an example of the perils of relying on the default
>> isolation level.
>
> oops -- could be operator error :-)


yep, I made the rookie mistake of setting the transaction isolation level
outside a transaction block (where it immediately evaporates, since it
wasn't bracketed by the transaction), rather than changing the default.
Sorry for the noise; serialization failures are raised, and that is
acceptable behavior per spec AIUI.
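
To make the pitfall concrete, a minimal psql session (my own illustration of
the above, not taken from the linked article):

-- Outside a transaction block this only affects the implicit
-- single-statement transaction, so it evaporates immediately:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- WARNING:  SET TRANSACTION can only be used in transaction blocks
BEGIN;
SHOW transaction_isolation;   -- still "read committed"
COMMIT;

-- Either set it inside the transaction block...
BEGIN ISOLATION LEVEL SERIALIZABLE;
SHOW transaction_isolation;   -- "serializable"
COMMIT;

-- ...or change the session default:
SET default_transaction_isolation = 'serializable';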

merlin




Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 1:37 PM, Amit Khandekar  wrote:
>> e.g. the table is partitioned on order number, and you do UPDATE
>> lineitem SET product_code = 'K372B' WHERE product_code = 'K372'.
>
> This query does not update order number, so here there is no
> partition-key-update. Are you thinking that the patch is generating
> the per-leaf-partition WCO expressions even for a update not involving
> a partition key ?

No, it just wasn't a great example.  Sorry.

>> Second, it will amount to a functional bug if you get a
>> different answer than the planner did.
>
> Actually, the per-leaf WCOs are meant to be executed on the
> destination partitions where the tuple is moved, while the WCOs
> belonging to the per-subplan resultRelInfo are meant for the
> resultRelinfo used for the UPDATE plans. So actually it should not
> matter whether they look same or different, because they are fired at
> different objects. Now these objects can happen to be the same
> relations though.
>
> But in any case, it's not clear to me how the mapped WCO and the
> planner's WCO would yield a different answer if they are both the same
> relation. I am possibly missing something. The planner has already
> generated the withCheckOptions for each of the resultRelInfo. And then
> we are using one of those to re-generate the WCO for a leaf partition
> by only adjusting the attnos. If there is already a WCO generated in
> the planner for that leaf partition (because that partition was
> present in mtstate->resultRelInfo), then the re-built WCO should be
> exactly look same as the earlier one, because they are the same
> relations, and so the attnos generated in them would be same since the
> Relation TupleDesc is the same.

If the planner's WCOs and mapped WCOs are always the same, then I
think we should try to avoid generating both.  If they can be
different, but that's intentional and correct, then there's no
substantive problem with the patch but the comments need to make it
clear why we are generating both.

> Actually I meant, "above works for only local updates. For
> row-movement updates, we need per-leaf-partition WCOs, because when
> the row is inserted into the target partition, that partition may not
> be included in the above planner resultRelInfo, so we need WCOs for
> all partitions". I think a comment saying this should be sufficient if
> I add it in the code?

Let's not get too focused on updating the comment until we are in
agreement about what the code ought to be doing.  I'm not clear
whether you accept the point that the patch needs to be changed to
avoid generating the same WCOs and returning lists in both the planner
and the executor.

>> Also, I feel like it's probably not correct to use the first result
>> relation as the nominal relation for building WCOs and returning lists
>> anyway.  I mean, if the first result relation has a different column
>> order than the parent relation, isn't this just broken?  If it works
>> for some reason, the comments don't explain what that reason is.
>
> Not sure why parent relation should come into picture. As long as the
> first result relation belongs to one of the partitions in the whole
> partition tree, we should be able to use that to build WCOs of any
> other partitions, because they have a common set of attributes having
> the same name. So we are bound to find each of the attributes of first
> resultRelInfo in the other leaf partitions during attno mapping.

Well, at least for returning lists, we've got to generate the
returning lists so that they all match the column order of the parent,
not the parent's first child.  Otherwise, for example, UPDATE
parent_table ... RETURNING * will not work correctly.  The tuples
returned by the returning clause have to have the attribute order of
parent_table, not the attribute order of parent_table's first child.
I'm not sure whether WCOs have the same issue, but it's not clear to
me why they wouldn't: they contain a qual which is an expression tree,
and presumably there are Var nodes in there someplace, and if so, then
they have varattnos that have to be right for the purpose for which
they're going to be used.

>> +for (i = 0; i < num_rels; i++)
>> +{
>> +ResultRelInfo *resultRelInfo = &result_rels[i];
>> +Relationrel = resultRelInfo->ri_RelationDesc;
>> +Bitmapset *expr_attrs = NULL;
>> +
>> +pull_varattnos((Node *) rel->rd_partcheck, 1, &expr_attrs);
>> +
>> +/* Both bitmaps are offset by FirstLowInvalidHeapAttributeNumber. */
>> +if (bms_overlap(expr_attrs, GetUpdatedColumns(resultRelInfo, 
>> estate)))
>> +return true;
>> +}
>>
>> This seems like an awfully expensive way of performing this test.
>> Under what circumstances could this be true for some result relations
>> and false for others;
>
> One resultRelInfo can have no partition-key column used in its quals,
> but the next resultRelInfo can have quite different quals, and those
> quals can have partition-key columns in them.

Re: [HACKERS] Making server name part of the startup message

2017-06-21 Thread Tom Lane
Satyanarayana Narlapuram  writes:
>> Why do we need to incur a protocol break to add another one?

> This is optional and is not a protocol break.

Yes, it is.  We've been around on this sort of thing before and we
understand the consequences.  If the option is carried in the startup
message, the client has to send it without knowing whether the server
is of new enough version to accept it.  If not, the server will reject
the connection (with a scary looking message in its log) and the client
then has to retry without the option.  This is not distinguishable from
what you have to do if you consider the startup message as belonging
to a new protocol version 4 instead of 3.

We have done this in the past, but it's painful, subject to bugs,
and generally is a pretty high price to pay for a marginal feature.

regards, tom lane




Re: [HACKERS] UPDATE of partition key

2017-06-21 Thread Robert Haas
On Wed, Jun 21, 2017 at 1:38 PM, Amit Khandekar  wrote:
>>> Yep, it's more appropriate to use
>>> ModifyTableState->rootResultRelationInfo->ri_RelationDesc somehow.  That
>>> is, if answer to the question I raised above is positive.
>
> From what I had checked earlier when coding that part,
> rootResultRelInfo is NULL in case of inserts, unless something has
> changed in later commits. That's the reason I decided to use the first
> resultRelInfo.

We're just going around in circles here.  Saying that you decided to
use the first child's resultRelInfo because you didn't have a
resultRelInfo for the parent is an explanation of why you wrote the
code the way you did, but that doesn't make it correct.  I want to
know why you think it's correct.

I think it's probably wrong, because it seems to me that if the INSERT
code needs to use the parent's ResultRelInfo rather than the first
child's ResultRelInfo, the UPDATE code probably needs to do the same.
Commit d3cc37f1d801a6b5cad9bf179274a8d767f1ee50 got rid of
resultRelInfos for non-leaf partitions, and commit
e180c8aa8caf5c55a273d4a8e6092e77ff3cff10 added the resultRelInfo back
for the topmost parent, because otherwise it didn't work correctly.
If every partition in the hierarchy has a different attribute
ordering, then it seems to me that it must surely matter which of
those attribute orderings we pick.  It's hard to imagine that we can
pick *either* the parent's attribute ordering *or* that of the first
child and nothing will be different - the attribute numbers inside the
returning lists and WCOs we create have got to get used somehow, so
surely it matters which attribute numbers we use, doesn't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Bruce Momjian
On Wed, Jun 21, 2017 at 04:07:30PM -0400, Tom Lane wrote:
> I wrote:
> > Barring objections, I'd like to reindent HEAD with the new version
> > of pg_bsd_indent (and correspondingly updated pgindent script)
> > tomorrow, say around 1800 UTC.
> 
> ... and it's done.

You are eventually doing all active branches, right?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




[HACKERS] Broken O(n^2) avoidance in wal segment recycling.

2017-06-21 Thread Andres Freund
Hi,

Author: Heikki Linnakangas 
Branch: master Release: REL9_5_BR [b2a5545bd] 2015-04-13 16:53:49 +0300
Branch: REL9_4_STABLE Release: REL9_4_2 [d72792d02] 2015-04-13 17:22:21 +0300
Branch: REL9_3_STABLE Release: REL9_3_7 [a800267e4] 2015-04-13 17:22:35 +0300
Branch: REL9_2_STABLE Release: REL9_2_11 [cc2939f44] 2015-04-13 17:26:59 +0300
Branch: REL9_1_STABLE Release: REL9_1_16 [ad2925e20] 2015-04-13 17:26:49 +0300
Branch: REL9_0_STABLE Release: REL9_0_20 [5b6938186] 2015-04-13 17:26:35 +0300

Don't archive bogus recycled or preallocated files after timeline switch.

That commit moved xlog file deletion from RemoveOldXlogFiles() into its own
RemoveXlogFile() routine, because it introduced another function that also
deletes/recycles segments.

It did so moving
+   /* Needn't recheck that slot on future iterations */
+   endlogSegNo++;
into the new routine. But it's useless there, because it's just a stack
variable, which is going to be freshly computed with
+   XLByteToPrevSeg(endptr, endlogSegNo);
on the next call.

That logic was introduced in

commit 61b861421b0b849a0dffe36238b8e504624831c1
Author: Tom Lane 
Date:   2005-04-15 18:48:10 +

Modify MoveOfflineLogs/InstallXLogFileSegment to avoid O(N^2) behavior
when recycling a large number of xlog segments during checkpoint.
The former behavior searched from the same start point each time,
requiring O(checkpoint_segments^2) stat() calls to relocate all the
segments.  Instead keep track of where we stopped last time through.

but was neutered by the commit above.

We've not heard any complaints about this afaik, but it's not something
that's easily diagnosable as being a problem.  Therefore I suspect we
should fix and backpatch this?

Heikki?

Greetings,

Andres Freund




Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Jun 21, 2017 at 04:07:30PM -0400, Tom Lane wrote:
>> ... and it's done.

> You are eventually doing all active branches, right?

I don't think we'd entirely decided that we should do that, or when
to do it.  I'm not in a huge hurry; we might find some more tweaks
we want to make -- particularly around typedef collection -- before
we call it done.

For reference, the patchset wound up changing just about 10000 lines,
out of a total code base approaching 1.4M lines, so less than 1%
churn.  That's not as bad as I'd thought it would be going in.
But I'm not sure if that represents an argument for or against
reindenting the back branches.  It's probably more than the number of
lines affected by that 8.1 comment-right-margin adjustment that caused
us so much back-patching pain later.

Right now we're really just speculating about how much pain there will
be, on either end of this.  So it'd be interesting for somebody who's
carrying large out-of-tree patches (EDB? Citus?) to try the new
pgindent version on a back branch and see how much of their patches no
longer apply afterwards.  And I think it'd make sense to wait a few
months and garner some experience with back-patching from v10 into the
older branches, so we have more than guesses about how much pain not
reindenting will be for us.

I'd earlier suggested that waiting till around the time of 10.0
release might be a good idea, and that still seems like a
reasonable timeframe.

regards, tom lane




Re: [HACKERS] Phantom segment upon promotion causing troubles.

2017-06-21 Thread Andres Freund
Hi,

On 2017-06-20 16:11:32 +0300, Heikki Linnakangas wrote:
> On 06/19/2017 10:30 AM, Andres Freund wrote:
> > Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
> > weird enough to be interesting.  What he'd observed was that he promoted
> > some PITR standby, and early clones of that node worked, but later clones
> > did not, failing to read some segment.
> > 
> > The problems turns out to be the following:  [explanation]
> 
> Good detective work!

Thanks.


> > The minimal fix here is presumably not to use XLByteToPrevSeg() in
> > RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
> > serves here - I don't think it's ever needed.
> 
> Agreed, I don't see a reason for it either.

Pushed. And found like three other things while investigating :/


> > There seems to be a larger question here though: Why does
> > XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
> > at that period?  That seems like a bad idea, especially in more
> > complicated scenarios where some precursor timeline might live for
> > longer than it was a parent?  ISTM XLogFileReadAnyTLI() should check
> > which timeline a segment ought to come from, based on the history?
> 
> Yeah. I've had that thought for years as well, but there has never been any
> pressing reason to bite the bullet and rewrite it, so I haven't gotten
> around to it.

Heh.  Still seems like something we should tackle - but it'd not be
urgent enough to backpatch, so it doesn't quite seem like something to
tackle *just now* :/

- Andres




Re: [HACKERS] CREATE SUBSCRIPTION log noise

2017-06-21 Thread Euler Taveira
2017-06-21 15:14 GMT-03:00 Peter Eisentraut :

> > It doesn't appear to be contingent on anything other than the content of
> > the command you just gave it.  I don't think we need a NOTICE saying
> > that it did that thing I just told it to do--that should be implicit by
> > the lack of an ERROR.
>
> I'm appreciative of this complaint.  The point is that we are allocating
> resources on a remote host that should not be forgotten.  I could go
> either way.


It is documented that a subscription can create a replication slot on the
remote host, as mentioned in the section "Replication Slot Management". If we
want to keep the message, let's downgrade it to DEBUG1 (as we do with "create
implicit index for table"), but I would prefer to rip it out.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Andres Freund
On 2017-06-21 17:28:32 -0400, Tom Lane wrote:
> Right now we're really just speculating about how much pain there will
> be, on either end of this.  So it'd be interesting for somebody who's
> carrying large out-of-tree patches (EDB? Citus?) to try the new
> pgindent version on a back branch and see how much of their patches no
> longer apply afterwards.

Citus isn't a patched version of postgres anymore, but an extension (with
some ugly hacks to make that possible ...).  Therefore it shouldn't
affect us in any meaningful way.

> And I think it'd make sense to wait a few
> months and garner some experience with back-patching from v10 into the
> older branches, so we have more than guesses about how much pain not
> reindenting will be for us.

Hm.  A few days / a week or two, okay.  But a few months?  That'll incur
the backpatch cost during all that time...

Greetings,

Andres Freund




Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-21 Thread Tom Lane
Andres Freund  writes:
> On 2017-06-21 17:28:32 -0400, Tom Lane wrote:
>> And I think it'd make sense to wait a few
>> months and garner some experience with back-patching from v10 into the
>> older branches, so we have more than guesses about how much pain not
>> reindenting will be for us.

> Hm.  A few days / a week or two, okay.  But a few months?  That'll incur
> the backpatch cost during all that time...

We can always move up the decision date if we find that our pain
sensors have overloaded.

regards, tom lane




Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-21 Thread Tom Lane
Today, lorikeet failed with a new variant on the bgworker start crash:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2017-06-21%2020%3A29%3A10

This one is even more exciting than the last one, because it sure looks
like the crashing bgworker took the postmaster down with it.  That is
Not Supposed To Happen.

Wondering if we broke something here recently, I tried to reproduce it
on a Linux machine by adding a randomized Assert failure in
shm_mq_set_sender.  I don't see any such problem, even with EXEC_BACKEND;
we recover from the crash as expected.

So I'm starting to get a distinct feeling that there's something wrong
with the cygwin port.  But I dunno what.

regards, tom lane




[HACKERS] PG 10beta2 schedule

2017-06-21 Thread Tom Lane
The release team has agreed that a good time to put out beta2 will
be the second week of July, ie wrap tarballs 10 July for announcement
13 July.

I imagine beta3 will appear along with the scheduled back-branch
releases in the second week of August, but that's not positively
decided yet.

regards, tom lane




[HACKERS] RLS in CTE incorrect permission failure

2017-06-21 Thread Rod Taylor
In the attached script, the second insert into t2 (as part of the CTE)
should succeed. My actual use case isn't much more complex; the function is
used primarily to allow peeking at columns that the function definer has
access to but a typical user does not. The function also makes it easy to
copy this policy to a number of structures.

The function within the policy doesn't seem to be able to see records
inserted by earlier statements in the CTE. Perhaps this is as simple as
adding a command counter increment in the right place?

Fails in 9.5.7 and HEAD.

-- 
Rod Taylor


[Attachment: cte_rls_fail.sql (application/sql)]



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund
On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>   Andres Freund wrote:
> 
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
> 
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>  ..above repeated 1000 times...
>   \endbatch
> 
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> 
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.

\set nbranches 1 * :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\set aid random(1, :naccounts)
\set bid random(1, :nbranches)
\set tid random(1, :ntellers)
\set delta random(-5000, 5000)
\beginbatch
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, 
:aid, :delta, CURRENT_TIMESTAMP);
END;
\endbatch

- Andres




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Craig Ringer
On 22 Jun. 2017 07:40, "Andres Freund"  wrote:

On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
>   Andres Freund wrote:
>
> > FWIW, I still think this needs a pgbench or similar example integration,
> > so we can actually properly measure the benefits.
>
> Here's an updated version of the patch I made during review,
> adding \beginbatch and \endbatch to pgbench.
> The performance improvement appears clearly
> with a custom script of this kind:
>   \beginbatch
>  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
>  ..above repeated 1000 times...
>   \endbatch
>
> versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
>
> On localhost on my desktop I tend to see a 30% difference in favor
> of the batch mode with that kind of test.
> On slower networks there are much bigger differences.

This is seriously impressive.  Just using the normal pgbench mixed
workload, wrapping a whole transaction into a batch *doubles* the
throughput.  And that's locally over a unix socket - the gain over
actual network will be larger.


In my original tests I got over a 300x improvement on WAN :). I should
check if the same applies with pgbench.


Re: [HACKERS] RLS in CTE incorrect permission failure

2017-06-21 Thread Tom Lane
Rod Taylor  writes:
> In the attached script, the second insert into t2 (as part of the CTE)
> should succeed.

No, I don't think so.  You declared the check function as STABLE which
means it is confined to seeing the same snapshot as the surrounding query.
So it can't see anything inserted by that query.

Possibly it'd work as you wish with a VOLATILE function.
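
A condensed illustration of that snapshot rule (my own sketch, not the
attached script):

CREATE TABLE t (id int);

CREATE FUNCTION t_has(i int) RETURNS boolean
LANGUAGE sql STABLE
AS 'SELECT EXISTS (SELECT 1 FROM t WHERE id = i)';

WITH ins AS (INSERT INTO t VALUES (1) RETURNING id)
SELECT t_has(id) FROM ins;
-- returns false: the STABLE function runs with the calling query's
-- snapshot, so it cannot see the row that same query just inserted;
-- redeclared VOLATILE, it can.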

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Restart logical replication launcher when killed

2017-06-21 Thread Michael Paquier
On Thu, Jun 22, 2017 at 4:16 AM, Peter Eisentraut  wrote:
> Restart logical replication launcher when killed

-   /* The logical replication launcher can be stopped at any time. */
-   proc_exit(0);
+   /* The logical replication launcher can be stopped at any time.
+* Use exit status 1 so the background worker is restarted. */
+   proc_exit(1);
I know I am noisy on the matter but... This comment format is not PG-like.
-- 
Michael




Re: [HACKERS] [COMMITTERS] pgsql: Restart logical replication launcher when killed

2017-06-21 Thread Andres Freund
On 2017-06-22 08:46:35 +0900, Michael Paquier wrote:
> On Thu, Jun 22, 2017 at 4:16 AM, Peter Eisentraut  wrote:
> > Restart logical replication launcher when killed
> 
> -   /* The logical replication launcher can be stopped at any time. */
> -   proc_exit(0);
> +   /* The logical replication launcher can be stopped at any time.
> +* Use exit status 1 so the background worker is restarted. */
> +   proc_exit(1);
> I know I am noisy on the matter but... This comment format is not PG-like.

That's since been repaired by the new pgindent run.  But what I'm a bit
confused about is why we're going for proc_exit() for replication
launchers when other types of processes simply FATAL out?  Seems like a
weird change.  It's not like it's not log-worthy if somebody kills the
launcher?

- Andres




Re: [HACKERS] RLS in CTE incorrect permission failure

2017-06-21 Thread Rod Taylor
On Wed, Jun 21, 2017 at 7:46 PM, Tom Lane  wrote:

> Rod Taylor  writes:
> > In the attached script, the second insert into t2 (as part of the CTE)
> > should succeed.
>
> No, I don't think so.  You declared the check function as STABLE which
> means it is confined to seeing the same snapshot as the surrounding query.
> So it can't see anything inserted by that query.
>
> Possibly it'd work as you wish with a VOLATILE function.
>

Indeed, that works as expected.

Sorry for the noise.


-- 
Rod Taylor


Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-21 Thread Michael Paquier
On Wed, Jun 21, 2017 at 11:42 PM, Daniel Gustafsson  wrote:
> The message is truncated in SetBackendCancelMessage() for safety, but
> pg_{cancel|terminate}_backend() could throw an error on a too-long message,
> or a warning on truncation, to the caller as well.  Personally I think a
> warning is the appropriate response, but I don't really have a strong
> opinion.

And a NOTICE? That's what happens for relation name truncation. You
are right that having a check in SetBackendCancelMessage() makes the
most sense as bgworkers could just call the low level API. Isn't the
concept actually closer to just a backend message? This slot could be
used for other purposes than cancellation.
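
For orientation, usage of the proposed feature would look roughly like this
(the argument shape is my reading of the patch under discussion, not a
committed API):

-- The optional second argument is relayed to the victim backend and
-- reported alongside the cancellation/termination error.
SELECT pg_terminate_backend(12345, 'server restarting, please reconnect');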
-- 
Michael




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-21 Thread Andres Freund
On 2017-06-21 16:40:48 -0700, Andres Freund wrote:
> On 2017-06-20 17:51:23 +0200, Daniel Verite wrote:
> > Andres Freund wrote:
> > 
> > > FWIW, I still think this needs a pgbench or similar example integration,
> > > so we can actually properly measure the benefits.
> > 
> > Here's an updated version of the patch I made during review,
> > adding \beginbatch and \endbatch to pgbench.
> > The performance improvement appears clearly
> > with a custom script of this kind:
> >   \beginbatch
> >  UPDATE pgbench_branches SET bbalance = bbalance + 1 WHERE bid = 0;
> >  ..above repeated 1000 times...
> >   \endbatch
> > 
> > versus the same with a BEGIN; END; pair instead of \beginbatch \endbatch
> > 
> > On localhost on my desktop I tend to see a 30% difference in favor
> > of the batch mode with that kind of test.
> > On slower networks there are much bigger differences.
> 
> This is seriously impressive.  Just using the normal pgbench mixed
> workload, wrapping a whole transaction into a batch *doubles* the
> throughput.  And that's locally over a unix socket - the gain over
> actual network will be larger.

I've not analyzed this further, but something about the way the network I/O
is done isn't yet quite right, either in the pgbench patch or in the libpq
patch.  You'll currently get IO like:

sendto(3, "B\0\0\0\22\0P1_2\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t\0"..., 36, 
MSG_NOSIGNAL, NULL, 0) = 36
sendto(3, "B\0\0\0\35\0P1_4\0\0\0\0\1\0\0\0\0073705952\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_6\0\0\0\0\1\0\0\0\0077740854\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\35\0P1_8\0\0\0\0\1\0\0\0\0071570280\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_10\0\0\0\0\1\0\0\0\0072634305\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_12\0\0\0\0\1\0\0\0\0078960656\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_14\0\0\0\0\1\0\0\0\0073030370\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\35\0P1_16\0\0\0\0\1\0\0\0\006376125\0\1\0\0D\0"..., 47, 
MSG_NOSIGNAL, NULL, 0) = 47
sendto(3, "B\0\0\0\36\0P1_18\0\0\0\0\1\0\0\0\0072982423\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_20\0\0\0\0\1\0\0\0\0073860195\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_22\0\0\0\0\1\0\0\0\0072794433\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\36\0P1_24\0\0\0\0\1\0\0\0\0075475271\0\1\0\0D"..., 48, 
MSG_NOSIGNAL, NULL, 0) = 48
sendto(3, "B\0\0\0\23\0P1_25\0\0\0\0\0\0\1\0\0D\0\0\0\6P\0E\0\0\0\t"..., 37, 
MSG_NOSIGNAL, NULL, 0) = 37
sendto(3, "S\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
recvfrom(3, "2\0\0\0\4n\0\0\0\4C\0\0\0\nBEGIN\0002\0\0\0\4T\0\0\0!\0"..., 
16384, 0, NULL, NULL) = 775
recvfrom(3, 0x559a02667ff2, 15630, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667fb1, 15695, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667f6c, 15764, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667f2b, 15829, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667eea, 15894, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667ea9, 15959, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667e68, 16024, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667e24, 16092, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667de3, 16157, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667da2, 16222, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d5d, 16291, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d1c, 16356, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(3, 0x559a02667d06, 16378, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)

I.e. we're doing tiny send() syscalls for the writes (they should be
coalesced) and then completely unnecessarily calling recv() over and over
again without polling.  To me it looks very much like we're just doing each
of those exactly once per command...
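
For contrast, the expected pattern would be something like the following (a
rough C sketch of coalesced writes plus poll()-gated reads, not the patch's
actual code):

#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>

/* Queue all pending protocol messages into one buffer, flush them with
 * a single send(), and call recv() only once poll() reports readable
 * data, instead of spinning on EAGAIN after every command. */
static void
flush_batch(int sock, const char *outbuf, size_t outlen)
{
    send(sock, outbuf, outlen, MSG_NOSIGNAL);    /* one coalesced write */

    struct pollfd pfd = {.fd = sock, .events = POLLIN};
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
    {
        char inbuf[16384];
        recv(sock, inbuf, sizeof(inbuf), 0);     /* read only when ready */
    }
}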

- Andres




Re: [HACKERS] Rules on table partitions

2017-06-21 Thread Amit Langote
On 2017/06/21 18:49, Dean Rasheed wrote:
> On 20 June 2017 at 03:01, Amit Langote  wrote:
>> Hmm, yes.  The following exercise convinced me.
>>
>> create table r (a int) partition by range (a);
>> create table r1 partition of r for values from (1) to (10);
>> create rule "_RETURN" as on select to r1 do instead select * from r;
>>
>> insert into r values (1);
>> ERROR:  cannot insert into view "r1"
>> HINT:  To enable inserting into the view, provide an INSTEAD OF INSERT
>> trigger or an unconditional ON INSERT DO INSTEAD rule.
>>
>> The error is emitted by CheckValidResultRel() that is called on individual
>> leaf partitions when setting up tuple-routing in ExecInitModifyTable.
>>
>> I agree that we should forbid this case, so please find attached a patch.
>>
> 
> Committed.

Thanks, Dean.

Regards,
Amit





  1   2   >