Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Marina Polyakova

On 2018-11-16 22:59, Alvaro Herrera wrote:

On 2018-Sep-05, Marina Polyakova wrote:


v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
- a patch for the RandomState structure (this is used to reset a client's
random seed during the repeating of transactions after
serialization/deadlock failures).


Pushed this one with minor stylistic changes (the most notable of which
is the move of initRandomState to where the rest of the random generator
infrastructure is, instead of in a totally random place).  Thanks,


Thank you very much! I'm going to send a new patch set by the end of 
this week (I'm sorry, I was very busy with the release of Postgres Pro 
11...).


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: ToDo: show size of partitioned table

2018-11-19 Thread Pavel Stehule
On Mon, Nov 19, 2018 at 3:42 AM Michael Paquier wrote:

> On Sun, Nov 18, 2018 at 11:17:37PM -0300, Alvaro Herrera wrote:
> > To be certain I'm not going against some old decision, I dug up
> > Amit's old patches.  Turns out he submitted psql's describe.c using the
> > term "partitioned table" on August 10th [1] and then based on a
> > discussion where Robert suggested calling these new objects "partition
> > roots" instead to avoid confusion, it was changed to "table" in the next
> > submission on August 26th [2].  It seems the right call to have used the
> > term "table" in many places (rather than "partition roots"), but at
> > least in psql's \dt it seems extremely useful to show the type as
> > "partitioned table" instead, because it is one place where the
> > distinction is clearly useful.
>
> +1.
>
> > In this thread there have been no contrary votes, so I'm pushing this
> > part soon.
> >
> > [1]
> https://postgr.es/m/ad16e2f5-fc7c-cc2d-333a-88d4aa446...@lab.ntt.co.jp
> > [2]
> https://postgr.es/m/169708f6-6e5a-18d1-707b-1b323e4a6...@lab.ntt.co.jp
>
> Sorry for digressing, but could you also update \di at the same time so
> that it shows "partitioned index"?  listTables() should be switched to use
> partitioned tables and partitioned indexes, and permissionsList() has a
> reference to partitioned tables.  While on it, this gives the attached..
>

It makes sense.

Pavel

--
> Michael
>


Can I skip function ResolveRecoveryConflictWithSnapshot if setting hot_standby_feedback=on all the time

2018-11-19 Thread 范孝剑(康贤)
Hello,
Can I skip the function ResolveRecoveryConflictWithSnapshot if I set 
hot_standby_feedback=on all the time?
As far as I know, the function ResolveRecoveryConflictWithSnapshot is used to resolve 
conflicts once the master cleans up dead tuples. But if I set hot_standby_feedback to 
on, such conflicts will not appear, so I can skip executing the function 
ResolveRecoveryConflictWithSnapshot, am I right?

Trigger tuple slotification (extracted from pluggable storage patch)

2018-11-19 Thread Amit Khandekar
Hi,

In [1],  there was a plan to do the trigger tuple slotification
changes as a separate patch, and not include them in the pluggable
storage changes. I have come up with such a patch by extracting the
trigger-specific changes from the pluggable storage changes proposed
in [1]. I used this repository to extract the changes:
https://github.com/anarazel/postgres-pluggable-storage.git

Extracted changes from this commit:

commit 6066202a97c9a2d540fd2a2e5fd2cad22f307cd2
Author: Andres Freund 
Date:   Tue Oct 2 22:15:23 2018 -0700

Pluggable Storage.

Basically, the attached patch contains changes that slotify the
trigger tuple handling. Details below:

In the pluggable storage changes, there were some TODOs such as this
in trigger.c:
-   /* TODO : */
-   if (false && oldtuple != newtuple && oldtuple != slottuple)

I removed this condition. Andres, you had put this in the pluggable
storage patch, probably to check whether it is important. But after
giving it some thought, I think it is not required.

I also did all the slotification changes in ri_triggers.c, because they
all depend on the slotification of trigger tuples.

In AfterTriggerSaveEvent(), the newslot->tts_tid value is copied over to
new_event.ate_ctid1. Since this function now accepts only a slot,
there's no point in retrieving the tid from the slot tuple, because later
on we are going to have the tid in the slot, judging from the pluggable
storage changes. So in a separate patch
(0001-Populate-slot-tts_tid-wherever-tuple-t_self-changes.patch), I
have just added this field to the TupleTableSlot, and populated
slot->tts_tid wherever tuple->t_self changes.
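
As a rough illustration only (the helper name below is hypothetical, not
part of the patch):

    /* Hypothetical helper: mirror the tuple's t_self into the slot's new
     * tts_tid field whenever tuple->t_self changes. */
    static inline void
    slot_sync_tid(TupleTableSlot *slot, HeapTuple tuple)
    {
        slot->tts_tid = tuple->t_self;
    }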

[1] https://commitfest.postgresql.org/14/1283/

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


0001-Populate-slot-tts_tid-wherever-tuple-t_self-changes.patch
Description: Binary data


0002-Trigger-tuple-slotification.patch
Description: Binary data


Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data

2018-11-19 Thread Masahiko Sawada
On Mon, Nov 19, 2018 at 6:52 AM Tomas Vondra
 wrote:
>
> Hi,
>
> It seems we have a pretty annoying problem with logical decoding when
> performing VACUUM FULL / CLUSTER on a table with toast-ed data.
>
> The trouble is that the rewritten heap is WAL-logged using XLOG/FPI
> records, the TOAST data is logged as regular INSERT records. XLOG/FPI is
> ignored in logical decoding, and so reorderbuffer never gets those
> records. But we do decode the TOAST data, and reorderbuffer stashes them
> in toast_hash hash table. Which gets reset only when handling a row from
> the main heap, and that never arrives. So we end up stashing all the
> TOAST data in memory :-(
>
> So do VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're
> likely to break any logical replication connection. And it does not
> matter if you replicate this particular table.
>
> Luckily enough, this can leverage some of the pieces introduced by
> e9edc1ba which was meant to deal with rewrites of system tables, and in
> raw_heap_insert it added this:
>
> /*
>  * The new relfilenode's relcache entry doesn't have the necessary
>  * information to determine whether a relation should emit data for
>  * logical decoding.  Force it to off if necessary.
>  */
> if (!RelationIsLogicallyLogged(state->rs_old_rel))
> options |= HEAP_INSERT_NO_LOGICAL;
>
> As raw_heap_insert is used only for heap rewrites, we can simply remove
> the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST
> data logged from here.
>

This fix seems fine to me.

> This does fix the issue, because we still decode the TOAST changes but
> there are no data and so
>
> if (change->data.tp.newtuple != NULL)
> {
> dlist_delete(&change->node);
> ReorderBufferToastAppendChunk(rb, txn, relation,
>   change);
> }
>
> ends up not stashing the change in the hash table.

With the change you proposed below, can we remove the above condition,
because toast-insertion changes being processed by the reorder buffer
always have a new tuple? If a toast-insertion record doesn't have a
new tuple, it has already been ignored during decoding.
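
Roughly, the simplification could look like this (a sketch only, untested):

    /* Sketch: if every toast-insert change that reaches the reorderbuffer
     * is guaranteed to carry a new tuple, the NULL check could become an
     * assertion instead. */
    Assert(change->data.tp.newtuple != NULL);
    dlist_delete(&change->node);
    ReorderBufferToastAppendChunk(rb, txn, relation, change);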

> It's imperfect,
> because we still decode the changes (and stash them to disk), but ISTM
> that can be fixed by tweaking DecodeInsert a bit to just ignore those
> changes entirely.
>
> Attached is a PoC patch with these two fixes.
>

I think this change is also fine.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: TupleTableSlot abstraction

2018-11-19 Thread Amit Khandekar
On Wed, 14 Nov 2018 at 05:00, Andres Freund  wrote:
> After this, I hope Amit Khandekar will rebase a patch he's sent me
> internally that converts triggers to use slots. I'll work on rebasing
> the pluggable storage patch ontop of this.

Shared this patch in a separate mail thread :
https://www.postgresql.org/message-id/CAJ3gD9fjpoPHSHB-Ufj7ciT8nV0JSA2gdJUdtxo-bMyPrpjk%3DQ%40mail.gmail.com
--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: zheap: a new storage format for PostgreSQL

2018-11-19 Thread Daniel Westermann

> Thanks, it makes difference and keep us motivated for making progress.
+1

Is it intended behavior that a database can not be dropped when undo apply is 
running in the background?

zheap=# update pgbench_accounts set filler = 'bbb' where mod(aid,10) = 0;
UPDATE 100
zheap=# rollback;
ROLLBACK
zheap=# drop database zheap;
ERROR:  cannot drop the currently open database
zheap=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# drop database zheap;
ERROR:  database "zheap" is being accessed by other users
DETAIL:  There is 1 other session using the database.
postgres=# drop database zheap;
ERROR:  database "zheap" is being accessed by other users
DETAIL:  There is 1 other session using the database.
postgres=#

Regards
Daniel


Re: logical decoding vs. VACUUM FULL / CLUSTER on table with TOAST-ed data

2018-11-19 Thread Tomas Vondra

On 11/19/18 10:28 AM, Masahiko Sawada wrote:

On Mon, Nov 19, 2018 at 6:52 AM Tomas Vondra
 wrote:


Hi,

It seems we have a pretty annoying problem with logical decoding when
performing VACUUM FULL / CLUSTER on a table with toast-ed data.

The trouble is that the rewritten heap is WAL-logged using XLOG/FPI
records, the TOAST data is logged as regular INSERT records. XLOG/FPI is
ignored in logical decoding, and so reorderbuffer never gets those
records. But we do decode the TOAST data, and reorderbuffer stashes them
in toast_hash hash table. Which gets reset only when handling a row from
the main heap, and that never arrives. So we end up stashing all the
TOAST data in memory :-(

So do VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're
likely to break any logical replication connection. And it does not
matter if you replicate this particular table.

Luckily enough, this can leverage some of the pieces introduced by
e9edc1ba which was meant to deal with rewrites of system tables, and in
raw_heap_insert it added this:

 /*
  * The new relfilenode's relcache entry doesn't have the necessary
  * information to determine whether a relation should emit data for
  * logical decoding.  Force it to off if necessary.
  */
 if (!RelationIsLogicallyLogged(state->rs_old_rel))
 options |= HEAP_INSERT_NO_LOGICAL;

As raw_heap_insert is used only for heap rewrites, we can simply remove
the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST
data logged from here.



This fix seems fine to me.


This does fix the issue, because we still decode the TOAST changes but
there are no data and so

 if (change->data.tp.newtuple != NULL)
 {
 dlist_delete(&change->node);
 ReorderBufferToastAppendChunk(rb, txn, relation,
   change);
 }

ends up not stashing the change in the hash table.


With the change you proposed below, can we remove the above condition,
because toast-insertion changes being processed by the reorder buffer
always have a new tuple? If a toast-insertion record doesn't have a
new tuple, it has already been ignored during decoding.



Good point. I think you're right that the reorderbuffer part may be 
simplified as you propose.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: zheap: a new storage format for PostgreSQL

2018-11-19 Thread Amit Kapila
On Mon, Nov 19, 2018 at 3:59 PM Daniel Westermann
 wrote:
>
> > Thanks, it makes difference and keep us motivated for making progress.
> +1
>
> Is it intended behavior that a database can not be dropped when undo apply is 
> running in the background?
>

Yes, we need to connect to the database for performing rollback
actions.  Once the rollback for that database is over, the undo apply
worker will exit and you should be able to drop the database.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2018-11-19 Thread John Naylor
On 11/19/18, Amit Kapila  wrote:
> On Mon, Nov 19, 2018 at 7:30 AM John Naylor  wrote:
>> Let's say we have to wait to acquire a relation extension lock,
>> because another backend had already started extending the heap by 1
>> block. We call GetPageWithFreeSpace() and now the local map looks like
>>
>> 0123
>> TTA0
>>
>> By using bitwise OR to set availability, the already-tried blocks
>> remain as they are. With only 2 states, the map would look like this
>> instead:
>>
>> 0123
>> AAAN
>>

> In my mind for such a case it should look like below:
> 0123
> NNAN

Okay, to retain that behavior with only 2 status codes, I have
implemented the map as a struct with 2 members: the cached number of
blocks, plus the same array I had before. This also allows a more
efficient implementation at the micro level. I just need to do some
more testing on it.
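
For illustration, a minimal sketch of such a struct (names and the threshold
constant are hypothetical, not taken from the actual patch):

    /* BlockNumber comes from storage/block.h */
    #define LOCAL_MAP_SIZE 4    /* assumed small-table threshold */

    typedef struct LocalFsmMap
    {
        BlockNumber nblocks;                    /* cached relation size */
        bool        available[LOCAL_MAP_SIZE];  /* per-block "worth trying" flag */
    } LocalFsmMap;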

[ abortive states ]
> I think it might come from any other place between when you set it and
> before it got cleared (like any intermediate buffer and pin related
> API's).

Okay, I will look into that.

> One other thing that slightly bothers me is the call to
> RelationGetNumberOfBlocks via fsm_allow_writes.  It seems that call
> will happen quite frequently in this code-path and can have some
> performance impact.  As of now, I don't have any idea to avoid it or
> reduce it more than what you already have in the patch, but I think we
> should try some more to avoid it.  Let me know if you have any ideas
> around that?

FWIW, I believe that the callers of RecordPageWithFreeSpace() will
almost always avoid that call. Otherwise, there is at least one detail
that could use attention: If rel->rd_rel->relpages shows fewer pages
than the threshold, then the code doesn't trust it to be true. Might
be worth revisiting.
Aside from that, I will have to think about it.

More generally, I have a couple ideas about performance:

1. Only mark available every other block such that visible blocks are
interleaved as the relation extends. To explain, this diagram shows a
relation extending, with A meaning marked available and N meaning
marked not-available.

A
NA
ANA
NANA

So for a 3-block table, we never check block 1. Any free space it has
acquired will become visible when it extends to 4 blocks. For a
4-block threshold, we only check 2 blocks or less. This reduces the
number of lock/pin events but still controls bloat. We could also
check both blocks of a 2-block table.

2. During manual testing I seem to remember times that the FSM code
was invoked even though I expected the smgr entry to have a cached
target block. Perhaps VACUUM or something is clearing that away
unnecessarily. It seems worthwhile to verify and investigate, but that
seems like a separate project.

-John Naylor



Re: WIP: Avoid creation of the free space map for small tables

2018-11-19 Thread Amit Kapila
On Mon, Nov 19, 2018 at 4:40 PM John Naylor  wrote:
>
> On 11/19/18, Amit Kapila  wrote:
> > On Mon, Nov 19, 2018 at 7:30 AM John Naylor  wrote:
> >> Let's say we have to wait to acquire a relation extension lock,
> >> because another backend had already started extending the heap by 1
> >> block. We call GetPageWithFreeSpace() and now the local map looks like
> >>
> >> 0123
> >> TTA0
> >>
> >> By using bitwise OR to set availability, the already-tried blocks
> >> remain as they are. With only 2 states, the map would look like this
> >> instead:
> >>
> >> 0123
> >> AAAN
> >>
>
> > In my mind for such a case it should look like below:
> > 0123
> > NNAN
>
> Okay, to retain that behavior with only 2 status codes, I have
> implemented the map as a struct with 2 members: the cached number of
> blocks, plus the same array I had before. This also allows a more
> efficient implementation at the micro level. I just need to do some
> more testing on it.
>

Okay.

> [ abortive states ]
> > I think it might come from any other place between when you set it and
> > before it got cleared (like any intermediate buffer and pin related
> > API's).
>
> Okay, I will look into that.
>
> > One other thing that slightly bothers me is the call to
> > RelationGetNumberOfBlocks via fsm_allow_writes.  It seems that call
> > will happen quite frequently in this code-path and can have some
> > performance impact.  As of now, I don't have any idea to avoid it or
> > reduce it more than what you already have in the patch, but I think we
> > should try some more to avoid it.  Let me know if you have any ideas
> > around that?
>
> FWIW, I believe that the callers of RecordPageWithFreeSpace() will
> almost always avoid that call. Otherwise, there is at least one detail
> that could use attention: If rel->rd_rel->relpages shows fewer pages
> than the threshold, then the code doesn't trust it to be true. Might
> be worth revisiting.
>

I think it is less of a concern when called from vacuum code path.

> Aside from that, I will have to think about it.
>
> More generally, I have a couple ideas about performance:
>
> 1. Only mark available every other block such that visible blocks are
> interleaved as the relation extends. To explain, this diagram shows a
> relation extending, with A meaning marked available and N meaning
> marked not-available.
>
> A
> NA
> ANA
> NANA
>
> So for a 3-block table, we never check block 1. Any free space it has
> acquired will become visible when it extends to 4 blocks. For a
> 4-block threshold, we only check 2 blocks or less. This reduces the
> number of lock/pin events but still controls bloat. We could also
> check both blocks of a 2-block table.
>

We can try something like this if we see there is any visible
performance hit in some scenario.

> 2. During manual testing I seem to remember times that the FSM code
> was invoked even though I expected the smgr entry to have a cached
> target block. Perhaps VACUUM or something is clearing that away
> unnecessarily. It seems worthwhile to verify and investigate, but that
> seems like a separate project.
>

Makes sense; let's not get distracted by stuff that is not related to
this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

2018-11-19 Thread Tomas Vondra

On 11/17/18 4:53 PM, Alvaro Herrera wrote:

Here are versions for branches 10 and 11.  The main change is that the
COPY test didn't have the partitioned table, because it was recently
introduced (0d5f05cde011) so I backpatched that part also.  It's a bit
useless, but I'd rather backpatch the same thing rather than have
different lines there ...



The patch seems fine to me.

It's a bit unfortunate that we simply disable the optimization on 
partitioned tables instead of fixing it somehow, but I guess it's better 
than nothing and no one has a very good idea how to make it work.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: zheap: a new storage format for PostgreSQL

2018-11-19 Thread Daniel Westermann
>Yes, we need to connect to the database for performing rollback
>actions.  Once the rollback for that database is over, undo apply
>worker will exit and you should be able to drop the database.

Thank you, Amit.
Can you have a look at this one?

create table t1 ( a text ) partition by list (a);
create table t1_1 PARTITION of t1 (a) for values in ('a');
create table t1_2 PARTITION of t1 (a) for values in ('b');
create table t1_3 PARTITION of t1 (a) for values in ('c');
create table t1_4 PARTITION of t1 (a) default;

postgres=# \d+ t1
                                    Table "public.t1"
 Column | Type | Collation | Nullable | Default | Storage  | Stats target | Description
--------+------+-----------+----------+---------+----------+--------------+-------------
 a      | text |           |          |         | extended |              | 
Partition key: LIST (a)
Partitions: t1_1 FOR VALUES IN ('a'),
            t1_2 FOR VALUES IN ('b'),
            t1_3 FOR VALUES IN ('c'),
            t1_4 DEFAULT
Options: storage_engine=zheap


insert into t1 select 'a' from generate_series ( 1, 100 );
insert into t1 select 'b' from generate_series ( 1, 100 );
insert into t1 select 'c' from generate_series ( 1, 100 );

postgres=# begin;
BEGIN
postgres=# update t1 set a = 'd' where a = 'a';
UPDATE 100
postgres=# rollback;
ROLLBACK
postgres=# select * from t1 where a = 'd';
postgres=# select * from t1 where a = 'd';
postgres=# select * from t1 where a = 'd';

The selects at the end take seconds and a lot of checkpoints are happening.

Regards
Daniel



Re: [PATCH] get rid of StdRdOptions, use individual binary reloptions representation for each relation kind instead

2018-11-19 Thread Nikolay Shaplov
In a message of October 2, 2018 at 13:46:13, Michael Paquier wrote:
> On Fri, Sep 14, 2018 at 09:30:25PM +0300, Nikolay Shaplov wrote:
> > BTW this commit shows why this patch is important: 857f9c36 adds a new
> > option for b-tree indexes. But thanks to StdRdOptions this option
> > will exist, for no practical use, in all heaps that have any option set
> > to a non-default value, and in indexes that use StdRdOptions (and also have
> > any option set). And there will be more. StdRdOptions is a long-outdated
> > solution and it needs to be replaced.
>
> This patch does not apply anymore, so this is moved to next CF, waiting
> for author.
Oops... It seems that I forgot to rebase it when it was needed...
I've done it now.



--
Do code for fun.

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index db84da0..e7e2392 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -22,7 +22,7 @@
 #include "access/htup_details.h"
 #include "access/nbtree.h"
 #include "access/reloptions.h"
-#include "access/spgist.h"
+#include "access/spgist_private.h"
 #include "access/tuptoaster.h"
 #include "catalog/pg_type.h"
 #include "commands/defrem.h"
@@ -46,9 +46,8 @@
  * upper and lower bounds (if applicable); for strings, consider a validation
  * routine.
  * (ii) add a record below (or use add__reloption).
- * (iii) add it to the appropriate options struct (perhaps StdRdOptions)
- * (iv) add it to the appropriate handling routine (perhaps
- * default_reloptions)
+ * (iii) add it to the appropriate options struct
+ * (iv) add it to the appropriate handling routine
  * (v) make sure the lock level is set correctly for that operation
  * (vi) don't forget to document the option
  *
@@ -1004,7 +1003,7 @@ extractRelOptions(HeapTuple tuple, TupleDesc tupdesc,
 		case RELKIND_TOASTVALUE:
 		case RELKIND_MATVIEW:
 		case RELKIND_PARTITIONED_TABLE:
-			options = heap_reloptions(classForm->relkind, datum, false);
+			options = relation_reloptions(classForm->relkind, datum, false);
 			break;
 		case RELKIND_VIEW:
 			options = view_reloptions(datum, false);
@@ -1337,63 +1336,133 @@ fillRelOptions(void *rdopts, Size basesize,


 /*
- * Option parser for anything that uses StdRdOptions.
+ * Option parsing definition for autovacuum. Used in toast and heap options.
+ */
+
+#define AUTOVACUUM_RELOPTIONS(OFFSET)\
+		{"autovacuum_enabled", RELOPT_TYPE_BOOL, \
+		OFFSET + offsetof(AutoVacOpts, enabled)},\
+		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_threshold)},   \
+		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, analyze_threshold)},  \
+		{"autovacuum_vacuum_cost_delay", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_delay)},  \
+		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,\
+		OFFSET + offsetof(AutoVacOpts, vacuum_cost_limit)},  \
+		{"autovacuum_freeze_min_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_min_age)}, \
+		{"autovacuum_freeze_max_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, freeze_max_age)}, \
+		{"autovacuum_freeze_table_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, freeze_table_age)},   \
+		{"autovacuum_multixact_freeze_min_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_min_age)},   \
+		{"autovacuum_multixact_freeze_max_age", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_max_age)},   \
+		{"autovacuum_multixact_freeze_table_age", RELOPT_TYPE_INT,   \
+		OFFSET + offsetof(AutoVacOpts, multixact_freeze_table_age)}, \
+		{"log_autovacuum_min_duration", RELOPT_TYPE_INT, \
+		OFFSET + offsetof(AutoVacOpts, log_min_duration)},   \
+		{"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL, \
+		OFFSET + offsetof(AutoVacOpts, vacuum_scale_factor)},\
+		{"autovacuum_analyze_scale_factor", RELOPT_TYPE_REAL,\
+		OFFSET + offsetof(AutoVacOpts, analyze_scale_factor)}
+
+/*
+ * Option parser for heap
  */
 bytea *
-default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
+heap_reloptions(Datum reloptions, bool validate)
 {
 	relopt_value *options;
-	StdRdOptions *rdopts;
+	HeapRelOptions *rdopts;
 	int			numoptions;
 	static const relopt_parse_elt tab[] = {
-		{"fillfactor", RELOPT_TYPE_INT, offsetof(StdRdOptions, fillfactor)},
-		{"autovacuum_enabled", RELOPT_TYPE_BOOL,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, enabled)},
-		{"autovacuum_vacuum_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_threshold)},
-		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
-		offsetof(StdRdOptions, autovac

Re: Sequential UUID Generation

2018-11-19 Thread Uday Bhaskar V
I tried the function below, which can be used as a default for a column. But every
time we need to create 2 sequences: the 1st one takes care of the first 8
bytes and the 2nd takes care of the 2nd part of the UUID. I have not tested
index and space utilization; I have to examine this. The result might not be
completely unique in nature, but I am still trying for the best.


CREATE OR REPLACE FUNCTION public.fnu_generate_sequential_uuid(
sequence1 text,
sequence2 text)
RETURNS uuid
LANGUAGE 'plpgsql'
COST 100.0
VOLATILE NOT LEAKPROOF
AS $function$

DECLARE
  sequenceUUIDPart1 text;
  randomUUIDPart2 text;
  counter integer:=0;
  significantByte integer:=0;
  startIndex integer:=0;
  endIndex integer:=0;
BEGIN

-- Get random UUID
randomUUIDPart2 := replace(( uuid_generate_v4 () :: text),'-','');

-- check whether the first sequence has reached its max count
IF to_hex(currval(sequence1)) :: text = '7fff' THEN
 startIndex:=0;
 endIndex:=7;
-- convert the sequence value into a 32-character hex string
sequenceUUIDPart1 = rpad(to_hex(nextval(sequence2))::text, 32, '0');
ELSE
 startIndex:=8;
 endIndex:=15;
-- convert the sequence value into a 32-character hex string
sequenceUUIDPart1 = rpad(to_hex(nextval(sequence1))::text, 32, '0');
END IF;

RAISE NOTICE 'current Guid: %', sequenceUUIDPart1;

-- loop through bytes 8 to 15 until the first sequence reaches its max,
-- then through bytes 0 to 7 until the second sequence ends.
FOR counter IN startIndex..endIndex LOOP

 -- take the byte from the random UUID part
 select get_byte(decode(randomUUIDPart2::text,'hex'), counter) into
significantByte;

 -- fill these bytes with the generated random UUID values.
 sequenceUUIDPart1 := encode(set_byte(decode(sequenceUUIDPart1
::text,'hex') :: bytea ,counter, significantByte ) :: bytea, 'hex') :: text;
 RAISE NOTICE 'current Guid: %', sequenceUUIDPart1;
END LOOP;

   return sequenceUUIDPart1 :: UUID;
  EXCEPTION
  WHEN OTHERS
THEN
  RAISE EXCEPTION 'An error was encountered in
fnu_generate_sequential_uuid - % -ERROR- %', sqlstate,
sqlerrm;
END

$function$;

On Wed, Oct 31, 2018 at 1:51 AM Sehrope Sarkuni  wrote:

> I came across a project for time based UUID ("tuid") a little while back:
> https://github.com/tanglebones/pg_tuid
>
> I haven't used in production but skimmed through the code a bit out of
> technical curiosity. It handles some of the expected edge cases for
> backwards clock drift and concurrent generation.
>
> The repo includes a PG extension and sample app code for generating tuids
> in a couple languages as well as a pure-SQL one (though that one uses
> random() rather than get_random_bytes() so I'd consider it more of a proof
> of concept).
>
> Regards,
> -- Sehrope Sarkuni
> Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
>
>


Re: New GUC to sample log queries

2018-11-19 Thread Tomas Vondra




On 11/19/18 2:57 AM, Michael Paquier wrote:

On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote:

Since it's hard to come up with a concise name that will mention sampling rate
in the context of min_duration_statement, I think it's fine to name this
configuration "log_sample_rate", as long as it's dependency from
log_min_duration_statements is clearly explained in the documentation.


log_sample_rate looks fine to me as a name.


That seems far too short to me - the name should indicate it applies to 
statement logging. I'd say log_statement_sample_rate is better.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New GUC to sample log queries

2018-11-19 Thread Tomas Vondra




On 11/18/18 10:52 AM, Adrien Nayrat wrote:

...
So, I wonder if we should use the same logic for other parameters, such as
log_statement_stats
log_parser_stats
log_planner_stats
log_executor_stats


> It was mentioned in this thread
> 
https://www.postgresql.org/message-id/20180710183828.GB3890%40telsasoft.com



You mean apply sampling to those logging options too? I doubt that would 
be very useful, as those options are IMHO meant for development. 
Maybe tracking some of the info would be useful, but integrating it into 
pg_stat_statements seems like a better solution than just dumping it 
into the server log.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New GUC to sample log queries

2018-11-19 Thread Dmitry Dolgov
> On Mon, Nov 19, 2018 at 2:40 PM Tomas Vondra  
> wrote:
>
> On 11/19/18 2:57 AM, Michael Paquier wrote:
> > On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote:
> >> Since it's hard to come up with a concise name that will mention sampling 
> >> rate
> >> in the context of min_duration_statement, I think it's fine to name this
> >> configuration "log_sample_rate", as long as it's dependency from
> >> log_min_duration_statements is clearly explained in the documentation.
> >
> > log_sample_rate looks fine to me as a name.
>
> That seems far too short to me - the name should indicate it applies to
> statement logging. I'd say log_statement_sample_rate is better.

I agree, sounds reasonable.



Re: heap_sync seems rather oblivious to partitioned tables (wal_level=minimal)

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-19, Tomas Vondra wrote:

> On 11/17/18 4:53 PM, Alvaro Herrera wrote:
> > Here are versions for branches 10 and 11.  The main change is that the
> > COPY test didn't have the partitioned table, because it was recently
> > introduced (0d5f05cde011) so I backpatched that part also.  It's a bit
> > useless, but I'd rather backpatch the same thing rather than have
> > different lines there ...
> 
> The patch seems fine to me.

Pushed now, thanks.

> It's a bit unfortunate that we simply disable the optimization on
> partitioned tables instead of fixing it somehow, but I guess it's better
> than nothing and no one has a very good idea how to make it work.

Yeah, I think leaving it in the current state is worse than disabling
it, since an error is thrown anyway.  In any case, I guess when we have
a patch it won't be backpatchable.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



[no subject]

2018-11-19 Thread Adelino Silva
Hi,

A system monitoring tool like "nc -vz localhost 5432" used to monitor the
database server is causing the warning below. Is there any way to filter or ignore
such messages in the postgresql log file?

LOG:  incomplete startup packet

Many thanks in advance,
Adelino.


Add client connection check during the execution of the query

2018-11-19 Thread s . cherkashin
This patch adds verification of the connection with the client during 
the execution of the SQL query. The feature is enabled using the GUC 
variable ‘client_connection_check_interval’. The default check interval 
is 1 second. If you set the value of ‘client_connection_check_interval’ 
to 0, then the check will not be performed.
The feature will be useful in cases when, during the execution of a very 
long query, the client suddenly terminates the connection - this will 
allow the backend to cancel further execution of the query and free server 
resources.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c4effa034c..83c662de0f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7677,6 +7677,27 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
   
  
 
+ 
+  client_connection_check_interval (integer)
+  
+   client_connection_check_interval configuration parameter
+  
+  
+  
+   
+This parameter sets a time interval (in milliseconds) between periodic 
+verification of the connection with the client during the execution
+of the query. In case when the client aborts the connection,
+the execution of the query will be terminated.
+   
+   
+If value is -1, then this option is disabled, and the backend will
+detect client disconnection only when trying to send him a response
+to the query. Zero selects a suitable default value (1 second).
+   
+  
+ 
+
  
 

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 0c9593d4cc..dd1917efe1 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -120,6 +120,7 @@
  */
 int			Unix_socket_permissions;
 char	   *Unix_socket_group;
+int 		client_connection_check_interval;
 
 /* Where the Unix socket files are (list of palloc'd strings) */
 static List *sock_paths = NIL;
@@ -1926,3 +1927,25 @@ pq_setkeepalivescount(int count, Port *port)
 
 	return STATUS_OK;
 }
+
+bool pq_is_client_connected(void)
+{
+	CheckClientConnectionPending = false;
+	if (IsUnderPostmaster &&
+		MyProcPort != NULL && !PqCommReadingMsg && !PqCommBusy)
+	{
+		char nextbyte;
+		int r;
+
+		r = recv(MyProcPort->sock, &nextbyte, 1, MSG_PEEK | MSG_DONTWAIT);
+
+		if (r == 0 || (r == -1 &&
+			errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR))
+		{
+			ClientConnectionLost = true;
+			InterruptPending = true;
+			return false;
+		}
+	}
+	return true;
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a3b9757565..2243b672ef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3017,6 +3017,13 @@ ProcessInterrupts(void)
 	(errcode(ERRCODE_ADMIN_SHUTDOWN),
 	 errmsg("terminating connection due to administrator command")));
 	}
+	if (client_connection_check_interval > 0 && CheckClientConnectionPending)
+	{
+		if (pq_is_client_connected())
+			enable_timeout_after(SKIP_CLIENT_CHECK_TIMEOUT,
+ client_connection_check_interval);
+
+	}
 	if (ClientConnectionLost)
 	{
 		QueryCancelPending = false; /* lost connection trumps QueryCancel */
@@ -4195,6 +4202,9 @@ PostgresMain(int argc, char *argv[],
 		 */
 		CHECK_FOR_INTERRUPTS();
 		DoingCommandRead = false;
+		if (client_connection_check_interval)
+			enable_timeout_after(SKIP_CLIENT_CHECK_TIMEOUT,
+ client_connection_check_interval);
 
 		/*
 		 * (5) turn off the idle-in-transaction timeout
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c6939779b9..96d44a15e8 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -30,6 +30,7 @@ ProtocolVersion FrontendProtocol;
 volatile sig_atomic_t InterruptPending = false;
 volatile sig_atomic_t QueryCancelPending = false;
 volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t CheckClientConnectionPending = false;
 volatile sig_atomic_t ClientConnectionLost = false;
 volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 4f1d2a0d28..1dfdfe8b12 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -33,6 +33,7 @@
 #include "catalog/pg_db_role_setting.h"
 #include "catalog/pg_tablespace.h"
 #include "libpq/auth.h"
+#include "libpq/libpq.h"
 #include "libpq/libpq-be.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -72,6 +73,7 @@ static void ShutdownPostgres(int code, Datum arg);
 static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
 static void IdleInTransactionSessionTimeoutHandler(void);
+static void ClientCheckTimeoutHandler(void);
 static bool ThereIsAtLeastOneRole(void);
 static void process_startup_options(Port *port, bool am_superuser);
 static void process_settings(Oid databaseid, Oid roleid);
@@ -628

Re: your mail

2018-11-19 Thread Stephen Frost
Greetings,

* Adelino Silva (adelino.j.si...@googlemail.com) wrote:
> A system monitoring tool like "nc -vz localhost 5432" used to monitor the
> database server is causing the warning below. Is there any way to filter or ignore
> such messages in the postgresql log file?
> 
> LOG:  incomplete startup packet

Not today.  You'd be *much* better off with a monitor that actually logs
in and does something like 'SELECT 1;' or similar though.

Testing that you can connect to the TCP port is actually pretty useless
as far as checking to see if the service is functioning properly.
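
For illustration, a minimal check along those lines might look like this
(a sketch only; the connection parameters are assumptions):

    /* Sketch of a health check that actually connects and runs a query. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("host=localhost port=5432 dbname=postgres");
        PGresult   *res;
        int         ok = 0;

        if (PQstatus(conn) == CONNECTION_OK)
        {
            res = PQexec(conn, "SELECT 1;");
            ok = (PQresultStatus(res) == PGRES_TUPLES_OK);
            PQclear(res);
        }

        if (!ok)
            fprintf(stderr, "health check failed: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return ok ? 0 : 1;
    }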

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: fix psql \conninfo & \connect when using hostaddr

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-17, Fabien COELHO wrote:

> Here is the updated v2
> 
>  - libpq internal function getHostaddr get a length,
>and I added an assert about it.
>  - added a few braces on if/if/else/if/else/else
>  - added an UNKNOWN_HOST macro to hide "???"
>  - moved host_addr[] declaration earlier to avoid some braces

You forgot to free(conn->connip) during freePGconn().

I found the UNKNOWN_HOST business quite dubious -- not only in your
patch but also in the existing coding.  I changed the getHostname API so
that instead of returning "???" it sets the output buffer to the empty
string.  AFAICS the only user-visible behavior change is that we no longer
display the "???" in a parenthetical comment next to the server address
when a connection fails (this is pre-existing behavior, not changed by
your patch.)

Now, maybe the thinking was that upon seeing this message:

could not connect to server: some error here
Is the server running on host "pg.mines-paristech.fr" (???) and accepting
connections on port 5432?

the user was going to think "oh, my machine must have run out of memory,
I'll reboot and retry".  However, I highly doubt that anybody would
reach that conclusion without reading the source code.  So I deem this
useless.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 342559f657f0c2d7cdf20c29536652a9ceca1d2a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 19 Nov 2018 12:23:15 -0300
Subject: [PATCH] psql: Show IP address in \conninfo, if different from host

When hostaddr is given, the actual IP address that psql is connected to
can be totally unexpected for the given host.  The more verbose output
we now generate makes things clearer.  Since the "host" and "hostaddr"
parts of the conninfo could come from wildly different sources (say one
is in the service specification or a URI-style conninfo and the other is
not), this is useful.  This is also useful if a hostname resolves to
multiple addresses.

Author: Fabien Coelho
Reviewed-by: Pavel Stehule, Arthur Zakirov
Discussion: https://postgr.es/m/alpine.DEB.2.21.1810261532380.27686@lancre
	https://postgr.es/m/alpine.DEB.2.21.1808201323020.13832@lancre
---
 doc/src/sgml/libpq.sgml   | 30 
 src/bin/psql/command.c| 82 +---
 src/interfaces/libpq/exports.txt  |  1 +
 src/interfaces/libpq/fe-connect.c | 98 +--
 src/interfaces/libpq/libpq-fe.h   |  1 +
 src/interfaces/libpq/libpq-int.h  |  1 +
 6 files changed, 172 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 601091c570..d2e5b08541 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1735,6 +1735,36 @@ char *PQhost(const PGconn *conn);
  
 
 
+
+
+ 
+  PQhostaddr
+  
+   PQhostaddr
+  
+ 
+
+ 
+  
+   Returns the server IP address of the active connection.
+   This can be the address that a host name resolved to,
+   or an IP address provided through the hostaddr
+   parameter.
+
+char *PQhostaddr(const PGconn *conn);
+
+  
+
+  
+   PQhostaddr returns NULL if the
+   conn argument is NULL.
+   Otherwise, if there is an error producing the host information
+   (perhaps if the connection has not been fully established or
+   there was an error), it returns an empty string.
+  
+ 
+
+
 
  
   PQport
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 04e227b5a6..ee88e1ca5c 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -596,14 +596,30 @@ exec_command_conninfo(PsqlScanState scan_state, bool active_branch)
 		else
 		{
 			char	   *host = PQhost(pset.db);
+			char	   *hostaddr = PQhostaddr(pset.db);
 
-			/* If the host is an absolute path, the connection is via socket */
+			/*
+			 * If the host is an absolute path, the connection is via socket
+			 * unless overriden by hostaddr
+			 */
 			if (is_absolute_path(host))
-printf(_("You are connected to database \"%s\" as user \"%s\" via socket in \"%s\" at port \"%s\".\n"),
-	   db, PQuser(pset.db), host, PQport(pset.db));
+			{
+if (hostaddr && *hostaddr)
+	printf(_("You are connected to database \"%s\" as user \"%s\" on address \"%s\" at port \"%s\".\n"),
+		   db, PQuser(pset.db), hostaddr, PQport(pset.db));
+else
+	printf(_("You are connected to database \"%s\" as user \"%s\" via socket in \"%s\" at port \"%s\".\n"),
+		   db, PQuser(pset.db), host, PQport(pset.db));
+			}
 			else
-printf(_("You are connected to database \"%s\" as user \"%s\" on host \"%s\" at port \"%s\".\n"),
-	   db, PQuser(pset.db), host, PQport(pset.db));
+			{
+if (hostaddr && *hostaddr && strcmp(host, hostaddr) != 0)
+	printf(_("You are connected to database \"%s\" as user \"%s\" on host \"%s\" (address \"%s\") at port \"%s\".\n"),
+			

Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

2018-11-19 Thread Dmitry Dolgov
> On Tue, Jul 24, 2018 at 5:52 PM Dave Cramer  wrote:
>
> Back in 2016 a patch was proposed that seems to have died on the vine. See 
> https://www.postgresql.org/message-id/flat/cafgjrd3hdyoa33m69tbeofnner2bzbwa8ffjt2v5vfztbvu...@mail.gmail.com
> for the history and https://commitfest.postgresql.org/10/621/ for the 
> commitfest entry.
> I've rebased the patches and attached them for consideration.
> JDBC tests here 
> https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/test/java/org/postgresql/replication/LogicalReplicationTest.java
>  all pass

Unfortunately the second patch from the series can't be applied anymore; could
you please rebase it one more time? Other than that, it looks strange to me
that the corresponding discussion stopped when it was quite close to being in
good shape, bouncing from "rejected with feedback" to "needs review". Can
someone from the people involved clarify the current status of this patch?



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-19, Marina Polyakova wrote:

> On 2018-11-16 22:59, Alvaro Herrera wrote:
> > On 2018-Sep-05, Marina Polyakova wrote:
> > 
> > > v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch
> > > - a patch for the RandomState structure (this is used to reset a
> > > client's
> > > random seed during the repeating of transactions after
> > > serialization/deadlock failures).
> > 
> > Pushed this one with minor stylistic changes (the most notable of which
> > is the move of initRandomState to where the rest of the random generator
> > infrastructure is, instead of in a totally random place).  Thanks,
> 
> Thank you very much! I'm going to send a new patch set by the end of this
> week (I'm sorry, I was very busy with the release of Postgres Pro 11...).

Great, thanks.

I also think that the pgbench_error() patch should go in before the main
one.  It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re:

2018-11-19 Thread bricklen
On Mon, Nov 19, 2018 at 6:54 AM Adelino Silva <
adelino.j.si...@googlemail.com> wrote:

> A system monitoring tool like "nc -vz localhost 5432" used to monitor the
> database server is causing the warning below. Is there any way to filter or ignore
> such messages in the postgresql log file?
> LOG:  incomplete startup packet
>

No way to suppress that message that I'm aware of, but you might find the
pg_isready tool handy for checking the status of the Postgres server.
https://www.postgresql.org/docs/current/app-pg-isready.html


Time to backpatch config/ax_pthread.m4 changes?

2018-11-19 Thread Tom Lane
I noticed that buildfarm member quokka (RHEL 7 / ppc64) has started
failing configure in the 9.4 and 9.5 branches.  This appears to be
because configure is not detecting the need to use "-pthread" to get
thread-related functions.  We didn't change anything in that area
ourselves, so I'm guessing that this is a result of a platform update,
which more than likely is going to start affecting other users before
long, so we'd better fix it.

It looks to me like the reason it's okay in 9.6 and up is
commit e97af6c8b ("Replace our hacked version of ax_pthread.m4 with
latest upstream version") plus a few followup fixes.  I propose to
back-patch those into 9.5 and 9.4.  Heikki had attempted to back-patch
into 9.5 originally, but gave up when the followup fixes seemed to
be getting out of hand --- but there weren't really that many.
Anyway, now that that code has been stable for several years, I see
little reason not to back-patch it.

regards, tom lane



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2018-11-19 Thread Robert Haas
On Sun, Nov 18, 2018 at 9:43 PM Amit Langote
 wrote:
> On 2018/11/17 9:06, Michael Paquier wrote:
> > On Fri, Nov 16, 2018 at 09:38:40AM -0500, Robert Haas wrote:
> >> OK, but it seems to me that your version of my patch rearranges the
> >> code more than necessary.
> >>
> >> How about the attached?
> >
> > What you are proposing here looks good to me.  Thanks!
>
> Me too, now that I see the patch closely.  The errors I'd seen in the
> regression tests were due to an uninitialized oids variable, which is fixed in
> the later patches, not due to "confused memory context switching" as I'd
> put it [1] and made that the reason for additional changes.

OK.  Rebased again, and committed (although I forgot to include a link
to this discussion - sorry about that).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: fix psql \conninfo & \connect when using hostaddr

2018-11-19 Thread Alvaro Herrera
Pushed, thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Time to backpatch config/ax_pthread.m4 changes?

2018-11-19 Thread Heikki Linnakangas

On 19/11/2018 19:12, Tom Lane wrote:

I noticed that buildfarm member quokka (RHEL 7 / ppc64) has started
failing configure in the 9.4 and 9.5 branches.  This appears to be
because configure is not detecting the need to use "-pthread" to get
thread-related functions.  We didn't change anything in that area
ourselves, so I'm guessing that this is a result of a platform update,
which more than likely is going to start affecting other users before
long, so we'd better fix it.

It looks to me like the reason it's okay in 9.6 and up is
commit e97af6c8b ("Replace our hacked version of ax_pthread.m4 with
latest upstream version") plus a few followup fixes.  I propose to
back-patch those into 9.5 and 9.4.  Heikki had attempted to back-patch
into 9.5 originally, but gave up when the followup fixes seemed to
be getting out of hand --- but there weren't really that many.
Anyway, now that that code has been stable for several years, I see
little reason not to back-patch it.


Makes sense.

It would be nice to pick the latest version from the authoritative 
source (https://www.gnu.org/software/autoconf-archive/ax_pthread.html), 
while we're at it. I picked a draft version back then, before it was 
committed to the main branch. Although, if we do that, then the argument 
that the code's been stable for several years doesn't apply anymore...


- Heikki



Re: Time to backpatch config/ax_pthread.m4 changes?

2018-11-19 Thread Tom Lane
Heikki Linnakangas  writes:
> On 19/11/2018 19:12, Tom Lane wrote:
>> It looks to me like the reason it's okay in 9.6 and up is
>> commit e97af6c8b ("Replace our hacked version of ax_pthread.m4 with
>> latest upstream version") plus a few followup fixes.  I propose to
>> back-patch those into 9.5 and 9.4.  Heikki had attempted to back-patch
>> into 9.5 originally, but gave up when the followup fixes seemed to
>> be getting out of hand --- but there weren't really that many.
>> Anyway, now that that code has been stable for several years, I see
>> little reason not to back-patch it.

> Makes sense.

> It would be nice to pick the latest version from the authoritative 
> source (https://www.gnu.org/software/autoconf-archive/ax_pthread.html), 
> while we're at it. I picked a draft version back then, before it was 
> committed to the main branch. Although, if we do that, then the argument 
> that the code's been stable for several years doesn't apply anymore...

Hm.  I'll check that, but I'm inclined to just make any change like that
in HEAD, since we aren't aware of live bugs it would fix.

regards, tom lane



Regarding performance regression on specific query

2018-11-19 Thread Jung, Jinho

Hello,

I am Jinho Jung, a PhD student at Georgia Tech, and I am conducting DB performance 
research. I am sending our evaluation results to get feedback regarding our 
findings.

We found several performance-regression queries. The attached files are three of 
them for which we confirmed a performance regression (in terms of query execution 
time) between v10.6 and v9.4.20. We hope you can test these queries and give us 
feedback. For your information, we are also sending the execution times measured 
on our machine.


Execution time
=
1.sql
10.6  : 469 ms
9.4.20: 10 ms

4.sql
10.6  : 34019 ms
9.4.20: 0.4 ms

20.sql
10.6  : 2791 ms
9.4.20: 61 ms


Evaluation setup
==
1) compile 9.4.20 and 10.6 from released source code 
(https://www.postgresql.org/ftp/source/)
2) without importing additional data, just run the attached queries

We appreciate you taking the time to test!

Best regards,
Jinho Jung


1.sql
Description: 1.sql


4.sql
Description: 4.sql


20.sql
Description: 20.sql


Re: pgsql: Add TAP tests for pg_verify_checksums

2018-11-19 Thread Michael Banck
Hi,

On Sun, Oct 14, 2018 at 10:59:02AM +0900, Michael Paquier wrote:
> On Sat, Oct 13, 2018 at 05:53:00PM -0400, Andrew Dunstan wrote:
> > It occurred to me that a pretty simple fix could just be to blacklist
> > everything that didn't start with a digit. The whitelist approach is
> > probably preferable... depends how urgent we see this as.
> 
> Yeah, possibly, still that's not a correct long-term fix.  So attached
> is what I think we should do for HEAD and REL_11_STABLE with an approach
> using a whitelist.  I have added positive and negative tests on top of
> the existing TAP tests, as suggested by Michael B, and I made the code
> use relpath.h to make sure that we don't miss any fork types.

I was looking at extending the testsuite for the online checksum 
verification feature as requested by Fabien, and it seems some of the
additional tests are not working as intended:

> diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl 
> b/src/bin/pg_verify_checksums/t/002_actions.pl
> index dc29da09af..33408ced89 100644
> --- a/src/bin/pg_verify_checksums/t/002_actions.pl
> +++ b/src/bin/pg_verify_checksums/t/002_actions.pl

[...]

> +append_to_file "$pgdata/global/9_vm.123", "";
> +
>  # Checksums pass on a newly-created cluster
>  command_ok(['pg_verify_checksums',  '-D', $pgdata],
>  "succeeds with offline cluster");
> @@ -67,3 +89,30 @@ $node->command_checks_all([ 'pg_verify_checksums', '-D', 
> $pgdata, '-r',
> [qr/Bad checksums:.*1/],
> [qr/checksum verification 
> failed/],
> 'fails for corrupted data on 
> single relfilenode');
> +
> +# Utility routine to check that pg_verify_checksums is correctly
> +# able to detect correctly-shaped relation files with some data.
> +sub fail_corrupt
> +{
> + my $node = shift;
> + my $file = shift;
> + my $pgdata = $node->data_dir;
> +
> + # Create the file with some data in it.
> + append_to_file "$pgdata/global/$file", "foo";
> +
> + $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
> +   1,
> +   [qr/^$/],
> +   [qr/could not read block/],
> +   "fails for corrupted data in 
> $file");
> +}
> +
> +fail_corrupt($node, "9");
> +fail_corrupt($node, "9.123");
> +fail_corrupt($node, "9_fsm");
> +fail_corrupt($node, "9_init");
> +fail_corrupt($node, "9_vm");
> +fail_corrupt($node, "9_init.123");
> +fail_corrupt($node, "9_fsm.123");
> +fail_corrupt($node, "9_vm.123");

First off, I think those fail_corrupt files should have different
filenames than the empty ones above (I left `9_vm.123' as an
example of a duplicated file).

Second, and this is more severe, the subroutine never removes those
files, so in practice we are checking the first (or possibly some random
other) file instead of the one we think we are looking at (or at least
what I think the intention is). This can be shown if you expand the
regexp from `qr/could not read block/' to `qr/could not read block 0 in
file.*$file\":/':

|t/002_actions.pl .. 6/38 
|#   Failed test 'fails for corrupted data in 9.123 stderr /(?^:could
|#   not read block 0 in file.*9.123\":)/'
|#   at t/002_actions.pl line 109.
|#   'pg_verify_checksums: could not read block 0 in file
|#   
"[...]src/bin/pg_verify_checksums/tmp_check/t_002_actions_node_checksum_data/pgdata/global/9":
|#   read 3 of 8192
|# '
|# doesn't match '(?^:could not read block 0 in file.*9.123\":)'

So either we remove or truncate the files again after
$node->command_checks_all(), or the tests use the relfilenode feature.

However, in the latter case we would have to use a different main
relfilenode ID for each subtest, as otherwise pg_verify_checksums would
again fail on the first file it scans.

So I am proposing something like the attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 91adc7fef1..96fbd96e4f 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -100,16 +100,17 @@ $node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
 sub fail_corrupt
 {
 	my $node = shift;
+	my $relfilenode = shift;
 	my $file = shift;
 	my 

plpgsql plugin - stmt_beg/end is not called for top level block of statements

2018-11-19 Thread Pavel Stehule
Hi

I am playing with plpgsql profiling and the plpgsql plugin API. I found that
the stmt_beg and stmt_end callbacks are not called for the top-level statement,
due to the direct call of the exec_stmt_block function.

    estate.err_text = NULL;
    estate.err_stmt = (PLpgSQL_stmt *) (func->action);
    rc = exec_stmt_block(&estate, func->action);
    if (rc != PLPGSQL_RC_RETURN)
    {
        estate.err_stmt = NULL;
        estate.err_text = NULL;

Wouldn't it be better to call exec_stmt there? Then the plpgsql plugin callbacks
would really be called for every plpgsql statement.
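
Roughly, I mean something like this (a sketch only, untested):

    /* Sketch: route the top-level block through exec_stmt() so that the
     * plugin's stmt_beg/stmt_end callbacks fire for it as well. */
    estate.err_text = NULL;
    estate.err_stmt = (PLpgSQL_stmt *) (func->action);
    rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action);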

Regards

Pavel


Re: [HACKERS] generated columns

2018-11-19 Thread Alvaro Herrera
Disclaimer:  I had never seen this patch before.  I did not participate
in this feature design.  I did not discuss this review with the author
or anybody in 2ndQuadrant.  I do not have any particular affective bonds
with its author.  I did not receive payment nor goods in exchange for
this review.  I encourage others to review this patch, and all other
patches in the current commitfest and all future commitfests.


Now that the TupleTableSlot work has landed, the API of
ExecComputeStoredGenerated is clearly inadequate.  This should be
adjusted to work with the slot only, and not generate a heap tuple at
all -- if the calling code needs the heap tuple, have that code generate
that from the slot.  (Example problem: ExecConstraints runs using the
slot, not the heap tuple.)
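
For illustration only, callers that really do need a heap tuple can build one
from the slot on demand with the post-slotification API, roughly like this (a
sketch; 'slot' stands for whatever slot ExecComputeStoredGenerated would fill):

	bool		shouldFree;
	HeapTuple	tuple;

	/* materialize a heap tuple from the slot only where it is truly needed */
	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);

	/* ... use the tuple ... */

	if (shouldFree)
		heap_freetuple(tuple);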

The pg_dump tests verify a virtual generated column, but not a stored
one.  It'd be good to have one for the latter.  The tables in
new test "generated" appear not to be dropped, which is good to test
pg_upgrade as well as pg_dump; I'd add a comment that this is on
purpose, lest someone else add DROP lines there later.  I think some
tests for logical replication would be good as well.

It's unclear why you made generated columns on partitions unsupported.
I'd fix the limitation if possible, but if not, at least document it.
(I particularly notice that partition key is marked as unsupported in
the regression test.  Consider partitioning on a SERIAL column, this is
clearly something that users will expect to work.)

About your catalog representation question:

On 2018-Oct-30, Peter Eisentraut wrote:

> 1. (current implementation) New column attgenerated contains 's' for
> STORED, 'v' for VIRTUAL, '\0' for nothing.  atthasdef means "there is
> something in pg_attrdef for this column".  So a generated column would
> have atthasdef = true, and attgenerated = s/v.  A traditional default
> would have atthasdef = true and attgenerated = '\0'.  The advantage is
> that this is easiest to implement and the internal representation is the
> most useful and straightforward.  The disadvantage is that old client
> code that wants to detect whether a column has a default would need to
> be changed (otherwise it would interpret a generated column as having a
> default value instead).

I think this is a reasonable implementation.  Client code that is
confused by this will obviously have to be updated, but if it isn't, the
bug is not serious.  I would clearly not go to the extreme trouble that
#2 poses just to avoid this problem.

That said, I think your "radical alternative" #3 is better.  Maybe we
want to avoid multiple compatibility breaks, so we'd go with 3 for pg12
instead of doing 1 now and changing it again later.

Like Robert, I would use something other than '\0' anyhow.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: fix psql \conninfo & \connect when using hostaddr

2018-11-19 Thread Fabien COELHO



Hello Alvaro,


 - libpq internal function getHostaddr get a length,
   and I added an assert about it.
 - added a few braces on if/if/else/if/else/else
 - added an UNKNOWN_HOST macro to hide "???"
 - moved host_addr[] declaration earlier to avoid some braces


You forgot to free(conn->connip) during freePGconn().


Argh, indeed:-(


I found the UNKNOWN_HOST business quite dubious -- not only in your
patch but also in the existing coding.


So did I:-) I only kept it because it was already done like that.

I changed the getHostname API so that instead of returning "???" it sets 
the output buffer to the empty string.


I hesitated to do exactly that. I'm fine with that.

Thanks for the push.

--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Fabien COELHO



Hello Alvaro,


I also think that the pgbench_error() patch should go in before the main
one.  It seems a bit pointless to introduce code using a bad API only to
fix the API together with all the new callers immediately afterwards.


I'm not that keen on this part of the patch, because ISTM that it introduces 
significant and possibly costly malloc/free cycles when handling errors, 
which do not currently exist in pgbench.


Previously an error was basically the end of the script, but with the 
feature being introduced by Marina some errors are handled, in which case 
we end up paying these costs in the test loop. Also, refactoring 
error handling is not necessary for the new feature. That is why I advised 
moving it away and possibly keeping it for later.


Related to Marina patch (triggered by reviewing the patches), I have 
submitted a refactoring patch which aims at cleaning up the internal state 
machine, so that additions and checking that all is well is simpler.


https://commitfest.postgresql.org/20/1754/

It has been reviewed, I think I answered to the reviewer concerns, but the 
reviewer did not update the patch state on the cf app, so I do not know 
whether he is unsatisfied or if it was just forgotten.


--
Fabien.



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-19, Fabien COELHO wrote:

> 
> Hello Alvaro,
> 
> > I also think that the pgbench_error() patch should go in before the main
> > one.  It seems a bit pointless to introduce code using a bad API only to
> > fix the API together with all the new callers immediately afterwards.
> 
> I'm not that keen on this part of the patch, because ISTM that introduces
> significant and possibly costly malloc/free cycles when handling error,
> which do not currently exist in pgbench.

Oh, I wasn't aware of that.

> Related to Marina patch (triggered by reviewing the patches), I have
> submitted a refactoring patch which aims at cleaning up the internal state
> machine, so that additions and checking that all is well is simpler.
> 
>   https://commitfest.postgresql.org/20/1754/

let me look at this one.

> It has been reviewed, I think I answered to the reviewer concerns, but the
> reviewer did not update the patch state on the cf app, so I do not know
> whether he is unsatisfied or if it was just forgotten.

Feel free to update a patch status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2018-11-19 Thread Fabien COELHO




Feel free to update a patch status to "needs review" yourself after
submitting a new version that in your opinion responds to a reviewer's
comments.


Sure, I do that. But I will not switch any of my patches to "Ready". AFAICR 
the concerns were mostly about imprecise comments in the code, and a few 
questions that I answered.


--
Fabien.



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Andres Freund
Hi,

On 2018-11-01 22:52:19 +0100, Fabien COELHO wrote:
> 
> > > > Their commit r265375 fixed the ability to compile itself, but built
> > > > PostgreSQL binaries remain broken there and thereafter.
> > > > 
> > > > |...]
> > > 
> > > Thanks a lot for this investigation! I can fill in a gcc bug report. There
> > > would be a enormous work to narrow it down to a small test case, it is
> > > unclear how they can act about it, but at least they would know.
> > 
> > Have you done so? If so, what's the bug number?
> 
> Not yet. There was no answer to my suggestion, so I did not feel like that
> urgent. I'll try to do it over next week.

FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
just an accident and the optimization's introduced in the above revision
aren't still broken.

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly&br=HEAD

Greetings,

Andres Freund



Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-19 Thread Tom Lane
Etsuro Fujita  writes:
> I agree that it's better to keep the BeginCopyFrom API as-is.  Also, I 
> think your version would handle SIGPIPE in COPY FROM PROGRAM more 
> properly than what I proposed.  So, +1 from me.

Thanks for reviewing!  I've pushed it now, though at the last minute
I reconsidered resetting SIGUSR2 as my previous patch did.  The trouble
with resetting that is that it results in a small window where receipt
of SIGUSR2 would result in backend termination, which we surely don't
want.  Now, it doesn't look to me like the postmaster will ever send
SIGUSR2 to ordinary backends, but it wouldn't be terribly surprising
if someone makes a change that relies on the expectation of SIGUSR2
being SIG_IGN'd by backends.  I don't see any real upside to resetting
SIGUSR2 for the called program that would justify taking any risk
there.  (Note that this concern doesn't apply to SIGPIPE since we
only expect that to be raised synchronously, ie during a write.)

As Kyotaro-san said upthread, it's odd that exec() provides no
way to reset all the handlers to SIG_DFL on the child side.
But since it doesn't, we're not really able to do much more than
this.
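
To illustrate the mechanism at play (this is a generic sketch, not the
committed code): a disposition of SIG_IGN survives exec(), so the only way to
hand the child SIG_DFL for a given signal is to flip it in the parent around
the spawn, one signal at a time:

#include <signal.h>
#include <stdio.h>

/* generic sketch: run a program with SIGPIPE back at its default */
static FILE *
open_program_with_default_sigpipe(const char *command)
{
	void		(*saved) (int);
	FILE	   *fp;

	saved = signal(SIGPIPE, SIG_DFL);	/* child will inherit SIG_DFL */
	fp = popen(command, "r");
	signal(SIGPIPE, saved);				/* restore our own disposition */
	return fp;
}

The brief window where the parent's own disposition changes is tolerable for
SIGPIPE precisely because it is only raised synchronously during writes.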

regards, tom lane



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Tom Lane
Andres Freund  writes:
> FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
> just an accident and the optimization's introduced in the above revision
> aren't still broken.
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly&br=HEAD

Yeah, I saw that moonjelly had gone green.  Are you going to update
serinus' compiler soon?

(And while I'm bugging you, would you fix your buildfarm menagerie
to build REL_11_STABLE?)

regards, tom lane



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Andres Freund
Hi,


On 2018-11-19 17:32:37 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
> > just an accident and the optimization's introduced in the above revision
> > aren't still broken.
> > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly&br=HEAD
> 
> Yeah, I saw that moonjelly had gone green.  Are you going to update
> serinus' compiler soon?

I'm using debian's gcc-snapshot package, which updates a bit less
frequently than what Fabien does. It'll auto-update as soon as that has
happened.

What disconcerts me a bit is that afaict there wasn't an intentional fix
in the relevant area (disabling value range propagation fixes the
miscompilation), so I'm not quite sure.


> (And while I'm bugging you, would you fix your buildfarm menagerie
> to build REL_11_STABLE?)

Oh, hum. I thought I had. But I did it only for the LLVM enabled
animals. Results should be coming up soon.

Greetings,

Andres Freund



Re: pgbench - doCustom cleanup

2018-11-19 Thread Alvaro Herrera
On 2018-Nov-17, Fabien COELHO wrote:

> 
> > Attached is a v3, where I have updated inaccurate comments.
> 
> Attached v4 is a rebase after 409231919443984635b7ae9b7e2e261ab984eb1e

Attached v5.  I thought that separating the part that executes the
command was an obvious readability improvement.  Tests still pass,
though maybe I broke something inadvertently.  (I started by thinking
"does this block require a 'fall-through' comment?"  The 600 line
function is pretty hard to read with the multiple messy switches and
conditionals; split into 400 + 200 subroutine it's nicer to reason
about.)

Do we really use the word "move" to talk about state changes?  It sounds
very odd to me.  I would change that to "transition" -- would anybody
object to that?  (Not changed in v5.)

On INSTR_TIME_SET_CURRENT_LAZY(), you cannot just put an "if" inside a
macro -- consider this:
if (foo)
INSTR_TIME_SET_CURRENT_LAZY(bar);
else
something_else();
Which "if" is the else now attached to?  Now maybe the C standard has an
answer for that (I don't know what it is), but it's hard to read and
likely the compiler will complain anyway.  I wrapped it in "do { }
while(0)" as is customary.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 73d3de0677..df14af5259 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -294,24 +294,32 @@ typedef enum
 {
 	/*
 	 * The client must first choose a script to execute.  Once chosen, it can
-	 * either be throttled (state CSTATE_START_THROTTLE under --rate) or start
-	 * right away (state CSTATE_START_TX).
+	 * either be throttled (state CSTATE_START_THROTTLE under --rate), start
+	 * right away (state CSTATE_START_TX) or not start at all if the timer was
+	 * exceeded (state CSTATE_FINISHED).
 	 */
 	CSTATE_CHOOSE_SCRIPT,
 
 	/*
 	 * In CSTATE_START_THROTTLE state, we calculate when to begin the next
 	 * transaction, and advance to CSTATE_THROTTLE.  CSTATE_THROTTLE state
-	 * sleeps until that moment.  (If throttling is not enabled, doCustom()
-	 * falls directly through from CSTATE_START_THROTTLE to CSTATE_START_TX.)
+	 * sleeps until that moment.
+	 *
+	 * It may also detect that the next transaction would start beyond the end
+	 * of run, and switch to CSTATE_FINISHED.
 	 */
 	CSTATE_START_THROTTLE,
 	CSTATE_THROTTLE,
 
 	/*
 	 * CSTATE_START_TX performs start-of-transaction processing.  Establishes
-	 * a new connection for the transaction, in --connect mode, and records
-	 * the transaction start time.
+	 * a new connection for the transaction in --connect mode, records
+	 * the transaction start time, and proceed to the first command.
+	 *
+	 * Note: once a script is started, it will either error or run till
+	 * its end, where it may be interrupted. It is not interrupted while
+	 * running, so pgbench --time is to be understood as tx are allowed to
+	 * start in that time, and will finish when their work is completed.
 	 */
 	CSTATE_START_TX,
 
@@ -324,9 +332,6 @@ typedef enum
 	 * and we enter the CSTATE_SLEEP state to wait for it to expire. Other
 	 * meta-commands are executed immediately.
 	 *
-	 * CSTATE_SKIP_COMMAND for conditional branches which are not executed,
-	 * quickly skip commands that do not need any evaluation.
-	 *
 	 * CSTATE_WAIT_RESULT waits until we get a result set back from the server
 	 * for the current command.
 	 *
@@ -334,19 +339,25 @@ typedef enum
 	 *
 	 * CSTATE_END_COMMAND records the end-of-command timestamp, increments the
 	 * command counter, and loops back to CSTATE_START_COMMAND state.
+	 *
+	 * CSTATE_SKIP_COMMAND is used by conditional branches which are not
+	 * executed. It quickly skip commands that do not need any evaluation.
+	 * This state can move forward several commands, till there is something
+	 * to do or the end of the script.
 	 */
 	CSTATE_START_COMMAND,
-	CSTATE_SKIP_COMMAND,
 	CSTATE_WAIT_RESULT,
 	CSTATE_SLEEP,
 	CSTATE_END_COMMAND,
+	CSTATE_SKIP_COMMAND,
 
 	/*
-	 * CSTATE_END_TX performs end-of-transaction processing.  Calculates
-	 * latency, and logs the transaction.  In --connect mode, closes the
-	 * current connection.  Chooses the next script to execute and starts over
-	 * in CSTATE_START_THROTTLE state, or enters CSTATE_FINISHED if we have no
-	 * more work to do.
+	 * CSTATE_END_TX performs end-of-transaction processing.  It calculates
+	 * latency, and logs the transaction.  In --connect mode, it closes the
+	 * current connection.
+	 *
+	 * Then either starts over in CSTATE_CHOOSE_SCRIPT, or enters CSTATE_FINISHED
+	 * if we have no more work to do.
 	 */
 	CSTATE_END_TX,
 
@@ -575,6 +586,7 @@ static void processXactStats(TState *thread, CState *st, instr_time *now,
 static void pgbench_error(const char *fmt,...) pg_attribute_printf(1, 2);
 static void addScript(ParsedScript script);
 static void *th

Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Andres Freund
On 2018-11-19 14:38:04 -0800, Andres Freund wrote:
> Hi,
> 
> 
> On 2018-11-19 17:32:37 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
> > > just an accident and the optimization's introduced in the above revision
> > > aren't still broken.
> > > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly&br=HEAD
> > 
> > Yeah, I saw that moonjelly had gone green.  Are you going to update
> > serinus' compiler soon?
> 
> I'm using debian's gcc-snapshot package, which updates a bit less
> frequently than what Fabien does. It'll auto-update as soon as that has
> happened.
> 
> What disconcerts me a bit is that afaict there wasn't an intentional fix
> in the relevant area (disabling value range propagation fixes the
> miscompilation), so I'm not quite sure.

Ooops, somehow deleted too much when editing the sentence. I was trying
to say that I'm not sure whether this is actually fixed, or a bug was
just hidden.

The issue is "fixed" in :
https://github.com/gcc-mirror/gcc/commit/48625f587d786089ce1e245f192bcf706f8fc1fc
and had been "caused" (or surfaced) in:
https://github.com/gcc-mirror/gcc/commit/be44111ed7ea7611d324d4d570e2f2d380f784b4

Greetings,

Andres Freund



Re: pgsql: Add TAP tests for pg_verify_checksums

2018-11-19 Thread Michael Paquier
On Mon, Nov 19, 2018 at 07:11:19PM +0100, Michael Banck wrote:
> First off, I think those fail_corrupt files should have different
> filenames than the empty ones above (I left `9_vm.123' as an
> example of a duplicated file).

A comment about that is a good idea.  So added.

> So either we remove or truncate the files again after
> $node->command_checks_all(), or the tests use the relfilenode
> feature.
>
> However, in the latter case we would have to use a different main
> relfilenode ID for each subtest, as otherwise pg_verify_checksums would
> again fail on the first file it scans.

The take here is to make sure that the correct file name is showing up
in the produced error message, and removing the files makes future tests
more robust, so I have committed a version which removes the files
instead.
--
Michael




Re: ToDo: show size of partitioned table

2018-11-19 Thread Michael Paquier
On Mon, Nov 19, 2018 at 09:33:58AM +0100, Pavel Stehule wrote:
> po 19. 11. 2018 v 3:42 odesílatel Michael Paquier 
> napsal:
>> Sorry for degressing, but could you also update \di at the same time so
>> as it shows "partitioned index"?  listTables() should be switched to use
>> partitioned tables and partitioned indexes, and permissionsList() has a
>> reference to partitioned tables.  While on it, this gives the attached..
>>
> 
> It has sense

For the archive's sake, d56e0fde has been committed for this purpose.
--
Michael




Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-11-19 Thread Michael Paquier
On Mon, Nov 19, 2018 at 10:48:06AM +0500, Andrey Lepikhov wrote:
> According to my experience, I clarify some comments to avoid this mistakes
> in the future (see attachment).

No objections from here.

> - * The returned pointer (or *errormsg) points to an internal buffer that's
> - * valid until the next call to XLogReadRecord.
> + * The returned pointer (or *errormsg) points to an internal read-only buffer
> + * that's valid until the next call to XLogReadRecord.

Not sure that this bit adds much.

> - /* Buffer for current ReadRecord result (expandable) */
> + /*
> +  * Buffer for current ReadRecord result (expandable).
> +  * Used in the case, than current ReadRecord result don't fit on the
> +  * currently read WAL page.
> +  */

Yeah, this one is not entirely true now.  What about the following
instead:
-   /* Buffer for current ReadRecord result (expandable) */
+   /*
+* Buffer for current ReadRecord result (expandable), used when a record
+* crosses a page boundary.
+*/
--
Michael




Re: [PATCH] XLogReadRecord returns pointer to currently read page

2018-11-19 Thread Andrey Lepikhov




On 20.11.2018 6:30, Michael Paquier wrote:

On Mon, Nov 19, 2018 at 10:48:06AM +0500, Andrey Lepikhov wrote:

According to my experience, I clarify some comments to avoid this mistakes
in the future (see attachment).


No objections from here.


- * The returned pointer (or *errormsg) points to an internal buffer that's
- * valid until the next call to XLogReadRecord.
+ * The returned pointer (or *errormsg) points to an internal read-only buffer
+ * that's valid until the next call to XLogReadRecord.


Not sure that this bit adds much.

Ok



-   /* Buffer for current ReadRecord result (expandable) */
+   /*
+* Buffer for current ReadRecord result (expandable).
+* Used in the case, than current ReadRecord result don't fit on the
+* currently read WAL page.
+*/


Yeah, this one is not entirely true now.  What about the following
instead:
-   /* Buffer for current ReadRecord result (expandable) */
+   /*
+* Buffer for current ReadRecord result (expandable), used when a record
+* crosses a page boundary.
+*/


I agree


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 10/30/18 11:59 AM, Stephen Frost wrote:
> > * Kyotaro HORIGUCHI (horiguchi.kyot...@lab.ntt.co.jp) wrote:
> >> So I'm +1 for the Michael's current patch as (I think) we can't
> >> make visible or large changes.
> >>
> >> That said, I agree with Stephen's concern on the point we could
> >> omit requried files in future, but on the other hand I don't want
> >> random files are simply rejected.
> > 
> > They aren't rejected- there's a warning thrown about them.
> 
> pgBackRest has been using a whitelist/blacklist method for identifying
> checksummable files for almost 2 years we haven't seen any issues.  The
> few times a "random" file appeared in the logs with checksum warnings it
> was later identified as having been mistakenly copied into $PGDATA.  The
> backup still completed successfully in these cases.
> 
> So to be clear, we whitelist the global, base, and pg_tblspc dirs and
> blacklist PG_VERSION, pg_filenode.map, pg_internal.init, and pg_control
> (just for global) when deciding which files to checksum.  Recently we
> added logic to exclude unlogged and temporary relations as well, though
> that's not required.
> 
> For PG11 I would recommend just adding the param file generated by exec
> backend to the black list for both pg_basebackup and pg_verifychecksums,
> then create a common facility for blacklisting for PG12.

Michael, this obviously didn't happen and instead we ended up releasing
11.1 with your changes, but I don't feel like this issue is closed and
I'm a bit disappointed that there haven't been any further responses or
discussions on this.

I discussed this at length with David and Amit, both of whom have now
also commented on this issue, at PGConf.Eu, but still there hasn't been
a response from you.  Is your thought here that your lack of response
should be taken as meaning I should simply revert your commit and then
commit your earlier patch to just add the param file?  While we
typically take silence as acceptance, it's a bit different when it comes
to reverting someone else's change, at least to my mind.

I'm happy to go ahead and make those changes if there's no disagreement
regarding it.

Also, just to be clear, I don't intend this with any animosity and I
certainly understand if it just has fallen through the cracks or been
lost in the shuffle but I really don't like the implication put forward
that we're happy to not throw any kind of warning or notice about random
files showing up in the PG data directories.

Thanks!

Stephen




weird objectaddress.c entry for transforms

2018-11-19 Thread Andres Freund
Hi Peter, Alvaro, All,

(Peter, this is your change, Alvaro, you're the objectaddress.c master)

While looking through my oids removal patch I noticed that the
objectaddress.c ObjectProperty entry for transforms looks wrong:

typedef struct
{
    Oid         class_oid;          /* oid of catalog */
    Oid         oid_index_oid;      /* oid of index on system oid column */
    int         oid_catcache_id;    /* id of catcache on system oid column */
    int         name_catcache_id;   /* id of catcache on (name,namespace), or
                                     * (name) if the object does not live in a
                                     * namespace */
    AttrNumber  attnum_name;        /* attnum of name field */
    AttrNumber  attnum_namespace;   /* attnum of namespace field */
    AttrNumber  attnum_owner;       /* attnum of owner field */
    AttrNumber  attnum_acl;         /* attnum of acl field */
    ObjectType  objtype;            /* OBJECT_* of this object type */
    bool        is_nsp_name_unique; /* can the nsp/name combination (or name
                                     * alone, if there's no namespace) be
                                     * considered a unique identifier for an
                                     * object of this class? */
} ObjectPropertyType;

{
TransformRelationId,
TransformOidIndexId,
TRFOID,
InvalidAttrNumber,
},

this was added in 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cac76582053e

The InvalidAttrNumber entry initializes the member for name_catcache_id,
instead of -1 as the other entries, and in contrast to the rest of the
entries it doesn't initialize the other fields. In particular objtype is
0 instead of -1 as the rest.

Greetings,

Andres Freund



Re: weird objectaddress.c entry for transforms

2018-11-19 Thread Andres Freund
On 2018-11-19 17:53:22 -0800, Andres Freund wrote:
> Hi Peter, Alvaro, All,
> 
> (Peter, this is your change, Alvaro, you're the objectaddress.c master)
> 
> While looking through my oids removal patch I noticed that the
> objectaddress.c ObjectProperty entry for transforms looks wrong:
> 
> typedef struct
> {
>     Oid         class_oid;          /* oid of catalog */
>     Oid         oid_index_oid;      /* oid of index on system oid column */
>     int         oid_catcache_id;    /* id of catcache on system oid column */
>     int         name_catcache_id;   /* id of catcache on (name,namespace), or
>                                      * (name) if the object does not live in a
>                                      * namespace */
>     AttrNumber  attnum_name;        /* attnum of name field */
>     AttrNumber  attnum_namespace;   /* attnum of namespace field */
>     AttrNumber  attnum_owner;       /* attnum of owner field */
>     AttrNumber  attnum_acl;         /* attnum of acl field */
>     ObjectType  objtype;            /* OBJECT_* of this object type */
>     bool        is_nsp_name_unique; /* can the nsp/name combination (or name
>                                      * alone, if there's no namespace) be
>                                      * considered a unique identifier for an
>                                      * object of this class? */
> } ObjectPropertyType;
> 
>   {
>   TransformRelationId,
>   TransformOidIndexId,
>   TRFOID,
>   InvalidAttrNumber,
>   },
> 
> this was added in 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cac76582053e
> 
> The InvalidAttrNumber entry initializes the member for name_catcache_id,
> instead of -1 as the other entries, and in contrast to the rest of the
> entries it doesn't initialize the other fields. In particular objtype is
> 0 instead of -1 as the rest.

Also, uh, shouldn't this actually fill out some of those fields? Most of
them actually apply to transforms.

Greetings,

Andres Freund



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Michael Paquier
On Mon, Nov 19, 2018 at 08:45:29PM -0500, Stephen Frost wrote:
> Michael, this obviously didn't happen and instead we ended up releasing
> 11.1 with your changes, but I don't feel like this issue is closed and
> I'm a bit disappointed that there hasn't been any further responses or
> discussions on this.

This issue is not closed.

> I discussed this at length with David and Amit, both of whom have now
> also commented on this issue, at PGConf.Eu, but still there hasn't been
> a response from you.  Is your thought here that your lack of response
> should be taken as meaning I should simply revert your commit and then
> commit your earlier patch to just add the param file?  While we
> typically take silence as acceptance, it's a bit different when it comes
> to reverting someone else's change, at least to my mind.
>
> I'm happy to go ahead and make those changes if there's no disagreement
> regarding it.

Well, I did not have any strong urge to offer a response, as it seems
to me that the topic of how to handle things has not moved an iota toward an
agreement.  From what I can see, there are still two schools of thought:
- Use a white list.  The individuals who have expressed an interest in
this approach, based on the status of this thread, are myself and Kyotaro
Horiguchi.  At least a third person who I think would prefer the
white-list approach is Andres, but he did not reply directly to this
thread.
- Use a black list, which at least three people have expressed
an opinion about on this thread: Amit Kapila, David Steele and
yourself.

> Also, just to be clear, I don't intend this with any animosity and I
> certainly understand if it just has fallen through the cracks or been
> lost in the shuffle but I really don't like the implication put forward
> that we're happy to not throw any kind of warning or notice about random
> files showing up in the PG data directories.

Don't worry about that!  Thanks for trying to make this thread moving
on.

I am still a fan of the whitelist approach, as there is no actual point
in restricting what people can do with Postgres in terms of
extensibility (relying on tablespace paths for a storage plugin looks like
an important thing to me, and we would close doors with a black list,
causing warnings to be generated for basically everything which is not
from heap).  What worries me the most is actually the fact that we have
not heard from the original authors of pg_verify_checksums what they
think on the matter and how we ought to do things, because their
opinion is important.  If there is a clear agreement on the direction
to take, I am of course perfectly fine if the conclusion is the opposite
of what I think, but with a 3-vs-2 (3-vs-3 if I count Andres) it is hard to
conclude that we have an actual agreement.
--
Michael




Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Stephen Frost
Greetings Michael,

* Michael Paquier (mich...@paquier.xyz) wrote:
> I am still a fan of the whitelist approach as there is no actual point
> in restricting what people can do with Postgres in terms of
> extensibility (relying on tablespace paths for storage plugin looks like
> an important thing to me, and we would close doors with a black list,
> causing warnings to be generated for basically everything which is not
> from heap).  What worries me the most is actually the fact that we have
> not heard from the original authors of the pg_verify_checksums what they
> think on the matter and how we ought to do things, because their
> opinion is important.  If there is a clear agreement for the direction
> to take, I am of course perfectly fine if the conclusion is the opposite
> of what I think, but a 3vs2, (3vs3 if I count Andres) is kind of hard to
> conclude that we have an actual agreement. 

I can understand that we want PostgreSQL to be extensible, but as David
pointed out up-thread, what we've actually seen in the wild are cases
where random files have mistakenly ended up in the data directory and
those have been cases where it's been quite good to have the warnings
thrown to indicate that there's been some mistake.  I don't think we do
our users any service by simply ignoring random files showing up in the
data directories.

As has been mentioned elsewhere, there's really a 'right' way to do
things and allowing PG to be 'extensible' by simply ignoring random
files showing up isn't that- if we want PG to be extensible in this way
then we need to provide a mechanism for that to happen.

While I'd also like to hear from the authors of pg_verify_checksums as
to their thoughts, I'm guessing that they're mostly watching from the
sidelines while we discuss and not wanting to end up picking the wrong
side.

What we typically do, at least imv, when there's an
impasse or a disagreement over approaches is to not move forward
with one side of it over what was in place previously.  David, in
particular, was certainly involved in the verify checksums work and in
the changes for pg_basebackup, having had quite a bit of experience
implementing that same mechanism in pgbackrest quite a while before it
got into PG proper.  That real-world experience with exactly this
feature is really quite relevant, imv.

Thanks!

Stephen




Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Andres Freund
Hi,

On 2018-11-19 21:18:43 -0500, Stephen Frost wrote:
> As has been mentioned elsewhere, there's really a 'right' way to do
> things and allowing PG to be 'extensible' by simply ignoring random
> files showing up isn't that- if we want PG to be extensible in this way
> then we need to provide a mechanism for that to happen.

I still don't buy this argument. I'm giving up here, as I just don't
have enough energy to keep up with this discussion.

FWIW, I think it's bad, that we don't error out on checksum failures in
basebackups by default. And that's only really feasible with a
whitelisting approach.

Greetings,

Andres Freund



Re: zheap: a new storage format for PostgreSQL

2018-11-19 Thread Amit Kapila
On Mon, Nov 19, 2018 at 6:36 PM Daniel Westermann
 wrote:
>
> >Yes, we need to connect to the database for performing rollback
> >actions.  Once the rollback for that database is over, undo apply
> >worker will exit and you should be able to drop the database.
>
> Thank you, Amit.
> Can you have a look at this one?
>
> create table t1 ( a text ) partition by list (a);
> create table t1_1 PARTITION of t1 (a) for values in ('a');
> create table t1_2 PARTITION of t1 (a) for values in ('b');
> create table t1_3 PARTITION of t1 (a) for values in ('c');
> create table t1_4 PARTITION of t1 (a) default;
>
> postgres=# \d+ t1
>Table "public.t1"
>  Column | Type | Collation | Nullable | Default | Storage  | Stats target | 
> Description
> +--+---+--+-+--+--+-
>  a  | text |   |  | | extended |  |
> Partition key: LIST (a)
> Partitions: t1_1 FOR VALUES IN ('a'),
> t1_2 FOR VALUES IN ('b'),
> t1_3 FOR VALUES IN ('c'),
> t1_4 DEFAULT
> Options: storage_engine=zheap
>
>
> insert into t1 select 'a' from generate_series ( 1, 100 );
> insert into t1 select 'b' from generate_series ( 1, 100 );
> insert into t1 select 'c' from generate_series ( 1, 100 );
>
> postgres=# begin;
> BEGIN
> postgres=# update t1 set a = 'd' where a = 'a';
> UPDATE 100
> postgres=# rollback;
> ROLLBACK
>

Here, you are doing a big rollback, so I guess it will be pushed to the
background unless you increase the value of 'rollback_overflow_size'.
You can confirm that by checking whether any undo apply worker is active
and whether the rollback finishes immediately.

> postgres=# select * from t1 where a = 'd';
> postgres=# select * from t1 where a = 'd';
> postgres=# select * from t1 where a = 'd';
>
> The selects at the end take seconds
>

I think what is happening is that, as the rollback is still in progress, the
scan needs to fetch the data from undo, and that will be slow.

> and a lot of checkpoints are happening.
>

It is because rollbacks also write WAL, and you are doing a big
rollback which will lead to a rewrite of the entire table.

I guess if you allow rollback to complete before issuing a select, you
will see better results.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-19 Thread Amit Kapila
On Mon, Nov 19, 2018 at 10:48 AM Haribabu Kommi
 wrote:
> On Mon, Nov 19, 2018 at 1:37 PM Alvaro Herrera  
> wrote:
>>
>> On 2018-Nov-19, Michael Paquier wrote:
>>
>> > On Mon, Nov 19, 2018 at 10:41:22AM +1100, Haribabu Kommi wrote:
>> > > So 6 new functions needs to be added to cover all the above cases,
>> > > IMO, we may need functions for all combinations, because I feel some
>> > > user may have the requirement of left out combination, in case if we 
>> > > choose
>> > > only some combinations.
>> >
>> > That's bloating the interface in my opinion.
>>
>> I understand.
>>
>> Let's call for a vote from a larger audience.  It's important to get
>> this interface right, ISTM.
>
> 4. Single API with -1 as invalid value, treat NULL as no matching. (Only 
> problem
>  with this approach is till now -1 is also a valid queryid, but setting -1 as 
> queryid
> needs to be avoided.
>

Hmm, can we use 0 as default value without any such caveat?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: More issues with pg_verify_checksums and checksum verification in base backups

2018-11-19 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-11-19 21:18:43 -0500, Stephen Frost wrote:
> > As has been mentioned elsewhere, there's really a 'right' way to do
> > things and allowing PG to be 'extensible' by simply ignoring random
> > files showing up isn't that- if we want PG to be extensible in this way
> > then we need to provide a mechanism for that to happen.
> 
> I still don't buy this argument. I'm giving up here, as I just don't
> have enough energy to keep up with this discussion.
> 
> FWIW, I think it's bad, that we don't error out on checksum failures in
> basebackups by default. And that's only really feasible with a
> whitelisting approach.

No, we could error out on checksum failures in either approach, but we
explicitly don't with good reason: if you're doing a backup, you
probably want to actually capture the current data.

This is something we've thought quite a bit about.  In fact, as I
recall, the original pg_basebackup code actually *did* error out, even
with the blacklist approach, and we made a solid argument which was
ultimately agreed to by those involved at the time that erroring out
half-way through was a bad idea.

What we do instead is exit with a non-zero exit code to make it clear
that there was an issue, to allow the user to capture that and raise
alarms, but to still have all of the data which we were able to
capture, in the hopes that the backup is at least salvageable.  In
addition, at least in pgbackrest, we don't consider such a backup to be
pristine and therefore we don't expire out the prior backups- we don't
do any backup expiration in pg_basebackup, so it's up to the user to
make sure that if pg_basebackup exits with a non-zero exit code that
they capture and handle that and *don't* blow away a previously good
backup.

The very last thing *any* backup tool should do is give up half-way
through and throw a nasty error, leaving you with the knowledge that
your system is hosed *and* no backup of what was there exists, and
making it extremely likely that whatever corruption exists is being
propagated further.

Let's try to not conflate these two issues though, they're quite
independent.

Thanks!

Stephen




typo fix

2018-11-19 Thread Amit Langote
Hi,

It seems to me that EquivalenceClass, the struct/type name, has been
misspelled as 'EquivalenceClasses' a couple of times in the comment above
its definition.

Attached fixes that.

Thanks,
Amit
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fd24203dd..a695012c7a 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -863,7 +863,7 @@ typedef struct StatisticExtInfo
 } StatisticExtInfo;
 
 /*
- * EquivalenceClasses
+ * EquivalenceClass
  *
  * Whenever we can determine that a mergejoinable equality clause A = B is
  * not delayed by any outer join, we create an EquivalenceClass containing
@@ -880,7 +880,7 @@ typedef struct StatisticExtInfo
  * that all or none of the input datatypes are collatable, so that a single
  * collation value is sufficient.)
  *
- * We also use EquivalenceClasses as the base structure for PathKeys, letting
+ * We also use EquivalenceClass as the base structure for PathKeys, letting
  * us represent knowledge about different sort orderings being equivalent.
  * Since every PathKey must reference an EquivalenceClass, we will end up
  * with single-member EquivalenceClasses whenever a sort key expression has


Re: [HACKERS] Restricting maximum keep segments by repslots

2018-11-19 Thread Michael Paquier
On Mon, Nov 19, 2018 at 01:39:58PM +0900, Michael Paquier wrote:
> I was just coming by to look at bit at the patch series, and bumped
> into that:

So I have been looking at the last patch series 0001-0004 posted on this
thread, and coming from here:
https://postgr.es/m/20181025.215518.189844649.horiguchi.kyot...@lab.ntt.co.jp

/* check that the slot is gone */
SELECT * FROM pg_replication_slots
It could be an idea to switch to the expanded mode here, not that it
matters much still..

+IsLsnStillAvaiable(XLogRecPtr targetLSN, uint64 *restBytes)
You mean Available here, not Avaiable.  This function is only used when
scanning for slot information with pg_replication_slots, so wouldn't it
be better to just return the status string in this case?

Not sure I see the point of the "remain" field, which can be found with
a simple calculation using the current insertion LSN, the segment size
and the amount of WAL that the slot is retaining.  It may be interesting
to document a query to do that though.

GetOldestXLogFileSegNo() has race conditions if WAL recycling runs in
parallel, no?  How is it safe to scan pg_wal from a process querying
pg_replication_slots while another process may manipulate its contents
(aka the checkpointer, or just the startup process with an
end-of-recovery checkpoint)?  This routine relies on unsafe
assumptions as it is not concurrent-safe.  You can avoid problems by
making sure instead that lastRemovedSegNo is initialized correctly at
startup, which would normally be one segment older than what's in
pg_wal, though that feels a bit hacky to rely on to track the oldest segment.

It seems to me that GetOldestXLogFileSegNo() should also check for
segments matching the current timeline, no?

+   if (prev_lost_segs != lost_segs)
+       ereport(WARNING,
+               (errmsg ("some replication slots have lost required WAL segments"),
+                errdetail_plural("The mostly affected slot has lost %ld segment.",
+                                 "The mostly affected slot has lost %ld segments.",
+                                 lost_segs, lost_segs)));
This can become very noisy over time, and it would actually be
useful to know which replication slot is impacted by that.

+  slot doesn't have valid restart_lsn, this field
Missing a determiner here, and restart_lsn should have a 
markup.

+many WAL segments that they fill up the space allotted
s/allotted/allocated/.

+  available. The last two states are seen only when
+   is non-negative. If the
+  slot doesn't have valid restart_lsn, this field
+  is unknown.
I am a bit confused by this statement.  The last two states are "lost"
and "keeping", but shouldn't "keeping" be the state showing up by
default, as it means that all WAL segments are kept around?

+# Advance WAL by ten segments (= 160MB) on master
+advance_wal($node_master, 10);
+$node_master->safe_psql('postgres', "CHECKPOINT;");
This makes the tests very costly, which is something we should avoid as
much as possible.  One trick which could be used here, on top of
reducing the number of segment switches, is to use initdb
--wal-segsize=1.
--
Michael




Re: typo fix

2018-11-19 Thread Michael Paquier
On Tue, Nov 20, 2018 at 02:00:39PM +0900, Amit Langote wrote:
> It seems to me that EquivalenceClass, the struct/type name, has been
> misspelled as 'EquivalenceClasses' a couple of times in the comment above
> its definition.

EquivalenceClasses stands for the plural of EquivalenceClass.  So
thinking like that...

> - * EquivalenceClasses
> + * EquivalenceClass

...  This is fine.

> - * We also use EquivalenceClasses as the base structure for PathKeys, letting
> + * We also use EquivalenceClass as the base structure for PathKeys, letting

...  But not that.
--
Michael




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-19 Thread Haribabu Kommi
On Tue, Nov 20, 2018 at 2:06 PM Amit Kapila  wrote:

> On Mon, Nov 19, 2018 at 10:48 AM Haribabu Kommi
>  wrote:
> > On Mon, Nov 19, 2018 at 1:37 PM Alvaro Herrera 
> wrote:
> >>
> >> On 2018-Nov-19, Michael Paquier wrote:
> >>
> >> > On Mon, Nov 19, 2018 at 10:41:22AM +1100, Haribabu Kommi wrote:
> >> > > So 6 new functions needs to be added to cover all the above cases,
> >> > > IMO, we may need functions for all combinations, because I feel some
> >> > > user may have the requirement of left out combination, in case if
> we choose
> >> > > only some combinations.
> >> >
> >> > That's bloating the interface in my opinion.
> >>
> >> I understand.
> >>
> >> Let's call for a vote from a larger audience.  It's important to get
> >> this interface right, ISTM.
> >
> > 4. Single API with -1 as invalid value, treat NULL as no matching. (Only
> problem
> >  with this approach is till now -1 is also a valid queryid, but setting
> -1 as queryid
> > needs to be avoided.
> >
>
> Hmm, can we use 0 as default value without any such caveat?
>

Yes, using strict with 0 as the default value can work.
If there is no problem, I can go ahead with the above changes?

Regards,
Haribabu Kommi
Fujitsu Australia


Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-11-19 Thread Amit Kapila
On Tue, Nov 20, 2018 at 10:56 AM Haribabu Kommi
 wrote:
>
> On Tue, Nov 20, 2018 at 2:06 PM Amit Kapila  wrote:
>>
>> On Mon, Nov 19, 2018 at 10:48 AM Haribabu Kommi
>>  wrote:
>> > On Mon, Nov 19, 2018 at 1:37 PM Alvaro Herrera  
>> > wrote:
>> >>
>> >> On 2018-Nov-19, Michael Paquier wrote:
>> >>
>> >> > On Mon, Nov 19, 2018 at 10:41:22AM +1100, Haribabu Kommi wrote:
>> >> > > So 6 new functions needs to be added to cover all the above cases,
>> >> > > IMO, we may need functions for all combinations, because I feel some
>> >> > > user may have the requirement of left out combination, in case if we 
>> >> > > choose
>> >> > > only some combinations.
>> >> >
>> >> > That's bloating the interface in my opinion.
>> >>
>> >> I understand.
>> >>
>> >> Let's call for a vote from a larger audience.  It's important to get
>> >> this interface right, ISTM.
>> >
>> > 4. Single API with -1 as invalid value, treat NULL as no matching. (Only 
>> > problem
>> >  with this approach is till now -1 is also a valid queryid, but setting -1 
>> > as queryid
>> > needs to be avoided.
>> >
>>
>> Hmm, can we use 0 as default value without any such caveat?
>
>
> Yes, with strict and 0 as default value can work.
> If there is no problem, I can go ahead with the above changes?
>

I would say wait for a few days (at least 2 days) to see if Alvaro or
anybody else has an opinion on this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)

2018-11-19 Thread Takahashi, Ryohei
Hi,


My customer uses PostgreSQL on Windows and hits the problem that the following 
log message is written to the server log too frequently (250 thousand times per day).
"LOG:  could not reserve shared memory region (addr=%p) for child %p:"

This log message is written when pgwin32_ReserveSharedMemoryRegion() in win32_shmem.c 
fails. If it fails, internal_forkexec() in postmaster.c retries up to 100 times. 
In most of my customer's cases, internal_forkexec() finally succeeded in 
pgwin32_ReserveSharedMemoryRegion() by retrying.

According to the comment in internal_forkexec(), 
pgwin32_ReserveSharedMemoryRegion() sometimes fails if ASLR is active. If so, I 
think administrators are not interested in this log message since it is a normal event. 

I think the log level should not be "LOG", but should be a lower level such as 
"DEBUG1".


Regards,
Ryohei Takahashi





Re: typo fix

2018-11-19 Thread Amit Langote
Thank you for looking.

On 2018/11/20 14:13, Michael Paquier wrote:
> On Tue, Nov 20, 2018 at 02:00:39PM +0900, Amit Langote wrote:
>> It seems to me that EquivalenceClass, the struct/type name, has been
>> misspelled as 'EquivalenceClasses' a couple of times in the comment above
>> its definition.
> 
> EquivalenceClasses stands for the plural of EquivalenceClass.  So
> thinking like that...
> 
>> - * EquivalenceClasses
>> + * EquivalenceClass
> 
> ...  This is fine.
> 
>> - * We also use EquivalenceClasses as the base structure for PathKeys, 
>> letting
>> + * We also use EquivalenceClass as the base structure for PathKeys, letting
> 
> ...  But not that.

Hmm, I classified this one as a typo too, because the sentence calls
EquivalenceClasses "the base structure for ...", whereas I think
'EquivalenceClass' is the base structure of PathKey.  That said, I don't
mind using EquivalenceClasses when speaking of *instances* of
EquivalenceClass, of which I see many in the source code:

$ git grep EquivalenceClasses
postgres_fdw.c: *   Determine which EquivalenceClasses might be
postgres_fdw.c: /* Get the list of interesting EquivalenceClasses. */
copyfuncs.c:   /* EquivalenceClasses are never moved, so just shallow-copy
copyfuncs.c:   /* EquivalenceClasses are never copied, so shallow-copy the
copyfuncs.c:   /* EquivalenceClasses are never copied, so shallow-copy the
optimizer/README:EquivalenceClasses
optimizer/README:merging two existing EquivalenceClasses.  At the end of


But maybe I'm being overly nit-picky. :)

Thanks,
Amit




Re: typo fix

2018-11-19 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Nov 20, 2018 at 02:00:39PM +0900, Amit Langote wrote:
>> - * We also use EquivalenceClasses as the base structure for PathKeys, 
>> letting
>> + * We also use EquivalenceClass as the base structure for PathKeys, letting

> ...  But not that.

The reason that's not good is that it creates a singular-plural mismatch.
If you'd also changed "PathKeys" to "PathKey", it would still read OK,
though I don't think it's an improvement particularly.

(Hm ... though arguably, "structure" should be "structures" if we're
going to let it stand as plural.)

regards, tom lane



Re: typo fix

2018-11-19 Thread Michael Paquier
On Tue, Nov 20, 2018 at 01:58:22AM -0500, Tom Lane wrote:
> The reason that's not good is that it creates a singular-plural mismatch.
> If you'd also changed "PathKeys" to "PathKey", it would still read OK,
> though I don't think it's an improvement particularly.
> 
> (Hm ... though arguably, "structure" should be "structures" if we're
> going to let it stand as plural.)

Indeed, missed that.  This first sentence mentions "orderings" for those
PathKeys, which refers to multiple PathKeys, so actually Amit's patch
seems to be fine, no?
--
Michael




Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2018-11-19 Thread Haozhou Wang
Hi all,

We have prepared a patch that includes the hook points needed for the disk
quota extension.  There are two hooks.
One is SmgrStat_hook.  It is used to perform ad-hoc logic in the storage layer
when doing smgr create/extend/truncate in general.  For the disk quota
extension, this hook is used to detect active tables (newly created tables,
tables extending new blocks, or tables being truncated).
The other is BufferExtendCheckPerms_hook.  It is used to perform ad-hoc logic
when the buffer manager extends a relation with a new block.  Since
ReadBufferExtended is a hot function, we call this hook only when
blockNum == P_NEW.  For the disk quota extension, this hook is used to do
query enforcement while the query is loading data.
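
As a point of reference, an extension attaches to such hooks with the usual
save-and-chain pattern used for existing PostgreSQL hooks.  The sketch below
uses the existing ExecutorEnd_hook only because its signature is known here;
the proposed SmgrStat_hook / BufferExtendCheckPerms_hook would be wired up the
same way, with whatever signatures the patch defines:

#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;

static void
quota_ExecutorEnd(QueryDesc *queryDesc)
{
	/* extension-specific bookkeeping would go here */

	/* always chain to the previous hook, or fall back to the standard routine */
	if (prev_ExecutorEnd)
		prev_ExecutorEnd(queryDesc);
	else
		standard_ExecutorEnd(queryDesc);
}

void
_PG_init(void)
{
	prev_ExecutorEnd = ExecutorEnd_hook;
	ExecutorEnd_hook = quota_ExecutorEnd;
}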

Any comments are appreciated.

Regards,
Haozhou

On Wed, Nov 14, 2018 at 6:07 PM Hubert Zhang  wrote:

> Thanks, Tomas,
>
> Yes, we want to add the hooks into postgres repo.
> But before that, we plan to firstly get some feedbacks from community
> about the diskquota extension implementation and usage?
> Later, we'll modify our license and submit the hooks into CF.
>
> Thanks
> Hubert
>
> On Wed, Nov 14, 2018 at 3:54 AM Tomas Vondra 
> wrote:
>
>> On Tue, 2018-11-13 at 16:47 +0800, Hubert Zhang wrote:
>> > Hi all,
>> >
>> > We implement disk quota feature on Postgresql as an extension(link:
>> > https://github.com/greenplum-db/diskquota),
>> > If you are interested, try and use it to limit the amount of disk
>> > space that
>> > a schema or a role can use in Postgresql.
>> > Any feedback or question are appreciated.
>> >
>>
>> It's not clear to me what's the goal of this thread? I understand what
>> quotas are about, but are you sharing it because (a) it's a useful
>> extension, (b) you propose adding a couple of new hooks (and keep the
>> extension separate) or (c) you propose adding both the hooks and the
>> extension (into contrib)?
>>
>> I assume it's either (b) and (c), in which case you should add it to
>> 2019-01 CF: https://commitfest.postgresql.org/21/
>>
>> In either case, it might be useful to add a LICENSE to the github
>> repository, it's not clear what's the situation in this respect. That
>> probably matters especially for (b), because for (c) it'd end up with
>> PostgreSQL license automatically.
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>>
>
> --
> Thanks
>
> Hubert Zhang
>


disk_quota_hooks_v1.patch
Description: Binary data


Re: zheap: a new storage format for PostgreSQL

2018-11-19 Thread Komяpa
>
> > In PostGIS workloads, UPDATE table SET geom = ST_CostyFunction(geom,
> magicnumber); is one of biggest time-eaters that happen upon initial load
> and clean up of your data. It is commonly followed by CLUSTER table using
> table_geom_idx; to make sure you're back at full speed and no VACUUM is
> needed, and your table (usually static after that) is more-or-less
> spatially ordered. I see that zheap can remove the need for VACUUM, which
> is a big win already. If you can do something that will allow reorder of
> tuples according to index happen during an UPDATE that rewrites most of
> table, that would be a game changer :)
> >
>
> If the tuples are already in the order of the index, then we would
> retain the order, otherwise, we might not want to anything special for
> ordering w.r.t index.  I think this is important as we are not sure of
> the user's intention and I guess it won't be easy to do such
> rearrangement during Update statement.
>

The user's clustering intention is recorded in the existence of a CLUSTER index
on the table.  That's not used by anything other than the CLUSTER command now, though.

When I was looking into the current heap implementation it seemed that it's
possible to hook in a lookup for a couple of blocks with values adjacent to
the new value, and prefer them to the FSM lookup and the "current page", for a
clustered table.  Due to dead tuples, free space runs out very
soon in the usual heap, so it probably doesn't make sense there - you're
consuming space with the old tuple in the old page and the new one in the new page.

If I understand correctly, in zheap an update would not result in a dead
tuple in the old page, so space is not going to run out immediately, and this
may unblock the path for such further developments.  That is, if there is a
spot in the code where such or similar logic could be plugged in :)

I've described the business case in [1].

1:
https://www.postgresql.org/message-id/flat/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


Re: typo fix

2018-11-19 Thread Amit Langote
On 2018/11/20 15:58, Tom Lane wrote:
> Michael Paquier  writes:
>> On Tue, Nov 20, 2018 at 02:00:39PM +0900, Amit Langote wrote:
>>> - * We also use EquivalenceClasses as the base structure for PathKeys, 
>>> letting
>>> + * We also use EquivalenceClass as the base structure for PathKeys, letting
> 
>> ...  But not that.
> 
> The reason that's not good is that it creates a singular-plural mismatch.

Hmm, yeah.

> If you'd also changed "PathKeys" to "PathKey", it would still read OK,
> though I don't think it's an improvement particularly.

So,

- * We also use EquivalenceClasses as the base structure for PathKeys,
+ * We also use EquivalenceClass as the base structure for PathKey,

> (Hm ... though arguably, "structure" should be "structures" if we're
> going to let it stand as plural.)

vs.

- * We also use EquivalenceClasses as the base structure for PathKeys,
+ * We also use EquivalenceClasses as the base structures for PathKeys,

If I'm understanding this right, aren't different orderings represented by
different PathKey nodes considered equivalent if they share the base
EquivalenceClass?  If that's the case, I think the former reads better.

Thanks,
Amit
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 6fd24203dd..ab71ba9674 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -863,7 +863,7 @@ typedef struct StatisticExtInfo
 } StatisticExtInfo;
 
 /*
- * EquivalenceClasses
+ * EquivalenceClass
  *
  * Whenever we can determine that a mergejoinable equality clause A = B is
  * not delayed by any outer join, we create an EquivalenceClass containing
@@ -880,7 +880,7 @@ typedef struct StatisticExtInfo
  * that all or none of the input datatypes are collatable, so that a single
  * collation value is sufficient.)
  *
- * We also use EquivalenceClasses as the base structure for PathKeys, letting
+ * We also use EquivalenceClass as the base structure for PathKey, letting
  * us represent knowledge about different sort orderings being equivalent.
  * Since every PathKey must reference an EquivalenceClass, we will end up
  * with single-member EquivalenceClasses whenever a sort key expression has


Re: Regarding performance regression on specific query

2018-11-19 Thread Amit Langote
Hi,

On 2018/11/20 2:49, Jung, Jinho wrote:
> Execution time
> =
> 1.sql
> 10.6  : 469 ms
> 9.4.20: 10 ms
> 
> 4.sql
> 10.6  : 34019 ms
> 9.4.20: 0.4 ms

I noticed that these two are fixed by running ANALYZE in the database in
which these queries are run.

> 20.sql
> 10.6  : 2791 ms
> 9.4.20: 61 ms

This one may be suffering from a more serious planning issue, as doing
ANALYZE didn't help for this one.  Will have to look closely at how the
plan is changing for the worse.

Thanks,
Amit




Re: ToDo: show size of partitioned table

2018-11-19 Thread Michael Paquier
On Mon, Nov 05, 2018 at 11:43:16AM +0100, Pavel Stehule wrote:
> should be fixed now.

Here are some notes on the last version.

+   "  FROM pg_inherits i\n"
Missing schema qualification.

+   case 'P':
+       if (cmd[2] == 'i')
+           success = listPartitions(pattern, show_verbose, true, false);
+       else if (cmd[2] == 't')
+           success = listPartitions(pattern, show_verbose, false, true);
+       else if (cmd[2] == '+' || cmd[2] == '\0')
+           success = listPartitions(pattern, show_verbose, false, false);
+       else
+           status = PSQL_CMD_UNKNOWN;
+   break;
The style is heavy.  Perhaps it would be cleaner to have a
switch/case..  Not a big deal visibly.  show_indexes is true only if the
subcommand is 'i'.  show_tables is true only if the subcommand is 't'.

Using "\dP" with a pattern matching a partitioned index should show a
partitioned index, no?  As far as I know, a partitioned relation can be
either an index or a table.

Testing the feature, \dP shows all partitioned relations, still does not
show the relationship when multiple levels are used.  Could it make
sense to also show the direct parent of a partitioned table when
verbose mode is used?

Could it be possible to have tests for \dP, \dPi and \dPt with matching
patterns?  You could just place that in one of the existing tests where
there are partitioned tables and indexes.
--
Michael

