Re: Resetting spilled txn statistics in pg_stat_replication
On Sat, 4 Jul 2020 at 22:13, Ajin Cherian wrote:
>
> On Thu, Jul 2, 2020 at 1:31 PM Masahiko Sawada wrote:
>>
>> Thanks! Yes, I'm working on this patch while considering to send the
>> stats to stats collector.
>>
>> I've attached PoC patch that implements a simple approach. I'd like to
>> discuss how we collect the replication slot statistics in the stats
>> collector before I bring the patch to completion.
>>
>
> I understand the patch is only in the initial stage but I just tried testing it.

Thank you for testing the patch!

> Using the patch, I enabled logical replication and created two pub/subs
> (sub1, sub2) for two separate tables (t1, t2). I inserted data into the second
> table (t2) such that it spills into disk.
> Then when I checked the stats using the new function
> pg_stat_get_replication_slots(), I see that the same stats are updated for
> both the slots, when ideally it should have reflected in the second slot
> alone.
>
> postgres=# SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes
>              FROM pg_stat_get_replication_slots() s(name, spill_txns, spill_count, spill_bytes);
>  name | spill_txns | spill_count | spill_bytes
> ------+------------+-------------+-------------
>  sub1 |          1 |          20 |      132000
>  sub2 |          1 |          20 |      132000
> (2 rows)

I think this is because the logical decoding behind those two logical
replications decodes all WAL records *before* filtering the specified tables.
In logical replication, we decode the whole WAL stream and then pass it to a
logical decoding output plugin such as pgoutput, and only then filter tables
according to the publication. Therefore, even if subscription sub1 is not
interested in changes related to table t2, the replication slot sub1 still
needs to decode the whole WAL stream, resulting in spilling to disk.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
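A minimal script that reproduces the behaviour described above, assuming the
PoC patch's pg_stat_get_replication_slots() is installed (object names follow
the test in the mail; everything else is illustrative):

-- On the publisher: two tables, one publication per table.
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION pub1 FOR TABLE t1;
CREATE PUBLICATION pub2 FOR TABLE t2;

-- On the subscriber: one subscription (and hence one slot) per publication.
CREATE SUBSCRIPTION sub1 CONNECTION 'host=pub dbname=postgres' PUBLICATION pub1;
CREATE SUBSCRIPTION sub2 CONNECTION 'host=pub dbname=postgres' PUBLICATION pub2;

-- On the publisher: a transaction on t2 large enough to exceed
-- logical_decoding_work_mem (lower that setting to make spilling easy to hit).
INSERT INTO t2 SELECT g FROM generate_series(1, 1000000) g;

-- Both slots report spill activity, because each slot decodes the whole WAL
-- stream before the publication filter is applied.
SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes
  FROM pg_stat_get_replication_slots()
       AS s(name, spill_txns, spill_count, spill_bytes);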
Re: Cache lookup errors with functions manipulation object addresses
On Fri, Jul 03, 2020 at 11:04:17AM -0400, Alvaro Herrera wrote: > 0001 and 0002 look good to me. Thanks for the review. > I think 0003 could be a little more careful about indentation; some long > lines are going to result after pgindent that might be better to handle > in a different way before commit, e.g., here > >> +{ >> +char *proname = format_procedure_extended(object->objectId, >> +FORMAT_PROC_FORCE_QUALIFY | FORMAT_PROC_INVALID_AS_NULL); Yes, I was looking at that for a couple of hours and pgindent made that a bit weird. So I have changed the code to just use a separate variable. That makes things a bit cleaner. While refreshing my mind with this code, I got surprised with the choice of "routine" in getProcedureTypeDescription() when we need a default object type name for an object not found, so I have switched that to "procedure" to be more consistent. I have also spent some time analyzing the coverage of the patch, and did not find any obvious holes or any remaining missing_ok paths not covered. Some comments were also a bit weird after re-reading them, so I tweaked a couple of places. Attached is for now a rebased patch. If there are any comments, please feel free. Daniel, Alvaro, does that look fine for you? I am letting this stuff aside for a couple of days for the moment. -- Michael From 3ca2841d38880eb0f40427e9aa447b5ae3a2e66c Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 6 Jul 2020 16:44:31 +0900 Subject: [PATCH v20] Eliminate user-visible cache lookup errors for objaddr SQL functions When using the following functions, users could see various types of errors like "cache lookup failed for OID XXX": * pg_describe_object * pg_identify_object_as_address * pg_identify_object All the lower set of APIs managing object addresses for all types of the system are made smarter by gaining a missing_ok argument that allows any caller to control if he is willing to have an actual error or if he wants to control error at its level by having empty objects if they are undefined. Regression tests added in this commit failed with such errors before being patched. 
--- src/include/catalog/objectaddress.h | 12 +- src/include/utils/regproc.h |5 +- src/backend/catalog/dependency.c | 30 +- src/backend/catalog/objectaddress.c | 1003 +- src/backend/catalog/pg_depend.c |4 +- src/backend/catalog/pg_shdepend.c|8 +- src/backend/commands/event_trigger.c |9 +- src/backend/commands/extension.c |6 +- src/backend/commands/tablecmds.c | 12 +- src/backend/utils/adt/regproc.c | 20 +- src/test/regress/expected/object_address.out | 98 ++ src/test/regress/sql/object_address.sql | 56 + doc/src/sgml/func.sgml |7 +- contrib/sepgsql/database.c |6 +- contrib/sepgsql/dml.c|4 +- contrib/sepgsql/label.c |4 +- contrib/sepgsql/proc.c | 14 +- contrib/sepgsql/relation.c | 20 +- contrib/sepgsql/schema.c |6 +- 19 files changed, 996 insertions(+), 328 deletions(-) diff --git a/src/include/catalog/objectaddress.h b/src/include/catalog/objectaddress.h index 144715d4f4..b876617c9d 100644 --- a/src/include/catalog/objectaddress.h +++ b/src/include/catalog/objectaddress.h @@ -70,14 +70,18 @@ extern bool get_object_namensp_unique(Oid class_id); extern HeapTuple get_catalog_object_by_oid(Relation catalog, AttrNumber oidcol, Oid objectId); -extern char *getObjectDescription(const ObjectAddress *object); +extern char *getObjectDescription(const ObjectAddress *object, + bool missing_ok); extern char *getObjectDescriptionOids(Oid classid, Oid objid); extern int read_objtype_from_string(const char *objtype); -extern char *getObjectTypeDescription(const ObjectAddress *object); -extern char *getObjectIdentity(const ObjectAddress *address); +extern char *getObjectTypeDescription(const ObjectAddress *object, + bool missing_ok); +extern char *getObjectIdentity(const ObjectAddress *address, + bool missing_ok); extern char *getObjectIdentityParts(const ObjectAddress *address, - List **objname, List **objargs); + List **objname, List **objargs, + bool missing_ok); extern struct ArrayType *strlist_to_textarray(List *list); extern ObjectType get_relkind_objtype(char relkind); diff --git a/src/include/utils/regproc.h b/src/include/utils/regproc.h index 145452a5ad..330417e888 100644 --- a/src/include/utils/regproc.h +++ b/src/include/utils/regproc.h @@ -29,10 +29,11 @@ extern List *stringToQualifiedNameList(const char *string); extern char *format_procedure(Oid procedure_oid); extern char *format_procedure_qualified(Oid procedure_oid); extern void format_procedure_parts(Oid operator_oi
Re: Include access method in listTables output
‐‐‐ Original Message ‐‐‐ On Monday, July 6, 2020 3:12 AM, Michael Paquier wrote: > On Sun, Jul 05, 2020 at 07:13:10AM +0530, vignesh C wrote: > > > I'm not sure if we should include showViews, I had seen that the > > access method was not getting selected for view. > > +1. These have no physical storage, so you are looking here for > relkinds that satisfy RELKIND_HAS_STORAGE(). Thank you for the review. Find attached v4 of the patch. > > --- > > Michael 0001-Include-AM-in-listTables-output-v4.patch Description: Binary data
Re: A patch for get origin from commit_ts.
On Mon, Jul 06, 2020 at 11:12:30AM +0800, movead...@highgo.ca wrote: > Thanks for the points and follow them, new patch attached. That was fast, thanks. I have not tested the patch, but there are two things I missed a couple of hours back. Why do you need pg_last_committed_xact_with_origin() to begin with? Wouldn't it be more simple to just add a new column to pg_last_committed_xact() for the replication origin? Contrary to pg_xact_commit_timestamp() that should not be broken for compatibility reasons because it returns only one value, we don't have this problem with pg_last_committed_xact() as it already returns one tuple with two values. +{ oid => '4179', descr => 'get commit origin of a transaction', A second thing is that the OID of the new function should be in the range 8000.., as per the policy introduced in commit a6417078. src/include/catalog/unused_oids can be used to pick up a value. -- Michael signature.asc Description: PGP signature
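For illustration, the suggestion above would make the existing function simply
grow one more output column instead of adding a second function (the origin
column name below is an assumption, not taken from the posted patch):

-- Today pg_last_committed_xact() returns (xid, timestamp); with the
-- suggested change it could also expose the replication origin:
SELECT xid, timestamp, roident
  FROM pg_last_committed_xact();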
Re: proposal: schema variables
ne 5. 7. 2020 v 15:33 odesílatel Pavel Stehule napsal: > > > čt 21. 5. 2020 v 14:49 odesílatel Pavel Stehule > napsal: > >> >> >> čt 21. 5. 2020 v 13:34 odesílatel Amit Kapila >> napsal: >> >>> On Thu, May 21, 2020 at 3:41 PM Pavel Stehule >>> wrote: >>> > >>> > Hi >>> > >>> > just rebase without any other changes >>> > >>> >>> You seem to forget attaching the rebased patch. >>> >> >> I am sorry >> >> attached. >> > > fresh rebase > fix test build Regards Pavel > Regards > > Pavel > > >> Pavel >> >> >>> -- >>> With Regards, >>> Amit Kapila. >>> EnterpriseDB: http://www.enterprisedb.com >>> >> schema-variables-20200706.patch.gz Description: application/gzip
Performing partition pruning using row value
Hello,

I would like to ask about the conditions under which partition pruning is
performed. In PostgreSQL 12, when I executed the following SQL, partition
pruning was not performed.

postgres=# explain select * from a where (c1, c2) < (99, 99);
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..60.00 rows=800 width=40)
   ->  Seq Scan on a1 a_1  (cost=0.00..28.00 rows=400 width=40)
         Filter: (ROW(c1, c2) < ROW(99, 99))
   ->  Seq Scan on a2 a_2  (cost=0.00..28.00 rows=400 width=40)
         Filter: (ROW(c1, c2) < ROW(99, 99))
(5 rows)

However, pruning is performed when I change the SQL as follows.

postgres=# explain select * from a where c1 < 99 and c2 < 99;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on a1 a  (cost=0.00..28.00 rows=133 width=40)
   Filter: ((c1 < 99) AND (c2 < 99))
(2 rows)

These tables are defined as follows.

create table a (c1 int, c2 int, c3 varchar) partition by range (c1, c2);
create table a1 partition of a for values from (0, 0) to (100, 100);
create table a2 partition of a for values from (100, 100) to (200, 200);

Looking at the code, "(c1, c2) < (99, 99)" is recognized as a RowCompareExpr
and "c1 < 99 and c2 < 99" is recognized as a combination of OpExprs.
Currently, pruning is not performed for a RowCompareExpr; is this correct?

Also, at the end of match_clause_to_partition_key() there is the following
comment: "Since the qual didn't match up to any of the other qual types
supported here, then trying to match it against any other partition key is a
waste of time, so just return PARTCLAUSE_UNSUPPORTED." Is it because parsing
every kind of Expr node would take too long that
match_clause_to_partition_key() returns PARTCLAUSE_UNSUPPORTED when such an
Expr node is passed? If the number of arguments in the RowCompareExpr is
small, I would think that expanding it could improve performance.

regards,
sho kato
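Until the planner learns to prune on a RowCompareExpr, the row comparison can
be expanded by hand into an equivalent combination of plain operator clauses
that the pruning code already handles; a sketch against the tables defined
above:

-- By row-comparison semantics, (c1, c2) < (99, 99) is equivalent to:
explain select * from a
 where c1 < 99
    or (c1 = 99 and c2 < 99);
-- Each OR arm references the partition key with ordinary OpExprs, so the
-- pruning code should be able to restrict the scan to partition a1 only.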
Re: Asymmetric partition-wise JOIN
On 12/27/19 12:34 PM, Kohei KaiGai wrote:
> The attached v2 fixed the problem, and regression test finished correctly.

Using your patch I saw an incorrect value of predicted rows at the top node
of the plan:

  "Append  (cost=270.02..35165.37 rows=40004 width=16)"

For the full explain of the query plan, see the attachment
explain_with_asymmetric.sql. If I disable enable_partitionwise_join then:

  "Hash Join  (cost=270.02..38855.25 rows=10001 width=16)"

Full explain - explain_no_asymmetric.sql.

I thought this was a case of incorrect usage of cached values of norm_selec,
but it is a corner-case problem of the eqjoinsel() routine:

  selectivity = 1 / size_of_larger_relation;            (selfuncs.c:2567)
  tuples = selectivity * outer_tuples * inner_tuples;   (costsize.c:4607)

i.e. the number of tuples depends only on the size of the smaller relation.
It is not a bug of your patch, but I think you need to know about it because
it may affect the planner's decisions.

===
P.S. Test case:

CREATE TABLE t0 (a serial, b int);
INSERT INTO t0 (b) (SELECT * FROM generate_series(1e4, 2e4) as g);
CREATE TABLE parts (a serial, b int) PARTITION BY HASH(a);
INSERT INTO parts (b) (SELECT * FROM generate_series(1, 1e6) as g);

--
regards,
Andrey Lepikhov
Postgres Professional

explain_with_asymmetric.sql Description: application/sql
explain_no_asymmetric.sql Description: application/sql
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jul 6, 2020 at 11:44 AM Dilip Kumar wrote: > > On Mon, Jul 6, 2020 at 11:31 AM Amit Kapila wrote: > > > > > > > 10. I have got the below failure once. I have not investigated this > > > > in detail as the patch is still under progress. See, if you have any > > > > idea? > > > > # Failed test 'check extra columns contain local defaults' > > > > # at t/013_stream_subxact_ddl_abort.pl line 81. > > > > # got: '2|0' > > > > # expected: '1000|500' > > > > # Looks like you failed 1 test of 2. > > > > make[2]: *** [check] Error 1 > > > > make[1]: *** [check-subscription-recurse] Error 2 > > > > make[1]: *** Waiting for unfinished jobs > > > > make: *** [check-world-src/test-recurse] Error 2 > > > > > > Even I got the failure once and after that, it did not reproduce. I > > > have executed it multiple time but it did not reproduce again. Are > > > you able to reproduce it consistently? > > > > > > > No, I am also not able to reproduce it consistently but I think this > > can fail if a subscriber sends the replay_location before actually > > replaying the changes. First, I thought that extra send_feedback we > > have in apply_handle_stream_commit might have caused this but I guess > > that can't happen because we need the commit time location for that > > and we are storing the same at the end of apply_handle_stream_commit > > after applying all messages. I am not sure what is going on here. I > > think we somehow need to reproduce this or some variant of this test > > consistently to find the root cause. > > And I think it appeared first time for me, so maybe either induced > from past few versions so some changes in the last few versions might > have exposed it. I have noticed that almost 50% of the time I am able > to reproduce after the clean build so I can trace back from which > version it started appearing that way it will be easy to narrow down. > One more comment ReorderBufferLargestTopTXN { .. dlist_foreach(iter, &rb->toplevel_by_lsn) { ReorderBufferTXN *txn; + Size size = 0; + Size largest_size = 0; txn = dlist_container(ReorderBufferTXN, node, iter.cur); - /* if the current transaction is larger, remember it */ - if ((!largest) || (txn->size > largest->size)) + /* + * If this transaction have some incomplete changes then only consider + * the size upto last complete lsn. + */ + if (rbtxn_has_incomplete_tuple(txn)) + size = txn->complete_size; + else + size = txn->total_size; + + /* If the current transaction is larger then remember it. */ + if ((largest != NULL || size > largest_size) && size > 0) Here largest_size is a local variable inside the loop which is initialized to 0 in each iteration and that will lead to picking each next txn as largest. This seems wrong to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
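To make the concern concrete, here is a sketch of how the quoted loop might
look with the issue addressed: largest_size hoisted out of the loop so it
survives across iterations, and the condition written so that "nothing picked
yet" is what triggers picking the current transaction. This follows the
quoted hunk of the in-progress patch, not any committed code:

static ReorderBufferTXN *
ReorderBufferLargestTopTXN(ReorderBuffer *rb)
{
    dlist_iter  iter;
    ReorderBufferTXN *largest = NULL;
    Size        largest_size = 0;   /* moved out of the loop */

    dlist_foreach(iter, &rb->toplevel_by_lsn)
    {
        ReorderBufferTXN *txn = dlist_container(ReorderBufferTXN, node, iter.cur);
        Size        size;

        /* Consider only the completed part of partially streamed txns. */
        if (rbtxn_has_incomplete_tuple(txn))
            size = txn->complete_size;
        else
            size = txn->total_size;

        /* Remember the largest transaction seen so far. */
        if (size > 0 && (largest == NULL || size > largest_size))
        {
            largest = txn;
            largest_size = size;
        }
    }

    return largest;
}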
Re: Default setting for enable_hashagg_disk (hash_mem)
On Sun, Jul 5, 2020 at 2:24 AM Jeff Davis wrote: > > On Sat, 2020-07-04 at 14:49 +0530, Amit Kapila wrote: > > > We don't even have a user report yet of a > > > regression compared to PG 12, or one that can't be fixed by > > > increasing > > > work_mem. > > > > > > > Yeah, this is exactly the same point I have raised above. I feel we > > should wait before designing any solution to match pre-13 behavior > > for > > hashaggs to see what percentage of users face problems related to > > this > > and how much is a problem for them to increase work_mem to avoid > > regression. > > I agree that it's good to wait for actual problems. But the challenge > is that we can't backport an added GUC. > Is it because we won't be able to edit existing postgresql.conf file or for some other reasons? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Autovacuum on partitioned table (autoanalyze)
> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson wrote: > > > On 21 Apr 2020, at 18:21, yuzuko wrote: > > > I'll update the patch soon. > > Do you have an updated version to submit? The previous patch no longer > applies > to HEAD, so I'm marking this entry Waiting on Author in the meantime. > Thank you for letting me know. I attach the latest patch applies to HEAD. I think there are other approaches like Tom's idea that Justin previously referenced, but this patch works the same way as previous patches. (tracks updated/inserted/deleted tuples and checks whether the partitioned tables needs auto-analyze, same as nonpartitioned tables) Because I wanted to be able to analyze partitioned tables by autovacuum as a first step, and I think this approach is the simplest way to do it. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center v8_autovacuum_on_partitioned_table.patch Description: Binary data
Proposal: Automatic partition creation
The previous discussion of automatic partition creation [1] has addressed static and dynamic creation of partitions and ended up with several syntax proposals. In this thread, I want to continue this work. Attached is PoC for static partition creation. The patch core is quite straightforward. It adds one more transform clause to convert given partitioning specification into several CREATE TABLE statements. The patch implements following syntax: CREATE TABLE ... PARTITION BY partition_method (list_of_columns) partition_auto_create_clause where partition_auto_create_clause is CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec and partition_bound_spec is: MODULUS integer | VALUES IN (expr [,...]) [, ] | INTERVAL range_step FROM range_start TO range_end For more examples check auto_partitions.sql in the patch. TODO: - CONFIGURATION is just an existing keyword, that I picked as a stub. Ideas on better wording are welcome. - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet I wonder, is it worth placing a stub for dynamic partitioning, or we can rather add these keywords later. - HASH and LIST static partitioning works as expected. Testing and feedback are welcome. - RANGE partitioning is not really implemented in this patch. Now it only accepts interval data type as 'interval' and respectively date types as range_start and range_end expressions. Only one partition is created. I found it difficult to implement the generation of bounds using internal functions and data types. Both existing solutions (pg_pathman and pg_partman) rely on SQL level routines [2]. I am going to implement this via SPI, which allow to simplify checks and calculations. Do you see any pitfalls in this approach? - Partition naming. Now partition names for all methods look like $tablename_$partnum Do we want more intelligence here? Now we have RunObjectPostCreateHook(), which allows to rename the table. To make it more user-friendly, we can later implement pl/pgsql function that sets the callback, as it is done in pg_pathman set_init_callback() [3]. - Current design doesn't allow to create default partition automatically. Do we need this functionality? - Do you see any restrictions for future extensibility (dynamic partitioning, init_callback, etc.) in the proposed design ? I expect this to be a long discussion, so here is the wiki page [4] to fix important questions and final agreements. [1] https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre [2] https://github.com/postgrespro/pg_pathman/blob/dbcbd02e411e6acea6d97f572234746007979538/range.sql#L99 [3] https://github.com/postgrespro/pg_pathman#additional-parameters [4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From 5808c5e843cac1e1383366a5cbff116eaa433f90 Mon Sep 17 00:00:00 2001 From: anastasia Date: Fri, 3 Jul 2020 03:34:24 +0300 Subject: [PATCH] WIP create partitions automatically Implement new syntax to generate bounds for HASH, LIST and RANGE partitions. Implement automatic partition creation for HASH and LIST. 
Check new regression test 'auto_partitions.sql' for syntax examples --- src/backend/commands/tablecmds.c | 7 + src/backend/nodes/copyfuncs.c | 33 src/backend/nodes/equalfuncs.c| 29 +++ src/backend/nodes/outfuncs.c | 28 +++ src/backend/nodes/readfuncs.c | 33 src/backend/parser/gram.y | 160 ++--- src/backend/parser/parse_utilcmd.c| 166 ++ src/include/nodes/nodes.h | 2 + src/include/nodes/parsenodes.h| 43 + src/include/parser/kwlist.h | 1 + src/include/partitioning/partdefs.h | 4 + src/test/regress/expected/auto_partitions.out | 0 src/test/regress/parallel_schedule| 2 + src/test/regress/serial_schedule | 1 + src/test/regress/sql/auto_partitions.sql | 43 + 15 files changed, 523 insertions(+), 29 deletions(-) create mode 100644 src/test/regress/expected/auto_partitions.out create mode 100644 src/test/regress/sql/auto_partitions.sql diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index f79044f39f..7b2c651952 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -628,6 +628,13 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, else partitioned = false; + if (!partitioned && stmt->partautocreate) + { + ereport(ERROR, +(errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("PARTITION bounds can only be used on partitioned tables"))); + } + /* * Look up the namespace in which we are supposed to create the relation, * check we have permission to create there
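To make the proposed grammar concrete, here are a few statements it is meant
to accept (illustrative only; the authoritative examples are in
auto_partitions.sql from the patch, and keywords such as CONFIGURATION are
still placeholders):

-- HASH: three partitions created immediately.
CREATE TABLE orders (id int, ts timestamptz)
    PARTITION BY HASH (id)
    CONFIGURATION IMMEDIATE USING MODULUS 3;

-- LIST: one partition per listed group of values.
CREATE TABLE points (region text)
    PARTITION BY LIST (region)
    CONFIGURATION IMMEDIATE USING VALUES IN ('north', 'south'), ('east', 'west');

-- RANGE: monthly partitions between two bounds (only partially implemented
-- in the posted PoC).
CREATE TABLE events (created date)
    PARTITION BY RANGE (created)
    CONFIGURATION IMMEDIATE
    USING INTERVAL '1 month' FROM '2020-01-01' TO '2021-01-01';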
Re: Resetting spilled txn statistics in pg_stat_replication
On Thu, Jul 2, 2020 at 9:01 AM Masahiko Sawada wrote: > > I've attached PoC patch that implements a simple approach. I'd like to > discuss how we collect the replication slot statistics in the stats > collector before I bring the patch to completion. > > In this PoC patch, we have the array of PgStat_ReplSlotStats struct > which has max_replication_slots entries. The backend and wal sender > send the slot statistics to the stats collector when decoding a commit > WAL record. > > typedef struct PgStat_ReplSlotStats > { > charslotname[NAMEDATALEN]; > PgStat_Counter spill_txns; > PgStat_Counter spill_count; > PgStat_Counter spill_bytes; > } PgStat_ReplSlotStats; > > What I'd like to discuss are: > > Since the unique identifier of replication slots is the name, the > process sends slot statistics along with slot name to stats collector. > I'm concerned about the amount of data sent to the stats collector and > the cost of searching the statistics within the statistics array (it’s > O(N) where N is max_replication_slots). Since the maximum length of > slot name is NAMEDATALEN (64 bytes default) and max_replication_slots > is unlikely a large number, I might be too worrying but it seems like > it’s better to avoid that if we can do that easily. An idea I came up > with is to use the index of slots (i.g., the index of > ReplicationSlotCtl->replication_slots[]) as the index of statistics of > slot in the stats collector. But since the index of slots could change > after the restart we need to synchronize the index of slots on both > array somehow. So I thought that we can determine the index of the > statistics of slots at ReplicationSlotAcquire() or > ReplicationSlotCreate(), but it will in turn need to read stats file > while holding ReplicationSlotControlLock to prevent the same index of > the statistics being used by the concurrent process who creating a > slot. I might be missing something though. > I don't think we should be bothered about the large values of max_replication_slots. The default value is 10 and I am not sure if users will be able to select values large enough that we should bother about searching them by name. I think if it could turn out to be a problem then we can try to invent something to mitigate it. > Second, as long as the unique identifier is the slot name there is no > convenient way to distinguish between the same name old and new > replication slots, so the backend process or wal sender process sends > a message to the stats collector to drop the replication slot at > ReplicationSlotDropPtr(). This strategy differs from what we do for > table, index, and function statistics. It might not be a problem but > I’m thinking a better way. > Can we rely on message ordering in the transmission mechanism (UDP) for stats? The wiki suggests [1] we can't. If so, then this might not work. > The new view name is also an open question. I prefer > pg_stat_replication_slots and to add stats of physical replication > slots to the same view in the future, rather than a separate view. > This sounds okay to me. > Aside from the above, this patch will change the most of the changes > introduced by commit 9290ad198b1 and introduce new code much. I’m > concerned whether such changes are acceptable at the time of beta 2. > I think it depends on the final patch. My initial thought was that we should do this for PG14 but if you are suggesting removing the changes done by commit 9290ad198b1 then we need to think over it. I could think of below options: a. 
Revert 9290ad198b1 and introduce stats for spilling in PG14. We were anyway having spilling without any work in PG13 but didn’t have stats. b. Try to get your patch in PG13 if we can, otherwise, revert the feature 9290ad198b1. c. Get whatever we have in commit 9290ad198b1 for PG13 and additionally have what we are discussing here for PG14. This means that spilled stats at slot level will be available in PG14 via pg_stat_replication_slots and for individual WAL senders it will be available via pg_stat_replication both in PG13 and PG14. Even if we can get your patch in PG13, we can still keep those in pg_stat_replication. d. Get whatever we have in commit 9290ad198b1 for PG13 and change it for PG14. I don't think this will be a popular approach. [1] - https://en.wikipedia.org/wiki/User_Datagram_Protocol -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
>> >> If I understand it correctly, your suggestion is to add >> keep_connection option and use that while defining the server object. >> IMO having keep_connection option at the server object level may not >> serve the purpose being discussed here. >> For instance, let's say I create a foreign server in session 1 with >> keep_connection on, and I want to use that >> server object in session 2 with keep_connection off and session 3 with >> keep_connection on and so on. > > In my opinion, in such cases, one needs to create two server object one with > keep-connection ON and one with keep-connection off. And need to decide > to use appropriate for the particular session. > Yes, having two variants of foreign servers: one with keep-connections on (this can be default behavior, even if user doesn't mention this option, internally it can be treated as keep-connections on) , and if users need no connection hashing, another foreign server with all other options same but keep-connections off. This looks okay to me, if we want to avoid a core session level GUC. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
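As a sketch of the two-server-objects idea above (keep_connection is the
option being proposed in this thread, not something postgres_fdw currently
accepts):

-- Same remote database, two server objects differing only in caching policy.
CREATE SERVER remote_cached FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'appdb', keep_connection 'on');

CREATE SERVER remote_uncached FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'appdb', keep_connection 'off');

-- A session then defines or imports its foreign tables against whichever
-- server matches its workload.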
Re: jsonpath versus NaN
On Thu, Jun 18, 2020 at 8:04 PM Alexander Korotkov wrote: > On Thu, Jun 18, 2020 at 7:45 PM Tom Lane wrote: > > Alexander Korotkov writes: > > > Thank you for your answer. I'm trying to understand your point. > > > Standard claims that .double() method should behave the same way as > > > CAST to double. However, standard references the standard behavior of > > > CAST here, not behavior of your implementation of CAST. So, if we > > > extend the functionality of standard CAST in our implementation, that > > > doesn't automatically mean we should extend the .double() jsonpath > > > method in the same way. Is it correct? > > > > Right. We could, if we chose, extend jsonpath to allow Inf/NaN, but > > I don't believe there's an argument that the spec requires us to. > > > > Also the larger point is that it doesn't make sense to extend jsonpath > > that way when we haven't extended json(b) that way. This code wart > > wouldn't exist were it not for that inconsistency. Also, I find it hard > > to see why anyone would have a use for NaN in a jsonpath test when they > > can't write NaN in the input json data, nor have it be correctly reflected > > into output json data either. > > Ok, I got the point. I have nothing against removing support of NaN > in jsonpath as far as it doesn't violates the standard. I'm going to > write the patch for this. The patchset is attached, sorry for the delay. The first patch improves error messages, which appears to be unclear for me. If one applies .double() method to a numeric value, we restrict that this numeric value should fit to double precision type. If it doesn't fit, the current error message just says the following. ERROR: jsonpath item method .double() can only be applied to a numeric value But that's confusing, because .double() method is naturally applied to a numeric value. Patch makes this message explicitly report that numeric value is out of range for double type. This patch also adds test exercising this error. When string can't be converted to double precision, I think it's better to explicitly say that we expected valid string representation of double precision type. Second patch forbids to convert NaN using .double() method. As I get, NaN can't be result of any jsonpath computations assuming there is no NaN input. So, I just put an assert to convertJsonbScalar() ensuring there is no NaN in JsonbValue. -- Regards, Alexander Korotkov 0001-Improve-error-reporting-for-jsonpath-.double-method.patch Description: Binary data 0002-Forbid-NaNs-in-jsonpath.patch Description: Binary data
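A minimal illustration of the first problem, assuming a numeric literal that
jsonb's numeric type can store but double precision cannot:

-- jsonb stores numbers as numeric, so 1e400 is a valid numeric value, but it
-- overflows double precision when .double() is applied:
SELECT jsonb_path_query('1e400', '$.double()');
-- current message: ERROR:  jsonpath item method .double() can only be
--                  applied to a numeric value
-- with the patch, the error is meant to say instead that the numeric value
-- is out of range for type double precision.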
Re: Libpq support to connect to standby server as priority
> This patch no longer applies, can you please submit a rebased version? I've > marked the entry as Waiting on Author in the meantime. > Here's a rebased version of the patch. Regards, Greg v16-0001-Enhance-libpq-target_session_attrs-and-add-target_se.patch Description: Binary data
Re: WIP/PoC for parallel backup
> On 12 Jun 2020, at 19:28, Robert Haas wrote: > I am sure that nobody is arguing that the patch has to be beneficial > in all cases in order to justify applying it. However, there are > several good arguments against proceding with this patch: This thread has stalled with no resolution to the raised issues, and the latest version of the patch (v15) posted no longer applies (I only tried 0001 which failed, the green tick in the CFBot is due it mistakenlt thinking an attached report is a patch). I'm marking this patch Returned with Feedback. Please open a new CF entry when there is a new version of the patch. cheers ./daniel
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> > > > If I understand it correctly, your suggestion is to add > > keep_connection option and use that while defining the server object. > > IMO having keep_connection option at the server object level may not > > serve the purpose being discussed here. > > For instance, let's say I create a foreign server in session 1 with > > keep_connection on, and I want to use that > > server object in session 2 with keep_connection off and session 3 with > > keep_connection on and so on. > > One way we can change the server's keep_connection option is to alter > > the server object, but that's not a good choice, > > as we have to alter it at the system level. > > Is there use-case in practice where different backends need to have > different connection cache setting even if all of them connect the > same server? Currently, connection cache exists at each backend/session level and gets destroyed on backend/session exit. I think the same cached connection can be used until it gets invalidated due to user mapping or server definition changes. One way is to have a shared memory based connection cache instead of backend level cache, but it has its own problems, like maintenance, invalidation, dealing with concurrent usages etc. > I thought since the problem that this feature is trying > to resolve is not to eat up the remote server connections capacity by > disabling connection cache, we’d like to disable connection cache to > the particular server, for example, which sets low max_connections. > Currently, the user mapping oid acts as the key for the cache's hash table, so the cache entries are not made directly using foreign server ids though each entry would have some information related to foreign server. Just to reiterate, the main idea if this feature is to give the user a way to choose, whether to use connection caching or not, if he decides that his session uses remote queries very rarely, then he can disable, or if the remote queries are more frequent in a particular session, he can choose to use connection caching. In a way, this feature addresses the point that local sessions not eating up remote connections/sessions by letting users decide(as users know better when to cache or when not to) to cache or not cache the remote connections and thus releasing them immediately if there is not much usage from local session. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Inlining of couple of functions in pl_exec.c improves performance
On Sat, 4 Jul 2020 at 01:19, Tom Lane wrote: > > I did some performance testing on 0001+0002 here, and found that > for me, there's basically no change on x86_64 but a win of 2 to 3 > percent on aarch64, using Pavel's pi_est_1() as a representative > case for simple plpgsql statements. That squares with your original > results I believe. It's not clear to me whether any of the later > tests in this thread measured these changes in isolation, or only > with 0003 added. Yeah I had the same observation. 0001+0002 seems to benefit mostly on aarch64. And 0003 (exec_case_value) benefited both on amd64 and aarch64. > > Anyway, that's good enough for me, so I pushed 0001+0002 after a > little bit of additional cosmetic tweaking. > > I attach your original 0003 here (it still applies, with some line > offsets). That's just so the cfbot doesn't get confused about what > it's supposed to test now. Thanks for pushing all the three ! Thanks, -Amit Khandekar Huawei Technologies
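For context, the benchmark referred to above is a tight loop of simple
plpgsql assignments; the real pi_est_1() is defined in the referenced thread,
so the function below is only a stand-in of the same shape (name and body are
assumptions):

CREATE OR REPLACE FUNCTION pi_est_demo(n int) RETURNS double precision AS
$$
DECLARE
    s   double precision := 0.0;
    sgn double precision := 1.0;
BEGIN
    -- Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    FOR i IN 0 .. n - 1 LOOP
        s := s + sgn / (2 * i + 1);
        sgn := -sgn;
    END LOOP;
    RETURN 4.0 * s;
END;
$$ LANGUAGE plpgsql;

-- e.g. SELECT pi_est_demo(10000000);  -- runtime dominated by simple expressions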
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)
As a general overview, the series of patches in the mail thread matches its description. The addition of stricter, explicit instrumentation improves the design, since the distinction between the use cases requiring a pin and those requiring a lock is made clearer. The added commentary is descriptive and appears grammatically correct, at least to a non-native speaker. Unfortunately, though, the two bug fixes do not seem to apply. There is also a small issue regarding the process rather than the content of the patches: in the CF app there is a latest attachment (v3-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch) which does not appear in the mail thread. Before changing the status, I kindly ask for a complete latest series, posted to the mail thread, that applies. The new status of this patch is: Waiting on Author
Re: WIP/PoC for parallel backup
On Mon, Jul 6, 2020 at 5:24 PM Daniel Gustafsson wrote: > > On 12 Jun 2020, at 19:28, Robert Haas wrote: > > > I am sure that nobody is arguing that the patch has to be beneficial > > in all cases in order to justify applying it. However, there are > > several good arguments against proceding with this patch: > > This thread has stalled with no resolution to the raised issues, and the > latest > version of the patch (v15) posted no longer applies (I only tried 0001 > which > failed, the green tick in the CFBot is due it mistakenlt thinking an > attached > report is a patch). I'm marking this patch Returned with Feedback. Please > open a new CF entry when there is a new version of the patch. > > cheers ./daniel I think this is fair. There are quite a few valid points raised by Robert. -- Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC CELL:+923335449950 EMAIL: mailto:hamid.akh...@highgo.ca SKYPE: engineeredvirus
Re: Yet another fast GiST build (typo)
> 1 июля 2020 г., в 17:05, Daniel Gustafsson написал(а): > >> On 29 Feb 2020, at 18:24, Andrey M. Borodin wrote: > >> I've fixed this and added patch with GiST reloption to provide sort support >> function. > > 0002 in this patchset fails to apply, please submit a rebased version. I've > marked the entry Waiting for Author in the meantime. Thanks, Daniel! PFA v7. Best regards, Andrey Borodin. v7-0001-Add-sort-support-for-point-gist_point_sortsupport.patch Description: Binary data v7-0002-Implement-GiST-build-using-sort-support.patch Description: Binary data
Re: Binary support for pgoutput plugin
On Sun, 5 Jul 2020 at 17:28, Daniel Gustafsson wrote: > > On 5 Jul 2020, at 23:11, Alvaro Herrera > wrote: > > > > On 2020-Jul-05, Daniel Gustafsson wrote: > > > >>> On 2 Jul 2020, at 18:41, Dave Cramer wrote: > >>> > >>> rebased > >> > >> Thanks! The new version of 0001 patch has a compiler warning due to > mixed > >> declarations and code: > >> > >> worker.c: In function ‘slot_store_data’: > >> worker.c:366:5: error: ISO C90 forbids mixed declarations and code > [-Werror=declaration-after-statement] > > > > AFAICS this is fixed in 0005. > > Yes and no, 0005 fixes one such instance but the one failing the build is > another one in worker.c (the below being from 0008 which in turn change > the row > in question from previous patches): > > + int cursor = tupleData->values[remoteattnum]->cursor; > > > I'm going to suggest to use "git rebase -i" > > +1 > Strangely I don't see those errors when I build on my machine, but I will fix them as far as rebase -i do what is advised here for squashing them. Just one patch now ? Thanks,
Re: Binary support for pgoutput plugin
> On 6 Jul 2020, at 14:58, Dave Cramer wrote: > as far as rebase -i do what is advised here for squashing them. Just one > patch now ? One patch per logical change, if there are two disjoint changes in the patchset where one builds on top of the other then multiple patches are of course fine. My personal rule-of-thumb is to split it the way I envision it committed. cheers ./daniel
Re: Is it useful to record whether plans are generic or custom?
On 2020/06/11 14:59, torikoshia wrote: On 2020-06-10 18:00, Kyotaro Horiguchi wrote: + TupleDescInitEntry(tupdesc, (AttrNumber) 8, "last_plan", This could be a problem if we showed the last plan in this view. I think "last_plan_type" would be better. + if (prep_stmt->plansource->last_plan_type == PLAN_CACHE_TYPE_CUSTOM) + values[7] = CStringGetTextDatum("custom"); + else if (prep_stmt->plansource->last_plan_type == PLAN_CACHE_TYPE_GENERIC) + values[7] = CStringGetTextDatum("generic"); + else + nulls[7] = true; Using swith-case prevents future additional type (if any) from being unhandled. I think we are recommending that as a convension. Thanks for your reviewing! I've attached a patch that reflects your comments. Thanks for the patch! Here are the comments. +Number of times generic plan was choosen +Number of times custom plan was choosen Typo: "choosen" should be "chosen"? + + last_plan_type text + + +Tells the last plan type was generic or custom. If the prepared +statement has not executed yet, this field is null + Could you tell me how this information is expected to be used? I think that generic_plans and custom_plans are useful when investigating the cause of performance drop by cached plan mode. But I failed to get how much useful last_plan_type is. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
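For reference, with the patch applied the information under discussion would
be visible roughly like this (column names as in the patch; last_plan_type is
the column whose usefulness is being questioned, and the output shown is only
illustrative):

PREPARE stmt(int) AS SELECT * FROM pg_class WHERE relpages > $1;
EXECUTE stmt(10);
EXECUTE stmt(10);

SELECT name, generic_plans, custom_plans, last_plan_type
  FROM pg_prepared_statements;
--  name | generic_plans | custom_plans | last_plan_type
-- ------+---------------+--------------+----------------
--  stmt |             0 |            2 | custom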
Re: Binary support for pgoutput plugin
On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson wrote: > > On 6 Jul 2020, at 14:58, Dave Cramer wrote: > > > as far as rebase -i do what is advised here for squashing them. Just one > patch now ? > > One patch per logical change, if there are two disjoint changes in the > patchset > where one builds on top of the other then multiple patches are of course > fine. > My personal rule-of-thumb is to split it the way I envision it committed. > At this point it is the result of 3 rebases. I guess I'll have to break it up differently.. Thanks, Dave
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Thu, Jul 2, 2020 at 4:29 PM Bharath Rupireddy wrote: > > > > This is actually strange. AFAIR the code, without looking at the > > current code, when a query picks a foreign connection it checks its > > state. It's possible that the connection has not been marked bad by > > the time you fire new query. If the problem exists probably we should > > fix it anyway since the backend at the other end of the connection has > > higher chances of being killed while the connection was sitting idle > > in the cache. > > > > Thanks Ashutosh for the suggestion. One way, we could solve the above > problem is that, upon firing the new foreign query from local backend using > cached > connection, (assuming the remote backend/session that was cached in the local > backed got > killed by some means), instead of failing the query in the local > backend/session, upon > detecting error from remote backend, we could just delete the cached old > entry and try getting another > connection to remote backend/session, cache it and proceed to submit the > query. This has to happen only at > the beginning of remote xact. Yes, I believe that would be good. > > This way, instead of failing(as mentioned above " server closed the > connection unexpectedly"), > the query succeeds if the local session is able to get a new remote backend > connection. > In GetConnection() there's a comment /* * We don't check the health of cached connection here, because it would * require some overhead. Broken connection will be detected when the * connection is actually used. */ Possibly this is where you want to check the health of connection when it's being used the first time in a transaction. > I worked on a POC patch to prove the above point. Attaching the patch. > Please note that, the patch doesn't contain comments and has some issues like > having some new > variable in PGconn structure and the things like. I don't think changing the PGConn structure for this is going to help. It's a libpq construct and used by many other applications/tools other than postgres_fdw. Instead you could use ConnCacheEntry for the same. See how we track invalidated connection and reconnect upon invalidation. > > If the approach makes some sense, then I can rework properly on the patch and > probably > can open another thread for the review and other stuff. > > The way I tested the patch: > > 1. select * from foreign_tbl; > /*from local session - this results in a > remote connection being cached in > the connection cache and > a remote backend/session is opened. > */ > 2. kill the remote backend/session > 3. select * from foreign_tbl; > /*from local session - without patch > this throws error "ERROR: server closed the connection unexpectedly" > with path - try to use > the cached connection at the beginning of remote xact, upon receiving > error from remote postgres > server, instead of aborting the query, delete the cached entry, try to > get a new connection, if it > gets, cache it and use that for executing the query, query succeeds. > */ This will work. Be cognizant of the fact that the same connection may be used by multiple plan nodes. -- Best Wishes, Ashutosh Bapat
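A rough sketch of where such a check could live in postgres_fdw's
GetConnection(), following the comment quoted above. PQstatus(), PQfinish()
and connect_pg_server() exist today; the placement and the rest is
illustrative, and a remotely killed backend may only be detectable once a
command is actually sent, as discussed:

/* Inside GetConnection(), after looking up the cached ConnCacheEntry. */
if (entry->conn != NULL &&
    entry->xact_depth == 0 &&                 /* only at the start of a remote xact */
    PQstatus(entry->conn) == CONNECTION_BAD)  /* cheap client-side health check */
{
    /* Discard the stale connection and fall through to reconnect below. */
    PQfinish(entry->conn);
    entry->conn = NULL;
}

if (entry->conn == NULL)
    entry->conn = connect_pg_server(server, user);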
Re: Ideas about a better API for postgres_fdw remote estimates
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> 2. Wedging this into EXPLAIN would be quite ugly, because (at least > >> as I envision it) the output would have just about nothing to do with > >> any existing EXPLAIN output. > > > This is a better argument for not making it part of EXPLAIN, though I > > don't really feel like I've got a decent idea of what you are suggesting > > the output *would* look like, so it's difficult for me to agree (or > > disagree) about this particular point. > > Per postgres_fdw's get_remote_estimate(), the only data we use right now > is the startup_cost, total_cost, rows and width estimates from the > top-level Plan node. That's available immediately from the Plan tree, > meaning that basically *nothing* of the substantial display effort > expended by explain.c and ruleutils.c is of any value. So the level-zero The 'display effort' you're referring to, when using JSON format with explain, is basically to format the results into JSON and return them- which is what you're suggesting this mode would do anyway, no..? If the remote side 'table' is actually a view that's complicated then having a way to get just the top-level information (and excluding the rest) sounds like it'd be useful and perhaps excluding that other info doesn't really fit into EXPLAIN's mandate, but that's also much less common. > implementation of this would be to run the parser and planner, format > those four numbers into a JSON object (which would require little more > infrastructure than sprintf), and return that. Sure, we could make that > into some kind of early-exit path in explain.c, but I think it'd be a > pretty substantial wart, especially since it'd mean that none of the > other EXPLAIN options are sensible in combination with this one. That EXPLAIN has options that only make sense in combination with certain other options isn't anything new- BUFFERS makes no sense without ANALYZE, etc. > Further down the road, we might want to rethink the whole idea of > completely constructing a concrete Plan. We could get the data we need > at the list-of-Paths stage. Even more interesting, we could (with very > little more work) return data about multiple Paths, so that the client > could find out, for example, the costs of sorted and unsorted output > without paying two network round trips to discover that. That'd > definitely require changes in the core planner, since it has no API to > stop at that point. And it's even less within the charter of EXPLAIN. I have to admit that I'm not really sure how we could make it work, but having a way to get multiple paths returned by EXPLAIN would certainly be interesting to a lot of users. Certainly it's easier to see how we could get at that info in a postgres_fdw-specific function, and be able to understand how to deal with it there and what could be done, but once it's there I wonder if other tools might see that and possibly even build on it because it'd be the only way to get that kind of info, which certainly wouldn't be ideal. > I grant your point that there might be other users for this besides > postgres_fdw, but that doesn't mean it must be a core feature. 
That postgres_fdw is an extension is almost as much of a wart as anything being discussed here and suggesting that things added to postgres_fdw aren't 'core features' seems akin to ignoring the forest for the trees- consider that, today, there isn't even an option to install only the core server from the PGDG repos (at least for Debian / Ubuntu, not sure if the RPMs have caught up to that yet, but they probably should). The 'postgresql-12' .deb includes all the extensions that are part of the core git repo, because they're released and maintained just the same as the core server and, from a practical perspective, to run a decent PG system you really should have them installed, so why bother having a separate package? > >> 3. We surely would not back-patch a core change like this. OTOH, if > >> the added infrastructure is in an extension, somebody might want to > >> back-patch that (even if unofficially). > > > Since postgres_fdw is part of core and core's release cycle, and the > > packagers manage the extensions from core in a way that they have to > > match up, this argument doesn't hold any weight with me. > > Certainly only v14 (or whenever) and later postgres_fdw would be able > to *use* this data. The scenario I'm imagining is that somebody wants > to be able to use that client against an older remote server, and is > willing to install some simple extension on the remote server to do so. > Perhaps this scenario is not worth troubling over, but I don't think > it's entirely far-fetched. I definitely don't think that such an extension should be maintained outside of core, and I seriously doubt any of our packagers would be anxious to build an indepedent package for this to be usable in older servers.
Re: Cache lookup errors with functions manipulation object addresses
On 2020-Jul-06, Michael Paquier wrote: > While refreshing my mind with this code, I got surprised with the > choice of "routine" in getProcedureTypeDescription() when we need a > default object type name for an object not found, so I have switched > that to "procedure" to be more consistent. I think "routine" was more correct. We introduced that terminology to encompass both functions and procedures, back when real procedures were added. See the definitions in the glossary for example. > I have also spent some time analyzing the coverage of the patch, and > did not find any obvious holes or any remaining missing_ok paths not > covered. Some comments were also a bit weird after re-reading them, > so I tweaked a couple of places. Cool. > Attached is for now a rebased patch. If there are any comments, > please feel free. Daniel, Alvaro, does that look fine for you? I am > letting this stuff aside for a couple of days for the moment. I'm not sure that I'll have time to review this in the next couple of days, but it seemed sufficiently good when I last looked. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: Automatic partition creation
On Mon, Jul 06, 2020 at 01:45:52PM +0300, Anastasia Lubennikova wrote:
> The previous discussion of automatic partition creation [1] has addressed
> static and dynamic creation of partitions and ended up with several syntax
> proposals.
...
> where partition_auto_create_clause is
>
> CONFIGURATION [IMMEDIATE | DEFERRED] USING partition_bound_spec

> - IMMEDIATE | DEFERRED is optional, DEFERRED is not implemented yet
> I wonder, is it worth placing a stub for dynamic partitioning, or we can
> rather add these keywords later.

I understand by "deferred" you mean that the partition isn't created at the
time CREATE TABLE is run but rather deferred until needed by INSERT.

For deferred, range-partitioned tables, I think maybe what you'd want to
specify (and store) is the INTERVAL. If the table is partitioned by day, then
we'd date_trunc('day', time) and dynamically create that day. But if it was
partitioned by month, we'd create the month. I think you'd want to have an
ALTER command for that (we would use that to change tables between
daily/monthly based on their current size). That should also support setting
the MODULUS of a HASH-partitioned table, to allow changing the size of its
partitions (currently, the user would have to more or less recreate the table
and move all its data into different partitions, but that's not ideal).

I don't know if it's important for anyone, but it would be interesting to
think about supporting sub-partitioning: partitions which are themselves
partitioned, like something => something_YYYY => something_YYYY_MM =>
something_YYYY_MM_DD. You'd need to specify how to partition each layer of
the hierarchy; in the most general case, it could be a different partition
strategy at each level.

If you have a callback function for partition renaming, I think you'd want to
pass it not just the current name of the partition, but also the "VALUES"
used in partition creation. Like (2020-04-05)TO(2020-05-06). Maybe instead,
we'd allow setting a "format" to use to construct the partition name, like
"child.foo_bar_%Y_%m_%d". Ideally, the formats would be fixed-length
(zero-padded, etc), so failures with length can happen at "parse" time of the
statement and not at "run" time of the creation. You'd still have to handle
the case that the name already exists but isn't a partition (or is a
partition but doesn't handle the incoming tuple for some reason).

Also, maybe your "configuration" syntax would allow specifying other values,
maybe including a retention period (as an INTERVAL for RANGE tables). That's
useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE.

--
Justin
Re: Ideas about a better API for postgres_fdw remote estimates
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Per postgres_fdw's get_remote_estimate(), the only data we use right now >> is the startup_cost, total_cost, rows and width estimates from the >> top-level Plan node. That's available immediately from the Plan tree, >> meaning that basically *nothing* of the substantial display effort >> expended by explain.c and ruleutils.c is of any value. So the level-zero > The 'display effort' you're referring to, when using JSON format with > explain, is basically to format the results into JSON and return them- > which is what you're suggesting this mode would do anyway, no..? Not hardly. Spend some time studying ruleutils.c sometime --- reverse-compiling a plan is *expensive*. For instance, we have to look up the names of all the operators used in the query quals, decide what needs quoting, decide what needs parenthesization, etc. There's also a fun little bit that assigns unique aliases to each table appearing in the query, which from memory is at least O(N^2) and maybe worse. (Admittedly, shipped queries are usually not so complicated that N would be large.) And by the way, we're also starting up the executor, even if you didn't say ANALYZE. A little bit of fooling with "perf" suggests that when explaining a pretty simple bitmapscan query --- I used EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995 which ought to be somewhat representative of what postgres_fdw needs --- only about half of the runtime is spent within pg_plan_query, and the other half is spent on explain.c + ruleutils.c formatting work. So while getting rid of that overhead wouldn't be an earthshattering improvement, I think it'd be worthwhile. >> Further down the road, we might want to rethink the whole idea of >> completely constructing a concrete Plan. We could get the data we need >> at the list-of-Paths stage. Even more interesting, we could (with very >> little more work) return data about multiple Paths, so that the client >> could find out, for example, the costs of sorted and unsorted output >> without paying two network round trips to discover that. > I have to admit that I'm not really sure how we could make it work, but > having a way to get multiple paths returned by EXPLAIN would certainly > be interesting to a lot of users. Certainly it's easier to see how we > could get at that info in a postgres_fdw-specific function, and be able > to understand how to deal with it there and what could be done, but once > it's there I wonder if other tools might see that and possibly even > build on it because it'd be the only way to get that kind of info, which > certainly wouldn't be ideal. Yeah, thinking about it as a function that inspects partial planner results, it might be useful for other purposes besides postgres_fdw. As I said before, I don't think this necessarily has to be bundled as part of postgres_fdw. That still doesn't make it part of EXPLAIN. > That postgres_fdw is an extension is almost as much of a wart as > anything being discussed here and suggesting that things added to > postgres_fdw aren't 'core features' seems akin to ignoring the forest > for the trees- I think we just had this discussion in another thread. The fact that postgres_fdw is an extension is a feature, not a bug, because (a) it means that somebody could implement their own version if they wanted it to act differently; and (b) it keeps us honest about whether the APIs needed by an FDW are accessible from outside core. 
I think moving postgres_fdw into core would be a large step backwards. regards, tom lane
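To picture the "level-zero" output being discussed, such a function would
need to ship back little more than the four numbers postgres_fdw already
parses out of EXPLAIN today; the function name and exact JSON shape below are
assumptions, not an existing API:

-- Hypothetical lightweight call, replacing a full EXPLAIN round trip:
SELECT pg_remote_plan_estimate($$SELECT * FROM tenk1 WHERE unique1 > 9995$$);
--  {"startup_cost": 0.00, "total_cost": 482.50, "rows": 5, "width": 244}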
bad status of FETCH PERCENT in commitfest application
Hi I checking state of https://commitfest.postgresql.org/28/2176/ It should be committed, but I don't see a commit? Regards Pavel
Re: bad status of FETCH PERCENT in commitfest application
On 2020-Jul-06, Pavel Stehule wrote: > I checking state of https://commitfest.postgresql.org/28/2176/ > > It should be committed, but I don't see a commit? I reverted it to needs-review. It was marked as committed by me, but the one I did commit by the same author in the same area is FETCH WITH TIES. The confusion is understandable. Thanks for pointing it out -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Ideas about a better API for postgres_fdw remote estimates
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> Per postgres_fdw's get_remote_estimate(), the only data we use right now > >> is the startup_cost, total_cost, rows and width estimates from the > >> top-level Plan node. That's available immediately from the Plan tree, > >> meaning that basically *nothing* of the substantial display effort > >> expended by explain.c and ruleutils.c is of any value. So the level-zero > > > The 'display effort' you're referring to, when using JSON format with > > explain, is basically to format the results into JSON and return them- > > which is what you're suggesting this mode would do anyway, no..? > > Not hardly. Spend some time studying ruleutils.c sometime --- > reverse-compiling a plan is *expensive*. For instance, we have to > look up the names of all the operators used in the query quals, > decide what needs quoting, decide what needs parenthesization, etc. Ah, alright, that makes more sense then. > There's also a fun little bit that assigns unique aliases to each > table appearing in the query, which from memory is at least O(N^2) > and maybe worse. (Admittedly, shipped queries are usually not so > complicated that N would be large.) And by the way, we're also > starting up the executor, even if you didn't say ANALYZE. > > A little bit of fooling with "perf" suggests that when explaining > a pretty simple bitmapscan query --- I used > EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995 > which ought to be somewhat representative of what postgres_fdw needs > --- only about half of the runtime is spent within pg_plan_query, and > the other half is spent on explain.c + ruleutils.c formatting work. > So while getting rid of that overhead wouldn't be an earthshattering > improvement, I think it'd be worthwhile. Sure. > >> Further down the road, we might want to rethink the whole idea of > >> completely constructing a concrete Plan. We could get the data we need > >> at the list-of-Paths stage. Even more interesting, we could (with very > >> little more work) return data about multiple Paths, so that the client > >> could find out, for example, the costs of sorted and unsorted output > >> without paying two network round trips to discover that. > > > I have to admit that I'm not really sure how we could make it work, but > > having a way to get multiple paths returned by EXPLAIN would certainly > > be interesting to a lot of users. Certainly it's easier to see how we > > could get at that info in a postgres_fdw-specific function, and be able > > to understand how to deal with it there and what could be done, but once > > it's there I wonder if other tools might see that and possibly even > > build on it because it'd be the only way to get that kind of info, which > > certainly wouldn't be ideal. > > Yeah, thinking about it as a function that inspects partial planner > results, it might be useful for other purposes besides postgres_fdw. > As I said before, I don't think this necessarily has to be bundled as > part of postgres_fdw. That still doesn't make it part of EXPLAIN. Providing it as a function rather than through EXPLAIN does make a bit more sense if we're going to skip things like the lookups you mention above. I'm still inclined to have it be a part of core rather than having it as postgres_fdw though. I'm not completely against it being part of postgres_fdw... 
but I would think that would really be appropriate if it's actually using something in postgres_fdw, but if everything that it's doing is part of core and nothing related specifically to the postgres FDW, then having it as part of core makes more sense to me. Also, having it as part of core would make it more appropriate for other tools to look at and adding that kind of inspection capability for partial planner results could be very interesting for tools like pgAdmin and such. > > That postgres_fdw is an extension is almost as much of a wart as > > anything being discussed here and suggesting that things added to > > postgres_fdw aren't 'core features' seems akin to ignoring the forest > > for the trees- > > I think we just had this discussion in another thread. The fact that > postgres_fdw is an extension is a feature, not a bug, because (a) it > means that somebody could implement their own version if they wanted > it to act differently; and (b) it keeps us honest about whether the > APIs needed by an FDW are accessible from outside core. I think moving > postgres_fdw into core would be a large step backwards. I'm not looking to change it today, as that ship has sailed, but while having FDWs as a general capability that can be implemented by extensions is certainly great and I'd love to see more of that (even better would be more of those that are well maintained and cared for by this community of folks), requiring users to install an extension into every database where they want to query another PG s
Re: bad status of FETCH PERCENT in commitfest application
On Mon, Jul 6, 2020 at 17:27, Alvaro Herrera wrote: > On 2020-Jul-06, Pavel Stehule wrote: > > > I am checking the state of https://commitfest.postgresql.org/28/2176/ > > > > It should be committed, but I don't see a commit? > > I reverted it to needs-review. It was marked as committed by me, but > the one I did commit by the same author in the same area is FETCH WITH > TIES. The confusion is understandable. > ok, thank you for the info Pavel > Thanks for pointing it out. > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: min_safe_lsn column in pg_replication_slots view
On 2020-Jul-04, Amit Kapila wrote: > Fair enough. It would be good if we can come up with something better > than 'distance' for this column. Some ideas safe_wal_limit, > safe_wal_size? Hmm, I like safe_wal_size. I've been looking at this intermittently since late last week and I intend to get it done in the next couple of days. Thanks! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Global temporary tables
Hi čt 11. 6. 2020 v 4:13 odesílatel 曾文旌 napsal: > > 2020年6月9日 下午8:15,Prabhat Sahu 写道: > > > > On Wed, Apr 29, 2020 at 8:52 AM 曾文旌 wrote: > >> 2020年4月27日 下午9:48,Prabhat Sahu 写道: >> >> Thanks Wenjing, for the fix patch for previous issues. >> I have verified the issues, now those fix look good to me. >> But the below error message is confusing(for gtt2). >> >> postgres=# drop table gtt1; >> ERROR: cannot drop global temp table gtt1 when other backend attached it. >> >> postgres=# drop table gtt2; >> ERROR: cannot drop index idx2 on global temp table gtt2 when other >> backend attached it. >> >> I feel the above error message shown for "DROP TABLE gtt2;" is a bit >> confusing(looks similar to DROP INDEX gtt2;). >> If possible, can we keep the error message simple as "ERROR: cannot >> drop global temp table gtt2 when other backend attached it."? >> I mean, without giving extra information for the index attached to that >> GTT. >> >> Fixed the error message to make the expression more accurate. In v33. >> > > Thanks Wenjing. We verified your latest patch(gtt_v33) focusing on all > reported issues and they work fine. > Thanks. > -- > > > I'm very glad to hear such good news. > I am especially grateful for your professional work on GTT. > Please feel free to let me know if there is anything you think could be > improved. > > > Thanks. > > > Wenjing > this patch needs rebase Regards Pavel > With Regards, > Prabhat Kumar Sahu > EnterpriseDB: http://www.enterprisedb.com > > >
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Wed, Jul 1, 2020 at 5:15 AM Bharath Rupireddy wrote: > If I understand it correctly, your suggestion is to add > keep_connection option and use that while defining the server object. > IMO having keep_connection option at the server object level may not > serve the purpose being discussed here. > For instance, let's say I create a foreign server in session 1 with > keep_connection on, and I want to use that > server object in session 2 with keep_connection off and session 3 with > keep_connection on and so on. > One way we can change the server's keep_connection option is to alter > the server object, but that's not a good choice, > as we have to alter it at the system level. > > Overall, though we define the server object in a single session, it > will be used in multiple sessions, having an > option at the per-server level would not be a good idea. You present this here as if it should be a Boolean (on or off) but I don't see why that should be the case. You can imagine trying to close connections if they have been idle for a certain length of time, or if there are more than a certain number of them, rather than (or in addition to) always/never. Which one is best, and why? I tend to think this is better as an FDW property rather than a core facility, but I'm not 100% sure of that and I think it likely depends somewhat on the answers we choose to the questions in the preceding paragraph. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
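To make the shape of the per-server idea concrete, it would look something like this (hypothetical syntax; neither option exists today, and the server name is made up):

    -- the boolean form being proposed
    ALTER SERVER foreign_pg OPTIONS (ADD keep_connection 'off');
    -- one of the alternatives raised above: close connections that have been
    -- idle too long, rather than a flat on/off switch (equally hypothetical)
    ALTER SERVER foreign_pg OPTIONS (ADD connection_idle_timeout '60s');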
Re: Allow continuations in "pg_hba.conf" files
In the attached v3, I've tried to clarify comments and doc about tokenization rules relating to comments, strings and continuations. Attached v4 improves comments & doc as suggested by Justin. -- Fabien.diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 5cd88b462d..f1fc17935d 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -77,13 +77,16 @@ The general format of the pg_hba.conf file is a set of records, one per line. Blank lines are ignored, as is any text after the # comment character. - Records cannot be continued across lines. + Lines ending with a backslash are logically continued on the next + line, so a record can span several lines. A record is made up of a number of fields which are separated by spaces and/or tabs. Fields can contain white space if the field value is double-quoted. Quoting one of the keywords in a database, user, or address field (e.g., all or replication) makes the word lose its special meaning, and just match a database, user, or host with that name. + If a continuation occurs within quoted text or a comment, + it is continued. @@ -821,7 +824,7 @@ local db1,db2,@demodbs all md5 map-name system-username database-username - Comments and whitespace are handled in the same way as in + Comments, whitespace and continuations are handled in the same way as in pg_hba.conf. The map-name is an arbitrary name that will be used to refer to this mapping in pg_hba.conf. The other diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c index da5189a4fa..6c3b780ace 100644 --- a/src/backend/libpq/hba.c +++ b/src/backend/libpq/hba.c @@ -166,11 +166,17 @@ pg_isblank(const char c) /* * Grab one token out of the string pointed to by *lineptr. * - * Tokens are strings of non-blank - * characters bounded by blank characters, commas, beginning of line, and - * end of line. Blank means space or tab. Tokens can be delimited by - * double quotes (this allows the inclusion of blanks, but not newlines). - * Comments (started by an unquoted '#') are skipped. + * Tokens are strings of non-blank characters bounded by blank characters, + * commas, beginning of line, and end of line. Blank means space or tab. + * + * Tokens can be delimited by double quotes (this allows the inclusion of + * blanks or '#', but not newlines). If a continuation occurs within the + * quoted text, the quoted text is continued on the next line. There is no + * escape mechanism, thus character '"' cannot be part of a token. + * + * Comments (started by an unquoted '#') are skipped, i.e. the remainder + * of the line is ignored. If a line with a comment is backslash-continued, + * the continued text is part of the comment, thus ignored as well. * * The token, if any, is returned at *buf (a buffer of size bufsz), and * *lineptr is advanced past the token. @@ -486,8 +492,43 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel) char *lineptr; List *current_line = NIL; char *err_msg = NULL; + char *cur = rawline; + int len = sizeof(rawline); + int continuations = 0; - if (!fgets(rawline, sizeof(rawline), file)) + /* read input and handle simple backslash continuations */ + while ((cur = fgets(cur, len, file)) != NULL) + { + int curlen = strlen(cur); + char *curend = cur + curlen - 1; + + if (curlen == len - 1) + { +/* Line too long! 
*/ +ereport(elevel, + errcode(ERRCODE_CONFIG_FILE_ERROR), + errmsg("authentication file line too long"), + errcontext("line %d of configuration file \"%s\"", + line_number + continuations, filename)); +err_msg = "authentication file line too long"; + } + + /* Strip trailing linebreak from rawline */ + while (curend >= cur && (*curend == '\n' || *curend == '\r')) +*curend-- = '\0'; + + /* empty or not a continuation, we are done */ + if (curend < cur || *curend != '\\') +break; + + /* else we have a continuation, just remove it and loop */ + continuations++; + *curend = '\0'; + len -= (curend - cur); + cur = curend; + } + + if (cur == NULL) { int save_errno = errno; @@ -501,21 +542,6 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel) filename, strerror(save_errno)); rawline[0] = '\0'; } - if (strlen(rawline) == MAX_LINE - 1) - { - /* Line too long! */ - ereport(elevel, - (errcode(ERRCODE_CONFIG_FILE_ERROR), - errmsg("authentication file line too long"), - errcontext("line %d of configuration file \"%s\"", -line_number, filename))); - err_msg = "authentication file line too long"; - } - - /* Strip trailing linebreak from rawline */ - lineptr = rawline + strlen(rawline) - 1; - while (lineptr >= rawline && (*lineptr == '\n' || *lineptr == '\r')) - *lineptr-- = '\0'; /* Parse fields */
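To illustrate the behavior the patch documents, a record could be split like this once continuations are allowed (hypothetical entry with made-up database and user names):

    # one logical pg_hba.conf record spread over two physical lines
    host    db1,db2,db3 \
            alice       10.0.0.0/8      scram-sha-256
    # a backslash at the end of a comment continues the comment, \
    so this whole line is ignored as well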
Re: Proposal: Automatic partition creation
On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova wrote: > CREATE TABLE ... PARTITION BY partition_method (list_of_columns) > partition_auto_create_clause > > where partition_auto_create_clause is > > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec > > and partition_bound_spec is: > > MODULUS integer | VALUES IN (expr [,...]) [, ] | INTERVAL > range_step FROM range_start TO range_end Might be good to compare this to what other databases support. > - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet > I wonder, is it worth placing a stub for dynamic partitioning, or we can > rather add these keywords later. I think we should not add any keywords we don't need immediately - and should seek to minimize the number of new keywords that we need to add, though compatibility with other implementations might be a good reason for accepting some new ones. > - HASH and LIST static partitioning works as expected. > Testing and feedback are welcome. > > - RANGE partitioning is not really implemented in this patch. > Now it only accepts interval data type as 'interval' and respectively > date types as range_start and range_end expressions. > Only one partition is created. I found it difficult to implement the > generation of bounds using internal functions and data types. > Both existing solutions (pg_pathman and pg_partman) rely on SQL level > routines [2]. > I am going to implement this via SPI, which allow to simplify checks and > calculations. Do you see any pitfalls in this approach? I don't really see why we need SPI here. Why can't we just try to evaluate the expression and see if we get a constant of the right type, then use that? I think the big problem here is identifying the operator to use. We have no way of identifying the "plus" or "minus" operator associated with a datatype; indeed, that constant doesn't exist. So either we (a) limit this to a short list of data types and hard-code the operators to be used (which is kind of sad given how extensible our type system is) or we (b) invent some new mechanism for identifying the +/- operators that should be used for a datatype, which was also proposed in the context of some previous discussion of window framing options, but which I don't think ever went anywhere (which is a lot of work) or we (c) just look for operators called '+' and/or '-' by operator name (which will probably make Tom throw up in his mouth a little). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
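Spelled out, the quoted grammar would allow statements along these lines (hypothetical examples of the proposed syntax; none of this is committed):

    -- static HASH partitioning, partitions created immediately
    CREATE TABLE orders (id int, created date)
        PARTITION BY HASH (id)
        CONFIGURATION IMMEDIATE USING MODULUS 4;

    -- the RANGE form under discussion, where range_step/range_start/range_end
    -- are the expressions that would have to be evaluated somehow
    CREATE TABLE events (id int, created date)
        PARTITION BY RANGE (created)
        CONFIGURATION IMMEDIATE USING INTERVAL '1 month'
            FROM '2020-01-01' TO '2021-01-01';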
Re: Proposal: Automatic partition creation
Robert Haas writes: > On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova > wrote: >> I am going to implement this via SPI, which allow to simplify checks and >> calculations. Do you see any pitfalls in this approach? > I don't really see why we need SPI here. I would vote against any core facility that is implemented via SPI queries. It is just too darn hard to control the semantics completely in the face of fun stuff like varying search_path. Look at what a mess the queries generated by the RI triggers are --- and they only have a very small set of behaviors to worry about. I'm still only about 95% confident they don't have security issues, too. If you're using SPI to try to look up appropriate operators, I think the chances of being vulnerable to security problems are 100%. > I think the big problem here is identifying the operator to use. We > have no way of identifying the "plus" or "minus" operator associated > with a datatype; indeed, that constant doesn't exist. We did indeed solve this in connection with window functions, cf 0a459cec9. I may be misunderstanding what the problem is here, but I think trying to reuse that infrastructure might help. regards, tom lane
Multi-byte character case-folding
Hi, At the moment, only single-byte characters in identifiers are case-folded, and multi-byte characters are not. For example, abĉDĚF is case-folded to "abĉdĚf". This can be referred to as "abĉdĚf" or "ABĉDĚF", but not "abĉděf" or "ABĈDĚF". downcase_identifier() has the following comment: /* * SQL99 specifies Unicode-aware case normalization, which we don't yet * have the infrastructure for. Instead we use tolower() to provide a * locale-aware translation. However, there are some locales where this * is not right either (eg, Turkish may do strange things with 'i' and * 'I'). Our current compromise is to use tolower() for characters with * the high bit set, as long as they aren't part of a multi-byte * character, and use an ASCII-only downcasing for 7-bit characters. */ So my question is, do we yet have the infrastructure to make case-folding consistent across all character widths? Thanks Thom
Question: PostgreSQL on Amazon linux EC2
Hi Postgres team, I would like to know if PostgreSQL can be installed and used without any issues on Amazon Linux EC2 machines. https://www.postgresql.org/docs/11/supported-platforms.html I was going through the documentation and couldn't find very specific details related to support. Any input will be much helpful. Warm regards, Ajay
Re: Multi-byte character case-folding
Thom Brown writes: > At the moment, only single-byte characters in identifiers are > case-folded, and multi-byte characters are not. > ... > So my question is, do we yet have the infrastructure to make > case-folding consistent across all character widths? We still lack any built-in knowledge about this, and would have to rely on libc, which means the results would likely be platform-dependent and probably LC_CTYPE-dependent. More generally, I'd be mighty hesitant to change this behavior after it's stood for so many years. I suspect more people would complain that we broke their application than would be happy about it. Having said that, we are already relying on towlower() in places, and could do similarly here if we didn't care about the above issues. regards, tom lane
Re: Question: PostgreSQL on Amazon linux EC2
Em seg., 6 de jul. de 2020 às 21:55, Ajay Patel escreveu: > Hi Postgres team, > > I would like to know if PostgreSQL can be installed and used without any > issues on Amazon Linux EC2 machines. > Yes you can, but not with the repositories at yum.postgresql.org. There's a dependency on a package that only exists on RHEL or CentOS, so the install fails. > > https://www.postgresql.org/docs/11/supported-platforms.html > > I was going through the documentation and couldn't find very specific > details related to support. > > Any input will be much helpful. > You'll be able to: - compile PostgreSQL - download and install rpm packages by hand - use an Amazon-provided repo that installs more recent Postgres versions - actually not up-to-date. After struggling with the points above I decided just to use CentOS on EC2. It works perfectly and from CentOS 7 and up it is supported by AWS on all instance types and their exquisite hardware like network interfaces for EBS performance. Flavio Gurgel
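For the "compile PostgreSQL" route, the build itself is the standard source build; a minimal sketch (package names assumed to be the usual RHEL-family ones, adjust as needed):

    # build prerequisites on an RHEL-family host such as Amazon Linux
    sudo yum install -y gcc make readline-devel zlib-devel
    # unpack a source tarball from postgresql.org, then:
    ./configure --prefix=/usr/local/pgsql
    make && sudo make install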
Re: Question: PostgreSQL on Amazon linux EC2
Thank you Flavio, this is helpful. Have you faced any other challenges or any other problems after installing Postgres? Warm regards, Ajay On Mon, Jul 6, 2020 at 4:45 PM Flavio Henrique Araque Gurgel < fha...@gmail.com> wrote: > > Em seg., 6 de jul. de 2020 às 21:55, Ajay Patel > escreveu: > >> Hi Postgres team, >> >> I would like to know if PostgreSQL can be installed and used without any >> issues on Amazon Linux EC2 machines. >> > > Yes you can, but not with the repositories at yum.postgresql.org. There's > a dependency of a package that only exists on RHEL or CentOS and fail. > > >> >> https://www.postgresql.org/docs/11/supported-platforms.html >> >> I was going through the documentation and couldn't find very specific >> details related to support. >> >> Any input will be much helpful. >> > > You'll be able to : > - compile PostgreSQL > - download and install rpm packages by hand > - use a Amazon provided repo that installs more recent Postgres versions > - actually not up-to-date. > > After struggling with the points above I decided just to use CentOS on > EC2. It works perfectly and from CentOS 7 and up it is supported by AWS on > all instance types and their exquisite hardware like network interfaces for > EBS performance. > > Flavio Gurgel > > >
Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...
On 2020-Jul-05, Anna Akenteva wrote: > -- Swapping primary key's index for an equivalent index, > -- but with INCLUDE-d attributes. > CREATE UNIQUE INDEX new_idx ON target_tbl (id) INCLUDE (info); > ALTER TABLE target_tbl ALTER CONSTRAINT target_tbl_pkey USING INDEX > new_idx; > ALTER TABLE referencing_tbl ALTER CONSTRAINT referencing_tbl_id_ref_fkey > USING INDEX new_idx; How is this state represented by pg_dump? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...
Alvaro Herrera writes: > On 2020-Jul-05, Anna Akenteva wrote: >> -- Swapping primary key's index for an equivalent index, >> -- but with INCLUDE-d attributes. >> CREATE UNIQUE INDEX new_idx ON target_tbl (id) INCLUDE (info); >> ALTER TABLE target_tbl ALTER CONSTRAINT target_tbl_pkey USING INDEX >> new_idx; >> ALTER TABLE referencing_tbl ALTER CONSTRAINT referencing_tbl_id_ref_fkey >> USING INDEX new_idx; > How is this state represented by pg_dump? Even if it's possible to represent, I think we should flat out reject this "feature". Primary keys that aren't primary keys don't seem like a good idea. For one thing, it won't be possible to describe the constraint accurately in the information_schema. regards, tom lane
Re: Multi-byte character case-folding
On 2020-Jul-06, Tom Lane wrote: > More generally, I'd be mighty hesitant to change this behavior after > it's stood for so many years. I suspect more people would complain > that we broke their application than would be happy about it. > > Having said that, we are already relying on towlower() in places, > and could do similarly here if we didn't care about the above issues. I think the fact that identifiers fail to follow language-specific case folding rules is more a known gotcha than a desired property, but on principle I tend to agree that Turkish people would not be happy about the prospect of us changing the downcasing rule in a major release -- it would mean having to edit any affected application code as part of a pg_upgrade process, which is not great. Now you could say that this can be fixed by adding a GUC that preserves the old behavior, but generally we don't like that too much. The counter argument is that there are more future users than there are current users. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: posgres 12 bug (partitioned table)
Hello, ccing pgsql-hack...@postgresql.org Upon investigation, it seems that the problem is caused by the following: The slot passed to the call to ExecProcessReturning() inside ExecInsert() is often a virtual tuple table slot. If there are system columns other than ctid and tableOid referenced in the RETURNING clause (not only xmin as in the bug report), it will lead to the ERROR as mentioned in this thread as virtual tuple table slots don't really store such columns. (ctid and tableOid are present directly in the TupleTableSlot struct and can be satisfied from there: refer: slot_getsysattr())) I have attached two alternate patches to solve the problem. Both patches use and share a mechanism to detect if there are any such system columns. This is done inside ExecBuildProjectionInfo() and we store this info inside the ProjectionInfo struct. Then based on this info, system columns are populated in a suitable slot, which is then passed on to ExecProcessReturning(). (If it is deemed that this operation only be done for RETURNING, we can just as easily do it in the callsite for ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead of doing it inside ExecBuildProjectionInfo()) The first patch [1] explicitly creates a heap tuple table slot, fills in the system column values as we would do during heap_prepare_insert() and then passes that slot to ExecProcessReturning(). (We use a heap tuple table slot as it is guaranteed to support these attributes). The second patch [2] instead of relying on a heap tuple table slot, relies on ExecGetReturningSlot() for the right slot and table_tuple_fetch_row_version() to supply the system column values. It does make the assumption that the AM would supply a slot that will have these system columns. [1] v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch [2] v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch Regards, Soumyadeep (VMware) From baa48a89e571405f4cd802f065d0f82131599f53 Mon Sep 17 00:00:00 2001 From: soumyadeep2007 Date: Sun, 5 Jul 2020 10:23:30 -0700 Subject: [PATCH v1 1/1] Explicitly supply system columns for INSERT..RETURNING MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If there are system columns other than ctid and tableOid in the RETURNING clause for an INSERT, we must ensure that the slot we pass to ExecProcessReturning() has those columns (ctid and tableOid are present directly in the TupleTableSlot struct and can be satisifed from there: refer: slot_getsysattr)). This is in response to bug #16446 reported by Георгий Драк. In the bug an INSERT..RETURNING statement fails with: "ERROR: virtual tuple table slot does not have system attributes" This is caused due to the fact that ExecProcessReturning() was fed with a virtual tuple table slot. Thus the resultant expression evaluation and by extension, call to ExecEvalSysVar(), blows up with the aforementioned error. So, we fix this by: 1. Determining if there are system columns (other than tableOid and ctid) referenced in ExecBuildProjectionInfo() and we store this info inside the ProjectionInfo struct. 2. If there are such system columns in RETURNING, we explicitly create a heap tuple table slot, fill in the system column values as we would do during heap_prepare_insert() and then pass that slot to ExecProcessReturning(). We use a heap tuple table slot as it is guaranteed to support these attributes. 
--- src/backend/executor/execExpr.c| 19 ++- src/backend/executor/nodeModifyTable.c | 23 +++ src/include/nodes/execnodes.h | 2 ++ 3 files changed, 43 insertions(+), 1 deletion(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 236413f62a..8cd966d227 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -360,10 +360,12 @@ ExecBuildProjectionInfo(List *targetList, ExprState *state; ExprEvalStep scratch = {0}; ListCell *lc; + List *tlist_vars; projInfo->pi_exprContext = econtext; /* We embed ExprState into ProjectionInfo instead of doing extra palloc */ projInfo->pi_state.tag = T_ExprState; + projInfo->has_non_slot_system_cols = false; state = &projInfo->pi_state; state->expr = (Expr *) targetList; state->parent = parent; @@ -455,7 +457,6 @@ ExecBuildProjectionInfo(List *targetList, */ ExecInitExprRec(tle->expr, state, &state->resvalue, &state->resnull); - /* * Column might be referenced multiple times in upper nodes, so * force value to R/O - but only if it could be an expanded datum. @@ -469,6 +470,22 @@ ExecBuildProjectionInfo(List *targetList, } } + /* Record system columns that are part of this projection */ + tlist_vars = pull_var_clause((Node *) targetList, + PVC_RECURSE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(lc, tlist_vars) + { + Var *var = (Var *) lfirst
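For context, a minimal reproduction of the reported failure, going by the bug description, is something like this (table names invented; any system column other than ctid/tableOid in RETURNING triggers it):

    CREATE TABLE t (a int) PARTITION BY RANGE (a);
    CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0) TO (100);
    INSERT INTO t VALUES (1) RETURNING xmin;
    -- ERROR:  virtual tuple table slot does not have system attributes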
Re: Binary support for pgoutput plugin
On Mon, 6 Jul 2020 at 09:35, Dave Cramer wrote: > > > On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson wrote: > >> > On 6 Jul 2020, at 14:58, Dave Cramer wrote: >> >> > as far as rebase -i do what is advised here for squashing them. Just >> one patch now ? >> >> One patch per logical change, if there are two disjoint changes in the >> patchset >> where one builds on top of the other then multiple patches are of course >> fine. >> My personal rule-of-thumb is to split it the way I envision it committed. >> > > At this point it is the result of 3 rebases. I guess I'll have to break it > up differently.. > > OK, rebased it down to 2 patches, attached. > Thanks, > > Dave > 0001-Add-binary-protocol-for-publications-and-subscriptio.patch Description: Binary data 0002-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch Description: Binary data
Re: Multi-byte character case-folding
Alvaro Herrera writes: > On 2020-Jul-06, Tom Lane wrote: >> More generally, I'd be mighty hesitant to change this behavior after >> it's stood for so many years. I suspect more people would complain >> that we broke their application than would be happy about it. > I think the fact that identifiers fail to follow language-specific case > folding rules is more a known gotcha than a desired property, but on > principle I tend to agree that Turkish people would not be happy about > the prospect of us changing the downcasing rule in a major release -- it > would mean having to edit any affected application code as part of a > pg_upgrade process, which is not great. It's not just the Turks. As near as I can tell, we'd likely break *every* app that's using such identifiers. For example, supposing I do test=# create table MYÉCLASS (f1 text); CREATE TABLE test=# \dt List of relations Schema | Name | Type | Owner +--+---+-- public | myÉclass | table | postgres (1 row) pg_dump will render this as CREATE TABLE public."myÉclass" ( f1 text ); If we start to case-fold É, then the only way to access this table will be by double-quoting its name, which the application probably is not expecting (else it would have double-quoted in the original CREATE TABLE). > Now you could say that this can be fixed by adding a GUC that preserves > the old behavior, but generally we don't like that too much. Yes, a GUC changing this would be a headache. It would be just as much of a compatibility and security hazard as standard_conforming_strings (which indeed I've been thinking of proposing that we get rid of; it's hung around long enough). > The counter argument is that there are more future users than there are > current users. Especially if we drive away the current users :-(. In practice, we've heard very very few complaints about this, so my gut says to leave it alone. regards, tom lane
Re: min_safe_lsn column in pg_replication_slots view
On 2020-Jul-06, Alvaro Herrera wrote: > Hmm, I like safe_wal_size. > > I've been looking at this intermittently since late last week and I > intend to get it done in the next couple of days. I propose the attached. This is pretty much what was proposed by Kyotaro, but I made a couple of changes. Most notably, I moved the calculation to the view code itself rather than creating a function in xlog.c, mostly because it seemed to me that the new function was creating an abstraction leakage without adding any value; also, if we add per-slot size limits later, it would get worse. The other change was to report negative values when the slot becomes unreserved, rather than zero. It shows how much beyond safety your slots are getting, so it seems useful. Clamping at zero seems to serve no purpose. I also made it report null immediately when slots are in state lost. But beware of slots that appear lost but fall in the unreserved category because they advanced before checkpointer signalled them. (This case requires a debugger to hit ...) One thing that got my attention while going over this is that the error message we throw when making a slot invalid is not very helpful; it doesn't say what the current insertion LSN was at that point. Maybe we should add that? (As a separate patch, of couse.) Any more thoughts? If not, I'll get this pushed tomorrow finally. Thanks -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From bd65ed10b25429fea8776bf9bc38c82a3544a3fa Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 3 Jul 2020 17:25:00 -0400 Subject: [PATCH] Morph pg_replication_slots.min_safe_lsn to safe_wal_size MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous definition of the column almost universally disliked, so provide this updated definition which is more useful for monitoring purposes: a large positive value is good, while zero or a negative value means danger. This is very easy to monitor. FIXME -- catversion bump required Author: Kyotaro Horiguchi Author: Álvaro Herrera Reported-by: Fujii Masao Reviewed-by: XXX Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916...@oss.nttdata.com --- doc/src/sgml/catalogs.sgml| 7 ++-- src/backend/access/transam/xlog.c | 3 +- src/backend/catalog/system_views.sql | 2 +- src/backend/replication/slotfuncs.c | 42 --- src/include/catalog/pg_proc.dat | 4 +-- src/test/recovery/t/019_replslot_limit.pl | 22 ++-- src/test/regress/expected/rules.out | 4 +-- 7 files changed, 51 insertions(+), 33 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 003d278370..361793b337 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -11275,10 +11275,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx - min_safe_lsn pg_lsn + safe_wal_size int8 - The minimum LSN currently available for walsenders. + The number of bytes that can be written to WAL such that this slot + is not in danger of getting in state "lost". It is NULL for lost + slots, as well as if max_slot_wal_keep_size + is -1. 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fd93bcfaeb..2f6d67592a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9513,8 +9513,7 @@ GetWALAvailability(XLogRecPtr targetLSN) XLogSegNo targetSeg; /* segid of targetLSN */ XLogSegNo oldestSeg; /* actual oldest segid */ XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */ - XLogSegNo oldestSlotSeg = InvalidXLogRecPtr; /* oldest segid kept by - * slot */ + XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */ uint64 keepSegs; /* diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 5314e9348f..73676d04cf 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -879,7 +879,7 @@ CREATE VIEW pg_replication_slots AS L.restart_lsn, L.confirmed_flush_lsn, L.wal_status, -L.min_safe_lsn +L.safe_wal_size FROM pg_get_replication_slots() AS L LEFT JOIN pg_database D ON (L.datoid = D.oid); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 88033a79b2..0ddf20e858 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -242,6 +242,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) Tuplestorestate *tupstore; MemoryContext per_query_ctx; MemoryContext oldcontext; + XLogRecPtr currlsn; int slotno; /* check to see if caller supports us returning a tuplestore */ @@ -274,6 +275,8
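With the column as defined in the patch, a monitoring check becomes a simple query of roughly this shape (sketch):

    -- flag slots that are already past the limit (negative) or NULL
    -- (lost, or max_slot_wal_keep_size = -1)
    SELECT slot_name, wal_status, safe_wal_size
      FROM pg_replication_slots
     WHERE safe_wal_size IS NULL OR safe_wal_size <= 0;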
Re: Parallel worker hangs while handling errors.
On Fri, Jul 3, 2020 at 2:40 PM vignesh C wrote: > > The Attached patch has the fix for the same. > I have added a commitfest entry for this bug: https://commitfest.postgresql.org/29/2636/ Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Resetting spilled txn statistics in pg_stat_replication
On Mon, 6 Jul 2020 at 20:45, Amit Kapila wrote: > > On Thu, Jul 2, 2020 at 9:01 AM Masahiko Sawada > wrote: > > > > I've attached PoC patch that implements a simple approach. I'd like to > > discuss how we collect the replication slot statistics in the stats > > collector before I bring the patch to completion. > > > > In this PoC patch, we have the array of PgStat_ReplSlotStats struct > > which has max_replication_slots entries. The backend and wal sender > > send the slot statistics to the stats collector when decoding a commit > > WAL record. > > > > typedef struct PgStat_ReplSlotStats > > { > > charslotname[NAMEDATALEN]; > > PgStat_Counter spill_txns; > > PgStat_Counter spill_count; > > PgStat_Counter spill_bytes; > > } PgStat_ReplSlotStats; > > > > What I'd like to discuss are: > > > > Since the unique identifier of replication slots is the name, the > > process sends slot statistics along with slot name to stats collector. > > I'm concerned about the amount of data sent to the stats collector and > > the cost of searching the statistics within the statistics array (it’s > > O(N) where N is max_replication_slots). Since the maximum length of > > slot name is NAMEDATALEN (64 bytes default) and max_replication_slots > > is unlikely a large number, I might be too worrying but it seems like > > it’s better to avoid that if we can do that easily. An idea I came up > > with is to use the index of slots (i.g., the index of > > ReplicationSlotCtl->replication_slots[]) as the index of statistics of > > slot in the stats collector. But since the index of slots could change > > after the restart we need to synchronize the index of slots on both > > array somehow. So I thought that we can determine the index of the > > statistics of slots at ReplicationSlotAcquire() or > > ReplicationSlotCreate(), but it will in turn need to read stats file > > while holding ReplicationSlotControlLock to prevent the same index of > > the statistics being used by the concurrent process who creating a > > slot. I might be missing something though. > > > > I don't think we should be bothered about the large values of > max_replication_slots. The default value is 10 and I am not sure if > users will be able to select values large enough that we should bother > about searching them by name. I think if it could turn out to be a > problem then we can try to invent something to mitigate it. Agreed. > > > Second, as long as the unique identifier is the slot name there is no > > convenient way to distinguish between the same name old and new > > replication slots, so the backend process or wal sender process sends > > a message to the stats collector to drop the replication slot at > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > table, index, and function statistics. It might not be a problem but > > I’m thinking a better way. > > > > Can we rely on message ordering in the transmission mechanism (UDP) > for stats? The wiki suggests [1] we can't. If so, then this might > not work. Yeah, I'm also concerned about this. Another idea would be to have another unique identifier to distinguish old and new replication slots with the same name. For example, creation timestamp. And then we reclaim the stats of unused slots later like table and function statistics. On the other hand, if the ordering were to be reversed, we would miss that stats but the next stat reporting would create the new entry. If the problem is unlikely to happen in common case we can live with that. 
> > The new view name is also an open question. I prefer > > pg_stat_replication_slots and to add stats of physical replication > > slots to the same view in the future, rather than a separate view. > > > > This sounds okay to me. > > > Aside from the above, this patch will change the most of the changes > > introduced by commit 9290ad198b1 and introduce new code much. I’m > > concerned whether such changes are acceptable at the time of beta 2. > > > > I think it depends on the final patch. My initial thought was that we > should do this for PG14 but if you are suggesting removing the changes > done by commit 9290ad198b1 then we need to think over it. I could > think of below options: > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > were anyway having spilling without any work in PG13 but didn’t have > stats. > b. Try to get your patch in PG13 if we can, otherwise, revert the > feature 9290ad198b1. > c. Get whatever we have in commit 9290ad198b1 for PG13 and > additionally have what we are discussing here for PG14. This means > that spilled stats at slot level will be available in PG14 via > pg_stat_replication_slots and for individual WAL senders it will be > available via pg_stat_replication both in PG13 and PG14. Even if we > can get your patch in PG13, we can still keep those in > pg_stat_replication. > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > for PG14.
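For reference, the per-slot view being proposed would be queried roughly like this (hypothetical: the view name and columns are still open questions; the column names follow the PgStat_ReplSlotStats fields shown above):

    SELECT name, spill_txns, spill_count, spill_bytes
      FROM pg_stat_replication_slots;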
Re: min_safe_lsn column in pg_replication_slots view
Thanks! At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera wrote in > On 2020-Jul-06, Alvaro Herrera wrote: > > > Hmm, I like safe_wal_size. I agree to the name, too. > > I've been looking at this intermittently since late last week and I > > intend to get it done in the next couple of days. > > I propose the attached. This is pretty much what was proposed by > Kyotaro, but I made a couple of changes. Most notably, I moved the > calculation to the view code itself rather than creating a function in > xlog.c, mostly because it seemed to me that the new function was > creating an abstraction leakage without adding any value; also, if we > add per-slot size limits later, it would get worse. I'm not sure that detailed WAL segment calculation fits slotfuncs.c but I don't object to the change. However if we do that: + /* determine how many segments slots can be kept by slots ... */ + keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024)); Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and use it intead of the bare expression? > The other change was to report negative values when the slot becomes > unreserved, rather than zero. It shows how much beyond safety your > slots are getting, so it seems useful. Clamping at zero seems to serve > no purpose. The reason for the clamping is the signedness of the values, or integral promotion. However, I believe the calculation cannot go beyond the range of signed long so the signedness conversion in the patch looks fine. > I also made it report null immediately when slots are in state lost. > But beware of slots that appear lost but fall in the unreserved category > because they advanced before checkpointer signalled them. (This case > requires a debugger to hit ...) Oh! Okay, that change seems right to me. > One thing that got my attention while going over this is that the error > message we throw when making a slot invalid is not very helpful; it > doesn't say what the current insertion LSN was at that point. Maybe we > should add that? (As a separate patch, of couse.) It sounds helpful to me. (I remember that I sometime want to see checkpoint LSNs in server log..) > Any more thoughts? If not, I'll get this pushed tomorrow finally. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A patch for get origin from commit_ts.
>That was fast, thanks. I have not tested the patch, but there are >two things I missed a couple of hours back. Why do you need >pg_last_committed_xact_with_origin() to begin with? Wouldn't it be >more simple to just add a new column to pg_last_committed_xact() for >the replication origin? Contrary to pg_xact_commit_timestamp() that >should not be broken for compatibility reasons because it returns only >one value, we don't have this problem with pg_last_committed_xact() as >it already returns one tuple with two values. Yes, that makes sense; changed in the new patch. >+{ oid => '4179', descr => 'get commit origin of a transaction', >A second thing is that the OID of the new function should be in the >range 8000.., as per the policy introduced in commit a6417078. >src/include/catalog/unused_oids can be used to pick up a value. Thanks, very helpful information and I have followed that. Regards, Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca EMAIL: mailto:movead(dot)li(at)highgo(dot)ca get_origin_from_commit_ts_v5.patch Description: Binary data
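With the column added to pg_last_committed_xact() as suggested, usage would look roughly like this (sketch; commit timestamps must be enabled with track_commit_timestamp = on):

    -- still returns a single row, now with the replication origin as well
    SELECT * FROM pg_last_committed_xact();
    --  xid | timestamp | origin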
change a function name in a comment correctly
Hi, There is a comment where the referenced function name does not match the actual function. I've attached a patch to fix it. Please review. Regards, -- Masahiro Ikeda NTT DATA CORPORATION diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c index 68b01ca68f..3acda32d17 100644 --- a/src/backend/utils/cache/relfilenodemap.c +++ b/src/backend/utils/cache/relfilenodemap.c @@ -82,7 +82,7 @@ RelfilenodeMapInvalidateCallback(Datum arg, Oid relid) } /* - * RelfilenodeMapInvalidateCallback + * InitializeRelfilenodeMap * Initialize cache, either on first use or after a reset. */ static void
Re: change a function name in a comment correctly
On 2020/07/07 11:50, Masahiro Ikeda wrote: Hi, There is a comment where the referenced function name does not match the actual function. I've attached a patch to fix it. Please review. Thanks for the report and patch! LGTM. I will commit this later. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: min_safe_lsn column in pg_replication_slots view
On 2020-Jul-07, Kyotaro Horiguchi wrote: > At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera > wrote in > > I propose the attached. This is pretty much what was proposed by > > Kyotaro, but I made a couple of changes. Most notably, I moved the > > calculation to the view code itself rather than creating a function in > > xlog.c, mostly because it seemed to me that the new function was > > creating an abstraction leakage without adding any value; also, if we > > add per-slot size limits later, it would get worse. > > I'm not sure that detailed WAL segment calculation fits slotfuncs.c > but I don't object to the change. However if we do that: > > + /* determine how many segments slots can be kept by > slots ... */ > + keepSegs = max_slot_wal_keep_size_mb / > (wal_segment_size / (1024 * 1024)); > > Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and > use it intead of the bare expression? I was of two minds about that, and the only reason I didn't do it is that we'll need to give it a better name if we do it ... I'm open to suggestions. > > The other change was to report negative values when the slot becomes > > unreserved, rather than zero. It shows how much beyond safety your > > slots are getting, so it seems useful. Clamping at zero seems to serve > > no purpose. > > The reason for the clamping is the signedness of the values, or > integral promotion. However, I believe the calculation cannot go > beyond the range of signed long so the signedness conversion in the > patch looks fine. Yeah, I think the negative values are useful to see. I think if you ever get close to 2^62, you're in much more serious trouble anyway :-) But I don't deny that the math there could be subject of overflow issues. If you want to verify, please be my guest ... > > One thing that got my attention while going over this is that the error > > message we throw when making a slot invalid is not very helpful; it > > doesn't say what the current insertion LSN was at that point. Maybe we > > should add that? (As a separate patch, of couse.) > > It sounds helpful to me. (I remember that I sometime want to see > checkpoint LSNs in server log..) Hmm, ... let's do that for pg14! Thanks for looking, -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Resetting spilled txn statistics in pg_stat_replication
On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada wrote: > > On Mon, 6 Jul 2020 at 20:45, Amit Kapila wrote: > > > > > Second, as long as the unique identifier is the slot name there is no > > > convenient way to distinguish between the same name old and new > > > replication slots, so the backend process or wal sender process sends > > > a message to the stats collector to drop the replication slot at > > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > > table, index, and function statistics. It might not be a problem but > > > I’m thinking a better way. > > > > > > > Can we rely on message ordering in the transmission mechanism (UDP) > > for stats? The wiki suggests [1] we can't. If so, then this might > > not work. > > Yeah, I'm also concerned about this. Another idea would be to have > another unique identifier to distinguish old and new replication slots > with the same name. For example, creation timestamp. And then we > reclaim the stats of unused slots later like table and function > statistics. > So, we need to have 'creation timestamp' as persistent data for slots to achieve this? I am not sure of adding creation_time as a parameter to identify for this case because users can change timings on systems so it might not be a bullet-proof method but I agree that it can work in general. > On the other hand, if the ordering were to be reversed, we would miss > that stats but the next stat reporting would create the new entry. If > the problem is unlikely to happen in common case we can live with > that. > Yeah, that is a valid point and I think otherwise also some UDP packets can be lost so maybe we don't need to worry too much about this. I guess we can add a comment in the code for such a case. > > > > > Aside from the above, this patch will change the most of the changes > > > introduced by commit 9290ad198b1 and introduce new code much. I’m > > > concerned whether such changes are acceptable at the time of beta 2. > > > > > > > I think it depends on the final patch. My initial thought was that we > > should do this for PG14 but if you are suggesting removing the changes > > done by commit 9290ad198b1 then we need to think over it. I could > > think of below options: > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > > were anyway having spilling without any work in PG13 but didn’t have > > stats. > > b. Try to get your patch in PG13 if we can, otherwise, revert the > > feature 9290ad198b1. > > c. Get whatever we have in commit 9290ad198b1 for PG13 and > > additionally have what we are discussing here for PG14. This means > > that spilled stats at slot level will be available in PG14 via > > pg_stat_replication_slots and for individual WAL senders it will be > > available via pg_stat_replication both in PG13 and PG14. Even if we > > can get your patch in PG13, we can still keep those in > > pg_stat_replication. > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > > for PG14. I don't think this will be a popular approach. > > I was thinking option (a) or (b). I'm inclined to option (a) since the > PoC patch added a certain amount of new codes. I agree with you that > it depends on the final patch. > Magnus, Tomas, others, do you have any suggestions on the above options or let us know if you have any other option in mind? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Implementing Incremental View Maintenance
Thanks for the patch! > Query checks for following restrictions are added: Are all known supported cases listed below? > - inheritance parent table > ... > - targetlist containing IVM column > - simple subquery is only supported > How should I understand the 3 items above? - Best Regards Andy Fan
Re: pg_resetwal --next-transaction-id may cause database failed to restart.
>ISTM that a reasonable compromise is that if you use -x (or -c, -m, -O) >and the input value is outside the range supported by existing files, >then it's a fatal error; unless you use --force, which turns it into >just a warning. I do not think '--force' is a good choice, so I added a '--test, -t' option to force writing an unsafe value to pg_control. Do you think this is an acceptable approach? Regards, Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca EMAIL: mailto:movead(dot)li(at)highgo(dot)ca pg_resetwal_transaction_limit_v2.patch Description: Binary data
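The proposed usage would be roughly the following (hypothetical; -t/--test is the new switch in the attached patch, and the xid value is only an example):

    # without -t: pg_resetwal refuses an out-of-range value with a fatal error
    # with -t:    the unsafe value is written to pg_control anyway
    pg_resetwal -x 1000000 --test $PGDATA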
Re: Default setting for enable_hashagg_disk (hash_mem)
On Mon, 2020-07-06 at 15:59 +0530, Amit Kapila wrote: > I agree that it's good to wait for actual problems. But the > > challenge > > is that we can't backport an added GUC. > > > > Is it because we won't be able to edit existing postgresql.conf file > or for some other reasons? Perhaps "can't" was too strong of a word, but I think it would be unprecedented to introduce a GUC in a minor version. It could be a source of confusion. If others think that adding a GUC in a minor version would be acceptable, please let me know. Regards, Jeff Davis
Re: Default setting for enable_hashagg_disk (hash_mem)
On Sun, 2020-07-05 at 16:47 -0700, Peter Geoghegan wrote: > Where does that leave the hash_mem idea (or some other similar > proposal)? hash_mem is acceptable to me if the consensus is moving toward that, but I'm not excited about it. It would be one thing if hash_mem was a nice clean solution. But it doesn't seem like a clean solution to me; and it's likely that it will get in the way of the next person who tries to improve the work_mem situation. > I think that we should offer something like hash_mem that can work as > a multiple of work_mem, for the reason that Justin mentioned > recently. > This can be justified as something that more or less maintains some > kind of continuity with the old design. Didn't Justin argue against using a multiplier? https://postgr.es/m/20200703024649.gj4...@telsasoft.com > I think that it should affect hash join too, though I suppose that > that part might be controversial -- that is certainly more than an > escape hatch for this particular problem. Any thoughts on that? If it's called hash_mem, then I guess it needs to affect HJ. If not, it should have a different name. > There are several reasons to get rid of work_mem entirely in the > medium to long term. Some relatively obvious, others less so. Agreed. It seems like the only argument against the escape hatch GUCs is that they are cruft and we will end up stuck with them. But if we are dispensing with work_mem in a few releases, surely we'd need to dispense with hash_mem or the proposed escape-hatch GUCs anyway. > An example in the latter category is "hash teams" [1]: a design that > teaches multiple hash operations (e.g. a hash join and a hash > aggregate that hash on the same columns) to cooperate in processing > their inputs. Cool! It would certainly be nice to share the partitioning work between a HashAgg and a HJ. Regards, Jeff Davis
Re: Default setting for enable_hashagg_disk (hash_mem)
On Mon, Jul 06, 2020 at 09:57:08PM -0700, Jeff Davis wrote: > > I think that we should offer something like hash_mem that can work as > > a multiple of work_mem, for the reason that Justin mentioned > > recently. > > This can be justified as something that more or less maintains some > > kind of continuity with the old design. > > Didn't Justin argue against using a multiplier? > https://postgr.es/m/20200703024649.gj4...@telsasoft.com I recanted. https://www.postgresql.org/message-id/20200703145620.gk4...@telsasoft.com -- Justin
Re: A patch for get origin from commit_ts.
On Tue, Jul 07, 2020 at 10:02:29AM +0800, movead...@highgo.ca wrote: > Thanks, very helpful information and I have followed that. Cool, thanks. I have gone through your patch in details, and updated it as the attached. Here are some comments. '8000' as OID for the new function was not really random, so to be fair with the other patches, I picked up the first random value unused_oids has given me instead. There were some indentation issues, and pgindent got that fixed. I think that it is better to use "replication origin" in the docs instead of just origin. I have kept "origin" in the functions for now as that sounded cleaner to me, but we may consider using something like "reporigin" as well as attribute name. The tests could just use tstzrange() to make sure that the timestamps have valid values, so I have switched to that, and did not resist to do the same in the existing tests. +-- Test when it can not find the transaction +SELECT * FROM pg_xact_commit_timestamp_origin((:'txid_set_origin'::text::int + 10)::text::xid) x; This test could become unstable, particularly if it gets used in a parallel environment, so I have removed it. Perhaps I am just over-pessimistic here though.. As a side note, I think that we could just remove the alternate output of commit_ts/, as it does not really get used because of the NO_INSTALLCHECK present in the module's Makefile. That would be the job of a different patch, so I have updated it accordingly. Glad to see that you did not forget to adapt it in your own patch. (The change in catversion.h is a self-reminder...) -- Michael diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index 54518cd40e..ee58586569 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* mmddN */ -#define CATALOG_VERSION_NO 202007061 +#define CATALOG_VERSION_NO 202007121 #endif diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 38295aca48..585823576a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5946,12 +5946,20 @@ prorettype => 'timestamptz', proargtypes => 'xid', prosrc => 'pg_xact_commit_timestamp' }, +{ oid => '8456', + descr => 'get commit timestamp and replication origin of a transaction', + proname => 'pg_xact_commit_timestamp_origin', provolatile => 'v', + prorettype => 'record', proargtypes => 'xid', + proallargtypes => '{xid,timestamptz,oid}', proargmodes => '{i,o,o}', + proargnames => '{xid,timestamp,origin}', + prosrc => 'pg_xact_commit_timestamp_origin' }, + { oid => '3583', - descr => 'get transaction Id and commit timestamp of latest transaction commit', + descr => 'get transaction Id, commit timestamp and replication origin of latest transaction commit', proname => 'pg_last_committed_xact', provolatile => 'v', prorettype => 'record', proargtypes => '', - proallargtypes => '{xid,timestamptz}', proargmodes => '{o,o}', - proargnames => '{xid,timestamp}', prosrc => 'pg_last_committed_xact' }, + proallargtypes => '{xid,timestamptz,oid}', proargmodes => '{o,o,o}', + proargnames => '{xid,timestamp,origin}', prosrc => 'pg_last_committed_xact' }, { oid => '3537', descr => 'get identification of SQL object', proname => 'pg_describe_object', provolatile => 's', prorettype => 'text', diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 9cdb136435..e48f88c884 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -417,28 +417,38 @@ 
pg_xact_commit_timestamp(PG_FUNCTION_ARGS) } +/* + * pg_last_committed_xact + * + * SQL-callable wrapper to obtain some information about the latest + * committed transaction: transaction ID, timestamp and replication + * origin. + */ Datum pg_last_committed_xact(PG_FUNCTION_ARGS) { TransactionId xid; + RepOriginId nodeid; TimestampTz ts; - Datum values[2]; - bool nulls[2]; + Datum values[3]; + bool nulls[3]; TupleDesc tupdesc; HeapTuple htup; /* and construct a tuple with our data */ - xid = GetLatestCommitTsData(&ts, NULL); + xid = GetLatestCommitTsData(&ts, &nodeid); /* * Construct a tuple descriptor for the result row. This must match this * function's pg_proc entry! */ - tupdesc = CreateTemplateTupleDesc(2); + tupdesc = CreateTemplateTupleDesc(3); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid", XIDOID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 2, "timestamp", TIMESTAMPTZOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "origin", + OIDOID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); if (!TransactionIdIsNormal(xid)) @@ -452,6 +462,9 @@ pg_last_committed_xact(PG_FUNCTION_ARGS) values[1] = TimestampTzGetDatum(ts); nulls[1] = false; + + values[2] = ObjectIdGetDatum(nodeid); + nulls[2] = false; } htup = heap_form_tuple(tupdesc, values, nulls); @@ -459,6 +47
Re: Cache lookup errors with functions manipulation object addresses
On Mon, Jul 06, 2020 at 10:56:53AM -0400, Alvaro Herrera wrote: > I think "routine" was more correct. We introduced that terminology to > encompass both functions and procedures, back when real procedures were > added. See the definitions in the glossary for example. Okay, fine by me, while we have getProcedureTypeDescription() working on pg_proc entries ;) > I'm not sure that I'll have time to review this in the next couple of > days, but it seemed sufficiently good when I last looked. Thanks. I'll try to come back to it next week or the one after for another round of self-review, and I'll see what to do at this point. This has been around for three years, it can still wait a bit. -- Michael signature.asc Description: PGP signature
Re: Parallel copy
On Wed, Jun 24, 2020 at 1:41 PM Bharath Rupireddy wrote: > > Along with the review comments addressed > patch(0006-Parallel-Copy-For-Binary-Format-Files.patch) also attaching > all other latest series of patches(0001 to 0005) from [1], the order > of applying patches is from 0001 to 0006. > > [1] > https://www.postgresql.org/message-id/CALDaNm0H3N9gK7CMheoaXkO99g%3DuAPA93nSZXu0xDarPyPY6sg%40mail.gmail.com > Some comments: + movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index; + + cstate->pcdata->curr_data_block->skip_bytes = movebytes; + + data_block = &pcshared_info->data_blocks[block_pos]; + + if (movebytes > 0) + memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], + movebytes); we can create a local variable and use in place of cstate->pcdata->curr_data_block. + if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1)) + AdjustFieldInfo(cstate, 1); + + memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count)); Should this be like below, as the remaining size can fit in current block: if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE) + if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1)) + { + AdjustFieldInfo(cstate, 2); + *new_block_pos = pcshared_info->cur_block_pos; + } Same like above. + movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index; + + cstate->pcdata->curr_data_block->skip_bytes = movebytes; + + data_block = &pcshared_info->data_blocks[block_pos]; + + if (movebytes > 0) Instead of the above check, we can have an assert check for movebytes. + if (mode == 1) + { + cstate->pcdata->curr_data_block = data_block; + cstate->raw_buf_index = 0; + } + else if(mode == 2) + { + ParallelCopyDataBlock *prev_data_block = NULL; + prev_data_block = cstate->pcdata->curr_data_block; + prev_data_block->following_block = block_pos; + cstate->pcdata->curr_data_block = data_block; + + if (prev_data_block->curr_blk_completed == false) + prev_data_block->curr_blk_completed = true; + + cstate->raw_buf_index = 0; + } This code is common for both, keep in common flow and remove if (mode == 1) cstate->pcdata->curr_data_block = data_block; cstate->raw_buf_index = 0; +#define CHECK_FIELD_COUNT \ +{\ + if (fld_count == -1) \ + { \ + if (IsParallelCopy() && \ + !IsLeader()) \ + return true; \ + else if (IsParallelCopy() && \ + IsLeader()) \ + { \ + if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \ + ereport(ERROR, \ + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \ + errmsg("received copy data after EOF marker"))); \ + return true; \ + } \ We only copy sizeof(fld_count), Shouldn't we check fld_count != cstate->max_fields? Am I missing something here? 
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1)) + { + AdjustFieldInfo(cstate, 2); + *new_block_pos = pcshared_info->cur_block_pos; + } + + memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size)); + + cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size); + + fld_size = (int32) pg_ntoh32(fld_size); + + if (fld_size == 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), +errmsg("unexpected EOF in COPY data"))); + + if (fld_size < -1) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), +errmsg("invalid field size"))); + + if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size) + { + cstate->raw_buf_index = cstate->raw_buf_index + fld_size; + } We can keep the check like cstate->raw_buf_index + fld_size < ..., for better readability and consistency. +static pg_attribute_always_inline void +CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo, + Oid typioparam, int32 typmod, uint32 *new_block_pos, + int m, ParallelCopyTupleInfo *tuple_start_info_ptr, + ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size) flinfo, typioparam & typmod is not used, we can remove the parameter. +static pg_attribute_always_inline void +CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo, + Oid typioparam, int32 typmod, uint32 *new_block_p
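To make the restructuring suggested above for the mode == 1 / mode == 2 branches concrete, the common flow could look like the following sketch (a fragment against the quoted patch code only, reusing its variables; not compilable on its own):

/*
 * Sketch: perform the mode-specific linking first, then fall through to
 * the assignments shared by both modes, removing the duplicated
 * curr_data_block/raw_buf_index updates from the quoted branches.
 */
if (mode == 2)
{
	ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;

	/* Link the previous block to its successor and mark it complete. */
	prev_data_block->following_block = block_pos;
	prev_data_block->curr_blk_completed = true;
}

/* Common to both modes. */
cstate->pcdata->curr_data_block = data_block;
cstate->raw_buf_index = 0;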
Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callback
Hi, I found an issue while executing a backup use case (please see [1] for the queries) on PostgreSQL version 12. First, pg_start_backup registers nonexclusive_base_backup_cleanup as a before_shmem_exit callback, which adds this function to the before_shmem_exit_list array; it is supposed to be removed by pg_stop_backup so that we can do the pending cleanup and issue a warning for each pg_start_backup for which pg_stop_backup was not called. Next, I executed a query for which JIT is picked up, and the LLVM compiler inserts its own exit callback, llvm_shutdown, at the end of the before_shmem_exit_list array. Then I executed pg_stop_backup, and a call to cancel_before_shmem_exit() is made with the expectation that the nonexclusive_base_backup_cleanup callback is removed from the before_shmem_exit_list array. Since cancel_before_shmem_exit() only checks the last entry in the before_shmem_exit_list array, which is now the LLVM compiler's exit callback, we exit cancel_before_shmem_exit() without removing the intended nonexclusive_base_backup_cleanup callback. That callback stays in before_shmem_exit_list and gets executed at session exit, throwing the warning "aborting backup due to backend exiting before pg_stop_backup was called", which is unintended. Attached is a patch that fixes the above problem by making cancel_before_shmem_exit() look for the given function (and the given arg) in the entire before_shmem_exit_list array, not just the last entry, starting from the last entry. I request the community to take this patch for review for v12. Having said that, the above-mentioned problem for the backup use case does not occur on v13 and later versions of Postgres (please see the description in [2]), but these kinds of issues can arise if cancel_before_shmem_exit() is left to look only at the last array entry while removing a registered callback. There is also a comment in the cancel_before_shmem_exit() function description: "For simplicity, only the latest entry can be removed. (We could work harder but there is no need for current uses.)" Since we are starting to find use cases now, there is a need to enhance cancel_before_shmem_exit(), so I also propose the same attached patch for v13 and later versions. Thoughts?
[1] CREATE TABLE t1 (id SERIAL); INSERT INTO t1 (id) SELECT * FROM generate_series(1, 2000); SELECT * FROM pg_start_backup('label', false, false); /* JIT is enabled in my session and it is picked up by the query below */ EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT COUNT(*) FROM t1; SELECT * FROM pg_stop_backup(false, true);
[2] For v13 and later versions, start_backup first registers do_pg_abort_backup and then pg_start_backup_callback, performs the startup backup operations, and unregisters only pg_start_backup_callback from before_shmem_exit_list, retaining do_pg_abort_backup in the list to be called on session exit. The JIT compiler inserts its own exit callback at the end of the before_shmem_exit_list array. stop_backup registers pg_stop_backup_callback, performs the stop operations, unregisters pg_stop_backup_callback from before_shmem_exit_list, and sets sessionBackupState = SESSION_BACKUP_NONE; note that the do_pg_abort_backup callback registered by the start_backup command still exists in before_shmem_exit_list and is not removed by stop_backup. On session exit, do_pg_abort_backup gets called but returns without performing any operations (not even throwing a warning), by checking sessionBackupState, which was set to SESSION_BACKUP_NONE by stop_backup.
With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v1-0001-Modify-cancel_before_shmem_exit-to-search-all-reg.patch Description: Binary data
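The approach described above, scanning the whole before_shmem_exit_list from the end instead of only its last slot, might look roughly like the sketch below. This is an illustration of the idea rather than the attached patch, and the array and struct fields are only approximated from src/backend/storage/ipc/ipc.c.

/*
 * Sketch: remove the latest registration matching both function and arg,
 * wherever it sits in before_shmem_exit_list, instead of only checking
 * the final entry.
 */
void
cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
{
	int			i;

	for (i = before_shmem_exit_index - 1; i >= 0; i--)
	{
		if (before_shmem_exit_list[i].function == function &&
			before_shmem_exit_list[i].arg == arg)
		{
			/* Close the gap so the list stays contiguous. */
			memmove(&before_shmem_exit_list[i],
					&before_shmem_exit_list[i + 1],
					(before_shmem_exit_index - i - 1) *
					sizeof(before_shmem_exit_list[0]));
			--before_shmem_exit_index;
			break;
		}
	}
}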
Re: Global snapshots
On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada wrote: > > On Sat, 20 Jun 2020 at 21:21, Amit Kapila wrote: > > > > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov > > wrote: > > > > >Also, can you let us know if this > > > > supports 2PC in some way and if so how is it different from what the > > > > other thread on the same topic [1] is trying to achieve? > > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains > > > 2PC machinery. Now I'd not judge which approach is better. > > > > > > > Sorry for being late. > No problem, your summarization, and comparisons of both approaches are quite helpful. > > I studied this patch and did a simple comparison between this patch > (0002 patch) and my 2PC patch. > > In terms of atomic commit, the features that are not implemented in > this patch but in the 2PC patch are: > > * Crash safe. > * PREPARE TRANSACTION command support. > * Query cancel during waiting for the commit. > * Automatically in-doubt transaction resolution. > > On the other hand, the feature that is implemented in this patch but > not in the 2PC patch is: > > * Executing PREPARE TRANSACTION (and other commands) in parallel > > When the 2PC patch was proposed, IIRC it was like this patch (0002 > patch). I mean, it changed only postgres_fdw to support 2PC. But after > discussion, we changed the approach to have the core manage foreign > transaction for crash-safe. From my perspective, this patch has a > minimum implementation of 2PC to work the global snapshot feature and > has some missing features important for supporting crash-safe atomic > commit. So I personally think we should consider how to integrate this > global snapshot feature with the 2PC patch, rather than improving this > patch if we want crash-safe atomic commit. > Okay, but isn't there some advantage with this approach (manage 2PC at postgres_fdw level) as well which is that any node will be capable of handling global transactions rather than doing them via central coordinator? I mean any node can do writes or reads rather than probably routing them (at least writes) via coordinator node. Now, I agree that even if this advantage is there in the current approach, we can't lose the crash-safety aspect of other approach. Will you be able to summarize what was the problem w.r.t crash-safety and how your patch has dealt it? > Looking at the commit procedure with this patch: > > When starting a new transaction on a foreign server, postgres_fdw > executes pg_global_snapshot_import() to import the global snapshot. > After some work, in pre-commit phase we do: > > 1. generate global transaction id, say 'gid' > 2. execute PREPARE TRANSACTION 'gid' on all participants. > 3. prepare global snapshot locally, if the local node also involves > the transaction > 4. execute pg_global_snapshot_prepare('gid') for all participants > > During step 2 to 4, we calculate the maximum CSN from the CSNs > returned from each pg_global_snapshot_prepare() executions. > > 5. assign global snapshot locally, if the local node also involves the > transaction > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants. > > Then, we commit locally (i.g. mark the current transaction as > committed in clog). > > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all > participants. > As per my current understanding, the overall idea is as follows. 
For global transactions, pg_global_snapshot_prepare('gid') will set the transaction status to InDoubt and generate a CSN (let's call it NodeCSN) at the node where that function is executed; it also returns the NodeCSN to the coordinator. Then the coordinator (the current postgres_fdw node on which the write transaction is being executed) computes MaxCSN based on the return values (NodeCSN) of prepare (pg_global_snapshot_prepare) from all nodes. It then assigns MaxCSN to each node. Finally, when COMMIT PREPARED is issued for each node, that MaxCSN will be written on each node, including the current one. So, with this idea, each node will have the same view of the CSN value corresponding to any particular transaction. For snapshot management, the node which receives the query generates a CSN (CurrentCSN) and follows the simple rule that a tuple having an xid with a CSN less than CurrentCSN is visible. Now, it is possible that when we are examining a tuple, the CSN corresponding to the xid that wrote the tuple is still InDoubt, which indicates that the transaction is not yet committed on all nodes. In that case we wait until we get a valid CSN value corresponding to the xid and then use it to check whether the tuple is visible. One thing to note here is that for global transactions we primarily rely on the CSN value corresponding to a transaction for its visibility, even though we still maintain CLOG for local transaction status. Leaving aside the incomplete parts and/or flaws of the current patch, does the above match the top-level idea of this patch? I am not s
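A compact sketch of the two mechanisms summarized above, the coordinator-side commit sequence and the CSN-based visibility rule; every helper name here (ParticipantConn, RemotePrepare, RemoteAssign, RemoteCommitPrepared, GetTransactionCSN, WaitForTransactionCSN, InDoubtCSN) is a placeholder, not a function or type from the patch.

typedef uint64_t CSN;			/* placeholder for the patch's CSN type */

/*
 * Coordinator-side commit sequence as summarized above (sketch only).
 * The Remote* helpers stand in for however postgres_fdw runs PREPARE
 * TRANSACTION, pg_global_snapshot_assign() and COMMIT PREPARED on each
 * participant.
 */
static void
global_commit_sketch(const char *gid, ParticipantConn *participants, int nparticipants)
{
	CSN			max_csn = 0;

	for (int i = 0; i < nparticipants; i++)
	{
		CSN			node_csn = RemotePrepare(&participants[i], gid);

		if (node_csn > max_csn)
			max_csn = node_csn;
	}

	for (int i = 0; i < nparticipants; i++)
		RemoteAssign(&participants[i], gid, max_csn);

	/* ... commit locally, then, in the post-commit phase ... */
	for (int i = 0; i < nparticipants; i++)
		RemoteCommitPrepared(&participants[i], gid);
}

/*
 * Visibility rule as summarized above (sketch only): a tuple written by
 * xid is visible when that transaction's CSN is known and smaller than
 * the snapshot's CSN; an InDoubt CSN means waiting until the distributed
 * commit is resolved.
 */
static bool
tuple_visible_sketch(TransactionId xid, CSN current_csn)
{
	CSN			xid_csn = GetTransactionCSN(xid);

	while (xid_csn == InDoubtCSN)
		xid_csn = WaitForTransactionCSN(xid);

	return xid_csn < current_csn;
}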
Re: Cache lookup errors with functions manipulation object addresses
> On 6 Jul 2020, at 09:45, Michael Paquier wrote: > Attached is for now a rebased patch. If there are any comments, > please feel free. Daniel, Alvaro, does that look fine for you? I am > letting this stuff aside for a couple of days for the moment. Reading through I have no comments or objections, LGTM. I didn't try to apply and run tests, but the CFBot is happy about it so I doubt I would find anything different there. cheers ./daniel