Re: Resetting spilled txn statistics in pg_stat_replication
On Sat, 4 Jul 2020 at 22:13, Ajin Cherian wrote:
>
> On Thu, Jul 2, 2020 at 1:31 PM Masahiko Sawada wrote:
>>
>> Thanks! Yes, I'm working on this patch while considering to send the
>> stats to stats collector.
>>
>> I've attached PoC patch that implements a simple approach. I'd like to
>> discuss how we collect the replication slot statistics in the stats
>> collector before I bring the patch to completion.
>>
>
> I understand the patch is only in the initial stage but I just tried testing it.

Thank you for testing the patch!

> Using the patch, I enabled logical replication and created two pub/subs
> (sub1, sub2) for two separate tables (t1, t2). I inserted data into the second
> table (t2) such that it spills into disk.
> Then when I checked the stats using the new function
> pg_stat_get_replication_slots(), I see that the same stats are updated for
> both the slots, when ideally it should have reflected in the second slot
> alone.
>
> postgres=# SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes
>              FROM pg_stat_get_replication_slots() s(name, spill_txns, spill_count, spill_bytes);
>  name | spill_txns | spill_count | spill_bytes
> ------+------------+-------------+-------------
>  sub1 |          1 |          20 |      132000
>  sub2 |          1 |          20 |      132000
> (2 rows)

I think this is because the logical decoding behind those two logical
replications decodes all WAL records *before* filtering the specified tables.
In logical replication, we decode the whole WAL stream and then pass it to a
logical decoding output plugin such as pgoutput, and only then filter tables
according to the publication. Therefore, even if subscription sub1 is not
interested in changes related to table t2, the replication slot sub1 still
needs to decode the whole WAL stream, resulting in spilling to disk.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
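A minimal script that reproduces the behaviour described above, assuming the
PoC patch's pg_stat_get_replication_slots() is installed (object names follow
the test in the mail; everything else is illustrative):

-- On the publisher: two tables, one publication per table.
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION pub1 FOR TABLE t1;
CREATE PUBLICATION pub2 FOR TABLE t2;

-- On the subscriber: one subscription (and hence one slot) per publication.
CREATE SUBSCRIPTION sub1 CONNECTION 'host=pub dbname=postgres' PUBLICATION pub1;
CREATE SUBSCRIPTION sub2 CONNECTION 'host=pub dbname=postgres' PUBLICATION pub2;

-- On the publisher: a transaction on t2 large enough to exceed
-- logical_decoding_work_mem (lower that setting to make spilling easy to hit).
INSERT INTO t2 SELECT g FROM generate_series(1, 1000000) g;

-- Both slots report spill activity, because each slot decodes the whole WAL
-- stream before the publication filter is applied.
SELECT s.name, s.spill_txns, s.spill_count, s.spill_bytes
  FROM pg_stat_get_replication_slots()
       AS s(name, spill_txns, spill_count, spill_bytes);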
Re: Cache lookup errors with functions manipulation object addresses
On Fri, Jul 03, 2020 at 11:04:17AM -0400, Alvaro Herrera wrote: > 0001 and 0002 look good to me. Thanks for the review. > I think 0003 could be a little more careful about indentation; some long > lines are going to result after pgindent that might be better to handle > in a different way before commit, e.g., here > >> +{ >> +char *proname = format_procedure_extended(object->objectId, >> +FORMAT_PROC_FORCE_QUALIFY | FORMAT_PROC_INVALID_AS_NULL); Yes, I was looking at that for a couple of hours and pgindent made that a bit weird. So I have changed the code to just use a separate variable. That makes things a bit cleaner. While refreshing my mind with this code, I got surprised with the choice of "routine" in getProcedureTypeDescription() when we need a default object type name for an object not found, so I have switched that to "procedure" to be more consistent. I have also spent some time analyzing the coverage of the patch, and did not find any obvious holes or any remaining missing_ok paths not covered. Some comments were also a bit weird after re-reading them, so I tweaked a couple of places. Attached is for now a rebased patch. If there are any comments, please feel free. Daniel, Alvaro, does that look fine for you? I am letting this stuff aside for a couple of days for the moment. -- Michael From 3ca2841d38880eb0f40427e9aa447b5ae3a2e66c Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Mon, 6 Jul 2020 16:44:31 +0900 Subject: [PATCH v20] Eliminate user-visible cache lookup errors for objaddr SQL functions When using the following functions, users could see various types of errors like "cache lookup failed for OID XXX": * pg_describe_object * pg_identify_object_as_address * pg_identify_object All the lower set of APIs managing object addresses for all types of the system are made smarter by gaining a missing_ok argument that allows any caller to control if he is willing to have an actual error or if he wants to control error at its level by having empty objects if they are undefined. Regression tests added in this commit failed with such errors before being patched. 
--- src/include/catalog/objectaddress.h | 12 +- src/include/utils/regproc.h |5 +- src/backend/catalog/dependency.c | 30 +- src/backend/catalog/objectaddress.c | 1003 +- src/backend/catalog/pg_depend.c |4 +- src/backend/catalog/pg_shdepend.c|8 +- src/backend/commands/event_trigger.c |9 +- src/backend/commands/extension.c |6 +- src/backend/commands/tablecmds.c | 12 +- src/backend/utils/adt/regproc.c | 20 +- src/test/regress/expected/object_address.out | 98 ++ src/test/regress/sql/object_address.sql | 56 + doc/src/sgml/func.sgml |7 +- contrib/sepgsql/database.c |6 +- contrib/sepgsql/dml.c|4 +- contrib/sepgsql/label.c |4 +- contrib/sepgsql/proc.c | 14 +- contrib/sepgsql/relation.c | 20 +- contrib/sepgsql/schema.c |6 +- 19 files changed, 996 insertions(+), 328 deletions(-) diff --git a/src/include/catalog/objectaddress.h b/src/include/catalog/objectaddress.h index 144715d4f4..b876617c9d 100644 --- a/src/include/catalog/objectaddress.h +++ b/src/include/catalog/objectaddress.h @@ -70,14 +70,18 @@ extern bool get_object_namensp_unique(Oid class_id); extern HeapTuple get_catalog_object_by_oid(Relation catalog, AttrNumber oidcol, Oid objectId); -extern char *getObjectDescription(const ObjectAddress *object); +extern char *getObjectDescription(const ObjectAddress *object, + bool missing_ok); extern char *getObjectDescriptionOids(Oid classid, Oid objid); extern int read_objtype_from_string(const char *objtype); -extern char *getObjectTypeDescription(const ObjectAddress *object); -extern char *getObjectIdentity(const ObjectAddress *address); +extern char *getObjectTypeDescription(const ObjectAddress *object, + bool missing_ok); +extern char *getObjectIdentity(const ObjectAddress *address, + bool missing_ok); extern char *getObjectIdentityParts(const ObjectAddress *address, - List **objname, List **objargs); + List **objname, List **objargs, + bool missing_ok); extern struct ArrayType *strlist_to_textarray(List *list); extern ObjectType get_relkind_objtype(char relkind); diff --git a/src/include/utils/regproc.h b/src/include/utils/regproc.h index 145452a5ad..330417e888 100644 --- a/src/include/utils/regproc.h +++ b/src/include/utils/regproc.h @@ -29,10 +29,11 @@ extern List *stringToQualifiedNameList(const char *string); extern char *format_procedure(Oid procedure_oid); extern char *format_procedure_qualified(Oid procedure_oid); extern void format_procedure_parts(Oid operator_oi
Re: Include access method in listTables output
‐‐‐ Original Message ‐‐‐ On Monday, July 6, 2020 3:12 AM, Michael Paquier wrote: > On Sun, Jul 05, 2020 at 07:13:10AM +0530, vignesh C wrote: > > > I'm not sure if we should include showViews, I had seen that the > > access method was not getting selected for view. > > +1. These have no physical storage, so you are looking here for > relkinds that satisfy RELKIND_HAS_STORAGE(). Thank you for the review. Find attached v4 of the patch. > > --- > > Michael 0001-Include-AM-in-listTables-output-v4.patch Description: Binary data
Re: A patch for get origin from commit_ts.
On Mon, Jul 06, 2020 at 11:12:30AM +0800, movead...@highgo.ca wrote: > Thanks for the points and follow them, new patch attached. That was fast, thanks. I have not tested the patch, but there are two things I missed a couple of hours back. Why do you need pg_last_committed_xact_with_origin() to begin with? Wouldn't it be more simple to just add a new column to pg_last_committed_xact() for the replication origin? Contrary to pg_xact_commit_timestamp() that should not be broken for compatibility reasons because it returns only one value, we don't have this problem with pg_last_committed_xact() as it already returns one tuple with two values. +{ oid => '4179', descr => 'get commit origin of a transaction', A second thing is that the OID of the new function should be in the range 8000.., as per the policy introduced in commit a6417078. src/include/catalog/unused_oids can be used to pick up a value. -- Michael signature.asc Description: PGP signature
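For illustration, the suggestion above would make the existing function simply
grow one more output column instead of adding a second function (the origin
column name below is an assumption, not taken from the posted patch):

-- Today pg_last_committed_xact() returns (xid, timestamp); with the
-- suggested change it could also expose the replication origin:
SELECT xid, timestamp, roident
  FROM pg_last_committed_xact();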
Re: proposal: schema variables
ne 5. 7. 2020 v 15:33 odesílatel Pavel Stehule napsal: > > > čt 21. 5. 2020 v 14:49 odesílatel Pavel Stehule > napsal: > >> >> >> čt 21. 5. 2020 v 13:34 odesílatel Amit Kapila >> napsal: >> >>> On Thu, May 21, 2020 at 3:41 PM Pavel Stehule >>> wrote: >>> > >>> > Hi >>> > >>> > just rebase without any other changes >>> > >>> >>> You seem to forget attaching the rebased patch. >>> >> >> I am sorry >> >> attached. >> > > fresh rebase > fix test build Regards Pavel > Regards > > Pavel > > >> Pavel >> >> >>> -- >>> With Regards, >>> Amit Kapila. >>> EnterpriseDB: http://www.enterprisedb.com >>> >> schema-variables-20200706.patch.gz Description: application/gzip
Performing partition pruning using row value
Hello,

I would like to ask about the conditions under which partition pruning is
performed. In PostgreSQL 12, when I executed the following SQL, partition
pruning was not performed.

postgres=# explain select * from a where (c1, c2) < (99, 99);
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..60.00 rows=800 width=40)
   ->  Seq Scan on a1 a_1  (cost=0.00..28.00 rows=400 width=40)
         Filter: (ROW(c1, c2) < ROW(99, 99))
   ->  Seq Scan on a2 a_2  (cost=0.00..28.00 rows=400 width=40)
         Filter: (ROW(c1, c2) < ROW(99, 99))
(5 rows)

However, pruning is performed when I change the SQL as follows.

postgres=# explain select * from a where c1 < 99 and c2 < 99;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on a1 a  (cost=0.00..28.00 rows=133 width=40)
   Filter: ((c1 < 99) AND (c2 < 99))
(2 rows)

These tables are defined as follows.

create table a (c1 int, c2 int, c3 varchar) partition by range (c1, c2);
create table a1 partition of a for values from (0, 0) to (100, 100);
create table a2 partition of a for values from (100, 100) to (200, 200);

Looking at the code, "(c1, c2) < (99, 99)" is recognized as a RowCompareExpr
and "c1 < 99 and c2 < 99" is recognized as a combination of OpExprs.
Currently, pruning is not performed for a RowCompareExpr; is this correct?

Also, at the end of match_clause_to_partition_key() there is the following
comment: "Since the qual didn't match up to any of the other qual types
supported here, then trying to match it against any other partition key is a
waste of time, so just return PARTCLAUSE_UNSUPPORTED." Is it because parsing
every kind of Expr node would take too long that
match_clause_to_partition_key() returns PARTCLAUSE_UNSUPPORTED when such an
Expr node is passed? If the number of arguments in the RowCompareExpr is
small, I would think that expanding it could improve performance.

regards,
sho kato
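Until the planner learns to prune on a RowCompareExpr, the row comparison can
be expanded by hand into an equivalent combination of plain operator clauses
that the pruning code already handles; a sketch against the tables defined
above:

-- By row-comparison semantics, (c1, c2) < (99, 99) is equivalent to:
explain select * from a
 where c1 < 99
    or (c1 = 99 and c2 < 99);
-- Each OR arm references the partition key with ordinary OpExprs, so the
-- pruning code should be able to restrict the scan to partition a1 only.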
Re: Asymmetric partition-wise JOIN
On 12/27/19 12:34 PM, Kohei KaiGai wrote:
> The attached v2 fixed the problem, and regression test finished correctly.

Using your patch I saw an incorrect value of predicted rows at the top node
of the plan:

  "Append  (cost=270.02..35165.37 rows=40004 width=16)"

For the full explain of the query plan, see the attachment
explain_with_asymmetric.sql. If I disable enable_partitionwise_join then:

  "Hash Join  (cost=270.02..38855.25 rows=10001 width=16)"

Full explain - explain_no_asymmetric.sql.

I thought this was a case of incorrect usage of cached values of norm_selec,
but it is a corner-case problem of the eqjoinsel() routine:

  selectivity = 1 / size_of_larger_relation;            (selfuncs.c:2567)
  tuples = selectivity * outer_tuples * inner_tuples;   (costsize.c:4607)

i.e. the number of tuples depends only on the size of the smaller relation.
It is not a bug of your patch, but I think you need to know about it because
it may affect the planner's decisions.

===
P.S. Test case:

CREATE TABLE t0 (a serial, b int);
INSERT INTO t0 (b) (SELECT * FROM generate_series(1e4, 2e4) as g);
CREATE TABLE parts (a serial, b int) PARTITION BY HASH(a);
INSERT INTO parts (b) (SELECT * FROM generate_series(1, 1e6) as g);

--
regards,
Andrey Lepikhov
Postgres Professional

explain_with_asymmetric.sql Description: application/sql
explain_no_asymmetric.sql Description: application/sql
Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
On Mon, Jul 6, 2020 at 11:44 AM Dilip Kumar wrote: > > On Mon, Jul 6, 2020 at 11:31 AM Amit Kapila wrote: > > > > > > > 10. I have got the below failure once. I have not investigated this > > > > in detail as the patch is still under progress. See, if you have any > > > > idea? > > > > # Failed test 'check extra columns contain local defaults' > > > > # at t/013_stream_subxact_ddl_abort.pl line 81. > > > > # got: '2|0' > > > > # expected: '1000|500' > > > > # Looks like you failed 1 test of 2. > > > > make[2]: *** [check] Error 1 > > > > make[1]: *** [check-subscription-recurse] Error 2 > > > > make[1]: *** Waiting for unfinished jobs > > > > make: *** [check-world-src/test-recurse] Error 2 > > > > > > Even I got the failure once and after that, it did not reproduce. I > > > have executed it multiple time but it did not reproduce again. Are > > > you able to reproduce it consistently? > > > > > > > No, I am also not able to reproduce it consistently but I think this > > can fail if a subscriber sends the replay_location before actually > > replaying the changes. First, I thought that extra send_feedback we > > have in apply_handle_stream_commit might have caused this but I guess > > that can't happen because we need the commit time location for that > > and we are storing the same at the end of apply_handle_stream_commit > > after applying all messages. I am not sure what is going on here. I > > think we somehow need to reproduce this or some variant of this test > > consistently to find the root cause. > > And I think it appeared first time for me, so maybe either induced > from past few versions so some changes in the last few versions might > have exposed it. I have noticed that almost 50% of the time I am able > to reproduce after the clean build so I can trace back from which > version it started appearing that way it will be easy to narrow down. > One more comment ReorderBufferLargestTopTXN { .. dlist_foreach(iter, &rb->toplevel_by_lsn) { ReorderBufferTXN *txn; + Size size = 0; + Size largest_size = 0; txn = dlist_container(ReorderBufferTXN, node, iter.cur); - /* if the current transaction is larger, remember it */ - if ((!largest) || (txn->size > largest->size)) + /* + * If this transaction have some incomplete changes then only consider + * the size upto last complete lsn. + */ + if (rbtxn_has_incomplete_tuple(txn)) + size = txn->complete_size; + else + size = txn->total_size; + + /* If the current transaction is larger then remember it. */ + if ((largest != NULL || size > largest_size) && size > 0) Here largest_size is a local variable inside the loop which is initialized to 0 in each iteration and that will lead to picking each next txn as largest. This seems wrong to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
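To make the concern concrete, here is a sketch of how the quoted loop might
look with the issue addressed: largest_size hoisted out of the loop so it
survives across iterations, and the condition written so that "nothing picked
yet" is what triggers picking the current transaction. This follows the
quoted hunk of the in-progress patch, not any committed code:

static ReorderBufferTXN *
ReorderBufferLargestTopTXN(ReorderBuffer *rb)
{
    dlist_iter  iter;
    ReorderBufferTXN *largest = NULL;
    Size        largest_size = 0;   /* moved out of the loop */

    dlist_foreach(iter, &rb->toplevel_by_lsn)
    {
        ReorderBufferTXN *txn = dlist_container(ReorderBufferTXN, node, iter.cur);
        Size        size;

        /* Consider only the completed part of partially streamed txns. */
        if (rbtxn_has_incomplete_tuple(txn))
            size = txn->complete_size;
        else
            size = txn->total_size;

        /* Remember the largest transaction seen so far. */
        if (size > 0 && (largest == NULL || size > largest_size))
        {
            largest = txn;
            largest_size = size;
        }
    }

    return largest;
}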
Re: Default setting for enable_hashagg_disk (hash_mem)
On Sun, Jul 5, 2020 at 2:24 AM Jeff Davis wrote: > > On Sat, 2020-07-04 at 14:49 +0530, Amit Kapila wrote: > > > We don't even have a user report yet of a > > > regression compared to PG 12, or one that can't be fixed by > > > increasing > > > work_mem. > > > > > > > Yeah, this is exactly the same point I have raised above. I feel we > > should wait before designing any solution to match pre-13 behavior > > for > > hashaggs to see what percentage of users face problems related to > > this > > and how much is a problem for them to increase work_mem to avoid > > regression. > > I agree that it's good to wait for actual problems. But the challenge > is that we can't backport an added GUC. > Is it because we won't be able to edit existing postgresql.conf file or for some other reasons? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Autovacuum on partitioned table (autoanalyze)
> On Wed, Jul 1, 2020 at 6:26 PM Daniel Gustafsson wrote: > > > On 21 Apr 2020, at 18:21, yuzuko wrote: > > > I'll update the patch soon. > > Do you have an updated version to submit? The previous patch no longer > applies > to HEAD, so I'm marking this entry Waiting on Author in the meantime. > Thank you for letting me know. I attach the latest patch applies to HEAD. I think there are other approaches like Tom's idea that Justin previously referenced, but this patch works the same way as previous patches. (tracks updated/inserted/deleted tuples and checks whether the partitioned tables needs auto-analyze, same as nonpartitioned tables) Because I wanted to be able to analyze partitioned tables by autovacuum as a first step, and I think this approach is the simplest way to do it. -- Best regards, Yuzuko Hosoya NTT Open Source Software Center v8_autovacuum_on_partitioned_table.patch Description: Binary data
Proposal: Automatic partition creation
The previous discussion of automatic partition creation [1] has addressed static and dynamic creation of partitions and ended up with several syntax proposals. In this thread, I want to continue this work. Attached is PoC for static partition creation. The patch core is quite straightforward. It adds one more transform clause to convert given partitioning specification into several CREATE TABLE statements. The patch implements following syntax: CREATE TABLE ... PARTITION BY partition_method (list_of_columns) partition_auto_create_clause where partition_auto_create_clause is CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec and partition_bound_spec is: MODULUS integer | VALUES IN (expr [,...]) [, ] | INTERVAL range_step FROM range_start TO range_end For more examples check auto_partitions.sql in the patch. TODO: - CONFIGURATION is just an existing keyword, that I picked as a stub. Ideas on better wording are welcome. - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet I wonder, is it worth placing a stub for dynamic partitioning, or we can rather add these keywords later. - HASH and LIST static partitioning works as expected. Testing and feedback are welcome. - RANGE partitioning is not really implemented in this patch. Now it only accepts interval data type as 'interval' and respectively date types as range_start and range_end expressions. Only one partition is created. I found it difficult to implement the generation of bounds using internal functions and data types. Both existing solutions (pg_pathman and pg_partman) rely on SQL level routines [2]. I am going to implement this via SPI, which allow to simplify checks and calculations. Do you see any pitfalls in this approach? - Partition naming. Now partition names for all methods look like $tablename_$partnum Do we want more intelligence here? Now we have RunObjectPostCreateHook(), which allows to rename the table. To make it more user-friendly, we can later implement pl/pgsql function that sets the callback, as it is done in pg_pathman set_init_callback() [3]. - Current design doesn't allow to create default partition automatically. Do we need this functionality? - Do you see any restrictions for future extensibility (dynamic partitioning, init_callback, etc.) in the proposed design ? I expect this to be a long discussion, so here is the wiki page [4] to fix important questions and final agreements. [1] https://www.postgresql.org/message-id/flat/alpine.DEB.2.21.1907150711080.22273%40lancre [2] https://github.com/postgrespro/pg_pathman/blob/dbcbd02e411e6acea6d97f572234746007979538/range.sql#L99 [3] https://github.com/postgrespro/pg_pathman#additional-parameters [4] https://wiki.postgresql.org/wiki/Declarative_partitioning_improvements -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From 5808c5e843cac1e1383366a5cbff116eaa433f90 Mon Sep 17 00:00:00 2001 From: anastasia Date: Fri, 3 Jul 2020 03:34:24 +0300 Subject: [PATCH] WIP create partitions automatically Implement new syntax to generate bounds for HASH, LIST and RANGE partitions. Implement automatic partition creation for HASH and LIST. 
Check new regression test 'auto_partitions.sql' for syntax examples --- src/backend/commands/tablecmds.c | 7 + src/backend/nodes/copyfuncs.c | 33 src/backend/nodes/equalfuncs.c| 29 +++ src/backend/nodes/outfuncs.c | 28 +++ src/backend/nodes/readfuncs.c | 33 src/backend/parser/gram.y | 160 ++--- src/backend/parser/parse_utilcmd.c| 166 ++ src/include/nodes/nodes.h | 2 + src/include/nodes/parsenodes.h| 43 + src/include/parser/kwlist.h | 1 + src/include/partitioning/partdefs.h | 4 + src/test/regress/expected/auto_partitions.out | 0 src/test/regress/parallel_schedule| 2 + src/test/regress/serial_schedule | 1 + src/test/regress/sql/auto_partitions.sql | 43 + 15 files changed, 523 insertions(+), 29 deletions(-) create mode 100644 src/test/regress/expected/auto_partitions.out create mode 100644 src/test/regress/sql/auto_partitions.sql diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index f79044f39f..7b2c651952 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -628,6 +628,13 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, else partitioned = false; + if (!partitioned && stmt->partautocreate) + { + ereport(ERROR, +(errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("PARTITION bounds can only be used on partitioned tables"))); + } + /* * Look up the namespace in which we are supposed to create the relation, * check we have permission to create there
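To make the proposed grammar concrete, here are a few statements it is meant
to accept (illustrative only; the authoritative examples are in
auto_partitions.sql from the patch, and keywords such as CONFIGURATION are
still placeholders):

-- HASH: three partitions created immediately.
CREATE TABLE orders (id int, ts timestamptz)
    PARTITION BY HASH (id)
    CONFIGURATION IMMEDIATE USING MODULUS 3;

-- LIST: one partition per listed group of values.
CREATE TABLE points (region text)
    PARTITION BY LIST (region)
    CONFIGURATION IMMEDIATE USING VALUES IN ('north', 'south'), ('east', 'west');

-- RANGE: monthly partitions between two bounds (only partially implemented
-- in the posted PoC).
CREATE TABLE events (created date)
    PARTITION BY RANGE (created)
    CONFIGURATION IMMEDIATE
    USING INTERVAL '1 month' FROM '2020-01-01' TO '2021-01-01';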
Re: Resetting spilled txn statistics in pg_stat_replication
On Thu, Jul 2, 2020 at 9:01 AM Masahiko Sawada wrote: > > I've attached PoC patch that implements a simple approach. I'd like to > discuss how we collect the replication slot statistics in the stats > collector before I bring the patch to completion. > > In this PoC patch, we have the array of PgStat_ReplSlotStats struct > which has max_replication_slots entries. The backend and wal sender > send the slot statistics to the stats collector when decoding a commit > WAL record. > > typedef struct PgStat_ReplSlotStats > { > charslotname[NAMEDATALEN]; > PgStat_Counter spill_txns; > PgStat_Counter spill_count; > PgStat_Counter spill_bytes; > } PgStat_ReplSlotStats; > > What I'd like to discuss are: > > Since the unique identifier of replication slots is the name, the > process sends slot statistics along with slot name to stats collector. > I'm concerned about the amount of data sent to the stats collector and > the cost of searching the statistics within the statistics array (it’s > O(N) where N is max_replication_slots). Since the maximum length of > slot name is NAMEDATALEN (64 bytes default) and max_replication_slots > is unlikely a large number, I might be too worrying but it seems like > it’s better to avoid that if we can do that easily. An idea I came up > with is to use the index of slots (i.g., the index of > ReplicationSlotCtl->replication_slots[]) as the index of statistics of > slot in the stats collector. But since the index of slots could change > after the restart we need to synchronize the index of slots on both > array somehow. So I thought that we can determine the index of the > statistics of slots at ReplicationSlotAcquire() or > ReplicationSlotCreate(), but it will in turn need to read stats file > while holding ReplicationSlotControlLock to prevent the same index of > the statistics being used by the concurrent process who creating a > slot. I might be missing something though. > I don't think we should be bothered about the large values of max_replication_slots. The default value is 10 and I am not sure if users will be able to select values large enough that we should bother about searching them by name. I think if it could turn out to be a problem then we can try to invent something to mitigate it. > Second, as long as the unique identifier is the slot name there is no > convenient way to distinguish between the same name old and new > replication slots, so the backend process or wal sender process sends > a message to the stats collector to drop the replication slot at > ReplicationSlotDropPtr(). This strategy differs from what we do for > table, index, and function statistics. It might not be a problem but > I’m thinking a better way. > Can we rely on message ordering in the transmission mechanism (UDP) for stats? The wiki suggests [1] we can't. If so, then this might not work. > The new view name is also an open question. I prefer > pg_stat_replication_slots and to add stats of physical replication > slots to the same view in the future, rather than a separate view. > This sounds okay to me. > Aside from the above, this patch will change the most of the changes > introduced by commit 9290ad198b1 and introduce new code much. I’m > concerned whether such changes are acceptable at the time of beta 2. > I think it depends on the final patch. My initial thought was that we should do this for PG14 but if you are suggesting removing the changes done by commit 9290ad198b1 then we need to think over it. I could think of below options: a. 
Revert 9290ad198b1 and introduce stats for spilling in PG14. We were anyway having spilling without any work in PG13 but didn’t have stats. b. Try to get your patch in PG13 if we can, otherwise, revert the feature 9290ad198b1. c. Get whatever we have in commit 9290ad198b1 for PG13 and additionally have what we are discussing here for PG14. This means that spilled stats at slot level will be available in PG14 via pg_stat_replication_slots and for individual WAL senders it will be available via pg_stat_replication both in PG13 and PG14. Even if we can get your patch in PG13, we can still keep those in pg_stat_replication. d. Get whatever we have in commit 9290ad198b1 for PG13 and change it for PG14. I don't think this will be a popular approach. [1] - https://en.wikipedia.org/wiki/User_Datagram_Protocol -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
>> >> If I understand it correctly, your suggestion is to add >> keep_connection option and use that while defining the server object. >> IMO having keep_connection option at the server object level may not >> serve the purpose being discussed here. >> For instance, let's say I create a foreign server in session 1 with >> keep_connection on, and I want to use that >> server object in session 2 with keep_connection off and session 3 with >> keep_connection on and so on. > > In my opinion, in such cases, one needs to create two server object one with > keep-connection ON and one with keep-connection off. And need to decide > to use appropriate for the particular session. > Yes, having two variants of foreign servers: one with keep-connections on (this can be default behavior, even if user doesn't mention this option, internally it can be treated as keep-connections on) , and if users need no connection hashing, another foreign server with all other options same but keep-connections off. This looks okay to me, if we want to avoid a core session level GUC. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
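As a sketch of the two-server-objects idea above (keep_connection is the
option being proposed in this thread, not something postgres_fdw currently
accepts):

-- Same remote database, two server objects differing only in caching policy.
CREATE SERVER remote_cached FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'appdb', keep_connection 'on');

CREATE SERVER remote_uncached FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remotehost', dbname 'appdb', keep_connection 'off');

-- A session then defines or imports its foreign tables against whichever
-- server matches its workload.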
Re: jsonpath versus NaN
On Thu, Jun 18, 2020 at 8:04 PM Alexander Korotkov wrote: > On Thu, Jun 18, 2020 at 7:45 PM Tom Lane wrote: > > Alexander Korotkov writes: > > > Thank you for your answer. I'm trying to understand your point. > > > Standard claims that .double() method should behave the same way as > > > CAST to double. However, standard references the standard behavior of > > > CAST here, not behavior of your implementation of CAST. So, if we > > > extend the functionality of standard CAST in our implementation, that > > > doesn't automatically mean we should extend the .double() jsonpath > > > method in the same way. Is it correct? > > > > Right. We could, if we chose, extend jsonpath to allow Inf/NaN, but > > I don't believe there's an argument that the spec requires us to. > > > > Also the larger point is that it doesn't make sense to extend jsonpath > > that way when we haven't extended json(b) that way. This code wart > > wouldn't exist were it not for that inconsistency. Also, I find it hard > > to see why anyone would have a use for NaN in a jsonpath test when they > > can't write NaN in the input json data, nor have it be correctly reflected > > into output json data either. > > Ok, I got the point. I have nothing against removing support of NaN > in jsonpath as far as it doesn't violates the standard. I'm going to > write the patch for this. The patchset is attached, sorry for the delay. The first patch improves error messages, which appears to be unclear for me. If one applies .double() method to a numeric value, we restrict that this numeric value should fit to double precision type. If it doesn't fit, the current error message just says the following. ERROR: jsonpath item method .double() can only be applied to a numeric value But that's confusing, because .double() method is naturally applied to a numeric value. Patch makes this message explicitly report that numeric value is out of range for double type. This patch also adds test exercising this error. When string can't be converted to double precision, I think it's better to explicitly say that we expected valid string representation of double precision type. Second patch forbids to convert NaN using .double() method. As I get, NaN can't be result of any jsonpath computations assuming there is no NaN input. So, I just put an assert to convertJsonbScalar() ensuring there is no NaN in JsonbValue. -- Regards, Alexander Korotkov 0001-Improve-error-reporting-for-jsonpath-.double-method.patch Description: Binary data 0002-Forbid-NaNs-in-jsonpath.patch Description: Binary data
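A minimal illustration of the first problem, assuming a numeric literal that
jsonb's numeric type can store but double precision cannot:

-- jsonb stores numbers as numeric, so 1e400 is a valid numeric value, but it
-- overflows double precision when .double() is applied:
SELECT jsonb_path_query('1e400', '$.double()');
-- current message: ERROR:  jsonpath item method .double() can only be
--                  applied to a numeric value
-- with the patch, the error is meant to say instead that the numeric value
-- is out of range for type double precision.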
Re: Libpq support to connect to standby server as priority
> This patch no longer applies, can you please submit a rebased version? I've > marked the entry as Waiting on Author in the meantime. > Here's a rebased version of the patch. Regards, Greg v16-0001-Enhance-libpq-target_session_attrs-and-add-target_se.patch Description: Binary data
Re: WIP/PoC for parallel backup
> On 12 Jun 2020, at 19:28, Robert Haas wrote: > I am sure that nobody is arguing that the patch has to be beneficial > in all cases in order to justify applying it. However, there are > several good arguments against proceding with this patch: This thread has stalled with no resolution to the raised issues, and the latest version of the patch (v15) posted no longer applies (I only tried 0001 which failed, the green tick in the CFBot is due it mistakenlt thinking an attached report is a patch). I'm marking this patch Returned with Feedback. Please open a new CF entry when there is a new version of the patch. cheers ./daniel
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> > > > If I understand it correctly, your suggestion is to add > > keep_connection option and use that while defining the server object. > > IMO having keep_connection option at the server object level may not > > serve the purpose being discussed here. > > For instance, let's say I create a foreign server in session 1 with > > keep_connection on, and I want to use that > > server object in session 2 with keep_connection off and session 3 with > > keep_connection on and so on. > > One way we can change the server's keep_connection option is to alter > > the server object, but that's not a good choice, > > as we have to alter it at the system level. > > Is there use-case in practice where different backends need to have > different connection cache setting even if all of them connect the > same server? Currently, connection cache exists at each backend/session level and gets destroyed on backend/session exit. I think the same cached connection can be used until it gets invalidated due to user mapping or server definition changes. One way is to have a shared memory based connection cache instead of backend level cache, but it has its own problems, like maintenance, invalidation, dealing with concurrent usages etc. > I thought since the problem that this feature is trying > to resolve is not to eat up the remote server connections capacity by > disabling connection cache, we’d like to disable connection cache to > the particular server, for example, which sets low max_connections. > Currently, the user mapping oid acts as the key for the cache's hash table, so the cache entries are not made directly using foreign server ids though each entry would have some information related to foreign server. Just to reiterate, the main idea if this feature is to give the user a way to choose, whether to use connection caching or not, if he decides that his session uses remote queries very rarely, then he can disable, or if the remote queries are more frequent in a particular session, he can choose to use connection caching. In a way, this feature addresses the point that local sessions not eating up remote connections/sessions by letting users decide(as users know better when to cache or when not to) to cache or not cache the remote connections and thus releasing them immediately if there is not much usage from local session. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Inlining of couple of functions in pl_exec.c improves performance
On Sat, 4 Jul 2020 at 01:19, Tom Lane wrote: > > I did some performance testing on 0001+0002 here, and found that > for me, there's basically no change on x86_64 but a win of 2 to 3 > percent on aarch64, using Pavel's pi_est_1() as a representative > case for simple plpgsql statements. That squares with your original > results I believe. It's not clear to me whether any of the later > tests in this thread measured these changes in isolation, or only > with 0003 added. Yeah I had the same observation. 0001+0002 seems to benefit mostly on aarch64. And 0003 (exec_case_value) benefited both on amd64 and aarch64. > > Anyway, that's good enough for me, so I pushed 0001+0002 after a > little bit of additional cosmetic tweaking. > > I attach your original 0003 here (it still applies, with some line > offsets). That's just so the cfbot doesn't get confused about what > it's supposed to test now. Thanks for pushing all the three ! Thanks, -Amit Khandekar Huawei Technologies
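For context, the benchmark referred to above is a tight loop of simple
plpgsql assignments; the real pi_est_1() is defined in the referenced thread,
so the function below is only a stand-in of the same shape (name and body are
assumptions):

CREATE OR REPLACE FUNCTION pi_est_demo(n int) RETURNS double precision AS
$$
DECLARE
    s   double precision := 0.0;
    sgn double precision := 1.0;
BEGIN
    -- Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    FOR i IN 0 .. n - 1 LOOP
        s := s + sgn / (2 * i + 1);
        sgn := -sgn;
    END LOOP;
    RETURN 4.0 * s;
END;
$$ LANGUAGE plpgsql;

-- e.g. SELECT pi_est_demo(10000000);  -- runtime dominated by simple expressions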
Re: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)
As a general overview, the series of patches in the mail thread matches its description. The addition of stricter, explicit instrumentation improves the design, since the distinction between the use cases requiring a pin and those requiring a lock is made clearer. The added commentary is descriptive and appears grammatically correct, at least to a non-native speaker. Unfortunately, though, the two bug fixes do not seem to apply. There is also a small issue regarding the process rather than the content of the patches: in the CF app there is a latest attachment (v3-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch) which does not appear in the mail thread. Before changing the status, I kindly ask for a complete latest series, posted to the mail thread, that applies. The new status of this patch is: Waiting on Author
Re: WIP/PoC for parallel backup
On Mon, Jul 6, 2020 at 5:24 PM Daniel Gustafsson wrote: > > On 12 Jun 2020, at 19:28, Robert Haas wrote: > > > I am sure that nobody is arguing that the patch has to be beneficial > > in all cases in order to justify applying it. However, there are > > several good arguments against proceding with this patch: > > This thread has stalled with no resolution to the raised issues, and the > latest > version of the patch (v15) posted no longer applies (I only tried 0001 > which > failed, the green tick in the CFBot is due it mistakenlt thinking an > attached > report is a patch). I'm marking this patch Returned with Feedback. Please > open a new CF entry when there is a new version of the patch. > > cheers ./daniel I think this is fair. There are quite a few valid points raised by Robert. -- Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca ADDR: 10318 WHALLEY BLVD, Surrey, BC CELL:+923335449950 EMAIL: mailto:hamid.akh...@highgo.ca SKYPE: engineeredvirus
Re: Yet another fast GiST build (typo)
> 1 июля 2020 г., в 17:05, Daniel Gustafsson написал(а): > >> On 29 Feb 2020, at 18:24, Andrey M. Borodin wrote: > >> I've fixed this and added patch with GiST reloption to provide sort support >> function. > > 0002 in this patchset fails to apply, please submit a rebased version. I've > marked the entry Waiting for Author in the meantime. Thanks, Daniel! PFA v7. Best regards, Andrey Borodin. v7-0001-Add-sort-support-for-point-gist_point_sortsupport.patch Description: Binary data v7-0002-Implement-GiST-build-using-sort-support.patch Description: Binary data
Re: Binary support for pgoutput plugin
On Sun, 5 Jul 2020 at 17:28, Daniel Gustafsson wrote: > > On 5 Jul 2020, at 23:11, Alvaro Herrera > wrote: > > > > On 2020-Jul-05, Daniel Gustafsson wrote: > > > >>> On 2 Jul 2020, at 18:41, Dave Cramer wrote: > >>> > >>> rebased > >> > >> Thanks! The new version of 0001 patch has a compiler warning due to > mixed > >> declarations and code: > >> > >> worker.c: In function ‘slot_store_data’: > >> worker.c:366:5: error: ISO C90 forbids mixed declarations and code > [-Werror=declaration-after-statement] > > > > AFAICS this is fixed in 0005. > > Yes and no, 0005 fixes one such instance but the one failing the build is > another one in worker.c (the below being from 0008 which in turn change > the row > in question from previous patches): > > + int cursor = tupleData->values[remoteattnum]->cursor; > > > I'm going to suggest to use "git rebase -i" > > +1 > Strangely I don't see those errors when I build on my machine, but I will fix them as far as rebase -i do what is advised here for squashing them. Just one patch now ? Thanks,
Re: Binary support for pgoutput plugin
> On 6 Jul 2020, at 14:58, Dave Cramer wrote: > as far as rebase -i do what is advised here for squashing them. Just one > patch now ? One patch per logical change, if there are two disjoint changes in the patchset where one builds on top of the other then multiple patches are of course fine. My personal rule-of-thumb is to split it the way I envision it committed. cheers ./daniel
Re: Is it useful to record whether plans are generic or custom?
On 2020/06/11 14:59, torikoshia wrote: On 2020-06-10 18:00, Kyotaro Horiguchi wrote: + TupleDescInitEntry(tupdesc, (AttrNumber) 8, "last_plan", This could be a problem if we showed the last plan in this view. I think "last_plan_type" would be better. + if (prep_stmt->plansource->last_plan_type == PLAN_CACHE_TYPE_CUSTOM) + values[7] = CStringGetTextDatum("custom"); + else if (prep_stmt->plansource->last_plan_type == PLAN_CACHE_TYPE_GENERIC) + values[7] = CStringGetTextDatum("generic"); + else + nulls[7] = true; Using swith-case prevents future additional type (if any) from being unhandled. I think we are recommending that as a convension. Thanks for your reviewing! I've attached a patch that reflects your comments. Thanks for the patch! Here are the comments. +Number of times generic plan was choosen +Number of times custom plan was choosen Typo: "choosen" should be "chosen"? + + last_plan_type text + + +Tells the last plan type was generic or custom. If the prepared +statement has not executed yet, this field is null + Could you tell me how this information is expected to be used? I think that generic_plans and custom_plans are useful when investigating the cause of performance drop by cached plan mode. But I failed to get how much useful last_plan_type is. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
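For reference, with the patch applied the information under discussion would
be visible roughly like this (column names as in the patch; last_plan_type is
the column whose usefulness is being questioned, and the output shown is only
illustrative):

PREPARE stmt(int) AS SELECT * FROM pg_class WHERE relpages > $1;
EXECUTE stmt(10);
EXECUTE stmt(10);

SELECT name, generic_plans, custom_plans, last_plan_type
  FROM pg_prepared_statements;
--  name | generic_plans | custom_plans | last_plan_type
-- ------+---------------+--------------+----------------
--  stmt |             0 |            2 | custom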
Re: Binary support for pgoutput plugin
On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson wrote: > > On 6 Jul 2020, at 14:58, Dave Cramer wrote: > > > as far as rebase -i do what is advised here for squashing them. Just one > patch now ? > > One patch per logical change, if there are two disjoint changes in the > patchset > where one builds on top of the other then multiple patches are of course > fine. > My personal rule-of-thumb is to split it the way I envision it committed. > At this point it is the result of 3 rebases. I guess I'll have to break it up differently.. Thanks, Dave
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Thu, Jul 2, 2020 at 4:29 PM Bharath Rupireddy wrote: > > > > This is actually strange. AFAIR the code, without looking at the > > current code, when a query picks a foreign connection it checks its > > state. It's possible that the connection has not been marked bad by > > the time you fire new query. If the problem exists probably we should > > fix it anyway since the backend at the other end of the connection has > > higher chances of being killed while the connection was sitting idle > > in the cache. > > > > Thanks Ashutosh for the suggestion. One way, we could solve the above > problem is that, upon firing the new foreign query from local backend using > cached > connection, (assuming the remote backend/session that was cached in the local > backed got > killed by some means), instead of failing the query in the local > backend/session, upon > detecting error from remote backend, we could just delete the cached old > entry and try getting another > connection to remote backend/session, cache it and proceed to submit the > query. This has to happen only at > the beginning of remote xact. Yes, I believe that would be good. > > This way, instead of failing(as mentioned above " server closed the > connection unexpectedly"), > the query succeeds if the local session is able to get a new remote backend > connection. > In GetConnection() there's a comment /* * We don't check the health of cached connection here, because it would * require some overhead. Broken connection will be detected when the * connection is actually used. */ Possibly this is where you want to check the health of connection when it's being used the first time in a transaction. > I worked on a POC patch to prove the above point. Attaching the patch. > Please note that, the patch doesn't contain comments and has some issues like > having some new > variable in PGconn structure and the things like. I don't think changing the PGConn structure for this is going to help. It's a libpq construct and used by many other applications/tools other than postgres_fdw. Instead you could use ConnCacheEntry for the same. See how we track invalidated connection and reconnect upon invalidation. > > If the approach makes some sense, then I can rework properly on the patch and > probably > can open another thread for the review and other stuff. > > The way I tested the patch: > > 1. select * from foreign_tbl; > /*from local session - this results in a > remote connection being cached in > the connection cache and > a remote backend/session is opened. > */ > 2. kill the remote backend/session > 3. select * from foreign_tbl; > /*from local session - without patch > this throws error "ERROR: server closed the connection unexpectedly" > with path - try to use > the cached connection at the beginning of remote xact, upon receiving > error from remote postgres > server, instead of aborting the query, delete the cached entry, try to > get a new connection, if it > gets, cache it and use that for executing the query, query succeeds. > */ This will work. Be cognizant of the fact that the same connection may be used by multiple plan nodes. -- Best Wishes, Ashutosh Bapat
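A rough sketch of where such a check could live in postgres_fdw's
GetConnection(), following the comment quoted above. PQstatus(), PQfinish()
and connect_pg_server() exist today; the placement and the rest is
illustrative, and a remotely killed backend may only be detectable once a
command is actually sent, as discussed:

/* Inside GetConnection(), after looking up the cached ConnCacheEntry. */
if (entry->conn != NULL &&
    entry->xact_depth == 0 &&                 /* only at the start of a remote xact */
    PQstatus(entry->conn) == CONNECTION_BAD)  /* cheap client-side health check */
{
    /* Discard the stale connection and fall through to reconnect below. */
    PQfinish(entry->conn);
    entry->conn = NULL;
}

if (entry->conn == NULL)
    entry->conn = connect_pg_server(server, user);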
Re: Ideas about a better API for postgres_fdw remote estimates
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> 2. Wedging this into EXPLAIN would be quite ugly, because (at least > >> as I envision it) the output would have just about nothing to do with > >> any existing EXPLAIN output. > > > This is a better argument for not making it part of EXPLAIN, though I > > don't really feel like I've got a decent idea of what you are suggesting > > the output *would* look like, so it's difficult for me to agree (or > > disagree) about this particular point. > > Per postgres_fdw's get_remote_estimate(), the only data we use right now > is the startup_cost, total_cost, rows and width estimates from the > top-level Plan node. That's available immediately from the Plan tree, > meaning that basically *nothing* of the substantial display effort > expended by explain.c and ruleutils.c is of any value. So the level-zero The 'display effort' you're referring to, when using JSON format with explain, is basically to format the results into JSON and return them- which is what you're suggesting this mode would do anyway, no..? If the remote side 'table' is actually a view that's complicated then having a way to get just the top-level information (and excluding the rest) sounds like it'd be useful and perhaps excluding that other info doesn't really fit into EXPLAIN's mandate, but that's also much less common. > implementation of this would be to run the parser and planner, format > those four numbers into a JSON object (which would require little more > infrastructure than sprintf), and return that. Sure, we could make that > into some kind of early-exit path in explain.c, but I think it'd be a > pretty substantial wart, especially since it'd mean that none of the > other EXPLAIN options are sensible in combination with this one. That EXPLAIN has options that only make sense in combination with certain other options isn't anything new- BUFFERS makes no sense without ANALYZE, etc. > Further down the road, we might want to rethink the whole idea of > completely constructing a concrete Plan. We could get the data we need > at the list-of-Paths stage. Even more interesting, we could (with very > little more work) return data about multiple Paths, so that the client > could find out, for example, the costs of sorted and unsorted output > without paying two network round trips to discover that. That'd > definitely require changes in the core planner, since it has no API to > stop at that point. And it's even less within the charter of EXPLAIN. I have to admit that I'm not really sure how we could make it work, but having a way to get multiple paths returned by EXPLAIN would certainly be interesting to a lot of users. Certainly it's easier to see how we could get at that info in a postgres_fdw-specific function, and be able to understand how to deal with it there and what could be done, but once it's there I wonder if other tools might see that and possibly even build on it because it'd be the only way to get that kind of info, which certainly wouldn't be ideal. > I grant your point that there might be other users for this besides > postgres_fdw, but that doesn't mean it must be a core feature. 
That postgres_fdw is an extension is almost as much of a wart as anything being discussed here and suggesting that things added to postgres_fdw aren't 'core features' seems akin to ignoring the forest for the trees- consider that, today, there isn't even an option to install only the core server from the PGDG repos (at least for Debian / Ubuntu, not sure if the RPMs have caught up to that yet, but they probably should). The 'postgresql-12' .deb includes all the extensions that are part of the core git repo, because they're released and maintained just the same as the core server and, from a practical perspective, to run a decent PG system you really should have them installed, so why bother having a separate package? > >> 3. We surely would not back-patch a core change like this. OTOH, if > >> the added infrastructure is in an extension, somebody might want to > >> back-patch that (even if unofficially). > > > Since postgres_fdw is part of core and core's release cycle, and the > > packagers manage the extensions from core in a way that they have to > > match up, this argument doesn't hold any weight with me. > > Certainly only v14 (or whenever) and later postgres_fdw would be able > to *use* this data. The scenario I'm imagining is that somebody wants > to be able to use that client against an older remote server, and is > willing to install some simple extension on the remote server to do so. > Perhaps this scenario is not worth troubling over, but I don't think > it's entirely far-fetched. I definitely don't think that such an extension should be maintained outside of core, and I seriously doubt any of our packagers would be anxious to build an indepedent package for this to be usable in older servers.
Re: Cache lookup errors with functions manipulation object addresses
On 2020-Jul-06, Michael Paquier wrote: > While refreshing my mind with this code, I got surprised with the > choice of "routine" in getProcedureTypeDescription() when we need a > default object type name for an object not found, so I have switched > that to "procedure" to be more consistent. I think "routine" was more correct. We introduced that terminology to encompass both functions and procedures, back when real procedures were added. See the definitions in the glossary for example. > I have also spent some time analyzing the coverage of the patch, and > did not find any obvious holes or any remaining missing_ok paths not > covered. Some comments were also a bit weird after re-reading them, > so I tweaked a couple of places. Cool. > Attached is for now a rebased patch. If there are any comments, > please feel free. Daniel, Alvaro, does that look fine for you? I am > letting this stuff aside for a couple of days for the moment. I'm not sure that I'll have time to review this in the next couple of days, but it seemed sufficiently good when I last looked. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Proposal: Automatic partition creation
On Mon, Jul 06, 2020 at 01:45:52PM +0300, Anastasia Lubennikova wrote:
> The previous discussion of automatic partition creation [1] has addressed
> static and dynamic creation of partitions and ended up with several syntax
> proposals.
...
> where partition_auto_create_clause is
>
> CONFIGURATION [IMMEDIATE | DEFERRED] USING partition_bound_spec

> - IMMEDIATE | DEFERRED is optional, DEFERRED is not implemented yet
> I wonder, is it worth placing a stub for dynamic partitioning, or we can
> rather add these keywords later.

I understand by "deferred" you mean that the partition isn't created at the
time CREATE TABLE is run but rather deferred until needed by INSERT.

For deferred, range-partitioned tables, I think maybe what you'd want to
specify (and store) is the INTERVAL. If the table is partitioned by day, then
we'd date_trunc('day', time) and dynamically create that day. But if it was
partitioned by month, we'd create the month. I think you'd want to have an
ALTER command for that (we would use that to change tables between
daily/monthly based on their current size). That should also support setting
the MODULUS of a HASH-partitioned table, to allow changing the size of its
partitions (currently, the user would have to more or less recreate the table
and move all its data into different partitions, but that's not ideal).

I don't know if it's important for anyone, but it would be interesting to
think about supporting sub-partitioning: partitions which are themselves
partitioned, like something => something_YYYY => something_YYYY_MM =>
something_YYYY_MM_DD. You'd need to specify how to partition each layer of
the hierarchy; in the most general case, it could be a different partition
strategy at each level.

If you have a callback function for partition renaming, I think you'd want to
pass it not just the current name of the partition, but also the "VALUES"
used in partition creation. Like (2020-04-05)TO(2020-05-06). Maybe instead,
we'd allow setting a "format" to use to construct the partition name, like
"child.foo_bar_%Y_%m_%d". Ideally, the formats would be fixed-length
(zero-padded, etc), so failures with length can happen at "parse" time of the
statement and not at "run" time of the creation. You'd still have to handle
the case that the name already exists but isn't a partition (or is a
partition but doesn't handle the incoming tuple for some reason).

Also, maybe your "configuration" syntax would allow specifying other values,
maybe including a retention period (as an INTERVAL for RANGE tables). That's
useful if you had a command to PRUNE the oldest partitions, like ALTER..PRUNE.

--
Justin
Re: Ideas about a better API for postgres_fdw remote estimates
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Per postgres_fdw's get_remote_estimate(), the only data we use right now >> is the startup_cost, total_cost, rows and width estimates from the >> top-level Plan node. That's available immediately from the Plan tree, >> meaning that basically *nothing* of the substantial display effort >> expended by explain.c and ruleutils.c is of any value. So the level-zero > The 'display effort' you're referring to, when using JSON format with > explain, is basically to format the results into JSON and return them- > which is what you're suggesting this mode would do anyway, no..? Not hardly. Spend some time studying ruleutils.c sometime --- reverse-compiling a plan is *expensive*. For instance, we have to look up the names of all the operators used in the query quals, decide what needs quoting, decide what needs parenthesization, etc. There's also a fun little bit that assigns unique aliases to each table appearing in the query, which from memory is at least O(N^2) and maybe worse. (Admittedly, shipped queries are usually not so complicated that N would be large.) And by the way, we're also starting up the executor, even if you didn't say ANALYZE. A little bit of fooling with "perf" suggests that when explaining a pretty simple bitmapscan query --- I used EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995 which ought to be somewhat representative of what postgres_fdw needs --- only about half of the runtime is spent within pg_plan_query, and the other half is spent on explain.c + ruleutils.c formatting work. So while getting rid of that overhead wouldn't be an earthshattering improvement, I think it'd be worthwhile. >> Further down the road, we might want to rethink the whole idea of >> completely constructing a concrete Plan. We could get the data we need >> at the list-of-Paths stage. Even more interesting, we could (with very >> little more work) return data about multiple Paths, so that the client >> could find out, for example, the costs of sorted and unsorted output >> without paying two network round trips to discover that. > I have to admit that I'm not really sure how we could make it work, but > having a way to get multiple paths returned by EXPLAIN would certainly > be interesting to a lot of users. Certainly it's easier to see how we > could get at that info in a postgres_fdw-specific function, and be able > to understand how to deal with it there and what could be done, but once > it's there I wonder if other tools might see that and possibly even > build on it because it'd be the only way to get that kind of info, which > certainly wouldn't be ideal. Yeah, thinking about it as a function that inspects partial planner results, it might be useful for other purposes besides postgres_fdw. As I said before, I don't think this necessarily has to be bundled as part of postgres_fdw. That still doesn't make it part of EXPLAIN. > That postgres_fdw is an extension is almost as much of a wart as > anything being discussed here and suggesting that things added to > postgres_fdw aren't 'core features' seems akin to ignoring the forest > for the trees- I think we just had this discussion in another thread. The fact that postgres_fdw is an extension is a feature, not a bug, because (a) it means that somebody could implement their own version if they wanted it to act differently; and (b) it keeps us honest about whether the APIs needed by an FDW are accessible from outside core. 
I think moving postgres_fdw into core would be a large step backwards. regards, tom lane
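To picture the "level-zero" output being discussed, such a function would
need to ship back little more than the four numbers postgres_fdw already
parses out of EXPLAIN today; the function name and exact JSON shape below are
assumptions, not an existing API:

-- Hypothetical lightweight call, replacing a full EXPLAIN round trip:
SELECT pg_remote_plan_estimate($$SELECT * FROM tenk1 WHERE unique1 > 9995$$);
--  {"startup_cost": 0.00, "total_cost": 482.50, "rows": 5, "width": 244}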
bad status of FETCH PERCENT in commitfest application
Hi I checking state of https://commitfest.postgresql.org/28/2176/ It should be committed, but I don't see a commit? Regards Pavel
Re: bad status of FETCH PERCENT in commitfest application
On 2020-Jul-06, Pavel Stehule wrote: > I checking state of https://commitfest.postgresql.org/28/2176/ > > It should be committed, but I don't see a commit? I reverted it to needs-review. It was marked as committed by me, but the one I did commit by the same author in the same area is FETCH WITH TIES. The confusion is understandable. Thanks for pointing it out -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Ideas about a better API for postgres_fdw remote estimates
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > >> Per postgres_fdw's get_remote_estimate(), the only data we use right now > >> is the startup_cost, total_cost, rows and width estimates from the > >> top-level Plan node. That's available immediately from the Plan tree, > >> meaning that basically *nothing* of the substantial display effort > >> expended by explain.c and ruleutils.c is of any value. So the level-zero > > > The 'display effort' you're referring to, when using JSON format with > > explain, is basically to format the results into JSON and return them- > > which is what you're suggesting this mode would do anyway, no..? > > Not hardly. Spend some time studying ruleutils.c sometime --- > reverse-compiling a plan is *expensive*. For instance, we have to > look up the names of all the operators used in the query quals, > decide what needs quoting, decide what needs parenthesization, etc. Ah, alright, that makes more sense then. > There's also a fun little bit that assigns unique aliases to each > table appearing in the query, which from memory is at least O(N^2) > and maybe worse. (Admittedly, shipped queries are usually not so > complicated that N would be large.) And by the way, we're also > starting up the executor, even if you didn't say ANALYZE. > > A little bit of fooling with "perf" suggests that when explaining > a pretty simple bitmapscan query --- I used > EXPLAIN SELECT * FROM tenk1 WHERE unique1 > 9995 > which ought to be somewhat representative of what postgres_fdw needs > --- only about half of the runtime is spent within pg_plan_query, and > the other half is spent on explain.c + ruleutils.c formatting work. > So while getting rid of that overhead wouldn't be an earthshattering > improvement, I think it'd be worthwhile. Sure. > >> Further down the road, we might want to rethink the whole idea of > >> completely constructing a concrete Plan. We could get the data we need > >> at the list-of-Paths stage. Even more interesting, we could (with very > >> little more work) return data about multiple Paths, so that the client > >> could find out, for example, the costs of sorted and unsorted output > >> without paying two network round trips to discover that. > > > I have to admit that I'm not really sure how we could make it work, but > > having a way to get multiple paths returned by EXPLAIN would certainly > > be interesting to a lot of users. Certainly it's easier to see how we > > could get at that info in a postgres_fdw-specific function, and be able > > to understand how to deal with it there and what could be done, but once > > it's there I wonder if other tools might see that and possibly even > > build on it because it'd be the only way to get that kind of info, which > > certainly wouldn't be ideal. > > Yeah, thinking about it as a function that inspects partial planner > results, it might be useful for other purposes besides postgres_fdw. > As I said before, I don't think this necessarily has to be bundled as > part of postgres_fdw. That still doesn't make it part of EXPLAIN. Providing it as a function rather than through EXPLAIN does make a bit more sense if we're going to skip things like the lookups you mention above. I'm still inclined to have it be a part of core rather than having it as postgres_fdw though. I'm not completely against it being part of postgres_fdw... 
but I would think that would really be appropriate if it's actually using something in postgres_fdw, but if everything that it's doing is part of core and nothing related specifically to the postgres FDW, then having it as part of core makes more sense to me. Also, having it as part of core would make it more appropriate for other tools to look at and adding that kind of inspection capability for partial planner results could be very interesting for tools like pgAdmin and such. > > That postgres_fdw is an extension is almost as much of a wart as > > anything being discussed here and suggesting that things added to > > postgres_fdw aren't 'core features' seems akin to ignoring the forest > > for the trees- > > I think we just had this discussion in another thread. The fact that > postgres_fdw is an extension is a feature, not a bug, because (a) it > means that somebody could implement their own version if they wanted > it to act differently; and (b) it keeps us honest about whether the > APIs needed by an FDW are accessible from outside core. I think moving > postgres_fdw into core would be a large step backwards. I'm not looking to change it today, as that ship has sailed, but while having FDWs as a general capability that can be implemented by extensions is certainly great and I'd love to see more of that (even better would be more of those that are well maintained and cared for by this community of folks), requiring users to install an extension into every database where they want to query another PG s
Re: bad status of FETCH PERCENT in commitfest application
On Mon, Jul 6, 2020 at 17:27, Alvaro Herrera wrote: > On 2020-Jul-06, Pavel Stehule wrote: > > > I am checking the state of https://commitfest.postgresql.org/28/2176/ > > > > It should be committed, but I don't see a commit? > > I reverted it to needs-review. It was marked as committed by me, but > the one I did commit by the same author in the same area is FETCH WITH > TIES. The confusion is understandable. > ok, thank you for the info Pavel > Thanks for pointing it out. > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: min_safe_lsn column in pg_replication_slots view
On 2020-Jul-04, Amit Kapila wrote: > Fair enough. It would be good if we can come up with something better > than 'distance' for this column. Some ideas safe_wal_limit, > safe_wal_size? Hmm, I like safe_wal_size. I've been looking at this intermittently since late last week and I intend to get it done in the next couple of days. Thanks! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Global temporary tables
Hi čt 11. 6. 2020 v 4:13 odesílatel 曾文旌 napsal: > > 2020年6月9日 下午8:15,Prabhat Sahu 写道: > > > > On Wed, Apr 29, 2020 at 8:52 AM 曾文旌 wrote: > >> 2020年4月27日 下午9:48,Prabhat Sahu 写道: >> >> Thanks Wenjing, for the fix patch for previous issues. >> I have verified the issues, now those fix look good to me. >> But the below error message is confusing(for gtt2). >> >> postgres=# drop table gtt1; >> ERROR: cannot drop global temp table gtt1 when other backend attached it. >> >> postgres=# drop table gtt2; >> ERROR: cannot drop index idx2 on global temp table gtt2 when other >> backend attached it. >> >> I feel the above error message shown for "DROP TABLE gtt2;" is a bit >> confusing(looks similar to DROP INDEX gtt2;). >> If possible, can we keep the error message simple as "ERROR: cannot >> drop global temp table gtt2 when other backend attached it."? >> I mean, without giving extra information for the index attached to that >> GTT. >> >> Fixed the error message to make the expression more accurate. In v33. >> > > Thanks Wenjing. We verified your latest patch(gtt_v33) focusing on all > reported issues and they work fine. > Thanks. > -- > > > I'm very glad to hear such good news. > I am especially grateful for your professional work on GTT. > Please feel free to let me know if there is anything you think could be > improved. > > > Thanks. > > > Wenjing > this patch needs rebase Regards Pavel > With Regards, > Prabhat Kumar Sahu > EnterpriseDB: http://www.enterprisedb.com > > >
Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
On Wed, Jul 1, 2020 at 5:15 AM Bharath Rupireddy wrote: > If I understand it correctly, your suggestion is to add > keep_connection option and use that while defining the server object. > IMO having keep_connection option at the server object level may not > serve the purpose being discussed here. > For instance, let's say I create a foreign server in session 1 with > keep_connection on, and I want to use that > server object in session 2 with keep_connection off and session 3 with > keep_connection on and so on. > One way we can change the server's keep_connection option is to alter > the server object, but that's not a good choice, > as we have to alter it at the system level. > > Overall, though we define the server object in a single session, it > will be used in multiple sessions, having an > option at the per-server level would not be a good idea. You present this here as if it should be a Boolean (on or off) but I don't see why that should be the case. You can imagine trying to close connections if they have been idle for a certain length of time, or if there are more than a certain number of them, rather than (or in addition to) always/never. Which one is best, and why? I tend to think this is better as an FDW property rather than a core facility, but I'm not 100% sure of that and I think it likely depends somewhat on the answers we choose to the questions in the preceding paragraph. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
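To make the shape of the per-server idea concrete, it would look something like this (hypothetical syntax; neither option exists today, and the server name is made up):

    -- the boolean form being proposed
    ALTER SERVER foreign_pg OPTIONS (ADD keep_connection 'off');
    -- one of the alternatives raised above: close connections that have been
    -- idle too long, rather than a flat on/off switch (equally hypothetical)
    ALTER SERVER foreign_pg OPTIONS (ADD connection_idle_timeout '60s');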
Re: Allow continuations in "pg_hba.conf" files
In the attached v3, I've tried to clarify comments and doc about tokenization rules relating to comments, strings and continuations. Attached v4 improves comments & doc as suggested by Justin. -- Fabien.diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index 5cd88b462d..f1fc17935d 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -77,13 +77,16 @@ The general format of the pg_hba.conf file is a set of records, one per line. Blank lines are ignored, as is any text after the # comment character. - Records cannot be continued across lines. + Lines ending with a backslash are logically continued on the next + line, so a record can span several lines. A record is made up of a number of fields which are separated by spaces and/or tabs. Fields can contain white space if the field value is double-quoted. Quoting one of the keywords in a database, user, or address field (e.g., all or replication) makes the word lose its special meaning, and just match a database, user, or host with that name. + If a continuation occurs within quoted text or a comment, + it is continued. @@ -821,7 +824,7 @@ local db1,db2,@demodbs all md5 map-name system-username database-username - Comments and whitespace are handled in the same way as in + Comments, whitespace and continuations are handled in the same way as in pg_hba.conf. The map-name is an arbitrary name that will be used to refer to this mapping in pg_hba.conf. The other diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c index da5189a4fa..6c3b780ace 100644 --- a/src/backend/libpq/hba.c +++ b/src/backend/libpq/hba.c @@ -166,11 +166,17 @@ pg_isblank(const char c) /* * Grab one token out of the string pointed to by *lineptr. * - * Tokens are strings of non-blank - * characters bounded by blank characters, commas, beginning of line, and - * end of line. Blank means space or tab. Tokens can be delimited by - * double quotes (this allows the inclusion of blanks, but not newlines). - * Comments (started by an unquoted '#') are skipped. + * Tokens are strings of non-blank characters bounded by blank characters, + * commas, beginning of line, and end of line. Blank means space or tab. + * + * Tokens can be delimited by double quotes (this allows the inclusion of + * blanks or '#', but not newlines). If a continuation occurs within the + * quoted text, the quoted text is continued on the next line. There is no + * escape mechanism, thus character '"' cannot be part of a token. + * + * Comments (started by an unquoted '#') are skipped, i.e. the remainder + * of the line is ignored. If a line with a comment is backslash-continued, + * the continued text is part of the comment, thus ignored as well. * * The token, if any, is returned at *buf (a buffer of size bufsz), and * *lineptr is advanced past the token. @@ -486,8 +492,43 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel) char *lineptr; List *current_line = NIL; char *err_msg = NULL; + char *cur = rawline; + int len = sizeof(rawline); + int continuations = 0; - if (!fgets(rawline, sizeof(rawline), file)) + /* read input and handle simple backslash continuations */ + while ((cur = fgets(cur, len, file)) != NULL) + { + int curlen = strlen(cur); + char *curend = cur + curlen - 1; + + if (curlen == len - 1) + { +/* Line too long! 
*/ +ereport(elevel, + errcode(ERRCODE_CONFIG_FILE_ERROR), + errmsg("authentication file line too long"), + errcontext("line %d of configuration file \"%s\"", + line_number + continuations, filename)); +err_msg = "authentication file line too long"; + } + + /* Strip trailing linebreak from rawline */ + while (curend >= cur && (*curend == '\n' || *curend == '\r')) +*curend-- = '\0'; + + /* empty or not a continuation, we are done */ + if (curend < cur || *curend != '\\') +break; + + /* else we have a continuation, just remove it and loop */ + continuations++; + *curend = '\0'; + len -= (curend - cur); + cur = curend; + } + + if (cur == NULL) { int save_errno = errno; @@ -501,21 +542,6 @@ tokenize_file(const char *filename, FILE *file, List **tok_lines, int elevel) filename, strerror(save_errno)); rawline[0] = '\0'; } - if (strlen(rawline) == MAX_LINE - 1) - { - /* Line too long! */ - ereport(elevel, - (errcode(ERRCODE_CONFIG_FILE_ERROR), - errmsg("authentication file line too long"), - errcontext("line %d of configuration file \"%s\"", -line_number, filename))); - err_msg = "authentication file line too long"; - } - - /* Strip trailing linebreak from rawline */ - lineptr = rawline + strlen(rawline) - 1; - while (lineptr >= rawline && (*lineptr == '\n' || *lineptr == '\r')) - *lineptr-- = '\0'; /* Parse fields */
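To illustrate the behavior the patch documents, a record could be split like this once continuations are allowed (hypothetical entry with made-up database and user names):

    # one logical pg_hba.conf record spread over two physical lines
    host    db1,db2,db3 \
            alice       10.0.0.0/8      scram-sha-256
    # a backslash at the end of a comment continues the comment, \
    so this whole line is ignored as well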
Re: Proposal: Automatic partition creation
On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova wrote: > CREATE TABLE ... PARTITION BY partition_method (list_of_columns) > partition_auto_create_clause > > where partition_auto_create_clause is > > CONFIGURATION [IMMEDIATE| DEFERRED] USING partition_bound_spec > > and partition_bound_spec is: > > MODULUS integer | VALUES IN (expr [,...]) [, ] | INTERVAL > range_step FROM range_start TO range_end Might be good to compare this to what other databases support. > - IMMEDIATE| DEFERRED is optional, DEFERRED is not implemented yet > I wonder, is it worth placing a stub for dynamic partitioning, or we can > rather add these keywords later. I think we should not add any keywords we don't need immediately - and should seek to minimize the number of new keywords that we need to add, though compatibility with other implementations might be a good reason for accepting some new ones. > - HASH and LIST static partitioning works as expected. > Testing and feedback are welcome. > > - RANGE partitioning is not really implemented in this patch. > Now it only accepts interval data type as 'interval' and respectively > date types as range_start and range_end expressions. > Only one partition is created. I found it difficult to implement the > generation of bounds using internal functions and data types. > Both existing solutions (pg_pathman and pg_partman) rely on SQL level > routines [2]. > I am going to implement this via SPI, which allow to simplify checks and > calculations. Do you see any pitfalls in this approach? I don't really see why we need SPI here. Why can't we just try to evaluate the expression and see if we get a constant of the right type, then use that? I think the big problem here is identifying the operator to use. We have no way of identifying the "plus" or "minus" operator associated with a datatype; indeed, that constant doesn't exist. So either we (a) limit this to a short list of data types and hard-code the operators to be used (which is kind of sad given how extensible our type system is) or we (b) invent some new mechanism for identifying the +/- operators that should be used for a datatype, which was also proposed in the context of some previous discussion of window framing options, but which I don't think ever went anywhere (which is a lot of work) or we (c) just look for operators called '+' and/or '-' by operator name (which will probably make Tom throw up in his mouth a little). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
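Spelled out, the quoted grammar would allow statements along these lines (hypothetical examples of the proposed syntax; none of this is committed):

    -- static HASH partitioning, partitions created immediately
    CREATE TABLE orders (id int, created date)
        PARTITION BY HASH (id)
        CONFIGURATION IMMEDIATE USING MODULUS 4;

    -- the RANGE form under discussion, where range_step/range_start/range_end
    -- are the expressions that would have to be evaluated somehow
    CREATE TABLE events (id int, created date)
        PARTITION BY RANGE (created)
        CONFIGURATION IMMEDIATE USING INTERVAL '1 month'
            FROM '2020-01-01' TO '2021-01-01';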
Re: Proposal: Automatic partition creation
Robert Haas writes: > On Mon, Jul 6, 2020 at 6:46 AM Anastasia Lubennikova > wrote: >> I am going to implement this via SPI, which allow to simplify checks and >> calculations. Do you see any pitfalls in this approach? > I don't really see why we need SPI here. I would vote against any core facility that is implemented via SPI queries. It is just too darn hard to control the semantics completely in the face of fun stuff like varying search_path. Look at what a mess the queries generated by the RI triggers are --- and they only have a very small set of behaviors to worry about. I'm still only about 95% confident they don't have security issues, too. If you're using SPI to try to look up appropriate operators, I think the chances of being vulnerable to security problems are 100%. > I think the big problem here is identifying the operator to use. We > have no way of identifying the "plus" or "minus" operator associated > with a datatype; indeed, that constant doesn't exist. We did indeed solve this in connection with window functions, cf 0a459cec9. I may be misunderstanding what the problem is here, but I think trying to reuse that infrastructure might help. regards, tom lane
Multi-byte character case-folding
Hi, At the moment, only single-byte characters in identifiers are case-folded, and multi-byte characters are not. For example, abĉDĚF is case-folded to "abĉdĚf". This can be referred to as "abĉdĚf" or "ABĉDĚF", but not "abĉděf" or "ABĈDĚF". downcase_identifier() has the following comment: /* * SQL99 specifies Unicode-aware case normalization, which we don't yet * have the infrastructure for. Instead we use tolower() to provide a * locale-aware translation. However, there are some locales where this * is not right either (eg, Turkish may do strange things with 'i' and * 'I'). Our current compromise is to use tolower() for characters with * the high bit set, as long as they aren't part of a multi-byte * character, and use an ASCII-only downcasing for 7-bit characters. */ So my question is, do we yet have the infrastructure to make case-folding consistent across all character widths? Thanks Thom
Question: PostgreSQL on Amazon linux EC2
Hi Postgres team, I would like to know if PostgreSQL can be installed and used without any issues on Amazon Linux EC2 machines. https://www.postgresql.org/docs/11/supported-platforms.html I was going through the documentation and couldn't find very specific details related to support. Any input will be much helpful. Warm regards, Ajay
Re: Multi-byte character case-folding
Thom Brown writes: > At the moment, only single-byte characters in identifiers are > case-folded, and multi-byte characters are not. > ... > So my question is, do we yet have the infrastructure to make > case-folding consistent across all character widths? We still lack any built-in knowledge about this, and would have to rely on libc, which means the results would likely be platform-dependent and probably LC_CTYPE-dependent. More generally, I'd be mighty hesitant to change this behavior after it's stood for so many years. I suspect more people would complain that we broke their application than would be happy about it. Having said that, we are already relying on towlower() in places, and could do similarly here if we didn't care about the above issues. regards, tom lane
Re: Question: PostgreSQL on Amazon linux EC2
Em seg., 6 de jul. de 2020 às 21:55, Ajay Patel escreveu: > Hi Postgres team, > > I would like to know if PostgreSQL can be installed and used without any > issues on Amazon Linux EC2 machines. > Yes you can, but not with the repositories at yum.postgresql.org. There's a dependency on a package that only exists on RHEL or CentOS, so the install fails. > > https://www.postgresql.org/docs/11/supported-platforms.html > > I was going through the documentation and couldn't find very specific > details related to support. > > Any input will be much helpful. > You'll be able to: - compile PostgreSQL - download and install rpm packages by hand - use an Amazon-provided repo that installs more recent Postgres versions - actually not up-to-date. After struggling with the points above I decided just to use CentOS on EC2. It works perfectly and from CentOS 7 and up it is supported by AWS on all instance types and their exquisite hardware like network interfaces for EBS performance. Flavio Gurgel
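For the "compile PostgreSQL" route, the build itself is the standard source build; a minimal sketch (package names assumed to be the usual RHEL-family ones, adjust as needed):

    # build prerequisites on an RHEL-family host such as Amazon Linux
    sudo yum install -y gcc make readline-devel zlib-devel
    # unpack a source tarball from postgresql.org, then:
    ./configure --prefix=/usr/local/pgsql
    make && sudo make install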
Re: Question: PostgreSQL on Amazon linux EC2
Thank you Flavio, this is helpful. Have you faced any other challenges or any other problems after installing Postgres? Warm regards, Ajay On Mon, Jul 6, 2020 at 4:45 PM Flavio Henrique Araque Gurgel < fha...@gmail.com> wrote: > > Em seg., 6 de jul. de 2020 às 21:55, Ajay Patel > escreveu: > >> Hi Postgres team, >> >> I would like to know if PostgreSQL can be installed and used without any >> issues on Amazon Linux EC2 machines. >> > > Yes you can, but not with the repositories at yum.postgresql.org. There's > a dependency of a package that only exists on RHEL or CentOS and fail. > > >> >> https://www.postgresql.org/docs/11/supported-platforms.html >> >> I was going through the documentation and couldn't find very specific >> details related to support. >> >> Any input will be much helpful. >> > > You'll be able to : > - compile PostgreSQL > - download and install rpm packages by hand > - use a Amazon provided repo that installs more recent Postgres versions > - actually not up-to-date. > > After struggling with the points above I decided just to use CentOS on > EC2. It works perfectly and from CentOS 7 and up it is supported by AWS on > all instance types and their exquisite hardware like network interfaces for > EBS performance. > > Flavio Gurgel > > >
Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...
On 2020-Jul-05, Anna Akenteva wrote: > -- Swapping primary key's index for an equivalent index, > -- but with INCLUDE-d attributes. > CREATE UNIQUE INDEX new_idx ON target_tbl (id) INCLUDE (info); > ALTER TABLE target_tbl ALTER CONSTRAINT target_tbl_pkey USING INDEX > new_idx; > ALTER TABLE referencing_tbl ALTER CONSTRAINT referencing_tbl_id_ref_fkey > USING INDEX new_idx; How is this state represented by pg_dump? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Change a constraint's index - ALTER TABLE ... ALTER CONSTRAINT ... USING INDEX ...
Alvaro Herrera writes: > On 2020-Jul-05, Anna Akenteva wrote: >> -- Swapping primary key's index for an equivalent index, >> -- but with INCLUDE-d attributes. >> CREATE UNIQUE INDEX new_idx ON target_tbl (id) INCLUDE (info); >> ALTER TABLE target_tbl ALTER CONSTRAINT target_tbl_pkey USING INDEX >> new_idx; >> ALTER TABLE referencing_tbl ALTER CONSTRAINT referencing_tbl_id_ref_fkey >> USING INDEX new_idx; > How is this state represented by pg_dump? Even if it's possible to represent, I think we should flat out reject this "feature". Primary keys that aren't primary keys don't seem like a good idea. For one thing, it won't be possible to describe the constraint accurately in the information_schema. regards, tom lane
Re: Multi-byte character case-folding
On 2020-Jul-06, Tom Lane wrote: > More generally, I'd be mighty hesitant to change this behavior after > it's stood for so many years. I suspect more people would complain > that we broke their application than would be happy about it. > > Having said that, we are already relying on towlower() in places, > and could do similarly here if we didn't care about the above issues. I think the fact that identifiers fail to follow language-specific case folding rules is more a known gotcha than a desired property, but on principle I tend to agree that Turkish people would not be happy about the prospect of us changing the downcasing rule in a major release -- it would mean having to edit any affected application code as part of a pg_upgrade process, which is not great. Now you could say that this can be fixed by adding a GUC that preserves the old behavior, but generally we don't like that too much. The counter argument is that there are more future users than there are current users. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: posgres 12 bug (partitioned table)
Hello, ccing pgsql-hack...@postgresql.org Upon investigation, it seems that the problem is caused by the following: The slot passed to the call to ExecProcessReturning() inside ExecInsert() is often a virtual tuple table slot. If there are system columns other than ctid and tableOid referenced in the RETURNING clause (not only xmin as in the bug report), it will lead to the ERROR as mentioned in this thread as virtual tuple table slots don't really store such columns. (ctid and tableOid are present directly in the TupleTableSlot struct and can be satisfied from there: refer: slot_getsysattr())) I have attached two alternate patches to solve the problem. Both patches use and share a mechanism to detect if there are any such system columns. This is done inside ExecBuildProjectionInfo() and we store this info inside the ProjectionInfo struct. Then based on this info, system columns are populated in a suitable slot, which is then passed on to ExecProcessReturning(). (If it is deemed that this operation only be done for RETURNING, we can just as easily do it in the callsite for ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead of doing it inside ExecBuildProjectionInfo()) The first patch [1] explicitly creates a heap tuple table slot, fills in the system column values as we would do during heap_prepare_insert() and then passes that slot to ExecProcessReturning(). (We use a heap tuple table slot as it is guaranteed to support these attributes). The second patch [2] instead of relying on a heap tuple table slot, relies on ExecGetReturningSlot() for the right slot and table_tuple_fetch_row_version() to supply the system column values. It does make the assumption that the AM would supply a slot that will have these system columns. [1] v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch [2] v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch Regards, Soumyadeep (VMware) From baa48a89e571405f4cd802f065d0f82131599f53 Mon Sep 17 00:00:00 2001 From: soumyadeep2007 Date: Sun, 5 Jul 2020 10:23:30 -0700 Subject: [PATCH v1 1/1] Explicitly supply system columns for INSERT..RETURNING MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If there are system columns other than ctid and tableOid in the RETURNING clause for an INSERT, we must ensure that the slot we pass to ExecProcessReturning() has those columns (ctid and tableOid are present directly in the TupleTableSlot struct and can be satisifed from there: refer: slot_getsysattr)). This is in response to bug #16446 reported by Георгий Драк. In the bug an INSERT..RETURNING statement fails with: "ERROR: virtual tuple table slot does not have system attributes" This is caused due to the fact that ExecProcessReturning() was fed with a virtual tuple table slot. Thus the resultant expression evaluation and by extension, call to ExecEvalSysVar(), blows up with the aforementioned error. So, we fix this by: 1. Determining if there are system columns (other than tableOid and ctid) referenced in ExecBuildProjectionInfo() and we store this info inside the ProjectionInfo struct. 2. If there are such system columns in RETURNING, we explicitly create a heap tuple table slot, fill in the system column values as we would do during heap_prepare_insert() and then pass that slot to ExecProcessReturning(). We use a heap tuple table slot as it is guaranteed to support these attributes. 
--- src/backend/executor/execExpr.c| 19 ++- src/backend/executor/nodeModifyTable.c | 23 +++ src/include/nodes/execnodes.h | 2 ++ 3 files changed, 43 insertions(+), 1 deletion(-) diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 236413f62a..8cd966d227 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -360,10 +360,12 @@ ExecBuildProjectionInfo(List *targetList, ExprState *state; ExprEvalStep scratch = {0}; ListCell *lc; + List *tlist_vars; projInfo->pi_exprContext = econtext; /* We embed ExprState into ProjectionInfo instead of doing extra palloc */ projInfo->pi_state.tag = T_ExprState; + projInfo->has_non_slot_system_cols = false; state = &projInfo->pi_state; state->expr = (Expr *) targetList; state->parent = parent; @@ -455,7 +457,6 @@ ExecBuildProjectionInfo(List *targetList, */ ExecInitExprRec(tle->expr, state, &state->resvalue, &state->resnull); - /* * Column might be referenced multiple times in upper nodes, so * force value to R/O - but only if it could be an expanded datum. @@ -469,6 +470,22 @@ ExecBuildProjectionInfo(List *targetList, } } + /* Record system columns that are part of this projection */ + tlist_vars = pull_var_clause((Node *) targetList, + PVC_RECURSE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_INCLUDE_PLACEHOLDERS); + foreach(lc, tlist_vars) + { + Var *var = (Var *) lfirst
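For context, a minimal reproduction of the reported failure, going by the bug description, is something like this (table names invented; any system column other than ctid/tableOid in RETURNING triggers it):

    CREATE TABLE t (a int) PARTITION BY RANGE (a);
    CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (0) TO (100);
    INSERT INTO t VALUES (1) RETURNING xmin;
    -- ERROR:  virtual tuple table slot does not have system attributes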
Re: Binary support for pgoutput plugin
On Mon, 6 Jul 2020 at 09:35, Dave Cramer wrote: > > > On Mon, 6 Jul 2020 at 09:03, Daniel Gustafsson wrote: > >> > On 6 Jul 2020, at 14:58, Dave Cramer wrote: >> >> > as far as rebase -i do what is advised here for squashing them. Just >> one patch now ? >> >> One patch per logical change, if there are two disjoint changes in the >> patchset >> where one builds on top of the other then multiple patches are of course >> fine. >> My personal rule-of-thumb is to split it the way I envision it committed. >> > > At this point it is the result of 3 rebases. I guess I'll have to break it > up differently.. > > OK, rebased it down to 2 patches, attached. > Thanks, > > Dave > 0001-Add-binary-protocol-for-publications-and-subscriptio.patch Description: Binary data 0002-document-new-binary-option-for-CREATE-SUBSCRIPTION.patch Description: Binary data
Re: Multi-byte character case-folding
Alvaro Herrera writes: > On 2020-Jul-06, Tom Lane wrote: >> More generally, I'd be mighty hesitant to change this behavior after >> it's stood for so many years. I suspect more people would complain >> that we broke their application than would be happy about it. > I think the fact that identifiers fail to follow language-specific case > folding rules is more a known gotcha than a desired property, but on > principle I tend to agree that Turkish people would not be happy about > the prospect of us changing the downcasing rule in a major release -- it > would mean having to edit any affected application code as part of a > pg_upgrade process, which is not great. It's not just the Turks. As near as I can tell, we'd likely break *every* app that's using such identifiers. For example, supposing I do test=# create table MYÉCLASS (f1 text); CREATE TABLE test=# \dt List of relations Schema | Name | Type | Owner +--+---+-- public | myÉclass | table | postgres (1 row) pg_dump will render this as CREATE TABLE public."myÉclass" ( f1 text ); If we start to case-fold É, then the only way to access this table will be by double-quoting its name, which the application probably is not expecting (else it would have double-quoted in the original CREATE TABLE). > Now you could say that this can be fixed by adding a GUC that preserves > the old behavior, but generally we don't like that too much. Yes, a GUC changing this would be a headache. It would be just as much of a compatibility and security hazard as standard_conforming_strings (which indeed I've been thinking of proposing that we get rid of; it's hung around long enough). > The counter argument is that there are more future users than there are > current users. Especially if we drive away the current users :-(. In practice, we've heard very very few complaints about this, so my gut says to leave it alone. regards, tom lane
Re: min_safe_lsn column in pg_replication_slots view
On 2020-Jul-06, Alvaro Herrera wrote: > Hmm, I like safe_wal_size. > > I've been looking at this intermittently since late last week and I > intend to get it done in the next couple of days. I propose the attached. This is pretty much what was proposed by Kyotaro, but I made a couple of changes. Most notably, I moved the calculation to the view code itself rather than creating a function in xlog.c, mostly because it seemed to me that the new function was creating an abstraction leakage without adding any value; also, if we add per-slot size limits later, it would get worse. The other change was to report negative values when the slot becomes unreserved, rather than zero. It shows how much beyond safety your slots are getting, so it seems useful. Clamping at zero seems to serve no purpose. I also made it report null immediately when slots are in state lost. But beware of slots that appear lost but fall in the unreserved category because they advanced before checkpointer signalled them. (This case requires a debugger to hit ...) One thing that got my attention while going over this is that the error message we throw when making a slot invalid is not very helpful; it doesn't say what the current insertion LSN was at that point. Maybe we should add that? (As a separate patch, of couse.) Any more thoughts? If not, I'll get this pushed tomorrow finally. Thanks -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From bd65ed10b25429fea8776bf9bc38c82a3544a3fa Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 3 Jul 2020 17:25:00 -0400 Subject: [PATCH] Morph pg_replication_slots.min_safe_lsn to safe_wal_size MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous definition of the column almost universally disliked, so provide this updated definition which is more useful for monitoring purposes: a large positive value is good, while zero or a negative value means danger. This is very easy to monitor. FIXME -- catversion bump required Author: Kyotaro Horiguchi Author: Álvaro Herrera Reported-by: Fujii Masao Reviewed-by: XXX Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916...@oss.nttdata.com --- doc/src/sgml/catalogs.sgml| 7 ++-- src/backend/access/transam/xlog.c | 3 +- src/backend/catalog/system_views.sql | 2 +- src/backend/replication/slotfuncs.c | 42 --- src/include/catalog/pg_proc.dat | 4 +-- src/test/recovery/t/019_replslot_limit.pl | 22 ++-- src/test/regress/expected/rules.out | 4 +-- 7 files changed, 51 insertions(+), 33 deletions(-) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 003d278370..361793b337 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -11275,10 +11275,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx - min_safe_lsn pg_lsn + safe_wal_size int8 - The minimum LSN currently available for walsenders. + The number of bytes that can be written to WAL such that this slot + is not in danger of getting in state "lost". It is NULL for lost + slots, as well as if max_slot_wal_keep_size + is -1. 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index fd93bcfaeb..2f6d67592a 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9513,8 +9513,7 @@ GetWALAvailability(XLogRecPtr targetLSN) XLogSegNo targetSeg; /* segid of targetLSN */ XLogSegNo oldestSeg; /* actual oldest segid */ XLogSegNo oldestSegMaxWalSize; /* oldest segid kept by max_wal_size */ - XLogSegNo oldestSlotSeg = InvalidXLogRecPtr; /* oldest segid kept by - * slot */ + XLogSegNo oldestSlotSeg; /* oldest segid kept by slot */ uint64 keepSegs; /* diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 5314e9348f..73676d04cf 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -879,7 +879,7 @@ CREATE VIEW pg_replication_slots AS L.restart_lsn, L.confirmed_flush_lsn, L.wal_status, -L.min_safe_lsn +L.safe_wal_size FROM pg_get_replication_slots() AS L LEFT JOIN pg_database D ON (L.datoid = D.oid); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 88033a79b2..0ddf20e858 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -242,6 +242,7 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) Tuplestorestate *tupstore; MemoryContext per_query_ctx; MemoryContext oldcontext; + XLogRecPtr currlsn; int slotno; /* check to see if caller supports us returning a tuplestore */ @@ -274,6 +275,8
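With the column as defined in the patch, a monitoring check becomes a simple query of roughly this shape (sketch):

    -- flag slots that are already past the limit (negative) or NULL
    -- (lost, or max_slot_wal_keep_size = -1)
    SELECT slot_name, wal_status, safe_wal_size
      FROM pg_replication_slots
     WHERE safe_wal_size IS NULL OR safe_wal_size <= 0;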
Re: Parallel worker hangs while handling errors.
On Fri, Jul 3, 2020 at 2:40 PM vignesh C wrote: > > The Attached patch has the fix for the same. > I have added a commitfest entry for this bug: https://commitfest.postgresql.org/29/2636/ Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Resetting spilled txn statistics in pg_stat_replication
On Mon, 6 Jul 2020 at 20:45, Amit Kapila wrote: > > On Thu, Jul 2, 2020 at 9:01 AM Masahiko Sawada > wrote: > > > > I've attached PoC patch that implements a simple approach. I'd like to > > discuss how we collect the replication slot statistics in the stats > > collector before I bring the patch to completion. > > > > In this PoC patch, we have the array of PgStat_ReplSlotStats struct > > which has max_replication_slots entries. The backend and wal sender > > send the slot statistics to the stats collector when decoding a commit > > WAL record. > > > > typedef struct PgStat_ReplSlotStats > > { > > charslotname[NAMEDATALEN]; > > PgStat_Counter spill_txns; > > PgStat_Counter spill_count; > > PgStat_Counter spill_bytes; > > } PgStat_ReplSlotStats; > > > > What I'd like to discuss are: > > > > Since the unique identifier of replication slots is the name, the > > process sends slot statistics along with slot name to stats collector. > > I'm concerned about the amount of data sent to the stats collector and > > the cost of searching the statistics within the statistics array (it’s > > O(N) where N is max_replication_slots). Since the maximum length of > > slot name is NAMEDATALEN (64 bytes default) and max_replication_slots > > is unlikely a large number, I might be too worrying but it seems like > > it’s better to avoid that if we can do that easily. An idea I came up > > with is to use the index of slots (i.g., the index of > > ReplicationSlotCtl->replication_slots[]) as the index of statistics of > > slot in the stats collector. But since the index of slots could change > > after the restart we need to synchronize the index of slots on both > > array somehow. So I thought that we can determine the index of the > > statistics of slots at ReplicationSlotAcquire() or > > ReplicationSlotCreate(), but it will in turn need to read stats file > > while holding ReplicationSlotControlLock to prevent the same index of > > the statistics being used by the concurrent process who creating a > > slot. I might be missing something though. > > > > I don't think we should be bothered about the large values of > max_replication_slots. The default value is 10 and I am not sure if > users will be able to select values large enough that we should bother > about searching them by name. I think if it could turn out to be a > problem then we can try to invent something to mitigate it. Agreed. > > > Second, as long as the unique identifier is the slot name there is no > > convenient way to distinguish between the same name old and new > > replication slots, so the backend process or wal sender process sends > > a message to the stats collector to drop the replication slot at > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > table, index, and function statistics. It might not be a problem but > > I’m thinking a better way. > > > > Can we rely on message ordering in the transmission mechanism (UDP) > for stats? The wiki suggests [1] we can't. If so, then this might > not work. Yeah, I'm also concerned about this. Another idea would be to have another unique identifier to distinguish old and new replication slots with the same name. For example, creation timestamp. And then we reclaim the stats of unused slots later like table and function statistics. On the other hand, if the ordering were to be reversed, we would miss that stats but the next stat reporting would create the new entry. If the problem is unlikely to happen in common case we can live with that. 
> > The new view name is also an open question. I prefer > > pg_stat_replication_slots and to add stats of physical replication > > slots to the same view in the future, rather than a separate view. > > > > This sounds okay to me. > > > Aside from the above, this patch will change the most of the changes > > introduced by commit 9290ad198b1 and introduce new code much. I’m > > concerned whether such changes are acceptable at the time of beta 2. > > > > I think it depends on the final patch. My initial thought was that we > should do this for PG14 but if you are suggesting removing the changes > done by commit 9290ad198b1 then we need to think over it. I could > think of below options: > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > were anyway having spilling without any work in PG13 but didn’t have > stats. > b. Try to get your patch in PG13 if we can, otherwise, revert the > feature 9290ad198b1. > c. Get whatever we have in commit 9290ad198b1 for PG13 and > additionally have what we are discussing here for PG14. This means > that spilled stats at slot level will be available in PG14 via > pg_stat_replication_slots and for individual WAL senders it will be > available via pg_stat_replication both in PG13 and PG14. Even if we > can get your patch in PG13, we can still keep those in > pg_stat_replication. > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > for PG14.
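For reference, the per-slot view being proposed would be queried roughly like this (hypothetical: the view name and columns are still open questions; the column names follow the PgStat_ReplSlotStats fields shown above):

    SELECT name, spill_txns, spill_count, spill_bytes
      FROM pg_stat_replication_slots;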
Re: min_safe_lsn column in pg_replication_slots view
Thanks! At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera wrote in > On 2020-Jul-06, Alvaro Herrera wrote: > > > Hmm, I like safe_wal_size. I agree to the name, too. > > I've been looking at this intermittently since late last week and I > > intend to get it done in the next couple of days. > > I propose the attached. This is pretty much what was proposed by > Kyotaro, but I made a couple of changes. Most notably, I moved the > calculation to the view code itself rather than creating a function in > xlog.c, mostly because it seemed to me that the new function was > creating an abstraction leakage without adding any value; also, if we > add per-slot size limits later, it would get worse. I'm not sure that detailed WAL segment calculation fits slotfuncs.c but I don't object to the change. However if we do that: + /* determine how many segments slots can be kept by slots ... */ + keepSegs = max_slot_wal_keep_size_mb / (wal_segment_size / (1024 * 1024)); Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and use it intead of the bare expression? > The other change was to report negative values when the slot becomes > unreserved, rather than zero. It shows how much beyond safety your > slots are getting, so it seems useful. Clamping at zero seems to serve > no purpose. The reason for the clamping is the signedness of the values, or integral promotion. However, I believe the calculation cannot go beyond the range of signed long so the signedness conversion in the patch looks fine. > I also made it report null immediately when slots are in state lost. > But beware of slots that appear lost but fall in the unreserved category > because they advanced before checkpointer signalled them. (This case > requires a debugger to hit ...) Oh! Okay, that change seems right to me. > One thing that got my attention while going over this is that the error > message we throw when making a slot invalid is not very helpful; it > doesn't say what the current insertion LSN was at that point. Maybe we > should add that? (As a separate patch, of couse.) It sounds helpful to me. (I remember that I sometime want to see checkpoint LSNs in server log..) > Any more thoughts? If not, I'll get this pushed tomorrow finally. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: A patch for get origin from commit_ts.
>That was fast, thanks. I have not tested the patch, but there are >two things I missed a couple of hours back. Why do you need >pg_last_committed_xact_with_origin() to begin with? Wouldn't it be >more simple to just add a new column to pg_last_committed_xact() for >the replication origin? Contrary to pg_xact_commit_timestamp() that >should not be broken for compatibility reasons because it returns only >one value, we don't have this problem with pg_last_committed_xact() as >it already returns one tuple with two values. Yes, that makes sense; changed in the new patch. >+{ oid => '4179', descr => 'get commit origin of a transaction', >A second thing is that the OID of the new function should be in the >range 8000.., as per the policy introduced in commit a6417078. >src/include/catalog/unused_oids can be used to pick up a value. Thanks, very helpful information and I have followed that. Regards, Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca EMAIL: mailto:movead(dot)li(at)highgo(dot)ca get_origin_from_commit_ts_v5.patch Description: Binary data
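With the column added to pg_last_committed_xact() as suggested, usage would look roughly like this (sketch; commit timestamps must be enabled with track_commit_timestamp = on):

    -- still returns a single row, now with the replication origin as well
    SELECT * FROM pg_last_committed_xact();
    --  xid | timestamp | origin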
change a function name in a comment correctly
Hi, There is a comment where the referenced function name does not match the actual function. I've attached a patch to fix it. Please review. Regards, -- Masahiro Ikeda NTT DATA CORPORATION diff --git a/src/backend/utils/cache/relfilenodemap.c b/src/backend/utils/cache/relfilenodemap.c index 68b01ca68f..3acda32d17 100644 --- a/src/backend/utils/cache/relfilenodemap.c +++ b/src/backend/utils/cache/relfilenodemap.c @@ -82,7 +82,7 @@ RelfilenodeMapInvalidateCallback(Datum arg, Oid relid) } /* - * RelfilenodeMapInvalidateCallback + * InitializeRelfilenodeMap * Initialize cache, either on first use or after a reset. */ static void
Re: change a function name in a comment correctly
On 2020/07/07 11:50, Masahiro Ikeda wrote: Hi, There is a comment where the referenced function name does not match the actual function. I've attached a patch to fix it. Please review. Thanks for the report and patch! LGTM. I will commit this later. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: min_safe_lsn column in pg_replication_slots view
On 2020-Jul-07, Kyotaro Horiguchi wrote: > At Mon, 6 Jul 2020 20:54:36 -0400, Alvaro Herrera > wrote in > > I propose the attached. This is pretty much what was proposed by > > Kyotaro, but I made a couple of changes. Most notably, I moved the > > calculation to the view code itself rather than creating a function in > > xlog.c, mostly because it seemed to me that the new function was > > creating an abstraction leakage without adding any value; also, if we > > add per-slot size limits later, it would get worse. > > I'm not sure that detailed WAL segment calculation fits slotfuncs.c > but I don't object to the change. However if we do that: > > + /* determine how many segments slots can be kept by > slots ... */ > + keepSegs = max_slot_wal_keep_size_mb / > (wal_segment_size / (1024 * 1024)); > > Couldn't we move ConvertToXSegs from xlog.c to xlog_ingernals.h and > use it intead of the bare expression? I was of two minds about that, and the only reason I didn't do it is that we'll need to give it a better name if we do it ... I'm open to suggestions. > > The other change was to report negative values when the slot becomes > > unreserved, rather than zero. It shows how much beyond safety your > > slots are getting, so it seems useful. Clamping at zero seems to serve > > no purpose. > > The reason for the clamping is the signedness of the values, or > integral promotion. However, I believe the calculation cannot go > beyond the range of signed long so the signedness conversion in the > patch looks fine. Yeah, I think the negative values are useful to see. I think if you ever get close to 2^62, you're in much more serious trouble anyway :-) But I don't deny that the math there could be subject of overflow issues. If you want to verify, please be my guest ... > > One thing that got my attention while going over this is that the error > > message we throw when making a slot invalid is not very helpful; it > > doesn't say what the current insertion LSN was at that point. Maybe we > > should add that? (As a separate patch, of couse.) > > It sounds helpful to me. (I remember that I sometime want to see > checkpoint LSNs in server log..) Hmm, ... let's do that for pg14! Thanks for looking, -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Resetting spilled txn statistics in pg_stat_replication
On Tue, Jul 7, 2020 at 7:07 AM Masahiko Sawada wrote: > > On Mon, 6 Jul 2020 at 20:45, Amit Kapila wrote: > > > > > Second, as long as the unique identifier is the slot name there is no > > > convenient way to distinguish between the same name old and new > > > replication slots, so the backend process or wal sender process sends > > > a message to the stats collector to drop the replication slot at > > > ReplicationSlotDropPtr(). This strategy differs from what we do for > > > table, index, and function statistics. It might not be a problem but > > > I’m thinking a better way. > > > > > > > Can we rely on message ordering in the transmission mechanism (UDP) > > for stats? The wiki suggests [1] we can't. If so, then this might > > not work. > > Yeah, I'm also concerned about this. Another idea would be to have > another unique identifier to distinguish old and new replication slots > with the same name. For example, creation timestamp. And then we > reclaim the stats of unused slots later like table and function > statistics. > So, we need to have 'creation timestamp' as persistent data for slots to achieve this? I am not sure of adding creation_time as a parameter to identify for this case because users can change timings on systems so it might not be a bullet-proof method but I agree that it can work in general. > On the other hand, if the ordering were to be reversed, we would miss > that stats but the next stat reporting would create the new entry. If > the problem is unlikely to happen in common case we can live with > that. > Yeah, that is a valid point and I think otherwise also some UDP packets can be lost so maybe we don't need to worry too much about this. I guess we can add a comment in the code for such a case. > > > > > Aside from the above, this patch will change the most of the changes > > > introduced by commit 9290ad198b1 and introduce new code much. I’m > > > concerned whether such changes are acceptable at the time of beta 2. > > > > > > > I think it depends on the final patch. My initial thought was that we > > should do this for PG14 but if you are suggesting removing the changes > > done by commit 9290ad198b1 then we need to think over it. I could > > think of below options: > > a. Revert 9290ad198b1 and introduce stats for spilling in PG14. We > > were anyway having spilling without any work in PG13 but didn’t have > > stats. > > b. Try to get your patch in PG13 if we can, otherwise, revert the > > feature 9290ad198b1. > > c. Get whatever we have in commit 9290ad198b1 for PG13 and > > additionally have what we are discussing here for PG14. This means > > that spilled stats at slot level will be available in PG14 via > > pg_stat_replication_slots and for individual WAL senders it will be > > available via pg_stat_replication both in PG13 and PG14. Even if we > > can get your patch in PG13, we can still keep those in > > pg_stat_replication. > > d. Get whatever we have in commit 9290ad198b1 for PG13 and change it > > for PG14. I don't think this will be a popular approach. > > I was thinking option (a) or (b). I'm inclined to option (a) since the > PoC patch added a certain amount of new codes. I agree with you that > it depends on the final patch. > Magnus, Tomas, others, do you have any suggestions on the above options or let us know if you have any other option in mind? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Implementing Incremental View Maintenance
Thanks for the patch! > Query checks for following restrictions are added: Are all known supported cases listed below? > - inheritance parent table > ... > - targetlist containing IVM column > - simple subquery is only supported > How should I understand the 3 items above? - Best Regards Andy Fan
Re: pg_resetwal --next-transaction-id may cause database failed to restart.
>ISTM that a reasonable compromise is that if you use -x (or -c, -m, -O) >and the input value is outside the range supported by existing files, >then it's a fatal error; unless you use --force, which turns it into >just a warning. I do not think '--force' is a good choice, so I added a '--test, -t' option to force writing an unsafe value to pg_control. Do you think this is an acceptable approach? Regards, Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca EMAIL: mailto:movead(dot)li(at)highgo(dot)ca pg_resetwal_transaction_limit_v2.patch Description: Binary data
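The proposed usage would be roughly the following (hypothetical; -t/--test is the new switch in the attached patch, and the xid value is only an example):

    # without -t: pg_resetwal refuses an out-of-range value with a fatal error
    # with -t:    the unsafe value is written to pg_control anyway
    pg_resetwal -x 1000000 --test $PGDATA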
Re: Default setting for enable_hashagg_disk (hash_mem)
On Mon, 2020-07-06 at 15:59 +0530, Amit Kapila wrote: > I agree that it's good to wait for actual problems. But the > > challenge > > is that we can't backport an added GUC. > > > > Is it because we won't be able to edit existing postgresql.conf file > or for some other reasons? Perhaps "can't" was too strong of a word, but I think it would be unprecedented to introduce a GUC in a minor version. It could be a source of confusion. If others think that adding a GUC in a minor version would be acceptable, please let me know. Regards, Jeff Davis
Re: Default setting for enable_hashagg_disk (hash_mem)
On Sun, 2020-07-05 at 16:47 -0700, Peter Geoghegan wrote: > Where does that leave the hash_mem idea (or some other similar > proposal)? hash_mem is acceptable to me if the consensus is moving toward that, but I'm not excited about it. It would be one thing if hash_mem was a nice clean solution. But it doesn't seem like a clean solution to me; and it's likely that it will get in the way of the next person who tries to improve the work_mem situation. > I think that we should offer something like hash_mem that can work as > a multiple of work_mem, for the reason that Justin mentioned > recently. > This can be justified as something that more or less maintains some > kind of continuity with the old design. Didn't Justin argue against using a multiplier? https://postgr.es/m/20200703024649.gj4...@telsasoft.com > I think that it should affect hash join too, though I suppose that > that part might be controversial -- that is certainly more than an > escape hatch for this particular problem. Any thoughts on that? If it's called hash_mem, then I guess it needs to affect HJ. If not, it should have a different name. > There are several reasons to get rid of work_mem entirely in the > medium to long term. Some relatively obvious, others less so. Agreed. It seems like the only argument against the escape hatch GUCs is that they are cruft and we will end up stuck with them. But if we are dispensing with work_mem in a few releases, surely we'd need to dispense with hash_mem or the proposed escape-hatch GUCs anyway. > An example in the latter category is "hash teams" [1]: a design that > teaches multiple hash operations (e.g. a hash join and a hash > aggregate that hash on the same columns) to cooperate in processing > their inputs. Cool! It would certainly be nice to share the partitioning work between a HashAgg and a HJ. Regards, Jeff Davis
Re: Default setting for enable_hashagg_disk (hash_mem)
On Mon, Jul 06, 2020 at 09:57:08PM -0700, Jeff Davis wrote: > > I think that we should offer something like hash_mem that can work as > > a multiple of work_mem, for the reason that Justin mentioned > > recently. > > This can be justified as something that more or less maintains some > > kind of continuity with the old design. > > Didn't Justin argue against using a multiplier? > https://postgr.es/m/20200703024649.gj4...@telsasoft.com I recanted. https://www.postgresql.org/message-id/20200703145620.gk4...@telsasoft.com -- Justin
Re: A patch for get origin from commit_ts.
On Tue, Jul 07, 2020 at 10:02:29AM +0800, movead...@highgo.ca wrote: > Thanks, very helpful information and I have followed that. Cool, thanks. I have gone through your patch in details, and updated it as the attached. Here are some comments. '8000' as OID for the new function was not really random, so to be fair with the other patches, I picked up the first random value unused_oids has given me instead. There were some indentation issues, and pgindent got that fixed. I think that it is better to use "replication origin" in the docs instead of just origin. I have kept "origin" in the functions for now as that sounded cleaner to me, but we may consider using something like "reporigin" as well as attribute name. The tests could just use tstzrange() to make sure that the timestamps have valid values, so I have switched to that, and did not resist to do the same in the existing tests. +-- Test when it can not find the transaction +SELECT * FROM pg_xact_commit_timestamp_origin((:'txid_set_origin'::text::int + 10)::text::xid) x; This test could become unstable, particularly if it gets used in a parallel environment, so I have removed it. Perhaps I am just over-pessimistic here though.. As a side note, I think that we could just remove the alternate output of commit_ts/, as it does not really get used because of the NO_INSTALLCHECK present in the module's Makefile. That would be the job of a different patch, so I have updated it accordingly. Glad to see that you did not forget to adapt it in your own patch. (The change in catversion.h is a self-reminder...) -- Michael diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h index 54518cd40e..ee58586569 100644 --- a/src/include/catalog/catversion.h +++ b/src/include/catalog/catversion.h @@ -53,6 +53,6 @@ */ /* mmddN */ -#define CATALOG_VERSION_NO 202007061 +#define CATALOG_VERSION_NO 202007121 #endif diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 38295aca48..585823576a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5946,12 +5946,20 @@ prorettype => 'timestamptz', proargtypes => 'xid', prosrc => 'pg_xact_commit_timestamp' }, +{ oid => '8456', + descr => 'get commit timestamp and replication origin of a transaction', + proname => 'pg_xact_commit_timestamp_origin', provolatile => 'v', + prorettype => 'record', proargtypes => 'xid', + proallargtypes => '{xid,timestamptz,oid}', proargmodes => '{i,o,o}', + proargnames => '{xid,timestamp,origin}', + prosrc => 'pg_xact_commit_timestamp_origin' }, + { oid => '3583', - descr => 'get transaction Id and commit timestamp of latest transaction commit', + descr => 'get transaction Id, commit timestamp and replication origin of latest transaction commit', proname => 'pg_last_committed_xact', provolatile => 'v', prorettype => 'record', proargtypes => '', - proallargtypes => '{xid,timestamptz}', proargmodes => '{o,o}', - proargnames => '{xid,timestamp}', prosrc => 'pg_last_committed_xact' }, + proallargtypes => '{xid,timestamptz,oid}', proargmodes => '{o,o,o}', + proargnames => '{xid,timestamp,origin}', prosrc => 'pg_last_committed_xact' }, { oid => '3537', descr => 'get identification of SQL object', proname => 'pg_describe_object', provolatile => 's', prorettype => 'text', diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index 9cdb136435..e48f88c884 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -417,28 +417,38 @@ 
pg_xact_commit_timestamp(PG_FUNCTION_ARGS) } +/* + * pg_last_committed_xact + * + * SQL-callable wrapper to obtain some information about the latest + * committed transaction: transaction ID, timestamp and replication + * origin. + */ Datum pg_last_committed_xact(PG_FUNCTION_ARGS) { TransactionId xid; + RepOriginId nodeid; TimestampTz ts; - Datum values[2]; - bool nulls[2]; + Datum values[3]; + bool nulls[3]; TupleDesc tupdesc; HeapTuple htup; /* and construct a tuple with our data */ - xid = GetLatestCommitTsData(&ts, NULL); + xid = GetLatestCommitTsData(&ts, &nodeid); /* * Construct a tuple descriptor for the result row. This must match this * function's pg_proc entry! */ - tupdesc = CreateTemplateTupleDesc(2); + tupdesc = CreateTemplateTupleDesc(3); TupleDescInitEntry(tupdesc, (AttrNumber) 1, "xid", XIDOID, -1, 0); TupleDescInitEntry(tupdesc, (AttrNumber) 2, "timestamp", TIMESTAMPTZOID, -1, 0); + TupleDescInitEntry(tupdesc, (AttrNumber) 3, "origin", + OIDOID, -1, 0); tupdesc = BlessTupleDesc(tupdesc); if (!TransactionIdIsNormal(xid)) @@ -452,6 +462,9 @@ pg_last_committed_xact(PG_FUNCTION_ARGS) values[1] = TimestampTzGetDatum(ts); nulls[1] = false; + + values[2] = ObjectIdGetDatum(nodeid); + nulls[2] = false; } htup = heap_form_tuple(tupdesc, values, nulls); @@ -459,6 +47
Re: Cache lookup errors with functions manipulation object addresses
On Mon, Jul 06, 2020 at 10:56:53AM -0400, Alvaro Herrera wrote: > I think "routine" was more correct. We introduced that terminology to > encompass both functions and procedures, back when real procedures were > added. See the definitions in the glossary for example. Okay, fine by me, while we have getProcedureTypeDescription() working on pg_proc entries ;) > I'm not sure that I'll have time to review this in the next couple of > days, but it seemed sufficiently good when I last looked. Thanks. I'll try to come back to it next week or the one after for another round of self-review, and I'll see what to do at this point. This has been around for three years, it can still wait a bit. -- Michael signature.asc Description: PGP signature
Re: Parallel copy
On Wed, Jun 24, 2020 at 1:41 PM Bharath Rupireddy wrote: > > Along with the review comments addressed > patch(0006-Parallel-Copy-For-Binary-Format-Files.patch) also attaching > all other latest series of patches(0001 to 0005) from [1], the order > of applying patches is from 0001 to 0006. > > [1] > https://www.postgresql.org/message-id/CALDaNm0H3N9gK7CMheoaXkO99g%3DuAPA93nSZXu0xDarPyPY6sg%40mail.gmail.com > Some comments: + movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index; + + cstate->pcdata->curr_data_block->skip_bytes = movebytes; + + data_block = &pcshared_info->data_blocks[block_pos]; + + if (movebytes > 0) + memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], + movebytes); we can create a local variable and use in place of cstate->pcdata->curr_data_block. + if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1)) + AdjustFieldInfo(cstate, 1); + + memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count)); Should this be like below, as the remaining size can fit in current block: if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE) + if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1)) + { + AdjustFieldInfo(cstate, 2); + *new_block_pos = pcshared_info->cur_block_pos; + } Same like above. + movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index; + + cstate->pcdata->curr_data_block->skip_bytes = movebytes; + + data_block = &pcshared_info->data_blocks[block_pos]; + + if (movebytes > 0) Instead of the above check, we can have an assert check for movebytes. + if (mode == 1) + { + cstate->pcdata->curr_data_block = data_block; + cstate->raw_buf_index = 0; + } + else if(mode == 2) + { + ParallelCopyDataBlock *prev_data_block = NULL; + prev_data_block = cstate->pcdata->curr_data_block; + prev_data_block->following_block = block_pos; + cstate->pcdata->curr_data_block = data_block; + + if (prev_data_block->curr_blk_completed == false) + prev_data_block->curr_blk_completed = true; + + cstate->raw_buf_index = 0; + } This code is common for both, keep in common flow and remove if (mode == 1) cstate->pcdata->curr_data_block = data_block; cstate->raw_buf_index = 0; +#define CHECK_FIELD_COUNT \ +{\ + if (fld_count == -1) \ + { \ + if (IsParallelCopy() && \ + !IsLeader()) \ + return true; \ + else if (IsParallelCopy() && \ + IsLeader()) \ + { \ + if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \ + ereport(ERROR, \ + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \ + errmsg("received copy data after EOF marker"))); \ + return true; \ + } \ We only copy sizeof(fld_count), Shouldn't we check fld_count != cstate->max_fields? Am I missing something here? 
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1)) + { + AdjustFieldInfo(cstate, 2); + *new_block_pos = pcshared_info->cur_block_pos; + } + + memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size)); + + cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size); + + fld_size = (int32) pg_ntoh32(fld_size); + + if (fld_size == 0) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), +errmsg("unexpected EOF in COPY data"))); + + if (fld_size < -1) + ereport(ERROR, + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), +errmsg("invalid field size"))); + + if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size) + { + cstate->raw_buf_index = cstate->raw_buf_index + fld_size; + } We can keep the check like cstate->raw_buf_index + fld_size < ..., for better readability and consistency. +static pg_attribute_always_inline void +CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo, + Oid typioparam, int32 typmod, uint32 *new_block_pos, + int m, ParallelCopyTupleInfo *tuple_start_info_ptr, + ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size) flinfo, typioparam & typmod is not used, we can remove the parameter. +static pg_attribute_always_inline void +CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo, + Oid typioparam, int32 typmod, uint32 *new_block_p
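To make the restructuring suggested above for the mode == 1 / mode == 2 branches concrete, the common flow could look like the following sketch (a fragment against the quoted patch code only, reusing its variables; not compilable on its own):

/*
 * Sketch: perform the mode-specific linking first, then fall through to
 * the assignments shared by both modes, removing the duplicated
 * curr_data_block/raw_buf_index updates from the quoted branches.
 */
if (mode == 2)
{
	ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;

	/* Link the previous block to its successor and mark it complete. */
	prev_data_block->following_block = block_pos;
	prev_data_block->curr_blk_completed = true;
}

/* Common to both modes. */
cstate->pcdata->curr_data_block = data_block;
cstate->raw_buf_index = 0;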
Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callback
Hi, I found an issue while executing a backup use case (please see [1] for the queries) on PostgreSQL version 12. First, pg_start_backup registers nonexclusive_base_backup_cleanup as a before_shmem_exit callback, which adds this function to the before_shmem_exit_list array; it is supposed to be removed by pg_stop_backup so that we can do the pending cleanup and issue a warning for each pg_start_backup for which pg_stop_backup was not called. Next, I executed a query for which JIT is picked up, and the LLVM compiler inserts its own exit callback, llvm_shutdown, at the end of the before_shmem_exit_list array. Then I executed pg_stop_backup, and a call to cancel_before_shmem_exit() is made with the expectation that the nonexclusive_base_backup_cleanup callback is removed from the before_shmem_exit_list array. Since cancel_before_shmem_exit() only checks the last entry in the before_shmem_exit_list array, which is now the LLVM compiler's exit callback, we exit cancel_before_shmem_exit() without removing the intended nonexclusive_base_backup_cleanup callback. That callback stays in before_shmem_exit_list and gets executed at session exit, throwing the warning "aborting backup due to backend exiting before pg_stop_backup was called", which is unintended. Attached is a patch that fixes the above problem by making cancel_before_shmem_exit() look for the given function (and the given arg) in the entire before_shmem_exit_list array, not just the last entry, starting from the last entry. I request the community to take this patch for review for v12. Having said that, the above-mentioned problem for the backup use case does not occur on v13 and later versions of Postgres (please see the description in [2]), but these kinds of issues can arise if cancel_before_shmem_exit() is left to look only at the last array entry while removing a registered callback. There is also a comment in the cancel_before_shmem_exit() function description: "For simplicity, only the latest entry can be removed. (We could work harder but there is no need for current uses.)" Since we are starting to find use cases now, there is a need to enhance cancel_before_shmem_exit(), so I also propose the same attached patch for v13 and later versions. Thoughts?
[1] CREATE TABLE t1 (id SERIAL); INSERT INTO t1 (id) SELECT * FROM generate_series(1, 2000); SELECT * FROM pg_start_backup('label', false, false); /* JIT is enabled in my session and it is picked up by the query below */ EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT COUNT(*) FROM t1; SELECT * FROM pg_stop_backup(false, true);
[2] For v13 and later versions, start_backup first registers do_pg_abort_backup and then pg_start_backup_callback, performs the startup backup operations, and unregisters only pg_start_backup_callback from before_shmem_exit_list, retaining do_pg_abort_backup in the list to be called on session exit. The JIT compiler inserts its own exit callback at the end of the before_shmem_exit_list array. stop_backup registers pg_stop_backup_callback, performs the stop operations, unregisters pg_stop_backup_callback from before_shmem_exit_list, and sets sessionBackupState = SESSION_BACKUP_NONE; note that the do_pg_abort_backup callback registered by the start_backup command still exists in before_shmem_exit_list and is not removed by stop_backup. On session exit, do_pg_abort_backup gets called but returns without performing any operations (not even throwing a warning), by checking sessionBackupState, which was set to SESSION_BACKUP_NONE by stop_backup.
With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v1-0001-Modify-cancel_before_shmem_exit-to-search-all-reg.patch Description: Binary data
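The approach described above, scanning the whole before_shmem_exit_list from the end instead of only its last slot, might look roughly like the sketch below. This is an illustration of the idea rather than the attached patch, and the array and struct fields are only approximated from src/backend/storage/ipc/ipc.c.

/*
 * Sketch: remove the latest registration matching both function and arg,
 * wherever it sits in before_shmem_exit_list, instead of only checking
 * the final entry.
 */
void
cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
{
	int			i;

	for (i = before_shmem_exit_index - 1; i >= 0; i--)
	{
		if (before_shmem_exit_list[i].function == function &&
			before_shmem_exit_list[i].arg == arg)
		{
			/* Close the gap so the list stays contiguous. */
			memmove(&before_shmem_exit_list[i],
					&before_shmem_exit_list[i + 1],
					(before_shmem_exit_index - i - 1) *
					sizeof(before_shmem_exit_list[0]));
			--before_shmem_exit_index;
			break;
		}
	}
}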
Re: Global snapshots
On Fri, Jul 3, 2020 at 12:18 PM Masahiko Sawada wrote: > > On Sat, 20 Jun 2020 at 21:21, Amit Kapila wrote: > > > > On Fri, Jun 19, 2020 at 1:42 PM Andrey V. Lepikhov > > wrote: > > > > >Also, can you let us know if this > > > > supports 2PC in some way and if so how is it different from what the > > > > other thread on the same topic [1] is trying to achieve? > > > Yes, the patch '0003-postgres_fdw-support-for-global-snapshots' contains > > > 2PC machinery. Now I'd not judge which approach is better. > > > > > > > Sorry for being late. > No problem, your summarization, and comparisons of both approaches are quite helpful. > > I studied this patch and did a simple comparison between this patch > (0002 patch) and my 2PC patch. > > In terms of atomic commit, the features that are not implemented in > this patch but in the 2PC patch are: > > * Crash safe. > * PREPARE TRANSACTION command support. > * Query cancel during waiting for the commit. > * Automatically in-doubt transaction resolution. > > On the other hand, the feature that is implemented in this patch but > not in the 2PC patch is: > > * Executing PREPARE TRANSACTION (and other commands) in parallel > > When the 2PC patch was proposed, IIRC it was like this patch (0002 > patch). I mean, it changed only postgres_fdw to support 2PC. But after > discussion, we changed the approach to have the core manage foreign > transaction for crash-safe. From my perspective, this patch has a > minimum implementation of 2PC to work the global snapshot feature and > has some missing features important for supporting crash-safe atomic > commit. So I personally think we should consider how to integrate this > global snapshot feature with the 2PC patch, rather than improving this > patch if we want crash-safe atomic commit. > Okay, but isn't there some advantage with this approach (manage 2PC at postgres_fdw level) as well which is that any node will be capable of handling global transactions rather than doing them via central coordinator? I mean any node can do writes or reads rather than probably routing them (at least writes) via coordinator node. Now, I agree that even if this advantage is there in the current approach, we can't lose the crash-safety aspect of other approach. Will you be able to summarize what was the problem w.r.t crash-safety and how your patch has dealt it? > Looking at the commit procedure with this patch: > > When starting a new transaction on a foreign server, postgres_fdw > executes pg_global_snapshot_import() to import the global snapshot. > After some work, in pre-commit phase we do: > > 1. generate global transaction id, say 'gid' > 2. execute PREPARE TRANSACTION 'gid' on all participants. > 3. prepare global snapshot locally, if the local node also involves > the transaction > 4. execute pg_global_snapshot_prepare('gid') for all participants > > During step 2 to 4, we calculate the maximum CSN from the CSNs > returned from each pg_global_snapshot_prepare() executions. > > 5. assign global snapshot locally, if the local node also involves the > transaction > 6. execute pg_global_snapshot_assign('gid', max-csn) on all participants. > > Then, we commit locally (i.g. mark the current transaction as > committed in clog). > > After that, in post-commit phase, execute COMMIT PREPARED 'gid' on all > participants. > As per my current understanding, the overall idea is as follows. 
For global transactions, pg_global_snapshot_prepare('gid') will set the transaction status to InDoubt and generate a CSN (let's call it NodeCSN) at the node where that function is executed; it also returns the NodeCSN to the coordinator. Then the coordinator (the current postgres_fdw node on which the write transaction is being executed) computes MaxCSN based on the return values (NodeCSN) of prepare (pg_global_snapshot_prepare) from all nodes. It then assigns MaxCSN to each node. Finally, when COMMIT PREPARED is issued for each node, that MaxCSN will be written on each node, including the current one. So, with this idea, each node will have the same view of the CSN value corresponding to any particular transaction. For snapshot management, the node which receives the query generates a CSN (CurrentCSN) and follows the simple rule that a tuple having an xid with a CSN less than CurrentCSN is visible. Now, it is possible that when we are examining a tuple, the CSN corresponding to the xid that wrote the tuple is still InDoubt, which indicates that the transaction is not yet committed on all nodes. In that case we wait until we get a valid CSN value corresponding to the xid and then use it to check whether the tuple is visible. One thing to note here is that for global transactions we primarily rely on the CSN value corresponding to a transaction for its visibility, even though we still maintain CLOG for local transaction status. Leaving aside the incomplete parts and/or flaws of the current patch, does the above match the top-level idea of this patch? I am not s
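A compact sketch of the two mechanisms summarized above, the coordinator-side commit sequence and the CSN-based visibility rule; every helper name here (ParticipantConn, RemotePrepare, RemoteAssign, RemoteCommitPrepared, GetTransactionCSN, WaitForTransactionCSN, InDoubtCSN) is a placeholder, not a function or type from the patch.

typedef uint64_t CSN;			/* placeholder for the patch's CSN type */

/*
 * Coordinator-side commit sequence as summarized above (sketch only).
 * The Remote* helpers stand in for however postgres_fdw runs PREPARE
 * TRANSACTION, pg_global_snapshot_assign() and COMMIT PREPARED on each
 * participant.
 */
static void
global_commit_sketch(const char *gid, ParticipantConn *participants, int nparticipants)
{
	CSN			max_csn = 0;

	for (int i = 0; i < nparticipants; i++)
	{
		CSN			node_csn = RemotePrepare(&participants[i], gid);

		if (node_csn > max_csn)
			max_csn = node_csn;
	}

	for (int i = 0; i < nparticipants; i++)
		RemoteAssign(&participants[i], gid, max_csn);

	/* ... commit locally, then, in the post-commit phase ... */
	for (int i = 0; i < nparticipants; i++)
		RemoteCommitPrepared(&participants[i], gid);
}

/*
 * Visibility rule as summarized above (sketch only): a tuple written by
 * xid is visible when that transaction's CSN is known and smaller than
 * the snapshot's CSN; an InDoubt CSN means waiting until the distributed
 * commit is resolved.
 */
static bool
tuple_visible_sketch(TransactionId xid, CSN current_csn)
{
	CSN			xid_csn = GetTransactionCSN(xid);

	while (xid_csn == InDoubtCSN)
		xid_csn = WaitForTransactionCSN(xid);

	return xid_csn < current_csn;
}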
Re: Cache lookup errors with functions manipulation object addresses
> On 6 Jul 2020, at 09:45, Michael Paquier wrote: > Attached is for now a rebased patch. If there are any comments, > please feel free. Daniel, Alvaro, does that look fine for you? I am > letting this stuff aside for a couple of days for the moment. Reading through I have no comments or objections, LGTM. I didn't try to apply and run tests, but the CFBot is happy about it so I doubt I would find anything different there. cheers ./daniel