Re: RTLD_GLOBAL (& JIT inlining)
At Thu, 24 Jan 2019 01:02:14 -0500, Tom Lane wrote in <24744.1548309...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI writes:
> > With the attached patch, external modules are loaded RTLD_LOCAL
> > by default. PG_MODULE_EXPORT_SYMBOL in a module source files let
> > it loaded RTLD_GLOBAL. I suppose this is the change with the
> > least impact on existing modules because I believe most of them
> > don't need to export symbols.
>
> This seems really hacky :-(
>
> In the PL cases, at least, it's not so much the PL itself that
> wants to export symbols as the underlying language library
> (libpython, libperl, etc). You're assuming that an action on
> the PL .so will propagate to the underlying language .so, which

(If I understand you correctly) I'm not assuming that the propagation happens, nor the contrary. As the fact that it actually works shows, symbols of the underlying libraries referred to from pl*.so are resolved using the dependency list in pl*.so itself, independently of the RTLD flags given for it. The problem here is that while the underlying language libraries cannot refer to symbols back in the pl*.so when it is loaded with RTLD_LOCAL, RTLD_GLOBAL makes some extensions unhappy. I think the assumption that RTLD_GLOBAL resolves everything has proved rather shaky. RTLD_DEEPBIND is not available on some platforms, so we need to choose between LOCAL and GLOBAL per loaded module.

> seems like a pretty shaky assumption. I wonder also whether
> closing and reopening the DL handle will really do anything
> at all for already-loaded libraries ...

internal_load_library() avoids duplicate loading, so I believe that cannot happen. Regarding duplicate dlopen, RTLD_GLOBAL just promotes RTLD_LOCAL; in other words, the former wins. So it's not a problem if the dlclose() there did nothing. We could use RTLD_NOLOAD on almost all platforms, but the win32 implementation needs an additional change to make it work as expected. If what you call hacky is that it uses only the symbol of an otherwise-unused function, we could actually call it to get the flag.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
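To make the symbol-visibility question above concrete, here is a minimal standalone sketch (not PostgreSQL code; the module path and symbol name are placeholders) showing how the RTLD flag chosen for a module decides whether symbols it exports are visible through the global namespace to anything loaded later:

/*
 * Sketch only: RTLD_LOCAL vs. RTLD_GLOBAL symbol visibility.
 * "plfoo.so" and "plfoo_handler" are made-up names.
 * Build with: cc -o rtld_demo rtld_demo.c -ldl
 */
#define _GNU_SOURCE				/* for RTLD_DEFAULT on glibc */
#include <dlfcn.h>
#include <stdio.h>

int
main(void)
{
	/* Load the module the way the proposed default would: RTLD_LOCAL. */
	void	   *mod = dlopen("./plfoo.so", RTLD_NOW | RTLD_LOCAL);

	if (mod == NULL)
	{
		fprintf(stderr, "dlopen failed: %s\n", dlerror());
		return 1;
	}

	/* The module's symbols are reachable through its own handle either way. */
	printf("via handle:       %p\n", dlsym(mod, "plfoo_handler"));

	/*
	 * But they are not in the global namespace, so anything loaded later
	 * (for example a language's own C extension modules) cannot resolve
	 * references to plfoo.so through the global scope.  With RTLD_GLOBAL
	 * instead of RTLD_LOCAL, the lookup below would succeed.
	 */
	printf("via RTLD_DEFAULT: %p\n", dlsym(RTLD_DEFAULT, "plfoo_handler"));

	dlclose(mod);
	return 0;
}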
Re: Protect syscache from bloating with negative cache entries
Thank you for the comments.

At Wed, 23 Jan 2019 18:21:45 -0500, Bruce Momjian wrote in <20190123232145.ga8...@momjian.us>
> On Wed, Jan 23, 2019 at 05:35:02PM +0900, Kyotaro HORIGUCHI wrote:
> > At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
> > wrote in <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp>
> > > An option is an additional PGPROC member and interface functions.
> > >
> > > struct PGPROC
> > > {
> > > ...
> > > int syscache_usage_track_interval; /* track interval, 0 to disable */
> > >
> > > =# select syscache_usage_track_add(, [, ]);
> > > =# select syscache_usage_track_remove(2134);
> > >
> > > Or, just provide a one-shot triggering function.
> > >
> > > =# select syscache_take_usage_track();
> > >
> > > This can use either a similar PGPROC variable or SendProcSignal(),
> > > but the former doesn't fire during idle time unless a timer is used.
> >
> > The attached is a revised version of this patchset, where the third
> > patch is the remote setting feature. It uses static shared memory.
> >
> > =# select pg_backend_catcache_stats(, );
> >
> > Activates or changes the catcache stats feature on the backend with
> > PID. (The name should be changed to .._syscache_stats, though.)
> > It is far smaller than the remote-GUC feature. (It contains a
> > part that should be in the previous patch. I will fix it later.)
>
> I have a few questions to make sure we have not made the API too
> complex. First, for syscache_prune_min_age, that is the minimum age
> that we prune, and entries could last twice that long. Is there any
> value to doing the scan at 50% of the age so that the
> syscache_prune_min_age is the max age? For example, if our age cutoff
> is 10 minutes, we could scan every 5 minutes so 10 minutes would be the
> maximum age kept.

(Looking into the patch..) Actually thrice, not twice. That is because I put significance on the access frequency: I think it is reasonable that entries with more frequent access get a longer life (within a certain limit). The original problem here was negative cache entries that are created but never accessed. However, there's no firm reason for the number of steps (3). There might be no difference if the extra lifetime were at most one syscache_prune_min_age, or even if there were no extra time at all.

> Second, when would you use syscache_memory_target != 0?

It was a suggestion upthread: we sometimes want to keep some known amount of cache even when expiration is active.

> If you had
> syscache_prune_min_age really fast, e.g. 10 seconds? What is the
> use-case for this? You have a query that touches 10k objects, and then
> the connection stays active but doesn't touch many of those 10k objects,
> and you want it cleaned up in seconds instead of minutes? (I can't see
> why you would not clean up all unreferenced objects after _minutes_ of
> disuse, but removing them after seconds of disuse seems undesirable.)
> What are the odds you would retain the entries you want with a fast
> target?

Are you asking about the reason for the unit? It's just that the value won't be large even expressed in seconds, at most 3600 seconds. Even though I don't think such a short-duration setting is meaningful in the real world, I don't think we need to prohibit it either. (Actually it is useful for testing:p) Another reason is that GUC_UNIT_MIN doesn't seem so common; it is used by only two variables, log_rotation_age and old_snapshot_threshold.

> What is the value of being able to change a specific backend's stat
> interval?
> I don't remember any other setting having this ability.

As mentioned upthread, it takes significant time to take the statistics, so I believe no one is willing to turn it on at all times. As a result the feature would be useless, because it could not be turned on on an active backend at the moment it actually gets bloated. So I wanted to provide a remote switching feature.

I also thought that there are some other features that would be useful if they could be turned on remotely, hence the remote GUC feature, but it was too complex...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
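A rough standalone sketch of the pruning rule described above (illustration only, not the patch's code; the struct and field names are made up): an entry's allowed idle age grows with its access count, capped at three steps of syscache_prune_min_age, so a never-accessed negative entry goes away after one minimum age while a hot entry can survive up to about three.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-ins for the real catcache entry fields and GUC. */
typedef struct DemoCacheEntry
{
	time_t		last_access;	/* last time the entry was looked up */
	int			naccess;		/* clamped access counter */
} DemoCacheEntry;

static int	syscache_prune_min_age = 600;	/* seconds, like the GUC */

/*
 * An entry becomes prunable once it has been idle longer than
 * syscache_prune_min_age multiplied by a factor that grows with how often
 * the entry was accessed -- at most three steps.
 */
static bool
entry_is_prunable(const DemoCacheEntry *e, time_t now)
{
	int			steps = 1 + (e->naccess < 2 ? e->naccess : 2);	/* 1..3 */

	return (now - e->last_access) > (time_t) syscache_prune_min_age * steps;
}

int
main(void)
{
	DemoCacheEntry never = {.last_access = 0, .naccess = 0};
	DemoCacheEntry hot = {.last_access = 0, .naccess = 10};
	time_t		now = 2 * syscache_prune_min_age;

	printf("never-accessed entry prunable: %d\n", entry_is_prunable(&never, now));
	printf("frequently-accessed entry prunable: %d\n", entry_is_prunable(&hot, now));
	return 0;
}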
Re: FETCH FIRST clause PERCENT option
On Wed, Jan 9, 2019 at 8:18 PM Tomas Vondra wrote: > > See the attached patch, which recomputes the count regularly. I don't > claim the patch is committable or that it has no other bugs, but > hopefully it shows what I meant. > > I got only one issue it is not work well with cursor postgres=# START TRANSACTION; START TRANSACTION postgres=# create table t as select i from generate_series(1,1000) s(i); SELECT 1000 postgres=# declare c cursor for select * from t fetch first 5 percent rows only; DECLARE CURSOR postgres=# fetch all in c; ERROR: trying to store a minimal tuple into wrong type of slot I am looking at it . meanwhile i fix row estimation and cost and make create_ordered_paths creation with no LIMIT consideration in PERCENTAGE case regards Surafel diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 4db8142afa..8491b7831a 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] -[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ] +[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] where from_item can be one of: @@ -1397,7 +1397,7 @@ OFFSET start which PostgreSQL also supports. It is: OFFSET start { ROW | ROWS } -FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY +FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY In this syntax, the start or count value is required by @@ -1407,7 +1407,8 @@ FETCH { FIRST | NEXT } [ count ] { ambiguity. If count is omitted in a FETCH clause, it defaults to 1. -ROW +with PERCENT count specifies the maximum number of rows to return +in percentage.ROW and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses. diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c index baa669abe8..8fc70b1b5f 100644 --- a/src/backend/executor/nodeLimit.c +++ b/src/backend/executor/nodeLimit.c @@ -21,6 +21,8 @@ #include "postgres.h" +#include + #include "executor/executor.h" #include "executor/nodeLimit.h" #include "miscadmin.h" @@ -44,6 +46,7 @@ ExecLimit(PlanState *pstate) ScanDirection direction; TupleTableSlot *slot; PlanState *outerPlan; + slot = node->subSlot; CHECK_FOR_INTERRUPTS(); @@ -81,7 +84,15 @@ ExecLimit(PlanState *pstate) /* * Check for empty window; if so, treat like empty subplan. */ - if (node->count <= 0 && !node->noCount) + if (node->limitOption == PERCENTAGE) + { +if (node->percent == 0.0) +{ + node->lstate = LIMIT_EMPTY; + return NULL; +} + } + else if (node->count <= 0 && !node->noCount) { node->lstate = LIMIT_EMPTY; return NULL; @@ -122,6 +133,7 @@ ExecLimit(PlanState *pstate) return NULL; case LIMIT_INWINDOW: + if (ScanDirectionIsForward(direction)) { /* @@ -130,6 +142,32 @@ ExecLimit(PlanState *pstate) * advancing the subplan or the position variable; but change * the state machine state to record having done so. */ + +/* + * When in percentage mode, we need to see if we can get any + * additional rows from the subplan (enough to increase the + * node->count value). 
+ */ +if (node->limitOption == PERCENTAGE) +{ + /* loop until the node->count increments */ + while (node->position - node->offset >= node->count) + { + int64 cnt; + + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + break; + + tuplestore_puttupleslot(node->totalTuple, slot); + + /* plus 1, because the first tuple is returned from LIMIT_RESCAN */ + cnt = 1 + tuplestore_tuple_count(node->totalTuple); + + node->count = ceil(node->percent * cnt / 100.0); + } +} + if (!node->noCount && node->position - node->offset >= node->count) { @@ -145,6 +183,20 @@ ExecLimit(PlanState *pstate) return NULL; } +if (node->limitOption == PERCENTAGE) +{ + while (node->position - node->offset < node->count) + { + if (tuplestore_gettupleslot(node->totalTuple, true, true, slot)) + { + node->subSlot = slot; + node->position++; + } + } +} +else if (node->limitOption == EXACT_NUMBER) +{ + /* * Get next tuple from subplan, if any. */ @@ -156,6 +208,7 @@ ExecLimit(PlanState *pstate) } node->subSlot = slot; node->position++; +} } else { @@ -278,17 +331,29 @@ recompute_limits(LimitState *node) /* Interpret NULL count as no count (LIMIT ALL) */ if (isNull) { - node->count = 0; + node->count = 1; node->noCount
Re: using expression syntax for partition bounds
Hello. At Fri, 18 Jan 2019 17:50:56 +0900, Amit Langote wrote in > Thanks for the comments. > > On 2019/01/18 16:48, Peter Eisentraut wrote: > >> How about the following note in the documentation: > >> > >> + Although volatile expressions such as > >> + CURRENT_TIMESTAMP can be > >> used > >> + for this, be careful when using them, because > >> + PostgreSQL will evaluate them only once > >> + when adding the partition. > > > > I don't think we have to phrase it in this warning way. Just say that > > volatile expressions are evaluated at the time of the DDL statement. > > OK, then just this: > > + Even volatile expressions such as > + CURRENT_TIMESTAMP can be used. I agree to not to phrase in a warning way, but it seems too-simplified. I think the reason is still required, like this?: === The expression is evaluated once at the table creation time so it can involve even volatile expressions such as CURRENT_TIMESTAMP. === > >> Sorry but I'm not sure how or what I would test about this. Maybe, just > >> add a test in create_table.sql/alter_table.sql that shows that using > >> volatile expression doesn't cause an error? > > > > Possibilities: Create a range partition with current_timestamp as the > > upper bound and then in a separate transaction insert current_timestamp > > and have it appear in the default partition. Or create list partition > > with session_user as one partition's value and then insert session_user > > and have it appear in that table. Or something like those. > > OK, added a test that uses current_timestamp. > > >> So, should the "name" type's collation should simply be discarded in favor > >> of "POSIX" that's being used for partitioning? > > > > In that specific case, yes, I think so. > > > >>> What we don't want is someone writing an explicit COLLATE clause. I > >>> think we just need to check that there is no top-level COLLATE clause. > >>> This would then sort of match the logic parse_collate.c for combining > >>> collations. > >> > >> Maybe I'm missing something, but isn't it OK to allow the COLLATE clause > >> as long as it specifies the matching collation as the parent? > > > > Yes, that should be OK. > > Alright, I've included a test that uses cast expression in partition bound. > > Updated patch attached. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: WIP: Avoid creation of the free space map for small tables
On Thu, Jan 24, 2019 at 9:46 AM Amit Kapila wrote:
> On Thu, Jan 24, 2019 at 3:39 AM John Naylor wrote:

Few comments related to the pg_upgrade patch:

1.
+ if ((maps[mapnum].relkind != RELKIND_RELATION &&
+ maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
+ first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
+ GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
+ (void) transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);

I think this check will needlessly be performed for future versions as well, say when one wants to upgrade from PG12 to PG13. That might not create any problem, but let's try to be more precise. Can you try to rewrite this check? You might want to encapsulate it inside a function. I have thought of doing something similar to what we do for the vm (see the checks related to VISIBILITY_MAP_FROZEN_BIT_CAT_VER), but I guess for this patch it is not important to check the catalog version, even if someone tries to upgrade to the same version.

2.
transfer_relfile()
{
..
- /* Is it an extent, fsm, or vm file? */
- if (type_suffix[0] != '\0' || segno != 0)
+ /* Did file open fail? */
+ if (stat(old_file, &statbuf) != 0)
..
}

So from now onwards, we will call stat even for the 0th segment, which means there is one additional system call for each relation. Not sure if that matters, but I think there is no harm in testing once with a large number of relations, say 10K to 50K relations which have an FSM. The other alternative is that we can fetch pg_class.relpages and rely on that to take this decision, but again, if that is not updated, we might take the wrong decision. Does anyone else have thoughts on this point?

3.
-static void
+static Size
transfer_relfile(FileNameMap *map, const char *type_suffix, bool vm_must_add_frozenbit)

If we decide to go with the approach proposed by you, we should add some comments atop this function about the return value change.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
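For illustration, one way the check in point 1 could be pulled out into a helper as suggested. This is only a sketch that reorganizes the quoted condition without changing its logic; the function name is invented, it assumes the declarations already available in pg_upgrade's relfilenode.c (FileNameMap, new_cluster, HEAP_FSM_CREATION_THRESHOLD, ...), and whether the version test should be tightened is exactly what is under discussion:

/* Hypothetical helper wrapping the quoted condition. */
static bool
should_transfer_fsm(const FileNameMap *map, Size first_seg_size)
{
	/* New servers before PG 12 always expect an FSM fork to exist. */
	if (GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
		return true;

	/* Only ordinary heap and TOAST relations get the small-table treatment. */
	if (map->relkind != RELKIND_RELATION &&
		map->relkind != RELKIND_TOASTVALUE)
		return true;

	/* Tables above the threshold keep their FSM. */
	return first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ;
}

The call site would then read: if (should_transfer_fsm(&maps[mapnum], first_seg_size)) (void) transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);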
Re: using expression syntax for partition bounds
Horiguchi-san, Thanks for checking. On 2019/01/24 19:03, Kyotaro HORIGUCHI wrote: > At Fri, 18 Jan 2019 17:50:56 +0900, Amit Langote wrote: >> On 2019/01/18 16:48, Peter Eisentraut wrote: How about the following note in the documentation: + Although volatile expressions such as + CURRENT_TIMESTAMP can be used + for this, be careful when using them, because + PostgreSQL will evaluate them only once + when adding the partition. >>> >>> I don't think we have to phrase it in this warning way. Just say that >>> volatile expressions are evaluated at the time of the DDL statement. >> >> OK, then just this: >> >> + Even volatile expressions such as >> + CURRENT_TIMESTAMP can be used. > > I agree to not to phrase in a warning way, but it seems > too-simplified. I think the reason is still required, like this?: > > === > The expression is evaluated once at the table creation time so it > can involve even volatile expressions such as > CURRENT_TIMESTAMP. Ah, that's perhaps a better way of describing this feature. Attached rebased patch containing above change. Thanks, Amit From 6ed90ad9640c4d7bfe53c280535631c45d2cafae Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 6 Jul 2018 14:05:22 +0900 Subject: [PATCH v10] Allow generalized expression syntax for partition bounds Authors: Kyotaro Horiguchi, Tom Lane, Amit Langote --- doc/src/sgml/ref/alter_table.sgml | 6 +- doc/src/sgml/ref/create_table.sgml | 19 +-- src/backend/commands/tablecmds.c | 9 ++ src/backend/optimizer/util/clauses.c | 4 +- src/backend/parser/gram.y | 60 +--- src/backend/parser/parse_agg.c | 10 ++ src/backend/parser/parse_expr.c| 5 + src/backend/parser/parse_func.c| 3 + src/backend/parser/parse_utilcmd.c | 214 +++-- src/include/optimizer/clauses.h| 2 + src/include/parser/parse_node.h| 3 +- src/include/utils/partcache.h | 6 + src/test/regress/expected/create_table.out | 89 +--- src/test/regress/sql/create_table.sql | 51 ++- 14 files changed, 312 insertions(+), 169 deletions(-) diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 0d1feaf828..0aa0f093f2 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -86,9 +86,9 @@ ALTER TABLE [ IF EXISTS ] name and partition_bound_spec is: -IN ( { numeric_literal | string_literal | TRUE | FALSE | NULL } [, ...] ) | -FROM ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) - TO ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) | +IN ( partition_bound_expr [, ...] ) | +FROM ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) + TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) and column_constraint is: diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 857515ec8f..22dbc07b23 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -87,9 +87,9 @@ class="parameter">referential_action ] [ ON UPDATE and partition_bound_spec is: -IN ( { numeric_literal | string_literal | TRUE | FALSE | NULL } [, ...] ) | -FROM ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) - TO ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) | +IN ( partition_bound_expr [, ...] ) | +FROM ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) + TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] 
) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are: @@ -413,12 +413,13 @@ WITH ( MODULUS numeric_literal, REM - Each of the values specified in - the partition_bound_spec is - a literal, NULL, MINVALUE, or - MAXVALUE. Each literal value must be either a - numeric constant that is coercible to the corresponding partition key - column's type, or a string literal that is valid input for that type. + partition_bound_expr is + any variable-free expression (subqueries, window functions, aggregate + functions, and set-returning functions are not allowed). Its data type + must match the data type of the corresponding partition key column. + The expression is evaluated once at table creation time, so it can + even contain volatile expressions such as + CURRENT_TIMESTAMP. diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 738c178107..e338e760ab 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -83
Re: Use an enum for RELKIND_*?
At Wed, 19 Dec 2018 09:55:22 -0500, Tom Lane wrote in <10268.1545231...@sss.pgh.pa.us>
> Andres Freund writes:
> > On 2018-12-18 23:17:54 -0500, Tom Lane wrote:
> >> Yeah, that would sure make things better. I was considering
> >> something like
> >>
> >> #ifndef USE_ASSERT_CHECKING
> >> default:
> >> elog(ERROR, ...);
> >> #endif
>
> > Yea, that's the best I can come up with too. I think we'd probably want
> > to have the warning available during development mainly, so I'm not sure
> > the "lonely buildfarm animal" approach buys us enough.
>
> Well, if it's controlled by some dedicated symbol ("CHECK_ENUM_SWITCHES"
> or such) then you could certainly run a test compile with that turned
> on, if you felt the need to. But I don't really buy the complaint
> that leaving it to the buildfarm to find such problems wouldn't work.
> They're almost always simple, easily-fixed oversights.
>
> Also, if we wrap all this up in a macro then it'd be possible to do
> other things by adjusting the macro. I'm envisioning
>
> switch ((RelationRelkind) rel->rd_rel->relkind)
> {
> case ...
> ...
>
> CHECK_ENUM_SWITCH(RelationRelkind, rel->rd_rel->relkind);
> }

I might be misunderstanding something, but my compiler (gcc 7.3.1) won't be quiet about an omitted value even with a default: label.

> switch ((x) v)
> {
> case A: ...
> case B: ...
> default: ...
> }
>
> t.c:12:3: warning: enumeration value 'C' not handled in switch [-Wswitch-enum]

Isn't it enough that at least one platform correctly warns about that?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
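A small compilable toy showing the two behaviours being discussed: with -Wswitch-enum gcc warns about an unhandled enumerator even when a default: label is present, while plain -Wswitch is silenced by the default:. The CHECK_ENUM_SWITCH macro here is only a sketch of the idea quoted above; the enum, names and macro are invented, not PostgreSQL code.

/* Build with: cc -Wswitch -Wswitch-enum -c enum_demo.c */
#include <stdio.h>
#include <stdlib.h>

typedef enum DemoRelkind
{
	DEMO_RELKIND_RELATION,
	DEMO_RELKIND_INDEX,
	DEMO_RELKIND_SEQUENCE
} DemoRelkind;

/*
 * Sketch of the proposed macro: normally it expands to a default: case that
 * reports an unexpected value at run time; with CHECK_ENUM_SWITCHES defined
 * it expands to nothing, so the compiler's -Wswitch warning (no default
 * present) catches any enumerator missing from the switch at build time.
 */
#ifdef CHECK_ENUM_SWITCHES
#define CHECK_ENUM_SWITCH(type, value)
#else
#define CHECK_ENUM_SWITCH(type, value) \
	default: \
		fprintf(stderr, "unexpected " #type " value: %d\n", (int) (value)); \
		abort()
#endif

const char *
demo_relkind_name(DemoRelkind kind)
{
	switch (kind)
	{
		case DEMO_RELKIND_RELATION:
			return "relation";
		case DEMO_RELKIND_INDEX:
			return "index";
			/* DEMO_RELKIND_SEQUENCE intentionally omitted for the demo */
			CHECK_ENUM_SWITCH(DemoRelkind, kind);
	}
	return "unreachable";
}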
Re: Use an enum for RELKIND_*?
On Wed 16 Jan 2019, 18:10 Alvaro Herrera > Or we could do nothing, since there are no complaints about this > problem. > Unless there are objections my patch is to some add it to the xid case. The tag is pretty useless for users and it's incredibly hard to even see these rows anyways I'll take care of it if there's no objections >
Re: Connection slots reserved for replication
Hello. The documentation looks fine to me.

By the way, the comment at the call site of CheckRequiredParameterValues() in xlog.c looks somewhat stale:

> /* Check to see if any changes to max_connections give problems */
> CheckRequiredParameterValues();

Maybe something like this would be better?:

> /* Check to see if any parameter change gives a problem on replication */

In postinit.c:

> /*
>  * The last few connection slots are reserved for superusers.
>  */
> if ((!am_superuser && !am_walsender) &&
> ReservedBackends > 0 &&

This forgets to explain about walsenders. How about:

> The last few connection slots are reserved for
> superusers. Replication connections don't share the same slot
> pool.

Or something like that? Also, the parentheses enclosing "!am_superuser..walsender" seem no longer needed.

In guc.c:

- /* see max_connections and max_wal_senders */
+ /* see max_connections */

I don't understand for what reason we should see max_connections just above (or even max_wal_senders). Do we need this comment? (There are some other instances, but they are not for this patch.)

In pg_controldata.c:

+ printf(_("max_wal_senders setting: %d\n"),
+ ControlFile->max_wal_senders);
 printf(_("max_worker_processes setting: %d\n"),
 ControlFile->max_worker_processes);
 printf(_("max_prepared_xacts setting: %d\n"),

The added item seems to want some additional spaces.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Simplify set of flags used by MyXactFlags
On 2019-Jan-22, Michael Paquier wrote: > On Mon, Jan 21, 2019 at 02:58:39PM -0300, Alvaro Herrera wrote: > > "... operated on temp namespace" doesn't look good; seems to me to be > > missing an article, for one thing, but really I'm not sure that > > 'namespace' is the term to be using here. I'd say "... operated on > > temporary objects" instead (the plural avoids the need for the article; > > and the user doesn't really care about the namespace itself but rather > > about the objects it contains.) > > Thanks for the input. Would you prefer something like the attached > then? I have switched the error message to "temporary objects", and > renamed the flag to XACT_FLAGS_ACCESSEDTEMPOBJECTS. Uh, I didn't think it was necessary nor desirable to rename the flag, only the user-visible message. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: using expression syntax for partition bounds
Why did you lose the parser_errposition in parse_utilcmd.c line 3854? > - /* Fail if we don't have a constant (i.e., non-immutable coercion) */ > - if (!IsA(value, Const)) > + /* Make sure the expression does not refer to any vars. */ > + if (contain_var_clause(value)) > ereport(ERROR, > - (errcode(ERRCODE_DATATYPE_MISMATCH), > - errmsg("specified value cannot be cast to type > %s for column \"%s\"", > - format_type_be(colType), > colName), > - errdetail("The cast requires a non-immutable > conversion."), > - errhint("Try putting the literal value in > single quotes."), > - parser_errposition(pstate, con->location))); > + (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), > + errmsg("cannot use column references in > partition bound expression"))); -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: ArchiveEntry optional arguments refactoring
Here is another version, where I accumulated all the suggestions:

* Use NULL as a default value where it was an empty string before (this required a few minor changes to some parts of the code outside ArchiveEntry)
* Rename defn, descr to createStmt, description
* Use a macro to avoid pgindent errors

About the last one: I'm also inclined to use the simpler version of the ARCHIVE_OPTS macro, mostly because the difference between "optional" and "positional" arguments in the alternative proposal is not that visible. So

> mixing struct arguments and normal function arguments seems
> quite confusing

could probably affect not only readability, but would also be a bit more problematic for updating this code (which was the goal in the first place).

0001-ArchiveOpts-structure_v2.patch
Description: Binary data
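For readers unfamiliar with the pattern under discussion, here is a minimal, self-contained sketch of wrapping an options struct in a compound-literal macro so call sites can use designated initializers, spell out only the fields they need, and let the rest default to NULL/zero. The names are invented for the example and are not pg_dump's:

#include <stdio.h>

/* A toy stand-in for the kind of options struct being proposed. */
typedef struct DemoEntryOpts
{
	const char *tag;
	const char *description;
	const char *createStmt;
	const char *owner;
} DemoEntryOpts;

/*
 * The macro builds a compound literal, so unspecified fields become
 * NULL/zero and the call site stays pgindent-friendly.
 */
#define DEMO_OPTS(...) &(DemoEntryOpts){__VA_ARGS__}

static void
demo_archive_entry(const DemoEntryOpts *opts)
{
	printf("tag=%s description=%s createStmt=%s owner=%s\n",
		   opts->tag ? opts->tag : "(null)",
		   opts->description ? opts->description : "(null)",
		   opts->createStmt ? opts->createStmt : "(null)",
		   opts->owner ? opts->owner : "(null)");
}

int
main(void)
{
	/* Only the fields that matter for this entry are spelled out. */
	demo_archive_entry(DEMO_OPTS(.tag = "ENCODING",
								 .description = "ENCODING",
								 .createStmt = "SET client_encoding = 'UTF8';\n"));
	return 0;
}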
Re: problems with foreign keys on partitioned tables
Hi, On 2019/01/24 6:13, Alvaro Herrera wrote: > On 2019-Jan-22, Amit Langote wrote: >> Done. See the attached patches for HEAD and PG 11. > > I'm not quite sure I understand why the one in DefineIndex needs the > deps but nothing else does. I fear that you added that one just to > appease the existing test that breaks otherwise, and I worry that with > that addition we're papering over some other, more fundamental bug. Thinking more on this, my proposal to rip dependencies between parent and child constraints altogether to resolve the bug I initially reported is starting to sound a bit overambitious especially considering that we'd need to back-patch it (the patch didn't even consider index constraints properly, creating a divergence between the behaviors of inherited foreign key constraints and inherited index constraints). We can pursue it if only to avoid bloating the catalog for what can be achieved with little bit of additional code in tablecmds.c, but maybe we should refrain from doing it in reaction to this particular bug. I've updated the patch that implements a much simpler fix for this particular bug. Just to reiterate, the following illustrates the bug: create table pk (a int primary key); create table p (a int references pk) partition by list (a); create table p1 partition of p for values in (1) partition by list (a); create table p11 partition of p1 for values in (1); alter table p detach partition p1; alter table p1 drop constraint p_a_fkey; ERROR: constraint "p_a_fkey" of relation "p11" does not exist The error occurs because ATExecDropConstraint when recursively called on p11 cannot find the constraint as the dependency mechanism already dropped it. The new fix is to return from ATExecDropConstraint without recursing if the constraint being dropped is index or foreign constraint. A few hunks of the originally proposed patch are attached here as 0001, especially the part which fixes ATAddForeignKeyConstraint to pass the correct value of connoninherit to CreateConstraintEntry (which should be false for partitioned tables). With that change, many tests start failing because of the above bug. That patch also adds a test case like the one above, but it fails along with others due to the bug. Patch 0002 is the aforementioned simpler fix to make the errors (existing and the newly added) go away. These patches apply unchanged to the PG 11 branch. Thanks, Amit From 8eb5ce74e8656b517ad5a4e960b70de0ff3bedd5 Mon Sep 17 00:00:00 2001 From: amit Date: Thu, 24 Jan 2019 21:29:17 +0900 Subject: [PATCH 1/2] Fix ATAddForeignKeyConstraint to use correct value of connoinherit While at it, add some Asserts to ConstraintSetParentConstraint to assert the correct value of coninhcount. Many tests fail, but the next patch should take care of them. 
--- src/backend/catalog/pg_constraint.c | 11 +-- src/backend/commands/tablecmds.c | 5 - src/test/regress/expected/foreign_key.out | 18 -- src/test/regress/sql/foreign_key.sql | 15 ++- 4 files changed, 43 insertions(+), 6 deletions(-) diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c index 446b54b9ff..d2b8489b6c 100644 --- a/src/backend/catalog/pg_constraint.c +++ b/src/backend/catalog/pg_constraint.c @@ -785,6 +785,12 @@ ConstraintSetParentConstraint(Oid childConstrId, Oid parentConstrId) { constrForm->conislocal = false; constrForm->coninhcount++; + /* +* An inherited foreign key constraint can never have more than one +* parent, because inheriting foreign keys is only allowed for +* partitioning where multiple inheritance is disallowed. +*/ + Assert(constrForm->coninhcount == 1); constrForm->conparentid = parentConstrId; CatalogTupleUpdate(constrRel, &tuple->t_self, newtup); @@ -797,8 +803,9 @@ ConstraintSetParentConstraint(Oid childConstrId, Oid parentConstrId) else { constrForm->coninhcount--; - if (constrForm->coninhcount <= 0) - constrForm->conislocal = true; + /* See the above comment. */ + Assert(constrForm->coninhcount == 0); + constrForm->conislocal = true; constrForm->conparentid = InvalidOid; deleteDependencyRecordsForClass(ConstraintRelationId, childConstrId, diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 738c178107..fea4d99735 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -7251,6 +7251,9 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel, boolold_check_ok; ObjectAddress address; ListCell *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop); + /* Partitioned table's fore
Re: Improve selectivity estimate for range queries
Hi.

At Fri, 11 Jan 2019 11:36:47 +0900, "Yuzuko Hosoya" wrote in <01d4a956$806a2ab0$813e8010$@lab.ntt.co.jp>
> Hi,
>
> Thanks for the comments, and I'm sorry for the late reply.
>
> > From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> > Sent: Friday, January 11, 2019 7:04 AM
> > > Robert Haas writes:
> > > On Fri, Dec 21, 2018 at 11:50 AM Tom Lane wrote:
> > >> A smaller-footprint way to fix the immediate problem might be to
> > >> change the values of DEFAULT_INEQ_SEL and friends so that they're
> > >> even less likely to be matched by accident. That is, instead of
> > >> 0., use 0.342 or some such.
> >
> > > That's not a dumb idea, but it seems pretty unprincipled to me, and to
> > > be honest I'm kind of surprised that you're not proposing something
> > > cleaner.
> >
> > The problem is the invasiveness of such a change (large) vs the benefit
> > (not so large). The upthread patch attempted to add a separate signaling
> > path, but it was very incomplete --- and yet both I and Horiguchi-san
> > thought it was already too messy.
> >
> > Maybe at some point we'll go over to something reasonably principled, like
> > adding confidence intervals to all selectivity estimates. That would be
> > *really* invasive but perhaps would bring enough benefit to justify itself.
> > But the current patch is just attempting to fix one extremely narrow pain
> > point, and if that is all it's doing it should have a commensurately small
> > footprint. So I don't think the submitted patch looks good from a
> > cost/benefit standpoint.
>
> Yes, I agree with you. Indeed the patch I attached is insufficient in
> cost-effectiveness. However, I want to solve the problem that real estimates
> happen to be equal to the default values, as in this case, even though it's
> a narrow pain point. So I tried distinguishing explicitly between real
> estimates and otherwise, as Robert said.
>
> The idea Tom proposed and Horiguchi-san tried seems reasonable, but I'm
> concerned whether any range queries really cannot match 0.342 (or
> some such) by accident in any environments. Is the way which Horiguchi-san
> did enough to prove that?

It cannot be perfect in principle, but I think it is reliable enough. It is not principled at all, but it is very effective compared to its complexity, I think. Actually, i/(i*3+1) for some 10^n's are:

1/4: binary format: 3f d0 00 00 00 00 00 00
10/31: binary format: 3f d4 a5 29 4a 52 94 a5
100/301: binary format: 3f d5 43 30 7a 78 c5 51
1000/3001: binary format: 3f d5 53 83 74 70 f1 95
10000/30001: binary format: 3f d5 55 26 bb 44 2b a8
100000/300001: binary format: 3f d5 55 50 ac 4a 74 6d
1000000/3000001: binary format: 3f d5 55 54 de 07 5a 96
10000000/30000001: binary format: 3f d5 55 55 49 67 22 6d
100000000/300000001: binary format: 3f d5 55 55 54 23 e9 d7
1000000000/3000000001: binary format: 3f d5 55 55 55 36 ca 95
10000000000/30000000001: binary format: 3f d5 55 55 55 52 47 75
100000000000/300000000001: binary format: 3f d5 55 55 55 55 07 25
1000000000000/3000000000001: binary format: 3f d5 55 55 55 55 4d 84
10000000000000/30000000000001: binary format: 3f d5 55 55 55 55 54 8d
100000000000000/300000000000001: binary format: 3f d5 55 55 55 55 55 41
1000000000000000/3000000000000001: binary format: 3f d5 55 55 55 55 55 53

So, (as Tom said upthread) intuitively we can expect that we are safe up to roughly 10^10? for the simple cases.

# Of course involving a histogram makes things complex, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
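The hex dump above can be reproduced with a small standalone program like the following (illustration only; it simply prints the IEEE 754 byte patterns of i/(3i+1) for i = 10^0 .. 10^15, showing how slowly the ratios converge toward the bit pattern of 1/3):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int
main(void)
{
	double		i = 1.0;

	for (int n = 0; n < 16; n++)
	{
		double		v = i / (i * 3.0 + 1.0);
		uint64_t	bits;

		memcpy(&bits, &v, sizeof(bits));	/* reinterpret the double's bytes */

		printf("%.0f/%.0f: binary format:", i, i * 3.0 + 1.0);
		for (int b = 7; b >= 0; b--)
			printf(" %02x", (unsigned int) ((bits >> (b * 8)) & 0xff));
		printf("\n");

		i *= 10.0;
	}
	return 0;
}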
Re: using expression syntax for partition bounds
Hi, Thanks for looking. On 2019/01/24 21:00, Alvaro Herrera wrote: > Why did you lose the parser_errposition in parse_utilcmd.c line 3854? > >> -/* Fail if we don't have a constant (i.e., non-immutable coercion) */ >> -if (!IsA(value, Const)) >> +/* Make sure the expression does not refer to any vars. */ >> +if (contain_var_clause(value)) >> ereport(ERROR, >> -(errcode(ERRCODE_DATATYPE_MISMATCH), >> - errmsg("specified value cannot be cast to type >> %s for column \"%s\"", >> -format_type_be(colType), >> colName), >> - errdetail("The cast requires a non-immutable >> conversion."), >> - errhint("Try putting the literal value in >> single quotes."), >> - parser_errposition(pstate, con->location))); >> +(errcode(ERRCODE_INVALID_COLUMN_REFERENCE), >> + errmsg("cannot use column references in >> partition bound expression"))); The if (contain_var_clause(value)) block is new code, but I agree its ereport should have parser_errposition just like other ereports in that function. Fixed that in the attached. Thanks, Amit From ae4fbcaa97364e1a33c5c25eb983a23d3acad30c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Fri, 6 Jul 2018 14:05:22 +0900 Subject: [PATCH v11] Allow generalized expression syntax for partition bounds Authors: Kyotaro Horiguchi, Tom Lane, Amit Langote --- doc/src/sgml/ref/alter_table.sgml | 6 +- doc/src/sgml/ref/create_table.sgml | 19 +-- src/backend/commands/tablecmds.c | 9 ++ src/backend/optimizer/util/clauses.c | 4 +- src/backend/parser/gram.y | 60 +--- src/backend/parser/parse_agg.c | 10 ++ src/backend/parser/parse_expr.c| 5 + src/backend/parser/parse_func.c| 3 + src/backend/parser/parse_utilcmd.c | 215 +++-- src/include/optimizer/clauses.h| 2 + src/include/parser/parse_node.h| 3 +- src/include/utils/partcache.h | 6 + src/test/regress/expected/create_table.out | 91 +--- src/test/regress/sql/create_table.sql | 51 ++- 14 files changed, 315 insertions(+), 169 deletions(-) diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index 0d1feaf828..0aa0f093f2 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -86,9 +86,9 @@ ALTER TABLE [ IF EXISTS ] name and partition_bound_spec is: -IN ( { numeric_literal | string_literal | TRUE | FALSE | NULL } [, ...] ) | -FROM ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) - TO ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) | +IN ( partition_bound_expr [, ...] ) | +FROM ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) + TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) and column_constraint is: diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 857515ec8f..22dbc07b23 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -87,9 +87,9 @@ class="parameter">referential_action ] [ ON UPDATE and partition_bound_spec is: -IN ( { numeric_literal | string_literal | TRUE | FALSE | NULL } [, ...] ) | -FROM ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) - TO ( { numeric_literal | string_literal | TRUE | FALSE | MINVALUE | MAXVALUE } [, ...] ) | +IN ( partition_bound_expr [, ...] ) | +FROM ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] ) + TO ( { partition_bound_expr | MINVALUE | MAXVALUE } [, ...] 
) | WITH ( MODULUS numeric_literal, REMAINDER numeric_literal ) index_parameters in UNIQUE, PRIMARY KEY, and EXCLUDE constraints are: @@ -413,12 +413,13 @@ WITH ( MODULUS numeric_literal, REM - Each of the values specified in - the partition_bound_spec is - a literal, NULL, MINVALUE, or - MAXVALUE. Each literal value must be either a - numeric constant that is coercible to the corresponding partition key - column's type, or a string literal that is valid input for that type. + partition_bound_expr is + any variable-free expression (subqueries, window functions, aggregate + functions, and set-returning functions are not allowed). Its data type + must match the data type of the corresponding partition key column. + The expression is evaluated once at table creation time, so it can + even contain volatile expressions such as + CURRENT_TIMESTAMP. diff --git a/src/backend/commands/tablecmds.c b/src/backe
Re: pgsql: Detach constraints when partitions are detached
CC'ing -hackers. On 2019-Jan-24, Amit Langote wrote: > On 2019/01/24 13:58, Tom Lane wrote: > > Alvaro Herrera writes: > >> Detach constraints when partitions are detached > > > > Hm ... it looks like this fails under -DRELCACHE_FORCE_RELEASE: > > Oops, sorry. This is why: > > index_close(idx, NoLock); > ... > +if (!idx->rd_index->indisprimary && !idx->rd_index->indisunique) > +continue; > > Attached a fix. Mumble. I changed course here and went for simpler coding instead. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
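To spell out the failure mode quoted above: under -DRELCACHE_FORCE_RELEASE the relcache entry may be freed as soon as index_close() is called, so idx->rd_index must not be dereferenced afterwards. One possible shape of the fix is sketched below; this is a hedged illustration (variable names and lock level are placeholders), not necessarily the "simpler coding" that was actually committed:

/* Sketch only: assumes a surrounding loop over a partition's index OIDs. */
Relation	idx = index_open(idxoid, AccessShareLock);
bool		is_pk_or_unique = idx->rd_index->indisprimary ||
	idx->rd_index->indisunique;

/* ... any other use of idx while it is still open ... */

index_close(idx, NoLock);

/* Safe even with RELCACHE_FORCE_RELEASE: no relcache access after close. */
if (!is_pk_or_unique)
	continue;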
Re: Use an enum for RELKIND_*?
Kyotaro HORIGUCHI writes: > I might misunderstand something, but my compiler (gcc 7.3.1) > won't be quiet about omitted value even with default:. > ... I would call that a compiler bug, TBH. The code is 100% correct, if you intended to allow the default case to handle some enum values, which is perfectly reasonable coding. > Isn't it enough that at least one platform correctly warns that? No, especially not if it's only a warning. Many developers would not see it initially, and the buildfarm likely wouldn't complain either. regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On Thu, Jan 24, 2019 at 06:39:24PM +0900, Kyotaro HORIGUCHI wrote: > > Second, when would you use syscache_memory_target != 0? > > It is a suggestion upthread, we sometimes want to keep some known > amount of caches despite that expration should be activated. > > > If you had > > syscache_prune_min_age really fast, e.g. 10 seconds? What is the > > use-case for this? You have a query that touches 10k objects, and then > > the connection stays active but doesn't touch many of those 10k objects, > > and you want it cleaned up in seconds instead of minutes? (I can't see > > why you would not clean up all unreferenced objects after _minutes_ of > > disuse, but removing them after seconds of disuse seems undesirable.) > > What are the odds you would retain the entires you want with a fast > > target? > > Do you asking the reason for the unit? It's just because it won't > be so large even in seconds, to the utmost 3600 seconds. Even > though I don't think such a short dutaion setting is meaningful > in the real world, either I don't think we need to inhibit > that. (Actually it is useful for testing:p) Another reason is We have gone from ignoring the cache bloat problem to designing an API that even we don't know what value they provide, and if we don't know, we can be sure our users will not know. Every GUC has a cost, even if it is not used. I suggest you go with just syscache_prune_min_age, get that into PG 12, and we can then reevaluate what we need. If you want to hard-code a minimum cache size where no pruning will happen, maybe based on the system catalogs or typical load, that is fine. > that GUC_UNIT_MIN doesn't seem so common that it is used only by > two variables, log_rotation_age and old_snapshot_threshold. > > > What is the value of being able to change a specific backend's stat > > interval? I don't remember any other setting having this ability. > > As mentioned upthread, it takes significant time to take > statistics so I believe no one is willing to turn it on at all > times. As the result it should be useless because it cannot be > turned on on an active backend when it actually gets bloat. So I > wanted to provide a remote switching feture. > > I also thought that there's some other features that is useful if > it could be turned on remotely so the remote GUC feature but it > was too complex... Well, I am thinking if we want to do something like this, we should do it for all GUCs, not just for this one, so I suggest we not do this now either. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Protect syscache from bloating with negative cache entries
Bruce Momjian writes: > On Thu, Jan 24, 2019 at 06:39:24PM +0900, Kyotaro HORIGUCHI wrote: >> I also thought that there's some other features that is useful if >> it could be turned on remotely so the remote GUC feature but it >> was too complex... > Well, I am thinking if we want to do something like this, we should do > it for all GUCs, not just for this one, so I suggest we not do this now > either. I will argue hard that we should not do it at all, ever. There is already a mechanism for broadcasting global GUC changes: apply them to postgresql.conf (or use ALTER SYSTEM) and SIGHUP. I do not think we need something that can remotely change a GUC's value in just one session. The potential for bugs, misuse, and just plain confusion is enormous, and the advantage seems minimal. regards, tom lane
Re: problems with foreign keys on partitioned tables
Hello On 2019-Jan-24, Amit Langote wrote: > Thinking more on this, my proposal to rip dependencies between parent and > child constraints altogether to resolve the bug I initially reported is > starting to sound a bit overambitious especially considering that we'd > need to back-patch it (the patch didn't even consider index constraints > properly, creating a divergence between the behaviors of inherited foreign > key constraints and inherited index constraints). We can pursue it if > only to avoid bloating the catalog for what can be achieved with little > bit of additional code in tablecmds.c, but maybe we should refrain from > doing it in reaction to this particular bug. While studying your fix it occurred to me that perhaps we could change things so that we first collect a list of objects to drop, and only when we're done recursing perform the deletion, as per the attached patch. However, this fails for the test case in your 0001 patch (but not the one you show in your email body), because you added a stealthy extra ingredient to it: the constraint in the grandchild has a different name, so when ATExecDropConstraint() tries to search for it by name, it's just not there, not because it was dropped but because it has never existed in the first place. Unless I misunderstand, this means that your plan to remove those catalog tuples won't work at all, because there is no way to find those constraints other than via pg_depend if they have different names. I'm leaning towards the idea that your patch is the definitive fix and not just a backpatchable band-aid. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 747acbd2b9..29860acaa2 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -417,7 +417,7 @@ static void CloneFkReferencing(Relation pg_constraint, Relation parentRel, static void ATExecDropConstraint(Relation rel, const char *constrName, DropBehavior behavior, bool recurse, bool recursing, - bool missing_ok, LOCKMODE lockmode); + bool missing_ok, LOCKMODE lockmode, ObjectAddresses *objsToDelete); static void ATPrepAlterColumnType(List **wqueue, AlteredTableInfo *tab, Relation rel, bool recurse, bool recursing, @@ -4160,12 +4160,12 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel, case AT_DropConstraint: /* DROP CONSTRAINT */ ATExecDropConstraint(rel, cmd->name, cmd->behavior, false, false, - cmd->missing_ok, lockmode); + cmd->missing_ok, lockmode, NULL); break; case AT_DropConstraintRecurse: /* DROP CONSTRAINT with recursion */ ATExecDropConstraint(rel, cmd->name, cmd->behavior, true, false, - cmd->missing_ok, lockmode); + cmd->missing_ok, lockmode, NULL); break; case AT_AlterColumnType: /* ALTER COLUMN TYPE */ address = ATExecAlterColumnType(tab, rel, cmd, lockmode); @@ -9219,7 +9219,8 @@ static void ATExecDropConstraint(Relation rel, const char *constrName, DropBehavior behavior, bool recurse, bool recursing, - bool missing_ok, LOCKMODE lockmode) + bool missing_ok, LOCKMODE lockmode, + ObjectAddresses *objsToDelete) { List *children; ListCell *child; @@ -9237,6 +9238,12 @@ ATExecDropConstraint(Relation rel, const char *constrName, conrel = heap_open(ConstraintRelationId, RowExclusiveLock); + if (!recursing) + { + Assert(objsToDelete == NULL); + objsToDelete = new_object_addresses(); + } + /* * Find and drop the target constraint */ @@ -9290,13 +9297,13 @@ ATExecDropConstraint(Relation 
rel, const char *constrName, } /* - * Perform the actual constraint deletion + * Register the constraint for deletion */ conobj.classId = ConstraintRelationId; conobj.objectId = HeapTupleGetOid(tuple); conobj.objectSubId = 0; - performDeletion(&conobj, behavior, 0); + add_exact_object_address(&conobj, objsToDelete); found = true; } @@ -9402,7 +9409,7 @@ ATExecDropConstraint(Relation rel, const char *constrName, /* Time to delete this child constraint, too */ ATExecDropConstraint(childrel, constrName, behavior, true, true, - false, lockmode); + false, lockmode, objsToDelete); } else { @@ -9435,6 +9442,9 @@ ATExecDropConstraint(Relation rel, const char *constrName, heap_close(childrel, NoLock); } + if (!recursing) + performMultipleDeletions(objsToDelete, behavior, 0); + heap_close(conrel, RowExclusiveLock); }
Re: problems with foreign keys on partitioned tables
On Fri, Jan 25, 2019 at 12:08 AM Alvaro Herrera wrote: > On 2019-Jan-24, Amit Langote wrote: > > > Thinking more on this, my proposal to rip dependencies between parent and > > child constraints altogether to resolve the bug I initially reported is > > starting to sound a bit overambitious especially considering that we'd > > need to back-patch it (the patch didn't even consider index constraints > > properly, creating a divergence between the behaviors of inherited foreign > > key constraints and inherited index constraints). We can pursue it if > > only to avoid bloating the catalog for what can be achieved with little > > bit of additional code in tablecmds.c, but maybe we should refrain from > > doing it in reaction to this particular bug. > > While studying your fix it occurred to me that perhaps we could change > things so that we first collect a list of objects to drop, and only when > we're done recursing perform the deletion, as per the attached patch. > However, this fails for the test case in your 0001 patch (but not the > one you show in your email body), because you added a stealthy extra > ingredient to it: the constraint in the grandchild has a different name, > so when ATExecDropConstraint() tries to search for it by name, it's just > not there, not because it was dropped but because it has never existed > in the first place. Doesn't the following performDeletion() at the start of ATExecDropConstraint(), through findDependentObject()'s own recursion, take care of deleting *all* constraints, including those of? /* * Perform the actual constraint deletion */ conobj.classId = ConstraintRelationId; conobj.objectId = con->oid; conobj.objectSubId = 0; performDeletion(&conobj, behavior, 0); > Unless I misunderstand, this means that your plan to remove those > catalog tuples won't work at all, because there is no way to find those > constraints other than via pg_depend if they have different names. Yeah, that's right. Actually, I gave up on developing the patch further based on that approach (no-dependencies approach) when I edited the test to give the grandchild constraint its own name. > I'm leaning towards the idea that your patch is the definitive fix and > not just a backpatchable band-aid. Yeah, I think so too. Thanks, Amit
Re: problems with foreign keys on partitioned tables
On Fri, Jan 25, 2019 at 12:30 AM Amit Langote wrote: > > On Fri, Jan 25, 2019 at 12:08 AM Alvaro Herrera > wrote: > > On 2019-Jan-24, Amit Langote wrote: > > > > > Thinking more on this, my proposal to rip dependencies between parent and > > > child constraints altogether to resolve the bug I initially reported is > > > starting to sound a bit overambitious especially considering that we'd > > > need to back-patch it (the patch didn't even consider index constraints > > > properly, creating a divergence between the behaviors of inherited foreign > > > key constraints and inherited index constraints). We can pursue it if > > > only to avoid bloating the catalog for what can be achieved with little > > > bit of additional code in tablecmds.c, but maybe we should refrain from > > > doing it in reaction to this particular bug. > > > > While studying your fix it occurred to me that perhaps we could change > > things so that we first collect a list of objects to drop, and only when > > we're done recursing perform the deletion, as per the attached patch. > > However, this fails for the test case in your 0001 patch (but not the > > one you show in your email body), because you added a stealthy extra > > ingredient to it: the constraint in the grandchild has a different name, > > so when ATExecDropConstraint() tries to search for it by name, it's just > > not there, not because it was dropped but because it has never existed > > in the first place. > > Doesn't the following performDeletion() at the start of > ATExecDropConstraint(), through findDependentObject()'s own recursion, > take care of deleting *all* constraints, including those of? Meant to say: "...including those of the grandchildren?" > /* > * Perform the actual constraint deletion > */ > conobj.classId = ConstraintRelationId; > conobj.objectId = con->oid; > conobj.objectSubId = 0; > > performDeletion(&conobj, behavior, 0); Thanks, Amit
Re: Thread-unsafe coding in ecpg
On 1/23/19 10:53 PM, Tom Lane wrote: > I wrote: >> This suggests that, rather than throwing up our hands if the initial >> _configthreadlocale call returns -1, we should act as though the function >> doesn't exist, and just soldier on the same as before. The code in there >> assumes that -1 is a can't-happen case and doesn't try to recover, >> but apparently that's over-optimistic. > I pushed a patch to fix that. jacana has been upgraded to gcc 8.1.0 so it knows about _configthreadlocale() and it's now happy. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
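A minimal sketch of the defensive pattern Tom describes, treating a -1 return from _configthreadlocale() as "the function is unavailable" and falling back to the old process-wide locale handling. This is an illustration of the idea only, not ecpg's actual code, and it uses only the documented Windows CRT call:

#ifdef WIN32
#include <locale.h>
#include <stdbool.h>

/*
 * Try to switch this thread to per-thread locale handling.  If the runtime
 * reports failure (-1), behave as though _configthreadlocale() did not
 * exist at all and soldier on with process-wide locale handling instead of
 * giving up.
 */
static bool
enable_thread_local_locale(void)
{
	int			prev = _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);

	if (prev == -1)
		return false;			/* caller uses the pre-existing code path */
	return true;
}
#endif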
Re: problems with foreign keys on partitioned tables
On 2019-Jan-25, Amit Langote wrote: > On Fri, Jan 25, 2019 at 12:30 AM Amit Langote wrote: > > Doesn't the following performDeletion() at the start of > > ATExecDropConstraint(), through findDependentObject()'s own recursion, > > take care of deleting *all* constraints, including those of? > > Meant to say: "...including those of the grandchildren?" > > > /* > > * Perform the actual constraint deletion > > */ > > conobj.classId = ConstraintRelationId; > > conobj.objectId = con->oid; > > conobj.objectSubId = 0; > > > > performDeletion(&conobj, behavior, 0); Of course it does when the dependencies are set up -- but in the approach we just gave up on, those dependencies would not exist. Anyway, my motivation was that performMultipleDeletions has the advantage that it collects all objects to be dropped before deleting anyway, and so the error that a constraint was dropped in a previous recursion step would not occur. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Fabien COELHO writes: >> I had in mind something more like the attached. > Yep. > I'm not too happy that it mixes API levels, and about the int/double/int > path. > Attached an updated version which relies on pg_jrand48 instead. Hm, I'm not sure that's really an improvement, but I pushed it like that (and the other change along with it). > Also, as > the pseudo-random state is fully controlled, seeded test results are > deterministic so the expected value can be fully checked. I found that the "expected value" was different in v11 than HEAD, which surprised me. It looks like the reason is that HEAD sets up more/different RandomStates from the same seed than v11 did. Not sure if it's a good thing for this behavior to change across versions. regards, tom lane
Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
On 2019-Jan-24, Tom Lane wrote: > > Also, as > > the pseudo-random state is fully controlled, seeded test results are > > deterministic so the expected value can be fully checked. > > I found that the "expected value" was different in v11 than HEAD, > which surprised me. It looks like the reason is that HEAD sets up > more/different RandomStates from the same seed than v11 did. Not > sure if it's a good thing for this behavior to change across versions. The rationale behind this was that some internal uses of random numbers messed up the determinism of user-invoked random functions; 409231919443 commit message says While at it, use separate random state for thread administratrivia such as deciding which script to run, how long to delay for throttling, or whether to log a message when sampling; this not only makes these tasks independent of each other, but makes the actual thread run deterministic. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Built-in connection pooler
Hi hackers,

I have been working for some time on a built-in connection pooler for Postgres:
https://www.postgresql.org/message-id/flat/a866346d-5582-e8e8-2492-fd32732b0783%40postgrespro.ru#b1c354d5cf937893a75954e1e77f81f5

Unlike existing external poolers, this solution supports session semantics for pooled connections: you can use temporary tables, prepared statements, GUCs,... But to make that possible I need to store/restore the session context. It is not so difficult, but it requires a significant number of changes in the Postgres code. It will be committed in PgProEE-12, Postgres Professional's version of Postgres, but I realize that there is little chance of committing it to the mainline version of Postgres. Dimitri Fontaine proposed to develop a much simpler version of the pooler which could be acceptable for the community:

The main idea I want to pursue is the following:

- only solve the “idle connection” problem, nothing else, making idle connections basically free
- implement a layer in between a connection and a session, managing a “client backend” pool
- use the ability to give a socket to another process, as you did, so that the pool is not a proxy
- allow re-using of a backend for a new session only when it is safe to do so

Unfortunately, we have not found a way to support SSL connections with socket redirection. So I have implemented a solution with the traditional proxy approach. If a client changes its session context (creates temporary tables, sets GUC values, prepares statements,...) then its backend becomes "tainted" and no longer participates in pooling. It is now dedicated to this client, but it still receives data through the proxy. Once this client disconnects, the tainted backend is terminated.

The built-in proxy accepts connections on a special port (by default 6543). If you connect to the standard port, then normal Postgres backends are launched and there is no difference from vanilla Postgres. And if you connect to the proxy port, then the connection is redirected to one of the proxy workers, which then performs scheduling of all sessions assigned to it. There is currently no migration of sessions between proxies. There are three policies for assigning a session to a proxy: random, round-robin and load-balancing.

The main differences from pgbouncer & Co are that:

1. It is embedded and requires no extra steps for installation and configuration.
2. It is not single threaded (no bottleneck).
3. It supports all clients (if a client needs session semantics, it will implicitly be given a dedicated backend).

Some performance results (pgbench -S -n):

#Connections  Proxy  Proxy/SSL  Direct  Direct/SSL
           1  13752      12396   17443       15762
          10  53415      59615   68334       85885
        1000  60152      20445   60003       24047

Proxy configuration is the following:

session_pool_size = 4
connection_proxies = 2

postgres=# select * from pg_pooler_state();
 pid  | n_clients | n_ssl_clients | n_pools | n_backends | n_dedicated_backends | tx_bytes | rx_bytes | n_transactions
------+-----------+---------------+---------+------------+----------------------+----------+----------+----------------
 1310 |         1 |             0 |       1 |          4 |                    0 | 10324739 |  9834981 |         156388
 1311 |         0 |             0 |       1 |          4 |                    0 | 10430566 |  9936634 |         158007
(2 rows)

This implementation contains many fewer changes to the Postgres core (it is more like running pgbouncer as a Postgres worker). The main things I have added are:

1. A mechanism for sending a socket to another process (needed to redirect a connection to a proxy)
2. Support of edge-triggered polling mode for epoll (needed to multiplex reads and writes)
3. Library libpqconn for establishing a libpq connection from the core
3. A library, libpqconn, for establishing a libpq connection from the core.

The proxy version of the built-in connection pool is in the conn_proxy branch of the following git repository: https://github.com/postgrespro/postgresql.builtin_pool.git Also, I attach a patch against master to this mail. I will be pleased to receive your comments. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index e94b305..ee12562 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -704,6 +704,123 @@ include_dir 'conf.d' + + max_sessions (integer) + + max_sessions configuration parameter + + + + + The maximum number of client sessions that can be handled by + one connection proxy when session pooling is switched on. + This parameter does not add any memory or CPU overhead, so + specifying a large max_sessions value + does not affect performance. + If the max_sessions limit is reached new connection are not accepted. + + + The default value is 1000. This param
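To make the setup above concrete, a minimal sketch of trying the proxy out (the parameter and function names are taken from the description above and may differ in the final patch):

	-- postgresql.conf, as in the benchmark above:
	--   connection_proxies = 2
	--   session_pool_size  = 4
	-- Clients then connect to the proxy port (6543 by default) instead of 5432.
	-- From any connection, the per-proxy counters can be inspected with:
	SELECT pid, n_clients, n_ssl_clients, n_backends, n_dedicated_backends, n_transactions
	FROM pg_pooler_state();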
Re: problems with foreign keys on partitioned tables
On 2019-Jan-24, Amit Langote wrote: > A few hunks of the originally proposed patch are attached here as 0001, > especially the part which fixes ATAddForeignKeyConstraint to pass the > correct value of connoinherit to CreateConstraintEntry (which should be > false for partitioned tables). With that change, many tests start failing > because of the above bug. That patch also adds a test case like the one > above, but it fails along with others due to the bug. Patch 0002 is the > aforementioned simpler fix to make the errors (existing and the newly > added) go away. Cool, thanks. I made a bunch of fixes -- I added an elog(ERROR) check to ensure a constraint whose parent is being set does not already have a parent; that seemed in line with the new asserts that check the coninhcount. I also moved those asserts, changing the spirit of what they checked. Also: I wasn't sure about stopping recursion for legacy inheritance in ATExecDropConstraint() for non-check constraints, so I changed that to occur for partitioned tables only. Also, stylistic fixes. I was mildly surprised to realize that the my_fkey constraint on fk_part_1_1 is gone after dropping the fkey on its parent, since it was declared locally when that table was created. However, it makes perfect sense in retrospect, since we made it dependent on its parent. I'm not terribly happy about this, but I don't quite see a way to make it better that doesn't require much more code than is warranted. > These patches apply unchanged to the PG 11 branch. Yeah, only if you tried to compile, it would have complained about table_close() ;-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila wrote: > > On Thu, Jan 24, 2019 at 3:39 AM John Naylor wrote:
> > mkdir -p data1 data2 standby
> >
> > echo 'heap' > data1/foo
> > echo 'fsm' > data1/foo_fsm
> >
> > # simulate streaming replication
> > rsync --archive data1 standby
> >
> > # simulate pg_upgrade, skipping FSM
> > ln data1/foo -t data2/
> >
> > rsync --archive --delete --hard-links --size-only --no-inc-recursive
> > data1 data2 standby
> >
> > # result
> > ls standby/data1
> > ls standby/data2
> >
> > The result is that foo_fsm is not copied to standby/data2, contrary to > > what the docs above imply for other unlinked files. Can anyone shed > > light on this? > > Is foo_fsm present in standby/data1? Yes it is. > I think what doc means to say is > that it copies any unlinked files present in primary's new cluster > (which in your case will be data2). In that case, I'm still confused why that doc says, "Unfortunately, rsync needlessly copies files associated with temporary and unlogged tables because these files don't normally exist on standby servers." I fail to see why the primary's new cluster would have these if they weren't linked. And in the case we're discussing here, the skipped FSMs won't be on data2, so won't end up in standby/data2. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during INSERT and UPDATE
On Wed, Jan 23, 2019 at 7:56 PM David Rowley wrote: > On Thu, 24 Jan 2019 at 13:38, John Naylor wrote: > > Hmm, now instead of an 85x speedup over master in the 10k partition > > case, I only get 20x. Anyone else see this? > > What's it like with fsync off? No change. Just in case, I tried git bisect and also recreated the cluster, but never got the same performance as my first test, so not sure what happened. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Implicit timezone conversion replicating from timestamp to timestamptz?
We are working to migrate several large tables from the timestamp to the timestamptz data type by using logical replication (so as to avoid long downtime for type conversions). We are using pglogical, but I am curious whether what I share below applies to native logical replication as well. Both the source and destination dbs are at localtime, which is the 'America/Chicago' time zone. The source system has a timestamp stored "at time zone UTC", like this for 2:00pm Chicago time: 2019-01-24 20:00:00.00 I was *very surprised* to find that replicating the above timestamp to timestamptz actually does so correctly, showing this value in my psql client on the subscriber: 2019-01-24 14:00:00.00-06 How does it know/why does it assume it knows that the time zone of the timestamp data type is UTC on the provider, given that my clusters are at America/Chicago? I would have actually expected an incorrect conversion of the data unless I set the time zone to UTC on the way in on the subscriber via a trigger. That is, I was expecting to see this: 2019-01-24 20:00:00.00-06 Which is obviously wrong. So why does it do this, and is there some assumption being made somewhere in the code base that a timestamp is actually saved "at time zone UTC"? Thanks, Jeremy
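A minimal sketch of the two possible interpretations (assuming the session TimeZone is America/Chicago, as above):

	SET timezone = 'America/Chicago';

	-- Interpreting the stored value in the session time zone
	-- (the incorrect conversion the poster expected):
	SELECT TIMESTAMP '2019-01-24 20:00:00'::timestamptz;
	--   2019-01-24 20:00:00-06

	-- Interpreting the stored value as UTC
	-- (what the replicated value ends up looking like):
	SELECT TIMESTAMP '2019-01-24 20:00:00' AT TIME ZONE 'UTC';
	--   2019-01-24 14:00:00-06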
Re: Delay locking partitions during INSERT and UPDATE
On 1/24/19 9:50 PM, John Naylor wrote: > On Wed, Jan 23, 2019 at 7:56 PM David Rowley > wrote: >> On Thu, 24 Jan 2019 at 13:38, John Naylor >> wrote: >>> Hmm, now instead of an 85x speedup over master in the 10k partition >>> case, I only get 20x. Anyone else see this? >> >> What's it like with fsync off? > > No change. Just in case, I tried git bisect and also recreated the > cluster, but never got the same performance as my first test, so not > sure what happened. I can still see about the same performance as before (on both clusters). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Thu, Jan 24, 2019 at 10:02 AM Tom Lane wrote: > Bruce Momjian writes: > > On Thu, Jan 24, 2019 at 06:39:24PM +0900, Kyotaro HORIGUCHI wrote: > >> I also thought that there's some other features that is useful if > >> it could be turned on remotely so the remote GUC feature but it > >> was too complex... > > > Well, I am thinking if we want to do something like this, we should do > > it for all GUCs, not just for this one, so I suggest we not do this now > > either. > > I will argue hard that we should not do it at all, ever. > > There is already a mechanism for broadcasting global GUC changes: > apply them to postgresql.conf (or use ALTER SYSTEM) and SIGHUP. > I do not think we need something that can remotely change a GUC's > value in just one session. The potential for bugs, misuse, and > just plain confusion is enormous, and the advantage seems minimal. I think there might be some merit in being able to activate debugging or tracing facilities for a particular session remotely, but designing something that will do that sort of thing well seems like a very complex problem that certainly should not be sandwiched into another patch that is mostly about something else. And if we ever get such a thing I suspect it should be entirely separate from the GUC system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Log a sample of transactions
On Fri, Jan 18, 2019 at 8:23 AM Adrien NAYRAT wrote: > On 1/18/19 9:03 AM, Peter Eisentraut wrote: > > But if you have trouble with a specific transaction, how will a setting > > help that randomly logs transactions, not necessarily the one you are > > concerned about? > > Yes, it assumes your application performs same type of transaction. > Maybe the use case is too specific to have this feature in core? It doesn't sound too crazy to me. Say you log a sample of statements. But then you're like, wow, this is hard to interpret, because I don't know what happened earlier in the transaction. So then you use this feature instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
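For what it's worth, a minimal sketch of how the proposed setting might be used (the GUC name log_transaction_sample_rate is an assumption based on this thread, not a settled name):

	-- Hypothetical: log every statement of roughly 1% of transactions in this session.
	SET log_transaction_sample_rate = 0.01;

That way a sampled transaction is captured as a whole, which is what addresses the "hard to interpret" problem above.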
Re: proposal - plpgsql unique statement id
On 22/01/2019 19:04, Pavel Stehule wrote: > It was missing, fixed, thank you for check

committed

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Almost bug in COPY FROM processing of GB18030 encoded input
On Wed, Jan 23, 2019 at 6:23 AM Heikki Linnakangas wrote: > I happened to notice that when CopyReadLineText() calls mblen(), it > passes only the first byte of the multi-byte characters. However, > pg_gb18030_mblen() looks at the first and the second byte. > CopyReadLineText() always passes \0 as the second byte, so > pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded > characters as 2. > > It works out fine, though, because the second half of the 4-byte encoded > character always looks like another 2-byte encoded character, in > GB18030. CopyReadLineText() is looking for delimiter and escape > characters and newlines, and only single-byte characters are supported > for those, so treating a 4-byte character as two 2-byte characters is > harmless. Yikes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Delay locking partitions during INSERT and UPDATE
On Thu, Jan 24, 2019 at 4:17 PM Tomas Vondra wrote: > I can still see about the same performance as before (on both clusters). Thanks for checking! I think the thing to do now is have a committer look at it. It's a small patch with obvious performance effects -- there's just the question of whether the locking behavior with concurrent DDL is as safe as it is now. Anyone have anything else? -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during INSERT and UPDATE
On 1/24/19 10:34 PM, John Naylor wrote: > On Thu, Jan 24, 2019 at 4:17 PM Tomas Vondra > wrote: >> I can still see about the same performance as before (on both clusters). > > Thanks for checking! I think the thing to do now is have a committer > look at it. It's a small patch with obvious performance effects -- > there's just the question of whether the locking behavior with > concurrent DDL is as safe as it is now. > > Anyone have anything else? > Yes, I don't see why the patch would change that, and I've been looking for such cases. I think David was looking at that this week too, and I assume he'll send an update if he finds anything. If not, I plan to get it committed soon-ish (possibly next week after the FOSDEM dev meeting, unless there are some objections). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Old protocol fastpath calls borked?
Hi, In parse_fcall_arguments_20():

c0a8c3ac13f8 (Tom Lane 2003-05-08 18:16:37 + 579)     argsize = pq_getmsgint(msgBuf, 4);
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 + 580)     if (argsize == -1)
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 + 581)     {
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 + 582)         fcinfo->argnull[i] = true;
147d4bf3e5e3 (Tom Lane 2006-04-04 19:35:37 + 583)         fcinfo->arg[i] = OidReceiveFunctionCall(typreceive, NULL,
147d4bf3e5e3 (Tom Lane 2006-04-04 19:35:37 + 584)                                                 typioparam, -1);
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 + 585)         continue;
c0a8c3ac13f8 (Tom Lane 2003-05-08 18:16:37 + 586)     }

we appear to be constantly setting argnull to true for all arguments? Since, apparently, 2003? I don't have a test program at hand, but that kind of seems to suggest this never really has been used? Greetings, Andres Freund
Re: Old protocol fastpath calls borked?
Andres Freund writes: > In parse_fcall_arguments_20(): > we appear to constantly setting argnull to true for all arguments? Uh, it looks to me like it does so only if the frontend sends a -1 length field, which is the protocol's convention for indicating a null. regards, tom lane
Re: proposal - plpgsql unique statement id
Peter Eisentraut writes: > committed Why didn't this patch modify the dumping logic in pl_funcs.c to print the IDs? I'm not aware of other cases where we intentionally omit fields from debug-support printouts. regards, tom lane
Re: Old protocol fastpath calls borked?
Hi, On 2019-01-24 17:04:32 -0500, Tom Lane wrote: > Andres Freund writes: > > In parse_fcall_arguments_20(): > > we appear to constantly setting argnull to true for all arguments? > > Uh, it looks to me like it does so only if the frontend sends a -1 > length field, which is the protocol's convention for indicating a > null. Ah, brainfade. Probably triggered by the fact that I forgot that we call input functions even on NULL (which never quite made sense to me). Greetings, Andres Freund
Re: Old protocol fastpath calls borked?
Andres Freund writes: > Ah, brainfade. Probably triggered by the fact that I forgot that we call > input functions even on NULL (which never quite made sense to me). That's so that domain_in can reject NULLs if the domain constraints say to do so. Mind you, the SQL committee should never have allowed NOT NULL domain constraints in the first place, because the concept is fundamentally incompatible with outer joins. But it's there and we try to honor it in this case. regards, tom lane
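As a side note, a minimal example of the behavior being described, i.e. why the input path has to see NULLs at all (nothing here is specific to the fastpath protocol):

	CREATE DOMAIN nonnull_int AS int NOT NULL;
	SELECT NULL::nonnull_int;
	-- ERROR:  domain nonnull_int does not allow null values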
Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila wrote: > On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu wrote: > > Recently, we hit a few occurrences of deadlock when IO failure (including > > disk full, random remote disk IO failures) happens in single user mode. We > > found the issue exists on both Linux and Windows in multiple postgres > > versions. > > > > 3. Because the unable to write relation data scenario is difficult to > > hit naturally even reserved space is turned off, I have prepared a small > > patch (see attachment “emulate-error.patch”) to force an error when PG > > tries to write data to relation files. We can just apply the patch and > > there is no need to put efforts flooding data to disk any more. > > I have one question related to the way you have tried to emulate the error. > > @@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, > BlockNumber blocknum, > nbytes, > BLCKSZ); > + ereport(ERROR, > + (errcode(ERRCODE_INTERNAL_ERROR), > + errmsg("Emulate exception in mdwrite() when writing to disk"))); > + > > We generally reserve the space in a relation before attempting to > write, so not sure how you are able to hit the disk full situation via > mdwrite. If you see the description of the function, that also > indicates same. Presumably ZFS or BTRFS or something more exotic could still get ENOSPC here, and of course any filesystem could give us EIO here (because the disk is on fire or the remote NFS server is rebooting due to an automatic Windows update). -- Thomas Munro http://www.enterprisedb.com
Re: monitoring CREATE INDEX [CONCURRENTLY]
On 2019-Jan-09, Pavan Deolasee wrote: > Looks like it's missing the validate_index blocks-scanned report, which is > not AM-specific and your original proposal does mention that. True. That phase is actually 3 phases, which would be reported separately:
5. index_bulk_delete() scan
6. performsort
7. validate_index_heapscan
> I thought a bit about index_build part. If most AMs follow a somewhat > standard phases while building an index, it might be simpler to define > those phases and have AM agnostic code report those phases. Well, think about btrees. We first scan the whole table putting all tuples in a spool (two spools actually), then we tell the spools to get sorted, then we extract tuples from the spools, and finally we read the spool to produce the leaf pages. If we just report the table scan, the report gets to 100% complete in that phase and then waits for a very long time during which nothing seems to happen. That's not cool. I'm adding a new AM support routine which turns the subphase number into a textual description, so that we don't have to hardcode phase names in the agnostic code. > Can we have more descriptive text for waiters? Such as "waiting for > existing writers", "waiting for intermediate writers" and "waiting for > old readers". Not sure if I got those correct, something of that sort > will definitely give more insight into what the transaction is waiting > for. Can do. > Can we actually also report the list of transactions the command is waiting > on? > That could be useful to the user if CIC appears to be stuck too long. I'm afraid this is not possible, because the progress report interface doesn't allow for something like that. > IMHO we should just use full term INDEX instead of IDX, such as > PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple > of extra characters won't make a difference. Really? I thought it was clear enough, and it's *three* characters saved even. > I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE > when the wait phase is over, to avoid any confusion. For example, I noticed > that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers. Yes, absolutely. > +GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount) > > Could that out variable be named something differently? "countp" or > something like that. I did not check if there is some practice that we > follow, but I remember suffixing with "p" rather than prefixing with > "o" (for out I assume) Sure. > +/* Progress parameters for CREATE INDEX */ > +#define PROGRESS_CREATEIDX_PHASE 0 > +/* 1 and 2 reserved for "waitfor" metrics */ > +#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3 > +#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4 > + > > Is there a reason to leave those reserve placeholders, only to fill them a > few lines down? Yes -- those are going to be used by other reports that also use similar metrics, such as the CLUSTER report. I'm running out of columns to put the numbers into :-( Right now I have
1. phase
2. subphase (for index_build)
3. lockers total (to wait for)
4. lockers done
5. blocks total
6. blocks done
7. tapes total
8. tapes done
9. partitions total
10. partitions done
The "tapes total/done" bit is about reporting the performsort steps; I'm not really sure yet it'll be tapes, but I hope it can be done with two numbers. Anyway, in btree build I have these subphases:
1. spool heapscan using IndexBuildHeapScan
2. _bt_parallel_heapscan stuff (only one of these in a build)
3. bt_leafbuild, performsort spool 1
4. bt_leafbuild, performsort spool 2
5. bt_load
and I don't have columns to put the metrics for the bt_load thing, assuming I figure out what those are. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
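For a sense of how this might look to a user, a hedged sketch of polling such a progress view from another session while CREATE INDEX CONCURRENTLY runs (the view name pg_stat_progress_create_index and the column names below are assumptions derived from the column list above, not a committed interface):

	SELECT pid, phase,
	       lockers_done, lockers_total,
	       blocks_done,  blocks_total,
	       partitions_done, partitions_total
	FROM pg_stat_progress_create_index;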
Re: Synchronous replay take III
On Tue, Jan 15, 2019 at 11:17 PM Masahiko Sawada wrote: > Regarding the current (v10 patch) design I have some questions and > comments. Hi Sawada-san, Thanks for your testing and feedback. > The patch introduces new GUC parameter synchronous_replay. We can set > synchronous_commit = off while setting synchronous_replay = on. With > this setting, the backend will synchronously wait for standbys to > replay. I'm concerned that having two separate GUC parameters > controlling the transaction commit behaviour would confuse users. It's > just an idea but maybe we can use 'remote_apply' for synchronous replay > purpose and introduce new parameter for standby server something like > allow_stale_read. That is an interesting idea. That choice means that the new mode always implies synchronous_commit = on (since remote_apply is a "higher" level). I wanted them to be independent, so you could express your durability requirement separately from your visibility requirement. Concretely, if none of your potential sync replay standbys are keeping up and they are all dropped to "unavailable", then you'd be able to see a difference: with your proposal we'd still have a synchronous commit wait, but with mine that could independently be on or off. Generally, I think we are too restrictive in our durability levels, and there was some discussion about whether it's OK to have a strict linear knob (which your idea extends): https://www.postgresql.org/message-id/flat/CAEepm%3D3FFaanSS4sugG%2BApzq2tCVjEYCO2wOQBod2d7GWb%3DDvA%40mail.gmail.com Hmm, perhaps your way would be better for now anyway, just because it's simpler to understand and explain. Perhaps you wouldn't need a separate "allow_stale_read" GUC, you could just set synchronous_commit to a lower level when talking to the standby. (That is, give synchronous_commit a meaning on standbys, whereas currently it has no effect there.) > If while a transaction is waiting for all standbys to replay they > become unavailable, should the waiter be released? The patch > seems not to release the waiter. Similarly, wal senders are not aware > of postgresql.conf changes while waiting for synchronous replay. I think we > should call SyncReplayPotentialStandby() in SyncRepInitConfig(). Good point about the postgresql.conf change. If all the standbys go to unavailable state, then a waiter should be released once they have all either acknowledged that they are unavailable (ie acknowledged that their lease has been revoked, via a reply message with a serial number matching the revocation message), or if that doesn't happen (due to lost network connection, crashed process etc), once any leases that have been issued have expired (ie a few seconds). Is that not what you see? > With the setting synchronous_standby_names = '' and > synchronous_replay_standby_names = '*' we would get the standby's > status in pg_stat_replication, sync_state = 'async' and sync_replay = > 'available'. It looks odd to me. Yes, this status is correct in > principle. But considering the architecture of PostgreSQL replication > this status is impossible. Yes, this is essentially the same thing that you were arguing against above. Perhaps you are right, and there are no people who would want synchronous replay, but not synchronous commit. > The synchronous_replay_standby_names = '*' setting means that the > backend waits for all standbys connected to the master server to > replay, is that right?
> In my test, even when some of the synchronous replay standby servers got stuck and therefore had their leases revoked, the backend could proceed with transactions. It means that it waits for all standbys that are "available" to replay. It doesn't wait for the "unavailable" ones. Most of the patch deals with the transitions between those states. During an available->revoking->unavailable transition, we also wait for the standby to know that it is unavailable (so that it begins to raise errors), and during an unavailable->joining->available transition we also wait for the standby to replay the transition LSN (so that it stops raising errors). That way clients on the standby can rely on the error (or lack of error) to tell them whether their snapshot definitely contains every commit that has returned control on the primary. -- Thomas Munro http://www.enterprisedb.com
RE: libpq debug log
Hi Iwata-san, I have used your patch in my own work, so I give my opinion and four pieces of feedback below. On Fri, Jan 18, 2019 at 8:19 AM, Iwata, Aya wrote: > - Setting whether to get log or not by using connection strings or environment > variables. It means that application source code changes is not needed to get > the log. > - Getting time when receive and send process start/end. Functions too. These features were very helpful for my use, so I would like to see your proposed functionality in Postgres. The following is my feedback.

1) It would be better to make the log format the same as the server log format, I think.
Your log format:   2019/01/22 04:15:25.496 ...
Server log format: 2019-01-22 04:15:25.496 UTC ...
There are two differences: one is the separator character of the date, "/" versus "-"; the other is the time zone information.

2) It was difficult for me to understand the first line message in the log file: "Max log size is 10B, log min level is LEVEL1" Does this mean the following? "The maximum size of this file is 10 bytes; the parameter 'log min level' is set to LEVEL 1."

3) Under the circumstance that the environment variables "PGLOGDIR" and "PGLOGSIZE" are set correctly, the log file will also be created when the user connects to the server with "psql". Is this the behavior you intended? Is there any option to disable it only for that session when you want to connect with "psql"?

4) Your patch affects the behavior of PQtrace(). The log of the existing PQtrace() is as follows:
From backend> "id"
From backend (#4)> 16387
From backend (#2)> 1
From backend (#4)> 23
...
Your patch makes PQtrace() include the following log output in addition to the above:
To backend> Msg complete, length 27
Start sending message to backend:
End sending message to backend:
PQsendQuery end :
PQgetResult start :
Start receiving message from backend:
End receiving message from backend:
From backend> T
...
For your information. Best regards, - Ryohei Nagaura
Re: \describe*
Attached is a patch to add verbose \describe commands to complement our existing but slightly cryptic family of \d commands. The goals of this are:
- aid user discovery of \d-commands via tab completion
- make scripts and snippets slightly more self-documenting and understandable
- save experienced users that 0.22 seconds where they try to remember what \dFpS+ means or which command lists user mappings.

DESIGN CHOICES: Every new command is of the form \describe-some-system-object-type[-system][-verbose]. The -system suffix stands in for the 'S' suffix and -verbose stands in for '+'. New commands use the singular form, not plural. Every new command has a direct analog \d-command, but the reverse is not always true, especially when it comes to the commands that can specify multiple object types. In those cases, there are multiple long versions that correspond to several singular parameters (\describe-view, \describe-materialized-view, \describe-index, etc.) but no combinatorics (i.e. no \describe-view-and-foreign-table). There is a \describe-schema and \describe-namespace, both of which perform \dn. There is a \describe-role but no \describe-user or \describe-database-role. I chose \describe-privilege for \dp. I chose \describe-type for \dT instead of \describe-data-type. The command \describe-aggregate-function is \dfa, whereas \describe-aggregate is \da.

NOTES: There is currently nothing stopping you from using the short form suffixes on long form commands, but the reverse isn't true. For example, you can type \describe-functionS+ and it'll work, but \df-verbose will not. I allow this mostly because it would take work to prevent it. Documentation XML was updated but not formatted to make the diff easier to read. No regression cases were added. Currently our coverage of \d commands in psql itself is quite minimal: ~/src/postgres$ grep '\\d' src/test/regress/sql/psql.sql | sort | uniq \copyright \dt arg1 \e arg1 arg2 \df exp \d psql_serial_tab_id_seq but perhaps we could test it indirectly in these other areas: ~/src/postgres/src/test/regress/sql$ grep '\\d' * | sed -e 's/^.*\\d/\\d/g' -e 's/ .*//g' | sort | uniq -c 156 \d 2 \d' 1 \d*', 157 \d+ 1 \d{4})', 1 \da 2 \d+):([a-zA-Z0-9+/=]+)\$([a-zA-Z0-9+=/]+):([a-zA-Z0-9+/=]+)', 4 \des 8 \des+ 1 \det+ 4 \deu 6 \deu+ 1 \dew 14 \dew+ 21 \df 1 \dfn 1 \dfp 4 \dp 4 \dRp 6 \dRp+ 2 \dRs 3 \dRs+ 2 \dt

On Mon, Jan 29, 2018 at 9:56 AM David Fetter wrote: > On Mon, Jan 29, 2018 at 02:51:53PM +, Ryan Murphy wrote: > > > > > > >What I propose is in fact a server command, >which at least three of > > > >the other popular RDBMSs already have. > > > > > Well to actually implement it, it would probably be a client command, > > because that's what \d* are. > > Why should this command be silo'ed off to the psql client? If it's a > server command, it's available to all clients, not just psql. > > > We would most likely want them implemented the same, to avoid > > needless complexity. > > We could certainly have \d call DESCRIBE for later versions of the > server. \ commands which call different SQL depending on server > version have long been a standard practice. > > > I think people are more ok with \describe (with the backslash), which seems > > like what you're suggesting anyway. I read Vik's "hard pass" as being on > > having DESCRIBE which looks like an SQL command but would actually be > > implemented on the client. This seems simpler at first but could cause > deep confusion later.
> > If we implement \d as DESCRIBE for server versions as of when DESCRIBE > is actually implemented, we've got wins all around. > > Best, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate > From e67e61ae789b09c98fe03378c819224d838c2f65 Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Fri, 25 Jan 2019 00:57:23 + Subject: [PATCH] Add \describe commands to compliment \d commands --- doc/src/sgml/ref/psql-ref.sgml | 175 - src/bin/psql/command.c | 132 - src/bin/psql/describe.c| 13 ++- src/bin/psql/describe.h| 3 + src/bin/psql/tab-complete.c| 135 - 5 files changed, 381 insertions(+), 77 deletions(-) diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 6c76cf2f00..363d6d9678 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -871,6 +871,17 @@ testdb=> same line. + +The family of meta-commands starting with \d often +have an equivalent \describe- "long form" command. +The long-form commands often have the suffixes -system +and -verbose which are the equivalent of the +short form suffixes S and + +resp
RE: pg_upgrade: Pass -j down to vacuumdb
Hi, According to the CF app, this patch needs review, so I took a look at it. Currently, your patch applies and builds cleanly, and all tests are also successful based on the CF bot patch tester. I'm not sure if I have understood the argument raised by Peter correctly. Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same way, so it's not a given that the same -j number should be used." I think the question is whether the number of jobs for pg_upgrade should be used the same way for parallel vacuum. According to the official docs, the recommended setting for pg_upgrade's -j option is the maximum of the number of CPU cores and tablespaces. [1] As for vacuumdb, the recommended setting for parallel vacuum's -j option is based on your max_connections [2], which is the maximum number of concurrent connections to your db server. [1] https://www.postgresql.org/docs/current/pgupgrade.html [2] https://www.postgresql.org/docs/current/app-vacuumdb.html Regards, Kirk Jamison
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Jan 25, 2019 at 1:03 AM John Naylor wrote: > > On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila wrote: > > I think what doc means to say is > > that it copies any unlinked files present in primary's new cluster > > (which in your case will be data2). > > In that case, I'm still confused why that doc says, "Unfortunately, > rsync needlessly copies files associated with temporary and unlogged > tables because these files don't normally exist on standby servers." > I fail to see why the primary's new cluster would have these if they > weren't linked. > Why wouldn't unlogged files be in the primary's new cluster? After the upgrade, they should be present in the new cluster if they were present in the old cluster. > And in the case we're discussing here, the skipped > FSMs won't be on data2, so won't end up in standby/data2. > Right. I think we are safe with respect to rsync because I have seen that we do rewrite the vm files in link mode, and rsync will copy them from the primary's new cluster. I think you can try to address my other comments on your pg_upgrade patch. Once we agree on the code, we need to test the scenarios below: (a) upgrade from all supported versions to the latest version, (b) upgrade standby with and without using rsync. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: backslash-dot quoting in COPY CSV
On Wed, Jan 2, 2019 at 04:58:35PM +0100, Daniel Verite wrote: > Hi, > > The doc on COPY CSV says about the backslash-dot sequence: > > To avoid any misinterpretation, a \. data value appearing as a > lone entry on a line is automatically quoted on output, and on > input, if quoted, is not interpreted as the end-of-data marker > > However this quoting does not happen when \. is already part > of a quoted field. Example: > > COPY (select 'somevalue', E'foo\n\\.\nbar') TO STDOUT CSV; > > outputs: > > somevalue,"foo > \. > bar" > > which conforms to the CSV rules, by which we are not allowed > to replace \. by anything AFAICS. > The trouble is, when trying to import this back with COPY FROM, > it will error out at the backslash-dot and not import anything. > Furthermore, if these data are meant to be embedded into a > script, it creates a potential risk of SQL injection. > > It is a known issue? I haven't found previous discussions on this. > It looks to me like the ability of backslash-dot to be an end-of-data > marker should be neutralizable for CSV. When the data is not embedded, > it's not needed anyway, and when it's embedded, we could surely think > of alternatives.

I was unable to reproduce the failure here using files:

	CREATE TABLE test (x TEXT);
	INSERT INTO test VALUES (E'foo\n\\.\nbar');

	COPY test TO STDOUT CSV;
	"foo
	\.
	bar"

	COPY test TO '/u/postgres/tmp/x' CSV;
	COPY test FROM '/u/postgres/tmp/x' CSV;

	SELECT * FROM test;
	  x
	-----
	 foo+
	 \. +
	 bar
	 foo+
	 \. +
	 bar

but I am able to see the failure using STDIN:

	COPY test FROM STDIN CSV;
	Enter data to be copied followed by a newline.
	End with a backslash and a period on a line by itself, or an EOF signal.
	"foo
	\.
	ERROR:  unterminated CSV quoted field
	CONTEXT:  COPY test, line 1: ""foo

This seems like a bug to me. Looking at the code, psql issues the prompts for STDIN, but when it sees \. alone on a line, it has no idea you are in a quoted CSV string, so it thinks the copy is done and sends the result to the server. I can't see an easy way to fix this. I guess we could document it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Use zero for nullness estimates of system attributes
I added some code to selfuncs.c to estimate the selectivity of CTID, including nullness, in my ongoing attempt to add TID range scans [1]. And as Tom pointed out [2], no system attribute can be null, so we might as well handle them all. That's what the attached patch does. I observed a few interesting things with outer join selectivity: While system attributes aren't NULL in the table, they can be in queries such as: SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.ctid IS NULL; And the patch does affect the estimates for such plans. But it's just replacing one hardcoded nullness (0.005) with another (0.0), which seems no worse than the original. I was a bit concerned that with, for example,

	CREATE TABLE a (id INTEGER);
	INSERT INTO a SELECT * FROM generate_series(1,1000);
	ANALYZE a;
	CREATE TABLE b (id INTEGER, id2 INTEGER);
	INSERT INTO b SELECT *, * FROM generate_series(1,10);
	ANALYZE b;
	EXPLAIN ANALYZE SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.ctid IS NULL;

you get a row estimate of 1 (vs the actual 990). It's not specific to system attributes. Plain left-join selectivity calculation doesn't seem to take into account the join selectivity, while anti-join calculation does. I do not think this affects the usefulness of the present patch, but maybe it's something we could improve. Finally: I thought about introducing a macro to attnum.h:

	/*
	 * AttrNumberIsForSystemAttr
	 *		True iff the attribute number corresponds to a system attribute.
	 */
	#define AttrNumberIsForSystemAttr(attributeNumber) \
		((bool) ((attributeNumber) < 0))

But there's a zillion places that could be changed to use it, so I haven't in this version of the patch. Edmund [1] https://www.postgresql.org/message-id/flat/31682.1545415852%40sss.pgh.pa.us#bdca5c18ed64f847f44c2645f98ea3a0 [2] https://www.postgresql.org/message-id/31682.1545415852%40sss.pgh.pa.us v1-nullness-selectivity-for-system-cols.patch Description: Binary data
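To make that last observation concrete, a small sketch using the same tables as above (the point is the contrast between the two estimates, not the exact numbers):

	-- Outer-join form: the IS NULL filter is estimated from the column's null
	-- fraction in the table, which ignores the NULLs produced by the outer
	-- join, so the row estimate comes out far too low.
	EXPLAIN SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL;

	-- Anti-join form of the same question: this one is estimated from the
	-- join selectivity, so it should come out much closer to the actual 990.
	EXPLAIN SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);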
Re: proposal - plpgsql unique statement id
On Thu, Jan 24, 2019 at 23:08, Tom Lane wrote: > Peter Eisentraut writes: > > committed > > Why didn't this patch modify the dumping logic in pl_funcs.c to print > the IDs? I'm not aware of other cases where we intentionally omit > fields from debug-support printouts. I did not think of that; this field can be useful there. I'll send a patch with it. Regards Pavel > regards, tom lane
Re: PostgreSQL vs SQL/XML Standards
On Fri, Jan 25, 2019 at 5:46 AM, Chapman Flack wrote: > Hi, > > On 01/21/19 01:33, Pavel Stehule wrote: > > On Sun, Jan 20, 2019 at 23:13, Chapman Flack wrote: > > >> form (whether or not the word LATERAL is used), and re-executes xmltable > >> whenever the referenced column value changes. In that case, whether the > >> default argument is evaluated at function entry or later doesn't seem > >> to matter: the function is re-executed, so evaluating the new default > >> at the time of entry is sufficient. > > > > it makes sense only for volatile functions, which is not a frequent case. On the other hand, > > deferred evaluation should not be a problem; what's more, it is evaluated only > > when it is necessary, which is more sensible to me. > > That makes sense. I trimmed the language about input rows and referring to > earlier columns, and put more emphasis on the usefulness of evaluating > volatile functions only when needed. > > I am: > > - re-attaching xmltable-xpath-result-processing-bugfix-5.patch unchanged > (just so CF app does not lose track) > > - re-attaching xmltable-xmlexists-passing-mechanisms-1.patch unchanged > > - attaching for the first time xml-functions-type-docfix-1.patch > > The doc patch is made to go on top of the passing-mechanisms patch > (there were some doc changes in passing-mechanisms, changed again in > the new patch to be links to the added Limits and Compatibility section). > > I hope the patched docs describe accurately what we have at this point. Looks good. Pavel > Regards, > -Chap
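For readers following the DEFAULT-evaluation discussion, a minimal sketch of the case in question (a volatile expression as a column DEFAULT; the point is that it only needs to be evaluated for rows where the path yields nothing):

	SELECT *
	FROM XMLTABLE('/items/item'
	              PASSING xmlparse(document '<items><item><a>1</a></item><item/></items>')
	              COLUMNS a int PATH 'a' DEFAULT (random() * 100)::int);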
RE: Protect syscache from bloating with negative cache entries
From: Robert Haas [mailto:robertmh...@gmail.com] > On Thu, Jan 24, 2019 at 10:02 AM Tom Lane wrote: > > I will argue hard that we should not do it at all, ever. > > > > There is already a mechanism for broadcasting global GUC changes: > > apply them to postgresql.conf (or use ALTER SYSTEM) and SIGHUP. > > I do not think we need something that can remotely change a GUC's > > value in just one session. The potential for bugs, misuse, and > > just plain confusion is enormous, and the advantage seems minimal. > > I think there might be some merit in being able to activate debugging > or tracing facilities for a particular session remotely, but designing > something that will do that sort of thing well seems like a very > complex problem that certainly should not be sandwiched into another > patch that is mostly about something else. And if we ever get such a > thing I suspect it should be entirely separate from the GUC system. +1 for a separate patch for remote session configuration. ALTER SYSTEM + SIGHUP targeted at a particular backend would do if the DBA can log into the database server (so, it can't be used for DBaaS.) It would be useful to have pg_reload_conf(pid). Regards Takayuki Tsunakawa
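For reference, a quick sketch of what that would look like next to today's mechanism (pg_reload_conf(pid) does not exist; the per-PID variant below is purely hypothetical):

	-- Today: change a setting cluster-wide and signal every backend.
	ALTER SYSTEM SET log_min_duration_statement = 0;
	SELECT pg_reload_conf();

	-- Suggested: reload the configuration of a single backend (hypothetical).
	-- SELECT pg_reload_conf(12345);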