Re: [HACKERS] generated columns
On 15/01/2019 08:18, Pavel Stehule wrote:
> I would to have a mechanism for safe replacement of triggers of type
>
> if TG_TYPE = 'INSERT' THEN
>   NEW.inserted := CURRENT_TIMESTAMP;
> ELSE IF TG_TYPE = 'UPDATE' THEN
>   NEW.updated := CURRENT_TIMESTAMP;
> ..

That kind of use is probably better addressed with a temporal facility.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Libpq support to connect to standby server as priority
From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> I don't know what PgJDBC is doing, however I think libpq needs to do
> more than just retrying.
>
> 1) Try to find a node on which pg_is_in_recovery() returns false. If
>    found, then we assume that is the primary. We also assume that
>    other nodes are standbys. done.
>
> 2) If there's no node on which pg_is_in_recovery() returns false, then
>    we need to retry until we find it. To not retry forever, there
>    should be a timeout counter parameter.

It may be convenient for libpq to be able to retry connection attempts
for a specified duration (by failover_timeout or such), because it
eliminates the need for the application to do the retry. But I think
it's a desirable feature, not a required one.

Regards
Takayuki Tsunakawa
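For readers following the thread, the probe both approaches build on is the
pg_is_in_recovery() function; a minimal sketch of what the client-side check
amounts to (the retry loop and any failover_timeout parameter are still
hypothetical at this point):

-- Run against each candidate host from the connection string:
-- a standby in recovery returns true, the primary returns false.
SELECT pg_is_in_recovery();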
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
> NOTICE seems unnecessary here.
>
> Unfortunately concurrently reindex loses comments, reproducer:

Yes, the NOTICE message makes little sense. I am getting back in touch
with this stuff. It has been some time, but the core of the patch has
not actually changed in its base concept, so I am still very familiar
with it as the original author. There are even typos I may have
introduced a couple of years back, like "contraint". I have not yet
spent much time on that, but there are at quick glance a bunch of
things that could be retouched to get pieces of that committable.

+The concurrent index created during the processing has a name ending in
+the suffix ccnew, or ccold if it is an old index definiton which we failed
+to drop. Invalid indexes can be dropped using DROP INDEX,
+including invalid toast indexes.

This needs markups for "ccnew" and "ccold". "definiton" is not correct.

index_create does not actually need its extra argument with the tuple
descriptor. I think that we had better grab the column name list from
indexInfo and just pass that down to index_create() (patched on my
local branch), so it is overkill to take a full copy of the index's
TupleDesc.

The patch, standing as-is, is close to 2k lines long, so let's cut that
first into more pieces refactoring the concurrent build code. Here are
some preliminary notes:
- WaitForOlderSnapshots() could be in its own patch.
- index_concurrently_build() and index_concurrently_set_dead() can be
  in an independent patch. set_dead() had better be a wrapper on top of
  index_set_state_flags actually which is able to set any kind of flags.
- A couple of pieces in index_create() could be cut as well.

I can send patches for those things as first steps which could happen
in this commit then, and commit them as needed. This way, we reduce the
size of the main patch. Even if the main portion does not get into v12,
we'd still have base pieces to move on next.

Regarding the grammar, we tend for the last couple of years to avoid
complicating the main grammar and move on to parenthesized grammars
(see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that
it would make sense to only support CONCURRENTLY within parenthesis and
just plug that in with the VERBOSE option.

Does somebody mind if I jump into the ship after so long? I was the
original author of the monster after all...
--
Michael

signature.asc
Description: PGP signature
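To make those two points concrete, here is a hedged sketch of the suggested
parenthesized grammar and of cleaning up after a failed run (the
(VERBOSE, CONCURRENTLY) spelling is only the suggestion made here, not
committed syntax, and the index name below is illustrative):

-- Proposed parenthesized form, mirroring VACUUM/ANALYZE/EXPLAIN options:
REINDEX (VERBOSE, CONCURRENTLY) TABLE mytable;

-- After a failed concurrent rebuild, leftover invalid indexes carry the
-- ccnew/ccold suffix and can be dropped manually:
DROP INDEX mytable_i_idx_ccnew;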
Re: [HACKERS] generated columns
On Wed, 16 Jan 2019 at 9:26, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> On 15/01/2019 08:18, Pavel Stehule wrote:
> > I would to have a mechanism for safe replacement of triggers of type
> >
> > if TG_TYPE = 'INSERT' THEN
> >   NEW.inserted := CURRENT_TIMESTAMP;
> > ELSE IF TG_TYPE = 'UPDATE' THEN
> >   NEW.updated := CURRENT_TIMESTAMP;
> > ..
>
> That kind of use is probably better addressed with a temporal facility.

yes. I am looking for this functionality in Postgres

Pavel

> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: New vacuum option to do only freezing
On Wed, Jan 16, 2019 at 5:24 AM Robert Haas wrote:
>
> On Mon, Jan 14, 2019 at 9:24 PM Masahiko Sawada wrote:
> > I think that because the tuples that got dead after heap_page_prune()
> > looked are recorded but not removed without lazy_vacuum_page() we need
> > to process them in lazy_vacuum_page(). For decision about whether to
> > truncate we should not change it, so I will fix it. It should be
> > another option to control whether to truncate if we want.
>
> I think that heap_page_prune() is going to truncate dead tuples to
> dead line pointers. At that point the tuple is gone. The only
> remaining step is to mark the dead line pointer as unused, which can't
> be done without scanning the indexes. So I don't understand what
> lazy_vacuum_page() would need to do in this case.

vacuumlazy.c:L1022

switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
	case HEAPTUPLE_DEAD:

		/*
		 * Ordinarily, DEAD tuples would have been removed by
		 * heap_page_prune(), but it's possible that the tuple
		 * state changed since heap_page_prune() looked. In
		 * particular an INSERT_IN_PROGRESS tuple could have
		 * changed to DEAD if the inserter aborted. So this
		 * cannot be considered an error condition.
		 * (snip)
		 */
		if (HeapTupleIsHotUpdated(&tuple) ||
			HeapTupleIsHeapOnly(&tuple))
			nkeep += 1;
		else
			tupgone = true; /* we can delete the tuple */
		all_visible = false;
		break;

As the above comment says, it's possible that the state of an
INSERT_IN_PROGRESS tuple could be changed to 'dead' after
heap_page_prune(). Since such a tuple is not truncated at this point,
we record it and set it as UNUSED in lazy_vacuum_page(). I think that
the DISABLE_INDEX_CLEANUP case is the same; we need to process them
after they are recorded. Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
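For context, a sketch of how the option under discussion would be invoked
(DISABLE_INDEX_CLEANUP is the name used in this thread and could change
before commit; "accounts" is a placeholder table):

-- Prune and freeze, but skip the index-vacuuming phase; dead line
-- pointers are left behind to be reclaimed by a later normal vacuum.
VACUUM (FREEZE, DISABLE_INDEX_CLEANUP) accounts;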
Re: Log a sample of transactions
On 1/16/19 1:40 AM, Kuroda, Hayato wrote:
> By the way, I think the name of this parameter should be changed.
> In the Cambridge dictionary, the word "rate" means as follows:
>
>   the speed at which something happens or changes, or the amount or
>   number of times it happens or changes in a particular period.
>
> This parameter represents the "probability" of whether it logs,
> therefore this name is inappropriate. You should use "probability" or
> "ratio", I think.

Hmm, I am not a native English speaker; I chose "rate" because it is
already used for auto_explain.sample_rate and for
log_statement_sample_rate.

> Next week I will give you some comments about your good source.

Thanks!
Re: FETCH FIRST clause WITH TIES option
On Tue, Jan 15, 2019 at 2:51 PM Tomas Vondra wrote:
> What do you mean by "raw statistic"?

I mean without further calculation to consider other operations.

> IMHO the best thing you can do is call estimate_num_groups() and combine
> that with the number of input rows. That shall benefit from ndistinct
> coefficients when available, etc. I've been thinking that considering
> the unreliability of grouping estimates we should use a multiple of the
> average size (because there may be much larger groups), but I think
> that's quite unprincipled and I'd much rather try without it first.

Thank you for clarifying. The attached patch uses your suggestion
upthread.

regards
Surafel

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expression
 expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]

 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports. It is:

 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]

 In this syntax, the start or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity. If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..5e7790c0d6 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot * /* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
 			 * the state machine state to record having done so.
 			 */
 			if (!node->noCount &&
-				node->position - node->offset >= node->count)
+				node->position - node->offset >= node->count &&
+				node->limitOption == WITH_ONLY)
 			{
 				node->lstate = LIMIT_WINDOWEND;
@@ -145,17 +147,64 @@
 				return NULL;
 			}

-			/*
-			 * Get next tuple from subplan, if any.
-			 */
-			slot = ExecProcNode(outerPlan);
-			if (TupIsNull(slot))
+			else if (!node->noCount &&
+					 node->position - node->offset >= node->count &&
+					 node->limitOption == WITH_TIES)
 			{
-				node->lstate = LIMIT_SUBPLANEOF;
-				return NULL;
+				/*
+				 * Get next tuple from subplan, if any.
+				 */
+				slot = ExecProcNode(outerPlan);
+				if (TupIsNull(slot))
+				{
+					node->lstate = LIMIT_SUBPLANEOF;
+					return NULL;
+				}
+				/*
+				 * Test if the new tuple and the last tuple match.
+				 * If so we return the tuple.
+				 */
+				econtext->ecxt_innertuple = slot;
+				econtext->ecxt_outertuple = node->last_slot;
+				if (ExecQualAndReset(node->eqfunction, econtext))
+				{
+					ExecCopySlot(node->last_slot, slot);
+					node->subSlot = slot;
+					node->position++;
+				}
+				else
+				{
+					node->lstate = LIMIT_WINDOWEND;
+
+					/*
+					 * If we know we won't need to back up, we can release
+					 * resources at this point.
+					 */
+					if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+						(void) ExecShutdownNode(outerPlan);
+
+					return NULL;
+				}
+
+			}
+			else
+			{
+				/*
+				 * Get next tuple from subplan, if any.
+				 */
+				slot = ExecProcNode(outerPlan);
+				if (TupIsNull(slot))
+				{
+					node->lstate = LIMIT_SUBPLANEOF;
+					return NULL;
+				}
+				if (node->limitOption == WITH_TIES)
+				{
+					ExecCopySlot(node->last_slot, slot);
+				}
+				node->subSlot = slot;
+				node->position++;
 			}
-			node->subSlot = slot;
-			node->position++;
 		}
 		else
 		{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if (node->limitOption == WITH_ONLY)
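To make the semantics the patch implements concrete, a small sketch (the
syntax follows the select.sgml hunk above; the table and data are invented
for illustration):

CREATE TABLE scores (name text, score int);
INSERT INTO scores VALUES ('a', 10), ('b', 9), ('c', 9), ('d', 8);

-- ONLY would stop strictly after two rows; WITH TIES also returns 'c',
-- which ties with 'b' on the ORDER BY value.
SELECT * FROM scores ORDER BY score DESC FETCH FIRST 2 ROWS WITH TIES;
--  name | score
-- ------+-------
--  a    |    10
--  b    |     9
--  c    |     9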
Re: Log a sample of transactions
On Wed, Jan 16, 2019 at 2:03 AM Adrien NAYRAT wrote:
>
> On 1/15/19 11:42 AM, Masahiko Sawada wrote:
> >> When you troubleshoot applicative issues with multi-statements
> >> transaction, you may have to log all queries to find all statements
> >> of one transaction. With high throughput, it could be hard to log
> >> all queries without causing troubles.
> > Hm, can we use log_min_duration_statement to find slow queries of a
> > transaction instead? Could you please elaborate on the use-case?
>
> Hello,
>
> The goal is not to find slow queries in a transaction, but to
> troubleshoot applicative issues when you have short queries.
>
> Sometimes you want to understand what happens in a transaction: either
> you perfectly know your application, or you have to log all queries
> and find ones with the same transaction ID (%x). It could be
> problematic if you have a huge traffic with fast queries.
>

Thank you for the explanation! I understand the use case of this patch.
This feature would be helpful for troubleshooting.

Since we set xact_is_sampled only when the transaction starts and test
it during the transaction, we cannot disable logging during a
transaction, and vice versa. I can imagine a use case where the user
wants to disable logging during a transaction. So it might be better to
also check if log_xact_sample_rate > 0 in check_log_duration().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
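For reference, a sketch of how the feature would be used once committed (the
exact GUC name was still in flux in this thread -- the patch and this mail
refer to log_transaction_sample_rate and log_xact_sample_rate respectively):

-- Log every statement of roughly 1% of transactions, decided once at
-- transaction start; 0 disables the feature.
SET log_transaction_sample_rate = 0.01;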
Re: Python versions (was Re: RHEL 8.0 build)
On 07/01/2019 00:16, Tom Lane wrote:
> BTW, this is a pre-existing problem not the fault of this patch,
> but while we're fooling with the behavior of python lookup would
> be a great time to fix it: we should add something like
>
> AC_ARG_VAR([PYTHON], [path to Python executable])
>
> Aside from improving the "configure --help" documentation, this
> will prevent insane results from using inapplicable cached
> conclusions about what Python we've got. Since more and more
> people seem to be using configure cache files, I think we need
> to be more careful about declaring relevant variables.

Patch attached. I went through the environment variables listed in
installation.sgml and came up with the following that would affect
subsequent cacheable tests:

MSGFMT
PERL
PYTHON
TCLSH

The following are listed but don't affect any other tests, so I didn't
include them:

BISON
DTRACE
DTRACEFLAGS
FLEX
XML2_CONFIG

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 9ba9c7f7e56714535290776cb622a610bc599d8d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 16 Jan 2019 10:05:42 +0100
Subject: [PATCH] configure: More use of AC_ARG_VAR

AC_ARG_VAR is necessary if an environment variable influences a
configure result that is then used by other tests that are cached.
With AC_ARG_VAR, a change in the variable is detected on subsequent
configure runs and the user is then advised to remove the cache.

This adds AC_ARG_VAR calls for: MSGFMT, PERL, PYTHON, TCLSH
---
 config/perl.m4     |  1 +
 config/programs.m4 |  1 +
 config/python.m4   |  1 +
 config/tcl.m4      |  1 +
 configure          | 12 ++++++++++--
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/config/perl.m4 b/config/perl.m4
index caefb0705e..059e31c476 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -5,6 +5,7 @@
 # --
 AC_DEFUN([PGAC_PATH_PERL],
 [PGAC_PATH_PROGS(PERL, perl)
+AC_ARG_VAR(PERL, [Perl program])dnl
 if test "$PERL"; then
   pgac_perl_version=`$PERL -v 2>/dev/null | sed -n ['s/This is perl.*v[a-z ]*\([0-9]\.[0-9][0-9.]*\).*$/\1/p']`
diff --git a/config/programs.m4 b/config/programs.m4
index aa84bfdb9e..21888cb68f 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -245,6 +245,7 @@ AC_DEFUN([PGAC_CHECK_GETTEXT],
   AC_CHECK_HEADER([libintl.h], [],
                   [AC_MSG_ERROR([header file is required for NLS])])
   PGAC_PATH_PROGS(MSGFMT, msgfmt)
+  AC_ARG_VAR(MSGFMT, [msgfmt program for NLS])dnl
   if test -z "$MSGFMT"; then
     AC_MSG_ERROR([msgfmt is required for NLS])
   fi
diff --git a/config/python.m4 b/config/python.m4
index 9a4d12112e..c51aa4e332 100644
--- a/config/python.m4
+++ b/config/python.m4
@@ -17,6 +17,7 @@
 # newer version.
 AC_DEFUN([PGAC_PATH_PYTHON],
 [PGAC_PATH_PROGS(PYTHON, [python python3 python2])
+AC_ARG_VAR(PYTHON, [Python program])dnl
 if test x"$PYTHON" = x""; then
   AC_MSG_ERROR([Python not found])
 fi
diff --git a/config/tcl.m4 b/config/tcl.m4
index 581471f338..9de31a5715 100644
--- a/config/tcl.m4
+++ b/config/tcl.m4
@@ -5,6 +5,7 @@

 AC_DEFUN([PGAC_PATH_TCLSH],
 [PGAC_PATH_PROGS(TCLSH, [tclsh tcl tclsh8.6 tclsh86 tclsh8.5 tclsh85 tclsh8.4 tclsh84])
+AC_ARG_VAR(TCLSH, [Tcl interpreter program (tclsh)])dnl
 if test x"$TCLSH" = x""; then
   AC_MSG_ERROR([Tcl shell not found])
 fi
diff --git a/configure b/configure
index 06fc3c6835..f3639c1363 100755
--- a/configure
+++ b/configure
@@ -888,7 +888,11 @@ PKG_CONFIG_LIBDIR
 ICU_CFLAGS
 ICU_LIBS
 LDFLAGS_EX
-LDFLAGS_SL'
+LDFLAGS_SL
+PERL
+PYTHON
+MSGFMT
+TCLSH'

 # Initialize some variables set by options.
@@ -1588,6 +1592,10 @@ Some influential environment variables:
   ICU_LIBS    linker flags for ICU, overriding pkg-config
   LDFLAGS_EX  extra linker flags for linking executables only
   LDFLAGS_SL  extra linker flags for linking shared libraries only
+  PERL        Perl program
+  PYTHON      Python program
+  MSGFMT      msgfmt program for NLS
+  TCLSH       Tcl interpreter program (tclsh)

 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
@@ -18097,7 +18105,7 @@ $as_echo_n "checking for MSGFMT... " >&6; }
 $as_echo "$MSGFMT" >&6; }
 fi

-  if test -z "$MSGFMT"; then
+if test -z "$MSGFMT"; then
   as_fn_error $? "msgfmt is required for NLS" "$LINENO" 5
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for msgfmt flags" >&5
--
2.20.1
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
(2019/01/16 15:21), Ashutosh Bapat wrote:
> On Wed, Jan 16, 2019 at 11:22 AM Etsuro Fujita
> <fujita.ets...@lab.ntt.co.jp> wrote:
>> (2019/01/15 13:29), Ashutosh Bapat wrote:
>>> I think, there's something better possible. Two partitioned relations
>>> won't use partition-wise join, if their partition schemes do not match.
>>> Partitioned relations with same partitioning scheme share
>>> PartitionScheme pointer. PartitionScheme structure should get an extra
>>> counter, maintaining a count of number of partitioned relations sharing
>>> that structure. When this counter is 1, that relation is certainly not
>>> going to participate in PWJ and thus need not have all the structure
>>> required by PWJ set up. If we use this counter coupled with
>>> enable_partitionwise_join flag, we can get rid of
>>> consider_partitionwise_join flag altogether, I think.
>>
>> Interesting! That flag was introduced to disable PWJs when whole-row
>> Vars are involved, as you know, so I think we need to first eliminate
>> that limitation, to remove that flag.
>
> For that we don't need a separate flag. Do we? AFAIR, somewhere under
> try_partitionwise_join() we check whether PWJ is possible between two
> relations. That involves a bunch of checks like checking whether the
> relations have same bounds. Those checks should be enhanced to
> incorporate existence of whole-var, I think.

Yeah, that check is actually done in build_joinrel_partition_info(),
which is called from build_join_rel() and build_child_join_rel() (only
the latter is called from try_partitionwise_join()). That flag is used
in build_joinrel_partition_info() for that check, but as you mentioned,
I think it would be possible to remove that flag, probably by checking
the WRV existence from the outer_rel/inner_rel's reltarget, instead of
that flag. But I'm not sure we can do that efficiently without
complicating the existing code including the original PWJ one.

> That flag doesn't make that code complicated.

I thought it would be better to not complicate that code, because
disabling such PWJs would be something temporary until we support them.
Anyway, I think this would be a separate issue from the original one we
discussed on this thread.

Best regards,
Etsuro Fujita
Re: de-deduplicate code in DML execution hooks in postgres_fdw
Michael-san,

(2019/01/16 15:54), Michael Paquier wrote:
> On Wed, Jan 16, 2019 at 02:59:15PM +0900, Etsuro Fujita wrote:
>> (2019/01/07 20:26), Etsuro Fujita wrote:
>>> On second thought I'd like to propose another option:
>>> execute_foreign_modify because I think this would match the existing
>>> helper functions for updating foreign tables in postgres_fdw.c
>>> better, such as create_foreign_modify, prepare_foreign_modify and
>>> finish_foreign_modify. I don't really think the function name should
>>> contain "one" or "single_row". Like the names of the calling APIs
>>> (ie, ExecForeignInsert, ExecForeignUpdate and ExecForeignDelete), I
>>> think it's OK to omit such words from the function name.
>>
>> Here is an updated version of the patch. In addition to the naming, I
>> tweaked the function a little bit to match other functions (mainly
>> variable names), moved it to the place where the helper functions are
>> defined, fiddled with some comments, and removed an extra include
>> file that the original patch added. If there are no objections, I'll
>> commit that version of the patch.
>
> I think that you could use PgFdwModifyState for the second argument of
> execute_foreign_modify instead of ResultRelInfo.

Yeah, that is another option, but my favorite would be to use
ResultRelInfo, as in the original patch by Ashutosh, because that makes
it possible to write postgresExecForeignInsert,
postgresExecForeignUpdate, and postgresExecForeignDelete as a single
line.

> Except of that nit, it looks fine to me, thanks for taking care of it.

Great! Thanks for the review!

Best regards,
Etsuro Fujita
Redundant filter in index scan with a bool column
Hi hackers,

Consider this query plan:

create table t (i int, b bool);
create index on t(i, b);
set enable_bitmapscan to off;

explain select * from t where i = 300 and b;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Only Scan using t_i_b_idx on t  (cost=0.15..24.27 rows=6 width=5)
   Index Cond: ((i = 300) AND (b = true))
   Filter: b

The filter is not needed, why is it there? Turns out we can't recognize
that the restriction clause 'b' and the index clause 'b = true' are
equivalent. My first reaction was to patch operator_predicate_proof to
handle this case, but there is a more straightforward way: mark the
expanded index clause as potentially redundant when it is generated in
expand_indexqual_conditions. There is already RestrictInfo.parent_ec
that is used to mark redundant EC-derived join clauses. The patch
renames it to rinfo_parent and uses it to mark the expanded index
clauses. Namely, for both the expanded and the original clause,
rinfo_parent points to the original clause or its generating EC, if set.

The name is no good -- the original clause is not a parent of itself,
after all. I considered something like redundancy_tag, but some places
actually use the fact that it points to the generating EC, so I don't
like this name either. What do you think?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From 144dd4a72fd8278aa6d44865d97239b9fad6965d Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov
Date: Wed, 16 Jan 2019 14:31:26 +0300
Subject: [PATCH] Recognize that an expanded bool index clause is equivalent
 to the original one
---
 src/backend/nodes/copyfuncs.c              |  2 +-
 src/backend/nodes/outfuncs.c               |  2 +-
 src/backend/optimizer/path/costsize.c      |  8 ++---
 src/backend/optimizer/path/equivclass.c    | 28 -
 src/backend/optimizer/path/indxpath.c      | 22 ++
 src/backend/optimizer/plan/createplan.c    | 48 +++---
 src/backend/optimizer/util/restrictinfo.c  |  2 +-
 src/include/nodes/relation.h               | 10 +--
 src/test/regress/expected/create_index.out |  9 ++
 9 files changed, 71 insertions(+), 60 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 006a3d1..9acb081 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2243,7 +2243,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_BITMAPSET_FIELD(right_relids);
 	COPY_NODE_FIELD(orclause);
 	/* EquivalenceClasses are never copied, so shallow-copy the pointers */
-	COPY_SCALAR_FIELD(parent_ec);
+	COPY_SCALAR_FIELD(rinfo_parent);
 	COPY_SCALAR_FIELD(eval_cost);
 	COPY_SCALAR_FIELD(norm_selec);
 	COPY_SCALAR_FIELD(outer_selec);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0fde876..980e9c6 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2436,7 +2436,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_BITMAPSET_FIELD(left_relids);
 	WRITE_BITMAPSET_FIELD(right_relids);
 	WRITE_NODE_FIELD(orclause);
-	/* don't write parent_ec, leads to infinite recursion in plan tree dump */
+	/* don't write rinfo_parent, leads to infinite recursion in plan tree dump */
 	WRITE_FLOAT_FIELD(norm_selec, "%.4f");
 	WRITE_FLOAT_FIELD(outer_selec, "%.4f");
 	WRITE_NODE_FIELD(mergeopfamilies);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 99c5ad9..f350919 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4673,7 +4673,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 			/* Drop this clause if it matches any column of the FK */
 			for (i = 0; i < fkinfo->nkeys; i++)
 			{
-				if (rinfo->parent_ec)
+				if (rinfo->rinfo_parent)
 				{
 					/*
 					 * EC-derived clauses can only match by EC. It is okay to
@@ -4683,12 +4683,12 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 					 * have generated one equating the FK's Vars. So for
 					 * purposes of estimation, we can act as though it did so.
 					 *
-					 * Note: checking parent_ec is a bit of a cheat because
-					 * there are EC-derived clauses that don't have parent_ec
+					 * Note: checking parent EC is a bit of a cheat because
+					 * there are EC-derived clauses that don't have parent EC
 					 * set; but such clauses must compare expressions that
 					 * aren't just Vars, so they cannot match the FK anyway.
 					 */
-					if (fkinfo->eclass[i] == rinfo->parent_ec)
+					if ((Node *) fkinfo->eclass[i] == rinfo->rinfo_parent)
 					{
 						remove_it = true;
 						break;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 6e134ae..b18d8b6 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Tomas,

On 16.01.2019 03:23, Tomas Vondra wrote:
> I've looked at the patch today, and in general it seems quite solid to
> me. I do have a couple of minor points
>
> 1) I think the comments need more work. Instead of describing all the
> individual changes here, I've outlined those improvements in attached
> patches (see the attached "tweaks" patches). Some of it is formatting,
> minor rewording or larger changes. Some comments are rather redundant
> (e.g. the one before calls to release the DSM segment).

Thank you!

> 2) It's not quite clear to me why we need DictInitData, which simply
> combines DictPointerData and list of options. It seems as if the only
> point is to pass a single parameter to the init function, but is it
> worth it? Why not to get rid of DictInitData entirely and pass two
> parameters instead?

In the first place the init method had two parameters, but in the v7
patch I added the DictInitData struct instead of two parameters (list
of options and DictPointerData):

https://www.postgresql.org/message-id/20180319110648.GA32319%40zakirov.localdomain

I haven't found a way to replace the template's init method from
init_method(internal) to init_method(internal,internal) in the upgrade
script of extensions. If I'm not mistaken we need new syntax here, like
ALTER TEXT SEARCH TEMPLATE. Thoughts?

> 3) I find it a bit cumbersome that before each ts_dict_shmem_release
> call we construct a dummy DictPointerData value. Why not to pass
> individual parameters and construct the struct in the function?

Agreed, it may look too verbose. I'll change it.

> 4) The reference to max_shared_dictionaries_size is obsolete, because
> there's no such limit anymore.

Yeah, I'll fix it.

> /* XXX not really a pointer, so the name is misleading */

I think we don't need the DictPointerData struct anymore, because only
the ts_dict_shmem_release function needs it (see comments above) and we
only need it for the hash search. I'll move all fields of
DictPointerData to the TsearchDictKey struct.

> XXX "supported" is not the same as "all ispell dicts behave like that".

I'll reword the sentence.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
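On the syntax question: today the template's support functions can only be
set at creation time, so a purely hypothetical sketch of the ALTER form
being alluded to could mirror the existing CREATE syntax (dispell_init and
dispell_lexize are the stock ispell template functions; the ALTER command
does not exist anywhere yet):

-- Existing syntax: functions are fixed when the template is created.
CREATE TEXT SEARCH TEMPLATE my_ispell (
    INIT = dispell_init,
    LEXIZE = dispell_lexize
);

-- Hypothetical upgrade-script command to swap in a two-argument init
-- function (not implemented):
-- ALTER TEXT SEARCH TEMPLATE my_ispell ( INIT = dispell_init );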
ArchiveEntry optional arguments refactoring
Hi,

During the discussion in [1] an idea about refactoring ArchiveEntry was
suggested. The reason is that currently this function has a significant
number of arguments that are "optional", and every change that has to
deal with it introduces a lot of useless diffs. In the thread mentioned
above, such an example is tracking the current table access method, and
I guess the "Remove WITH OIDS" commit 578b229718e is also similar.

The proposed idea is to refactor out all/optional arguments into a
separate data structure, so that adding/removing a new argument wouldn't
change that much unrelated code. Then for every invocation of
ArchiveEntry this structure needs to be prepared before the call, or as
Andres suggested:

ArchiveEntry((ArchiveArgs){.tablespace = 3,
                           .dumpFn = somefunc,
                           ...});

Another suggestion from Amit is to have an ArchiveEntry() function with
a limited number of parameters, and an ArchiveEntryEx() with those extra
parameters which are not needed in usual cases.

I want to prepare a patch for that, and I'm inclined to go with the
first option, but since there are two solutions to choose from, I would
love to hear more opinions about this topic. Any pros/cons we don't see
yet?

[1]: https://www.postgresql.org/message-id/flat/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de
Re: ArchiveEntry optional arguments refactoring
> On Wed, Jan 16, 2019 at 1:16 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> During the discussion in [1] an idea about refactoring ArchiveEntry was
> suggested. [...]
>
> [1]: https://www.postgresql.org/message-id/flat/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de

[CC Andres and Amit]
jsonb_set for an array with an index outside of boundaries
Hi,

Per discussion in [1] about the generic type subscripting patch, Pavel
had interesting commentary. So far jsonb_set, if it is invoked for a
jsonb array with an index outside of the array boundaries, will
implicitly add a value:

=# insert into test values('["a", "b", "c"]');
=# update test set data = jsonb_set(data, '{1000}', '"d"');
=# table test;
         data
----------------------
 ["a", "b", "c", "d"]

This is a perfectly documented feature; there are no questions here. But
for the generic type subscripting infrastructure I'm introducing
another, more convenient, syntax:

=# update test set data['selector'] = 'value';

Since the idea behind the generic type subscripting patch is just to
introduce an extendable subscripting operation for different data types,
here I'm relying on the already existing functionality for jsonb. But
then this feature of jsonb_set indeed becomes more confusing with the
new syntax:

=# update test set data[1000] = 'd';
=# table test;
         data
----------------------
 ["a", "b", "c", "d"]

Unfortunately, the only alternative option here would be to return an
error and reject such a value, which differs from jsonb_set. I would
like to ask what would be the best solution here: to keep this confusing
behaviour, or to have two different implementations of the jsonb update
functionality (one for jsonb_set, another for subscripting)?

[1]: https://www.postgresql.org/message-id/CA%2Bq6zcXmwR9BDrcf188Mcz5%2BjU8DaqrrOat2mzizKf-nYgDXkg%40mail.gmail.com
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Tue, 15 Jan 2019 at 08:21, Tomas Vondra wrote:
> Turns out you were right - the attribute_referenced piece was quite
> unnecessary. So I've removed it. I've also extended the regression
> tests to verify changing type of another column does not reset the
> stats.

(Trying to find my feet over here)

I've read over the entire thread, and apart from missing the last two
emails and therefore the latest patch, I managed to read over most of
the MCV patch. I didn't quite get to reading mcv.c and don't quite have
the energy to take that on now.

At this stage I'm trying to get to know the patch. I read a lot of
discussion between you and Dean ironing out how the stats should be
used to form selectivities. At the time I'd not read the patch yet, so
most of it went over my head.

I did note down a few things on my read. I've included them below.
Hopefully, they're useful.

MCV list review

1. In mcv.c there's

Assert(ndistinct <= UINT16_MAX);

This should be PG_UINT16_MAX

2. math.h should be included just after postgres.h

3. Copyright is still -2017 in mcv.c. Hopefully, if you change it to
2019, you'll never have to bump it ever again! :-)

4. Looking at pg_stats_ext_mcvlist_items() I see you've coded the
string building manually. The way it's coded I'm finding a little
strange. It means the copying becomes quadratic due to

snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
strncpy(values[1], buff, 1023);

So basically, generally, here you're building a new string with
values[1] followed by a comma, then followed by valout. On the next
line you then copy that new buffer back into values[1]. I understand
this part is likely not performance critical, but I see no reason to
write the code this way.

Are you limiting the strings to 1024 bytes on purpose? I don't see any
comment mentioning you want to truncate strings.

Would it not be better to do this part using a
AppendStringInfoString()? and just manually add a '{', ',' or '}' as
and when required?

DatumGetPointer(valout) should really be using DatumGetCString(valout).

Likely you can also use heap_form_tuple. This will save you having to
convert ints into strings then only to have BuildTupleFromCStrings() do
the reverse.

5. individiaul -> individual

lists. This allows very accurate estimates for individiaul columns, but

litst -> lists

litst on combinations of columns. Similarly to functional dependencies

6. Worth mentioning planning cycles too?

"It's advisable to create MCV statistics objects only on combinations
of columns that are actually used in conditions together, and for which
misestimation of the number of groups is resulting in bad plans.
Otherwise, the ANALYZE cycles are just wasted."

7. straight-forward -> straightforward

(most-common values) lists, a straight-forward extension of the per-column

8. adresses -> addresses

statistics adresses the limitation by storing individual values, but it

9. Worth mentioning ANALYZE time?

This section introduces multivariate variant of MCV (most-common
values) lists, a straight-forward extension of the per-column statistics
described in . This statistics adresses the limitation by storing
individual values, but it is naturally more expensive, both in terms of
storage and planning time.

10. low -> a low

with low number of distinct values. Before looking at the second query,

11. them -> then

on items in the MCV list, and them sums the frequencies

12. Should we be referencing the source from the docs?

See mcv_clauselist_selectivity in src/backend/statistics/mcv.c for
details.

hmm. I see it's not the first going by: git grep -E "\w+\.c\<"

13. Pretty minor, but the following loop in
UpdateStatisticsForTypeChange() could use a break;

attribute_referenced = false;
for (i = 0; i < staForm->stxkeys.dim1; i++)
	if (attnum == staForm->stxkeys.values[i])
		attribute_referenced = true;

UPDATE: If I'd reviewed the correct patch I'd have seen that you'd
removed this already

14. Again in UpdateStatisticsForTypeChange(), would it not be better to
do the statext_is_kind_built(oldtup, STATS_EXT_MCV) check before
checking if the stats contain this column? This gets rid of your
reset_stats variable.

I also don't quite understand why there's an additional check for
statext_is_kind_built(oldtup, STATS_EXT_MCV), which if that's false
then why do we do the dummy update on the tuple? Have you just coded
this so that you can support other stats types later without too much
modification? If so, I'm finding it a bit confusing to read, so maybe
it's worth only coding it that way if there's more than one stats type
to reset for.

UPDATE: If I'd reviewed the correct patch I'd have seen that you'd
removed this already

15. I see you broke out the remainder of the code from
clauselist_selectivity() into clauselist_selectivity_simple(). The
comment looks like just a copy and paste from the original. That seems
like quite a bit of
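For readers unfamiliar with the feature under review, a small sketch of
creating a multivariate MCV statistics object as the patch series implements
it (the table and columns are invented; the (mcv) statistics kind is what
the patch adds on top of the existing ndistinct and dependencies kinds):

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);

-- Build an MCV list on the correlated column pair, then refresh stats;
-- the planner can then estimate WHERE a = ... AND b = ... accurately.
CREATE STATISTICS s (mcv) ON a, b FROM t;
ANALYZE t;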
Re: [HACKERS] generated columns
On 15/01/2019 08:13, Michael Paquier wrote:
> When testing a bulk INSERT into a table which has a stored generated
> column, memory keeps growing in size linearly, which does not seem
> normal to me. If inserting more tuples than what I tested (I stopped
> at 10M because of lack of time), it seems to me that this could result
> in OOMs. I would have expected the memory usage to be steady.

What are you executing exactly? One INSERT command with many rows?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: insensitive collations
Peter Eisentraut wrote:

> > On a table with pre-existing contents, the creation of a unique index
> > does not seem to detect the duplicates that are equal per the
> > collation and different binary-wise.
>
> Fixed in the attached updated patch.

Check. I've found another issue with aggregates over distinct: the
deduplication seems to ignore the collation.

postgres=# select distinct x from test3ci;  -- OK
  x
-----
 def
 abc
 ghi
(3 rows)

postgres=# select count(distinct x) from test3ci;  -- not OK
 count
-------
     4
(1 row)

postgres=# select array_agg(distinct x) from test3ci;  -- not OK
     array_agg
-------------------
 {ABC,abc,def,ghi}
(1 row)

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
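For reference, a sketch of the setup these examples rely on, per the patch
under review (the ICU locale string requests case-insensitive comparison;
test3ci stands in for the test table used above):

CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

CREATE TABLE test3ci (x text COLLATE case_insensitive);
INSERT INTO test3ci VALUES ('abc'), ('ABC'), ('def'), ('ghi');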
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
>
> On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote:
> > Thanks, Mithun for performance testing, it really helps us to choose
> > the right strategy here. Once John provides next version, it would be
> > good to see the results of regular pgbench (read-write) runs (say at
> > 50 and 300 scale factor) and the results of large copy. I don't think
> > there will be any problem, but we should just double check that.
>
> Attached is v12 using the alternating-page strategy. I've updated the
> comments and README as needed. In addition, I've
>

Few comments:
---
1. Commit message:
> Any pages with wasted free space become visible at next relation
> extension, so we still control table bloat.

I think the free space will be available after the next pass of vacuum,
no? How can relation extension make it available?

2.
+2. For very small heap relations, the FSM would be relatively large and
+wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for
+heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space
+and to improve performance. To locate free space in this case, we simply
+iterate over the heap, trying alternating pages in turn. There may be some
+wasted free space in this case, but it becomes visible again upon next
+relation extension.

a. Again, how does space become available at next relation extension?
b. I think there is no use in mentioning the version number in the
above comment; this code will be present from PG-12, so one can find
out from which version this optimization was added.

3.
BlockNumber
RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
                              Size oldSpaceAvail, Size spaceNeeded)
{
..
+	/* First try the local map, if it exists. */
+	if (oldPage < fsm_local_map.nblocks)
+	{
..
}

The comment doesn't appear to be completely in sync with the code.
Can't we just check whether "fsm_local_map.nblocks > 0"? If so, we can
use a macro for the same. I have changed this in the attached patch,
see what you think about it. I have used it at a few other places as
well.

4.
+ * When we initialize the map, the whole heap is potentially available to
+ * try. If a caller wanted to reset the map after another backend extends
+ * the relation, this will only flag new blocks as available. No callers
+ * do this currently, however.
+ */
+static void
+fsm_local_set(Relation rel, BlockNumber curr_nblocks)
{
..
+	if (blkno >= fsm_local_map.nblocks + 2)
..
}

The way you have tried to support the case as quoted in the comment
"If a caller wanted to reset the map after another backend extends .."
doesn't appear to be solid, and I am not sure if it is correct either.
We don't have any way to test the same, so I suggest let's try to
simplify the case w.r.t. the current requirement of this API. I think
we should use some simple logic to try every other block, like:

+	blkno = cur_nblocks - 1;
+	while (true)
+	{
+		fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
+		if (blkno >= 2)
+			blkno -= 2;
+		else
+			break;
+	}

I have changed this in the attached patch.

5.
+/*
+ * Search the local map for an available block to try, in descending order.
+ *
+ * For use when there is no FSM.
+ */
+static BlockNumber
+fsm_local_search(void)

We should give a brief explanation as to why we try in descending
order. I have added some explanation in the attached patch, see what
you think about it?

Apart from the above, I have modified a few comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v13-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
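A quick way to observe the behaviour the patch aims for is to look at the
FSM fork directly; a sketch (the table is illustrative, and the page count
that still avoids an FSM depends on the compile-time
HEAP_FSM_CREATION_THRESHOLD from the patch):

CREATE TABLE tiny (i int);
INSERT INTO tiny SELECT generate_series(1, 100);
VACUUM tiny;

-- With the patch, a heap at or below the threshold has no FSM fork,
-- so this is expected to return 0.
SELECT pg_relation_size('tiny', 'fsm');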
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 9:25 AM Mithun Cy wrote:
>
> On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
> >
> > On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote:
> > > Thanks, Mithun for performance testing, it really helps us to
> > > choose the right strategy here. Once John provides next version, it
> > > would be good to see the results of regular pgbench (read-write)
> > > runs (say at 50 and 300 scale factor) and the results of large
> > > copy. I don't think there will be any problem, but we should just
> > > double check that.
> >
> > Attached is v12 using the alternating-page strategy. I've updated the
> > comments and README as needed. In addition, I've
>
> Execution time in ms (scale factor indicates size of pgbench_accounts):
>
> scale factor   v12-patch   base patch   % diff
> 300            77166.407   77862.041    -0.8934186557
> 50             13329.233   13284.583     0.3361038882
>
> So large table tests do not show any considerable performance variance
> from the base code!

I think with these results, we can conclude this patch doesn't seem to
have any noticeable regression for all the tests we have done, right?
Thanks a lot for doing various performance tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL-XL shared disk development question
Hello,

I already sent the following message to the postgres-xl-developers list,
but they don't seem very active, so I hope you'll accept this kind of
message on this mailing list.

I'm planning to run PostgreSQL-XL data nodes on CephFS as a shared
folder. My goal is to provide scalability (auto/manual) in terms of CPU
and RAM for those who have a large amount of data and operate on clouds.
When the data gets big, it's hard to make a copy of every node instantly
as we increase the number of nodes. That's why I want Ceph to do the
disk management, distribution of data (disk), and replication of data
(disk).

I thought of adding a SHARED mode alongside the existing REPLICATED and
DISTRIBUTED modes, since PostgreSQL-XL already does global transaction
and session management.

What do you think of this? What do you think I should be careful about
in this development, if you think it's possible?

Bests.

--
Atıf Ceylan
CTO AppstoniA OÜ
draft patch for strtof()
As discussed in the Ryu thread, herewith a draft of a patch to use
strtof() for float4 input (rather than using strtod() with its
double-rounding issue).

An exhaustive search shows that this does not change the resulting
bit-pattern for any input string that could have been generated by PG
with extra_float_digits=3 set. The risk is that values generated by
other software, especially code that uses shortest-exact float output
(as a number of languages seem to do, and which PG will do if the Ryu
patch goes in) will be incorrectly input; though it appears that only
one value (7.038531e-26) is both a possible shortest-exact
representation and a rounding error (though a number of other values
round incorrectly, they are not shortest representations).

This includes a fallback to use strtod() the old way if the platform
lacks strtof(). A variant file for the new regression tests is needed
for such platforms; I've taken a stab at setting this up for the one
platform we know will need it (if there are others, the buildfarm will
let us know in due course).

--
Andrew (irc:RhodiumToad)

diff --git a/configure b/configure
index 06fc3c6835..e3176e24e9 100755
--- a/configure
+++ b/configure
@@ -15802,6 +15802,19 @@ esac
 fi

+ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof"
+if test "x$ac_cv_func_strtof" = xyes; then :
+  $as_echo "#define HAVE_STRTOF 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" strtof.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS strtof.$ac_objext"
+ ;;
+esac
+
+fi
+

 case $host_os in
diff --git a/configure.in b/configure.in
index 4efb912c4d..bdaab717d7 100644
--- a/configure.in
+++ b/configure.in
@@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([
 	strlcat
 	strlcpy
 	strnlen
+	strtof
 ]))

 case $host_os in
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 117ded8d1d..dce6ea31cf 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -104,13 +104,39 @@ is_infinite(double val)

 /*
  *	float4in		- converts "num" to float4
+ *
+ * Note that this code now uses strtof(), where it used to use strtod().
+ *
+ * The motivation for using strtof() is to avoid a double-rounding problem:
+ * for certain decimal inputs, if you round the input correctly to a double,
+ * and then round the double to a float, the result is incorrect in that it
+ * does not match the result of rounding the decimal value to float directly.
+ *
+ * One of the best examples is 7.038531e-26:
+ *
+ * 0xAE43FDp-107 = 7.03853069185120912085...e-26
+ *      midpoint   7.038531022281...e-26
+ * 0xAE43FEp-107 = 7.03853130814879132477...e-26
+ *
+ * making 0xAE43FDp-107 the correct float result, but if you do the
+ * conversion via a double, you get
+ *
+ * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26
+ *        midpoint      7.038530964884...e-26
+ * 0xAE43FD.8000p-107 = 7.038531022281...e-26
+ * 0xAE43FD.8008p-107 = 7.038531137076...e-26
+ *
+ * so the value rounds to the double exactly on the midpoint between the two
+ * nearest floats, and then rounding again to a float gives the incorrect
+ * result of 0xAE43FEp-107.
+ *
  */
 Datum
 float4in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
 	char	   *orig_num;
-	double		val;
+	float		val;
 	char	   *endptr;

 	/*
@@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS)
 						"real", orig_num)));

 	errno = 0;
-	val = strtod(num, &endptr);
+	val = strtof(num, &endptr);

 	/* did we not see anything that looks like a double? */
 	if (endptr == num || errno != 0)
@@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS)
 		int			save_errno = errno;

 		/*
-		 * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf,
+		 * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf,
 		 * but not all platforms support all of these (and some accept them
 		 * but set ERANGE anyway...)  Therefore, we check for these inputs
-		 * ourselves if strtod() fails.
+		 * ourselves if strtof() fails.
 		 *
 		 * Note: C99 also requires hexadecimal input as well as some extended
 		 * forms of NaN, but we consider these forms unportable and don't try
-		 * to support them.  You can use 'em if your strtod() takes 'em.
+		 * to support them.  You can use 'em if your strtof() takes 'em.
 		 */
 		if (pg_strncasecmp(num, "NaN", 3) == 0)
 		{
@@ -196,7 +222,7 @@ float4in(PG_FUNCTION_ARGS)
 			 * detect whether it's a "real" out-of-range condition by checking
 			 * to see if the result is zero or huge.
 			 */
-			if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL)
+			if (val == 0.0 || val >= HUGE_VALF || val <= -HUGE_VALF)
 				ereport(ERROR,
 						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 						 errmsg("\"%s\" is out of range for type real",
@@ -232,13 +258,7 @@ float4in(PG_FUNCTION_ARGS)
 				 errmsg("invalid input syntax for type %s: \"%s\"",
 						"real", orig_num)));

-	/*
-	 * if we get here, we have a legal double, still need to check to
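The problem value from the comment above can be exercised straight from
SQL; a sketch (with the patch, direct float4 input rounds once via strtof()
and should yield the float whose shortest representation is 7.038531e-26,
while going through float8 first reproduces the double-rounding hazard):

-- Direct float4 input: one rounding step, correct result.
SELECT '7.038531e-26'::float4;

-- Via float8: rounds to double first, then to float (double rounding).
SELECT '7.038531e-26'::float8::float4;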
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 11:37 PM Tom Lane wrote:
> Well, as I said upthread, it seems like we need to think a bit more
> carefully about what it is that clause_is_strict_for is testing ---
> and if that ends up finding that some other name is more apposite,
> I'd not have any hesitation about renaming it. But we're really
> missing a bet if the ScalarArrayOp-specific check isn't inside that
> recursive check of an "atomic" expression's properties. The
> routines above there only recurse through things that act like
> AND or OR, but clause_is_strict_for is capable of descending
> through other stuff as long as it's strict in some sense. What
> we need to be clear about here is exactly what that sense is.

All right, I'll look over all of the other callers and see what makes
sense as far as naming (or perhaps consider a new parallel function;
unsure at this point.) I might also try to see if we can edit the tests
slightly to require the recursion case to be exercised.

One other question on testing: do you think the "calculated array"
tests are good enough by themselves (i.e., remove the ones with array
constants of > 100 values)? I dislike that it's not as obvious what's
going on, but given that the code shouldn't care about array size
anyway...so if there's an inclination toward fewer tests, that's the
first place I'd look at cutting them.

James Coleman
Re: Redundant filter in index scan with a bool column
Alexander Kuzmenkov writes:
> The filter is not needed, why is it there? Turns out we can't recognize
> that the restriction clause 'b' and the index clause 'b = true' are
> equivalent.

Yeah, it's intentional that we don't get rid of the extra clause; it
doesn't really seem worth the expense and complexity to do so. Indexes
on bool columns are a pretty niche case in the first place. Usually, if
you are interested in just the rows where b = true, you're better off
using "where b" as an index predicate. In your example, we can do this
instead:

regression=# create index on t(i) where b;
CREATE INDEX
regression=# explain select * from t where i = 300 and b;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using t_i_idx on t  (cost=0.12..24.19 rows=6 width=5)
   Index Cond: (i = 300)
(2 rows)

resulting in a much smaller index, if the b=true condition is selective
enough to be worth indexing. Even in the case you showed, how much is
the redundant filter clause really costing?

> My first reaction was to patch operator_predicate_proof to
> handle this case, but there is a more straightforward way: mark the
> expanded index clause as potentially redundant when it is generated in
> expand_indexqual_conditions. There is already RestrictInfo.parent_ec
> that is used to mark redundant EC-derived join clauses. The patch
> renames it to rinfo_parent and uses it to mark the expanded index
> clauses.

That's an unbelievable hack that almost certainly breaks existing uses.

The approach of teaching predtest.c that "b = true" implies "b" would
work, but it seems a bit brute-force because ordinarily such cases
would never be seen there, thanks to simplify_boolean_equality having
canonicalized the former into the latter. The problem we have is that
indxpath.c re-generates "b = true" in indexscan conditions. Thinking
about it now, I wonder if we could postpone that conversion till later,
say do it in create_indexscan_plan after having checked for redundant
clauses. Not sure how messy that'd be.

			regards, tom lane
Re: [PATCH] Allow UNLISTEN during recovery
>
> On Tue, Jan 15, 2019 at 10:17 AM Shay Rojansky wrote:
> > Unfortunately I'm extremely tight for time at the moment and don't
> > have time to do the appropriate hot standby setup to test this... As
> > the patch is pretty straightforward, and since I'm hoping you guys
> > execute the tests on your end, I hope that's acceptable. If it's
> > absolutely necessary for me to test the patch locally, let me know
> > and I'll do so.
>
> I would definitely prefer if you could run the tests. You might
> discover that your patch does not sufficiently address tests. Please
> do so and confirm here that it works for you and then I can do another
> review.

Thanks for insisting - I ended up setting up the environment and
running the tests, and discovering that some test-related changes were
missing. Here's a 3rd version of the patch. Hope this is now in good
shape, let me know if you think anything else needs to be done.

From c5497cc4f7fbad4eafecfa72bc99dfebacd0f9bd Mon Sep 17 00:00:00 2001
From: Shay Rojansky
Date: Tue, 15 Jan 2019 18:49:40 +0100
Subject: [PATCH] Allow UNLISTEN during recovery
---
 doc/src/sgml/high-availability.sgml              | 16 ++--
 src/backend/tcop/utility.c                       |  2 +-
 src/test/regress/expected/hs_standby_allowed.out |  3 +++
 .../regress/expected/hs_standby_disallowed.out   |  4 ----
 src/test/regress/sql/hs_standby_allowed.sql      |  4 ++++
 src/test/regress/sql/hs_standby_disallowed.sql   |  2 --
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 4882b20828..bb1c86f73e 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1767,6 +1767,11 @@ if (!triggered)
         Plugins and extensions - LOAD
+
+
+        UNLISTEN
+
+
@@ -1856,18 +1861,17 @@ if (!triggered)
-    LISTEN, UNLISTEN, NOTIFY
+    LISTEN, NOTIFY

-In normal operation, read-only transactions are allowed to
-use LISTEN, UNLISTEN, and
-NOTIFY, so Hot Standby sessions operate under slightly tighter
-restrictions than ordinary read-only sessions. It is possible that some
-of these restrictions might be loosened in a future release.
+In normal operation, read-only transactions are allowed to use
+LISTEN, and NOTIFY, so Hot Standby sessions
+operate under slightly tighter restrictions than ordinary read-only sessions.
+It is possible that some of these restrictions might be loosened in a future release.

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 27ae6be751..44136060d6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -629,7 +629,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			{
 				UnlistenStmt *stmt = (UnlistenStmt *) parsetree;

-				PreventCommandDuringRecovery("UNLISTEN");
+				/* allow UNLISTEN during recovery, which is a noop */
 				CheckRestrictedOperation("UNLISTEN");
 				if (stmt->conditionname)
 					Async_Unlisten(stmt->conditionname);
diff --git a/src/test/regress/expected/hs_standby_allowed.out b/src/test/regress/expected/hs_standby_allowed.out
index 526f88f2be..00b8faf9eb 100644
--- a/src/test/regress/expected/hs_standby_allowed.out
+++ b/src/test/regress/expected/hs_standby_allowed.out
@@ -208,6 +208,9 @@
 LOCK hs1 IN ACCESS SHARE MODE;
 LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;
+-- UNLISTEN
+UNLISTEN a;
+UNLISTEN *;
 -- LOAD
 -- should work, easier if there is no test for that...
 -- ALLOWED COMMANDS
diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out
index bc117413ff..dff0953e9a 100644
--- a/src/test/regress/expected/hs_standby_disallowed.out
+++ b/src/test/regress/expected/hs_standby_disallowed.out
@@ -118,10 +118,6 @@
 listen a;
 ERROR:  cannot execute LISTEN during recovery
 notify a;
 ERROR:  cannot execute NOTIFY during recovery
-unlisten a;
-ERROR:  cannot execute UNLISTEN during recovery
-unlisten *;
-ERROR:  cannot execute UNLISTEN during recovery
 -- disallowed commands
 ANALYZE hs1;
 ERROR:  cannot execute ANALYZE during recovery
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index a33199dbbd..6debddc5e9 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -110,6 +110,10 @@
 LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;

+-- UNLISTEN
+UNLISTEN a;
+UNLISTEN *;
+
 -- LOAD
 -- should work, easier if there is no test for that...

diff --git a/src/test/regress/sql/hs_standby_disallowed.sql b/src/test/regress/sql/hs_standby_disallowed.sql
index 21bbf526b7..a470600eec 100644
--- a/src/test/regress/sql/hs_standby_disallowed.sql
+++ b/src/test/regress/sql/hs_standby_disallowed.sql
@@ -88,8 +88,6 @@
 COMMIT;
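The user-visible change is small; a sketch of a hot-standby session before
and after the patch:

-- On a hot standby, before the patch:
UNLISTEN a;
-- ERROR:  cannot execute UNLISTEN during recovery

-- With the patch, both forms are accepted as no-ops, since a standby
-- session can never have active LISTEN registrations anyway:
UNLISTEN a;
UNLISTEN *;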
Re: draft patch for strtof()
Andrew Gierth writes: > As discussed in the Ryu thread, herewith a draft of a patch to use > strtof() for float4 input (rather than using strtod() with its > double-rounding issue). The errno handling in strtof seems bizarrely overcomplex; why do you need the separate caller_errno variable? > This includes a fallback to use strtod() the old way if the platform > lacks strtof(). A variant file for the new regression tests is needed > for such platforms; I've taken a stab at setting this up for the one > platform we know will need it (if there are others, the buildfarm will > let us know in due course). I'm not that much on board with introducing test cases that we know, beyond question, are going to be portability headaches. What will we actually gain with this, compared to just not having the test case? I can't see that it's worth either the buildfarm cycles or the human maintenance effort just to prove that, yes, some platforms have portability corner cases. I also don't like the prospect that we ship releases that will fail basic regression tests on platforms we haven't tested. Coping with such failures is a large burden for affected packagers or end users, especially when the only useful "coping" mechanism is to ignore the regression test failure. Might as well not have it. regards, tom lane
Re: draft patch for strtof()
> "Tom" == Tom Lane writes: >> As discussed in the Ryu thread, herewith a draft of a patch to use >> strtof() for float4 input (rather than using strtod() with its >> double-rounding issue). Tom> The errno handling in strtof seems bizarrely overcomplex; why do Tom> you need the separate caller_errno variable? Eh. I was preserving the conventional behaviour of setting errno only on errors and not resetting it to 0, but I suppose you could argue that that is overkill given that the function is called only in one place that's supposed to have already set errno to 0. (And yes, I missed a couple of files and the windows build breaks, working on those) >> This includes a fallback to use strtod() the old way if the platform >> lacks strtof(). A variant file for the new regression tests is needed >> for such platforms; I've taken a stab at setting this up for the one >> platform we know will need it (if there are others, the buildfarm will >> let us know in due course). Tom> I'm not that much on board with introducing test cases that we Tom> know, beyond question, are going to be portability headaches. What Tom> will we actually gain with this, compared to just not having the Tom> test case? The purpose of the test case is to ensure that we're getting the right values on input. If these values fail on any platform that anyone actually cares about, then I think we need to know about it. -- Andrew (irc:RhodiumToad)
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Tue, Jan 15, 2019 at 8:48 PM Michael Paquier wrote: > > So the answer to your question is: the WAL receiver fails to start. > > Robert, does this answer your question? I agree that depending on the > timing, we could finish by having the startup process spawning a WAL > receiver with a given connection string, and that the WAL receiver > could use a different one, but as long as we fail the WAL receiver > startup this does not seem like an issue to me, so I disagree with the > upthread statement that the patch author has not thought about such > cases :) OK, if that was in the patch, I dunno why I didn't see it. I really didn't think it was there, but if I'm wrong, then I am. > It seems to me that making the WAL receiver relying more on the GUC > context makes it more robust when it comes to reloading the parameters > which would be an upcoming change as we need to rely on the WAL > receiver check the GUC context itself and FATAL (or we would need to > have the startup process check for a change in > primary_conninfo/primary_slot_name, and then have the startup process > kill the WAL receiver by itself). In short, handling all that in the > WAL receiver would be more robust in the long term in my opinion as we > reduce interactions between the WAL receiver and the startup process. > And on top of it we get rid of ready_to_display, which is something I > am unhappy with since we had to introduce it. I think you have some valid points here, but I also think that the patch as written doesn't really seem to have any user-visible benefits. Sure, ready_to_display is a crock, but getting rid of it doesn't immediately help anybody. And on the flip side your patch might have bugs, in which case we'll be worse off. I'm not going to stand on my soapbox and say that committing this patch is a terrible idea, because as far as I can tell, it isn't. But it feels like it might be a commit for the sake of making a commit, and I can't get too excited about that either. Since the logic will have to be further revised anyway to make primary_conninfo PGC_SIGHUP, why not just wait until that's done and then commit all the changes together instead of guessing that this might make that easier? If this does get committed now, and count me as about -0.25 for that, then I at least think it should have some comments clearly addressing the synchronization issues. Unless those are also in the patch and I missed them, too, but I hope I'm not that dense. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: draft patch for strtof()
Andrew Gierth writes: > "Tom" == Tom Lane writes: > Tom> The errno handling in strtof seems bizarrely overcomplex; why do > Tom> you need the separate caller_errno variable? > Eh. I was preserving the conventional behaviour of setting errno only on > errors and not resetting it to 0, Oh, I see. -ENOCAFFEINE. > but I suppose you could argue that > that is overkill given that the function is called only in one place > that's supposed to have already set errno to 0. Well, we probably oughtta endeavor to maintain compatibility with the function's standard behavior, because other calls to it are likely to creep in over time. Objection withdrawn. > Tom> I'm not that much on board with introducing test cases that we > Tom> know, beyond question, are going to be portability headaches. What > Tom> will we actually gain with this, compared to just not having the > Tom> test case? > The purpose of the test case is to ensure that we're getting the right > values on input. If these values fail on any platform that anyone > actually cares about, then I think we need to know about it. Meh. I think the actual outcome will be that we define any platform that gets the wrong answer as one that we don't care about, mainly because we won't have any practical way to fix it. That being the situation, trying to maintain a test case seems like pointless make-work. (FWIW, I'm running the patch on gaur's host, just to confirm it does what you expect. Should have an answer in an hour or so ...) regards, tom lane
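For reference, the convention being preserved here — set errno only on failure, leave the caller's value alone on success — looks roughly like this sketch (the names are illustrative, not the patch's actual code):

#include <errno.h>
#include <stdlib.h>

/*
 * Sketch of the errno convention under discussion: stash the caller's
 * errno, let strtof() report errors through errno as usual, and restore
 * the caller's value if no error occurred.  Illustrative only.
 */
static float
demo_strtof(const char *nptr, char **endptr)
{
	int			caller_errno = errno;
	float		val;

	errno = 0;
	val = strtof(nptr, endptr);
	if (errno == 0)
		errno = caller_errno;	/* success: don't clobber caller's errno */
	return val;
}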
Re: What to name the current heap after pluggable storage / what to rename?
On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi wrote: > access/relation.[c|h] name is fine. Or how about access/rel.[c|h], > because nodes/relation.h is related to planner. utils/rel.h is some how > related to relation caches. Insofar as we can reasonably do so, I'd rather pick unique names for header files. I know that there's no law that rules out having both nodes/relation.h and access/relation.h, or likewise utils/rel.h and access/rel.h; the computer won't be confused. But it might create some confusion among human beings, so my vote is for avoiding that sort of thing if we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: draft patch for strtof()
See if Windows likes this one any better. -- Andrew (irc:RhodiumToad) diff --git a/configure b/configure index 06fc3c6835..e3176e24e9 100755 --- a/configure +++ b/configure @@ -15802,6 +15802,19 @@ esac fi +ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof" +if test "x$ac_cv_func_strtof" = xyes; then : + $as_echo "#define HAVE_STRTOF 1" >>confdefs.h + +else + case " $LIBOBJS " in + *" strtof.$ac_objext "* ) ;; + *) LIBOBJS="$LIBOBJS strtof.$ac_objext" + ;; +esac + +fi + case $host_os in diff --git a/configure.in b/configure.in index 4efb912c4d..bdaab717d7 100644 --- a/configure.in +++ b/configure.in @@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([ strlcat strlcpy strnlen + strtof ])) case $host_os in diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 117ded8d1d..dce6ea31cf 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -104,13 +104,39 @@ is_infinite(double val) /* * float4in - converts "num" to float4 + * + * Note that this code now uses strtof(), where it used to use strtod(). + * + * The motivation for using strtof() is to avoid a double-rounding problem: + * for certain decimal inputs, if you round the input correctly to a double, + * and then round the double to a float, the result is incorrect in that it + * does not match the result of rounding the decimal value to float directly. + * + * One of the best examples is 7.038531e-26: + * + * 0xAE43FDp-107 = 7.03853069185120912085...e-26 + * midpoint 7.038531022281...e-26 + * 0xAE43FEp-107 = 7.03853130814879132477...e-26 + * + * making 0xAE43FDp-107 the correct float result, but if you do the conversion + * via a double, you get + * + * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26 + * midpoint 7.038530964884...e-26 + * 0xAE43FD.8000p-107 = 7.038531022281...e-26 + * 0xAE43FD.8008p-107 = 7.038531137076...e-26 + * + * so the value rounds to the double exactly on the midpoint between the two + * nearest floats, and then rounding again to a float gives the incorrect + * result of 0xAE43FEp-107. + * */ Datum float4in(PG_FUNCTION_ARGS) { char *num = PG_GETARG_CSTRING(0); char *orig_num; - double val; + float val; char *endptr; /* @@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS) "real", orig_num))); errno = 0; - val = strtod(num, &endptr); + val = strtof(num, &endptr); /* did we not see anything that looks like a double? */ if (endptr == num || errno != 0) @@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS) int save_errno = errno; /* - * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf, + * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf, * but not all platforms support all of these (and some accept them * but set ERANGE anyway...) Therefore, we check for these inputs - * ourselves if strtod() fails. + * ourselves if strtof() fails. * * Note: C99 also requires hexadecimal input as well as some extended * forms of NaN, but we consider these forms unportable and don't try - * to support them. You can use 'em if your strtod() takes 'em. + * to support them. You can use 'em if your strtof() takes 'em. */ if (pg_strncasecmp(num, "NaN", 3) == 0) { @@ -196,7 +222,7 @@ float4in(PG_FUNCTION_ARGS) * detect whether it's a "real" out-of-range condition by checking * to see if the result is zero or huge. 
*/ - if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL) + if (val == 0.0 || val >= HUGE_VALF || val <= -HUGE_VALF) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"%s\" is out of range for type real", @@ -232,13 +258,7 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); - /* - * if we get here, we have a legal double, still need to check to see if - * it's a legal float4 - */ - check_float4_val((float4) val, isinf(val), val == 0); - - PG_RETURN_FLOAT4((float4) val); + PG_RETURN_FLOAT4(val); } /* diff --git a/src/include/port.h b/src/include/port.h index a55c473262..a6950c1526 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -381,6 +381,10 @@ extern int isinf(double x); #endif /* __clang__ && !__cplusplus */ #endif /* !HAVE_ISINF */ +#ifndef HAVE_STRTOF +extern float strtof(const char *nptr, char **endptr); +#endif + #ifndef HAVE_MKDTEMP extern char *mkdtemp(char *path); #endif diff --git a/src/include/utils/float.h b/src/include/utils/float.h index 0f82a25ede..5e11ab4aa9 100644 --- a/src/include/utils/float.h +++ b/src/include/utils/float.h @@ -17,6 +17,11 @@ #include +/* this might not be needed */ +#ifndef HUGE_VALF +#define HUGE_VALF ((float)HUGE_VAL) +#endif + #ifndef M_PI /* From my RH5.2 gcc math.h file - thomas 2000-04-03 */ #define M_PI 3.14159265358979323846 diff --git a/src/port/strtof.c b
Re: What to name the current heap after pluggable storage / what to rename?
Robert Haas writes: > On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi > wrote: >> access/relation.[c|h] name is fine. Or how about access/rel.[c|h], >> because nodes/relation.h is related to planner. utils/rel.h is some how >> related to relation caches. > Insofar as we can reasonably do so, I'd rather pick unique names for > header files. +1 regards, tom lane
Re: draft patch for strtof()
Andrew Gierth writes: > See if Windows likes this one any better. Doubtful, because AFAICT it's the same patch. regards, tom lane
Re: O_DIRECT for relations and SLRUs (Prototype)
On Sat, Jan 12, 2019 at 4:36 PM Thomas Munro wrote: > 1. We need a new "bgreader" process to do read-ahead. I think you'd > want a way to tell it with explicit hints (for example, perhaps > sequential scans would advertise that they're reading sequentially so > that it starts to slurp future blocks into the buffer pool, and > streaming replicas might look ahead in the WAL and tell it what's > coming). In theory this might be better than the heuristics OSes use > to guess our access pattern and pre-fetch into the page cache, since > we have better information (and of course we're skipping a buffer > layer). Right, like if we're reading the end of relation file 16384, we can prefetch the beginning of 16384.1, but the OS won't know to do that. > 2. We need a new kind of bgwriter/syncer that aggressively creates > clean pages so that foreground processes rarely have to evict (since > that is now super slow), but also efficiently finds ranges of dirty > blocks that it can write in big sequential chunks. Yeah. > 3. We probably want SLRUs to use the main buffer pool, instead of > their own mini-pools, so they can benefit from the above. Right. I think this is important, and it makes me think that maybe Michael's patch won't help us much in the end. I believe that the number of pages that are needed for clog data, at least, can very significantly depending on workload and machine size, so there's not one number there that is going to work for everybody, and the algorithms the SLRU code uses for page management have O(n) stuff in them, so they don't scale well to large numbers of SLRU buffers anyway. I think we should try to unify the SLRU stuff with shared_buffers, and then have a test patch like Michael's (not for commit) which we can use to see the impact of that, and then try to reduce that impact with the stuff you mention under #1 and #2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
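For what it's worth, under today's buffered I/O the closest analogue to that cross-segment read-ahead is a posix_fadvise() hint along these lines (a sketch only, not backend code; with O_DIRECT there is no kernel cache to warm, which is why a bgreader would have to pull blocks into shared buffers itself):

#include <sys/types.h>
#include <fcntl.h>

/*
 * Sketch: when a sequential scan nears the end of segment N of a
 * relation, hint the kernel to start reading segment N+1 (e.g. file
 * 16384.1 after 16384).  Illustrative only.
 */
static void
hint_next_segment(int next_seg_fd, off_t nbytes)
{
	(void) posix_fadvise(next_seg_fd, 0, nbytes, POSIX_FADV_WILLNEED);
}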
Re: additional foreign key test coverage
On 09/01/2019 09:20, Mi Tar wrote: > I tested this patch and it applied cleanly and all tests passed. I haven't > looked if the changes to tests are reasonable or extensive to cover all > aspects of what they want to cover. I have committed this with additional tests for partitioned tables, as requested by Álvaro. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: What to name the current heap after pluggable storage / what to rename?
On January 16, 2019 8:08:09 AM PST, Robert Haas wrote:
>On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi wrote:
>> access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
>> because nodes/relation.h is related to planner. utils/rel.h is somehow
>> related to relation caches.
>
>Insofar as we can reasonably do so, I'd rather pick unique names for
>header files. I know that there's no law that rules out having both
>nodes/relation.h and access/relation.h, or likewise utils/rel.h and
>access/rel.h; the computer won't be confused. But it might create
>some confusion among human beings, so my vote is for avoiding that
>sort of thing if we can.

I prefer that too - if the new name isn't enough worse to make it hard to remember. I'd welcome suggestions that don't conflict...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: insensitive collations
On 14/01/2019 15:37, Andreas Karlsson wrote: >> Nondeterministic collations do address this by allowing canonically >> equivalent code point sequences to compare as equal. You still need a >> collation implementation that actually does compare them as equal; ICU >> does this, glibc does not AFAICT. > Ah, right! You could use -ks-identic[1] for this. That's the default. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Python versions (was Re: RHEL 8.0 build)
Peter Eisentraut writes: > On 07/01/2019 00:16, Tom Lane wrote: >> BTW, this is a pre-existing problem not the fault of this patch, >> but while we're fooling with the behavior of python lookup would >> be a great time to fix it: we should add something like >> AC_ARG_VAR([PYTHON], [path to Python executable]) > Patch attached. LGTM. I'm slightly annoyed that configure's help output doesn't list the variables in alphabetical order, but that seems to be a pre-existing problem that we likely can't fix. > The following are listed but don't affect any other tests, so I didn't > include them: > BISON > DTRACE > DTRACEFLAGS > FLEX > XML2_CONFIG (slightly confused ...) Surely XML2_CONFIG feeds into what we choose for CPPFLAGS? If that doesn't matter, why not? regards, tom lane
Re: draft patch for strtof()
> "Tom" == Tom Lane writes: > Andrew Gierth writes: >> See if Windows likes this one any better. Tom> Doubtful, because AFAICT it's the same patch. Sigh. copied wrong file. Let's try that again. -- Andrew (irc:RhodiumToad) diff --git a/configure b/configure index 06fc3c6835..e3176e24e9 100755 --- a/configure +++ b/configure @@ -15802,6 +15802,19 @@ esac fi +ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof" +if test "x$ac_cv_func_strtof" = xyes; then : + $as_echo "#define HAVE_STRTOF 1" >>confdefs.h + +else + case " $LIBOBJS " in + *" strtof.$ac_objext "* ) ;; + *) LIBOBJS="$LIBOBJS strtof.$ac_objext" + ;; +esac + +fi + case $host_os in diff --git a/configure.in b/configure.in index 4efb912c4d..bdaab717d7 100644 --- a/configure.in +++ b/configure.in @@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([ strlcat strlcpy strnlen + strtof ])) case $host_os in diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 117ded8d1d..b907613056 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -104,13 +104,39 @@ is_infinite(double val) /* * float4in - converts "num" to float4 + * + * Note that this code now uses strtof(), where it used to use strtod(). + * + * The motivation for using strtof() is to avoid a double-rounding problem: + * for certain decimal inputs, if you round the input correctly to a double, + * and then round the double to a float, the result is incorrect in that it + * does not match the result of rounding the decimal value to float directly. + * + * One of the best examples is 7.038531e-26: + * + * 0xAE43FDp-107 = 7.03853069185120912085...e-26 + * midpoint 7.038531022281...e-26 + * 0xAE43FEp-107 = 7.03853130814879132477...e-26 + * + * making 0xAE43FDp-107 the correct float result, but if you do the conversion + * via a double, you get + * + * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26 + * midpoint 7.038530964884...e-26 + * 0xAE43FD.8000p-107 = 7.038531022281...e-26 + * 0xAE43FD.8008p-107 = 7.038531137076...e-26 + * + * so the value rounds to the double exactly on the midpoint between the two + * nearest floats, and then rounding again to a float gives the incorrect + * result of 0xAE43FEp-107. + * */ Datum float4in(PG_FUNCTION_ARGS) { char *num = PG_GETARG_CSTRING(0); char *orig_num; - double val; + float val; char *endptr; /* @@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS) "real", orig_num))); errno = 0; - val = strtod(num, &endptr); + val = strtof(num, &endptr); /* did we not see anything that looks like a double? */ if (endptr == num || errno != 0) @@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS) int save_errno = errno; /* - * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf, + * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf, * but not all platforms support all of these (and some accept them * but set ERANGE anyway...) Therefore, we check for these inputs - * ourselves if strtod() fails. + * ourselves if strtof() fails. * * Note: C99 also requires hexadecimal input as well as some extended * forms of NaN, but we consider these forms unportable and don't try - * to support them. You can use 'em if your strtod() takes 'em. + * to support them. You can use 'em if your strtof() takes 'em. */ if (pg_strncasecmp(num, "NaN", 3) == 0) { @@ -196,7 +222,14 @@ float4in(PG_FUNCTION_ARGS) * detect whether it's a "real" out-of-range condition by checking * to see if the result is zero or huge. 
*/ - if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL) + + if (val == 0.0 || +#ifdef HUGE_VALF +val >= HUGE_VALF || val <= -HUGE_VALF +#else +isinf(val) +#endif +) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"%s\" is out of range for type real", @@ -232,13 +265,7 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); - /* - * if we get here, we have a legal double, still need to check to see if - * it's a legal float4 - */ - check_float4_val((float4) val, isinf(val), val == 0); - - PG_RETURN_FLOAT4((float4) val); + PG_RETURN_FLOAT4(val); } /* diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index 9d99816eae..84ca929439 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -555,6 +555,9 @@ /* Define to 1 if you have the `strsignal' function. */ #undef HAVE_STRSIGNAL +/* Define to 1 if you have the `strtof' function. */ +#undef HAVE_STRTOF + /* Define to 1 if you have the `strtoll' function. */ #undef HAVE_STRTOLL diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32 index 1a89a8c24e..793897271c 100644 --- a/src/include/pg_config.h.win32 +++ b/src/include/pg_config.h.win32 @@ -139,6 +139,9 @@
Re: New vacuum option to do only freezing
On Wed, Jan 16, 2019 at 3:30 AM Masahiko Sawada wrote: > As the above comment says, it's possible that the state of an > INSERT_IN_PROGRESS tuple could be changed to 'dead' after > heap_page_prune(). Since such tuple is not truncated at this point we > record it and set it as UNUSED in lazy_vacuum_page(). I think that the > DISABLE_INDEX_CLEANUP case is the same; we need to process them after > recorded. Am I missing something? I believe you are. Think about it this way. After the first pass over the heap has been completed but before we've done anything to the indexes, let alone started the second pass over the heap, somebody could kill the vacuum process. Somebody could in fact yank the plug out of the wall, stopping the entire server in its tracks. If they do that, then lazy_vacuum_page() will never get executed. Yet, the heap can't be in any kind of corrupted state at this point, right? We know that the system is resilient against crashes, and killing a vacuum or even the whole server midway through does not leave the system in any kind of bad state. If it's fine for lazy_vacuum_page() to never be reached in that case, it must also be fine for it never to be reached if we ask for vacuum to stop cleanly before lazy_vacuum_page(). In the case of the particular comment to which you are referring, that comment is part of lazy_scan_heap(), not lazy_vacuum_page(), so I don't see how it bears on the question of whether we need to call lazy_vacuum_page(). It's true that, at any point in time, an in-progress transaction could abort. And if it does then some insert-in-progress tuples could become dead. But if that happens, then the next vacuum will remove them, just as it will remove any tuples that become dead for that reason when vacuum isn't running in the first place. You can't use that as a justification for needing a second heap pass, because if it were, then you'd also need a THIRD heap pass in case a transaction aborts after the second heap pass has visited the pages, and a fourth heap pass in case a transaction aborts after the third heap pass has visited the pages, etc. etc. forever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote:
> On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
>
> 1.
> Commit message:
> > Any pages with wasted free space become visible at next relation extension,
> > so we still control table bloat.
>
> I think the free space will be available after the next pass of
> vacuum, no? How can relation extension make it available?

To explain, this diagram shows the map as it looks for different small table sizes (one row per table size; A = available to try, N = not tried):

blkno:    0123
1 block:  A
2 blocks: NA
3 blocks: ANA
4 blocks: NANA

So for a 3-block table, the alternating strategy never checks block 1. Any free space block 1 has acquired via delete-and-vacuum will become visible if it extends to 4 blocks. We are accepting a small amount of bloat for improved performance, as discussed. Would it help to include this diagram in the README?

> 2.
> +2. For very small heap relations, the FSM would be relatively large and
> +wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for
> +heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space
> +and to improve performance. To locate free space in this case, we simply
> +iterate over the heap, trying alternating pages in turn. There may be some
> +wasted free space in this case, but it becomes visible again upon next
> +relation extension.
>
> a. Again, how space becomes available at next relation extension.
> b. I think there is no use of mentioning the version number in the
> above comment, this code will be present from PG-12, so one can find
> out from which version this optimization is added.

It fits with the reference to PG 8.4 earlier in the document. I chose to be consistent, but to be honest, I'm not much in favor of a lot of version references in code/READMEs.

> 3.
> BlockNumber
> RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
>   Size oldSpaceAvail, Size spaceNeeded)
> {
> ..
> + /* First try the local map, if it exists. */
> + if (oldPage < fsm_local_map.nblocks)
> + {
> ..
> }
>
> The comment doesn't appear to be completely in sync with the code.
> Can't we just check whether "fsm_local_map.nblocks > 0", if so, we
> can use a macro for the same? I have changed this in the attached
> patch, see what you think about it. I have used it at a few other
> places as well.

The macro adds clarity, so I'm in favor of using it.

> 4.
> + * When we initialize the map, the whole heap is potentially available to
> + * try. If a caller wanted to reset the map after another backend extends
> + * the relation, this will only flag new blocks as available. No callers
> + * do this currently, however.
> + */
> +static void
> +fsm_local_set(Relation rel, BlockNumber curr_nblocks)
> {
> ..
> + if (blkno >= fsm_local_map.nblocks + 2)
> ..
> }
>
> The way you have tried to support the case as quoted in the comment
> "If a caller wanted to reset the map after another backend extends .."
> doesn't appear to be solid and I am not sure if it is correct either.

I removed this case in v9 and you objected to that as unnecessary, so I reverted it for v10.

> We don't have any way to test the same, so I suggest let's try to
> simplify the case w.r.t current requirement of this API. I think we
> should add some simple logic to try every other block like:
>
> + blkno = cur_nblocks - 1;
> + while (true)
> + {
> + fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
> + if (blkno >= 2)
> + blkno -= 2;
> + else
> + break;
> + }
>
> I have changed this in the attached patch.

Fine by me.

> 5.
> +/*
> + * Search the local map for an available block to try, in descending order.
> + *
> + * For use when there is no FSM.
> + */
> +static BlockNumber
> +fsm_local_search(void)
>
> We should give a brief explanation as to why we try in descending
> order. I have added some explanation in the attached patch, see what
> you think about it?
>
> Apart from the above, I have modified a few comments.

I'll include these with some grammar corrections in the next version.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
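To make the descending-order rationale concrete, the search might look roughly like this sketch (structure, names, and the threshold value are assumed from the discussion above, not taken from the final patch):

#include "postgres.h"
#include "storage/block.h"

#define HEAP_FSM_CREATION_THRESHOLD 4	/* illustrative value */
#define FSM_LOCAL_AVAIL 0x01

typedef struct
{
	BlockNumber nblocks;
	char		map[HEAP_FSM_CREATION_THRESHOLD];
} FSMLocalMap;

static FSMLocalMap fsm_local_map;

/*
 * Sketch only: try blocks from the end of the map downward, since the
 * highest blocks were extended most recently and are thus the most
 * likely to still have usable free space.
 */
static BlockNumber
fsm_local_search(void)
{
	BlockNumber blkno = fsm_local_map.nblocks;

	while (blkno > 0)
	{
		blkno--;
		if (fsm_local_map.map[blkno] == FSM_LOCAL_AVAIL)
			return blkno;
	}
	return InvalidBlockNumber;	/* no candidate block left to try */
}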
Re: Prepare Transaction support for ON COMMIT DROP temporary tables
On Mon, Jan 14, 2019 at 1:41 PM Dimitri Fontaine wrote: > One idea would be that if every temp table in the session belongs to the > transaction, and their namespace too (we could check through pg_depend > that the namespace doesn't contain anything else beside the > transaction's tables), then we could dispose of the temp schema and > on-commit-drop tables at PREPARE COMMIT time. Why not just drop any on-commit-drop tables at PREPARE TRANSACTION time and leave the schema alone? If there are any temp tables touched by the transaction which are not on-commit-drop then we'd have to fail, but as long as all the tables we've got are on-commit-drop then it seems fine to just nuke them at PREPARE time. Such tables must've been created in the current transaction, because otherwise the creating transaction aborted and they're gone for that reason, or it committed and they're gone because they're on-commit-drop. And regardless of whether the transaction we are preparing goes on to commit or abort, those tables will be gone afterwards for the same reasons. So there doesn't in this case seem to be any reason to keep them around until the transaction's fate is known. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 1/16/19 9:27 AM, Michael Paquier wrote:
> Regarding the grammar, we tend for the last couple of years to avoid
> complicating the main grammar and move on to parenthesized grammars
> (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that
> it would make sense to only support CONCURRENTLY within parenthesis
> and just plugin that with the VERBOSE option.

Personally I do not care, but there have been a lot of voices for keeping REINDEX CONCURRENTLY consistent with CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY.

> Does somebody mind if I jump into the ship after so long? I was the
> original author of the monster after all...

Fine by me. Peter?

Andreas
parseCheckAggregates vs. assign_query_collations
Looking into a bug report on the -general list about grouping sets, which turns out to be an issue of collation assignment: if the query has CASE GROUPING(expr) WHEN 1 ... then the expression is rejected as not matching the one in the GROUP BY clause, because CASE already assigned collations to the expression (as a special case in its transform function) while the rest of the query hasn't yet had them assigned, because parseCheckAggregates gets run before assign_query_collations. I'll be looking into this in detail later, but right now, can anyone think of any reason why parseCheckAggregates couldn't be moved to after assign_query_collations? -- Andrew (irc:RhodiumToad)
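The reordering being floated would amount to something like this sketch at the end of the relevant transform function (a hedged illustration of the idea, not the actual analyze.c code):

#include "postgres.h"
#include "parser/parse_agg.h"
#include "parser/parse_collate.h"

/*
 * Sketch of the proposed ordering at the end of a transformXxxStmt()
 * function: assign collations first, then run the aggregate/grouping
 * checks, so that GROUPING() arguments compare equal to the GROUP BY
 * expressions.  Illustrative only.
 */
static void
finish_transform(ParseState *pstate, Query *qry)
{
	assign_query_collations(pstate, qry);

	if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets ||
		qry->havingQual)
		parseCheckAggregates(pstate, qry);
}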
Re: [HACKERS] generated columns
On Thu, Nov 22, 2018 at 10:16 AM Peter Eisentraut wrote: > On 15/11/2018 15:10, Robert Haas wrote: > > I don't have a strong position on 1 vs. 2 vs. 3, but I do think it > > would be nicer not to use '\0' as a column value. I'd suggest you use > > 'n' or '0' or '-' or some other printable character instead. > > I had carefully considered this when attidentity was added. Using '\0' > allows you to use this column as a boolean in C code, which is often > convenient. Also, there are numerous places where a pg_attribute form > or a tuple descriptor is initialized to all zeroes, which works well for > most fields, and adding one exception like this would create a lot of > extra work and bloat the patch and create potential for future > instability. Also note that a C char '\0' is represented as '' (empty > string) in SQL, so this also creates a natural representation in SQL. I'm not really convinced. I think that the stdbool work you've been doing shows that blurring the line between char and bool is a bad idea. And I believe that on general principle, anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Use an enum for RELKIND_*?
On 2018-Dec-21, Greg Stark wrote:

> But I had a second look to see if the output pointed out any actual
> bugs. I found one though it's pretty minor:
>
> lockfuncs.c:234:3: warning: enumeration value ‘LOCKTAG_SPECULATIVE_TOKEN’ not handled in switch [-Wswitch-enum]
>    switch ((LockTagType) instance->locktag.locktag_type)
>    ^~
>
> It just causes speculative locks to be printed wrong in the
> pg_lock_status view.

So what's a good fix? We can add a new case to the switch. Reporting the XID is easy since we have a column for that, but what to do about the uint32 value of the token? We could put it in the virtualxid column (which is just text), or we could put it in some int32 column and let it show as negative; or we could add a new column.

Or we could do nothing, since there are no complaints about this problem.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 11:40 AM John Naylor wrote:
> On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote:
> > can use a macro for the same? I have changed this in the attached
> > patch, see what you think about it. I have used it at a few other
> > places as well.
>
> The macro adds clarity, so I'm in favor of using it.

It just occurred to me that the style FSM_LOCAL_MAP_EXISTS seems more common for macros that refer to constants, and FSMLocalMapExists for expressions, but I've only seen a small amount of the code base. Do we have a style preference here, or is it more a matter of matching the surrounding code?

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: draft patch for strtof()
I wrote: > (FWIW, I'm running the patch on gaur's host, just to confirm it > does what you expect. Should have an answer in an hour or so ...) It does --- it compiles cleanly, and the float4 output matches float4-misrounded-input.out. (This is the v1 patch not v2.) regards, tom lane
Re: parseCheckAggregates vs. assign_query_collations
Andrew Gierth writes:
> Looking into a bug report on the -general list about grouping sets,
> which turns out to be an issue of collation assignment: if the query has
> CASE GROUPING(expr) WHEN 1 ...
> then the expression is rejected as not matching the one in the GROUP BY
> clause, because CASE already assigned collations to the expression (as a
> special case in its transform function) while the rest of the query
> hasn't yet had them assigned, because parseCheckAggregates gets run
> before assign_query_collations.

Bleah.

> I'll be looking into this in detail later, but right now, can anyone
> think of any reason why parseCheckAggregates couldn't be moved to after
> assign_query_collations?

I never particularly liked assign_query_collations, as a matter of overall system design. I'd prefer to nuke that and instead require collation assignment to be done per-expression, ie at the end of transformExpr and similar places. Now that we've seen this example, it's fairly clear why collation assignment really should be considered an integral part of expression parsing. Who's to say there aren't more gotchas of this sort waiting to bite us? Also, if it were integrated into transformExpr as it should have been to begin with, we would not have the need for quite so many random calls to assign_expr_collations, with consequent bugs of omission, cf 7a28e9aa0. regards, tom lane
Re: WIP: Avoid creation of the free space map for small tables
John Naylor writes: > It just occured to me that the style FSM_LOCAL_MAP_EXISTS seems more > common for macros that refer to constants, and FSMLocalMapExists for > expressions, but I've only seen a small amount of the code base. Do we > have a style preference here, or is it more a matter of matching the > surrounding code? I believe there's a pretty longstanding tradition in C coding to use all-caps names for macros representing constants. Some people think that goes for all macros period, but I'm not on board with that for function-like macros. Different parts of the PG code base make different choices between camel-case and underscore-separation for multiword function names. For that, I'd say match the style of nearby code. regards, tom lane
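For illustration, the tree already contains both conventions; these are simplified versions of definitions from the existing headers (c.h, bufmgr.h, pg_list.h):

/* constant-like macros conventionally get all-caps names */
#define PG_UINT16_MAX	(0xFFFF)

/* function-like macros appear in both styles; match nearby code */
#define BufferIsValid(bufnum)	((bufnum) != InvalidBuffer)	/* simplified */
#define lfirst(lc)				((lc)->data.ptr_value)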
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2019-Jan-16, Michael Paquier wrote:

> Regarding the grammar, we tend for the last couple of years to avoid
> complicating the main grammar and move on to parenthesized grammars
> (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that
> it would make sense to only support CONCURRENTLY within parenthesis
> and just plugin that with the VERBOSE option.

That's my opinion too, but I was outvoted in another subthread -- see https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql Stephen Frost, Andrew Gierth and Andres Freund all voted to put CONCURRENTLY outside the parens. It seems we now have three votes to put it *in* the parens (you, Peter Eisentraut, me). I guess more votes are needed to settle this issue.

My opinion is that if we had had parenthesized options lists back when CREATE INDEX CONCURRENTLY was invented, we would have put it there. But we were young and needed the money ...

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: current_logfiles not following group access and instead follows log_file_mode permissions
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote: > > On reflection, maybe the problem is not that we're giving the file > > the wrong permissions, but that we're putting it in the wrong place? > > That is, seems like it should be in the logfile directory not the > > data directory. That would certainly simplify the intended use-case, > > and it would fix this complaint too. > > Yeah, thinking more on this one using for this file different > permissions than the log files makes little sense, so what you propose > here seems like a sensible thing to do things. Even if we exclude the > file from native BASE_BACKUP this would not solve the case of custom > backup solutions doing their own copy of things, when they rely on > group-read permissions. This would not solve completely the problem > anyway if log files are in the data folder, but it would address the > case where the log files are in an absolute path out of the data > folder. Actually, I agree with the initial patch on the basis that this file that's being created (which I'm honestly a bit amazed that we're doing this way; certainly seems rather grotty to me) is surely not an actual *log* file and therefore using logfile_open() to open it doesn't seem quite right. I would have hoped for a way to pass this information that didn't involve a file at all, but I'll assume that was discussed already and good reasons put forth as to why we can't avoid it. I'm not really sure putting it into the logfile directory is such a hot idea as users might have set up external log file rotation of files in that directory. Of course, in that case they'd probably signal PG right afterwards and PG would go write out a new file, but it still seems pretty awkward. I'm not terribly against solving this issue that way either though, but I tend to think the originally proposed patch is more sensible. Thanks! Stephen signature.asc Description: PGP signature
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > One other question on testing: do you think the "calculated array" > tests are good enough by themselves (i.e., remove the ones with array > constants of > 100 values)? I dislike that it's not as obvious what's > going on, but given that the code shouldn't care about array size > anyway...so if there's an inclination to fewer tests that's the first > place I'd look at cutting them. I don't have a strong opinion about that at this point. It might be clearer once the patch is finished; for now, there's no harm in erring towards the more-tests direction. regards, tom lane
Re: current_logfiles not following group access and instead follows log_file_mode permissions
Stephen Frost writes: > * Michael Paquier (mich...@paquier.xyz) wrote: >> On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote: >>> On reflection, maybe the problem is not that we're giving the file >>> the wrong permissions, but that we're putting it in the wrong place? > I'm not really sure putting it into the logfile directory is such a hot > idea as users might have set up external log file rotation of files in > that directory. Of course, in that case they'd probably signal PG right > afterwards and PG would go write out a new file, but it still seems > pretty awkward. I'm not terribly against solving this issue that way > either though, but I tend to think the originally proposed patch is more > sensible. I dunno, I think that the current design was made without any thought whatsoever about the log-files-outside-the-data-directory case. If you're trying to set things up that way, it's because you want to give logfile read access to people who shouldn't be able to look into the data directory proper. That makes current_logfiles pretty useless to such people, as it's now designed. Now, if the expectation is that current_logfiles is just an internal working file that users shouldn't access directly, then this argument is wrong --- but then why is it documented in user-facing docs? If we're going to accept the patch as-is, then it logically follows that we should de-document current_logfiles, because we're taking the position that it's an internal temporary file not meant for user access. I don't really believe your argument about log rotation: a rotator would presumably be configured either to pay attention to file name patterns (which current_logfiles wouldn't match) or to file age (which current_logfiles shouldn't satisfy either, since it's always rewritten when we switch logfiles). If we wanted to worry about that case, a possible solution is to make the current_logfiles pathname user-configurable so it could be put in some third directory. But I think that adds more complexity than is justified --- and not just for us, but for programs trying to find and use current_logfiles. regards, tom lane
Re: current_logfiles not following group access and instead follows log_file_mode permissions
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> >> On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote:
> >>> On reflection, maybe the problem is not that we're giving the file
> >>> the wrong permissions, but that we're putting it in the wrong place?
>
> > I'm not really sure putting it into the logfile directory is such a hot
> > idea as users might have set up external log file rotation of files in
> > that directory. Of course, in that case they'd probably signal PG right
> > afterwards and PG would go write out a new file, but it still seems
> > pretty awkward. I'm not terribly against solving this issue that way
> > either though, but I tend to think the originally proposed patch is more
> > sensible.
>
> I dunno, I think that the current design was made without any thought
> whatsoever about the log-files-outside-the-data-directory case. If
> you're trying to set things up that way, it's because you want to give
> logfile read access to people who shouldn't be able to look into the
> data directory proper. That makes current_logfiles pretty useless
> to such people, as it's now designed.

... or you just want to move the log files to a more sensible location than the data directory. The justification for log_file_mode existing is because you might want to have log files with different privileges, but that's quite a different thing.

> Now, if the expectation is that current_logfiles is just an internal
> working file that users shouldn't access directly, then this argument
> is wrong --- but then why is it documented in user-facing docs?

I really couldn't say why it's documented in the user-facing docs, and for my 2c I don't really think it should be- there's a function to get that information. Sprinkling the data directory with files for users to access directly doesn't exactly fit my view of what a good API looks like. The fact that there isn't any discussion about where that file actually lives does make me suspect you're right that log files outside the data directory wasn't really contemplated.

> If we're going to accept the patch as-is, then it logically follows
> that we should de-document current_logfiles, because we're taking the
> position that it's an internal temporary file not meant for user access.

... and hopefully we'd get rid of it one day entirely.

> I don't really believe your argument about log rotation: a rotator
> would presumably be configured either to pay attention to file name
> patterns (which current_logfiles wouldn't match) or to file age
> (which current_logfiles shouldn't satisfy either, since it's always
> rewritten when we switch logfiles).

Yes, a good pattern would avoid picking up on this file and most are configured that way (though they are maybe not as specific as you might think- the default here is just /var/log/postgresql/*.log).

> If we wanted to worry about that case, a possible solution is to make the
> current_logfiles pathname user-configurable so it could be put in some
> third directory. But I think that adds more complexity than is justified
> --- and not just for us, but for programs trying to find and use
> current_logfiles.

I'd much rather move to get rid of that file than increase its visibility- programs should be using the provided function.

Thanks!

Stephen

signature.asc Description: PGP signature
Re: speeding up planning with partitions
On 2019-Jan-11, Amit Langote wrote:

> Looking around a bit more, I started thinking even build_child_join_sjinfo
> doesn't belong in appendinfo.c (just to be clear, it was defined in
> prepunion.c before this commit), so maybe we should move it to joinrels.c
> and instead export adjust_child_relids that's required by it from
> appendinfo.c. There's already adjust_child_relids_multilevel in
> appendinfo.h, so having adjust_child_relids next to it isn't too bad. At
> least not as bad as appendinfo.c exporting build_child_join_sjinfo for
> joinrels.c to use.

OK, pushed, thanks. I may have caused merge conflicts with the rest of the series, because I reordered some functions -- sorry about that.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Log PostgreSQL version number on startup
On 05/01/2019 15:53, Peter Eisentraut wrote: > On 21/11/2018 15:46, Christoph Berg wrote: >> A startup looks like this: >> >> 2018-11-21 15:19:47.259 CET [24453] LOG: listening on IPv6 address "::1", >> port 5431 >> 2018-11-21 15:19:47.259 CET [24453] LOG: listening on IPv4 address >> "127.0.0.1", port 5431 >> 2018-11-21 15:19:47.315 CET [24453] LOG: listening on Unix socket >> "/tmp/.s.PGSQL.5431" >> 2018-11-21 15:19:47.394 CET [24453] LOG: starting PostgreSQL 12devel on >> x86_64-pc-linux-gnu, compiled by gcc (Debian 8.2.0-9) 8.2.0, 64-bit >> 2018-11-21 15:19:47.426 CET [24454] LOG: database system was shut down at >> 2018-11-21 15:15:35 CET >> 2018-11-21 15:19:47.460 CET [24453] LOG: database system is ready to accept >> connections >> >> (I'd rather put the start message before the listening messages, but I >> think the startup message should be logged via logging_collector, and >> listening is logged before the log file is opened.) > > Why don't we start the logging collector before opening the sockets? Specifically, something like the attached. This keeps the dynamic module loading before the logging collector start, so we see those error messages on stderr, but then the setting up of the sockets would get logged. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 0a482552d26488ec5ce8e4d7bad12bf25ba44c8b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 16 Jan 2019 17:32:01 +0100 Subject: [PATCH] postmaster: Start syslogger earlier XXX --- src/backend/postmaster/postmaster.c | 120 ++-- 1 file changed, 60 insertions(+), 60 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a707d4d465..36efd908bb 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -992,6 +992,66 @@ PostmasterMain(int argc, char *argv[]) */ InitializeMaxBackends(); + /* +* Initialize pipe (or process handle on Windows) that allows children to +* wake up from sleep on postmaster death. +*/ + InitPostmasterDeathWatchHandle(); + + /* +* Forcibly remove the files signaling a standby promotion request. +* Otherwise, the existence of those files triggers a promotion too early, +* whether a user wants that or not. +* +* This removal of files is usually unnecessary because they can exist +* only during a few moments during a standby promotion. However there is +* a race condition: if pg_ctl promote is executed and creates the files +* during a promotion, the files can stay around even after the server is +* brought up to new master. Then, if new standby starts by using the +* backup taken from that master, the files can exist at the server +* startup and should be removed in order to avoid an unexpected +* promotion. +* +* Note that promotion signal files need to be removed before the startup +* process is invoked. Because, after that, they can be used by +* postmaster's SIGUSR1 signal handler. +*/ + RemovePromoteSignalFiles(); + + /* Do the same for logrotate signal file */ + RemoveLogrotateSignalFiles(); + + /* Remove any outdated file holding the current log filenames. */ + if (unlink(LOG_METAINFO_DATAFILE) < 0 && errno != ENOENT) + ereport(LOG, + (errcode_for_file_access(), +errmsg("could not remove file \"%s\": %m", + LOG_METAINFO_DATAFILE))); + + /* +* If enabled, start up syslogger collection subprocess +*/ + SysLoggerPID = SysLogger_Start(); + + /* +* Reset whereToSendOutput from DestDebug (its starting state) to +* DestNone. 
This stops ereport from sending log messages to stderr unless +* Log_destination permits. We don't do this until the postmaster is +* fully launched, since startup failures may as well be reported to +* stderr. +* +* If we are in fact disabling logging to stderr, first emit a log message +* saying so, to provide a breadcrumb trail for users who may not remember +* that their logging is configured to go somewhere else. +*/ + if (!(Log_destination & LOG_DESTINATION_STDERR)) + ereport(LOG, + (errmsg("ending log output to stderr"), +errhint("Future log output will go to log destination \"%s\".", +Log_destination_string))); + + whereToSendOutput = DestNone; + /* * Establish input sockets. * @@ -1183,12 +1243,6 @@ PostmasterMain(int argc, char *argv[]) */
Re: insensitive collations
On 16/01/2019 14:20, Daniel Verite wrote: > I've found another issue with aggregates over distinct: > the deduplication seems to ignore the collation. I have a fix for that. I'll send it with the next update. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: explain plans with information about (modified) gucs
> [v5] Hi Tomas, Peter suggested upthread to use 'settings' rather than 'gucs' for the explain option (and output?), and you seemed to agree. Are you going to include that in a future version? Speaking of the output, v5's default text doesn't match that of formatted text ('GUCs' / 'GUC').
Re: [HACKERS] generated columns
I couldn't compile v7-0001 and I am testing with the older v6-0001 (of which I still had an instance), so the problem below may have been fixed already. If you add a generated column to a file_fdw foreign table, it works OK with VIRTUAL (the default), but with STORED it adds an empty column, silently. I would say it would make more sense to get an error. Erik Rijkers
RE: reloption to prevent VACUUM from truncating empty pages at the end of relation
>On Thu, Nov 15, 2018 at 2:30 PM Alvaro Herrera wrote:
>>
>> On 2018-Nov-15, Laurenz Albe wrote:
>> > > > This new option would not only mitigate the long shared_buffers
>> > > > scan, it would also get rid of the replication conflict caused by
>> > > > the AccessExclusiveLock taken during truncation, which is discussed in
>> > > > https://www.postgresql.org/message-id/c9374921e50a5e8fb1ecf04eb8c6ebc3%40postgrespro.ru
>> > > > and seems to be a more difficult problem than anticipated.
> >
> > FWIW I was just reminded yesterday that the AEL-for-truncation has
> > been diagnosed to be a severe problem in production, and with no other
> > solution in sight, I propose to move forward with the stop-gap.

I just want to ask whether we could proceed with this approach for now, and whether it is possible to have this solution integrated before PG12 development ends?

Regards,
Kirk Jamison
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Wed, Jan 16, 2019 at 11:02:48AM -0500, Robert Haas wrote:
> I think you have some valid points here, but I also think that the
> patch as written doesn't really seem to have any user-visible
> benefits. Sure, ready_to_display is a crock, but getting rid of it
> doesn't immediately help anybody. And on the flip side your patch
> might have bugs, in which case we'll be worse off. I'm not going to
> stand on my soapbox and say that committing this patch is a terrible
> idea, because as far as I can tell, it isn't. But it feels like it
> might be a commit for the sake of making a commit, and I can't get too
> excited about that either. Since the logic will have to be further
> revised anyway to make primary_conninfo PGC_SIGHUP, why not just wait
> until that's done and then commit all the changes together instead of
> guessing that this might make that easier?

You can say that about any patch which gets committed: even refactoring patches have a risk of introducing bugs, and even subtle ones understood only after release. I was justifying the existence of this thread for exactly the opposite reasons. After reviewing the other patch, which switches the parameters to be reloadable, we could do more, and spawning a new thread to attract the correct audience looked rather right (it still does): https://www.postgresql.org/message-id/20181212043208.gi17...@paquier.xyz And this refactoring seemed to be justified as part of a different thread. I don't mind dropping this patch and this thread and just going back there, but that seems like taking steps backward, and what's proposed here is a bit different from just making the parameters reloadable. Still, the refactoring looks justified to me for correctness of the parameter handling.

Anyway, based on the opinions gathered until now, I don't mind just dropping this patch and moving on to the other thread, though we will likely finish with the same discussion as what's done here :) A patch about which two committers are expressing concerns is as good as headed for the waste bin. That's of course fine by me.

--
Michael

signature.asc Description: PGP signature
Re: [HACKERS] generated columns
On Wed, Jan 16, 2019 at 02:14:41PM +0100, Peter Eisentraut wrote:
> On 15/01/2019 08:13, Michael Paquier wrote:
>> When testing a bulk INSERT into a table which has a stored generated
>> column, memory keeps growing in size linearly, which does not seem
>> normal to me. If inserting more tuples than what I tested (I stopped
>> at 10M because of lack of time), it seems to me that this could result
>> in OOMs. I would have expected the memory usage to be steady.
>
> What are you executing exactly? One INSERT command with many rows?

Yes, something like that grows the memory and CPU usage rather linearly:

CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab VALUES (generate_series(1, 10000000));

--
Michael

signature.asc Description: PGP signature
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Jan 16, 2019 at 02:59:31PM -0300, Alvaro Herrera wrote: > That's my opinion too, but I was outvoted in another subthread -- see > https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql > Stephen Frost, Andrew Gierth and Andres Freund all voted to put > CONCURRENTLY outside the parens. It seems we now have three votes to > put it *in* the parens (you, Peter Eisentraut, me). I guess more votes > are needed to settle this issue. Sure, let's see. I would have been in the crowd of not using the parenthesized grammar five years ago, but the recent experience with other commands worries me; we would be repeating the same mistakes. > My opinion is that if we had had parenthesized options lists back when > CREATE INDEX CONCURRENTLY was invented, we would have put it there. > But we were young and needed the money ... :) -- Michael signature.asc Description: PGP signature
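For illustration, if the parenthesized form wins the vote, usage might look like this (a sketch of the syntax under discussion, not a committed grammar):

REINDEX (VERBOSE, CONCURRENTLY) TABLE my_table;

versus the unparenthesized form favored in the other subthread:

REINDEX TABLE CONCURRENTLY my_table;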
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 1/16/19 7:56 AM, David Rowley wrote: > On Tue, 15 Jan 2019 at 08:21, Tomas Vondra wrote: >> Turns out you were right - the attribute_referenced piece was quite >> unnecessary. So I've removed it. I've also extended the regression tests >> to verify changing type of another column does not reset the stats. > > (Trying to find my feet over here) > > I've read over the entire thread, and apart from missing the last two > emails and therefore the latest patch, I managed to read over most of > the MCV patch. I didn't quite get to reading mcv.c and don't quite > have the energy to take that on now. > Thanks for looking! > At this stage I'm trying to get to know the patch. I read a lot of > discussion between you and Dean ironing out how the stats should be > used to form selectivities. At the time I'd not read the patch yet, > so most of it went over my head. > > I did note down a few things on my read. I've included them below. > Hopefully, they're useful. > > MCV list review > > 1. In mcv.c there's Assert(ndistinct <= UINT16_MAX); This should be > PG_UINT16_MAX > Yep. Will fix. > 2. math.h should be included just after postgres.h > Yep. Will fix. > 3. Copyright is still -2017 in mcv.c. Hopefully, if you change it to > 2019, you'll never have to bump it ever again! :-) > Optimist ;-) > 4. Looking at pg_stats_ext_mcvlist_items() I see you've coded the > string building manually. The way it's coded I'm finding a little > strange. It means the copying becomes quadratic due to > > snprintf(buff, 1024, format, values[1], DatumGetPointer(valout)); > strncpy(values[1], buff, 1023); > > So basically, generally, here you're building a new string with > values[1] followed by a comma, then followed by valout. On the next > line you then copy that new buffer back into values[1]. I understand > this part is likely not performance critical, but I see no reason to > write the code this way. > > Are you limiting the strings to 1024 bytes on purpose? I don't see > any comment mentioning you want to truncate strings. > > Would it not be better to do this part using > AppendStringInfoString()? and just manually add a '{', ',' or '}' as > and when required? > > DatumGetPointer(valout) should really be using DatumGetCString(valout). > > Likely you can also use heap_form_tuple. This will save you having to > convert ints into strings then only to have BuildTupleFromCStrings() > do the reverse. > I agree. I admit all of this is a residue of an initial hackish version of the function, and should be changed to StringInfo. Will fix. > 5. individiaul -> individual > lists. This allows very accurate estimates for individiaul columns, but > > litst -> lists > > litst on combinations of columns. Similarly to functional dependencies > Will fix. > 6. Worth mentioning planning cycles too? > > "It's advisable to create MCV statistics objects only > on combinations of columns that are actually used in conditions together, > and for which misestimation of the number of groups is resulting in bad > plans. Otherwise, the ANALYZE cycles are just wasted." > Makes sense. Although that's what we say about the existing stats, so perhaps we should tweak that too. > 7. straight-forward -> straightforward > > (most-common values) lists, a straight-forward extension of the per-column > > 8. adresses -> addresses > > statistics adresses the limitation by storing individual values, but it > Will fix. Thanks for proof-reading. > 9. Worth mentioning ANALYZE time?
> > This section introduces multivariate variant of MCV > (most-common values) lists, a straight-forward extension of the per-column > statistics described in <xref linkend="row-estimation-examples"/>. This > statistics adresses the limitation by storing individual values, but it > is naturally more expensive, both in terms of storage and planning time. > Yeah. > 10. low -> a low > > with low number of distinct values. Before looking at the second query, > > 11. them -> then > > on items in the MCV list, and them sums the frequencies > Will fix. > 12. Should we be referencing the source from the docs? > > See mcv_clauselist_selectivity > in src/backend/statistics/mcv.c for details. > > hmm. I see it's not the first going by: git grep -E "\w+\.c\<" Hmm, that does not return anything to me - do you actually see any references to .c files in the sgml docs? I agree that probably is not a good idea, so I'll remove that. > 13. Pretty minor, but the following loop in > UpdateStatisticsForTypeChange() could use a break; > > attribute_referenced = false; > for (i = 0; i < staForm->stxkeys.dim1; i++) > if (attnum == staForm->stxkeys.values[i]) > attribute_referenced = true; > > UPDATE: If I'd reviewed the correct patch I'd have seen that you'd > removed this already > ;-) > 14. Again in UpdateStatisticsForTypeChange(), would it not be better > to do the statext_is_kind_b
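Regarding item 4 above, a sketch of the StringInfo approach (hypothetical variable names, untested; nvalues and valout stand in for the function's existing locals):

    StringInfoData buf;
    int         i;

    initStringInfo(&buf);
    appendStringInfoChar(&buf, '{');
    for (i = 0; i < nvalues; i++)
    {
        if (i > 0)
            appendStringInfoChar(&buf, ',');
        /* valout holds the text representation of the i-th value */
        appendStringInfoString(&buf, DatumGetCString(valout));
    }
    appendStringInfoChar(&buf, '}');
    values[1] = buf.data;

This avoids both the quadratic copying and the arbitrary 1024-byte truncation.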
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 14:19, Tomas Vondra wrote: > > 12. Should we be referencing the source from the docs? > > > > See mcv_clauselist_selectivity > > in src/backend/statistics/mcv.c for details. > > > > hmm. I see it's not the first going by: git grep -E "\w+\.c\<" > Hmm, that does not return anything to me - do you actually see any > references to .c files in the sgml docs? I agree that probably is not a > good idea, so I'll remove that. Yeah, I see quite a few. I shouldn't have escaped the < > > 18. In dependencies_clauselist_selectivity() there seems to be a new > > bug introduced. We do: > > > > /* mark this one as done, so we don't touch it again. */ > > *estimatedclauses = bms_add_member(*estimatedclauses, listidx); > > > > but the bms_is_member() check that skipped these has been removed. > > > > It might be easier to document if we just always do: > > > > if (bms_is_member(listidx, *estimatedclauses)) > > continue; > > > > at the start of both loops. list_attnums can just be left unset for > > the originally already estimatedclauses. > > > It's probably not as clear as it should be, but if the clause is already > estimated (or incompatible), then the list_attnums[] entry will be > InvalidAttrNumber. Which is what we check in the second loop. hmm. what about the items that should be skipped when you do the *estimatedclauses = bms_add_member(*estimatedclauses, listidx); in the 2nd loop. You'll need to either also do list_attnums[listidx] = InvalidAttrNumber; for them, or put back the bms_is_member() check, no? I admit to not having debugged it to find an actual bug, it just looks suspicious. > > 25. Does statext_is_compatible_clause_internal() need to skip over > RelabelTypes? > > > I believe it does, based on what I've observed during development. Why > do you think it's not necessary? The other way around. I thought it was necessary, but the code does not do it. > > 26. In statext_is_compatible_clause_internal() you mention: /* We only > > support plain Vars for now */, but I see nothing that ensures that > > only Vars are allowed in the is_opclause() condition. > > > > /* see if it actually has the right */ > > ok = (NumRelids((Node *) expr) == 1) && > > (is_pseudo_constant_clause(lsecond(expr->args)) || > > (varonleft = false, > > is_pseudo_constant_clause(linitial(expr->args)))); > > > > the above would allow var+var == const through. > > > But then we call statext_is_compatible_clause_internal on it again, and > that only allows Vars and "Var op Const" expressions. Maybe there's a > way around that? True, I missed that. Drop that one. > > 33. "ndimentions"? There's no field in the struct by that name. I'd > > assume it's the same size as the isnull array above it? > > > > Datum *values; /* variable-length (ndimensions) */ > > > Yes, that's the case. If it relates to the ndimensions field from the struct below, maybe it's worth crafting that into the comment somehow. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: draft patch for strtof()
> "Tom" == Tom Lane writes: >> (FWIW, I'm running the patch on gaur's host, just to confirm it >> does what you expect. Should have an answer in an hour or so ...) Tom> It does --- it compiles cleanly, and the float4 output matches Tom> float4-misrounded-input.out. Well I'm glad _something_ works. Because it turns out that Windows (at least the version running on Appveyor) completely fucks this up; strtof() is apparently returning infinity or zero _without setting errno_ for values out of range for float: input of "10e70" returns +inf with no error, input of "10e-70" returns (exactly) 0.0 with no error. *facepalm* Any windows-users have any idea about this? -- Andrew (irc:RhodiumToad)
Re: parseCheckAggregates vs. assign_query_collations
> "Tom" == Tom Lane writes: >> I'll be looking into this in detail later, but right now, cam anyone >> think of any reason why parseCheckAggregates couldn't be moved to >> after assign_query_collations? Tom> I never particularly liked assign_query_collations, as a matter of Tom> overall system design. I'd prefer to nuke that and instead require Tom> collation assignment to be done per-expression, ie at the end of Tom> transformExpr and similar places. Now that we've seen this Tom> example, it's fairly clear why collation assignment really should Tom> be considered an integral part of expression parsing. Who's to say Tom> there aren't more gotchas of this sort waiting to bite us? Also, Tom> if it were integrated into transformExpr as it should have been to Tom> begin with, we would not have the need for quite so many random Tom> calls to assign_expr_collations, with consequent bugs of omission, Tom> cf 7a28e9aa0. Sure, this might be the right approach going forward. But right now we need a back-patchable fix, and the above sounds a bit too intrusive for that. Turns out the issue can be reproduced without grouping sets too: select case a||'' when '1' then 0 else 1 end from (select (select random()::text) as a) s group by a||''; ERROR: column "s.a" must appear in the GROUP BY clause or be used in an aggregate function select case when a||'' = '1' then 0 else 1 end from (select (select random()::text) as a) s group by a||''; -- works -- Andrew (irc:RhodiumToad)
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 11:25 PM Tom Lane wrote: > > John Naylor writes: > > It just occurred to me that the style FSM_LOCAL_MAP_EXISTS seems more > > common for macros that refer to constants, and FSMLocalMapExists for > > expressions, but I've only seen a small amount of the code base. Do we > > have a style preference here, or is it more a matter of matching the > > surrounding code? > I am fine with the style (FSMLocalMapExists) you are suggesting, but see the similar macros in nearby code like: #define FSM_TREE_DEPTH ((SlotsPerFSMPage >= 1626) ? 3 : 4) I think the above is not an exact match. So, I have looked around and found a few other macros which serve a somewhat similar purpose, see below:

#define ATT_IS_PACKABLE(att) \
 ((att)->attlen == -1 && (att)->attstorage != 'p')

#define VARLENA_ATT_IS_PACKABLE(att) \
 ((att)->attstorage != 'p')

#define CHECK_REL_PROCEDURE(pname)

#define SPTEST(f, x, y) \
 DatumGetBool(DirectFunctionCall2(f, PointPGetDatum(x), PointPGetDatum(y)))

> I believe there's a pretty longstanding tradition in C coding to use > all-caps names for macros representing constants. Some people think > that goes for all macros period, but I'm not on board with that for > function-like macros. > > Different parts of the PG code base make different choices between > camel-case and underscore-separation for multiword function names. > For that, I'd say match the style of nearby code. > Yes, that is what we normally do. However, in some cases, we might need to refer to other places as well, which I think is the case here. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
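To make the two styles concrete, the choice is roughly between definitions like these (hypothetical bodies for illustration; fsm_local_map is the local-map state from the patch under discussion):

    /* constant-style name, expression-like body */
    #define FSM_LOCAL_MAP_EXISTS  (fsm_local_map.nblocks > 0)

    /* function-style name for the same test */
    #define FSMLocalMapExists()   (fsm_local_map.nblocks > 0)

Both expand to the same test; the question is purely which naming convention reads better next to the surrounding code.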
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 10:10 PM John Naylor wrote: > > On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote: > > > > On Fri, Jan 11, 2019 at 3:54 AM John Naylor > > wrote: > > 1. > > Commit message: > > > Any pages with wasted free space become visible at next relation > > > extension, so we still control table bloat. > > > > I think the free space will be available after the next pass of > > vacuum, no? How can relation extension make it available? > > To explain, this diagram shows the map as it looks for different small > table sizes: >
> 0123
> A
> NA
> ANA
> NANA
>
> So for a 3-block table, the alternating strategy never checks block 1. > Any free space block 1 has acquired via delete-and-vacuum will become > visible if it extends to 4 blocks. We are accepting a small amount of > bloat for improved performance, as discussed. Would it help to include > this diagram in the README? > Yes, I think it would be good if you explain the concept of the local map with the help of this example. > > 2. > > +2. For very small heap relations, the FSM would be relatively large and > > +wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for > > +heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space > > +and to improve performance. To locate free space in this case, we simply > > +iterate over the heap, trying alternating pages in turn. There may be some > > +wasted free space in this case, but it becomes visible again upon next > > +relation extension. > > > > a. Again, how space becomes available at next relation extension. > > b. I think there is no use of mentioning the version number in the > > above comment, this code will be present from PG-12, so one can find > > out from which version this optimization is added. > > It fits with the reference to PG 8.4 earlier in the document. I chose > to be consistent, but to be honest, I'm not much in favor of a lot of > version references in code/READMEs. > Then let's not add a version reference in this case; I don't see much advantage in it here. > > I'll include these with some grammar corrections in the next version. > Okay, thanks! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
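As a sketch of that alternating strategy (a hypothetical helper for illustration, not the patch's actual code), the probe order for a small heap could be expressed as:

    #include "storage/block.h"

    /*
     * Return the next block to try for free space: start at the last
     * block and step down by two, so a 4-block table probes blocks
     * 3 and 1 ("NANA" in the diagram above) and a 3-block table
     * probes blocks 2 and 0 ("ANA"), never block 1.
     */
    static BlockNumber
    fsm_local_next_target(BlockNumber nblocks, BlockNumber cur)
    {
        if (cur == InvalidBlockNumber)  /* first probe: last block */
            return nblocks - 1;
        if (cur < 2)                    /* no lower page left to try */
            return InvalidBlockNumber;
        return cur - 2;                 /* skip every other page */
    }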
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 01:56, David Rowley wrote: > At this stage I'm trying to get to know the patch. I read a lot of > discussing between you and Dean ironing out how the stats should be > used to form selectivities. At the time I'd not read the patch yet, > so most of it went over my head. > > I did note down a few things on my read. I've included them below. > Hopefully, they're useful. > > MCV list review Part 2: 35. The evaluation order of this macro is wrong. #define ITEM_SIZE(ndims) \ (ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)) You'd probably want ITEM_SIZE(10) to return 170, but: select (10 * (2 + 1) + 2 * 8); ?column? -- 46 Unsure why this does not cause a crash. ndims should also have parenthesis around it in case someone does ITEM_SIZE(x + y), likewise for the other ITEM_* macros. 36. Could do with some comments in get_mincount_for_mcv_list(). What's magic about 0.04? 37. I think statext_mcv_build() needs some comments to detail out the arguments. For example can attrs be empty? Must it contain at least 2 members? etc. 38. Too many "it"s * we simply treat it as a special item in the MCV list (it it makes it). 39. I don't see analyze_mcv_list() being used anywhere around this comment: * If we can fit all the items onto the MCV list, do that. Otherwise use * analyze_mcv_list to decide how many items to keep in the MCV list, just * like for the single-dimensional MCV list. 40. The comment in the above item seems to indicate the condition for when all items can fit in the number of groups, but the if condition does not seem to allow for an exact match? if (ngroups > nitems) if you want to check if the number of items can fit in the number of groups should it be: if (ngroups >= nitems) or if (nitems <= ngroups) ? Perhaps I've misunderstood. The comment is a little confusing as I'm not sure where the "Otherwise" code is located. 41. I don't think palloc0() is required here. palloc() should be fine since you're initialising each element in the loop. mcvlist->items = (MCVItem * *) palloc0(sizeof(MCVItem *) * nitems); for (i = 0; i < nitems; i++) { mcvlist->items[i] = (MCVItem *) palloc(sizeof(MCVItem)); mcvlist->items[i]->values = (Datum *) palloc(sizeof(Datum) * numattrs); mcvlist->items[i]->isnull = (bool *) palloc(sizeof(bool) * numattrs); } I think I agree with the comment above that chunk about reducing the number of pallocs, even if it's just allocating the initial array as MCVItems instead of pointers to MCVItems 42. I don't think palloc0() is required in build_distinct_groups(). palloc() should be ok. Maybe it's worth an Assert(j + 1 == ngroups) to ensure count_distinct_groups got them all? 43. You're assuming size_t and long are the same size here. elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length); I know at least one platform where that's not true. 44. Should use DatumGetCString() instead of DatumGetPointer(). else if (info[dim].typlen == -2) /* cstring */ { memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v)) + 1); data += strlen(DatumGetPointer(v)) + 1; /* terminator */ } 45. No need to set this to NULL. Datum*v = NULL; Is "value" a better name than "v"? 46. What's the extra 'd' for in: elog(ERROR, "invalid MCV magic %d (expected %dd)", and elog(ERROR, "invalid MCV type %d (expected %dd)", 47. Wondering about the logic behind the variation between elog() and ereport() in statext_mcv_deserialize(). They all looks like "can't happen" type errors. 48. format assumes size_t is the same size as long. 
elog(ERROR, "invalid MCV size %ld (expected %ld)", VARSIZE_ANY_EXHDR(data), expected_size); 49. palloc0() followed by memset(). Can just use palloc(). matches = palloc0(sizeof(char) * mcvlist->nitems); memset(matches, (is_or) ? STATS_MATCH_NONE : STATS_MATCH_FULL, sizeof(char) * mcvlist->nitems); 50. The coding style in mcv_get_match_bitmap I think needs to be postgresqlified. We normally declare all our variables in a chunk then start setting them, unless the assignment is very simple. I don't recall places in the code where have a comment when declaring a variable, for example. FmgrInfo gtproc; Var*var = (varonleft) ? linitial(expr->args) : lsecond(expr->args); Const*cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args); bool isgt = (!varonleft); TypeCacheEntry *typecache = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR); /* match the attribute to a dimension of the statistic */ int idx = bms_member_index(keys, var->varattno); -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Delay locking partitions during query execution
On 2019/01/04 9:53, David Rowley wrote: > Without PREPAREd statements, if the planner itself was unable to prune > the partitions it would already have obtained the lock during > planning, so AcquireExecutorLocks(), in this case, would bump into the > local lock hash table entry and forego trying to obtain the lock > itself. That's not free, but it's significantly faster than obtaining > a lock. Hmm, AcquireExecutorLocks is only called if prepared statements are used, and even then only if a generic cached plan is reused: GetCachedPlan -> CheckCachedPlan -> AcquireExecutorLocks In GetCachedPlan: if (!customplan) { if (CheckCachedPlan(plansource)) Thanks, Amit
PSA: we lack TAP test coverage on NetBSD and OpenBSD
Although we've got a few NetBSD and OpenBSD buildfarm critters, none of them are doing --enable-tap-tests. If they were, we'd have noticed the pgbench regression tests falling over:

not ok 3 - pgbench option error: bad option stderr /(?^:(unrecognized|illegal) option)/
# Failed test 'pgbench option error: bad option stderr /(?^:(unrecognized|illegal) option)/'
# at t/002_pgbench_no_server.pl line 190.
# 'pgbench: unknown option -- bad-option
# Try "pgbench --help" for more information.
# '
# doesn't match '(?^:(unrecognized|illegal) option)'

Sure enough, manual testing confirms that on these platforms that error message is spelled differently:

$ pgbench --bad-option
pgbench: unknown option -- bad-option
Try "pgbench --help" for more information.

I am, TBH, inclined to fix this by removing that test case rather than teaching it another spelling to accept. I think it's very hard to make the case that tests like this one are anything but a waste of developer and buildfarm time. When they are also a portability hazard, it's time to cut our losses. (I also note that this test has caused us problems before, cf 869aa40a2 and 933851033.) regards, tom lane
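For the record, the alternative Tom is inclined against — teaching the test the third spelling — would be roughly a one-pattern change in t/002_pgbench_no_server.pl (untested sketch):

    # accept GNU, BSD, and NetBSD/OpenBSD getopt wording
    qr/(unrecognized|unknown|illegal) option/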
Re: Ryu floating point output patch
> On Jan 15, 2019, at 2:37 AM, Andrew Gierth > wrote: > > Andres> strtod()'s job ought to computationally be significantly easier > Andres> than the other way round, no? And if there's buggy strtod() > Andres> implementations out there, why would they be guaranteed to do > Andres> the correct thing with our current output, but not with ryu's? > > Funny thing: I've been devoting considerable effort to testing this, and > the one failure mode I've found is very interesting; it's not a problem > with strtod(), in fact it's a bug in our float4in caused by _misuse_ of > strtod(). Hi, I'm trying to reproduce the results by calling float4in in my test code. But I have some difficulties linking the code:

testfloat4.c:(.text+0x34): undefined reference to `float4in'
testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll'

I tried offering float.o to the linker in addition to my test program. I also tried to link all the objects (*.o). But the errors still exist. I attached my changes as a patch. I wonder if creating separate test programs is a good way of testing the code, and I'm interested in learning what I missed. Thank you. test_float4.patch Description: Binary data
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Jan 16, 2019 at 05:56:15PM +0100, Andreas Karlsson wrote: > On 1/16/19 9:27 AM, Michael Paquier wrote: >> Does somebody mind if I jump into the ship after so long? I was the >> original author of the monster after all... > > Fine by me. Peter? Okay, I have begun digging into the patch, and have extracted two things for now which can be refactored first, giving a total of three patches:

- 0001, which adds WaitForOlderSnapshots to snapmgr.c. I think this can be useful for external extensions that need a process to wait for snapshots older than a minimum threshold held by other transactions.
- 0002, which moves the concurrent index build into its own routine, index_build_concurrent(). At the same time, index_build() has an isprimary argument which is not used, so let's remove it. This simplifies the refactoring a bit as well.
- 0003, the core patch, realigned with the rest, fixing some typos I found on the way.

Here are also some notes on things I am planning to look at in a second pass:

- The concurrent drop (phase 5) part still shares a lot with DROP INDEX CONCURRENTLY, and I think we had better refactor the code further so that REINDEX CONCURRENTLY shares more with DROP INDEX. One thing which I think is incorrect is that we do not clear the invalid flag of the dropped index before marking it as dead. This looks like a missing piece from another concurrency-related bug fix, lost over the rebases this patch has been through.
- set_dead could be refactored so that it can take multiple indexes as input, using WaitForLockersMultiple(). This way CREATE INDEX CONCURRENTLY could also use it.
- There are no regression tests for partitioned tables.
- The NOTICE messages showing up when a table has no indexes should be removed.
- index_create() does not really need a TupleDesc argument, as long as the caller is able to provide a list of column names.
- At the end of the day, I think it would be nice to reach a state where we have a set of low-level routines like index_build_concurrent and index_set_dead_concurrent which are used by both CONCURRENTLY paths and can be called for each phase within a given transaction. Those pieces could also help implement, for example, an extension able to do concurrent reindexing out of core.

I think that the refactorings in 0001 and 0002 are committable as-is, and this shaves some code from the core patch. Thoughts? -- Michael

From 2d871385cdecd2b860f5eeb2615d8fcf9e866e54 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 17 Jan 2019 11:54:42 +0900
Subject: [PATCH 1/3] Refactor code to wait for older snapshots into its own routine

This is being used by CREATE INDEX CONCURRENTLY to make sure that valid indexes are marked as such after waiting until all transactions using snapshots older than the reference snapshot used for concurrent index validation are gone. This piece is useful independently, and can be used by REINDEX CONCURRENTLY.
--- src/backend/commands/indexcmds.c | 71 +--- src/backend/utils/time/snapmgr.c | 81 src/include/utils/snapmgr.h | 2 + 3 files changed, 85 insertions(+), 69 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 1959e8a82e..a81e656059 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -362,9 +362,7 @@ DefineIndex(Oid relationId, int numberOfAttributes; int numberOfKeyAttributes; TransactionId limitXmin; - VirtualTransactionId *old_snapshots; ObjectAddress address; - int n_old_snapshots; LockRelId heaprelid; LOCKTAG heaplocktag; LOCKMODE lockmode; @@ -1252,74 +1250,9 @@ DefineIndex(Oid relationId, * The index is now valid in the sense that it contains all currently * interesting tuples. But since it might not contain tuples deleted just * before the reference snap was taken, we have to wait out any - * transactions that might have older snapshots. Obtain a list of VXIDs - * of such transactions, and wait for them individually. - * - * We can exclude any running transactions that have xmin > the xmin of - * our reference snapshot; their oldest snapshot must be newer than ours. - * We can also exclude any transactions that have xmin = zero, since they - * evidently have no live snapshot at all (and any one they might be in - * process of taking is certainly newer than ours). Transactions in other - * DBs can be ignored too, since they'll never even be able to see this - * index. - * - * We can also exclude autovacuum processes and processes running manual - * lazy VACUUMs, because they won't be fazed by missing index entries - * either. (Manual ANALYZEs, however, can't be excluded because they - * might be within transactions that are going to do arbitrary operations - * later.) - * - * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not - * check for that. - * - * If a process goes idle-in-transaction
Re: Ryu floating point output patch
> "Donald" == Donald Dong writes: Donald> Hi, Donald> I'm trying to reproduce the results by calling float4in in my Donald> test code. But I have some difficulties linking the code: Donald> testfloat4.c:(.text+0x34): undefined reference to `float4in' Donald> testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll' The way you're doing that isn't really a good way to test it - those other programs being built by that rule are all frontend code, whereas float4in is a backend function. If you want to add your own test it, you could do a loadable module for it, or just do tests from SQL. -- Andrew (irc:RhodiumToad)
Re: de-deduplicate code in DML execution hooks in postgres_fdw
(2019/01/16 20:30), Etsuro Fujita wrote: (2019/01/16 15:54), Michael Paquier wrote: On Wed, Jan 16, 2019 at 02:59:15PM +0900, Etsuro Fujita wrote: If there are no objections, I'll commit that version of the patch. I think that you could use PgFdwModifyState for the second argument of execute_foreign_modify instead of ResultRelInfo. Yeah, that is another option, but my favorite would be to use ResultRelInfo, as in the original patch by Ashutosh, because that makes it possible to write postgresExecForeignInsert, postgresExecForeignUpdate, and postgresExecForeignDelete as a single line. Except for that nit, it looks fine to me, thanks for taking care of it. Great! Thanks for the review! Pushed. I left that argument alone. I think we can change it later, if necessary :). Best regards, Etsuro Fujita
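For reference, with ResultRelInfo kept as the argument, each hook reduces to roughly a one-line wrapper around the shared helper (sketch based on the description above, not the exact committed text):

    static TupleTableSlot *
    postgresExecForeignInsert(EState *estate,
                              ResultRelInfo *resultRelInfo,
                              TupleTableSlot *slot,
                              TupleTableSlot *planSlot)
    {
        return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
                                      slot, planSlot);
    }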
Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well
On Tue, Jan 08, 2019 at 09:09:56AM +0100, Christoph Berg wrote: > Re: Andres Freund 2019-01-08 > <20190108011837.n4mx7dadvojv2...@alap3.anarazel.de> >>> Here's another revision that doesn't add an extra CXXOPT variable but >>> just extends CXXFLAGS whenever COPT or PROFILE are set, which seems >>> more usable. >> >> Why does that seem more usable? How's that supposed to be used for flags >> that aren't valid for both languages? > > The existing COPT and PROFILE are already catch-all type flags that > add to CFLAGS and LDFLAGS. Possibly we should leave those alone and > only add PG_CXXFLAGS and PG_LDFLAGS? Personally, I see pgxs as something completely different from COPT and PROFILE; we are talking about two different facilities, one that is part of the core installation and another that is used for extension modules. So having PG_CFLAGS, PG_CXXFLAGS and PG_LDFLAGS, while leaving CXXFLAGS out of COPT and PROFILE, looks like the better long-term move in terms of pluggability. My 2c. -- Michael signature.asc Description: PGP signature
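Under that proposal, an extension's Makefile could then pass language-specific flags without touching the core COPT machinery, e.g. (sketch assuming the proposed variable names are adopted, for a hypothetical module my_ext):

    MODULES = my_ext
    PG_CFLAGS   = -Wextra
    PG_CXXFLAGS = -std=c++11
    PG_LDFLAGS  = -Wl,--as-needed

    PG_CONFIG = pg_config
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)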
Re: de-deduplicate code in DML execution hooks in postgres_fdw
On Thu, Jan 17, 2019 at 02:44:25PM +0900, Etsuro Fujita wrote: > Pushed. I left that argument alone. I think we can change it later, if > necessary :). Sure, that's fine as well. Thanks for committing the patch. -- Michael signature.asc Description: PGP signature
Re: pgbench doc fix
>> On 2018-Dec-03, Tatsuo Ishii wrote: >>
>>> To:
>>> ---
>>> -M querymode
>>> --protocol=querymode
>>>
>>> Protocol to use for submitting queries to the server:
>>>
>>> simple: use simple query protocol.
>>>
>>> extended: use extended query protocol.
>>>
>>> prepared: use extended query protocol with prepared statements.
>>>
>>> Because in "prepared" mode pgbench reuses the parse analysis
>>> result for the second and subsequent query iterations, pgbench runs
>>> faster in prepared mode than in other modes.
>> >> Looks good to me. > > Thanks. I'm going to commit this if there's no objection. Done. Thanks. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
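For example, the three modes are selected like this (illustrative invocations):

    pgbench -M simple   -T 60 mydb
    pgbench -M extended -T 60 mydb
    pgbench -M prepared -T 60 mydb    # typically fastest, per the doc text above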
Re: New vacuum option to do only freezing
On Thu, Jan 17, 2019 at 1:33 AM Robert Haas wrote: > > On Wed, Jan 16, 2019 at 3:30 AM Masahiko Sawada wrote: > > As the above comment says, it's possible that the state of an > > INSERT_IN_PROGRESS tuple could be changed to 'dead' after > > heap_page_prune(). Since such tuple is not truncated at this point we > > record it and set it as UNUSED in lazy_vacuum_page(). I think that the > > DISABLE_INDEX_CLEANUP case is the same; we need to process them after > > recorded. Am I missing something? > > I believe you are. Think about it this way. After the first pass > over the heap has been completed but before we've done anything to the > indexes, let alone started the second pass over the heap, somebody > could kill the vacuum process. Somebody could in fact yank the plug > out of the wall, stopping the entire server in its tracks. If they do > that, then lazy_vacuum_page() will never get executed. Yet, the heap > can't be in any kind of corrupted state at this point, right? We know > that the system is resilient against crashes, and killing a vacuum or > even the whole server midway through does not leave the system in any > kind of bad state. If it's fine for lazy_vacuum_page() to never be > reached in that case, it must also be fine for it never to be reached > if we ask for vacuum to stop cleanly before lazy_vacuum_page(). > Yes, that's right. lazy_vacuum_page() doesn't necessarily need to be reached. > In the case of the particular comment to which you are referring, that > comment is part of lazy_scan_heap(), not lazy_vacuum_page(), so I > don't see how it bears on the question of whether we need to call > lazy_vacuum_page(). It's true that, at any point in time, an > in-progress transaction could abort. And if it does then some > insert-in-progress tuples could become dead. But if that happens, > then the next vacuum will remove them, just as it will remove any > tuples that become dead for that reason when vacuum isn't running in > the first place. You can't use that as a justification for needing a > second heap pass, because if it were, then you'd also need a THIRD > heap pass in case a transaction aborts after the second heap pass has > visited the pages, and a fourth heap pass in case a transaction aborts > after the third heap pass has visited the pages, etc. etc. forever. > The reason I processed the tuples that became dead after the first heap pass is that I was not sure why we could ignore such tuples in the second heap pass, given that the code handling them has existed and been used for a long time. I thought we could handle them the same way in the DISABLE_INDEX_CLEANUP case. Also, since I thought lazy_vacuum_page() was the best place to set them as DEAD, I modified it. (In the previous patch I introduced a separate function setting them as DEAD aside from lazy_vacuum_page(), but since the two were almost the same, I merged them.) However, as you mentioned, it's true that these tuples will be removed by the next vacuum even if we ignore them. Also, there would not be many such tuples, and without this change the patch would be simpler. So if the risk of this change is greater than the benefit, we should not do it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
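For context, the behavior under discussion would be invoked with something like this (option syntax as proposed in this thread, not a committed form):

    -- first heap pass and freezing only; no index cleanup, no second
    -- heap pass
    VACUUM (FREEZE, DISABLE_INDEX_CLEANUP) my_table;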
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 03:42, David Rowley wrote: > 35. The evaluation order of this macro is wrong. > > #define ITEM_SIZE(ndims) \ > (ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)) > > You'd probably want ITEM_SIZE(10) to return 170, but: > > select (10 * (2 + 1) + 2 * 8); > ?column? > -- >46 > > Unsure why this does not cause a crash. > No, the code is actually correct, as explained in the comment above it. Each item contains (ndims) copies of the uint16 index and the boolean, but it always contains exactly 2 doubles, independent of ndims. > ndims should also have parenthesis around it in case someone does > ITEM_SIZE(x + y), likewise for the other ITEM_* macros. > +1 on that point. Regards, Dean
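So the only change needed there is the parenthesization, e.g.:

    #define ITEM_SIZE(ndims) \
        ((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))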
Re: [HACKERS] generated columns
On 16/01/2019 22:40, Erik Rijkers wrote: > I couldn't compile v7-0001 and I am testing with the older v6-0001 (of > which I still had an instance). > > So the problem below may have been fixed already. > > If you add a generated column to a file_fdw foreign table, it works OK > with VIRTUAL (the default) but with STORED it adds an empty column, > silently. I would say it would make more sense to get an error. Yes, v7 has addressed foreign-data wrappers. (I only tested with postgres_fdw.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services