Re: [HACKERS] generated columns
On 15/01/2019 08:18, Pavel Stehule wrote:
> I would to have a mechanism for safe replacement of triggers of type
>
> if TG_TYPE = 'INSERT' THEN
>   NEW.inserted := CURRENT_TIMESTAMP;
> ELSE IF TG_TYPE = 'UPDATE' THEN
>   NEW.updated := CURRENT_TIMESTAMP;
> ..

That kind of use is probably better addressed with a temporal facility.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Libpq support to connect to standby server as priority
From: Tatsuo Ishii [mailto:is...@sraoss.co.jp]
> I don't know what PgJDBC is doing, however I think libpq needs to do
> more than just retrying.
>
> 1) Try to find a node on which pg_is_in_recovery() returns false. If
>    found, then we assume that is the primary. We also assume that
>    other nodes are standbys. done.
>
> 2) If there's no node on which pg_is_in_recovery() returns false, then
>    we need to retry until we find it. To not retry forever, there
>    should be a timeout counter parameter.

It may be convenient for libpq to be able to retry connection attempts
for a specified duration (by failover_timeout or such), because it
eliminates the need for the application to do the retry. But I think
it's a desirable feature, not a required one.

Regards
Takayuki Tsunakawa
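For readers following the thread, the probe both approaches build on is the
pg_is_in_recovery() function; a minimal sketch of what the client-side check
amounts to (the retry loop and any failover_timeout parameter are still
hypothetical at this point):

-- Run against each candidate host from the connection string:
-- a standby in recovery returns true, the primary returns false.
SELECT pg_is_in_recovery();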
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Fri, Jan 04, 2019 at 03:18:06PM +0300, Sergei Kornilov wrote:
> NOTICE seems unnecessary here.
>
> Unfortunately concurrently reindex loses comments, reproducer:

Yes, the NOTICE message makes little sense. I am getting back in touch
with this stuff. It has been some time, but the core of the patch has
not actually changed in its base concept, so I am still very familiar
with it as the original author. There are even typos I may have
introduced a couple of years back, like "contraint". I have not yet
spent much time on that, but there are at quick glance a bunch of
things that could be retouched to get pieces of that committable.

+The concurrent index created during the processing has a name ending in
+the suffix ccnew, or ccold if it is an old index definiton which we failed
+to drop. Invalid indexes can be dropped using DROP INDEX,
+including invalid toast indexes.

This needs markups for "ccnew" and "ccold". "definiton" is not correct.

index_create does not actually need its extra argument with the tuple
descriptor. I think that we had better grab the column name list from
indexInfo and just pass that down to index_create() (patched on my
local branch), so it is overkill to take a full copy of the index's
TupleDesc.

The patch, standing as-is, is close to 2k lines long, so let's cut that
first into more pieces refactoring the concurrent build code. Here are
some preliminary notes:
- WaitForOlderSnapshots() could be in its own patch.
- index_concurrently_build() and index_concurrently_set_dead() can be
  in an independent patch. set_dead() had better be a wrapper on top of
  index_set_state_flags actually which is able to set any kind of flags.
- A couple of pieces in index_create() could be cut as well.

I can send patches for those things as first steps which could happen
in this commit then, and commit them as needed. This way, we reduce the
size of the main patch. Even if the main portion does not get into v12,
we'd still have base pieces to move on next.

Regarding the grammar, we tend for the last couple of years to avoid
complicating the main grammar and move on to parenthesized grammars
(see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that
it would make sense to only support CONCURRENTLY within parenthesis and
just plug that in with the VERBOSE option.

Does somebody mind if I jump into the ship after so long? I was the
original author of the monster after all...
--
Michael

signature.asc
Description: PGP signature
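To make those two points concrete, here is a hedged sketch of the suggested
parenthesized grammar and of cleaning up after a failed run (the
(VERBOSE, CONCURRENTLY) spelling is only the suggestion made here, not
committed syntax, and the index name below is illustrative):

-- Proposed parenthesized form, mirroring VACUUM/ANALYZE/EXPLAIN options:
REINDEX (VERBOSE, CONCURRENTLY) TABLE mytable;

-- After a failed concurrent rebuild, leftover invalid indexes carry the
-- ccnew/ccold suffix and can be dropped manually:
DROP INDEX mytable_i_idx_ccnew;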
Re: [HACKERS] generated columns
On Wed, 16 Jan 2019 at 9:26, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:

> On 15/01/2019 08:18, Pavel Stehule wrote:
> > I would to have a mechanism for safe replacement of triggers of type
> >
> > if TG_TYPE = 'INSERT' THEN
> >   NEW.inserted := CURRENT_TIMESTAMP;
> > ELSE IF TG_TYPE = 'UPDATE' THEN
> >   NEW.updated := CURRENT_TIMESTAMP;
> > ..
>
> That kind of use is probably better addressed with a temporal facility.

yes. I am looking for this functionality in Postgres

Pavel

> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: New vacuum option to do only freezing
On Wed, Jan 16, 2019 at 5:24 AM Robert Haas wrote:
>
> On Mon, Jan 14, 2019 at 9:24 PM Masahiko Sawada wrote:
> > I think that because the tuples that got dead after heap_page_prune()
> > looked are recorded but not removed without lazy_vacuum_page() we need
> > to process them in lazy_vacuum_page(). For decision about whether to
> > truncate we should not change it, so I will fix it. It should be
> > another option to control whether to truncate if we want.
>
> I think that heap_page_prune() is going to truncate dead tuples to
> dead line pointers. At that point the tuple is gone. The only
> remaining step is to mark the dead line pointer as unused, which can't
> be done without scanning the indexes. So I don't understand what
> lazy_vacuum_page() would need to do in this case.

vacuumlazy.c:L1022

switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
{
	case HEAPTUPLE_DEAD:

		/*
		 * Ordinarily, DEAD tuples would have been removed by
		 * heap_page_prune(), but it's possible that the tuple
		 * state changed since heap_page_prune() looked. In
		 * particular an INSERT_IN_PROGRESS tuple could have
		 * changed to DEAD if the inserter aborted. So this
		 * cannot be considered an error condition.
		 * (snip)
		 */
		if (HeapTupleIsHotUpdated(&tuple) ||
			HeapTupleIsHeapOnly(&tuple))
			nkeep += 1;
		else
			tupgone = true; /* we can delete the tuple */
		all_visible = false;
		break;

As the above comment says, it's possible that the state of an
INSERT_IN_PROGRESS tuple could be changed to 'dead' after
heap_page_prune(). Since such a tuple is not truncated at this point,
we record it and set it as UNUSED in lazy_vacuum_page(). I think that
the DISABLE_INDEX_CLEANUP case is the same; we need to process them
after they are recorded. Am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
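For context, a sketch of how the option under discussion would be invoked
(DISABLE_INDEX_CLEANUP is the name used in this thread and could change
before commit; "accounts" is a placeholder table):

-- Prune and freeze, but skip the index-vacuuming phase; dead line
-- pointers are left behind to be reclaimed by a later normal vacuum.
VACUUM (FREEZE, DISABLE_INDEX_CLEANUP) accounts;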
Re: Log a sample of transactions
On 1/16/19 1:40 AM, Kuroda, Hayato wrote:
> By the way, I think the name of this parameter should be changed.
> In the Cambridge dictionary, the word "rate" means as follows:
>
>   the speed at which something happens or changes, or the amount or
>   number of times it happens or changes in a particular period.
>
> This parameter represents the "probability" of whether it logs,
> therefore this name is inappropriate. You should use "probability" or
> "ratio", I think.

Hmm, I am not a native English speaker; I chose "rate" because it is
already used for auto_explain.sample_rate and for
log_statement_sample_rate.

> Next week I will give you some comments about your good source.

Thanks!
Re: FETCH FIRST clause WITH TIES option
On Tue, Jan 15, 2019 at 2:51 PM Tomas Vondra wrote:
> What do you mean by "raw statistic"?

I mean without further calculation to consider other operations.

> IMHO the best thing you can do is call estimate_num_groups() and combine
> that with the number of input rows. That shall benefit from ndistinct
> coefficients when available, etc. I've been thinking that considering
> the unreliability of grouping estimates we should use a multiple of the
> average size (because there may be much larger groups), but I think
> that's quite unprincipled and I'd much rather try without it first.

Thank you for clarifying. The attached patch uses your suggestion
upthread.

regards
Surafel

diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expression
 expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]

 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports. It is:

 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]

 In this syntax, the start or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity. If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..5e7790c0d6 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot * /* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
 			 * the state machine state to record having done so.
 			 */
 			if (!node->noCount &&
-				node->position - node->offset >= node->count)
+				node->position - node->offset >= node->count &&
+				node->limitOption == WITH_ONLY)
 			{
 				node->lstate = LIMIT_WINDOWEND;
@@ -145,17 +147,64 @@
 				return NULL;
 			}

-			/*
-			 * Get next tuple from subplan, if any.
-			 */
-			slot = ExecProcNode(outerPlan);
-			if (TupIsNull(slot))
+			else if (!node->noCount &&
+					 node->position - node->offset >= node->count &&
+					 node->limitOption == WITH_TIES)
 			{
-				node->lstate = LIMIT_SUBPLANEOF;
-				return NULL;
+				/*
+				 * Get next tuple from subplan, if any.
+				 */
+				slot = ExecProcNode(outerPlan);
+				if (TupIsNull(slot))
+				{
+					node->lstate = LIMIT_SUBPLANEOF;
+					return NULL;
+				}
+				/*
+				 * Test if the new tuple and the last tuple match.
+				 * If so we return the tuple.
+				 */
+				econtext->ecxt_innertuple = slot;
+				econtext->ecxt_outertuple = node->last_slot;
+				if (ExecQualAndReset(node->eqfunction, econtext))
+				{
+					ExecCopySlot(node->last_slot, slot);
+					node->subSlot = slot;
+					node->position++;
+				}
+				else
+				{
+					node->lstate = LIMIT_WINDOWEND;
+
+					/*
+					 * If we know we won't need to back up, we can release
+					 * resources at this point.
+					 */
+					if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+						(void) ExecShutdownNode(outerPlan);
+
+					return NULL;
+				}
+
+			}
+			else
+			{
+				/*
+				 * Get next tuple from subplan, if any.
+				 */
+				slot = ExecProcNode(outerPlan);
+				if (TupIsNull(slot))
+				{
+					node->lstate = LIMIT_SUBPLANEOF;
+					return NULL;
+				}
+				if (node->limitOption == WITH_TIES)
+				{
+					ExecCopySlot(node->last_slot, slot);
+				}
+				node->subSlot = slot;
+				node->position++;
 			}
-			node->subSlot = slot;
-			node->position++;
 		}
 		else
 		{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if (node->limitOption == WITH_ONLY)
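To make the semantics the patch implements concrete, a small sketch (the
syntax follows the select.sgml hunk above; the table and data are invented
for illustration):

CREATE TABLE scores (name text, score int);
INSERT INTO scores VALUES ('a', 10), ('b', 9), ('c', 9), ('d', 8);

-- ONLY would stop strictly after two rows; WITH TIES also returns 'c',
-- which ties with 'b' on the ORDER BY value.
SELECT * FROM scores ORDER BY score DESC FETCH FIRST 2 ROWS WITH TIES;
--  name | score
-- ------+-------
--  a    |    10
--  b    |     9
--  c    |     9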
Re: Log a sample of transactions
On Wed, Jan 16, 2019 at 2:03 AM Adrien NAYRAT wrote:
>
> On 1/15/19 11:42 AM, Masahiko Sawada wrote:
> >> When you troubleshoot applicative issues with multi-statements
> >> transaction, you may have to log all queries to find all statements
> >> of one transaction. With high throughput, it could be hard to log
> >> all queries without causing troubles.
> > Hm, can we use log_min_duration_statement to find slow queries of a
> > transaction instead? Could you please elaborate on the use-case?
>
> Hello,
>
> The goal is not to find slow queries in a transaction, but to
> troubleshoot applicative issues when you have short queries.
>
> Sometimes you want to understand what happens in a transaction: either
> you perfectly know your application, or you have to log all queries
> and find ones with the same transaction ID (%x). It could be
> problematic if you have a huge traffic with fast queries.
>

Thank you for the explanation! I understand the use case of this patch.
This feature would be helpful for troubleshooting.

Since we set xact_is_sampled only when the transaction starts and test
it during the transaction, we cannot disable logging during a
transaction, and vice versa. I can imagine a use case where the user
wants to disable logging during a transaction. So it might be better to
also check if log_xact_sample_rate > 0 in check_log_duration().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
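For reference, a sketch of how the feature would be used once committed (the
exact GUC name was still in flux in this thread -- the patch and this mail
refer to log_transaction_sample_rate and log_xact_sample_rate respectively):

-- Log every statement of roughly 1% of transactions, decided once at
-- transaction start; 0 disables the feature.
SET log_transaction_sample_rate = 0.01;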
Re: Python versions (was Re: RHEL 8.0 build)
On 07/01/2019 00:16, Tom Lane wrote:
> BTW, this is a pre-existing problem not the fault of this patch,
> but while we're fooling with the behavior of python lookup would
> be a great time to fix it: we should add something like
>
> AC_ARG_VAR([PYTHON], [path to Python executable])
>
> Aside from improving the "configure --help" documentation, this
> will prevent insane results from using inapplicable cached
> conclusions about what Python we've got. Since more and more
> people seem to be using configure cache files, I think we need
> to be more careful about declaring relevant variables.

Patch attached. I went through the environment variables listed in
installation.sgml and came up with the following that would affect
subsequent cacheable tests:

MSGFMT
PERL
PYTHON
TCLSH

The following are listed but don't affect any other tests, so I didn't
include them:

BISON
DTRACE
DTRACEFLAGS
FLEX
XML2_CONFIG

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 9ba9c7f7e56714535290776cb622a610bc599d8d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 16 Jan 2019 10:05:42 +0100
Subject: [PATCH] configure: More use of AC_ARG_VAR

AC_ARG_VAR is necessary if an environment variable influences a
configure result that is then used by other tests that are cached.
With AC_ARG_VAR, a change in the variable is detected on subsequent
configure runs and the user is then advised to remove the cache.

This adds AC_ARG_VAR calls for: MSGFMT, PERL, PYTHON, TCLSH
---
 config/perl.m4     |  1 +
 config/programs.m4 |  1 +
 config/python.m4   |  1 +
 config/tcl.m4      |  1 +
 configure          | 12 ++++++++++--
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/config/perl.m4 b/config/perl.m4
index caefb0705e..059e31c476 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -5,6 +5,7 @@
 # --
 AC_DEFUN([PGAC_PATH_PERL],
 [PGAC_PATH_PROGS(PERL, perl)
+AC_ARG_VAR(PERL, [Perl program])dnl
 if test "$PERL"; then
   pgac_perl_version=`$PERL -v 2>/dev/null | sed -n ['s/This is perl.*v[a-z ]*\([0-9]\.[0-9][0-9.]*\).*$/\1/p']`
diff --git a/config/programs.m4 b/config/programs.m4
index aa84bfdb9e..21888cb68f 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -245,6 +245,7 @@ AC_DEFUN([PGAC_CHECK_GETTEXT],
   AC_CHECK_HEADER([libintl.h], [],
                   [AC_MSG_ERROR([header file is required for NLS])])
   PGAC_PATH_PROGS(MSGFMT, msgfmt)
+  AC_ARG_VAR(MSGFMT, [msgfmt program for NLS])dnl
   if test -z "$MSGFMT"; then
     AC_MSG_ERROR([msgfmt is required for NLS])
   fi
diff --git a/config/python.m4 b/config/python.m4
index 9a4d12112e..c51aa4e332 100644
--- a/config/python.m4
+++ b/config/python.m4
@@ -17,6 +17,7 @@
 # newer version.
 AC_DEFUN([PGAC_PATH_PYTHON],
 [PGAC_PATH_PROGS(PYTHON, [python python3 python2])
+AC_ARG_VAR(PYTHON, [Python program])dnl
 if test x"$PYTHON" = x""; then
   AC_MSG_ERROR([Python not found])
 fi
diff --git a/config/tcl.m4 b/config/tcl.m4
index 581471f338..9de31a5715 100644
--- a/config/tcl.m4
+++ b/config/tcl.m4
@@ -5,6 +5,7 @@

 AC_DEFUN([PGAC_PATH_TCLSH],
 [PGAC_PATH_PROGS(TCLSH, [tclsh tcl tclsh8.6 tclsh86 tclsh8.5 tclsh85 tclsh8.4 tclsh84])
+AC_ARG_VAR(TCLSH, [Tcl interpreter program (tclsh)])dnl
 if test x"$TCLSH" = x""; then
   AC_MSG_ERROR([Tcl shell not found])
 fi
diff --git a/configure b/configure
index 06fc3c6835..f3639c1363 100755
--- a/configure
+++ b/configure
@@ -888,7 +888,11 @@ PKG_CONFIG_LIBDIR
 ICU_CFLAGS
 ICU_LIBS
 LDFLAGS_EX
-LDFLAGS_SL'
+LDFLAGS_SL
+PERL
+PYTHON
+MSGFMT
+TCLSH'

 # Initialize some variables set by options.
@@ -1588,6 +1592,10 @@ Some influential environment variables:
   ICU_LIBS    linker flags for ICU, overriding pkg-config
   LDFLAGS_EX  extra linker flags for linking executables only
   LDFLAGS_SL  extra linker flags for linking shared libraries only
+  PERL        Perl program
+  PYTHON      Python program
+  MSGFMT      msgfmt program for NLS
+  TCLSH       Tcl interpreter program (tclsh)

 Use these variables to override the choices made by `configure' or to help
 it to find libraries and programs with nonstandard names/locations.
@@ -18097,7 +18105,7 @@ $as_echo_n "checking for MSGFMT... " >&6; }
 $as_echo "$MSGFMT" >&6; }
 fi

-  if test -z "$MSGFMT"; then
+if test -z "$MSGFMT"; then
   as_fn_error $? "msgfmt is required for NLS" "$LINENO" 5
 fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for msgfmt flags" >&5
--
2.20.1
Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0
(2019/01/16 15:21), Ashutosh Bapat wrote:
> On Wed, Jan 16, 2019 at 11:22 AM Etsuro Fujita
> <fujita.ets...@lab.ntt.co.jp> wrote:
>> (2019/01/15 13:29), Ashutosh Bapat wrote:
>>> I think, there's something better possible. Two partitioned relations
>>> won't use partition-wise join, if their partition schemes do not match.
>>> Partitioned relations with same partitioning scheme share
>>> PartitionScheme pointer. PartitionScheme structure should get an extra
>>> counter, maintaining a count of number of partitioned relations sharing
>>> that structure. When this counter is 1, that relation is certainly not
>>> going to participate in PWJ and thus need not have all the structure
>>> required by PWJ set up. If we use this counter coupled with
>>> enable_partitionwise_join flag, we can get rid of
>>> consider_partitionwise_join flag altogether, I think.
>>
>> Interesting! That flag was introduced to disable PWJs when whole-row
>> Vars are involved, as you know, so I think we need to first eliminate
>> that limitation, to remove that flag.
>
> For that we don't need a separate flag. Do we? AFAIR, somewhere under
> try_partitionwise_join() we check whether PWJ is possible between two
> relations. That involves a bunch of checks like checking whether the
> relations have same bounds. Those checks should be enhanced to
> incorporate existence of whole-var, I think.

Yeah, that check is actually done in build_joinrel_partition_info(),
which is called from build_join_rel() and build_child_join_rel() (only
the latter is called from try_partitionwise_join()). That flag is used
in build_joinrel_partition_info() for that check, but as you mentioned,
I think it would be possible to remove that flag, probably by checking
the WRV existence from the outer_rel/inner_rel's reltarget, instead of
that flag. But I'm not sure we can do that efficiently without
complicating the existing code including the original PWJ one.

> That flag doesn't make that code complicated.

I thought it would be better to not complicate that code, because
disabling such PWJs would be something temporary until we support them.
Anyway, I think this would be a separate issue from the original one we
discussed on this thread.

Best regards,
Etsuro Fujita
Re: de-deduplicate code in DML execution hooks in postgres_fdw
Michael-san,

(2019/01/16 15:54), Michael Paquier wrote:
> On Wed, Jan 16, 2019 at 02:59:15PM +0900, Etsuro Fujita wrote:
>> (2019/01/07 20:26), Etsuro Fujita wrote:
>>> On second thought I'd like to propose another option:
>>> execute_foreign_modify because I think this would match the existing
>>> helper functions for updating foreign tables in postgres_fdw.c
>>> better, such as create_foreign_modify, prepare_foreign_modify and
>>> finish_foreign_modify. I don't really think the function name should
>>> contain "one" or "single_row". Like the names of the calling APIs
>>> (ie, ExecForeignInsert, ExecForeignUpdate and ExecForeignDelete), I
>>> think it's OK to omit such words from the function name.
>>
>> Here is an updated version of the patch. In addition to the naming, I
>> tweaked the function a little bit to match other functions (mainly
>> variable names), moved it to the place where the helper functions are
>> defined, fiddled with some comments, and removed an extra include
>> file that the original patch added. If there are no objections, I'll
>> commit that version of the patch.
>
> I think that you could use PgFdwModifyState for the second argument of
> execute_foreign_modify instead of ResultRelInfo.

Yeah, that is another option, but my favorite would be to use
ResultRelInfo, as in the original patch by Ashutosh, because that makes
it possible to write postgresExecForeignInsert,
postgresExecForeignUpdate, and postgresExecForeignDelete as a single
line.

> Except of that nit, it looks fine to me, thanks for taking care of it.

Great! Thanks for the review!

Best regards,
Etsuro Fujita
Redundant filter in index scan with a bool column
Hi hackers,

Consider this query plan:

create table t (i int, b bool);
create index on t(i, b);
set enable_bitmapscan to off;

explain select * from t where i = 300 and b;
                                QUERY PLAN
--------------------------------------------------------------------------
 Index Only Scan using t_i_b_idx on t  (cost=0.15..24.27 rows=6 width=5)
   Index Cond: ((i = 300) AND (b = true))
   Filter: b

The filter is not needed, why is it there? Turns out we can't recognize
that the restriction clause 'b' and the index clause 'b = true' are
equivalent. My first reaction was to patch operator_predicate_proof to
handle this case, but there is a more straightforward way: mark the
expanded index clause as potentially redundant when it is generated in
expand_indexqual_conditions. There is already RestrictInfo.parent_ec
that is used to mark redundant EC-derived join clauses. The patch
renames it to rinfo_parent and uses it to mark the expanded index
clauses. Namely, for both the expanded and the original clause,
rinfo_parent points to the original clause or its generating EC, if set.

The name is no good -- the original clause is not a parent of itself,
after all. I considered something like redundancy_tag, but some places
actually use the fact that it points to the generating EC, so I don't
like this name either. What do you think?

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From 144dd4a72fd8278aa6d44865d97239b9fad6965d Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov
Date: Wed, 16 Jan 2019 14:31:26 +0300
Subject: [PATCH] Recognize that an expanded bool index clause is equivalent
 to the original one
---
 src/backend/nodes/copyfuncs.c              |  2 +-
 src/backend/nodes/outfuncs.c               |  2 +-
 src/backend/optimizer/path/costsize.c      |  8 ++---
 src/backend/optimizer/path/equivclass.c    | 28 -
 src/backend/optimizer/path/indxpath.c      | 22 ++
 src/backend/optimizer/plan/createplan.c    | 48 +++---
 src/backend/optimizer/util/restrictinfo.c  |  2 +-
 src/include/nodes/relation.h               | 10 +--
 src/test/regress/expected/create_index.out |  9 ++
 9 files changed, 71 insertions(+), 60 deletions(-)

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 006a3d1..9acb081 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2243,7 +2243,7 @@ _copyRestrictInfo(const RestrictInfo *from)
 	COPY_BITMAPSET_FIELD(right_relids);
 	COPY_NODE_FIELD(orclause);
 	/* EquivalenceClasses are never copied, so shallow-copy the pointers */
-	COPY_SCALAR_FIELD(parent_ec);
+	COPY_SCALAR_FIELD(rinfo_parent);
 	COPY_SCALAR_FIELD(eval_cost);
 	COPY_SCALAR_FIELD(norm_selec);
 	COPY_SCALAR_FIELD(outer_selec);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0fde876..980e9c6 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2436,7 +2436,7 @@ _outRestrictInfo(StringInfo str, const RestrictInfo *node)
 	WRITE_BITMAPSET_FIELD(left_relids);
 	WRITE_BITMAPSET_FIELD(right_relids);
 	WRITE_NODE_FIELD(orclause);
-	/* don't write parent_ec, leads to infinite recursion in plan tree dump */
+	/* don't write rinfo_parent, leads to infinite recursion in plan tree dump */
 	WRITE_FLOAT_FIELD(norm_selec, "%.4f");
 	WRITE_FLOAT_FIELD(outer_selec, "%.4f");
 	WRITE_NODE_FIELD(mergeopfamilies);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 99c5ad9..f350919 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4673,7 +4673,7 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 			/* Drop this clause if it matches any column of the FK */
 			for (i = 0; i < fkinfo->nkeys; i++)
 			{
-				if (rinfo->parent_ec)
+				if (rinfo->rinfo_parent)
 				{
 					/*
 					 * EC-derived clauses can only match by EC. It is okay to
@@ -4683,12 +4683,12 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
 					 * have generated one equating the FK's Vars. So for
 					 * purposes of estimation, we can act as though it did so.
 					 *
-					 * Note: checking parent_ec is a bit of a cheat because
-					 * there are EC-derived clauses that don't have parent_ec
+					 * Note: checking parent EC is a bit of a cheat because
+					 * there are EC-derived clauses that don't have parent EC
 					 * set; but such clauses must compare expressions that
 					 * aren't just Vars, so they cannot match the FK anyway.
 					 */
-					if (fkinfo->eclass[i] == rinfo->parent_ec)
+					if ((Node *) fkinfo->eclass[i] == rinfo->rinfo_parent)
 					{
 						remove_it = true;
 						break;
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 6e134ae..b18d8b6 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1
Re: [PROPOSAL] Shared Ispell dictionaries
Hello Tomas,

On 16.01.2019 03:23, Tomas Vondra wrote:
> I've looked at the patch today, and in general it seems quite solid to
> me. I do have a couple of minor points
>
> 1) I think the comments need more work. Instead of describing all the
> individual changes here, I've outlined those improvements in attached
> patches (see the attached "tweaks" patches). Some of it is formatting,
> minor rewording or larger changes. Some comments are rather redundant
> (e.g. the one before calls to release the DSM segment).

Thank you!

> 2) It's not quite clear to me why we need DictInitData, which simply
> combines DictPointerData and list of options. It seems as if the only
> point is to pass a single parameter to the init function, but is it
> worth it? Why not to get rid of DictInitData entirely and pass two
> parameters instead?

In the first place the init method had two parameters, but in the v7
patch I added the DictInitData struct instead of two parameters (list
of options and DictPointerData):

https://www.postgresql.org/message-id/20180319110648.GA32319%40zakirov.localdomain

I haven't found a way to replace the template's init method from
init_method(internal) to init_method(internal,internal) in the upgrade
script of extensions. If I'm not mistaken we need new syntax here, like
ALTER TEXT SEARCH TEMPLATE. Thoughts?

> 3) I find it a bit cumbersome that before each ts_dict_shmem_release
> call we construct a dummy DictPointerData value. Why not to pass
> individual parameters and construct the struct in the function?

Agreed, it may look too verbose. I'll change it.

> 4) The reference to max_shared_dictionaries_size is obsolete, because
> there's no such limit anymore.

Yeah, I'll fix it.

> /* XXX not really a pointer, so the name is misleading */

I think we don't need the DictPointerData struct anymore, because only
the ts_dict_shmem_release function needs it (see comments above) and we
only need it for the hash search. I'll move all fields of
DictPointerData to the TsearchDictKey struct.

> XXX "supported" is not the same as "all ispell dicts behave like that".

I'll reword the sentence.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
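On the syntax question: today the template's support functions can only be
set at creation time, so a purely hypothetical sketch of the ALTER form
being alluded to could mirror the existing CREATE syntax (dispell_init and
dispell_lexize are the stock ispell template functions; the ALTER command
does not exist anywhere yet):

-- Existing syntax: functions are fixed when the template is created.
CREATE TEXT SEARCH TEMPLATE my_ispell (
    INIT = dispell_init,
    LEXIZE = dispell_lexize
);

-- Hypothetical upgrade-script command to swap in a two-argument init
-- function (not implemented):
-- ALTER TEXT SEARCH TEMPLATE my_ispell ( INIT = dispell_init );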
ArchiveEntry optional arguments refactoring
Hi,

During the discussion in [1] an idea about refactoring ArchiveEntry was
suggested. The reason is that currently this function has a significant
number of arguments that are "optional", and every change that has to
deal with it introduces a lot of useless diffs. In the thread mentioned
above, such an example is tracking the current table access method, and
I guess the "Remove WITH OIDS" commit 578b229718e is also similar.

The proposed idea is to refactor out all/optional arguments into a
separate data structure, so that adding/removing a new argument wouldn't
change that much unrelated code. Then for every invocation of
ArchiveEntry this structure needs to be prepared before the call, or as
Andres suggested:

ArchiveEntry((ArchiveArgs){.tablespace = 3,
                           .dumpFn = somefunc,
                           ...});

Another suggestion from Amit is to have an ArchiveEntry() function with
a limited number of parameters, and an ArchiveEntryEx() with those extra
parameters which are not needed in usual cases.

I want to prepare a patch for that, and I'm inclined to go with the
first option, but since there are two solutions to choose from, I would
love to hear more opinions about this topic. Any pros/cons we don't see
yet?

[1]: https://www.postgresql.org/message-id/flat/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de
Re: ArchiveEntry optional arguments refactoring
> On Wed, Jan 16, 2019 at 1:16 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> During the discussion in [1] an idea about refactoring ArchiveEntry was
> suggested. [...]
>
> [1]: https://www.postgresql.org/message-id/flat/20180703070645.wchpu5muyto5n647%40alap3.anarazel.de

[CC Andres and Amit]
jsonb_set for an array with an index outside of boundaries
Hi,

Per discussion in [1] about the generic type subscripting patch, Pavel
had interesting commentary. So far jsonb_set, if it is invoked for a
jsonb array with an index outside of the array boundaries, will
implicitly add a value:

=# insert into test values('["a", "b", "c"]');
=# update test set data = jsonb_set(data, '{1000}', '"d"');
=# table test;
         data
----------------------
 ["a", "b", "c", "d"]

This is a perfectly documented feature; there are no questions here. But
for the generic type subscripting infrastructure I'm introducing
another, more convenient, syntax:

=# update test set data['selector'] = 'value';

Since the idea behind the generic type subscripting patch is just to
introduce an extendable subscripting operation for different data types,
here I'm relying on the already existing functionality for jsonb. But
then this feature of jsonb_set indeed becomes more confusing with the
new syntax:

=# update test set data[1000] = 'd';
=# table test;
         data
----------------------
 ["a", "b", "c", "d"]

Unfortunately, the only alternative option here would be to return an
error and reject such a value, which differs from jsonb_set. I would
like to ask what would be the best solution here: to keep this confusing
behaviour, or to have two different implementations of the jsonb update
functionality (one for jsonb_set, another for subscripting)?

[1]: https://www.postgresql.org/message-id/CA%2Bq6zcXmwR9BDrcf188Mcz5%2BjU8DaqrrOat2mzizKf-nYgDXkg%40mail.gmail.com
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Tue, 15 Jan 2019 at 08:21, Tomas Vondra wrote:
> Turns out you were right - the attribute_referenced piece was quite
> unnecessary. So I've removed it. I've also extended the regression
> tests to verify changing type of another column does not reset the
> stats.

(Trying to find my feet over here)

I've read over the entire thread, and apart from missing the last two
emails and therefore the latest patch, I managed to read over most of
the MCV patch. I didn't quite get to reading mcv.c and don't quite have
the energy to take that on now.

At this stage I'm trying to get to know the patch. I read a lot of
discussion between you and Dean ironing out how the stats should be
used to form selectivities. At the time I'd not read the patch yet, so
most of it went over my head.

I did note down a few things on my read. I've included them below.
Hopefully, they're useful.

MCV list review

1. In mcv.c there's

Assert(ndistinct <= UINT16_MAX);

This should be PG_UINT16_MAX

2. math.h should be included just after postgres.h

3. Copyright is still -2017 in mcv.c. Hopefully, if you change it to
2019, you'll never have to bump it ever again! :-)

4. Looking at pg_stats_ext_mcvlist_items() I see you've coded the
string building manually. The way it's coded I'm finding a little
strange. It means the copying becomes quadratic due to

snprintf(buff, 1024, format, values[1], DatumGetPointer(valout));
strncpy(values[1], buff, 1023);

So basically, generally, here you're building a new string with
values[1] followed by a comma, then followed by valout. On the next
line you then copy that new buffer back into values[1]. I understand
this part is likely not performance critical, but I see no reason to
write the code this way.

Are you limiting the strings to 1024 bytes on purpose? I don't see any
comment mentioning you want to truncate strings.

Would it not be better to do this part using a
AppendStringInfoString()? and just manually add a '{', ',' or '}' as
and when required?

DatumGetPointer(valout) should really be using DatumGetCString(valout).

Likely you can also use heap_form_tuple. This will save you having to
convert ints into strings then only to have BuildTupleFromCStrings() do
the reverse.

5. individiaul -> individual

lists. This allows very accurate estimates for individiaul columns, but

litst -> lists

litst on combinations of columns. Similarly to functional dependencies

6. Worth mentioning planning cycles too?

"It's advisable to create MCV statistics objects only on combinations
of columns that are actually used in conditions together, and for which
misestimation of the number of groups is resulting in bad plans.
Otherwise, the ANALYZE cycles are just wasted."

7. straight-forward -> straightforward

(most-common values) lists, a straight-forward extension of the per-column

8. adresses -> addresses

statistics adresses the limitation by storing individual values, but it

9. Worth mentioning ANALYZE time?

This section introduces multivariate variant of MCV (most-common
values) lists, a straight-forward extension of the per-column statistics
described in . This statistics adresses the limitation by storing
individual values, but it is naturally more expensive, both in terms of
storage and planning time.

10. low -> a low

with low number of distinct values. Before looking at the second query,

11. them -> then

on items in the MCV list, and them sums the frequencies

12. Should we be referencing the source from the docs?

See mcv_clauselist_selectivity in src/backend/statistics/mcv.c for
details.

hmm. I see it's not the first going by: git grep -E "\w+\.c\<"

13. Pretty minor, but the following loop in
UpdateStatisticsForTypeChange() could use a break;

attribute_referenced = false;
for (i = 0; i < staForm->stxkeys.dim1; i++)
	if (attnum == staForm->stxkeys.values[i])
		attribute_referenced = true;

UPDATE: If I'd reviewed the correct patch I'd have seen that you'd
removed this already

14. Again in UpdateStatisticsForTypeChange(), would it not be better to
do the statext_is_kind_built(oldtup, STATS_EXT_MCV) check before
checking if the stats contain this column? This gets rid of your
reset_stats variable.

I also don't quite understand why there's an additional check for
statext_is_kind_built(oldtup, STATS_EXT_MCV), which if that's false
then why do we do the dummy update on the tuple? Have you just coded
this so that you can support other stats types later without too much
modification? If so, I'm finding it a bit confusing to read, so maybe
it's worth only coding it that way if there's more than one stats type
to reset for.

UPDATE: If I'd reviewed the correct patch I'd have seen that you'd
removed this already

15. I see you broke out the remainder of the code from
clauselist_selectivity() into clauselist_selectivity_simple(). The
comment looks like just a copy and paste from the original. That seems
like quite a bit of
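For readers unfamiliar with the feature under review, a small sketch of
creating a multivariate MCV statistics object as the patch series implements
it (the table and columns are invented; the (mcv) statistics kind is what
the patch adds on top of the existing ndistinct and dependencies kinds):

CREATE TABLE t (a int, b int);
INSERT INTO t SELECT i % 100, i % 100 FROM generate_series(1, 10000) s(i);

-- Build an MCV list on the correlated column pair, then refresh stats;
-- the planner can then estimate WHERE a = ... AND b = ... accurately.
CREATE STATISTICS s (mcv) ON a, b FROM t;
ANALYZE t;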
Re: [HACKERS] generated columns
On 15/01/2019 08:13, Michael Paquier wrote:
> When testing a bulk INSERT into a table which has a stored generated
> column, memory keeps growing in size linearly, which does not seem
> normal to me. If inserting more tuples than what I tested (I stopped
> at 10M because of lack of time), it seems to me that this could result
> in OOMs. I would have expected the memory usage to be steady.

What are you executing exactly? One INSERT command with many rows?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: insensitive collations
Peter Eisentraut wrote:

> > On a table with pre-existing contents, the creation of a unique index
> > does not seem to detect the duplicates that are equal per the
> > collation and different binary-wise.
>
> Fixed in the attached updated patch.

Check. I've found another issue with aggregates over distinct: the
deduplication seems to ignore the collation.

postgres=# select distinct x from test3ci;  -- OK
  x
-----
 def
 abc
 ghi
(3 rows)

postgres=# select count(distinct x) from test3ci;  -- not OK
 count
-------
     4
(1 row)

postgres=# select array_agg(distinct x) from test3ci;  -- not OK
     array_agg
-------------------
 {ABC,abc,def,ghi}
(1 row)

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
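For reference, a sketch of the setup these examples rely on, per the patch
under review (the ICU locale string requests case-insensitive comparison;
test3ci stands in for the test table used above):

CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

CREATE TABLE test3ci (x text COLLATE case_insensitive);
INSERT INTO test3ci VALUES ('abc'), ('ABC'), ('def'), ('ghi');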
Re: WIP: Avoid creation of the free space map for small tables
On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
>
> On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote:
> > Thanks, Mithun for performance testing, it really helps us to choose
> > the right strategy here. Once John provides next version, it would be
> > good to see the results of regular pgbench (read-write) runs (say at
> > 50 and 300 scale factor) and the results of large copy. I don't think
> > there will be any problem, but we should just double check that.
>
> Attached is v12 using the alternating-page strategy. I've updated the
> comments and README as needed. In addition, I've
>

Few comments:
---
1. Commit message:
> Any pages with wasted free space become visible at next relation
> extension, so we still control table bloat.

I think the free space will be available after the next pass of vacuum,
no? How can relation extension make it available?

2.
+2. For very small heap relations, the FSM would be relatively large and
+wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for
+heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space
+and to improve performance. To locate free space in this case, we simply
+iterate over the heap, trying alternating pages in turn. There may be some
+wasted free space in this case, but it becomes visible again upon next
+relation extension.

a. Again, how does space become available at next relation extension?
b. I think there is no use in mentioning the version number in the
above comment; this code will be present from PG-12, so one can find
out from which version this optimization was added.

3.
BlockNumber
RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
                              Size oldSpaceAvail, Size spaceNeeded)
{
..
+	/* First try the local map, if it exists. */
+	if (oldPage < fsm_local_map.nblocks)
+	{
..
}

The comment doesn't appear to be completely in sync with the code.
Can't we just check whether "fsm_local_map.nblocks > 0"? If so, we can
use a macro for the same. I have changed this in the attached patch,
see what you think about it. I have used it at a few other places as
well.

4.
+ * When we initialize the map, the whole heap is potentially available to
+ * try. If a caller wanted to reset the map after another backend extends
+ * the relation, this will only flag new blocks as available. No callers
+ * do this currently, however.
+ */
+static void
+fsm_local_set(Relation rel, BlockNumber curr_nblocks)
{
..
+	if (blkno >= fsm_local_map.nblocks + 2)
..
}

The way you have tried to support the case as quoted in the comment
"If a caller wanted to reset the map after another backend extends .."
doesn't appear to be solid, and I am not sure if it is correct either.
We don't have any way to test the same, so I suggest let's try to
simplify the case w.r.t. the current requirement of this API. I think
we should use some simple logic to try every other block, like:

+	blkno = cur_nblocks - 1;
+	while (true)
+	{
+		fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
+		if (blkno >= 2)
+			blkno -= 2;
+		else
+			break;
+	}

I have changed this in the attached patch.

5.
+/*
+ * Search the local map for an available block to try, in descending order.
+ *
+ * For use when there is no FSM.
+ */
+static BlockNumber
+fsm_local_search(void)

We should give a brief explanation as to why we try in descending
order. I have added some explanation in the attached patch, see what
you think about it?

Apart from the above, I have modified a few comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

v13-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data
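A quick way to observe the behaviour the patch aims for is to look at the
FSM fork directly; a sketch (the table is illustrative, and the page count
that still avoids an FSM depends on the compile-time
HEAP_FSM_CREATION_THRESHOLD from the patch):

CREATE TABLE tiny (i int);
INSERT INTO tiny SELECT generate_series(1, 100);
VACUUM tiny;

-- With the patch, a heap at or below the threshold has no FSM fork,
-- so this is expected to return 0.
SELECT pg_relation_size('tiny', 'fsm');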
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 9:25 AM Mithun Cy wrote:
>
> On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
> >
> > On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila wrote:
> > > Thanks, Mithun for performance testing, it really helps us to
> > > choose the right strategy here. Once John provides next version, it
> > > would be good to see the results of regular pgbench (read-write)
> > > runs (say at 50 and 300 scale factor) and the results of large
> > > copy. I don't think there will be any problem, but we should just
> > > double check that.
> >
> > Attached is v12 using the alternating-page strategy. I've updated the
> > comments and README as needed. In addition, I've
>
> Execution time in ms (scale factor indicates size of pgbench_accounts):
>
> scale factor   v12-patch   base patch   % diff
> 300            77166.407   77862.041    -0.8934186557
> 50             13329.233   13284.583     0.3361038882
>
> So large table tests do not show any considerable performance variance
> from the base code!

I think with these results, we can conclude this patch doesn't seem to
have any noticeable regression for all the tests we have done, right?
Thanks a lot for doing various performance tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL-XL shared disk development question
Hello,

I already sent the following message to the postgres-xl-developers list,
but they don't seem very active, so I hope you'll accept this kind of
message on this mailing list.

I'm planning to run PostgreSQL-XL data nodes on CephFS as a shared
folder. My goal is to provide scalability (auto/manual) in terms of CPU
and RAM for those who have a large amount of data and operate on clouds.
When the data gets big, it's hard to make a copy of every node instantly
as we increase the number of nodes. That's why I want Ceph to do the
disk management, distribution of data (disk), and replication of data
(disk).

I thought of adding a SHARED mode alongside the existing REPLICATED and
DISTRIBUTED modes, since PostgreSQL-XL already does global transaction
and session management.

What do you think of this? What do you think I should be careful about
in this development, if you think it's possible?

Bests.

--
Atıf Ceylan
CTO AppstoniA OÜ
draft patch for strtof()
As discussed in the Ryu thread, herewith a draft of a patch to use
strtof() for float4 input (rather than using strtod() with its
double-rounding issue).

An exhaustive search shows that this does not change the resulting
bit-pattern for any input string that could have been generated by PG
with extra_float_digits=3 set. The risk is that values generated by
other software, especially code that uses shortest-exact float output
(as a number of languages seem to do, and which PG will do if the Ryu
patch goes in) will be incorrectly input; though it appears that only
one value (7.038531e-26) is both a possible shortest-exact
representation and a rounding error (though a number of other values
round incorrectly, they are not shortest representations).

This includes a fallback to use strtod() the old way if the platform
lacks strtof(). A variant file for the new regression tests is needed
for such platforms; I've taken a stab at setting this up for the one
platform we know will need it (if there are others, the buildfarm will
let us know in due course).

--
Andrew (irc:RhodiumToad)

diff --git a/configure b/configure
index 06fc3c6835..e3176e24e9 100755
--- a/configure
+++ b/configure
@@ -15802,6 +15802,19 @@ esac
 fi

+ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof"
+if test "x$ac_cv_func_strtof" = xyes; then :
+  $as_echo "#define HAVE_STRTOF 1" >>confdefs.h
+
+else
+  case " $LIBOBJS " in
+  *" strtof.$ac_objext "* ) ;;
+  *) LIBOBJS="$LIBOBJS strtof.$ac_objext"
+ ;;
+esac
+
+fi
+

 case $host_os in
diff --git a/configure.in b/configure.in
index 4efb912c4d..bdaab717d7 100644
--- a/configure.in
+++ b/configure.in
@@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([
 	strlcat
 	strlcpy
 	strnlen
+	strtof
 ]))

 case $host_os in
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 117ded8d1d..dce6ea31cf 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -104,13 +104,39 @@ is_infinite(double val)

 /*
  *	float4in		- converts "num" to float4
+ *
+ * Note that this code now uses strtof(), where it used to use strtod().
+ *
+ * The motivation for using strtof() is to avoid a double-rounding problem:
+ * for certain decimal inputs, if you round the input correctly to a double,
+ * and then round the double to a float, the result is incorrect in that it
+ * does not match the result of rounding the decimal value to float directly.
+ *
+ * One of the best examples is 7.038531e-26:
+ *
+ * 0xAE43FDp-107 = 7.03853069185120912085...e-26
+ *      midpoint   7.038531022281...e-26
+ * 0xAE43FEp-107 = 7.03853130814879132477...e-26
+ *
+ * making 0xAE43FDp-107 the correct float result, but if you do the
+ * conversion via a double, you get
+ *
+ * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26
+ *        midpoint      7.038530964884...e-26
+ * 0xAE43FD.8000p-107 = 7.038531022281...e-26
+ * 0xAE43FD.8008p-107 = 7.038531137076...e-26
+ *
+ * so the value rounds to the double exactly on the midpoint between the two
+ * nearest floats, and then rounding again to a float gives the incorrect
+ * result of 0xAE43FEp-107.
+ *
  */
 Datum
 float4in(PG_FUNCTION_ARGS)
 {
 	char	   *num = PG_GETARG_CSTRING(0);
 	char	   *orig_num;
-	double		val;
+	float		val;
 	char	   *endptr;

 	/*
@@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS)
 						"real", orig_num)));

 	errno = 0;
-	val = strtod(num, &endptr);
+	val = strtof(num, &endptr);

 	/* did we not see anything that looks like a double? */
 	if (endptr == num || errno != 0)
@@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS)
 		int			save_errno = errno;

 		/*
-		 * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf,
+		 * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf,
 		 * but not all platforms support all of these (and some accept them
 		 * but set ERANGE anyway...)  Therefore, we check for these inputs
-		 * ourselves if strtod() fails.
+		 * ourselves if strtof() fails.
 		 *
 		 * Note: C99 also requires hexadecimal input as well as some extended
 		 * forms of NaN, but we consider these forms unportable and don't try
-		 * to support them.  You can use 'em if your strtod() takes 'em.
+		 * to support them.  You can use 'em if your strtof() takes 'em.
 		 */
 		if (pg_strncasecmp(num, "NaN", 3) == 0)
 		{
@@ -196,7 +222,7 @@ float4in(PG_FUNCTION_ARGS)
 			 * detect whether it's a "real" out-of-range condition by checking
 			 * to see if the result is zero or huge.
 			 */
-			if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL)
+			if (val == 0.0 || val >= HUGE_VALF || val <= -HUGE_VALF)
 				ereport(ERROR,
 						(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 						 errmsg("\"%s\" is out of range for type real",
@@ -232,13 +258,7 @@ float4in(PG_FUNCTION_ARGS)
 				 errmsg("invalid input syntax for type %s: \"%s\"",
 						"real", orig_num)));

-	/*
-	 * if we get here, we have a legal double, still need to check to
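The problem value from the comment above can be exercised straight from
SQL; a sketch (with the patch, direct float4 input rounds once via strtof()
and should yield the float whose shortest representation is 7.038531e-26,
while going through float8 first reproduces the double-rounding hazard):

-- Direct float4 input: one rounding step, correct result.
SELECT '7.038531e-26'::float4;

-- Via float8: rounds to double first, then to float (double rounding).
SELECT '7.038531e-26'::float8::float4;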
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
On Tue, Jan 15, 2019 at 11:37 PM Tom Lane wrote:
> Well, as I said upthread, it seems like we need to think a bit more
> carefully about what it is that clause_is_strict_for is testing ---
> and if that ends up finding that some other name is more apposite,
> I'd not have any hesitation about renaming it. But we're really
> missing a bet if the ScalarArrayOp-specific check isn't inside that
> recursive check of an "atomic" expression's properties. The
> routines above there only recurse through things that act like
> AND or OR, but clause_is_strict_for is capable of descending
> through other stuff as long as it's strict in some sense. What
> we need to be clear about here is exactly what that sense is.

All right, I'll look over all of the other callers and see what makes
sense as far as naming (or perhaps consider a new parallel function;
unsure at this point.) I might also try to see if we can edit the tests
slightly to require the recursion case to be exercised.

One other question on testing: do you think the "calculated array"
tests are good enough by themselves (i.e., remove the ones with array
constants of > 100 values)? I dislike that it's not as obvious what's
going on, but given that the code shouldn't care about array size
anyway...so if there's an inclination toward fewer tests, that's the
first place I'd look at cutting them.

James Coleman
Re: Redundant filter in index scan with a bool column
Alexander Kuzmenkov writes:
> The filter is not needed, why is it there? Turns out we can't recognize
> that the restriction clause 'b' and the index clause 'b = true' are
> equivalent.

Yeah, it's intentional that we don't get rid of the extra clause; it
doesn't really seem worth the expense and complexity to do so. Indexes
on bool columns are a pretty niche case in the first place. Usually, if
you are interested in just the rows where b = true, you're better off
using "where b" as an index predicate. In your example, we can do this
instead:

regression=# create index on t(i) where b;
CREATE INDEX
regression=# explain select * from t where i = 300 and b;
                            QUERY PLAN
------------------------------------------------------------------
 Index Scan using t_i_idx on t  (cost=0.12..24.19 rows=6 width=5)
   Index Cond: (i = 300)
(2 rows)

resulting in a much smaller index, if the b=true condition is selective
enough to be worth indexing. Even in the case you showed, how much is
the redundant filter clause really costing?

> My first reaction was to patch operator_predicate_proof to
> handle this case, but there is a more straightforward way: mark the
> expanded index clause as potentially redundant when it is generated in
> expand_indexqual_conditions. There is already RestrictInfo.parent_ec
> that is used to mark redundant EC-derived join clauses. The patch
> renames it to rinfo_parent and uses it to mark the expanded index
> clauses.

That's an unbelievable hack that almost certainly breaks existing uses.

The approach of teaching predtest.c that "b = true" implies "b" would
work, but it seems a bit brute-force because ordinarily such cases
would never be seen there, thanks to simplify_boolean_equality having
canonicalized the former into the latter. The problem we have is that
indxpath.c re-generates "b = true" in indexscan conditions. Thinking
about it now, I wonder if we could postpone that conversion till later,
say do it in create_indexscan_plan after having checked for redundant
clauses. Not sure how messy that'd be.

			regards, tom lane
Re: [PATCH] Allow UNLISTEN during recovery
>
> On Tue, Jan 15, 2019 at 10:17 AM Shay Rojansky wrote:
> > Unfortunately I'm extremely tight for time at the moment and don't
> > have time to do the appropriate hot standby setup to test this... As
> > the patch is pretty straightforward, and since I'm hoping you guys
> > execute the tests on your end, I hope that's acceptable. If it's
> > absolutely necessary for me to test the patch locally, let me know
> > and I'll do so.
>
> I would definitely prefer if you could run the tests. You might
> discover that your patch does not sufficiently address tests. Please
> do so and confirm here that it works for you and then I can do another
> review.

Thanks for insisting - I ended up setting up the environment and
running the tests, and discovering that some test-related changes were
missing. Here's a 3rd version of the patch. Hope this is now in good
shape, let me know if you think anything else needs to be done.

From c5497cc4f7fbad4eafecfa72bc99dfebacd0f9bd Mon Sep 17 00:00:00 2001
From: Shay Rojansky
Date: Tue, 15 Jan 2019 18:49:40 +0100
Subject: [PATCH] Allow UNLISTEN during recovery
---
 doc/src/sgml/high-availability.sgml              | 16 ++--
 src/backend/tcop/utility.c                       |  2 +-
 src/test/regress/expected/hs_standby_allowed.out |  3 +++
 .../regress/expected/hs_standby_disallowed.out   |  4 ----
 src/test/regress/sql/hs_standby_allowed.sql      |  4 ++++
 src/test/regress/sql/hs_standby_disallowed.sql   |  2 --
 6 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 4882b20828..bb1c86f73e 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1767,6 +1767,11 @@ if (!triggered)
         Plugins and extensions - LOAD
+
+
+        UNLISTEN
+
+
@@ -1856,18 +1861,17 @@ if (!triggered)
-    LISTEN, UNLISTEN, NOTIFY
+    LISTEN, NOTIFY

-In normal operation, read-only transactions are allowed to
-use LISTEN, UNLISTEN, and
-NOTIFY, so Hot Standby sessions operate under slightly tighter
-restrictions than ordinary read-only sessions. It is possible that some
-of these restrictions might be loosened in a future release.
+In normal operation, read-only transactions are allowed to use
+LISTEN, and NOTIFY, so Hot Standby sessions
+operate under slightly tighter restrictions than ordinary read-only sessions.
+It is possible that some of these restrictions might be loosened in a future release.

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 27ae6be751..44136060d6 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -629,7 +629,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			{
 				UnlistenStmt *stmt = (UnlistenStmt *) parsetree;

-				PreventCommandDuringRecovery("UNLISTEN");
+				/* allow UNLISTEN during recovery, which is a noop */
 				CheckRestrictedOperation("UNLISTEN");
 				if (stmt->conditionname)
 					Async_Unlisten(stmt->conditionname);
diff --git a/src/test/regress/expected/hs_standby_allowed.out b/src/test/regress/expected/hs_standby_allowed.out
index 526f88f2be..00b8faf9eb 100644
--- a/src/test/regress/expected/hs_standby_allowed.out
+++ b/src/test/regress/expected/hs_standby_allowed.out
@@ -208,6 +208,9 @@
 LOCK hs1 IN ACCESS SHARE MODE;
 LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;
+-- UNLISTEN
+UNLISTEN a;
+UNLISTEN *;
 -- LOAD
 -- should work, easier if there is no test for that...
 -- ALLOWED COMMANDS
diff --git a/src/test/regress/expected/hs_standby_disallowed.out b/src/test/regress/expected/hs_standby_disallowed.out
index bc117413ff..dff0953e9a 100644
--- a/src/test/regress/expected/hs_standby_disallowed.out
+++ b/src/test/regress/expected/hs_standby_disallowed.out
@@ -118,10 +118,6 @@
 listen a;
 ERROR:  cannot execute LISTEN during recovery
 notify a;
 ERROR:  cannot execute NOTIFY during recovery
-unlisten a;
-ERROR:  cannot execute UNLISTEN during recovery
-unlisten *;
-ERROR:  cannot execute UNLISTEN during recovery
 -- disallowed commands
 ANALYZE hs1;
 ERROR:  cannot execute ANALYZE during recovery
diff --git a/src/test/regress/sql/hs_standby_allowed.sql b/src/test/regress/sql/hs_standby_allowed.sql
index a33199dbbd..6debddc5e9 100644
--- a/src/test/regress/sql/hs_standby_allowed.sql
+++ b/src/test/regress/sql/hs_standby_allowed.sql
@@ -110,6 +110,10 @@
 LOCK hs1 IN ROW SHARE MODE;
 LOCK hs1 IN ROW EXCLUSIVE MODE;
 COMMIT;

+-- UNLISTEN
+UNLISTEN a;
+UNLISTEN *;
+
 -- LOAD
 -- should work, easier if there is no test for that...

diff --git a/src/test/regress/sql/hs_standby_disallowed.sql b/src/test/regress/sql/hs_standby_disallowed.sql
index 21bbf526b7..a470600eec 100644
--- a/src/test/regress/sql/hs_standby_disallowed.sql
+++ b/src/test/regress/sql/hs_standby_disallowed.sql
@@ -88,8 +88,6 @@
 COMMIT;
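The user-visible change is small; a sketch of a hot-standby session before
and after the patch:

-- On a hot standby, before the patch:
UNLISTEN a;
-- ERROR:  cannot execute UNLISTEN during recovery

-- With the patch, both forms are accepted as no-ops, since a standby
-- session can never have active LISTEN registrations anyway:
UNLISTEN a;
UNLISTEN *;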
Re: draft patch for strtof()
Andrew Gierth writes: > As discussed in the Ryu thread, herewith a draft of a patch to use > strtof() for float4 input (rather than using strtod() with its > double-rounding issue). The errno handling in strtof seems bizarrely overcomplex; why do you need the separate caller_errno variable? > This includes a fallback to use strtod() the old way if the platform > lacks strtof(). A variant file for the new regression tests is needed > for such platforms; I've taken a stab at setting this up for the one > platform we know will need it (if there are others, the buildfarm will > let us know in due course). I'm not that much on board with introducing test cases that we know, beyond question, are going to be portability headaches. What will we actually gain with this, compared to just not having the test case? I can't see that it's worth either the buildfarm cycles or the human maintenance effort just to prove that, yes, some platforms have portability corner cases. I also don't like the prospect that we ship releases that will fail basic regression tests on platforms we haven't tested. Coping with such failures is a large burden for affected packagers or end users, especially when the only useful "coping" mechanism is to ignore the regression test failure. Might as well not have it. regards, tom lane
Re: draft patch for strtof()
> "Tom" == Tom Lane writes: >> As discussed in the Ryu thread, herewith a draft of a patch to use >> strtof() for float4 input (rather than using strtod() with its >> double-rounding issue). Tom> The errno handling in strtof seems bizarrely overcomplex; why do Tom> you need the separate caller_errno variable? Eh. I was preserving the conventional behaviour of setting errno only on errors and not resetting it to 0, but I suppose you could argue that that is overkill given that the function is called only in one place that's supposed to have already set errno to 0. (And yes, I missed a couple of files and the windows build breaks, working on those) >> This includes a fallback to use strtod() the old way if the platform >> lacks strtof(). A variant file for the new regression tests is needed >> for such platforms; I've taken a stab at setting this up for the one >> platform we know will need it (if there are others, the buildfarm will >> let us know in due course). Tom> I'm not that much on board with introducing test cases that we Tom> know, beyond question, are going to be portability headaches. What Tom> will we actually gain with this, compared to just not having the Tom> test case? The purpose of the test case is to ensure that we're getting the right values on input. If these values fail on any platform that anyone actually cares about, then I think we need to know about it. -- Andrew (irc:RhodiumToad)
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Tue, Jan 15, 2019 at 8:48 PM Michael Paquier wrote: > > So the answer to your question is: the WAL receiver fails to start. > > Robert, does this answer your question? I agree that depending on the > timing, we could finish by having the startup process spawning a WAL > receiver with a given connection string, and that the WAL receiver > could use a different one, but as long as we fail the WAL receiver > startup this does not seem like an issue to me, so I disagree with the > upthread statement that the patch author has not thought about such > cases :) OK, if that was in the patch, I dunno why I didn't see it. I really didn't think it was there, but if I'm wrong, then I am. > It seems to me that making the WAL receiver relying more on the GUC > context makes it more robust when it comes to reloading the parameters > which would be an upcoming change as we need to rely on the WAL > receiver check the GUC context itself and FATAL (or we would need to > have the startup process check for a change in > primary_conninfo/primary_slot_name, and then have the startup process > kill the WAL receiver by itself). In short, handling all that in the > WAL receiver would be more robust in the long term in my opinion as we > reduce interactions between the WAL receiver and the startup process. > And on top of it we get rid of ready_to_display, which is something I > am unhappy with since we had to introduce it. I think you have some valid points here, but I also think that the patch as written doesn't really seem to have any user-visible benefits. Sure, ready_to_display is a crock, but getting rid of it doesn't immediately help anybody. And on the flip side your patch might have bugs, in which case we'll be worse off. I'm not going to stand on my soapbox and say that committing this patch is a terrible idea, because as far as I can tell, it isn't. But it feels like it might be a commit for the sake of making a commit, and I can't get too excited about that either. Since the logic will have to be further revised anyway to make primary_conninfo PGC_SIGHUP, why not just wait until that's done and then commit all the changes together instead of guessing that this might make that easier? If this does get committed now, and count me as about -0.25 for that, then I at least think it should have some comments clearly addressing the synchronization issues. Unless those are also in the patch and I missed them, too, but I hope I'm not that dense. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: draft patch for strtof()
Andrew Gierth writes: > "Tom" == Tom Lane writes: > Tom> The errno handling in strtof seems bizarrely overcomplex; why do > Tom> you need the separate caller_errno variable? > Eh. I was preserving the conventional behaviour of setting errno only on > errors and not resetting it to 0, Oh, I see. -ENOCAFFEINE. > but I suppose you could argue that > that is overkill given that the function is called only in one place > that's supposed to have already set errno to 0. Well, we probably oughtta endeavor to maintain compatibility with the function's standard behavior, because other calls to it are likely to creep in over time. Objection withdrawn. > Tom> I'm not that much on board with introducing test cases that we > Tom> know, beyond question, are going to be portability headaches. What > Tom> will we actually gain with this, compared to just not having the > Tom> test case? > The purpose of the test case is to ensure that we're getting the right > values on input. If these values fail on any platform that anyone > actually cares about, then I think we need to know about it. Meh. I think the actual outcome will be that we define any platform that gets the wrong answer as one that we don't care about, mainly because we won't have any practical way to fix it. That being the situation, trying to maintain a test case seems like pointless make-work. (FWIW, I'm running the patch on gaur's host, just to confirm it does what you expect. Should have an answer in an hour or so ...) regards, tom lane
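For reference, the convention being preserved here — set errno only on failure, leave the caller's value alone on success — looks roughly like this sketch (the names are illustrative, not the patch's actual code):

#include <errno.h>
#include <stdlib.h>

/*
 * Sketch of the errno convention under discussion: stash the caller's
 * errno, let strtof() report errors through errno as usual, and restore
 * the caller's value if no error occurred.  Illustrative only.
 */
static float
demo_strtof(const char *nptr, char **endptr)
{
	int			caller_errno = errno;
	float		val;

	errno = 0;
	val = strtof(nptr, endptr);
	if (errno == 0)
		errno = caller_errno;	/* success: don't clobber caller's errno */
	return val;
}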
Re: What to name the current heap after pluggable storage / what to rename?
On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi wrote: > access/relation.[c|h] name is fine. Or how about access/rel.[c|h], > because nodes/relation.h is related to planner. utils/rel.h is some how > related to relation caches. Insofar as we can reasonably do so, I'd rather pick unique names for header files. I know that there's no law that rules out having both nodes/relation.h and access/relation.h, or likewise utils/rel.h and access/rel.h; the computer won't be confused. But it might create some confusion among human beings, so my vote is for avoiding that sort of thing if we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: draft patch for strtof()
See if Windows likes this one any better. -- Andrew (irc:RhodiumToad) diff --git a/configure b/configure index 06fc3c6835..e3176e24e9 100755 --- a/configure +++ b/configure @@ -15802,6 +15802,19 @@ esac fi +ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof" +if test "x$ac_cv_func_strtof" = xyes; then : + $as_echo "#define HAVE_STRTOF 1" >>confdefs.h + +else + case " $LIBOBJS " in + *" strtof.$ac_objext "* ) ;; + *) LIBOBJS="$LIBOBJS strtof.$ac_objext" + ;; +esac + +fi + case $host_os in diff --git a/configure.in b/configure.in index 4efb912c4d..bdaab717d7 100644 --- a/configure.in +++ b/configure.in @@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([ strlcat strlcpy strnlen + strtof ])) case $host_os in diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 117ded8d1d..dce6ea31cf 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -104,13 +104,39 @@ is_infinite(double val) /* * float4in - converts "num" to float4 + * + * Note that this code now uses strtof(), where it used to use strtod(). + * + * The motivation for using strtof() is to avoid a double-rounding problem: + * for certain decimal inputs, if you round the input correctly to a double, + * and then round the double to a float, the result is incorrect in that it + * does not match the result of rounding the decimal value to float directly. + * + * One of the best examples is 7.038531e-26: + * + * 0xAE43FDp-107 = 7.03853069185120912085...e-26 + * midpoint 7.038531022281...e-26 + * 0xAE43FEp-107 = 7.03853130814879132477...e-26 + * + * making 0xAE43FDp-107 the correct float result, but if you do the conversion + * via a double, you get + * + * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26 + * midpoint 7.038530964884...e-26 + * 0xAE43FD.8000p-107 = 7.038531022281...e-26 + * 0xAE43FD.8008p-107 = 7.038531137076...e-26 + * + * so the value rounds to the double exactly on the midpoint between the two + * nearest floats, and then rounding again to a float gives the incorrect + * result of 0xAE43FEp-107. + * */ Datum float4in(PG_FUNCTION_ARGS) { char *num = PG_GETARG_CSTRING(0); char *orig_num; - double val; + float val; char *endptr; /* @@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS) "real", orig_num))); errno = 0; - val = strtod(num, &endptr); + val = strtof(num, &endptr); /* did we not see anything that looks like a double? */ if (endptr == num || errno != 0) @@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS) int save_errno = errno; /* - * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf, + * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf, * but not all platforms support all of these (and some accept them * but set ERANGE anyway...) Therefore, we check for these inputs - * ourselves if strtod() fails. + * ourselves if strtof() fails. * * Note: C99 also requires hexadecimal input as well as some extended * forms of NaN, but we consider these forms unportable and don't try - * to support them. You can use 'em if your strtod() takes 'em. + * to support them. You can use 'em if your strtof() takes 'em. */ if (pg_strncasecmp(num, "NaN", 3) == 0) { @@ -196,7 +222,7 @@ float4in(PG_FUNCTION_ARGS) * detect whether it's a "real" out-of-range condition by checking * to see if the result is zero or huge. 
*/ - if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL) + if (val == 0.0 || val >= HUGE_VALF || val <= -HUGE_VALF) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"%s\" is out of range for type real", @@ -232,13 +258,7 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); - /* - * if we get here, we have a legal double, still need to check to see if - * it's a legal float4 - */ - check_float4_val((float4) val, isinf(val), val == 0); - - PG_RETURN_FLOAT4((float4) val); + PG_RETURN_FLOAT4(val); } /* diff --git a/src/include/port.h b/src/include/port.h index a55c473262..a6950c1526 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -381,6 +381,10 @@ extern int isinf(double x); #endif /* __clang__ && !__cplusplus */ #endif /* !HAVE_ISINF */ +#ifndef HAVE_STRTOF +extern float strtof(const char *nptr, char **endptr); +#endif + #ifndef HAVE_MKDTEMP extern char *mkdtemp(char *path); #endif diff --git a/src/include/utils/float.h b/src/include/utils/float.h index 0f82a25ede..5e11ab4aa9 100644 --- a/src/include/utils/float.h +++ b/src/include/utils/float.h @@ -17,6 +17,11 @@ #include +/* this might not be needed */ +#ifndef HUGE_VALF +#define HUGE_VALF ((float)HUGE_VAL) +#endif + #ifndef M_PI /* From my RH5.2 gcc math.h file - thomas 2000-04-03 */ #define M_PI 3.14159265358979323846 diff --git a/src/port/strtof.c b
Re: What to name the current heap after pluggable storage / what to rename?
Robert Haas writes: > On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi > wrote: >> access/relation.[c|h] name is fine. Or how about access/rel.[c|h], >> because nodes/relation.h is related to planner. utils/rel.h is some how >> related to relation caches. > Insofar as we can reasonably do so, I'd rather pick unique names for > header files. +1 regards, tom lane
Re: draft patch for strtof()
Andrew Gierth writes: > See if Windows likes this one any better. Doubtful, because AFAICT it's the same patch. regards, tom lane
Re: O_DIRECT for relations and SLRUs (Prototype)
On Sat, Jan 12, 2019 at 4:36 PM Thomas Munro wrote: > 1. We need a new "bgreader" process to do read-ahead. I think you'd > want a way to tell it with explicit hints (for example, perhaps > sequential scans would advertise that they're reading sequentially so > that it starts to slurp future blocks into the buffer pool, and > streaming replicas might look ahead in the WAL and tell it what's > coming). In theory this might be better than the heuristics OSes use > to guess our access pattern and pre-fetch into the page cache, since > we have better information (and of course we're skipping a buffer > layer). Right, like if we're reading the end of relation file 16384, we can prefetch the beginning of 16384.1, but the OS won't know to do that. > 2. We need a new kind of bgwriter/syncer that aggressively creates > clean pages so that foreground processes rarely have to evict (since > that is now super slow), but also efficiently finds ranges of dirty > blocks that it can write in big sequential chunks. Yeah. > 3. We probably want SLRUs to use the main buffer pool, instead of > their own mini-pools, so they can benefit from the above. Right. I think this is important, and it makes me think that maybe Michael's patch won't help us much in the end. I believe that the number of pages that are needed for clog data, at least, can very significantly depending on workload and machine size, so there's not one number there that is going to work for everybody, and the algorithms the SLRU code uses for page management have O(n) stuff in them, so they don't scale well to large numbers of SLRU buffers anyway. I think we should try to unify the SLRU stuff with shared_buffers, and then have a test patch like Michael's (not for commit) which we can use to see the impact of that, and then try to reduce that impact with the stuff you mention under #1 and #2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
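For what it's worth, under today's buffered I/O the closest analogue to that cross-segment read-ahead is a posix_fadvise() hint along these lines (a sketch only, not backend code; with O_DIRECT there is no kernel cache to warm, which is why a bgreader would have to pull blocks into shared buffers itself):

#include <sys/types.h>
#include <fcntl.h>

/*
 * Sketch: when a sequential scan nears the end of segment N of a
 * relation, hint the kernel to start reading segment N+1 (e.g. file
 * 16384.1 after 16384).  Illustrative only.
 */
static void
hint_next_segment(int next_seg_fd, off_t nbytes)
{
	(void) posix_fadvise(next_seg_fd, 0, nbytes, POSIX_FADV_WILLNEED);
}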
Re: additional foreign key test coverage
On 09/01/2019 09:20, Mi Tar wrote: > I tested this patch and it applied cleanly and all tests passed. I haven't > looked if the changes to tests are reasonable or extensive to cover all > aspects of what they want to cover. I have committed this with additional tests for partitioned tables, as requested by Álvaro. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: What to name the current heap after pluggable storage / what to rename?
On January 16, 2019 8:08:09 AM PST, Robert Haas wrote:
>On Tue, Jan 15, 2019 at 10:23 PM Haribabu Kommi wrote:
>> access/relation.[c|h] name is fine. Or how about access/rel.[c|h],
>> because nodes/relation.h is related to planner. utils/rel.h is somehow
>> related to relation caches.
>
>Insofar as we can reasonably do so, I'd rather pick unique names for
>header files. I know that there's no law that rules out having both
>nodes/relation.h and access/relation.h, or likewise utils/rel.h and
>access/rel.h; the computer won't be confused. But it might create
>some confusion among human beings, so my vote is for avoiding that
>sort of thing if we can.

I prefer that too - if the new name isn't enough worse to make it hard to remember. I'd welcome suggestions that don't conflict...

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: insensitive collations
On 14/01/2019 15:37, Andreas Karlsson wrote: >> Nondeterministic collations do address this by allowing canonically >> equivalent code point sequences to compare as equal. You still need a >> collation implementation that actually does compare them as equal; ICU >> does this, glibc does not AFAICT. > Ah, right! You could use -ks-identic[1] for this. That's the default. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Python versions (was Re: RHEL 8.0 build)
Peter Eisentraut writes: > On 07/01/2019 00:16, Tom Lane wrote: >> BTW, this is a pre-existing problem not the fault of this patch, >> but while we're fooling with the behavior of python lookup would >> be a great time to fix it: we should add something like >> AC_ARG_VAR([PYTHON], [path to Python executable]) > Patch attached. LGTM. I'm slightly annoyed that configure's help output doesn't list the variables in alphabetical order, but that seems to be a pre-existing problem that we likely can't fix. > The following are listed but don't affect any other tests, so I didn't > include them: > BISON > DTRACE > DTRACEFLAGS > FLEX > XML2_CONFIG (slightly confused ...) Surely XML2_CONFIG feeds into what we choose for CPPFLAGS? If that doesn't matter, why not? regards, tom lane
Re: draft patch for strtof()
> "Tom" == Tom Lane writes: > Andrew Gierth writes: >> See if Windows likes this one any better. Tom> Doubtful, because AFAICT it's the same patch. Sigh. copied wrong file. Let's try that again. -- Andrew (irc:RhodiumToad) diff --git a/configure b/configure index 06fc3c6835..e3176e24e9 100755 --- a/configure +++ b/configure @@ -15802,6 +15802,19 @@ esac fi +ac_fn_c_check_func "$LINENO" "strtof" "ac_cv_func_strtof" +if test "x$ac_cv_func_strtof" = xyes; then : + $as_echo "#define HAVE_STRTOF 1" >>confdefs.h + +else + case " $LIBOBJS " in + *" strtof.$ac_objext "* ) ;; + *) LIBOBJS="$LIBOBJS strtof.$ac_objext" + ;; +esac + +fi + case $host_os in diff --git a/configure.in b/configure.in index 4efb912c4d..bdaab717d7 100644 --- a/configure.in +++ b/configure.in @@ -1703,6 +1703,7 @@ AC_REPLACE_FUNCS(m4_normalize([ strlcat strlcpy strnlen + strtof ])) case $host_os in diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index 117ded8d1d..b907613056 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -104,13 +104,39 @@ is_infinite(double val) /* * float4in - converts "num" to float4 + * + * Note that this code now uses strtof(), where it used to use strtod(). + * + * The motivation for using strtof() is to avoid a double-rounding problem: + * for certain decimal inputs, if you round the input correctly to a double, + * and then round the double to a float, the result is incorrect in that it + * does not match the result of rounding the decimal value to float directly. + * + * One of the best examples is 7.038531e-26: + * + * 0xAE43FDp-107 = 7.03853069185120912085...e-26 + * midpoint 7.038531022281...e-26 + * 0xAE43FEp-107 = 7.03853130814879132477...e-26 + * + * making 0xAE43FDp-107 the correct float result, but if you do the conversion + * via a double, you get + * + * 0xAE43FD.7FF8p-107 = 7.038530907487...e-26 + * midpoint 7.038530964884...e-26 + * 0xAE43FD.8000p-107 = 7.038531022281...e-26 + * 0xAE43FD.8008p-107 = 7.038531137076...e-26 + * + * so the value rounds to the double exactly on the midpoint between the two + * nearest floats, and then rounding again to a float gives the incorrect + * result of 0xAE43FEp-107. + * */ Datum float4in(PG_FUNCTION_ARGS) { char *num = PG_GETARG_CSTRING(0); char *orig_num; - double val; + float val; char *endptr; /* @@ -135,7 +161,7 @@ float4in(PG_FUNCTION_ARGS) "real", orig_num))); errno = 0; - val = strtod(num, &endptr); + val = strtof(num, &endptr); /* did we not see anything that looks like a double? */ if (endptr == num || errno != 0) @@ -143,14 +169,14 @@ float4in(PG_FUNCTION_ARGS) int save_errno = errno; /* - * C99 requires that strtod() accept NaN, [+-]Infinity, and [+-]Inf, + * C99 requires that strtof() accept NaN, [+-]Infinity, and [+-]Inf, * but not all platforms support all of these (and some accept them * but set ERANGE anyway...) Therefore, we check for these inputs - * ourselves if strtod() fails. + * ourselves if strtof() fails. * * Note: C99 also requires hexadecimal input as well as some extended * forms of NaN, but we consider these forms unportable and don't try - * to support them. You can use 'em if your strtod() takes 'em. + * to support them. You can use 'em if your strtof() takes 'em. */ if (pg_strncasecmp(num, "NaN", 3) == 0) { @@ -196,7 +222,14 @@ float4in(PG_FUNCTION_ARGS) * detect whether it's a "real" out-of-range condition by checking * to see if the result is zero or huge. 
*/ - if (val == 0.0 || val >= HUGE_VAL || val <= -HUGE_VAL) + + if (val == 0.0 || +#ifdef HUGE_VALF +val >= HUGE_VALF || val <= -HUGE_VALF +#else +isinf(val) +#endif +) ereport(ERROR, (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), errmsg("\"%s\" is out of range for type real", @@ -232,13 +265,7 @@ float4in(PG_FUNCTION_ARGS) errmsg("invalid input syntax for type %s: \"%s\"", "real", orig_num))); - /* - * if we get here, we have a legal double, still need to check to see if - * it's a legal float4 - */ - check_float4_val((float4) val, isinf(val), val == 0); - - PG_RETURN_FLOAT4((float4) val); + PG_RETURN_FLOAT4(val); } /* diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in index 9d99816eae..84ca929439 100644 --- a/src/include/pg_config.h.in +++ b/src/include/pg_config.h.in @@ -555,6 +555,9 @@ /* Define to 1 if you have the `strsignal' function. */ #undef HAVE_STRSIGNAL +/* Define to 1 if you have the `strtof' function. */ +#undef HAVE_STRTOF + /* Define to 1 if you have the `strtoll' function. */ #undef HAVE_STRTOLL diff --git a/src/include/pg_config.h.win32 b/src/include/pg_config.h.win32 index 1a89a8c24e..793897271c 100644 --- a/src/include/pg_config.h.win32 +++ b/src/include/pg_config.h.win32 @@ -139,6 +139,9 @@
Re: New vacuum option to do only freezing
On Wed, Jan 16, 2019 at 3:30 AM Masahiko Sawada wrote: > As the above comment says, it's possible that the state of an > INSERT_IN_PROGRESS tuple could be changed to 'dead' after > heap_page_prune(). Since such tuple is not truncated at this point we > record it and set it as UNUSED in lazy_vacuum_page(). I think that the > DISABLE_INDEX_CLEANUP case is the same; we need to process them after > recorded. Am I missing something? I believe you are. Think about it this way. After the first pass over the heap has been completed but before we've done anything to the indexes, let alone started the second pass over the heap, somebody could kill the vacuum process. Somebody could in fact yank the plug out of the wall, stopping the entire server in its tracks. If they do that, then lazy_vacuum_page() will never get executed. Yet, the heap can't be in any kind of corrupted state at this point, right? We know that the system is resilient against crashes, and killing a vacuum or even the whole server midway through does not leave the system in any kind of bad state. If it's fine for lazy_vacuum_page() to never be reached in that case, it must also be fine for it never to be reached if we ask for vacuum to stop cleanly before lazy_vacuum_page(). In the case of the particular comment to which you are referring, that comment is part of lazy_scan_heap(), not lazy_vacuum_page(), so I don't see how it bears on the question of whether we need to call lazy_vacuum_page(). It's true that, at any point in time, an in-progress transaction could abort. And if it does then some insert-in-progress tuples could become dead. But if that happens, then the next vacuum will remove them, just as it will remove any tuples that become dead for that reason when vacuum isn't running in the first place. You can't use that as a justification for needing a second heap pass, because if it were, then you'd also need a THIRD heap pass in case a transaction aborts after the second heap pass has visited the pages, and a fourth heap pass in case a transaction aborts after the third heap pass has visited the pages, etc. etc. forever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote:
> On Fri, Jan 11, 2019 at 3:54 AM John Naylor wrote:
>
> 1.
> Commit message:
> > Any pages with wasted free space become visible at next relation extension,
> > so we still control table bloat.
>
> I think the free space will be available after the next pass of
> vacuum, no? How can relation extension make it available?

To explain, this diagram shows the map as it looks for different small table sizes (one row per table size; A = available to try, N = not tried):

blkno:    0123
1 block:  A
2 blocks: NA
3 blocks: ANA
4 blocks: NANA

So for a 3-block table, the alternating strategy never checks block 1. Any free space block 1 has acquired via delete-and-vacuum will become visible if it extends to 4 blocks. We are accepting a small amount of bloat for improved performance, as discussed. Would it help to include this diagram in the README?

> 2.
> +2. For very small heap relations, the FSM would be relatively large and
> +wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for
> +heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space
> +and to improve performance. To locate free space in this case, we simply
> +iterate over the heap, trying alternating pages in turn. There may be some
> +wasted free space in this case, but it becomes visible again upon next
> +relation extension.
>
> a. Again, how space becomes available at next relation extension.
> b. I think there is no use of mentioning the version number in the
> above comment, this code will be present from PG-12, so one can find
> out from which version this optimization is added.

It fits with the reference to PG 8.4 earlier in the document. I chose to be consistent, but to be honest, I'm not much in favor of a lot of version references in code/READMEs.

> 3.
> BlockNumber
> RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
>   Size oldSpaceAvail, Size spaceNeeded)
> {
> ..
> + /* First try the local map, if it exists. */
> + if (oldPage < fsm_local_map.nblocks)
> + {
> ..
> }
>
> The comment doesn't appear to be completely in sync with the code.
> Can't we just check whether "fsm_local_map.nblocks > 0", if so, we
> can use a macro for the same? I have changed this in the attached
> patch, see what you think about it. I have used it at a few other
> places as well.

The macro adds clarity, so I'm in favor of using it.

> 4.
> + * When we initialize the map, the whole heap is potentially available to
> + * try. If a caller wanted to reset the map after another backend extends
> + * the relation, this will only flag new blocks as available. No callers
> + * do this currently, however.
> + */
> +static void
> +fsm_local_set(Relation rel, BlockNumber curr_nblocks)
> {
> ..
> + if (blkno >= fsm_local_map.nblocks + 2)
> ..
> }
>
> The way you have tried to support the case as quoted in the comment
> "If a caller wanted to reset the map after another backend extends .."
> doesn't appear to be solid and I am not sure if it is correct either.

I removed this case in v9 and you objected to that as unnecessary, so I reverted it for v10.

> We don't have any way to test the same, so I suggest let's try to
> simplify the case w.r.t current requirement of this API. I think we
> should add some simple logic to try every other block like:
>
> + blkno = cur_nblocks - 1;
> + while (true)
> + {
> + fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
> + if (blkno >= 2)
> + blkno -= 2;
> + else
> + break;
> + }
>
> I have changed this in the attached patch.

Fine by me.

> 5.
> +/*
> + * Search the local map for an available block to try, in descending order.
> + *
> + * For use when there is no FSM.
> + */
> +static BlockNumber
> +fsm_local_search(void)
>
> We should give a brief explanation as to why we try in descending
> order. I have added some explanation in the attached patch, see what
> you think about it?
>
> Apart from the above, I have modified a few comments.

I'll include these with some grammar corrections in the next version.

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
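To make the descending-order rationale concrete, the search might look roughly like this sketch (structure, names, and the threshold value are assumed from the discussion above, not taken from the final patch):

#include "postgres.h"
#include "storage/block.h"

#define HEAP_FSM_CREATION_THRESHOLD 4	/* illustrative value */
#define FSM_LOCAL_AVAIL 0x01

typedef struct
{
	BlockNumber nblocks;
	char		map[HEAP_FSM_CREATION_THRESHOLD];
} FSMLocalMap;

static FSMLocalMap fsm_local_map;

/*
 * Sketch only: try blocks from the end of the map downward, since the
 * highest blocks were extended most recently and are thus the most
 * likely to still have usable free space.
 */
static BlockNumber
fsm_local_search(void)
{
	BlockNumber blkno = fsm_local_map.nblocks;

	while (blkno > 0)
	{
		blkno--;
		if (fsm_local_map.map[blkno] == FSM_LOCAL_AVAIL)
			return blkno;
	}
	return InvalidBlockNumber;	/* no candidate block left to try */
}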
Re: Prepare Transaction support for ON COMMIT DROP temporary tables
On Mon, Jan 14, 2019 at 1:41 PM Dimitri Fontaine wrote: > One idea would be that if every temp table in the session belongs to the > transaction, and their namespace too (we could check through pg_depend > that the namespace doesn't contain anything else beside the > transaction's tables), then we could dispose of the temp schema and > on-commit-drop tables at PREPARE COMMIT time. Why not just drop any on-commit-drop tables at PREPARE TRANSACTION time and leave the schema alone? If there are any temp tables touched by the transaction which are not on-commit-drop then we'd have to fail, but as long as all the tables we've got are on-commit-drop then it seems fine to just nuke them at PREPARE time. Such tables must've been created in the current transaction, because otherwise the creating transaction aborted and they're gone for that reason, or it committed and they're gone because they're on-commit-drop. And regardless of whether the transaction we are preparing goes on to commit or abort, those tables will be gone afterwards for the same reasons. So there doesn't in this case seem to be any reason to keep them around until the transaction's fate is known. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 1/16/19 9:27 AM, Michael Paquier wrote:
> Regarding the grammar, we tend for the last couple of years to avoid
> complicating the main grammar and move on to parenthesized grammars
> (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that
> it would make sense to only support CONCURRENTLY within parenthesis
> and just plugin that with the VERBOSE option.

Personally I do not care, but there have been a lot of voices for keeping REINDEX CONCURRENTLY consistent with CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY.

> Does somebody mind if I jump into the ship after so long? I was the
> original author of the monster after all...

Fine by me. Peter?

Andreas
parseCheckAggregates vs. assign_query_collations
Looking into a bug report on the -general list about grouping sets, which turns out to be an issue of collation assignment: if the query has CASE GROUPING(expr) WHEN 1 ... then the expression is rejected as not matching the one in the GROUP BY clause, because CASE already assigned collations to the expression (as a special case in its transform function) while the rest of the query hasn't yet had them assigned, because parseCheckAggregates gets run before assign_query_collations. I'll be looking into this in detail later, but right now, can anyone think of any reason why parseCheckAggregates couldn't be moved to after assign_query_collations? -- Andrew (irc:RhodiumToad)
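The reordering being floated would amount to something like this sketch at the end of the relevant transform function (a hedged illustration of the idea, not the actual analyze.c code):

#include "postgres.h"
#include "parser/parse_agg.h"
#include "parser/parse_collate.h"

/*
 * Sketch of the proposed ordering at the end of a transformXxxStmt()
 * function: assign collations first, then run the aggregate/grouping
 * checks, so that GROUPING() arguments compare equal to the GROUP BY
 * expressions.  Illustrative only.
 */
static void
finish_transform(ParseState *pstate, Query *qry)
{
	assign_query_collations(pstate, qry);

	if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets ||
		qry->havingQual)
		parseCheckAggregates(pstate, qry);
}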
Re: [HACKERS] generated columns
On Thu, Nov 22, 2018 at 10:16 AM Peter Eisentraut wrote: > On 15/11/2018 15:10, Robert Haas wrote: > > I don't have a strong position on 1 vs. 2 vs. 3, but I do think it > > would be nicer not to use '\0' as a column value. I'd suggest you use > > 'n' or '0' or '-' or some other printable character instead. > > I had carefully considered this when attidentity was added. Using '\0' > allows you to use this column as a boolean in C code, which is often > convenient. Also, there are numerous places where a pg_attribute form > or a tuple descriptor is initialized to all zeroes, which works well for > most fields, and adding one exception like this would create a lot of > extra work and bloat the patch and create potential for future > instability. Also note that a C char '\0' is represented as '' (empty > string) in SQL, so this also creates a natural representation in SQL. I'm not really convinced. I think that the stdbool work you've been doing shows that blurring the line between char and bool is a bad idea. And I believe that on general principle, anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Use an enum for RELKIND_*?
On 2018-Dec-21, Greg Stark wrote:

> But I had a second look to see if the output pointed out any actual
> bugs. I found one though it's pretty minor:
>
> lockfuncs.c:234:3: warning: enumeration value ‘LOCKTAG_SPECULATIVE_TOKEN’ not handled in switch [-Wswitch-enum]
>    switch ((LockTagType) instance->locktag.locktag_type)
>    ^~
>
> It just causes speculative locks to be printed wrong in the
> pg_lock_status view.

So what's a good fix? We can add a new case to the switch. Reporting the XID is easy since we have a column for that, but what to do about the uint32 value of the token? We could put it in the virtualxid column (which is just text), or we could put it in some int32 column and let it show as negative; or we could add a new column.

Or we could do nothing, since there are no complaints about this problem.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 11:40 AM John Naylor wrote:
> On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote:
> > can use a macro for the same? I have changed this in the attached
> > patch, see what you think about it. I have used it at a few other
> > places as well.
>
> The macro adds clarity, so I'm in favor of using it.

It just occurred to me that the style FSM_LOCAL_MAP_EXISTS seems more common for macros that refer to constants, and FSMLocalMapExists for expressions, but I've only seen a small amount of the code base. Do we have a style preference here, or is it more a matter of matching the surrounding code?

--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: draft patch for strtof()
I wrote: > (FWIW, I'm running the patch on gaur's host, just to confirm it > does what you expect. Should have an answer in an hour or so ...) It does --- it compiles cleanly, and the float4 output matches float4-misrounded-input.out. (This is the v1 patch not v2.) regards, tom lane
Re: parseCheckAggregates vs. assign_query_collations
Andrew Gierth writes:
> Looking into a bug report on the -general list about grouping sets,
> which turns out to be an issue of collation assignment: if the query has
> CASE GROUPING(expr) WHEN 1 ...
> then the expression is rejected as not matching the one in the GROUP BY
> clause, because CASE already assigned collations to the expression (as a
> special case in its transform function) while the rest of the query
> hasn't yet had them assigned, because parseCheckAggregates gets run
> before assign_query_collations.

Bleah.

> I'll be looking into this in detail later, but right now, can anyone
> think of any reason why parseCheckAggregates couldn't be moved to after
> assign_query_collations?

I never particularly liked assign_query_collations, as a matter of overall system design. I'd prefer to nuke that and instead require collation assignment to be done per-expression, ie at the end of transformExpr and similar places. Now that we've seen this example, it's fairly clear why collation assignment really should be considered an integral part of expression parsing. Who's to say there aren't more gotchas of this sort waiting to bite us? Also, if it were integrated into transformExpr as it should have been to begin with, we would not have the need for quite so many random calls to assign_expr_collations, with consequent bugs of omission, cf 7a28e9aa0. regards, tom lane
Re: WIP: Avoid creation of the free space map for small tables
John Naylor writes: > It just occured to me that the style FSM_LOCAL_MAP_EXISTS seems more > common for macros that refer to constants, and FSMLocalMapExists for > expressions, but I've only seen a small amount of the code base. Do we > have a style preference here, or is it more a matter of matching the > surrounding code? I believe there's a pretty longstanding tradition in C coding to use all-caps names for macros representing constants. Some people think that goes for all macros period, but I'm not on board with that for function-like macros. Different parts of the PG code base make different choices between camel-case and underscore-separation for multiword function names. For that, I'd say match the style of nearby code. regards, tom lane
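For illustration, the tree already contains both conventions; these are simplified versions of definitions from the existing headers (c.h, bufmgr.h, pg_list.h):

/* constant-like macros conventionally get all-caps names */
#define PG_UINT16_MAX	(0xFFFF)

/* function-like macros appear in both styles; match nearby code */
#define BufferIsValid(bufnum)	((bufnum) != InvalidBuffer)	/* simplified */
#define lfirst(lc)				((lc)->data.ptr_value)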
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2019-Jan-16, Michael Paquier wrote:

> Regarding the grammar, we tend for the last couple of years to avoid
> complicating the main grammar and move on to parenthesized grammars
> (see VACUUM, ANALYZE, EXPLAIN, etc). So in the same vein I think that
> it would make sense to only support CONCURRENTLY within parenthesis
> and just plugin that with the VERBOSE option.

That's my opinion too, but I was outvoted in another subthread -- see https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql Stephen Frost, Andrew Gierth and Andres Freund all voted to put CONCURRENTLY outside the parens. It seems we now have three votes to put it *in* the parens (you, Peter Eisentraut, me). I guess more votes are needed to settle this issue.

My opinion is that if we had had parenthesized options lists back when CREATE INDEX CONCURRENTLY was invented, we would have put it there. But we were young and needed the money ...

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: current_logfiles not following group access and instead follows log_file_mode permissions
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote: > > On reflection, maybe the problem is not that we're giving the file > > the wrong permissions, but that we're putting it in the wrong place? > > That is, seems like it should be in the logfile directory not the > > data directory. That would certainly simplify the intended use-case, > > and it would fix this complaint too. > > Yeah, thinking more on this one using for this file different > permissions than the log files makes little sense, so what you propose > here seems like a sensible thing to do things. Even if we exclude the > file from native BASE_BACKUP this would not solve the case of custom > backup solutions doing their own copy of things, when they rely on > group-read permissions. This would not solve completely the problem > anyway if log files are in the data folder, but it would address the > case where the log files are in an absolute path out of the data > folder. Actually, I agree with the initial patch on the basis that this file that's being created (which I'm honestly a bit amazed that we're doing this way; certainly seems rather grotty to me) is surely not an actual *log* file and therefore using logfile_open() to open it doesn't seem quite right. I would have hoped for a way to pass this information that didn't involve a file at all, but I'll assume that was discussed already and good reasons put forth as to why we can't avoid it. I'm not really sure putting it into the logfile directory is such a hot idea as users might have set up external log file rotation of files in that directory. Of course, in that case they'd probably signal PG right afterwards and PG would go write out a new file, but it still seems pretty awkward. I'm not terribly against solving this issue that way either though, but I tend to think the originally proposed patch is more sensible. Thanks! Stephen signature.asc Description: PGP signature
Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
James Coleman writes: > One other question on testing: do you think the "calculated array" > tests are good enough by themselves (i.e., remove the ones with array > constants of > 100 values)? I dislike that it's not as obvious what's > going on, but given that the code shouldn't care about array size > anyway...so if there's an inclination to fewer tests that's the first > place I'd look at cutting them. I don't have a strong opinion about that at this point. It might be clearer once the patch is finished; for now, there's no harm in erring towards the more-tests direction. regards, tom lane
Re: current_logfiles not following group access and instead follows log_file_mode permissions
Stephen Frost writes: > * Michael Paquier (mich...@paquier.xyz) wrote: >> On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote: >>> On reflection, maybe the problem is not that we're giving the file >>> the wrong permissions, but that we're putting it in the wrong place? > I'm not really sure putting it into the logfile directory is such a hot > idea as users might have set up external log file rotation of files in > that directory. Of course, in that case they'd probably signal PG right > afterwards and PG would go write out a new file, but it still seems > pretty awkward. I'm not terribly against solving this issue that way > either though, but I tend to think the originally proposed patch is more > sensible. I dunno, I think that the current design was made without any thought whatsoever about the log-files-outside-the-data-directory case. If you're trying to set things up that way, it's because you want to give logfile read access to people who shouldn't be able to look into the data directory proper. That makes current_logfiles pretty useless to such people, as it's now designed. Now, if the expectation is that current_logfiles is just an internal working file that users shouldn't access directly, then this argument is wrong --- but then why is it documented in user-facing docs? If we're going to accept the patch as-is, then it logically follows that we should de-document current_logfiles, because we're taking the position that it's an internal temporary file not meant for user access. I don't really believe your argument about log rotation: a rotator would presumably be configured either to pay attention to file name patterns (which current_logfiles wouldn't match) or to file age (which current_logfiles shouldn't satisfy either, since it's always rewritten when we switch logfiles). If we wanted to worry about that case, a possible solution is to make the current_logfiles pathname user-configurable so it could be put in some third directory. But I think that adds more complexity than is justified --- and not just for us, but for programs trying to find and use current_logfiles. regards, tom lane
Re: current_logfiles not following group access and instead follows log_file_mode permissions
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost writes:
> > * Michael Paquier (mich...@paquier.xyz) wrote:
> >> On Tue, Jan 15, 2019 at 10:53:30AM -0500, Tom Lane wrote:
> >>> On reflection, maybe the problem is not that we're giving the file
> >>> the wrong permissions, but that we're putting it in the wrong place?
>
> > I'm not really sure putting it into the logfile directory is such a hot
> > idea as users might have set up external log file rotation of files in
> > that directory. Of course, in that case they'd probably signal PG right
> > afterwards and PG would go write out a new file, but it still seems
> > pretty awkward. I'm not terribly against solving this issue that way
> > either though, but I tend to think the originally proposed patch is more
> > sensible.
>
> I dunno, I think that the current design was made without any thought
> whatsoever about the log-files-outside-the-data-directory case. If
> you're trying to set things up that way, it's because you want to give
> logfile read access to people who shouldn't be able to look into the
> data directory proper. That makes current_logfiles pretty useless
> to such people, as it's now designed.

... or you just want to move the log files to a more sensible location than the data directory. The justification for log_file_mode existing is because you might want to have log files with different privileges, but that's quite a different thing.

> Now, if the expectation is that current_logfiles is just an internal
> working file that users shouldn't access directly, then this argument
> is wrong --- but then why is it documented in user-facing docs?

I really couldn't say why it's documented in the user-facing docs, and for my 2c I don't really think it should be- there's a function to get that information. Sprinkling the data directory with files for users to access directly doesn't exactly fit my view of what a good API looks like. The fact that there isn't any discussion about where that file actually lives does make me suspect you're right that log files outside the data directory wasn't really contemplated.

> If we're going to accept the patch as-is, then it logically follows
> that we should de-document current_logfiles, because we're taking the
> position that it's an internal temporary file not meant for user access.

... and hopefully we'd get rid of it one day entirely.

> I don't really believe your argument about log rotation: a rotator
> would presumably be configured either to pay attention to file name
> patterns (which current_logfiles wouldn't match) or to file age
> (which current_logfiles shouldn't satisfy either, since it's always
> rewritten when we switch logfiles).

Yes, a good pattern would avoid picking up on this file and most are configured that way (though they are maybe not as specific as you might think- the default here is just /var/log/postgresql/*.log).

> If we wanted to worry about that case, a possible solution is to make the
> current_logfiles pathname user-configurable so it could be put in some
> third directory. But I think that adds more complexity than is justified
> --- and not just for us, but for programs trying to find and use
> current_logfiles.

I'd much rather move to get rid of that file than increase its visibility- programs should be using the provided function.

Thanks!

Stephen

signature.asc Description: PGP signature
Re: speeding up planning with partitions
On 2019-Jan-11, Amit Langote wrote:

> Looking around a bit more, I started thinking even build_child_join_sjinfo
> doesn't belong in appendinfo.c (just to be clear, it was defined in
> prepunion.c before this commit), so maybe we should move it to joinrels.c
> and instead export adjust_child_relids that's required by it from
> appendinfo.c. There's already adjust_child_relids_multilevel in
> appendinfo.h, so having adjust_child_relids next to it isn't too bad. At
> least not as bad as appendinfo.c exporting build_child_join_sjinfo for
> joinrels.c to use.

OK, pushed, thanks. I may have caused merge conflicts with the rest of the series, because I reordered some functions -- sorry about that.

--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Log PostgreSQL version number on startup
On 05/01/2019 15:53, Peter Eisentraut wrote: > On 21/11/2018 15:46, Christoph Berg wrote: >> A startup looks like this: >> >> 2018-11-21 15:19:47.259 CET [24453] LOG: listening on IPv6 address "::1", >> port 5431 >> 2018-11-21 15:19:47.259 CET [24453] LOG: listening on IPv4 address >> "127.0.0.1", port 5431 >> 2018-11-21 15:19:47.315 CET [24453] LOG: listening on Unix socket >> "/tmp/.s.PGSQL.5431" >> 2018-11-21 15:19:47.394 CET [24453] LOG: starting PostgreSQL 12devel on >> x86_64-pc-linux-gnu, compiled by gcc (Debian 8.2.0-9) 8.2.0, 64-bit >> 2018-11-21 15:19:47.426 CET [24454] LOG: database system was shut down at >> 2018-11-21 15:15:35 CET >> 2018-11-21 15:19:47.460 CET [24453] LOG: database system is ready to accept >> connections >> >> (I'd rather put the start message before the listening messages, but I >> think the startup message should be logged via logging_collector, and >> listening is logged before the log file is opened.) > > Why don't we start the logging collector before opening the sockets? Specifically, something like the attached. This keeps the dynamic module loading before the logging collector start, so we see those error messages on stderr, but then the setting up of the sockets would get logged. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 0a482552d26488ec5ce8e4d7bad12bf25ba44c8b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 16 Jan 2019 17:32:01 +0100 Subject: [PATCH] postmaster: Start syslogger earlier XXX --- src/backend/postmaster/postmaster.c | 120 ++-- 1 file changed, 60 insertions(+), 60 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index a707d4d465..36efd908bb 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -992,6 +992,66 @@ PostmasterMain(int argc, char *argv[]) */ InitializeMaxBackends(); + /* +* Initialize pipe (or process handle on Windows) that allows children to +* wake up from sleep on postmaster death. +*/ + InitPostmasterDeathWatchHandle(); + + /* +* Forcibly remove the files signaling a standby promotion request. +* Otherwise, the existence of those files triggers a promotion too early, +* whether a user wants that or not. +* +* This removal of files is usually unnecessary because they can exist +* only during a few moments during a standby promotion. However there is +* a race condition: if pg_ctl promote is executed and creates the files +* during a promotion, the files can stay around even after the server is +* brought up to new master. Then, if new standby starts by using the +* backup taken from that master, the files can exist at the server +* startup and should be removed in order to avoid an unexpected +* promotion. +* +* Note that promotion signal files need to be removed before the startup +* process is invoked. Because, after that, they can be used by +* postmaster's SIGUSR1 signal handler. +*/ + RemovePromoteSignalFiles(); + + /* Do the same for logrotate signal file */ + RemoveLogrotateSignalFiles(); + + /* Remove any outdated file holding the current log filenames. */ + if (unlink(LOG_METAINFO_DATAFILE) < 0 && errno != ENOENT) + ereport(LOG, + (errcode_for_file_access(), +errmsg("could not remove file \"%s\": %m", + LOG_METAINFO_DATAFILE))); + + /* +* If enabled, start up syslogger collection subprocess +*/ + SysLoggerPID = SysLogger_Start(); + + /* +* Reset whereToSendOutput from DestDebug (its starting state) to +* DestNone. 
This stops ereport from sending log messages to stderr unless +* Log_destination permits. We don't do this until the postmaster is +* fully launched, since startup failures may as well be reported to +* stderr. +* +* If we are in fact disabling logging to stderr, first emit a log message +* saying so, to provide a breadcrumb trail for users who may not remember +* that their logging is configured to go somewhere else. +*/ + if (!(Log_destination & LOG_DESTINATION_STDERR)) + ereport(LOG, + (errmsg("ending log output to stderr"), +errhint("Future log output will go to log destination \"%s\".", +Log_destination_string))); + + whereToSendOutput = DestNone; + /* * Establish input sockets. * @@ -1183,12 +1243,6 @@ PostmasterMain(int argc, char *argv[]) */
Re: insensitive collations
On 16/01/2019 14:20, Daniel Verite wrote: > I've found another issue with aggregates over distinct: > the deduplication seems to ignore the collation. I have a fix for that. I'll send it with the next update. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: explain plans with information about (modified) gucs
> [v5] Hi Tomas, Peter suggested upthread to use 'settings' rather than 'gucs' for the explain option (and output?), and you seemed to agree. Are you going to include that in a future version? Speaking of the output, v5's default text doesn't match that of formatted text ('GUCs' / 'GUC').
Re: [HACKERS] generated columns
I couldn't compile v7-0001 and I am testing with the older v6-0001 (of which I still had an instance), so the problem below may have been fixed already. If you add a generated column to a file_fdw foreign table, it works OK with VIRTUAL (the default), but with STORED it adds an empty column, silently. I would say it would make more sense to get an error. Erik Rijkers
RE: reloption to prevent VACUUM from truncating empty pages at the end of relation
>On Thu, Nov 15, 2018 at 2:30 PM Alvaro Herrera wrote:
>>
>> On 2018-Nov-15, Laurenz Albe wrote:
>> > > > This new option would not only mitigate the long shared_buffers
>> > > > scan, it would also get rid of the replication conflict caused by
>> > > > the AccessExclusiveLock taken during truncation, which is discussed in
>> > > > https://www.postgresql.org/message-id/c9374921e50a5e8fb1ecf04eb8c6ebc3%40postgrespro.ru
>> > > > and seems to be a more difficult problem than anticipated.
> >
> > FWIW I was just reminded yesterday that the AEL-for-truncation has
> > been diagnosed to be a severe problem in production, and with no other
> > solution in sight, I propose to move forward with the stop-gap.

I just want to ask whether we could proceed with this approach for now, and whether it is possible to have this solution integrated before PG12 development ends?

Regards,
Kirk Jamison
Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name
On Wed, Jan 16, 2019 at 11:02:48AM -0500, Robert Haas wrote:
> I think you have some valid points here, but I also think that the
> patch as written doesn't really seem to have any user-visible
> benefits. Sure, ready_to_display is a crock, but getting rid of it
> doesn't immediately help anybody. And on the flip side your patch
> might have bugs, in which case we'll be worse off. I'm not going to
> stand on my soapbox and say that committing this patch is a terrible
> idea, because as far as I can tell, it isn't. But it feels like it
> might be a commit for the sake of making a commit, and I can't get too
> excited about that either. Since the logic will have to be further
> revised anyway to make primary_conninfo PGC_SIGHUP, why not just wait
> until that's done and then commit all the changes together instead of
> guessing that this might make that easier?

You can say that about any patch which gets committed: even refactoring patches have a risk of introducing bugs, and even subtle ones understood only after release. I was justifying the existence of this thread for exactly the opposite reasons. After reviewing the other patch, which switches the parameters to be reloadable, we could do more, and spawning a new thread to attract the correct audience looked rather right (it still does): https://www.postgresql.org/message-id/20181212043208.gi17...@paquier.xyz And this refactoring seemed to be justified as part of a different thread. I don't mind dropping this patch and this thread and just going back there, but that seems like taking steps backward, and what's proposed here is a bit different from just making the parameters reloadable. Still, the refactoring looks justified to me for correctness of the parameter handling.

Anyway, based on the opinions gathered until now, I don't mind just dropping this patch and moving on to the other thread, though we will likely finish with the same discussion as what's done here :) A patch about which two committers are expressing concerns is as good as headed for the waste bin. That's of course fine by me.

--
Michael

signature.asc Description: PGP signature
Re: [HACKERS] generated columns
On Wed, Jan 16, 2019 at 02:14:41PM +0100, Peter Eisentraut wrote:
> On 15/01/2019 08:13, Michael Paquier wrote:
>> When testing a bulk INSERT into a table which has a stored generated
>> column, memory keeps growing in size linearly, which does not seem
>> normal to me. If inserting more tuples than what I tested (I stopped
>> at 10M because of lack of time), it seems to me that this could result
>> in OOMs. I would have expected the memory usage to be steady.
>
> What are you executing exactly? One INSERT command with many rows?

Yes, something like that grows the memory and CPU usage rather linearly:

CREATE TABLE tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO tab VALUES (generate_series(1, 10000000));

--
Michael

signature.asc Description: PGP signature
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Jan 16, 2019 at 02:59:31PM -0300, Alvaro Herrera wrote: > That's my opinion too, but I was outvoted in another subthread -- see > https://postgr.es/m/20181214144529.wvmjwmy7wxgmgyb3@alvherre.pgsql > Stephen Frost, Andrew Gierth and Andres Freund all voted to put > CONCURRENTLY outside the parens. It seems we now have three votes to > put it *in* the parens (you, Peter Eisentraut, me). I guess more votes > are needed to settle this issue. Sure, let's see. I would have been in the crowd of not using the parenthesized grammar five years ago, but the recent experience with other commands worries me; we would be repeating the same mistakes. > My opinion is that if we had had parenthesized options lists back when > CREATE INDEX CONCURRENTLY was invented, we would have put it there. > But we were young and needed the money ... :) -- Michael signature.asc Description: PGP signature
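For illustration, if the parenthesized form wins the vote, usage might look like this (a sketch of the syntax under discussion, not a committed grammar):

REINDEX (VERBOSE, CONCURRENTLY) TABLE my_table;

versus the unparenthesized form favored in the other subthread:

REINDEX TABLE CONCURRENTLY my_table;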
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On 1/16/19 7:56 AM, David Rowley wrote: > On Tue, 15 Jan 2019 at 08:21, Tomas Vondra wrote: >> Turns out you were right - the attribute_referenced piece was quite >> unnecessary. So I've removed it. I've also extended the regression tests >> to verify changing type of another column does not reset the stats. > > (Trying to find my feet over here) > > I've read over the entire thread, and apart from missing the last two > emails and therefore the latest patch, I managed to read over most of > the MCV patch. I didn't quite get to reading mcv.c and don't quite > have the energy to take that on now. > Thanks for looking! > At this stage I'm trying to get to know the patch. I read a lot of > discussion between you and Dean ironing out how the stats should be > used to form selectivities. At the time I'd not read the patch yet, > so most of it went over my head. > > I did note down a few things on my read. I've included them below. > Hopefully, they're useful. > > MCV list review > > 1. In mcv.c there's Assert(ndistinct <= UINT16_MAX); This should be > PG_UINT16_MAX > Yep. Will fix. > 2. math.h should be included just after postgres.h > Yep. Will fix. > 3. Copyright is still -2017 in mcv.c. Hopefully, if you change it to > 2019, you'll never have to bump it ever again! :-) > Optimist ;-) > 4. Looking at pg_stats_ext_mcvlist_items() I see you've coded the > string building manually. The way it's coded I'm finding a little > strange. It means the copying becomes quadratic due to > > snprintf(buff, 1024, format, values[1], DatumGetPointer(valout)); > strncpy(values[1], buff, 1023); > > So basically, generally, here you're building a new string with > values[1] followed by a comma, then followed by valout. On the next > line you then copy that new buffer back into values[1]. I understand > this part is likely not performance critical, but I see no reason to > write the code this way. > > Are you limiting the strings to 1024 bytes on purpose? I don't see > any comment mentioning you want to truncate strings. > > Would it not be better to do this part using > AppendStringInfoString()? and just manually add a '{', ',' or '}' as > and when required? > > DatumGetPointer(valout) should really be using DatumGetCString(valout). > > Likely you can also use heap_form_tuple. This will save you having to > convert ints into strings then only to have BuildTupleFromCStrings() > do the reverse. > I agree. I admit all of this is a residue of an initial hackish version of the function, and should be changed to StringInfo. Will fix. > 5. individiaul -> individual > lists. This allows very accurate estimates for individiaul columns, but > > litst -> lists > > litst on combinations of columns. Similarly to functional dependencies > Will fix. > 6. Worth mentioning planning cycles too? > > "It's advisable to create MCV statistics objects only > on combinations of columns that are actually used in conditions together, > and for which misestimation of the number of groups is resulting in bad > plans. Otherwise, the ANALYZE cycles are just wasted." > Makes sense. Although that's what we say about the existing stats, so perhaps we should tweak that too. > 7. straight-forward -> straightforward > > (most-common values) lists, a straight-forward extension of the per-column > > 8. adresses -> addresses > > statistics adresses the limitation by storing individual values, but it > Will fix. Thanks for proof-reading. > 9. Worth mentioning ANALYZE time?
> > This section introduces multivariate variant of MCV > (most-common values) lists, a straight-forward extension of the per-column > statistics described in <xref linkend="row-estimation-examples"/>. This > statistics adresses the limitation by storing individual values, but it > is naturally more expensive, both in terms of storage and planning time. > Yeah. > 10. low -> a low > > with low number of distinct values. Before looking at the second query, > > 11. them -> then > > on items in the MCV list, and them sums the frequencies > Will fix. > 12. Should we be referencing the source from the docs? > > See mcv_clauselist_selectivity > in src/backend/statistics/mcv.c for details. > > hmm. I see it's not the first going by: git grep -E "\w+\.c\<" Hmm, that does not return anything to me - do you actually see any references to .c files in the sgml docs? I agree that probably is not a good idea, so I'll remove that. > 13. Pretty minor, but the following loop in > UpdateStatisticsForTypeChange() could use a break; > > attribute_referenced = false; > for (i = 0; i < staForm->stxkeys.dim1; i++) > if (attnum == staForm->stxkeys.values[i]) > attribute_referenced = true; > > UPDATE: If I'd reviewed the correct patch I'd have seen that you'd > removed this already > ;-) > 14. Again in UpdateStatisticsForTypeChange(), would it not be better > to do the statext_is_kind_b
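Regarding item 4 above, a sketch of the StringInfo approach (hypothetical variable names, untested; nvalues and valout stand in for the function's existing locals):

    StringInfoData buf;
    int         i;

    initStringInfo(&buf);
    appendStringInfoChar(&buf, '{');
    for (i = 0; i < nvalues; i++)
    {
        if (i > 0)
            appendStringInfoChar(&buf, ',');
        /* valout holds the text representation of the i-th value */
        appendStringInfoString(&buf, DatumGetCString(valout));
    }
    appendStringInfoChar(&buf, '}');
    values[1] = buf.data;

This avoids both the quadratic copying and the arbitrary 1024-byte truncation.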
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 14:19, Tomas Vondra wrote: > > 12. Should we be referencing the source from the docs? > > > > See mcv_clauselist_selectivity > > in src/backend/statistics/mcv.c for details. > > > > hmm. I see it's not the first going by: git grep -E "\w+\.c\<" > Hmm, that does not return anything to me - do you actually see any > references to .c files in the sgml docs? I agree that probably is not a > good idea, so I'll remove that. Yeah, I see quite a few. I shouldn't have escaped the < > > 18. In dependencies_clauselist_selectivity() there seems to be a new > > bug introduced. We do: > > > > /* mark this one as done, so we don't touch it again. */ > > *estimatedclauses = bms_add_member(*estimatedclauses, listidx); > > > > but the bms_is_member() check that skipped these has been removed. > > > > It might be easier to document if we just always do: > > > > if (bms_is_member(listidx, *estimatedclauses)) > > continue; > > > > at the start of both loops. list_attnums can just be left unset for > > the originally already estimatedclauses. > > > It's probably not as clear as it should be, but if the clause is already > estimated (or incompatible), then the list_attnums[] entry will be > InvalidAttrNumber. Which is what we check in the second loop. hmm. what about the items that should be skipped when you do the *estimatedclauses = bms_add_member(*estimatedclauses, listidx); in the 2nd loop. You'll need to either also do list_attnums[listidx] = InvalidAttrNumber; for them, or put back the bms_is_member() check, no? I admit to not having debugged it to find an actual bug, it just looks suspicious. > > 25. Does statext_is_compatible_clause_internal() need to skip over > RelabelTypes? > > > I believe it does, based on what I've observed during development. Why > do you think it's not necessary? The other way around. I thought it was necessary, but the code does not do it. > > 26. In statext_is_compatible_clause_internal() you mention: /* We only > > support plain Vars for now */, but I see nothing that ensures that > > only Vars are allowed in the is_opclause() condition. > > > > /* see if it actually has the right */ > > ok = (NumRelids((Node *) expr) == 1) && > > (is_pseudo_constant_clause(lsecond(expr->args)) || > > (varonleft = false, > > is_pseudo_constant_clause(linitial(expr->args)))); > > > > the above would allow var+var == const through. > > > But then we call statext_is_compatible_clause_internal on it again, and > that only allows Vars and "Var op Const" expressions. Maybe there's a > way around that? True, I missed that. Drop that one. > > 33. "ndimentions"? There's no field in the struct by that name. I'd > > assume it's the same size as the isnull array above it? > > > > Datum *values; /* variable-length (ndimensions) */ > > > Yes, that's the case. If it relates to the ndimensions field from the struct below, maybe it's worth crafting that into the comment somehow. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: draft patch for strtof()
> "Tom" == Tom Lane writes: >> (FWIW, I'm running the patch on gaur's host, just to confirm it >> does what you expect. Should have an answer in an hour or so ...) Tom> It does --- it compiles cleanly, and the float4 output matches Tom> float4-misrounded-input.out. Well I'm glad _something_ works. Because it turns out that Windows (at least the version running on Appveyor) completely fucks this up; strtof() is apparently returning infinity or zero _without setting errno_ for values out of range for float: input of "10e70" returns +inf with no error, input of "10e-70" returns (exactly) 0.0 with no error. *facepalm* Any windows-users have any idea about this? -- Andrew (irc:RhodiumToad)
Re: parseCheckAggregates vs. assign_query_collations
> "Tom" == Tom Lane writes: >> I'll be looking into this in detail later, but right now, cam anyone >> think of any reason why parseCheckAggregates couldn't be moved to >> after assign_query_collations? Tom> I never particularly liked assign_query_collations, as a matter of Tom> overall system design. I'd prefer to nuke that and instead require Tom> collation assignment to be done per-expression, ie at the end of Tom> transformExpr and similar places. Now that we've seen this Tom> example, it's fairly clear why collation assignment really should Tom> be considered an integral part of expression parsing. Who's to say Tom> there aren't more gotchas of this sort waiting to bite us? Also, Tom> if it were integrated into transformExpr as it should have been to Tom> begin with, we would not have the need for quite so many random Tom> calls to assign_expr_collations, with consequent bugs of omission, Tom> cf 7a28e9aa0. Sure, this might be the right approach going forward. But right now we need a back-patchable fix, and the above sounds a bit too intrusive for that. Turns out the issue can be reproduced without grouping sets too: select case a||'' when '1' then 0 else 1 end from (select (select random()::text) as a) s group by a||''; ERROR: column "s.a" must appear in the GROUP BY clause or be used in an aggregate function select case when a||'' = '1' then 0 else 1 end from (select (select random()::text) as a) s group by a||''; -- works -- Andrew (irc:RhodiumToad)
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 11:25 PM Tom Lane wrote: > > John Naylor writes: > > It just occurred to me that the style FSM_LOCAL_MAP_EXISTS seems more > > common for macros that refer to constants, and FSMLocalMapExists for > > expressions, but I've only seen a small amount of the code base. Do we > > have a style preference here, or is it more a matter of matching the > > surrounding code? > I am fine with the style (FSMLocalMapExists) you are suggesting, but see the similar macros in nearby code like: #define FSM_TREE_DEPTH ((SlotsPerFSMPage >= 1626) ? 3 : 4) I think the above is not an exact match. So, I have looked around and found a few other macros which serve a somewhat similar purpose, see below:

#define ATT_IS_PACKABLE(att) \
 ((att)->attlen == -1 && (att)->attstorage != 'p')

#define VARLENA_ATT_IS_PACKABLE(att) \
 ((att)->attstorage != 'p')

#define CHECK_REL_PROCEDURE(pname)

#define SPTEST(f, x, y) \
 DatumGetBool(DirectFunctionCall2(f, PointPGetDatum(x), PointPGetDatum(y)))

> I believe there's a pretty longstanding tradition in C coding to use > all-caps names for macros representing constants. Some people think > that goes for all macros period, but I'm not on board with that for > function-like macros. > > Different parts of the PG code base make different choices between > camel-case and underscore-separation for multiword function names. > For that, I'd say match the style of nearby code. > Yes, that is what we normally do. However, in some cases, we might need to refer to other places as well, which I think is the case here. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
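To make the two styles concrete, the choice is roughly between definitions like these (hypothetical bodies for illustration; fsm_local_map is the local-map state from the patch under discussion):

    /* constant-style name, expression-like body */
    #define FSM_LOCAL_MAP_EXISTS  (fsm_local_map.nblocks > 0)

    /* function-style name for the same test */
    #define FSMLocalMapExists()   (fsm_local_map.nblocks > 0)

Both expand to the same test; the question is purely which naming convention reads better next to the surrounding code.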
Re: WIP: Avoid creation of the free space map for small tables
On Wed, Jan 16, 2019 at 10:10 PM John Naylor wrote: > > On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila wrote: > > > > On Fri, Jan 11, 2019 at 3:54 AM John Naylor > > wrote: > > 1. > > Commit message: > > > Any pages with wasted free space become visible at next relation > > > extension, so we still control table bloat. > > > > I think the free space will be available after the next pass of > > vacuum, no? How can relation extension make it available? > > To explain, this diagram shows the map as it looks for different small > table sizes: >
> 0123
> A
> NA
> ANA
> NANA
>
> So for a 3-block table, the alternating strategy never checks block 1. > Any free space block 1 has acquired via delete-and-vacuum will become > visible if it extends to 4 blocks. We are accepting a small amount of > bloat for improved performance, as discussed. Would it help to include > this diagram in the README? > Yes, I think it would be good if you explain the concept of the local map with the help of this example. > > 2. > > +2. For very small heap relations, the FSM would be relatively large and > > +wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for > > +heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space > > +and to improve performance. To locate free space in this case, we simply > > +iterate over the heap, trying alternating pages in turn. There may be some > > +wasted free space in this case, but it becomes visible again upon next > > +relation extension. > > > > a. Again, how space becomes available at next relation extension. > > b. I think there is no use of mentioning the version number in the > > above comment, this code will be present from PG-12, so one can find > > out from which version this optimization is added. > > It fits with the reference to PG 8.4 earlier in the document. I chose > to be consistent, but to be honest, I'm not much in favor of a lot of > version references in code/READMEs. > Then let's not add a version reference in this case; I don't see much advantage in it here. > > I'll include these with some grammar corrections in the next version. > Okay, thanks! -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
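As a sketch of that alternating strategy (a hypothetical helper for illustration, not the patch's actual code), the probe order for a small heap could be expressed as:

    #include "storage/block.h"

    /*
     * Return the next block to try for free space: start at the last
     * block and step down by two, so a 4-block table probes blocks
     * 3 and 1 ("NANA" in the diagram above) and a 3-block table
     * probes blocks 2 and 0 ("ANA"), never block 1.
     */
    static BlockNumber
    fsm_local_next_target(BlockNumber nblocks, BlockNumber cur)
    {
        if (cur == InvalidBlockNumber)  /* first probe: last block */
            return nblocks - 1;
        if (cur < 2)                    /* no lower page left to try */
            return InvalidBlockNumber;
        return cur - 2;                 /* skip every other page */
    }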
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 01:56, David Rowley wrote: > At this stage I'm trying to get to know the patch. I read a lot of > discussing between you and Dean ironing out how the stats should be > used to form selectivities. At the time I'd not read the patch yet, > so most of it went over my head. > > I did note down a few things on my read. I've included them below. > Hopefully, they're useful. > > MCV list review Part 2: 35. The evaluation order of this macro is wrong. #define ITEM_SIZE(ndims) \ (ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)) You'd probably want ITEM_SIZE(10) to return 170, but: select (10 * (2 + 1) + 2 * 8); ?column? -- 46 Unsure why this does not cause a crash. ndims should also have parenthesis around it in case someone does ITEM_SIZE(x + y), likewise for the other ITEM_* macros. 36. Could do with some comments in get_mincount_for_mcv_list(). What's magic about 0.04? 37. I think statext_mcv_build() needs some comments to detail out the arguments. For example can attrs be empty? Must it contain at least 2 members? etc. 38. Too many "it"s * we simply treat it as a special item in the MCV list (it it makes it). 39. I don't see analyze_mcv_list() being used anywhere around this comment: * If we can fit all the items onto the MCV list, do that. Otherwise use * analyze_mcv_list to decide how many items to keep in the MCV list, just * like for the single-dimensional MCV list. 40. The comment in the above item seems to indicate the condition for when all items can fit in the number of groups, but the if condition does not seem to allow for an exact match? if (ngroups > nitems) if you want to check if the number of items can fit in the number of groups should it be: if (ngroups >= nitems) or if (nitems <= ngroups) ? Perhaps I've misunderstood. The comment is a little confusing as I'm not sure where the "Otherwise" code is located. 41. I don't think palloc0() is required here. palloc() should be fine since you're initialising each element in the loop. mcvlist->items = (MCVItem * *) palloc0(sizeof(MCVItem *) * nitems); for (i = 0; i < nitems; i++) { mcvlist->items[i] = (MCVItem *) palloc(sizeof(MCVItem)); mcvlist->items[i]->values = (Datum *) palloc(sizeof(Datum) * numattrs); mcvlist->items[i]->isnull = (bool *) palloc(sizeof(bool) * numattrs); } I think I agree with the comment above that chunk about reducing the number of pallocs, even if it's just allocating the initial array as MCVItems instead of pointers to MCVItems 42. I don't think palloc0() is required in build_distinct_groups(). palloc() should be ok. Maybe it's worth an Assert(j + 1 == ngroups) to ensure count_distinct_groups got them all? 43. You're assuming size_t and long are the same size here. elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length); I know at least one platform where that's not true. 44. Should use DatumGetCString() instead of DatumGetPointer(). else if (info[dim].typlen == -2) /* cstring */ { memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v)) + 1); data += strlen(DatumGetPointer(v)) + 1; /* terminator */ } 45. No need to set this to NULL. Datum*v = NULL; Is "value" a better name than "v"? 46. What's the extra 'd' for in: elog(ERROR, "invalid MCV magic %d (expected %dd)", and elog(ERROR, "invalid MCV type %d (expected %dd)", 47. Wondering about the logic behind the variation between elog() and ereport() in statext_mcv_deserialize(). They all looks like "can't happen" type errors. 48. format assumes size_t is the same size as long. 
elog(ERROR, "invalid MCV size %ld (expected %ld)", VARSIZE_ANY_EXHDR(data), expected_size); 49. palloc0() followed by memset(). Can just use palloc(). matches = palloc0(sizeof(char) * mcvlist->nitems); memset(matches, (is_or) ? STATS_MATCH_NONE : STATS_MATCH_FULL, sizeof(char) * mcvlist->nitems); 50. The coding style in mcv_get_match_bitmap I think needs to be postgresqlified. We normally declare all our variables in a chunk then start setting them, unless the assignment is very simple. I don't recall places in the code where have a comment when declaring a variable, for example. FmgrInfo gtproc; Var*var = (varonleft) ? linitial(expr->args) : lsecond(expr->args); Const*cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args); bool isgt = (!varonleft); TypeCacheEntry *typecache = lookup_type_cache(var->vartype, TYPECACHE_GT_OPR); /* match the attribute to a dimension of the statistic */ int idx = bms_member_index(keys, var->varattno); -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Delay locking partitions during query execution
On 2019/01/04 9:53, David Rowley wrote: > Without PREPAREd statements, if the planner itself was unable to prune > the partitions it would already have obtained the lock during > planning, so AcquireExecutorLocks(), in this case, would bump into the > local lock hash table entry and forego trying to obtain the lock > itself. That's not free, but it's significantly faster than obtaining > a lock. Hmm, AcquireExecutorLocks is only called if prepared statements are used, and even then only if a generic cached plan is reused: GetCachedPlan -> CheckCachedPlan -> AcquireExecutorLocks In GetCachedPlan: if (!customplan) { if (CheckCachedPlan(plansource)) Thanks, Amit
PSA: we lack TAP test coverage on NetBSD and OpenBSD
Although we've got a few NetBSD and OpenBSD buildfarm critters, none of them are doing --enable-tap-tests. If they were, we'd have noticed the pgbench regression tests falling over:

not ok 3 - pgbench option error: bad option stderr /(?^:(unrecognized|illegal) option)/
# Failed test 'pgbench option error: bad option stderr /(?^:(unrecognized|illegal) option)/'
# at t/002_pgbench_no_server.pl line 190.
# 'pgbench: unknown option -- bad-option
# Try "pgbench --help" for more information.
# '
# doesn't match '(?^:(unrecognized|illegal) option)'

Sure enough, manual testing confirms that on these platforms that error message is spelled differently:

$ pgbench --bad-option
pgbench: unknown option -- bad-option
Try "pgbench --help" for more information.

I am, TBH, inclined to fix this by removing that test case rather than teaching it another spelling to accept. I think it's very hard to make the case that tests like this one are anything but a waste of developer and buildfarm time. When they are also a portability hazard, it's time to cut our losses. (I also note that this test has caused us problems before, cf 869aa40a2 and 933851033.) regards, tom lane
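For the record, the alternative Tom is inclined against — teaching the test the third spelling — would be roughly a one-pattern change in t/002_pgbench_no_server.pl (untested sketch):

    # accept GNU, BSD, and NetBSD/OpenBSD getopt wording
    qr/(unrecognized|unknown|illegal) option/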
Re: Ryu floating point output patch
> On Jan 15, 2019, at 2:37 AM, Andrew Gierth > wrote: > > Andres> strtod()'s job ought to computationally be significantly easier > Andres> than the other way round, no? And if there's buggy strtod() > Andres> implementations out there, why would they be guaranteed to do > Andres> the correct thing with our current output, but not with ryu's? > > Funny thing: I've been devoting considerable effort to testing this, and > the one failure mode I've found is very interesting; it's not a problem > with strtod(), in fact it's a bug in our float4in caused by _misuse_ of > strtod(). Hi, I'm trying to reproduce the results by calling float4in in my test code. But I have some difficulties linking the code:

testfloat4.c:(.text+0x34): undefined reference to `float4in'
testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll'

I tried offering float.o to the linker in addition to my test program. I also tried to link all the objects (*.o). But the errors still exist. I attached my changes as a patch. I wonder if creating separate test programs is a good way of testing the code, and I'm interested in learning what I missed. Thank you. test_float4.patch Description: Binary data
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On Wed, Jan 16, 2019 at 05:56:15PM +0100, Andreas Karlsson wrote: > On 1/16/19 9:27 AM, Michael Paquier wrote: >> Does somebody mind if I jump into the ship after so long? I was the >> original author of the monster after all... > > Fine by me. Peter? Okay, I have begun digging into the patch, and have extracted two things for now which can be refactored first, giving a total of three patches:

- 0001, which adds WaitForOlderSnapshots to snapmgr.c. I think this can be useful for external extensions that need a process to wait for snapshots older than a minimum threshold held by other transactions.
- 0002, which moves the concurrent index build into its own routine, index_build_concurrent(). At the same time, index_build() has an isprimary argument which is not used, so let's remove it. This simplifies the refactoring a bit as well.
- 0003, the core patch, realigned with the rest, fixing some typos I found on the way.

Here are also some notes on things I am planning to look at in a second pass:

- The concurrent drop (phase 5) part still shares a lot with DROP INDEX CONCURRENTLY, and I think we had better refactor the code further so that REINDEX CONCURRENTLY shares more with DROP INDEX. One thing which I think is incorrect is that we do not clear the invalid flag of the dropped index before marking it as dead. This looks like a missing piece from another concurrency-related bug fix, lost over the rebases this patch has been through.
- set_dead could be refactored so that it can take multiple indexes as input, using WaitForLockersMultiple(). This way CREATE INDEX CONCURRENTLY could also use it.
- There are no regression tests for partitioned tables.
- The NOTICE messages showing up when a table has no indexes should be removed.
- index_create() does not really need a TupleDesc argument, as long as the caller is able to provide a list of column names.
- At the end of the day, I think it would be nice to reach a state where we have a set of low-level routines like index_build_concurrent and index_set_dead_concurrent which are used by both CONCURRENTLY paths and can be called for each phase within a given transaction. Those pieces could also help implement, for example, an extension able to do concurrent reindexing out of core.

I think that the refactorings in 0001 and 0002 are committable as-is, and this shaves some code from the core patch. Thoughts? -- Michael

From 2d871385cdecd2b860f5eeb2615d8fcf9e866e54 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 17 Jan 2019 11:54:42 +0900
Subject: [PATCH 1/3] Refactor code to wait for older snapshots into its own routine

This is being used by CREATE INDEX CONCURRENTLY to make sure that valid indexes are marked as such after waiting until all transactions using snapshots older than the reference snapshot used for concurrent index validation are gone. This piece is useful independently, and can be used by REINDEX CONCURRENTLY.
--- src/backend/commands/indexcmds.c | 71 +--- src/backend/utils/time/snapmgr.c | 81 src/include/utils/snapmgr.h | 2 + 3 files changed, 85 insertions(+), 69 deletions(-) diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c index 1959e8a82e..a81e656059 100644 --- a/src/backend/commands/indexcmds.c +++ b/src/backend/commands/indexcmds.c @@ -362,9 +362,7 @@ DefineIndex(Oid relationId, int numberOfAttributes; int numberOfKeyAttributes; TransactionId limitXmin; - VirtualTransactionId *old_snapshots; ObjectAddress address; - int n_old_snapshots; LockRelId heaprelid; LOCKTAG heaplocktag; LOCKMODE lockmode; @@ -1252,74 +1250,9 @@ DefineIndex(Oid relationId, * The index is now valid in the sense that it contains all currently * interesting tuples. But since it might not contain tuples deleted just * before the reference snap was taken, we have to wait out any - * transactions that might have older snapshots. Obtain a list of VXIDs - * of such transactions, and wait for them individually. - * - * We can exclude any running transactions that have xmin > the xmin of - * our reference snapshot; their oldest snapshot must be newer than ours. - * We can also exclude any transactions that have xmin = zero, since they - * evidently have no live snapshot at all (and any one they might be in - * process of taking is certainly newer than ours). Transactions in other - * DBs can be ignored too, since they'll never even be able to see this - * index. - * - * We can also exclude autovacuum processes and processes running manual - * lazy VACUUMs, because they won't be fazed by missing index entries - * either. (Manual ANALYZEs, however, can't be excluded because they - * might be within transactions that are going to do arbitrary operations - * later.) - * - * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not - * check for that. - * - * If a process goes idle-in-transaction
Re: Ryu floating point output patch
> "Donald" == Donald Dong writes: Donald> Hi, Donald> I'm trying to reproduce the results by calling float4in in my Donald> test code. But I have some difficulties linking the code: Donald> testfloat4.c:(.text+0x34): undefined reference to `float4in' Donald> testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll' The way you're doing that isn't really a good way to test it - those other programs being built by that rule are all frontend code, whereas float4in is a backend function. If you want to add your own test it, you could do a loadable module for it, or just do tests from SQL. -- Andrew (irc:RhodiumToad)
Re: de-deduplicate code in DML execution hooks in postgres_fdw
(2019/01/16 20:30), Etsuro Fujita wrote: (2019/01/16 15:54), Michael Paquier wrote: On Wed, Jan 16, 2019 at 02:59:15PM +0900, Etsuro Fujita wrote: If there are no objections, I'll commit that version of the patch. I think that you could use PgFdwModifyState for the second argument of execute_foreign_modify instead of ResultRelInfo. Yeah, that is another option, but my favorite would be to use ResultRelInfo, as in the original patch by Ashutosh, because that makes it possible to write postgresExecForeignInsert, postgresExecForeignUpdate, and postgresExecForeignDelete as a single line. Except for that nit, it looks fine to me, thanks for taking care of it. Great! Thanks for the review! Pushed. I left that argument alone. I think we can change it later, if necessary :). Best regards, Etsuro Fujita
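For reference, with ResultRelInfo kept as the argument, each hook reduces to roughly a one-line wrapper around the shared helper (sketch based on the description above, not the exact committed text):

    static TupleTableSlot *
    postgresExecForeignInsert(EState *estate,
                              ResultRelInfo *resultRelInfo,
                              TupleTableSlot *slot,
                              TupleTableSlot *planSlot)
    {
        return execute_foreign_modify(estate, resultRelInfo, CMD_INSERT,
                                      slot, planSlot);
    }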
Re: [PATCH] Pass COPT and PROFILE to CXXFLAGS as well
On Tue, Jan 08, 2019 at 09:09:56AM +0100, Christoph Berg wrote: > Re: Andres Freund 2019-01-08 > <20190108011837.n4mx7dadvojv2...@alap3.anarazel.de> >>> Here's another revision that doesn't add an extra CXXOPT variable but >>> just extends CXXFLAGS whenever COPT or PROFILE are set, which seems >>> more usable. >> >> Why does that seem more usable? How's that supposed to be used for flags >> that aren't valid for both languages? > > The existing COPT and PROFILE are already catch-all type flags that > add to CFLAGS and LDFLAGS. Possibly we should leave those alone and > only add PG_CXXFLAGS and PG_LDFLAGS? Personally, I see pgxs as something completely different from COPT and PROFILE; we are talking about two different facilities, one that is part of the core installation and another that is used for extension modules. So having PG_CFLAGS, PG_CXXFLAGS and PG_LDFLAGS, while leaving CXXFLAGS out of COPT and PROFILE, looks like the better long-term move in terms of pluggability. My 2c. -- Michael signature.asc Description: PGP signature
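Under that proposal, an extension's Makefile could then pass language-specific flags without touching the core COPT machinery, e.g. (sketch assuming the proposed variable names are adopted, for a hypothetical module my_ext):

    MODULES = my_ext
    PG_CFLAGS   = -Wextra
    PG_CXXFLAGS = -std=c++11
    PG_LDFLAGS  = -Wl,--as-needed

    PG_CONFIG = pg_config
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)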
Re: de-deduplicate code in DML execution hooks in postgres_fdw
On Thu, Jan 17, 2019 at 02:44:25PM +0900, Etsuro Fujita wrote: > Pushed. I left that argument alone. I think we can change it later, if > necessary :). Sure, that's fine as well. Thanks for committing the patch. -- Michael signature.asc Description: PGP signature
Re: pgbench doc fix
>> On 2018-Dec-03, Tatsuo Ishii wrote: >>
>>> To:
>>> ---
>>> -M querymode
>>> --protocol=querymode
>>>
>>> Protocol to use for submitting queries to the server:
>>>
>>> simple: use simple query protocol.
>>>
>>> extended: use extended query protocol.
>>>
>>> prepared: use extended query protocol with prepared statements.
>>>
>>> Because in "prepared" mode pgbench reuses the parse analysis
>>> result for the second and subsequent query iterations, pgbench runs
>>> faster in prepared mode than in other modes.
>> >> Looks good to me. > > Thanks. I'm going to commit this if there's no objection. Done. Thanks. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
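For example, the three modes are selected like this (illustrative invocations):

    pgbench -M simple   -T 60 mydb
    pgbench -M extended -T 60 mydb
    pgbench -M prepared -T 60 mydb    # typically fastest, per the doc text above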
Re: New vacuum option to do only freezing
On Thu, Jan 17, 2019 at 1:33 AM Robert Haas wrote: > > On Wed, Jan 16, 2019 at 3:30 AM Masahiko Sawada wrote: > > As the above comment says, it's possible that the state of an > > INSERT_IN_PROGRESS tuple could be changed to 'dead' after > > heap_page_prune(). Since such tuple is not truncated at this point we > > record it and set it as UNUSED in lazy_vacuum_page(). I think that the > > DISABLE_INDEX_CLEANUP case is the same; we need to process them after > > recorded. Am I missing something? > > I believe you are. Think about it this way. After the first pass > over the heap has been completed but before we've done anything to the > indexes, let alone started the second pass over the heap, somebody > could kill the vacuum process. Somebody could in fact yank the plug > out of the wall, stopping the entire server in its tracks. If they do > that, then lazy_vacuum_page() will never get executed. Yet, the heap > can't be in any kind of corrupted state at this point, right? We know > that the system is resilient against crashes, and killing a vacuum or > even the whole server midway through does not leave the system in any > kind of bad state. If it's fine for lazy_vacuum_page() to never be > reached in that case, it must also be fine for it never to be reached > if we ask for vacuum to stop cleanly before lazy_vacuum_page(). > Yes, that's right. lazy_vacuum_page() doesn't necessarily need to be reached. > In the case of the particular comment to which you are referring, that > comment is part of lazy_scan_heap(), not lazy_vacuum_page(), so I > don't see how it bears on the question of whether we need to call > lazy_vacuum_page(). It's true that, at any point in time, an > in-progress transaction could abort. And if it does then some > insert-in-progress tuples could become dead. But if that happens, > then the next vacuum will remove them, just as it will remove any > tuples that become dead for that reason when vacuum isn't running in > the first place. You can't use that as a justification for needing a > second heap pass, because if it were, then you'd also need a THIRD > heap pass in case a transaction aborts after the second heap pass has > visited the pages, and a fourth heap pass in case a transaction aborts > after the third heap pass has visited the pages, etc. etc. forever. > The reason I processed the tuples that became dead after the first heap pass is that I was not sure why we could ignore such tuples in the second heap pass, given that the code handling them has existed and been used for a long time. I thought we could handle them the same way in the DISABLE_INDEX_CLEANUP case. Also, since I thought lazy_vacuum_page() was the best place to set them as DEAD, I modified it. (In the previous patch I introduced a separate function setting them as DEAD aside from lazy_vacuum_page(), but since the two were almost the same, I merged them.) However, as you mentioned, it's true that these tuples will be removed by the next vacuum even if we ignore them. Also, there would not be many such tuples, and without this change the patch would be simpler. So if the risk of this change is greater than the benefit, we should not do it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
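For context, the behavior under discussion would be invoked with something like this (option syntax as proposed in this thread, not a committed form):

    -- first heap pass and freezing only; no index cleanup, no second
    -- heap pass
    VACUUM (FREEZE, DISABLE_INDEX_CLEANUP) my_table;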
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
On Thu, 17 Jan 2019 at 03:42, David Rowley wrote: > 35. The evaluation order of this macro is wrong. > > #define ITEM_SIZE(ndims) \ > (ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)) > > You'd probably want ITEM_SIZE(10) to return 170, but: > > select (10 * (2 + 1) + 2 * 8); > ?column? > -- >46 > > Unsure why this does not cause a crash. > No, the code is actually correct, as explained in the comment above it. Each item contains (ndims) copies of the uint16 index and the boolean, but it always contains exactly 2 doubles, independent of ndims. > ndims should also have parenthesis around it in case someone does > ITEM_SIZE(x + y), likewise for the other ITEM_* macros. > +1 on that point. Regards, Dean
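So the only change needed there is the parenthesization, e.g.:

    #define ITEM_SIZE(ndims) \
        ((ndims) * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))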
Re: [HACKERS] generated columns
On 16/01/2019 22:40, Erik Rijkers wrote: > I couldn't compile v7-0001 and I am testing with the older v6-0001 (of > which I still had an instance). > > So the problem below may have been fixed already. > > If you add a generated column to a file_fdw foreign table, it works OK > with VIRTUAL (the default) but with STORED it adds an empty column, > silently. I would say it would make more sense to get an error. Yes, v7 has addressed foreign-data wrappers. (I only tested with postgres_fdw.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services