Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-01 Thread Laurenz Albe
Jamison, Kirk wrote:
> On February 1, 2019, Tsunakawa, Takayuki wrote: 
> 
> > > As most people seem to agree adding the reloption, here's the patch.  
> > > It passes make check, and works like this:
> > Sorry, I forgot to include the modified header file.  Revised patch 
> > attached.
> 
> I wonder if there is a better reloption name for shrink_enabled. 
> (truncate_enabled, vacuum_enabled? Hmm. No?)
> On the other hand, shrink_enabled seems to describe well what it's supposed 
> to do when vacuuming tables.
> Besides there's a similarly-named autovacuum_enabled option.

I like "shrink_enabled".

It may sound weird in the ears of PostgreSQL hackers, but will make sense to 
users.

Perhaps "vacuum_shrink_enabled" would be even better.

Yours,
Laurenz Albe




Re: initdb --allow-group-access behaviour in windows

2019-02-01 Thread Michael Paquier
On Fri, Feb 01, 2019 at 05:34:05PM +1100, Haribabu Kommi wrote:
> As Microsoft Windows doesn't support POSIX-style permissions, we have
> always marked the permissions GUCs as not supported on Windows.  The
> new GUC that was added with the --allow-group-access feature also
> mentions the same.
> 
> initdb --allow-group-access doesn't have any effect on Microsoft
> Windows, so I feel it would be better to say so in the initdb docs?

Sounds right.  We may just want an extra sentence in the paragraph
describing --allow-group-access to mention that this parameter has no
effect on Windows.

> need a patch?

Always.
--
Michael




Speeding up creating UPDATE/DELETE generic plan for partitioned table into a lot

2019-02-01 Thread Kato, Sho
Sorry, I lost the previous mail [1].

On Fri, 28 Dec 2018 at 20:36, Tsunakawa, Takayuki
 wrote:
> Although I may say the same thing as you, I think a natural idea would be to 
> create a generic plan gradually.  The starting simple question is "why do we 
> have to touch all partitions at first?"  That is, can we behave like this:

I also think creating a generic plan gradually is a better idea, because the
planner should create a plan only when it is needed.
Any ideas?

On 2018-12-31 08:57:04, David Rowley wrote
>I imagine the place to start looking would be around why planning is so slow 
>for that many partitions. 

As you may already know, FLATCOPY in range_table_mutator is a large bottleneck.
When executing UPDATE, roughly npart squared RangeTblEntry nodes are copied,
because the whole range table is flat-copied once per partition.
When I execute UPDATE on a table with 100 partitions, FLATCOPY takes about
100 * 0.067 ms, while the total planning time is 12.689 ms.

On 2018-12-31 08:57:04, David Rowley wrote
>Another possible interesting idea would be to, instead of creating
>large Append/MergeAppend plans for partition scanning, invent some
>"Partition Seq Scan" and "Partition Index Scan" nodes that are able to
>build plans more similar to scanning a normal table. Likely such nodes
>would need to be programmed with a list of Oids that they're to scan
>during their execution. They'd also need to take care of their own
>tuple mapping for when partitions had their columns in varying orders.

Inventing some "Partition Seq Scan" and "Partition Index Scan" nodes is 
interesting.
It seems easy to add Scan nodes to each partition gradually.

[1]:CAKJS1f-y1HQK+VjG7=C==vGcLnzxjN8ysD5NmaN8Wh4=vsy...@mail.gmail.com

regards,




Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-02-01 Thread Michael Paquier
On Fri, Jan 18, 2019 at 09:50:40AM -0500, Stephen Frost wrote:
> Yes, we should update the documentation in this regard, though it's
> really an independent thing as that documentation should have been
> updated in the original group-access patch, so I'll see about fixing
> it and back-patching it.

Stephen, could you apply Hari's patch then?  I am not sure what the
consensus is, but documenting the restriction is the minimum we can
do.

-The default permissions are 0600, meaning only the
-server owner can read or write the log files.  The other commonly
-useful setting is 0640, allowing members of the owner's
-group to read the files.  Note however that to make use of such a
-setting, you'll need to alter  to
-store the files somewhere outside the cluster data directory.  In
-any case, it's unwise to make the log files world-readable, since
-they might contain sensitive data.
+The default permissions are either 0600, meaning only the
+server owner can read or write the log files or 0640, that
+allows any user in the same group can read the log files, based on the new
+cluster created with --allow-group-access option of initdb
+command. Note however that to make use of any setting other than default,
+you'll need to alter  to store the files
+somewhere outside the cluster data directory.

I would formulate that differently, by just adding an extra paragraph
to mention that using 0640 is recommended to be
compatible with initdb's --allow-group-access instead of sticking it
on the middle of the existing paragraph.
--
Michael




Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-01 Thread Michael Paquier
On Fri, Feb 01, 2019 at 09:11:50AM +0100, Laurenz Albe wrote:
> Jamison, Kirk wrote:
>> I wonder if there is a better reloption name for
>> shrink_enabled. (truncate_enabled, vacuum_enabled? Hmm. No?)  
>> On the other hand, shrink_enabled seems to describe well what it's
>> supposed to do when vacuuming tables.  Besides there's a
>> similarly-named autovacuum_enabled option.
> 
> I like "shrink_enabled".
> 
> It may sound weird in the ears of PostgreSQL hackers, but will make
> sense to users.
> 
> Perhaps "vacuum_shrink_enabled" would be even better.

Naming it just vacuum_truncate and autovacuum_truncate (with aliases
for toast and such) looks more natural to me.  "shrink" is not a term
used in the code at all to describe this phase of vacuuming, and in my
opinion this option mainly targets people who are experts in PostgreSQL
internals.
--
Michael




Re: ALTER SESSION

2019-02-01 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 1 Feb 2019 06:31:40 +, "Tsunakawa, Takayuki" 
 wrote in 
<0A3221C70F24FB45833433255569204D1FB927B1@G01JPEXMBYT05>
> Thanks for a cool feature with nice UI.  Can I expect it to work for 
> background processes?  For troubleshooting, I wanted to investigate how 
> autovacuum launcher/worker behaves by having them emit DEBUG messages.
> 
> (My comments follow)

I haven't tried it actually, but it doesn't reject background
workers.  However, background workers seem to assume that no variables
change while they are working.  I should consider that.

> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> > .auto.conf is already a kind of such.. My first version signals the change
> > via shared memory (in a largely-improvable way) and add the GUC system the
> > complex "nontransactional SET" feature, which lets a change persists beyond
> > transaction end if any. Holding changes until the session becomes idle seems
> > also complex.
> > 
> > https://www.postgresql.org/message-id/20181127.193622.252197705.horigu
> > chi.kyot...@lab.ntt.co.jp
> > 
> > The most significant reason for passing-by-file is the affinity with the
> > current GUC system.
> 
> Regarding the target session specification, do we want to use pid or a 
> session ID like the following?
> 
> https://www.postgresql.org/docs/devel/runtime-config-logging.html
> --
> The %c escape prints a quasi-unique session identifier, consisting of two 
> 4-byte hexadecimal numbers (without leading zeros) separated by a dot. The 
> numbers are the process start time and the process ID, so %c can also be used 
> as a space saving way of printing those items. For example, to generate the 
> session identifier from pg_stat_activity, use this query:
> 
> SELECT to_hex(trunc(EXTRACT(EPOCH FROM backend_start))::integer) || '.' ||
>to_hex(pid)
> FROM pg_stat_activity;
> 
> pid is easier to type.  However, the session startup needs to try to delete 
> the leftover file.  Is the file system access negligible compared with the 
> total heavy session startup processing?
> 
> If we choose session ID, the session startup doesn't have to care about the 
> leftover file.  However, the background process crash could leave the file 
> for a long time, since the crash may not lead to postmaster restart.  Also, 
> we will get inclined to add sessionid column in pg_stat_activity (the concept 
> of session ID can be useful for other uses.)

Sounds reasonable.

The attached version happens to add backend startup time in
PGPROC and I added session id as a usable key. (Heavily WIP)

ALTER SESSION WITH (id '5c540141.b7f') SET work_mem to '128kB';

> I'm OK about passing parameter changes via a file.  But I'm not sure whether 
> using DSM (DSA) is easier with less code.

Perhaps DSA is not required.  Currently it uses a rather large
area, but I think we can do the same thing with less memory by
sending long strings part by part.

> And considering the multi-threaded server Konstantin is proposing, would it 
> better to take pass-by-memory approach?  I imagine that if the server gets 
> multi-threaded, the parameter change would be handled like:
> 
> 1. Allocate memory for one parameter change.
> 2. Write the change to that memory.
> 3. Link the memory to a session-specific list.
> 4. The target session removes the entry from the list, applies the change, 
> and frees the memory.
> 
> The code modification may be minimal when we migrate to the multi-threaded 
> server -- only memory allocation and free functions.

The attached is a WIP patch that:

 - Using the non-transactional SET (for my convenience).

 - Based not on a file, but on static shared memory.
   Using a new signal 

 - It adds PGC_S_REMOTE with the same precedence as PGC_S_SESSION.
   (This causes arguable behavior...)

 - ALTER SESSION syntax. (key can be pid or session id)


(Sorry for the inconsistent name of the patch files..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 82970b92d406d487f1bc88a64fb9e89dc9a6b039 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Feb 2019 09:45:48 +0900
Subject: [PATCH 1/2] Add start timestamp to PGPROC

A ProcArray element may be allocated to another session while we are
investigating it. The pid is not fully reliable for checking that because
of the possibility of reuse. Add a start timestamp to resolve the issue.
---
 src/backend/storage/lmgr/proc.c | 2 ++
 src/include/storage/proc.h  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 89c80fb687..a63309a432 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -373,6 +373,8 @@ InitProcess(void)
 	MyProc->fpLocalTransactionId = InvalidLocalTransactionId;
 	MyPgXact->xid = InvalidTransactionId;
 	MyPgXact->xmin = InvalidTransactionId;
+	/* set this before pid to get rid of wrong pair with pid */
+	MyProc->starttimestamp = MyStartTimesta

RE: pg_upgrade: Pass -j down to vacuumdb

2019-02-01 Thread Jamison, Kirk
On January 31, 2019, 9:29PM +, Jesper Pedersen wrote:

>>> I added most of the documentation back, as requested by Kirk.
>> 
>> Actually, I find it useful to be documented. However, major contributors 
>> have higher opinions in terms of experience, so I think it's alright with me 
>> not to include the doc part if they finally say so. 
>
> I think we need to leave it to the Committer to decide, as both Peter and 
> Michael are committers; provided the patch reaches RfC.

Agreed.

>>> 1) You still enforce -j to use the number of jobs that the caller of 
>>> pg_upgrade provides, and we agreed that both things are separate 
>>> concepts upthread, no?  What has been suggested by Alvaro is to add 
>>> a comment so as one can use VACUUM_OPTS with -j optionally, instead 
>>> of suggesting a full-fledged vacuumdb command which depends on what 
>>> pg_upgrade uses.  So there is no actual need for the if/else 
>>> complication business.
> 
>> I think it is ok for the echo command to highlight to the user that 
>> running --analyze-only using the same amount of jobs will give a faster 
>> result.

Since you used user_opts.jobs (which comes from pg_upgrade, does it not?),
could you clarify the statement above a bit more? Or did you mean that
it won't be a problem for vacuumdb to use the same value?
Though it is arguable correctness-wise, if the committers let it pass
based on your answer, then I think it's alright.

I'm not sure if I misunderstood the purpose of $VACUUMDB_OPTS. I thought what
the other developers suggested is to utilize it so that --jobs for vacuumdb
would be optional and based only on the user-specified option $VACUUMDB_OPTS,
by which it would not reuse the number of jobs used for pg_upgrade.
To address your need for parallelism, the script would then just
suggest that the user add a --jobs option. The if/else for the number of jobs
is not needed in that case, except maybe for the ifndef WIN32 else case.

Regards,
Kirk Jamison




Re: [PROPOSAL] Shared Ispell dictionaries

2019-02-01 Thread Arthur Zakirov

On 22.01.2019 22:17, Tomas Vondra wrote:

On 1/22/19 7:36 PM, Arthur Zakirov wrote:

max_shared_dictionaries_size can be renamed to
shared_dictionaries_cleanup_threshold.


That really depends on what exactly the threshold does. If it only
triggers cleanup but does not enforce maximum amount of memory used by
dictionaries, then this name seems OK. If it ensures max amount of
memory, the max_..._size name would be better.


Yep, I thought about the first approach.


I think there are essentially two ways:

(a) Define a max amount of memory available for shared dictionaries, and
come up with an eviction algorithm. This will be tricky, because when
the frequently-used dictionaries need a bit more memory than the limit,
this will result in thrashing (evict+load over and over).

(b) Define what "unused" means for dictionaries, and unload dictionaries
that become unused. For example, we could track timestamp of the last
time each dict was used, and decide that dictionaries unused for 5 or
more minutes are unused. And evict those.

The advantage of (b) is that it adapts automatically, more or less. When
you have a bunch of frequently used dictionaries, the amount of shared
memory increases. If you stop using them, it decreases after a while.
And rarely used dicts won't force eviction of the frequently used ones.


Thanks for sharing your ideas, Tomas. Unfortunately I won't manage to 
develop a new version of the patch till the end of the commitfest due to 
lack of time. I'll think about the second approach. Tracking the timestamp 
of the last time a dict was used may be difficult though, and may slow 
down FTS...
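
Just to make that concrete, here is a minimal sketch (not from any posted
patch; the struct, field names and the 5-minute constant are purely
illustrative) of how approach (b) could track usage with existing backend
primitives:

#include "postgres.h"
#include "storage/dsm.h"
#include "utils/timestamp.h"

/* hypothetical per-dictionary entry kept in shared memory */
typedef struct SharedDictEntry
{
	Oid			dictid;			/* OID of the dictionary */
	dsm_handle	segment;		/* where the compiled dictionary lives */
	TimestampTz	last_used;		/* updated on every lookup */
} SharedDictEntry;

#define DICT_UNUSED_TIMEOUT_MS	(5 * 60 * 1000)

/* called from the lookup path; a single store, so hopefully cheap */
static void
dict_mark_used(SharedDictEntry *entry)
{
	entry->last_used = GetCurrentTimestamp();
}

/* called from whatever periodic cleanup we end up with */
static bool
dict_is_unused(SharedDictEntry *entry)
{
	return TimestampDifferenceExceeds(entry->last_used,
									  GetCurrentTimestamp(),
									  DICT_UNUSED_TIMEOUT_MS);
}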


I move the patch to the next commitfest.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PROPOSAL] Shared Ispell dictionaries

2019-02-01 Thread Arthur Zakirov

On 01.02.2019 12:09, Arthur Zakirov wrote:
Thanks for sharing your ideas, Tomas. Unfortunately I won't manage to 
develop a new version of the patch till the end of the commitfest due to 
lack of time. I'll think about the second approach. Tracking the timestamp 
of the last time a dict was used may be difficult though, and may slow 
down FTS...


I move the patch to the next commitfest.


Oh, it seems it can't be moved to the next commitfest from the
"Waiting on Author" status.


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PROPOSAL]a new data type 'bytea' for ECPG

2019-02-01 Thread Michael Meskes
Matsumura-san,

> Sorry to bother you, but I would be grateful if you would comment to me.

Sorry, I didn't know you were waiting on a reply by me.

> > I have a question about ecpg manual when I add article for bytea.
> > I wonder what does the following about VARCHAR mean.
> > ...
> > I think, if the footnote of VARCHAR is meaningless, I remove it while I add
> > the article for bytea. (I didn't remove in this patch.)

I have no idea where the footnote comes from, but I agree that it doesn't seem
to make sense. The datatype varchar in the C code is handled by the
preprocessor and replaced by a struct definition anyway.
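
For illustration, this is roughly what that replacement looks like; the exact
struct tag ecpg generates may differ, so treat this only as a sketch:

/* what you write in the .pgc file */
EXEC SQL BEGIN DECLARE SECTION;
VARCHAR v[100];
EXEC SQL END DECLARE SECTION;

/* roughly what the preprocessor emits in the generated .c file */
struct varchar_v
{
	int		len;		/* actual length of the value */
	char	arr[100];	/* the data itself */
} v;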

Feel free to remove.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: speeding up planning with partitions

2019-02-01 Thread Alvaro Herrera
On 2019-Jan-31, David Rowley wrote:

> On Tue, 29 Jan 2019 at 22:32, Amit Langote
>  wrote:
> > Attached updated patches.
> 
> I think we could make the 0001 patch a bit smaller if we were to apply
> the attached first.
> 
> Details in the commit message.
> 
> What do you think?

Amit didn't say what he thinks, but I think it looks good and the
rationale for it makes sense, so I pushed it.  I only amended some
comments a little bit.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade: Pass -j down to vacuumdb

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Jamison, Kirk wrote:

> I'm not sure if I misunderstood the purpose of $VACUUMDB_OPTS. I thought what
> the other developers suggested is to utilize it so that --jobs for vacuumdb
> would be optional and based only on the user-specified option $VACUUMDB_OPTS,
> by which it would not reuse the number of jobs used for pg_upgrade.
> To address your need for parallelism, the script would then just
> suggest that the user add a --jobs option. The if/else for the number of jobs
> is not needed in that case, except maybe for the ifndef WIN32 else case.

No Kirk, you got it right -- (some of) those ifdefs are not needed, as
adding the --jobs in the command line is not needed.  I do think some
extra whitespace in the format string is needed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: speeding up planning with partitions

2019-02-01 Thread Alvaro Herrera
Please rebase.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ArchiveEntry optional arguments refactoring

2019-02-01 Thread Daniel Gustafsson
> On 24 Jan 2019, at 13:12, Dmitry Dolgov <9erthali...@gmail.com> wrote: 

> Here is another version, where I accumulated all the suggestions:

Nothing sticks out during review and AFAICT all comments have been addressed.
Everything works as expected during (light) testing between master and an older
version.

+1 on committing this, having spent a lot of time in this code I really
appreciate the improved readability.

cheers ./daniel


where clause not working through psql command line client

2019-02-01 Thread rajan
postgres@Server2[DEV][backup] $ psql -c 'select * from pg_class where relname
= pg_toast_22345'
ERROR:  column "pg_toast_16387" does not exist
LINE 1: select * from pg_class where relname = pg_toast_22345




-
--
Thanks,
Rajan.
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: where clause not working through psql command line client

2019-02-01 Thread amul sul
You forgot to quote pg_toast_22345, try this:

 psql -c " select * from pg_class where relname  = 'pg_toast_22345' "

Regards,
Amul

On Fri, Feb 1, 2019 at 4:05 PM rajan  wrote:

> postgres@Server2[DEV][backup] $ psql -c 'select * from pg_class where
> relname
> = pg_toast_22345'
> ERROR:  column "pg_toast_16387" does not exist
> LINE 1: select * from pg_class where relname = pg_toast_22345
>
>
>
>
> -
> --
> Thanks,
> Rajan.
> --
> Sent from:
> http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
>
>


Re: LD_LIBRARY_PATH_RPATH

2019-02-01 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> At least on my FreeBSD 11 box, the current definition of
 >> $(with_temp_install) is not sufficient to ensure that the various .so
 >> files are loaded from tmp_install and not from the compiled rpath (which
 >> will be the final install dir, which may of course contain old
 >> libraries).

 >> LD_LIBRARY_PATH_RPATH=1 fixes this (by giving LD_LIBRARY_PATH priority
 >> over the DT_RPATH tag in the object).

 >> Is this also an issue on any other platforms?

 Tom> Hm.  Can't reproduce that on current NetBSD or macOS; don't have
 Tom> OpenBSD booted up to try right now.

 Tom> However, if it helps on some platforms, I can't see much harm in
 Tom> adding it in the relevant places. It should be a no-op everywhere
 Tom> else.

Oh, I see why it hasn't previously been an issue - the behavior varies
according to whether both DT_RPATH and DT_RUNPATH, or just DT_RPATH, are
defined in the program header; and when I build using default cc
(clang), I get both tags (which makes LD_LIBRARY_PATH_RPATH
unnecessary), whereas when I build using ports gcc using the recommended
additional -Wl,-R option, for whatever reason that ends up setting only
DT_RPATH.

Confusing. Probably safest to add the env var anyway.

-- 
Andrew (irc:RhodiumToad)



Re: FETCH FIRST clause PERCENT option

2019-02-01 Thread Surafel Temesgen
Here is a rebased version of the previous patch with the planner
comment included.

regards
Surafel


On Wed, Jan 30, 2019 at 9:07 AM Surafel Temesgen 
wrote:

>
>
> On Mon, Jan 28, 2019 at 1:28 AM Tomas Vondra 
> wrote:
>
>>
>>
>> OK. Does that mean you agree the incremental approach is reasonable?
>>
>
> There is no noticeable performance difference, but I prefer the previous
> approach more.  Regarding cursor operation, it fetches tuples forward and
> backward from the tuplestore only, but in the incremental approach we have
> to re-execute the outer node on every forward and backward fetch operation.
>
> regards
> Surafel
>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..8491b7831a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+with PERCENT count specifies the maximum number of rows to return 
+in percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..5ba895362d 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -43,6 +43,7 @@ ExecLimit(PlanState *pstate)
 	LimitState *node = castNode(LimitState, pstate);
 	ScanDirection direction;
 	TupleTableSlot *slot;
+	TupleDesc   tupleDescriptor;
 	PlanState  *outerPlan;
 
 	CHECK_FOR_INTERRUPTS();
@@ -52,6 +53,8 @@ ExecLimit(PlanState *pstate)
 	 */
 	direction = node->ps.state->es_direction;
 	outerPlan = outerPlanState(node);
+	slot = node->subSlot;
+	tupleDescriptor = node->ps.ps_ResultTupleDesc;
 
 	/*
 	 * The main logic is a simple state machine.
@@ -60,6 +63,23 @@ ExecLimit(PlanState *pstate)
 	{
 		case LIMIT_INITIAL:
 
+			if (node->limitOption == PERCENTAGE)
+			{
+
+			/*
+			 * Find all rows the plan will return.
+			 */
+for (;;)
+{
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		break;
+	}
+	tuplestore_puttupleslot(node->totalTuple, slot);
+}
+			}
+
 			/*
 			 * First call for this node, so compute limit/offset. (We can't do
 			 * this any earlier, because parameters from upper nodes will not
@@ -87,24 +107,46 @@ ExecLimit(PlanState *pstate)
 return NULL;
 			}
 
-			/*
-			 * Fetch rows from subplan until we reach position > offset.
-			 */
-			for (;;)
+			if (node->limitOption == PERCENTAGE)
 			{
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+for (;;)
 {
-	/*
-	 * The subplan returns too few tuples for us to produce
-	 * any output at all.
-	 */
-	node->lstate = LIMIT_EMPTY;
-	return NULL;
+	slot = MakeSingleTupleTableSlot(tupleDescriptor, &TTSOpsMinimalTuple);
+	if (!tuplestore_gettupleslot(node->totalTuple, true, true, slot))
+	{
+		node->lstate = LIMIT_EMPTY;
+		return NULL;
+	}
+	else
+	{
+		node->subSlot = slot;
+		if (++node->position > node->offset)
+		break;
+	}
+}
+			}
+			else if (node->limitOption == EXACT_NUMBER)
+			{
+
+/*
+ * Fetch rows from subplan until we reach position > offset.
+ */
+for (;;)
+{
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		/*
+		 * The subplan returns too few tuples for us to produce
+		 * any output at all.
+		 */
+		node->lstate = LIMIT_EMPTY;
+		return NULL;
+	}
+	node->subSlot = slot;
+	if (++node->position > node->offset)
+		break;
 }
-node->subSlot = slot;
-if (++node->position > node->offset)
-	break;
 			}
 
 			/*
@@ -144,18 +186,35 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+if (node->limitOption == PERCENTAGE)
+{
+	slot = MakeSingleTupleTableSlot(tupleDescriptor, &TTSOpsMinimalTuple);
+	if (tuplestore_gettupleslot(node->totalTuple, true, false, slot))
+	{
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+}
+	

Re: where clause not working through psql command line client

2019-02-01 Thread rajan
Thanks a lot, Amul. 



-
--
Thanks,
Rajan.
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: ArchiveEntry optional arguments refactoring

2019-02-01 Thread Alvaro Herrera
pgindent didn't like your layout with two-space indents for the struct
members :-(  I thought it was nice, but oh well.  This means we can do
away with the newline at each callsite, and I didn't like the trailing
comma (and I have vague recollections that some old compilers might
complain about them too, though maybe we retired them already.)

> * Use NULL as a default value where it was an empty string before (this
>   required few minor changes for some part of the code outside ArchiveEntry)

Ah, so this is why you changed replace_line_endings.  So the comment on
that function now is wrong -- it fails to indicate that it returns a
malloc'ed "" on NULL input.  But about half the callers want to have a
malloc'ed "-" on NULL input ... I think it'd make the code a little bit
simpler if we did that in replace_line_endings itself, maybe add a
"want_dash" bool argument, so this code

if (!ropt->noOwner)
sanitized_owner = replace_line_endings(te->owner);
else
sanitized_owner = pg_strdup("-");

can become
sanitized_owner = replace_line_endings(te->owner, true);

I don't quite understand why the comments about line sanitization were
added in the callsites rather than in replace_line_endings itself.  I
would rename the function to sanitize_line() and put those comments
there (removing them from the callsites), then the new argument I
suggest would not be completely out of place.
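
A minimal sketch of what the proposed helper could look like, reusing the
want_hyphen argument name from the attached patch (the exact behavior is of
course up for discussion):

#include "postgres_fe.h"

static char *
sanitize_line(const char *str, bool want_hyphen)
{
	char	   *result;
	char	   *s;

	/* NULL becomes a malloc'ed "-" (or "") so callers need no special case */
	if (str == NULL)
		return pg_strdup(want_hyphen ? "-" : "");

	result = pg_strdup(str);

	/* replace newlines so the string stays on a single TOC line */
	for (s = result; *s != '\0'; s++)
	{
		if (*s == '\n' || *s == '\r')
			*s = ' ';
	}

	return result;
}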

What do you think?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 86781bda4cf852f198f32aee632281fb72f4b7d0 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Fri, 1 Feb 2019 08:16:20 -0300
Subject: [PATCH] ArchiveOpts structure

ArchiveEntry function has some number of arguments, that can be
considered as optional. Refactor out arguments for ArchiveEntry into
ArchiveOpts structure, to make it more flexible for changes.

Author: Dmitry Dolgov
Discussion: https://postgr.es/m/ca+q6zcxrxpe+qp6oerqwj3zs061wpohdxemrdc-yf-2v5vs...@mail.gmail.com
---
 src/bin/pg_dump/pg_backup_archiver.c | 113 ++---
 src/bin/pg_dump/pg_backup_archiver.h |  31 +-
 src/bin/pg_dump/pg_dump.c| 952 +--
 3 files changed, 521 insertions(+), 575 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 2c2f6fb4a9..a3e530bac7 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -77,7 +77,7 @@ static ArchiveHandle *_allocAH(const char *FileSpec, const ArchiveFormat fmt,
 static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
 	  ArchiveHandle *AH);
 static void _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData);
-static char *replace_line_endings(const char *str);
+static char *sanitize_line(const char *str, bool want_hyphen);
 static void _doSetFixedOutputState(ArchiveHandle *AH);
 static void _doSetSessionAuth(ArchiveHandle *AH, const char *user);
 static void _reconnectToDB(ArchiveHandle *AH, const char *dbname);
@@ -1066,17 +1066,8 @@ WriteData(Archive *AHX, const void *data, size_t dLen)
 
 /* Public */
 TocEntry *
-ArchiveEntry(Archive *AHX,
-			 CatalogId catalogId, DumpId dumpId,
-			 const char *tag,
-			 const char *namespace,
-			 const char *tablespace,
-			 const char *owner,
-			 const char *desc, teSection section,
-			 const char *defn,
-			 const char *dropStmt, const char *copyStmt,
-			 const DumpId *deps, int nDeps,
-			 DataDumperPtr dumpFn, void *dumpArg)
+ArchiveEntry(Archive *AHX, CatalogId catalogId,
+			 DumpId dumpId, ArchiveOpts *opts)
 {
 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
 	TocEntry   *newToc;
@@ -1094,22 +1085,22 @@ ArchiveEntry(Archive *AHX,
 
 	newToc->catalogId = catalogId;
 	newToc->dumpId = dumpId;
-	newToc->section = section;
+	newToc->section = opts->section;
 
-	newToc->tag = pg_strdup(tag);
-	newToc->namespace = namespace ? pg_strdup(namespace) : NULL;
-	newToc->tablespace = tablespace ? pg_strdup(tablespace) : NULL;
-	newToc->owner = pg_strdup(owner);
-	newToc->desc = pg_strdup(desc);
-	newToc->defn = pg_strdup(defn);
-	newToc->dropStmt = pg_strdup(dropStmt);
-	newToc->copyStmt = copyStmt ? pg_strdup(copyStmt) : NULL;
+	newToc->tag = pg_strdup(opts->tag);
+	newToc->namespace = opts->namespace ? pg_strdup(opts->namespace) : NULL;
+	newToc->tablespace = opts->tablespace ? pg_strdup(opts->tablespace) : NULL;
+	newToc->owner = opts->owner ? pg_strdup(opts->owner) : NULL;
+	newToc->desc = pg_strdup(opts->description);
+	newToc->defn = opts->createStmt ? pg_strdup(opts->createStmt) : NULL;
+	newToc->dropStmt = opts->dropStmt ? pg_strdup(opts->dropStmt) : NULL;
+	newToc->copyStmt = opts->copyStmt ? pg_strdup(opts->copyStmt) : NULL;
 
-	if (nDeps > 0)
+	if (opts->nDeps > 0)
 	{
-		newToc->dependencies = (DumpId *) pg_malloc(nDeps * sizeof(DumpId));
-		memcpy(newToc->dependencies, deps, 

Re: ArchiveEntry optional arguments refactoring

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Alvaro Herrera wrote:

> ... so this code
> 
>   if (!ropt->noOwner)
>   sanitized_owner = replace_line_endings(te->owner);
>   else
>   sanitized_owner = pg_strdup("-");
> 
> can become
>   sanitized_owner = replace_line_endings(te->owner, true);

Sorry, there's a silly bug here because I picked the wrong example to
hand-type.  The proposed pattern works fine for the schema cases, not
for this owner case.  The owner case is correctly handled (AFAICT) in
the patch I posted.  (Also, for some reason I decided to go with "hyphen"
instead of "dash" in the argument name.  Not sure if anybody cares
strongly about using the right terminology there; I don't know which it
is.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Alvaro Herrera
I think it's worth pointing out that "git format-patch -v" exists :-)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Alvaro Herrera wrote:

> I think it's worth pointing out that "git format-patch -v" exists :-)

... and you're going to need "git format-patch -v19", because contrib
doesn't build with 18.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Alvaro Herrera wrote:

> On 2019-Feb-01, Alvaro Herrera wrote:
> 
> > I think it's worth pointing out that "git format-patch -v" exists :-)
> 
> ... and you're going to need "git format-patch -v19", because contrib
> doesn't build with 18.

And that suggests that maybe we should keep the old names working, to
avoid breaking every extension out there that deals with ArrayRef,
though I'm not sure if after patches 0002 ff it'll be possible to keep
them working without changes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: DNS SRV support for LDAP authentication

2019-02-01 Thread Daniel Gustafsson
> On 25 Sep 2018, at 04:09, Thomas Munro  wrote:

> Some people like to use DNS SRV records to advertise LDAP servers on
> their network.  Microsoft Active Directory is usually (always?) set up
> that way.  Here is a patch to allow our LDAP auth module to support
> that kind of discovery.  It copies the convention of the OpenLDAP
> command line tools: if you give it a URL that has no hostname, it'll
> try to extract a domain name from the bind DN, and then ask your DNS
> server for a SRV record for LDAP-over-TCP at that domain.  The
> OpenLDAP version of libldap.so exports the magic to do that, so the
> patch is very small (but the infrastructure set-up to test it is a bit
> of a schlep, see below).  I'll add this to the next Commitfest.

Sounds like a reasonable feature.

> Testing instructions for (paths and commands given for FreeBSD, adjust
> as appropriate):

Trying this quickly on macOS while at a conference didn’t yield much success,
will do another attempt when I’m on a more reliable connection.

> This is a first draft.  Not tested much yet.  I wonder if
> HAVE_LDAP_INITIALIZE is a reasonable way to detact OpenLDAP.  The
> documentation was written in about 7 seconds so probably needs work.
> There is probably a Windowsy way to do this too but I didn't look into
> that.

Reading through the patch, and related OpenLDAP code, this seems like a good
approach.  A few small comments:

+ If PostgreSQL was compiled with OpenLDAP as

Should OpenLDAP be wrapped in  tags as well?  If so, there is
another “bare” instance in client-auth.sgml which perhaps can be wrapped into
this patch while at it.

+   ereport(LOG,
+   (errmsg("could not look up a hostlist for %s",
+   domain)));

Should this be \"%s\"?

+   new_uris = psprintf("%s%s%s://%s:%d",

While this construction isn't introduced in this patch, would it not make sense
to convert uris to StringInfo instead to improve readability?
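
A rough sketch of that suggestion; scheme, hosts and ports here are
placeholders for whatever the patch extracts from the SRV records, not the
patch's actual variables:

#include "postgres.h"
#include "lib/stringinfo.h"

static char *
build_uri_list(const char *scheme, char **hosts, int *ports, int nhosts)
{
	StringInfoData uris;
	int			i;

	initStringInfo(&uris);
	for (i = 0; i < nhosts; i++)
	{
		if (i > 0)
			appendStringInfoChar(&uris, ' ');	/* space-separated URI list */
		appendStringInfo(&uris, "%s://%s:%d", scheme, hosts[i], ports[i]);
	}

	return uris.data;			/* palloc'd buffer owned by the caller */
}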

+   /* Step over this hostname and any spaces. */

Nitpicking on a moved hunk, but single-line comments shouldn’t end in a period
I believe.

cheers ./daniel




Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Dmitry Dolgov
> On Fri, Feb 1, 2019 at 12:54 PM Alvaro Herrera  
> wrote:
>
> On 2019-Feb-01, Alvaro Herrera wrote:
>
> > On 2019-Feb-01, Alvaro Herrera wrote:
> >
> > ... and you're going to need "git format-patch -v19", because contrib
> > doesn't build with 18.
>
> And that suggests that maybe we should keep the old names working, to
> avoid breaking every extension out there that deals with ArrayRef,
> though I'm not sure if after patches 0002 ff it'll be possible to keep
> them working without changes.

Can you please point out for me what exactly doesn't build? I just tried to
build contrib and ran all the tests; everything finished successfully, which is
also confirmed by the bot [1].

> I think it's worth pointing out that "git format-patch -v" exists :-)

Fortunately, I know. But yeah, no idea why I started to add a version number at
the end of patch name :)

[1]: https://travis-ci.org/postgresql-cfbot/postgresql/builds/487401077



Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Dmitry Dolgov wrote:

> Can you please point out for me what exactly doesn't build? I just tried to
> build contrib and ran all the tests; everything finished successfully, which is
> also confirmed by the bot [1].

Well, this is strange: I removed the changes then re-applied the diff,
and it worked fine this time.  Strange.  I can only offer my terminal
log, where it's obvious that the contrib changes were not applied by
"git apply" (even though it did finish successfully):

alvin: src 141$ git apply 
/tmp/0001-Renaming-for-new-subscripting-mechanism-v18.patch
alvin: src 0$ runpg head install 
building heads/master ...
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c: In 
function 'JumbleExpr':
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2582:8: 
error: 'T_ArrayRef' undeclared (first use in this function)
   case T_ArrayRef:
^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2582:8: 
note: each undeclared identifier is reported only once for each function it 
appears in
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2584:5: 
error: unknown type name 'ArrayRef'
 ArrayRef   *aref = (ArrayRef *) node;
 ^~~~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2584:25: 
error: 'ArrayRef' undeclared (first use in this function)
 ArrayRef   *aref = (ArrayRef *) node;
 ^~~~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2584:35: 
error: expected expression before ')' token
 ArrayRef   *aref = (ArrayRef *) node;
   ^
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2586:37: 
error: request for member 'refupperindexpr' in something not a structure or 
union
 JumbleExpr(jstate, (Node *) aref->refupperindexpr);
 ^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2587:37: 
error: request for member 'reflowerindexpr' in something not a structure or 
union
 JumbleExpr(jstate, (Node *) aref->reflowerindexpr);
 ^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2588:37: 
error: request for member 'refexpr' in something not a structure or union
 JumbleExpr(jstate, (Node *) aref->refexpr);
 ^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2589:37: 
error: request for member 'refassgnexpr' in something not a structure or union
 JumbleExpr(jstate, (Node *) aref->refassgnexpr);
 ^~
make[1]: *** [pg_stat_statements.o] Error 1
make[1]: Target 'install' not remade because of errors.
make: *** [install-pg_stat_statements-recurse] Error 2
/pgsql/source/master/contrib/postgres_fdw/deparse.c:152:29: error: unknown type 
name 'ArrayRef'
 static void deparseArrayRef(ArrayRef *node, deparse_expr_cxt *context);
 ^~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c: In function 
'foreign_expr_walker':
/pgsql/source/master/contrib/postgres_fdw/deparse.c:404:8: error: 'T_ArrayRef' 
undeclared (first use in this function)
   case T_ArrayRef:
^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:404:8: note: each 
undeclared identifier is reported only once for each function it appears in
/pgsql/source/master/contrib/postgres_fdw/deparse.c:406:5: error: unknown type 
name 'ArrayRef'
 ArrayRef   *ar = (ArrayRef *) node;
 ^~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:406:23: error: 'ArrayRef' 
undeclared (first use in this function)
 ArrayRef   *ar = (ArrayRef *) node;
   ^~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:406:33: error: expected 
expression before ')' token
 ArrayRef   *ar = (ArrayRef *) node;
 ^
/pgsql/source/master/contrib/postgres_fdw/deparse.c:409:11: error: request for 
member 'refassgnexpr' in something not a structure or union
 if (ar->refassgnexpr != NULL)
   ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:417:41: error: request for 
member 'refupperindexpr' in something not a structure or union
 if (!foreign_expr_walker((Node *) ar->refupperindexpr,
 ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:420:41: error: request for 
member 'reflowerindexpr' in something not a structure or union
 if (!foreign_expr_walker((Node *) ar->reflowerindexpr,
 ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:423:41: error: request for 
member 'refexpr' in something not a structure or union
 if (!foreign_expr_walker((Node *) ar->refexpr,
 ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:431:19: error: request for 
member 'refcollid' in something not a structure or union
 collati

Re: Shared Memory: How to use SYSV rather than MMAP ?

2019-02-01 Thread Thomas Munro
Bonjour Tony,

On Sat, Jan 5, 2019 at 3:35 AM REIX, Tony  wrote:
> Thanks for the 2 patches!
>
> I've also seen the discussion around this subject. Very interesting. Should I 
> wait for a decision to be taken, or should I study and experiment with your 
> patches first?

I am planning to commit the 0001 patch shortly, unless there are
objections.  I attach a new version, which improves the documentation
a bit (cross-referencing the new GUC and the section on sysctl
settings).  That will give us shared_memory_type = sysv.

Can you please try out the 0002 patch on AIX and see if it works, and
if not, tell us how to fix it?  :-)  The desired behaviour is:

1.  If huge_pages = off, it doesn't use them.
2.  If huge_pages = try, it tries to use them, but if it can't
(perhaps because the Unix user doesn't have the CAP_BYPASS_RAC_VMM
capability), it should fall back to non-huge pages.
3.  If huge_pages = on, it tries to use them, but if it can't it fails
to start up.

There may also be a case for supporting different page sizes
explicitly (huge_pages = 16GB?), but that could be done later.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-shared_memory_type-GUC-v2.patch
Description: Binary data


Re: Delay locking partitions during query execution

2019-02-01 Thread Robert Haas
On Thu, Jan 31, 2019 at 8:29 PM David Rowley
 wrote:
> I think perhaps accepting invalidations at the start of the statement
> might appear to fix the problem in master, but I think there's still a
> race condition around CheckCachedPlan() since we'll ignore
> invalidation messages when we perform LockRelationOid() in
> AcquireExecutorLocks() due to the lock already being held.  So there's
> likely still a window where the invalidation message could come in and
> we miss it.  That said, if a lock is already taken in one session, and
> some other session alters some relation we have a lock on, then that
> alteration can't have been done with an AEL, as that would have
> blocked on our lock, so it must be some ShareUpdateExclusiveLock
> change, and if so that change must not be important enough to block
> concurrently executing queries, so likely shouldn't actually break
> anything.
>
> So in summary, I think that checking for invalidation messages at the
> start of the statement allows us to replan within a transaction, but
> does not guarantee we get the exact correct plan for the current
> settings.

Right, but that's also an inevitable consequence of postponing
locking.  If you start running the statement without locking every
object involved in the query, then somebody could be holding an
AccessExclusiveLock on that object and changing it in any way that
they're allowed to do without holding a lock that does conflict with
one you're holding.  Nothing we do with invalidation messages can
prevent that from happening, though it can make the race narrower. We
*must* be able to cope with having the out-of-date plan or this whole
idea is sunk.

> I also don't have a full grasp on why we must ignore invalidation
> messages when we already have a lock on the relation.  I assume it's
> not a huge issue since obtaining a new lock on any other relation will
> process the invalidation queue anyway. I know that f868a8143a9
> recently made some changes to fix some recursive issues around these
> invalidations.

It's not necessary.  It's a performance optimization.  The idea is: we
need our caches to be up to date.  Anything that changes an object
should take AccessExclusiveLock, send invalidation messages, and only
then release the AccessExclusiveLock.  Anything that looks at an
object should acquire at least AccessShareLock and then process
invalidation messages.  This protocol guarantees that any changes that
are made under AEL will be seen in a race-free way, since the
invalidation messages are sent before releasing the lock and processed
after taking one.  But it doesn't require processing those messages
when we haven't taken a lock, so we don't.
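
To make that concrete, here is a minimal sketch using existing primitives
(LockRelationOid, CacheInvalidateRelcacheByRelid, AcceptInvalidationMessages);
the wrapper functions are hypothetical, and in reality the invalidation send
and the lock release both happen at end of transaction rather than inline:

#include "postgres.h"
#include "storage/lmgr.h"
#include "utils/inval.h"

/* writer: modify the object only while holding AEL, queue an invalidation,
 * and only then let the lock go */
static void
writer_alters_relation(Oid relid)
{
	LockRelationOid(relid, AccessExclusiveLock);
	/* ... perform the catalog change ... */
	CacheInvalidateRelcacheByRelid(relid);
	UnlockRelationOid(relid, AccessExclusiveLock);
}

/* reader: take at least AccessShareLock, then process invalidations
 * (the real LockRelationOid already does this when the lock is new) */
static void
reader_opens_relation(Oid relid)
{
	LockRelationOid(relid, AccessShareLock);
	AcceptInvalidationMessages();
	/* ... the relcache entry now reflects any AEL-protected change ... */
}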

> (I admit to not having the best grasp on how all this works, so feel
> free to shoot this down)

I think the key question here is whether or not you can cope with
someone having done arbitrary AEL-requiring modifications to the
delaylocked partitions.  If you can, the fact that the plan might
sometimes be out-of-date is an inevitable consequence of doing this at
all, but I think we can argue that it's an acceptable consequence as
long as the resulting behavior is not too bletcherous.  If you can't,
then this patch is dead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-02-01 Thread Robert Haas
On Thu, Jan 31, 2019 at 6:00 PM Alvaro Herrera  wrote:
> > - 0003 doesn't have any handling for parallel query at this point, so
> > even though within a single backend a single query execution will
> > always get the same PartitionDesc for the same relation, the answers
> > might not be consistent across the parallel group.
>
> That doesn't sound good.  I think the easiest would be to just serialize
> the PartitionDesc and send it to the workers instead of them recomputing
> it, but then I worry that this might have bad performance when the
> partition desc is large.  (Or maybe sending bytes over pqmq is faster
> than reading all those catalog entries and so this isn't a concern
> anyway.)

I don't think we'd be using pqmq here, or shm_mq either, but I think
the bigger issues is that starting a parallel query is already a
pretty heavy operation, and so the added overhead of this is probably
not very noticeable.  I agree that it seems a bit expensive, but since
we're already waiting for the postmaster to fork() a new process which
then has to initialize itself, this probably won't break the bank.
What bothers me more is that it's adding a substantial amount of code
that could very well contain bugs to fix something that isn't clearly
a problem in the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Delay locking partitions during INSERT and UPDATE

2019-02-01 Thread Robert Haas
On Thu, Jan 31, 2019 at 4:48 PM David Rowley
 wrote:
> On Fri, 1 Feb 2019 at 07:46, Robert Haas  wrote:
> > I have reviewed this patch and I am in favor of it.  I think it likely
> > needs minor rebasing because of the heap_open -> table_open renaming.
>
> Many thanks for looking at it.   The v2 patch was based on top of the
> heap_open -> table_open change.

Oops.  I guess I opened the wrong version.

I'm now wondering whether the same issues discussed in
https://www.postgresql.org/message-id/CA%2BTgmoZN-80143F8OhN8Cn5-uDae5miLYVwMapAuc%2B7%2BZ7pyNg%40mail.gmail.com
also need discussion with respect to this patch.  But I haven't
thought about it very hard, so I'm not sure whether they do or don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speeding up planning with partitions

2019-02-01 Thread David Rowley
On Fri, 1 Feb 2019 at 22:52, Alvaro Herrera  wrote:
>
> On 2019-Jan-31, David Rowley wrote:
> >
> > I think we could make the 0001 patch a bit smaller if we were to apply
> > the attached first.
> >
> > Details in the commit message.
> >
> > What do you think?
>
> Amit didn't say what he thinks, but I think it looks good and the
> rationale for it makes sense, so I pushed it.  I only amended some
> comments a little bit.

Thanks.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: speeding up planning with partitions

2019-02-01 Thread David Rowley
On Fri, 1 Feb 2019 at 23:01, Alvaro Herrera  wrote:
> Please rebase.

I had a short list of other things I noticed when making a partial
pass over the patch again.

I may as well send these now if there's a new version on the way:

1. I think it's okay to convert the following:

/*
* Adjust all_baserels to replace the original target relation with the
* child target relation.  Copy it before modifying though.
*/
subroot->all_baserels = bms_copy(root->all_baserels);
subroot->all_baserels = bms_del_member(subroot->all_baserels,
   root->parse->resultRelation);
subroot->all_baserels = bms_add_member(subroot->all_baserels,
   subroot->parse->resultRelation);

into:
/* Adjust all_baserels */
subroot->all_baserels = adjust_child_relids(root->all_baserels, 1, &appinfo);


2. Any reason to do:

/*
* Generate access paths for the entire join tree.
*
* For UPDATE/DELETE on an inheritance parent, join paths should be
* generated for each child result rel separately.
*/
if (root->parse->resultRelation &&
root->simple_rte_array[root->parse->resultRelation]->inh)

instead of just checking: if (root->inherited_update)

3. This seems like useless code in set_inherit_target_rel_sizes().

/*
* If parallelism is allowable for this query in general, see whether
* it's allowable for this childrel in particular.  For consistency,
* do this before calling set_rel_size() for the child.
*/
if (root->glob->parallelModeOK)
set_rel_consider_parallel(subroot, childrel, childRTE);


parallelModeOK is only ever set for SELECT.  Likely it's fine just to
replace these with:

+   /* We don't consider parallel paths for UPDATE/DELETE
statements */
+   childrel->consider_parallel = false;

or perhaps it's fine to leave it out since build_simple_rel() sets it to false.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Delay locking partitions during INSERT and UPDATE

2019-02-01 Thread David Rowley
On Sat, 2 Feb 2019 at 03:07, Robert Haas  wrote:
> I'm now wondering whether the same issues discussed in
> https://www.postgresql.org/message-id/CA%2BTgmoZN-80143F8OhN8Cn5-uDae5miLYVwMapAuc%2B7%2BZ7pyNg%40mail.gmail.com
> also need discussion with respect to this patch.  But I haven't
> thought about it very hard, so I'm not sure whether they do or don't.

I really don't think it does, or if it does then the code is already
broken as it is now.

As the code is today, we obtain the locks well after the plan is
locked in.  The only difference with this patch is that we take the
lock when a tuple is first routed to a partition, rather than taking
them all at once when setting up the routing data structure.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: ArchiveEntry optional arguments refactoring

2019-02-01 Thread Dmitry Dolgov
> On Fri, Feb 1, 2019 at 12:33 PM Alvaro Herrera  
> wrote:
>
> pgindent didn't like your layout with two-space indents for the struct
> members :-(  I thought it was nice, but oh well.  This means we can do
> away with the newline at each callsite, and I didn't like the trailing
> comma (and I have vague recollections that some old compilers might
> complain about them too, though maybe we retired them already.)

Oh, ok. In fact I did this almost automatically without thinking too much (a
formatting habit from other languages), so if pgindent doesn't like it, then
fine.

> > * Use NULL as a default value where it was an empty string before (this
> >   required few minor changes for some part of the code outside ArchiveEntry)
>
> I would rename the function to sanitize_line() and put those comments there
> (removing them from the callsites), then the new argument I suggest would not
> be completely out of place.

Yes, sounds pretty reasonable to me.

> (Also for some reason I decided to go with "hyphen" instead of "dash" in the
> argument name.  Not sure if anybody cares strongly about using the right
> terminology there (I don't know which it is).

Just out of curiosity I did some searching and could find a few examples of
using both "dash" and "hyphen" across the code, but I guess indeed it doesn't
really matter.



Re: LD_LIBRARY_PATH_RPATH

2019-02-01 Thread Tom Lane
Andrew Gierth  writes:
> Is there some reason why ld_library_path_var is defined using a bunch of
> $(if) constructs rather than putting the value (if not LD_LIBRARY_PATH)
> in the individual port makefiles?

I might be wrong, but I think that code is Peter's.  I agree that
having the per-port makefiles set it seems simpler (or maybe move
it to the template files, and have the configure script define the
variable?)

However, does that help any with this requirement?  If the core makefile
logic becomes platform independent for this, then it's harder to see where
to squeeze in an extra variable setting.

regards, tom lane



Re: Delay locking partitions during INSERT and UPDATE

2019-02-01 Thread Robert Haas
On Fri, Feb 1, 2019 at 9:16 AM David Rowley
 wrote:
> On Sat, 2 Feb 2019 at 03:07, Robert Haas  wrote:
> > I'm now wondering whether the same issues discussed in
> > https://www.postgresql.org/message-id/CA%2BTgmoZN-80143F8OhN8Cn5-uDae5miLYVwMapAuc%2B7%2BZ7pyNg%40mail.gmail.com
> > also need discussion with respect to this patch.  But I haven't
> > thought about it very hard, so I'm not sure whether they do or don't.
>
> I really don't think it does, or if it does then the code is already
> broken as it is now.
>
> As the code is today, we obtain the locks well after the plan is
> locked in.  The only difference with this patch is that we take the
> lock when a tuple is first routed to a partition, rather than taking
> them all at once when setting up the routing data structure.

OK, that is what I thought yesterday, and then I was just doubting
myself.  Thanks for thinking about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: ArchiveEntry optional arguments refactoring

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Dmitry Dolgov wrote:

> > On Fri, Feb 1, 2019 at 12:33 PM Alvaro Herrera  
> > wrote:

> > > * Use NULL as a default value where it was an empty string before (this
> > >   required few minor changes for some part of the code outside 
> > > ArchiveEntry)
> >
> > I would rename the function to sanitize_line() and put those comments there
> > (removing them from the callsites), then the new argument I suggest would 
> > not
> > be completely out of place.
> 
> Yes, sounds pretty reasonable for me.

Thanks for looking -- pushed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Synchronous replay take III

2019-02-01 Thread Robert Haas
On Thu, Jan 24, 2019 at 6:42 PM Thomas Munro
 wrote:
> Yes, this is essentially the same thing that you were arguing against
> above.  Perhaps you are right, and there are no people who would want
> synchronous replay, but not synchronous commit.

Maybe I'm misunderstanding the terminology here, but if not, I find
this theory wildly implausible.  *Most* people want read-your-writes
behavior.  *Few* people want to wait for a dead standby.  The only
application of the latter is when even a tiny risk of transaction loss
is unacceptable, but the former has all kinds of clustering-related
uses.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Is zheap on track for PostgreSQL 12.0?

2019-02-01 Thread Michael Goldshteyn
I just read an article about the advantages of zheap over the heap that
PostgreSQL currently uses and there were several posts on this list about
it being more compact as well as more performant. I am hopeful that it
makes it into PostgreSQL 12.0, but it is unclear what the current status of
this change is.

Can one of the Postgres devs please chime in on its status for 12.0?

Thanks,

Michael Goldshteyn


Re: pg_dump multi VALUES INSERT

2019-02-01 Thread David Rowley
On Thu, 31 Jan 2019 at 11:49, David Rowley  wrote:
> Here's another version, same as before but with tests this time.

Hi Fabien,

Wondering if you have anything else here? I'm happy for the v13
version to be marked as ready for committer.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PROPOSAL] Shared Ispell dictionaries

2019-02-01 Thread Robert Haas
On Tue, Jan 22, 2019 at 2:17 PM Tomas Vondra
 wrote:
> I think there are essentially two ways:
>
> (a) Define max amount of memory available for shared dictionarires, and
> come up with an eviction algorithm. This will be tricky, because when
> the frequently-used dictionaries need a bit more memory than the limit,
> this will result in thrashing (evict+load over and over).
>
> (b) Define what "unused" means for dictionaries, and unload dictionaries
> that become unused. For example, we could track timestamp of the last
> time each dict was used, and decide that dictionaries unused for 5 or
> more minutes are unused. And evict those.
>
> The advantage of (b) is that it adapts automatically, more or less. When
> you have a bunch of frequently used dictionaries, the amount of shared
> memory increases. If you stop using them, it decreases after a while.
> And rarely used dicts won't force eviction of the frequently used ones.

+1 for (b).
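
To make the shape of (b) a bit more concrete, a minimal sketch of the
last-used check could look roughly like this (DictEntry, last_used and
dict_unload() are made-up names here, not anything from the actual patch):

    /* sketch: called periodically to evict idle shared dictionaries */
    static void
    evict_idle_dicts(HTAB *dicts, TimestampTz now)
    {
        HASH_SEQ_STATUS scan;
        DictEntry  *entry;

        hash_seq_init(&scan, dicts);
        while ((entry = (DictEntry *) hash_seq_search(&scan)) != NULL)
        {
            /* unload dictionaries unused for 5 minutes or more */
            if (TimestampDifferenceExceeds(entry->last_used, now,
                                           5 * 60 * 1000))
                dict_unload(entry);
        }
    }

The timestamp would be bumped on every lookup, so frequently used
dictionaries never qualify for eviction.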

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-02-01 Thread Andres Freund
Hi,

On 2019-01-29 12:23:51 -0800, Andres Freund wrote:
> On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> > On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > > runs. Looking at the buildfarm it looks like the failures were,
> > > > excluding handfish which failed without recognizable symptoms before and
> > > > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > > > is interesting.
> > > 
> > > Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> > > got four or five different BSD-ish platforms that reported failures,
> > > and it's hard to believe they've all got identical schedulers.
> > > 
> > > That second handfish failure does match the symptoms elsewhere:
> > > 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> > > 
> > > --- 
> > > /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr
> > >   2018-10-30 20:11:45.551967381 -0300
> > > +++ 
> > > /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr
> > >2019-01-28 22:38:20.614211568 -0200
> > > @@ -0,0 +1,20 @@
> > > +SQL error: page 0 of relation "test_thread" should be empty but is not 
> > > on line 125
> > > 
> > > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > > way higher than elsewhere.  Puzzling.
> > 
> > Interesting.
> > 
> > While chatting with Robert about this issue I came across the following
> > section of code:
> > 
> > /*
> >  * If the FSM knows nothing of the rel, try the last page 
> > before we
> >  * give up and extend.  This avoids one-tuple-per-page syndrome 
> > during
> >  * bootstrapping or in a recently-started system.
> >  */
> > if (targetBlock == InvalidBlockNumber)
> > {
> > BlockNumber nblocks = 
> > RelationGetNumberOfBlocks(relation);
> > 
> > if (nblocks > 0)
> > targetBlock = nblocks - 1;
> > }
> > 
> > 
> > I think that explains the issue (albeit not why it is much more frequent
> > on BSDs).  Because we're not going through the FSM, it's perfectly
> > possible to find a page that is uninitialized, *and* is not yet in the
> > FSM. The only reason this wasn't previously actively broken, I think, is
> > that while we previously *also* looked at that page (before the extending
> > backend acquired a lock!), when looking at the page,
> > PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> > space because it just interprets the zeroes in pd_upper - pd_lower as no
> > free space.
> > 
> > Hm, thinking about what a good solution here could be.
> 
> I wonder if we should just expand the logic we have for
> RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could
> just use it without any changes, but the name seems a bit confusing) -
> because that'd prevent the current weirdness that it's possible for the
> buffer to be locked by somebody between the ReadBufferBI(P_NEW) and
> the LockBuffer(BUFFER_LOCK_EXCLUSIVE).  I think that'd allow us to
> altogether drop the cleanup lock logic we currently have, and also
> protect us against the FSM issue I'd outlined upthread?

Here's a version of the patch implementing this approach.  I assume this
solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
machine.
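
To illustrate the idea (this is just a sketch of the call pattern, not the
actual hunk from the patch, which goes through ReadBufferBI()):

    /*
     * Extend the relation.  With RBM_ZERO_AND_LOCK the new buffer comes back
     * already zero-filled and exclusively locked, so nobody else can grab it
     * between the extension and our initialization of the page.
     */
    buffer = ReadBufferExtended(relation, MAIN_FORKNUM, P_NEW,
                                RBM_ZERO_AND_LOCK, NULL);
    page = BufferGetPage(buffer);
    PageInit(page, BufferGetPageSize(buffer), 0);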

I did not rename RBM_ZERO_AND_LOCK. New buffers are zeroed too, so that
still seems apt enough.

Greetings,

Andres Freund
>From 2f211ba82bc3a9c0935bfd0cd7fb58355134b929 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 1 Feb 2019 07:08:10 -0800
Subject: [PATCH v3] Move page initialization from RelationAddExtraBlocks() to
 use, take 2.

Previously we initialized pages when bulk extending in
RelationAddExtraBlocks(). That has a major disadvantage: It ties
RelationAddExtraBlocks() to heap, as other types of storage are likely
to need different amounts of special space and have a different amount of
free space (previously determined by PageGetHeapFreeSpace()).

That we're relying on initializing pages, but not WAL logging the
initialization, also means the risk of getting
"WARNING:  relation \"%s\" page %u is uninitialized --- fixing"
style warnings in vacuums after crashes/immediate shutdowns is
considerably higher. These warnings sound much more serious than they
are.

Fix those two issues together by not initializing pages in
RelationAddExtraBlocks() (but continue to do so in
RelationGetBufferForTuple(), which is linked much more closely to
heap), and accepting uninitialized pages as normal in
vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it
to the FSM, but does nothing else.  We chose to not issue a debug
message, much less a warning

RE: Shared Memory: How to use SYSV rather than MMAP ?

2019-02-01 Thread REIX, Tony
Hi Thomas,


Thanks for the patch !

I've started to work with it. It is now compiling.


Hu I have an issue here:

# cd /home2/freeware/src/packages/BUILD/postgresql-11.1/64bit/doc/src/sgml
# gmake
/opt/freeware/bin/xmllint --path . --noout --valid postgres.sgml
/opt/freeware/bin/xsltproc --path . --stringparam pg.version '11.1'  
stylesheet.xsl postgres.sgml
Makefile:135: recipe for target 'html-stamp' failed
gmake: *** [html-stamp] Segmentation fault (core dumped)

That hits an issue in /opt/freeware/bin/xsltproc on AIX.
I don't think I've seen this before.

That seems to have nothing to do with your patch, however. Investigating.

I'm trying to work around this.


I'll let you know.



About testing it, I guess that I'll have to add:

   huge_pages = on

in the postgresql.conf file. Correct ?


Thanks !


Cordialement,

Tony Reix

tony.r...@atos.net

ATOS / Bull SAS
ATOS Expert
IBM Coop Architect & Technical Leader
Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net

De : Thomas Munro 
Envoyé : vendredi 1 février 2019 14:49:01
À : REIX, Tony
Cc : Andres Freund; Robert Haas; Pg Hackers; EMPEREUR-MOT, SYLVIE
Objet : Re: Shared Memory: How to use SYSV rather than MMAP ?

Bonjour Tony,

On Sat, Jan 5, 2019 at 3:35 AM REIX, Tony  wrote:
> Thanks for the 2 patches!
>
> I've seen also the discussion around this subject. Very interesting. Should I 
> wait for a decision been taken? or should I study and experiment your patches 
> before?

I am planning to commit the 0001 patch shortly, unless there are
objections.  I attach a new version, which improves the documentation
a bit (cross-referencing the new GUC and the section on sysctl
settings).  That will give us shared_memory_type = sysv.

Can you please try out the 0002 patch on AIX and see if it works, and
if not, tell us how to fix it?  :-)  The desired behaviour is:

1.  If huge_pages = off, it doesn't use them.
2.  If huge_pages = try, it tries to use them, but if it can't
(perhaps because the Unix user doesn't have CAP_BYPASS_RAC_VMM
capability), it should fall back to non-huge pages.
3.  If huge_pages = on, it tries to use them, but if it can't it fails
to start up.

There may also be a case for supporting different page sizes
explicitly (huge_pages = 16GB?), but that could be done later.

--
Thomas Munro
http://www.enterprisedb.com



Re: Reorganize collation lookup time and place

2019-02-01 Thread Peter Eisentraut
On 31/01/2019 13:45, Andres Freund wrote:
> So, what's the plan here? Should the CF entry be closed?

Set to RwF.

This patch will probably make a reappearance when we consider making ICU
available as the database-wide default collation.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: LD_LIBRARY_PATH_RPATH

2019-02-01 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Is there some reason why ld_library_path_var is defined using a
 >> bunch of $(if) constructs rather than putting the value (if not
 >> LD_LIBRARY_PATH) in the individual port makefiles?

 Tom> I might be wrong, but I think that code is Peter's.  I agree that
 Tom> having the per-port makefiles set it seems simpler (or maybe move
 Tom> it to the template files, and have the configure script define the
 Tom> variable?)

 Tom> However, does that help any with this requirement?

No, it just happened to be adjacent.

I'm basically thinking along these lines:

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 6852853041..6e602dde48 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -413,13 +413,15 @@ $(1)="$(if $($(1)),$(2):$$$(1),$(2))"
 endef
 
 # platform-specific environment variable to set shared library path
-define ld_library_path_var
-$(if $(filter $(PORTNAME),darwin),DYLD_LIBRARY_PATH,$(if $(filter 
$(PORTNAME),aix),LIBPATH,$(if $(filter 
$(PORTNAME),hpux),SHLIB_PATH,LD_LIBRARY_PATH)))
-endef
-
-define with_temp_install
-PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call 
add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir))
-endef
+# individual ports can override this later, this is the default name
+ld_library_path_var = LD_LIBRARY_PATH
+
+# with_temp_install_extra is for individual ports to define if they
+# need something more here. If not defined then the call does nothing.
+with_temp_install = \
+   PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" \
+   $(call add_to_path,$(strip 
$(ld_library_path_var)),$(abs_top_builddir)/tmp_install$(libdir)) \
+   $(call with_temp_install_extra,$(abs_top_builddir)/tmp_install)
 
 ifeq ($(enable_tap_tests),yes)
 
diff --git a/src/makefiles/Makefile.aix b/src/makefiles/Makefile.aix
index 0f6c028938..ba3695dd57 100644
--- a/src/makefiles/Makefile.aix
+++ b/src/makefiles/Makefile.aix
@@ -23,6 +23,9 @@ else
LDFLAGS_SL += -Wl,-bnoentry -Wl,-H512 -Wl,-bM:SRE
 endif
 
+# env var name to use in place of LD_LIBRARY_PATH
+ld_library_path_var = LIBPATH
+
 
 POSTGRES_IMP= postgres.imp
 
diff --git a/src/makefiles/Makefile.darwin b/src/makefiles/Makefile.darwin
index e2b1d44959..b17598f058 100644
--- a/src/makefiles/Makefile.darwin
+++ b/src/makefiles/Makefile.darwin
@@ -2,6 +2,9 @@ AROPT = crs
 
 DLSUFFIX = .so
 
+# env var name to use in place of LD_LIBRARY_PATH
+ld_library_path_var = DYLD_LIBRARY_PATH
+
 ifdef PGXS
   BE_DLLLIBS = -bundle_loader $(bindir)/postgres
 else
diff --git a/src/makefiles/Makefile.freebsd b/src/makefiles/Makefile.freebsd
index ce03c8dcd2..98a6f50615 100644
--- a/src/makefiles/Makefile.freebsd
+++ b/src/makefiles/Makefile.freebsd
@@ -9,6 +9,14 @@ DLSUFFIX = .so
 
 CFLAGS_SL = -fPIC -DPIC
 
+# extra stuff for $(with_temp_install)
+# we need this to get LD_LIBRARY_PATH searched ahead of the compiled-in
+# rpath, if no DT_RUNPATH is present in the executable. The conditions
+# under which DT_RUNPATH are added seem unpredictable, so be safe.
+
+define with_temp_install_extra
+LD_LIBRARY_PATH_RPATH=1
+endef
 
 # Rule for building a shared library from a single .o file
 %.so: %.o
diff --git a/src/makefiles/Makefile.hpux b/src/makefiles/Makefile.hpux
index 30dd3eb77e..c871fb0c7e 100644
--- a/src/makefiles/Makefile.hpux
+++ b/src/makefiles/Makefile.hpux
@@ -36,6 +36,9 @@ else
CFLAGS_SL = +Z
 endif
 
+# env var name to use in place of LD_LIBRARY_PATH
+ld_library_path_var = SHLIB_PATH
+
 # Rule for building a shared library from a single .o file
 %$(DLSUFFIX): %.o
 ifeq ($(GCC), yes)


-- 
Andrew (irc:RhodiumToad)



Re: Shared Memory: How to use SYSV rather than MMAP ?

2019-02-01 Thread Thomas Munro
On Sat, Feb 2, 2019 at 2:23 AM REIX, Tony  wrote:
> Hu I have an issue here:
>
> # cd /home2/freeware/src/packages/BUILD/postgresql-11.1/64bit/doc/src/sgml
> # gmake
> /opt/freeware/bin/xmllint --path . --noout --valid postgres.sgml
> /opt/freeware/bin/xsltproc --path . --stringparam pg.version '11.1'  
> stylesheet.xsl postgres.sgml
> Makefile:135: recipe for target 'html-stamp' failed
> gmake: *** [html-stamp] Segmentation fault (core dumped)
>
> That reaches an issue in  /opt/freeware/bin/xsltproc , on AIX.
> I never had this I think.

Strange.

By the way, the patches are for master (12-to-be), but I guess they'll
work fine on 11.x too.

> About testing it, I guess that I'll have to add:
>
>huge_pages = on
>
> in the postgresql.conf file. Correct ?

Correct.
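
Something like the following in postgresql.conf should exercise the new code
path (shared_memory_type is the GUC added by the 0001 patch):

    shared_memory_type = sysv   # use System V shared memory for the main segment
    huge_pages = try            # or 'on' to refuse to start without huge pages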

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Alvaro Herrera
On 2019-Feb-01, Dmitry Dolgov wrote:

> It took longer than I expected, but here is the rebased version, where
> all the individual patches can be applied and compiled cleanly (although there
> is still a functional dependency between 0002 and 0003, since the former
> introduces a new subscripting without any implementation, and the latter
> introduces an implementation for the array data type).

Cool, pushed 0001.  I'm afraid I included some pgindenting, so you'll
have to rebase again.  Maybe you already know how to do it without
manually rebasing, but if not, a quick trick to avoid rebasing manually
over all those whitespace changes might be to un-apply with "git show |
patch -p1 -R", then apply your original 0001, commit, apply 0002, then
pgindent; if you now do a git diff to the original commit, you should
get an almost clean diff.  Or you could just try to apply with -w.
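
Spelled out as commands, that's roughly (the patch file names here are just
placeholders):

    git show | patch -p1 -R          # un-apply the pushed, pgindented commit
    git apply 0001-original.patch    # re-apply your original 0001
    git commit -a -m 'original 0001'
    git apply 0002-your-patch.patch  # apply 0002 on top
    src/tools/pgindent/pgindent      # pgindent the tree
    git diff <pushed-commit>         # should now be an almost clean diff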

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Generic type subscripting

2019-02-01 Thread Dmitry Dolgov
> On Fri, Feb 1, 2019 at 4:55 PM Alvaro Herrera  
> wrote:
>
> On 2019-Feb-01, Dmitry Dolgov wrote:
>
> > It took longer than I expected, but here is the rebased version, where
> > all the individual patches can be applied and compiled cleanly (although
> > there is still a functional dependency between 0002 and 0003, since the former
> > introduces a new subscripting without any implementation, and the latter
> > introduces an implementation for the array data type).
>
> Cool, pushed 0001.  I'm afraid I included some pgindenting, so you'll
> have to rebase again.  Maybe you already know how to do it without
> manually rebasing, but if not, a quick trick to avoid rebasing manually
> over all those whitespace changes might be to un-apply with "git show |
> patch -p1 -R", then apply your original 0001, commit, apply 0002, then
> pgindent; if you now do a git diff to the original commit, you should
> get an almost clean diff.  Or you could just try to apply with -w.

Great, thank you!



[PATCH 3/3] Add initial support for spgist quadtree @<(point,circle) operator

2019-02-01 Thread Matwey V. Kornilov
Signed-off-by: Matwey V. Kornilov 
---
 src/backend/access/spgist/spgquadtreeproc.c | 65 ---
 src/include/catalog/pg_amop.dat |  3 +
 src/test/regress/expected/create_index.out  | 96 +
 src/test/regress/sql/create_index.sql   | 32 ++
 4 files changed, 189 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/spgist/spgquadtreeproc.c 
b/src/backend/access/spgist/spgquadtreeproc.c
index 4904dbbe7b..034e706812 100644
--- a/src/backend/access/spgist/spgquadtreeproc.c
+++ b/src/backend/access/spgist/spgquadtreeproc.c
@@ -142,6 +142,36 @@ static int spg_quad_inner_consistent_box_helper(ScanKey 
sk, Point *centroid)
return r;
 }
 
+static int spg_quad_inner_consistent_circle_helper(ScanKey sk, Point *centroid)
+{
+   CIRCLE *circleQuery = DatumGetCircleP(sk->sk_argument);
+   int r = 0;
+
+   const Point cp = {
+   .x = centroid->x - circleQuery->center.x,
+   .y = centroid->y - circleQuery->center.y
+   };
+   const Point cp2 = {
+   .x = cp.x * cp.x,
+   .y = cp.y * cp.y
+   };
+   const float8 R2 = circleQuery->radius * circleQuery->radius;
+
+   const float8 x_p0 = (0. > cp.x ? 0. : cp2.x);
+   const float8 y_p0 = (0. > cp.y ? 0. : cp2.y);
+   const float8 x_n0 = (0. < cp.x ? 0. : cp2.x);
+   const float8 y_n0 = (0. < cp.y ? 0. : cp2.y);
+
+   const float8 d[4] = {x_p0 + y_p0, x_p0 + y_n0, x_n0 + y_n0, x_n0 + 
y_p0};
+
+   for (int i = 0; i < 4; i++) {
+   if (d[i] <= R2)
+   r |= (1 << (i + 1));
+   }
+
+   return r;
+}
+
 Datum
 spg_quad_choose(PG_FUNCTION_ARGS)
 {
@@ -355,7 +385,18 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
which &= (1 << 1) | (1 << 4);
break;
case RTContainedByStrategyNumber:
-   which &= 
spg_quad_inner_consistent_box_helper(sk, centroid);
+
+   switch (sk->sk_subtype) {
+   case BOXOID:
+   which &= 
spg_quad_inner_consistent_box_helper(sk, centroid);
+   break;
+   case CIRCLEOID:
+   which &= 
spg_quad_inner_consistent_circle_helper(sk, centroid);
+   break;
+   default:
+   elog(ERROR, "unrecognized right type 
OID: %d", sk->sk_subtype);
+   break;
+   }
break;
default:
elog(ERROR, "unrecognized strategy number: %d", 
sk->sk_strategy);
@@ -442,12 +483,22 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)
break;
case RTContainedByStrategyNumber:
 
-   /*
-* For this operator, the query is a box not a 
point.  We
-* cheat to the extent of assuming that 
DatumGetPointP won't
-* do anything that would be bad for a 
pointer-to-box.
-*/
-   res = SPTEST(box_contain_pt, query, datum);
+   switch (sk->sk_subtype) {
+   case BOXOID:
+   /*
+* For this operator, the query is a 
box not a point.  We
+* cheat to the extent of assuming that 
DatumGetPointP won't
+* do anything that would be bad for a 
pointer-to-box.
+*/
+   res = SPTEST(box_contain_pt, query, 
datum);
+   break;
+   case CIRCLEOID:
+   res = SPTEST(circle_contain_pt, query, 
datum);
+   break;
+   default:
+   elog(ERROR, "unrecognized right type 
OID: %d", sk->sk_subtype);
+   break;
+   }
break;
default:
elog(ERROR, "unrecognized strategy number: %d", 
sk->sk_strategy);
diff --git a/src/include/catalog/pg_amop.dat b/src/include/catalog/pg_amop.dat
index 0ab95d8a24..fb7157aa44 100644
--- a/src/include/catalog/pg_amop.dat
+++ b/src/include/catalog/pg_amop.dat
@@ -1373,6 +1373,9 @@
   amoprighttype => 'box', amopstrategy => '8', amopopr => '<@(point,box)',
   amopmethod => 'spgist' },
 { am

[PATCH 1/3] Introduce helper variable in spgquadtreeproc.c

2019-02-01 Thread Matwey V. Kornilov
Use shorter variable name instead of repeating in->scankeys[i] in
spg_quad_leaf_consistent() and spg_quad_inner_consistent().

Signed-off-by: Matwey V. Kornilov 
---
 src/backend/access/spgist/spgquadtreeproc.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/backend/access/spgist/spgquadtreeproc.c 
b/src/backend/access/spgist/spgquadtreeproc.c
index e50108e1ca..f2e980b758 100644
--- a/src/backend/access/spgist/spgquadtreeproc.c
+++ b/src/backend/access/spgist/spgquadtreeproc.c
@@ -300,10 +300,11 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
 
for (i = 0; i < in->nkeys; i++)
{
-   Point  *query = DatumGetPointP(in->scankeys[i].sk_argument);
+   const ScanKey sk = in->scankeys + i;
+   Point  *query = DatumGetPointP(sk->sk_argument);
BOX*boxQuery;
 
-   switch (in->scankeys[i].sk_strategy)
+   switch (sk->sk_strategy)
{
case RTLeftStrategyNumber:
if (SPTEST(point_right, centroid, query))
@@ -331,7 +332,7 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
 * cheat to the extent of assuming that 
DatumGetPointP won't
 * do anything that would be bad for a 
pointer-to-box.
 */
-   boxQuery = 
DatumGetBoxP(in->scankeys[i].sk_argument);
+   boxQuery = DatumGetBoxP(sk->sk_argument);
 
if 
(DatumGetBool(DirectFunctionCall2(box_contain_pt,

 PointerGetDatum(boxQuery),
@@ -358,8 +359,7 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
}
break;
default:
-   elog(ERROR, "unrecognized strategy number: %d",
-in->scankeys[i].sk_strategy);
+   elog(ERROR, "unrecognized strategy number: %d", 
sk->sk_strategy);
break;
}
 
@@ -421,9 +421,10 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)
res = true;
for (i = 0; i < in->nkeys; i++)
{
-   Point  *query = DatumGetPointP(in->scankeys[i].sk_argument);
+   const ScanKey sk = in->scankeys + i;
+   Point  *query = DatumGetPointP(sk->sk_argument);
 
-   switch (in->scankeys[i].sk_strategy)
+   switch (sk->sk_strategy)
{
case RTLeftStrategyNumber:
res = SPTEST(point_left, datum, query);
@@ -450,8 +451,7 @@ spg_quad_leaf_consistent(PG_FUNCTION_ARGS)
res = SPTEST(box_contain_pt, query, datum);
break;
default:
-   elog(ERROR, "unrecognized strategy number: %d",
-in->scankeys[i].sk_strategy);
+   elog(ERROR, "unrecognized strategy number: %d", 
sk->sk_strategy);
break;
}
 
-- 
2.13.7




[PATCH 2/3] Introduce spg_quad_inner_consistent_box_helper() in spgquadtreeproc.c

2019-02-01 Thread Matwey V. Kornilov
This helper function makes spg_quad_inner_consistent() more readable when
the changes from the next commit are introduced.

Signed-off-by: Matwey V. Kornilov 
---
 src/backend/access/spgist/spgquadtreeproc.c | 63 ++---
 1 file changed, 31 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/spgist/spgquadtreeproc.c 
b/src/backend/access/spgist/spgquadtreeproc.c
index f2e980b758..4904dbbe7b 100644
--- a/src/backend/access/spgist/spgquadtreeproc.c
+++ b/src/backend/access/spgist/spgquadtreeproc.c
@@ -112,6 +112,36 @@ getQuadrantArea(BOX *bbox, Point *centroid, int quadrant)
return result;
 }
 
+static int spg_quad_inner_consistent_box_helper(ScanKey sk, Point *centroid)
+{
+   /*
+* For this operator, the query is a box not a point.  We
+* cheat to the extent of assuming that DatumGetPointP won't
+* do anything that would be bad for a pointer-to-box.
+*/
+   BOX *boxQuery = DatumGetBoxP(sk->sk_argument);
+   Point p;
+   int r = 0;
+
+   if (DatumGetBool(DirectFunctionCall2(box_contain_pt, 
PointerGetDatum(boxQuery), PointerGetDatum(centroid
+   {
+   /* centroid is in box, so all quadrants are OK */
+   return (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4);
+   }
+
+   /* identify quadrant(s) containing all corners of box */
+   p = boxQuery->low;
+   r |= 1 << getQuadrant(centroid, &p);
+   p.y = boxQuery->high.y;
+   r |= 1 << getQuadrant(centroid, &p);
+   p = boxQuery->high;
+   r |= 1 << getQuadrant(centroid, &p);
+   p.x = boxQuery->low.x;
+   r |= 1 << getQuadrant(centroid, &p);
+
+   return r;
+}
+
 Datum
 spg_quad_choose(PG_FUNCTION_ARGS)
 {
@@ -302,7 +332,6 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
{
const ScanKey sk = in->scankeys + i;
Point  *query = DatumGetPointP(sk->sk_argument);
-   BOX*boxQuery;
 
switch (sk->sk_strategy)
{
@@ -326,37 +355,7 @@ spg_quad_inner_consistent(PG_FUNCTION_ARGS)
which &= (1 << 1) | (1 << 4);
break;
case RTContainedByStrategyNumber:
-
-   /*
-* For this operator, the query is a box not a 
point.  We
-* cheat to the extent of assuming that 
DatumGetPointP won't
-* do anything that would be bad for a 
pointer-to-box.
-*/
-   boxQuery = DatumGetBoxP(sk->sk_argument);
-
-   if 
(DatumGetBool(DirectFunctionCall2(box_contain_pt,
-   
 PointerGetDatum(boxQuery),
-   
 PointerGetDatum(centroid
-   {
-   /* centroid is in box, so all quadrants 
are OK */
-   }
-   else
-   {
-   /* identify quadrant(s) containing all 
corners of box */
-   Point   p;
-   int r = 0;
-
-   p = boxQuery->low;
-   r |= 1 << getQuadrant(centroid, &p);
-   p.y = boxQuery->high.y;
-   r |= 1 << getQuadrant(centroid, &p);
-   p = boxQuery->high;
-   r |= 1 << getQuadrant(centroid, &p);
-   p.x = boxQuery->low.x;
-   r |= 1 << getQuadrant(centroid, &p);
-
-   which &= r;
-   }
+   which &= 
spg_quad_inner_consistent_box_helper(sk, centroid);
break;
default:
elog(ERROR, "unrecognized strategy number: %d", 
sk->sk_strategy);
-- 
2.13.7




[PATCH 0/3] Introduce spgist quadtree @<(point,circle) operator

2019-02-01 Thread Matwey V. Kornilov
Hi,

This patch series adds support for the spgist quadtree @<(point,circle)
operator. The first two patches refactor existing code before
implementing the new feature. The third commit is the actual implementation,
provided with a set of simple unit tests.

Matwey V. Kornilov (3):
  Introduce helper variable in spgquadtreeproc.c
  Introduce spg_quad_inner_consistent_box_helper() in spgquadtreeproc.c
  Add initial support for spgist quadtree @<(point,circle) operator

 src/backend/access/spgist/spgquadtreeproc.c | 138 +++-
 src/include/catalog/pg_amop.dat |   3 +
 src/test/regress/expected/create_index.out  |  96 +++
 src/test/regress/sql/create_index.sql   |  32 +++
 4 files changed, 225 insertions(+), 44 deletions(-)

-- 
2.13.7




fast defaults in heap_getattr vs heap_deform_tuple

2019-02-01 Thread Andres Freund
Hi,

While working on the patch to slotify trigger.c I got somewhat confused
by the need to expand tuples in trigger.c:

static HeapTuple
GetTupleForTrigger(EState *estate,
   EPQState *epqstate,
   ResultRelInfo *relinfo,
   ItemPointer tid,
   LockTupleMode lockmode,
   TupleTableSlot **newSlot)
{
...
if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
result = heap_expand_tuple(&tuple, relation->rd_att);
else
result = heap_copytuple(&tuple);
ReleaseBuffer(buffer);

There's no explanation why GetTupleForTrigger() needs to expand tuples
when most other parts of postgres don't. Normally deforming ought to take
care of expanding missing attrs.

As far as I can tell, the reason that it's needed is that heap_getattr()
wasn't updated along with the rest of the functions. heap_deform_tuple(),
heap_attisnull() etc. look for missing attrs, but heap_getattr() doesn't.

That, to me, makes no sense. The reason that we see a difference in
regression test output is that composite_to_json() uses heap_getattr(),
if it used heap_deform_tuple (which'd be considerably faster), we'd get
the default value.

Am I missing something?

Greetings,

Andres Freund



Re: initdb --allow-group-access behaviour in windows

2019-02-01 Thread David Steele

On 2/1/19 10:14 AM, Michael Paquier wrote:

On Fri, Feb 01, 2019 at 05:34:05PM +1100, Haribabu Kommi wrote:

As Microsoft windows doesn't support POSIX style of permissions, we
always set for the permissions GUC's as not supported in
windows. Even the new GUC that is added with --allow-group-access
feature also mentioned the same.

The initdb --allow-group-access doesn't have any impact on the
microsoft windows, so I feel it should be better to write the same
in initdb docs?


Sounds right.  We may just want an extra sentence in the paragraph
describing --allow-group-access to mention that this parameter has no
effect on Windows.


need a patch?


Always.


Hrm, seems like I missed that.  Do you mind writing something that makes 
sense to Windows users and I'll have a look?


Thanks,
--
-David
da...@pgmasters.net



Re: Is zheap on track for PostgreSQL 12.0?

2019-02-01 Thread Amit Kapila
On Fri, Feb 1, 2019 at 8:29 PM Michael Goldshteyn  wrote:
>
> I just read an article about the advantages of zheap over the heap that 
> PostgreSQL currently uses and there were several posts on this list about it 
> being more compact as well as more performant. I am hopeful that it makes it 
> into PostgreSQL 12.0, but it is unclear what the current status of this 
> change is.
>
> Can one of the Postgres devs please chime in on its status for 12.0?
>

I don't think there is any chance for zheap to get into PG12.0.
However, we have made good progress by rebasing it over the pluggable
storage API (thanks to Andres for doing the legwork for that).  We have also
made quite a few performance improvements in the last few months.  Right
now, we are doing some stress testing of the core zheap and trying to
finish the remaining things like support for logical decoding.  Also,
we are trying to make progress on the undo patches, which are one of the
building blocks for zheap, by reviewing that code.  You are welcome to
help review/test zheap if you want to see more progress on it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Reorganize collation lookup time and place

2019-02-01 Thread Peter Geoghegan
On Fri, Feb 1, 2019 at 7:27 AM Peter Eisentraut
 wrote:
> This patch will probably make a reappearance when we consider making ICU
> available as the database-wide default collation.

Somebody recently reported that an ICU locale was over 2000x faster
than the equivalent glibc locale with their query due to supporting
the abbreviated keys optimization:

https://postgr.es/m/CACd=f9cO5OcmAeBK1M5a3+t8BO7E91Ki0WaLAxFm=fbbon_...@mail.gmail.com

We really should get around to supporting ICU for database-wide
collations sooner rather than later. I knew that there were cases
where glibc performs abysmally, but I didn't think it could ever be a
difference of more than two orders of magnitude until recently.

-- 
Peter Geoghegan



Re: pg_upgrade: Pass -j down to vacuumdb

2019-02-01 Thread Jesper Pedersen

Hi,

On 2/1/19 4:58 AM, Alvaro Herrera wrote:

On 2019-Feb-01, Jamison, Kirk wrote:


I'm not sure if I misunderstood the purpose of $VACUUMDB_OPTS. I thought what
the other developers suggested is to utilize it so that --jobs for vacuumdb
would be optional and based only on the user-specified $VACUUMDB_OPTS,
by which it would not reuse the number of jobs used for pg_upgrade.
To address your need for parallelism, the script would then just
suggest a --jobs option if the user wants one. The if/else for the number of
jobs is not needed in that case, maybe only the ifndef WIN32 / else case.


No Kirk, you got it right -- (some of) those ifdefs are not needed, as
adding the --jobs in the command line is not needed.  I do think some
extra whitespace in the format string is needed.



I'm confused by this.

The point of the patch is to highlight to the user that even though 
he/she does "pg_upgrade -j 8" the "-j 8" option isn't passed down to 
vacuumdb.


The default value is 1, so in that case the echo command doesn't highlight
that fact; otherwise it does. The user can then cancel the script and use
the suggested command line.


The script then executes the vacuumdb command with the $VACUUMDB_OPTS
argument as noted in the documentation.
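
So a user who wants the parallelism only has to do something like (using 8
jobs as an example):

    VACUUMDB_OPTS="--jobs 8" ./analyze_new_cluster.sh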


Sample script attached as well.

I added extra space in the --analyze-in-stages part.

Kirk, can you provide a delta patch to match what you are thinking ?

Best regards,
 Jesper

>From 4b5856725af5d971a26a2ba150c1067ee21bb242 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen 
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +-
 src/bin/pg_upgrade/check.c  | 28 ++--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..84a7ccb6f1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
  in parallel;  a good place to start is the maximum of the number of
  CPU cores and tablespaces.  This option can dramatically reduce the
  time to upgrade a multi-database server running on a multiprocessor
- machine.
+ machine. Note that this option isn't passed to the
+ vacuumdb application by default.
+ The system environment variable VACUUMDB_OPTS can be
+ used to pass extra options to the vacuumdb
+ application during the script execution.
 
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..570b7c9ae7 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1)
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data, user_opts.jobs,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s $VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s %%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2



analyze_new_cluster.sh
Description: application/shellscript


Re: Index Skip Scan

2019-02-01 Thread Jesper Pedersen

Hi,

On 1/31/19 1:31 AM, Kyotaro HORIGUCHI wrote:

At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> wrote in 


A bit of adjustment after nodes/relation -> nodes/pathnodes.


I had a look at this.

The name "index skip scan" is a different feature from the
feature with the name on other prodcuts, which means "index scan
with postfix key (of mainly of multi column key) that scans
ignoring the prefixing part" As Thomas suggested I'd suggest that
we call it "index hop scan". (I can accept Hopscotch, either:p)

Also, as mentioned upthread by Peter Geoghegan, this could easily
give a worse plan by underestimation. So I also suggest that this
have a dynamic fallback mechanism.  From that perspective it is not
suitable as an AM API level feature.

If all leaf pages are in the buffer and the average hopping
distance is less than (expectedly) 4 pages (the average height of
the tree), the skip scan will lose. If almost all leaf pages are
staying on disk, we could win with even a 2-page step (skipping over
just one page).

=
As I'm writing the above, I came to think that it's better to
implement this as a pure executor optimization.

Specifically, let _bt_steppage count the ratio of skipped pages
so far, and then if the ratio exceeds some threshold (maybe around
3/4) it gets into hopscotching mode, where it uses an index descent to
find the next page (rather than traversing). As mentioned above,
I think using skip scan to go beyond the next page is a good
bet. If the success ratio of skip scans goes below some
threshold (I'm not sure which, for now), we should fall back to
traversing.
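
In rough pseudocode, something like this inside _bt_steppage (the field
names are just made up for illustration):

    so->pages_seen++;
    if (page_was_skippable)
        so->pages_skipped++;

    if (!so->hopscotch_mode &&
        so->pages_skipped > 0.75 * so->pages_seen)
        so->hopscotch_mode = true;      /* re-descend from the root next time */
    else if (so->hopscotch_mode &&
             so->skip_successes < SOME_THRESHOLD * so->skip_attempts)
        so->hopscotch_mode = false;     /* fall back to walking right-links */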

Any opinions?



Some comments on the patch below.

+ skip scan approach, which is based on the idea of
+ https://wiki.postgresql.org/wiki/Free_Space_Map_Problems";>
+ Loose index scan. Rather than scanning all equal values of a key,
+ as soon as a new value is found, it will search for a larger value on the

I'm not sure it is a good thing to put a pointer to a rather
unstable source in the documentation.


This adds a new AM method but it seems available only for ordered
indexes, specifically btree. And it seems that the feature can be
implemented in btgettuple since btskip apparently does the same
thing. (I agree with Robert on this point in [1]).

[1] 
https://www.postgresql.org/message-id/CA%2BTgmobb3uN0xDqTRu7f7WdjGRAXpSFxeAQnvNr%3DOK5_kC_SSg%40mail.gmail.com


Related to the above, it seems better that path generation for
skip scan be a part of index scan path generation. Whether to skip
scan or not is a matter for the index scan itself.



Thanks for your valuable feedback!  And yes, calling it "Loose index scan"
or something else may be better.


Dmitry and I will look at this and take it into account for the next 
version.


For now, I have switched the CF entry to WoA.

Thanks again !

Best regards,
 Jesper



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-02-01 Thread Robert Haas
On Fri, Feb 1, 2019 at 9:00 AM Robert Haas  wrote:
> I don't think we'd be using pqmq here, or shm_mq either, but I think
> the bigger issues is that starting a parallel query is already a
> pretty heavy operation, and so the added overhead of this is probably
> not very noticeable.  I agree that it seems a bit expensive, but since
> we're already waiting for the postmaster to fork() a new process which
> then has to initialize itself, this probably won't break the bank.
> What bothers me more is that it's adding a substantial amount of code
> that could very well contain bugs to fix something that isn't clearly
> a problem in the first place.

I spent most of the last 6 hours writing and debugging a substantial
chunk of the code that would be needed.  Here's an 0006 patch that
adds functions to serialize and restore PartitionDesc in a manner
similar to what parallel query does for other object types.  Since a
PartitionDesc includes a pointer to a PartitionBoundInfo, that meant
also writing functions to serialize and restore those.  If we want to
go this route, I think the next thing to do would be to integrate this
into the PartitionDirectory infrastructure.

Basically what I'm imagining we would do there is have a hash table
stored in shared memory to go with the one that is already stored in
backend-private memory.  The shared table stores serialized entries,
and the local table stores normal ones.  Any lookups try the local
table first, then the shared table.  If we get a hit in the shared
table, we deserialize whatever we find there and stash the result in
the local table.  If we find it neither place, we generate a new entry
in the local table and then serialize it into the shared table. It's
not quite clear to me at the moment how to solve the concurrency
problems associated with this design, but it's probably not too hard.
I don't have enough mental energy left to figure it out today, though.
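
A sketch of the lookup path I have in mind (the function and field names
below are invented for illustration, they are not what the 0006 patch
defines):

    static PartitionDesc
    partdir_lookup(PartitionDirectory pdir, Oid relid)
    {
        PartitionDesc pd;
        SerializedPartitionDesc *spd;

        /* 1. try the backend-local table */
        pd = local_hash_lookup(pdir->local_table, relid);
        if (pd != NULL)
            return pd;

        /* 2. try the shared table; on a hit, deserialize and cache locally */
        spd = shared_hash_lookup(pdir->shared_table, relid);
        if (spd != NULL)
        {
            pd = RestorePartitionDesc(spd);
            local_hash_insert(pdir->local_table, relid, pd);
            return pd;
        }

        /* 3. build a fresh entry, cache it locally, then publish it */
        pd = build_partition_desc(relid);
        local_hash_insert(pdir->local_table, relid, pd);
        shared_hash_insert(pdir->shared_table, relid, SerializePartitionDesc(pd));
        return pd;
    }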

After having written this code, I'm still torn about whether to go
further with this design.  On the one hand, this is such boilerplate
code that it's kinda hard to imagine it having too many more bugs; on
the other hand, as you can see, it's a non-trivial amount of code to
add without a real clear reason, and I'm not sure we have one, even
though in the abstract it seems like a better way to go.

Still interested in hearing more opinions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0006-Serialize-and-restore-ParitionDesc-PartitionBound.patch
Description: Binary data


Re: Index Skip Scan

2019-02-01 Thread James Coleman
On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI
 wrote:
> Also, as mentioned upthread by Peter Geoghegan, this could easily
> give a worse plan by underestimation. So I also suggest that this
> have a dynamic fallback mechanism.  From that perspective it is not
> suitable as an AM API level feature.
>
> If all leaf pages are in the buffer and the average hopping
> distance is less than (expectedly) 4 pages (the average height of
> the tree), the skip scan will lose. If almost all leaf pages are
> staying on disk, we could win with even a 2-page step (skipping over
> just one page).
>
> =
> As I'm writing the above, I came to think that it's better to
> implement this as a pure executor optimization.
>
> Specifically, let _bt_steppage count the ratio of skipped pages
> so far, and then if the ratio exceeds some threshold (maybe around
> 3/4) it gets into hopscotching mode, where it uses an index descent to
> find the next page (rather than traversing). As mentioned above,
> I think using skip scan to go beyond the next page is a good
> bet. If the success ratio of skip scans goes below some
> threshold (I'm not sure which, for now), we should fall back to
> traversing.
>
> Any opinions?

Hi!

I'd like to offer a counterpoint: in cases where this is a huge win we
definitely do want this to affect cost estimation, because if it's
purely an executor optimization the index scan path may not be chosen
even when skip scanning would be a dramatic improvement.

I suppose that both requirements could be met by incorporating it into
the existing index scanning code and also modifying the costing to
(only when we have high confidence?) account for the optimization. I'm
not sure if that makes things better than the current state of the
patch or not.

James Coleman



Re: Index Skip Scan

2019-02-01 Thread Andres Freund
On 2019-02-01 16:04:58 -0500, James Coleman wrote:
> On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI
>  wrote:
> > Also, as mentioned upthread by Peter Geoghegan, this could easily
> > give a worse plan by underestimation. So I also suggest that this
> > have a dynamic fallback mechanism.  From that perspective it is not
> > suitable as an AM API level feature.
> >
> > If all leaf pages are in the buffer and the average hopping
> > distance is less than (expectedly) 4 pages (the average height of
> > the tree), the skip scan will lose. If almost all leaf pages are
> > staying on disk, we could win with even a 2-page step (skipping over
> > just one page).
> >
> > =
> > As I'm writing the above, I came to think that it's better to
> > implement this as a pure executor optimization.
> >
> > Specifically, let _bt_steppage count the ratio of skipped pages
> > so far, and then if the ratio exceeds some threshold (maybe around
> > 3/4) it gets into hopscotching mode, where it uses an index descent to
> > find the next page (rather than traversing). As mentioned above,
> > I think using skip scan to go beyond the next page is a good
> > bet. If the success ratio of skip scans goes below some
> > threshold (I'm not sure which, for now), we should fall back to
> > traversing.
> >
> > Any opinions?
> 
> I'd like to offer a counterpoint: in cases where this a huge win we
> definitely do want this to affect cost estimation, because if it's
> purely an executor optimization the index scan path may not be chosen
> even when skip scanning would be a dramatic improvement.

+many.

Greetings,

Andres Freund



Re: DNS SRV support for LDAP authentication

2019-02-01 Thread Graham Leggett
On 25 Sep 2018, at 04:09, Thomas Munro  wrote:

> Some people like to use DNS SRV records to advertise LDAP servers on
> their network.  Microsoft Active Directory is usually (always?) set up
> that way.  Here is a patch to allow our LDAP auth module to support
> that kind of discovery.

Does this support SSL/TLS?

Regards,
Graham
—




Re: New vacuum option to do only freezing

2019-02-01 Thread Bossart, Nathan
On 1/21/19, 2:23 AM, "Masahiko Sawada"  wrote:
> Understood and Agreed. I've attached the new version patch
> incorporated the review comments.

I took a look at the latest version of the patch.

+
+DISABLE_INDEX_CLEANUP
+
+ 
+  VACUUM removes dead tuples and prunes HOT-updated
+  tuples chain for live tuples on table. If the table has any dead tuple
+  it removes them from both table and indexes for re-use. With this
+  option VACUUM doesn't completely remove dead tuples
+  and disables removing dead tuples from indexes.  This is suitable for
+  avoiding transaction ID wraparound but not sufficient for avoiding
+  index bloat. This option is ignored if the table doesn't have index.
+  Also, this cannot be used in conjunction with FULL
+  option.
+ 
+
+   

IMHO we could document this feature at a slightly higher level without
leaving out any really important user-facing behavior.  Here's a quick
attempt to show what I am thinking:

With this option, VACUUM skips all index cleanup behavior and
only marks tuples as "dead" without reclaiming the storage.
While this can help reclaim transaction IDs faster to avoid
transaction ID wraparound (see Section 24.1.5), it will not
reduce bloat.  Note that this option is ignored for tables
that have no indexes.  Also, this option cannot be used in
conjunction with the FULL option, since FULL requires
rewriting the table.
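
For reference, with the patch the option would be used roughly like this
(the table name is made up; combining it with FREEZE should be fine since
only FULL is excluded):

    VACUUM (FREEZE, DISABLE_INDEX_CLEANUP) big_table;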

+   /* Notify user that DISABLE_INDEX_CLEANUP option is ignored */
+   if (!vacrelstats->hasindex && (options & VACOPT_DISABLE_INDEX_CLEANUP))
+   ereport(NOTICE,
+   (errmsg("DISABLE_INDEX_CLEANUP is ignored 
because table \"%s\" does not have index",
+   
RelationGetRelationName(onerel;

It might make more sense to emit this in lazy_scan_heap() where we
determine the value of skip_index_vacuum.

+   if (skip_index_vacuum)
+   ereport(elevel,
+   (errmsg("\"%s\": marked %.0f row 
versions as dead in %u pages",
+   
RelationGetRelationName(onerel),
+   tups_vacuumed, 
vacuumed_pages)));

IIUC tups_vacuumed will include tuples removed during HOT pruning, so
it could be inaccurate to say all of these tuples have only been
marked "dead."  Perhaps we could keep a separate count of tuples
removed via HOT pruning in case we are using DISABLE_INDEX_CLEANUP.
There might be similar problems with the values stored in vacrelstats
that are used at the end of heap_vacuum_rel() (e.g. tuples_deleted).

I would suggest adding this option to vacuumdb in this patch set as
well.

Nathan



Re: fast defaults in heap_getattr vs heap_deform_tuple

2019-02-01 Thread Andres Freund
Hi,

On 2019-02-01 08:24:04 -0800, Andres Freund wrote:
> While working on the patch to slotify trigger.c I got somewhat confused
> by the need to expand tuples in trigger.c:
> 
> static HeapTuple
> GetTupleForTrigger(EState *estate,
>EPQState *epqstate,
>ResultRelInfo *relinfo,
>ItemPointer tid,
>LockTupleMode lockmode,
>TupleTableSlot **newSlot)
> {
> ...
> if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
> result = heap_expand_tuple(&tuple, relation->rd_att);
> else
> result = heap_copytuple(&tuple);
> ReleaseBuffer(buffer);
> 
> There's no explanation why GetTupleForTrigger() needs to expand tuples
> when most other parts of postgres don't. Normally deforming ought to take
> care of expanding missing attrs.
>
> As far as I can tell, the reason that it's needed is that heap_getattr()
> wasn't updated along with the rest of the functions. heap_deform_tuple(),
> heap_attisnull() etc. look for missing attrs, but heap_getattr() doesn't.
> 
> That, to me, makes no sense. The reason that we see a difference in
> regression test output is that composite_to_json() uses heap_getattr(),
> if it used heap_deform_tuple (which'd be considerably faster), we'd get
> the default value.
> 
> Am I missing something?

Indeed, a hacky patch fixing heap_getattr makes the heap_expand_tuple()
in trigger.c unnecessary. Attached. I think we need to do something
along those lines.

Andrew, I think it'd be good to do a ground up review of the fast
defaults patch. There have been a fair number of issues in it, and this is
another pretty fundamental one.

Greetings,

Andres Freund
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index ed4549ca579..783b04a3cb9 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -71,8 +71,6 @@
 #define VARLENA_ATT_IS_PACKABLE(att) \
 	((att)->attstorage != 'p')
 
-static Datum getmissingattr(TupleDesc tupleDesc, int attnum, bool *isnull);
-
 
 /* 
  *		misc support routines
@@ -82,7 +80,7 @@ static Datum getmissingattr(TupleDesc tupleDesc, int attnum, bool *isnull);
 /*
  * Return the missing value of an attribute, or NULL if there isn't one.
  */
-static Datum
+Datum
 getmissingattr(TupleDesc tupleDesc,
 			   int attnum, bool *isnull)
 {
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 7b5896b98f9..b412c81d141 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -3412,10 +3412,15 @@ ltrmark:;
 		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 	}
 
+#ifdef IAM_THE_WRONG_FIX
 	if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
 		result = heap_expand_tuple(&tuple, relation->rd_att);
 	else
 		result = heap_copytuple(&tuple);
+#else
+	result = heap_copytuple(&tuple);
+#endif
+
 	ReleaseBuffer(buffer);
 
 	return result;
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index 873a3015fc4..81546ab46c7 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -744,6 +744,10 @@ extern Datum fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
 #endif			/* defined(DISABLE_COMPLEX_MACRO) */
 
 
+extern Datum
+getmissingattr(TupleDesc tupleDesc,
+			   int attnum, bool *isnull);
+
 /* 
  *		heap_getattr
  *
@@ -765,8 +769,7 @@ extern Datum fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
 		( \
 			((attnum) > (int) HeapTupleHeaderGetNatts((tup)->t_data)) ? \
 			( \
-(*(isnull) = true), \
-(Datum)NULL \
+getmissingattr(tupleDesc, attnum, isnull) \
 			) \
 			: \
 fastgetattr((tup), (attnum), (tupleDesc), (isnull)) \


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-01 Thread Chengchao Yu
Hi Amit, Thomas,

Thank you very much for your feedbacks! Apologizes but I just saw both messages.

> We generally reserve the space in a relation before attempting to write, so 
> not sure how you are able to hit the disk full situation via mdwrite.  If you 
> see the description of the function, that also indicates same.

Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft
due to our own storage layer, which treats mdextend() actions as only setting the
length of the file. We have a workaround, and no change is needed in Postgres.

> I am not telling that mdwrite can never lead to error, but just trying to 
> understand the issue you actually faced.  I haven't read your proposed 
> solution yet, let's first try to establish the problem you are facing.

We see transient IO errors reading a block, after which the PG instance is
deadlocked in single-user mode until we kill the instance. The stack trace below
shows the behavior: mdread() failed while the buffer was holding its LWLock. This
happens because in single-user mode there is no callback to release the lock, so
when AbortBufferIO() tries to acquire the same lock again, it waits for the lock
indefinitely.

Here is the stack trace:

0a 0004`8080cc30 0004`80dcf917 postgres!PGSemaphoreLock+0x65 
[d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158] 
0b 0004`8080cc90 0004`80db025c postgres!LWLockAcquire+0x137 
[d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234] 
0c 0004`8080ccd0 0004`80db25db postgres!AbortBufferIO+0x2c 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995] 
0d 0004`8080cd20 0004`80dbce36 postgres!AtProcExit_Buffers+0xb 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479] 
0e 0004`8080cd50 0004`80dbd1bd postgres!shmem_exit+0xf6 
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262] 
0f 0004`8080cd80 0004`80dbccfd postgres!proc_exit_prepare+0x4d 
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188] 
10 0004`8080cdb0 0004`80ef9e74 postgres!proc_exit+0xd 
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141] 
11 0004`8080cde0 0004`80ddb6ef postgres!errfinish+0x204 
[d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624] 
12 0004`8080ce50 0004`80db0f59 postgres!mdread+0x12f 
[d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806] 
13 0004`8080cea0 0004`80daeb70 postgres!ReadBuffer_common+0x2c9 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 897] 
14 0004`8080cf30 0004`80b81322 postgres!ReadBufferWithoutRelcache+0x60 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 694] 
15 0004`8080cf90 0004`80db9cbb postgres!XLogReadBufferExtended+0x142 
[d:\orcasqlagsea10\14\s\src\backend\access\transam\xlogutils.c @ 513] 
16 0004`8080cff0 0004`80b2f53a 
postgres!XLogRecordPageWithFreeSpace+0xbb 
[d:\orcasqlagsea10\14\s\src\backend\storage\freespace\freespace.c @ 254] 
17 0004`8080d030 0004`80b6eb94 postgres!heap_xlog_insert+0x36a 
[d:\orcasqlagsea10\14\s\src\backend\access\heap\heapam.c @ 8491] 
18 0004`8080f0d0 0004`80f0a13f postgres!StartupXLOG+0x1f84 
[d:\orcasqlagsea10\14\s\src\backend\access\transam\xlog.c @ 7480] 
19 0004`8080fbf0 0004`80de121e postgres!InitPostgres+0x12f 
[d:\orcasqlagsea10\14\s\src\backend\utils\init\postinit.c @ 656] 
1a 0004`8080fcd0 0004`80c92c31 postgres!PostgresMain+0x25e 
[d:\orcasqlagsea10\14\s\src\backend\tcop\postgres.c @ 3881] 
1b 0004`8080fed0 0004`80f51df3 postgres!main+0x491 
[d:\orcasqlagsea10\14\s\src\backend\main\main.c @ 235] 

Please let us know should you have more feedback. Thank you!
 
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


-Original Message-
From: Thomas Munro  
Sent: Thursday, January 24, 2019 2:32 PM
To: Amit Kapila 
Cc: Chengchao Yu ; Pg Hackers 
; Prabhat Tripathi ; Sunil 
Kamath ; Michal Primke 
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs

On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila  wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu  wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including 
> > disk full, random remote disk IO failures) happens in single user mode. We 
> > found the issue exists on both Linux and Windows in multiple postgres 
> > versions.
> >
> > 3.   Because the unable-to-write-relation-data scenario is difficult to
> > hit naturally even when reserved space is turned off, I have prepared a small
> > patch (see attachment “emulate-error.patch”) to force an error when PG
> > tries to write data to relation files. We can just apply the patch and
> > there is no need to put effort into flooding the disk with data any more.
>
> I have one question related to the way you have tried to emulate the error.
>
> @@ -840,6 +8

Don't deform column-by-column in composite_to_json

2019-02-01 Thread Andres Freund
Hi,

In https://postgr.es/m/20190201162404.onngi77f26baem4g%40alap3.anarazel.de
I noticed that composite_to_json() deforms column-by-column. Given that
it always processes all columns, that seems quite the waste of resources.

In some quick'n'dirty testing this gives a ~4% benefit in a table
without nulls and varlenas, and ~7% in one with both. I assume that if
one were to test with a bit wider table the win would be bigger.

A short test shows that it'd be slower to allocate nulls/values with
palloc rather than using MaxHeapAttributeNumber. Given that only output
functions are called from within composite_to_json(), I think that's ok.

Greetings,

Andres Freund
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index de0d0723b71..8724022df54 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1755,6 +1755,8 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 	int			i;
 	bool		needsep = false;
 	const char *sep;
+	Datum		values[MaxHeapAttributeNumber];
+	bool		nulls[MaxHeapAttributeNumber];
 
 	sep = use_line_feeds ? ",\n " : ",";
 
@@ -1772,10 +1774,10 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 
 	appendStringInfoChar(result, '{');
 
+	heap_deform_tuple(tuple, tupdesc, values, nulls);
+
 	for (i = 0; i < tupdesc->natts; i++)
 	{
-		Datum		val;
-		bool		isnull;
 		char	   *attname;
 		JsonTypeCategory tcategory;
 		Oid			outfuncoid;
@@ -1792,9 +1794,7 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 		escape_json(result, attname);
 		appendStringInfoChar(result, ':');
 
-		val = heap_getattr(tuple, i + 1, tupdesc, &isnull);
-
-		if (isnull)
+		if (nulls[i])
 		{
 			tcategory = JSONTYPE_NULL;
 			outfuncoid = InvalidOid;
@@ -1802,7 +1802,8 @@ composite_to_json(Datum composite, StringInfo result, bool use_line_feeds)
 		else
 			json_categorize_type(att->atttypid, &tcategory, &outfuncoid);
 
-		datum_to_json(val, isnull, result, tcategory, outfuncoid, false);
+		datum_to_json(values[i], nulls[i], result, tcategory, outfuncoid,
+	  false);
 	}
 
 	appendStringInfoChar(result, '}');


Re: DNS SRV support for LDAP authentication

2019-02-01 Thread Thomas Munro
On Sat, Feb 2, 2019 at 9:25 AM Graham Leggett  wrote:
> On 25 Sep 2018, at 04:09, Thomas Munro  wrote:
> > Some people like to use DNS SRV records to advertise LDAP servers on
> > their network.  Microsoft Active Directory is usually (always?) set up
> > that way.  Here is a patch to allow our LDAP auth module to support
> > that kind of discovery.
>
> Does this support SSL/TLS?

I didn't try it myself, but I found several claims that it works.  I
see complaints that it always looks up _ldap._tcp and not _ldaps._tcp,
as you might expect it would when using ldapscheme=ldaps, but that
doesn't seem to be a big problem.  As for ldaptls=1, that must work,
because TLS isn't even negotiated until after the connection is made.
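
For anyone not familiar with them, the records being looked up are
standard DNS SRV records (example values only, not taken from the patch):

    _ldap._tcp.example.com. 86400 IN SRV 0 100 389 ldap1.example.com.

i.e. a priority, a weight, a port and a target host for the service.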

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Don't deform column-by-column in composite_to_json

2019-02-01 Thread Daniel Gustafsson
> On 2 Feb 2019, at 00:21, Andres Freund  wrote:

> In https://postgr.es/m/20190201162404.onngi77f26baem4g%40alap3.anarazel.de
> I noticed that composite_to_json() deforms column-by-column. Given that
> it always processes all columns, that seems quite the waste of resources.
> 
> In some quick'n'dirty testing this gives a ~4% benefit in a table
> without nulls and varlenas, and ~7% in one with both. I assume that if
> one were to test with a bit wider table the win would be bigger.
> 
> A short test shows that it'd be slower to allocate nulls/values with
> palloc rather than using MaxHeapAttributeNumber. Given that only output
> functions are called from within composite_to_json(), I think that's ok.

Nice catch; the patch looks good to me.  composite_to_jsonb() has the same
construction, processing every attribute.  Should it get a similar patch to
this one?

cheers ./daniel



Re: Delay locking partitions during query execution

2019-02-01 Thread Tomas Vondra
On 2/1/19 2:51 PM, Robert Haas wrote:
>> (I admit to not having the best grasp on how all this works, so feel
>> free to shoot this down)
>>
> I think the key question here is whether or not you can cope with
> someone having done arbitrary AEL-requiring modifications to the
> delaylocked partitions.  If you can, the fact that the plan might
> sometimes be out-of-date is an inevitable consequence of doing this at
> all, but I think we can argue that it's an acceptable consequence as
> long as the resulting behavior is not too bletcherous.  If you can't,
> then this patch is dead.

I'm a bit confused - does the patch actually change anything here? As
demonstrated earlier in this thread, it actually behaves just like
master. No?

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Delay locking partitions during query execution

2019-02-01 Thread David Rowley
On Sat, 2 Feb 2019 at 13:43, Tomas Vondra  wrote:
>
> On 2/1/19 2:51 PM, Robert Haas wrote:
> >> (I admit to not having the best grasp on how all this works, so feel
> >> free to shoot this down)
> >>
> > I think the key question here is whether or not you can cope with
> > someone having done arbitrary AEL-requiring modifications to the
> > delaylocked partitions.  If you can, the fact that the plan might
> > sometimes be out-of-date is an inevitable consequence of doing this at
> > all, but I think we can argue that it's an acceptable consequence as
> > long as the resulting behavior is not too bletcherous.  If you can't,
> > then this patch is dead.
>
> I'm a bit confused - does the patch actually change anything here? As
> demonstrated earlier in this thread, it actually behaves just like
> master. No?

I guess Robert meant cases where it would, say, fail to execute a
cached plan with an "unable to open relation ..." error after someone
concurrently did something like ALTER TABLE ... SET TABLESPACE on one
of the partitions.

This particular case can't happen with the patch since we always
accept invalidation messages after we obtain a new lock, but I'm
working through various scenarios just in case there is something that
could invalidate the plan, rather than just the relcache entry.
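
For context, that ordering is roughly what lmgr.c's LockRelationOid()
does today -- paraphrased from memory and simplified (the special case
for shared catalogs is omitted), so treat this as a sketch rather than
a verbatim copy:

void
LockRelationOid(Oid relid, LOCKMODE lockmode)
{
	LOCKTAG		tag;
	LockAcquireResult res;

	SET_LOCKTAG_RELATION(tag, MyDatabaseId, relid);

	res = LockAcquire(&tag, lockmode, false, false);

	/*
	 * Having acquired a lock we did not already hold, absorb any pending
	 * shared-invalidation messages before the relation is used, so that a
	 * stale relcache entry gets flushed or rebuilt first.
	 */
	if (res != LOCKACQUIRE_ALREADY_HELD)
		AcceptInvalidationMessages();
}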


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: initdb --allow-group-access behaviour in windows

2019-02-01 Thread Michael Paquier
On Fri, Feb 01, 2019 at 07:14:19PM +0200, David Steele wrote:
> Hrm, seems like I missed that.  Do you mind writing something that makes
> sense to Windows users and I'll have a look?

Perhaps something like the attached?
--
Michael
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 4489b585c7..c3edc243b2 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -202,7 +202,9 @@ PostgreSQL documentation
   

 Allows users in the same group as the cluster owner to read all cluster
-files created by initdb.
+files created by initdb.  This option is irrelevant
+on Windows as it does not support
+POSIX-style group permissions.

   
  


signature.asc
Description: PGP signature


Re: pgbench - add pseudo-random permutation function

2019-02-01 Thread Michael Paquier
On Thu, Oct 25, 2018 at 08:21:27AM +0200, Fabien COELHO wrote:
> Thanks for the hint. Here is an updated patch using this marker.
> 
> I noticed that some instances use a closing "*-" sequence, but it does
> not seem useful, so I left it out.
> 
> I have also tried to improve a few comments, and moved a declaration into a
> loop because I did not like the pg-indented version much.

This patch has been marked as ready for committer for some time now, and it
has rotted, so I am moving it to the next CF, waiting on author.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid

2019-02-01 Thread Michael Paquier
On Sun, Dec 02, 2018 at 04:41:06PM -0800, Noah Misch wrote:
> Here's a version fixing that test for post-cfdf4dc backends.

This patch set still applies and needs review, so moved to next CF.
--
Michael


signature.asc
Description: PGP signature


Re: PostgreSQL vs SQL/XML Standards

2019-02-01 Thread Michael Paquier
On Thu, Jan 31, 2019 at 04:26:31PM +0100, Pavel Stehule wrote:
> I'll mark this patch as ready for commiters.

For now I have moved the patch to the next CF, with the same status.
--
Michael


signature.asc
Description: PGP signature


Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2019-02-01 Thread Michael Paquier
On Fri, Nov 16, 2018 at 01:35:15PM -0500, Tom Lane wrote:
> I wonder whether we'd be better off thinking of a way to let FDWs
> invent additional system column IDs for their tables, so that
> something like a remote table OID could be represented in the
> natural way as a Var with negative varattno.  This'd potentially
> also be a win for FDWs whose underlying storage has a row identifier,
> but it's not of type "tid".  Instead of trying to shoehorn their
> row ID into SelfItemPointerAttributeNumber, they could define a
> new system column that has a more appropriate data type.  Admittedly
> there'd be some infrastructure work to do to make this happen, maybe
> a lot of it; but it's a bullet we really need to bite at some point.

This patch got zero input over the last couple of months.  As it is
classified as a bug fix, I have moved it to the next CF, waiting on author.
Fujita-san, are you planning to look at it?
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2019-02-01 Thread Michael Paquier
On Sat, Nov 24, 2018 at 05:58:38PM +0100, Fabien COELHO wrote:
>> Unfortunately, there was no activity over the last few commitfests and the
>> proposed patch pgbench-tap-progress-6 can't be applied anymore without
>> conflicts. Fabien, what are your plans about it, could you please post a
>> rebased version?
> 
> Here it is.

I have moved this patch to CF 2019-03, waiting for review.
--
Michael


signature.asc
Description: PGP signature


Re: tab-completion debug print

2019-02-01 Thread Michael Paquier
On Fri, Feb 01, 2019 at 11:07:02AM +0900, Michael Paquier wrote:
> On Thu, Jan 31, 2019 at 04:53:49AM -0800, Andres Freund wrote:
> > My impression is that this patch ought to marked as rejected?
> 
> I may be missing something, but I have the same impression.

Re-reading the thread I think that's too severe.  Tom has mentioned
something interesting here:
https://www.postgresql.org/message-id/f6698.1543206...@sss.pgh.pa.us

So I am marking that stuff as returned with feedback.
--
Michael


signature.asc
Description: PGP signature


Re: [PROPOSAL]a new data type 'bytea' for ECPG

2019-02-01 Thread Michael Paquier
On Fri, Feb 01, 2019 at 10:29:11AM +0100, Michael Meskes wrote:
> I have no idea where the footnote comes from, but I agree that it doesn't seem
> to make sense. The datatype varchar in the C code is handled by the
> preprocessor and replaced by a struct definition anyway.
> 
> Feel free to remove.

Moved to the next CF as the discussion is still ongoing.
--
Michael


signature.asc
Description: PGP signature


Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-02-01 Thread Michael Paquier
Hi all,

Based on the latest emails exchanged, the patch received some feedback from
Pavan which has not been addressed, so I am marking the patch as
returned with feedback.
--
Michael


signature.asc
Description: PGP signature


Re: Connection slots reserved for replication

2019-02-01 Thread Michael Paquier
On Thu, Jan 31, 2019 at 12:08:18PM +0100, Oleksii Kliukin wrote:
> Thanks, set it to RFC.

Oh, I just noticed this patch while doing my vacuum homework.  I have
added my name as committer; please wait a bit until the CF is closed
before I begin looking at it.
--
Michael


signature.asc
Description: PGP signature


Re: WIP: Avoid creation of the free space map for small tables

2019-02-01 Thread Amit Kapila
On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila  wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
>  wrote:
> >
> > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila  wrote:
> > > There are a few buildfarm failures due to this commit, see my email on
> > > pgsql-committers.  If you have time, you can also once look into
> > > those.
> >
> > I didn't see anything in common with the configs of the failed
> > members. None have a non-default BLCKSZ that I can see.
> >
>
> I have done an analysis of the different failures on buildfarm.
>

In the past few days, we have done further analysis of each problem
and tried to reproduce it.  We succeeded in generating some form of
reproducer for 3 out of 4 problems, failing in the same way as in the
buildfarm.  For the fourth symptom we have tried a lot (Andrew Dunstan
even helped us run the regression tests with the faulty commit on
Jacana for many hours, but it didn't get reproduced), yet we were not
able to regenerate a failure in a similar way.  However, I have a
theory, explained below, for why that particular test could fail, and
the fix for it is included in the patch.  I am planning to push the
latest version of the patch [1], which has fixes for all the symptoms.
Does anybody have any opinion here?

>
> 4.  Failure on jacana:
> --- 
> c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out
> 2018-09-26
> 17:53:33 -0400
> +++ 
> c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out
> 2019-01-27 23:14:35
> -0500
> @@ -252,332 +252,7 @@
>  ('(0,100)(0,infinity)'),
>  ('(-infinity,0)(0,infinity)'),
>  ('(-infinity,-infinity)(infinity,infinity)');
> -SET enable_seqscan = false;
> -SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
> ..
> ..
> TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
> "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c",
> Line:
> 1118)
> ..
> 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG:  server process
> (PID 14388) exited with exit code 3
> 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL:  Failed process
> was running: INSERT INTO box_temp
> VALUES (NULL),
>
> I think the reason for this failure is the same as the previous one (as
> mentioned in point-3), but it can happen in a different way.  Say we have
> searched the local map and then try to extend relation 'X', and in the
> meantime another backend has extended it enough to create its FSM.
> Now we will reuse that page and won't clear the local map.  Next, say we
> try to insert into relation 'Y', which doesn't have an FSM.  It will try
> to set the local map, find that it already exists, and fail.
> The question, then, is how this can happen in the box.sql test.  I guess
> it is happening for some system table which is being populated by the
> CREATE INDEX statement executed just before the failing INSERT.
>
> I think both 3 and 4 are timing issues, which is why we didn't hit them
> in our local regression runs.
>

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B3ajhRPC0jvUi6p_aMrTUpB568OBH10LrbHtvOLNTgqQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

2019-02-01 Thread Peter Geoghegan
On Wed, Jan 23, 2019 at 10:59 AM Peter Geoghegan  wrote:
> > The fix here must be to normalize index tuples that are compressed
> > within amcheck, both during initial fingerprinting, and during
> > subsequent probes of the Bloom filter in bt_tuple_present_callback().
>
> I happened to talk to Andres about this in person yesterday. He
> thought that there was reason to be concerned about the need for
> logical normalization beyond TOAST issues. Expression indexes were a
> particular concern, because they could in principle have a change in
> the on-disk representation without a change of logical values -- false
> positives could result. He suggested that the long term solution was
> to bring hash operator class hash functions into Bloom filter hashing,
> at least where available.

I think that the best way forward is to normalize to compensate for
inconsistent input datum TOAST state, and leave it at that. ISTM that
logical normalization beyond that (based on hashing, or anything else)
creates more problems than it solves. I am concerned about cases like
INCLUDE indexes (which may have datums that lack even a B-Tree
opclass), and about the logical-though-semantically-relevant facets of
some datatypes such as numeric's display scale. If I can get an
example from Andres of a case where further logical normalization is
necessary to avoid false positives with expression indexes, that may
change things. (BTW, I implemented another amcheck enhancement that
searches indexes from the root to find matches -- the code is a
trivial addition to the new patch series I'm working on, and seems
like a better way to do enhanced logical normalization if that proves
to be truly necessary.)

Attached draft patch fixes the bug by doing fairly simple
normalization. I think that TOAST compression of datums in indexes is
fairly rare in practice, so I'm not very worried about the fact that
this won't perform as well as it could with indexes that have a lot of
compressed datums. I think that the interface I've added might need to
be expanded for other things in the future (e.g., to make amcheck work
with nbtree-native duplicate compression), and not worrying about the
performance too much helps with that goal.
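
To make the shape of the fix concrete, the normalization amounts to
something like the sketch below.  This is illustrative only, with
hypothetical names, and it ignores details the real patch presumably has
to handle (such as making sure the rebuilt tuple doesn't simply get
compressed again):

/*
 * Decompress any compressed varlena datums so that fingerprinting and the
 * later Bloom filter probes both see one canonical representation,
 * regardless of the compression decisions made at index_form_tuple() time.
 */
static IndexTuple
normalize_index_tuple_sketch(Relation idxrel, IndexTuple itup)
{
	TupleDesc	tupdesc = RelationGetDescr(idxrel);
	Datum		values[INDEX_MAX_KEYS];
	bool		isnull[INDEX_MAX_KEYS];
	bool		changed = false;
	int			i;

	index_deform_tuple(itup, tupdesc, values, isnull);

	for (i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (!isnull[i] && att->attlen == -1 &&
			VARATT_IS_COMPRESSED(DatumGetPointer(values[i])))
		{
			/* replace the compressed datum with its uncompressed form */
			values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
			changed = true;
		}
	}

	/* only build a new tuple when something actually changed */
	return changed ? index_form_tuple(tupdesc, values, isnull) : itup;
}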

I'll pick this up next week, and likely commit a fix by Wednesday or
Thursday if there are no objections. I'm not sure if the test case is
worth including.

-- 
Peter Geoghegan


0001-Avoid-amcheck-TOAST-compression-inconsistencies.patch
Description: Binary data


Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance

2019-02-01 Thread Alvaro Herrera
On 2019-Jan-31, Lætitia Avrot wrote:

> Hi,
> 
> Thanks to Emil Iggland's kind review, I added some clarifications to the
> documentation for the hyperbolic functions.

Hello

I see that in dtanh() you set errno to 0 before calling tanh(), but 1)
you don't check for it afterwards (seems like you should be checking for
ERANGE, as well as checking the return value for isinf()), and 2) you
don't do that in dsinh() and dcosh() and I'm not quite sure I see why.
What's up with that?
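
To be explicit about the kind of check I mean, I'd expect all three
functions to follow the usual float.c error-checking pattern, something
along these lines (a rough sketch of the idea, not a demand for this
exact code):

Datum
dtanh(PG_FUNCTION_ARGS)
{
	float8		arg1 = PG_GETARG_FLOAT8(0);
	float8		result;

	errno = 0;
	result = tanh(arg1);

	/* report overflow signalled either via errno or an infinite result */
	if (errno == ERANGE || isinf(result))
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("value out of range: overflow")));

	PG_RETURN_FLOAT8(result);
}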

Thanks,

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-01 Thread Amit Kapila
On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu  wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedback! Apologies, but I just saw both
> messages.
>
> > We generally reserve the space in a relation before attempting to write, so 
> > not sure how you are able to hit the disk full situation via mdwrite.  If 
> > you see the description of the function, that also indicates same.
>
> Absolutely agree, this isn’t a PG issue. The issue manifests for us at
> Microsoft due to our own storage layer, which treats mdextend() actions as
> setting the length of the file only. We have a workaround, and no change is
> needed in Postgres.
>
> > I am not telling that mdwrite can never lead to error, but just trying to 
> > understand the issue you actually faced.  I haven't read your proposed 
> > solution yet, let's first try to establish the problem you are facing.
>
> We see transient IO errors reading a block where the PG instance gets
> deadlocked in single-user mode until we kill the instance. The stack trace
> below shows the behavior: mdread() failed while the buffer's LWLock was
> still held. This happens because in single-user mode there is no callback
> to release the lock, and when AbortBufferIO() tries to acquire the same
> lock again, it waits for the lock indefinitely.
>

I think you can register your patch for next CF [1] so that we don't
forget about it.

[1] - https://commitfest.postgresql.org/22/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Block level parallel vacuum

2019-02-01 Thread Amit Kapila
On Fri, Feb 1, 2019 at 2:49 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 30, 2019 at 2:06 AM Haribabu Kommi  
> wrote:
>
> Thank you. I'll submit the updated patch set.
>

I don't see any chance of getting this committed in the next few days,
so I have moved it to the next CF.  Thanks for working on this; I hope
you will continue working on this project.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: New vacuum option to do only freezing

2019-02-01 Thread Michael Paquier
On Fri, Feb 01, 2019 at 10:43:37PM +, Bossart, Nathan wrote:
> I would suggest adding this option to vacuumdb in this patch set as
> well.

Doing so would be simple enough; for now I have moved it to the next CF.
--
Michael


signature.asc
Description: PGP signature


Re: Installation instructions update (pg_ctl)

2019-02-01 Thread Michael Paquier
On Thu, Jan 31, 2019 at 05:09:09PM +0100, Michael Banck wrote:
> Sorry, I didn't look at it during that (November) commitfest because it
> was submitted late, and then I forgot about it.
> 
> I think the changes are fine and I've marked it Ready for Committer.

Agreed, so committed.
--
Michael


signature.asc
Description: PGP signature


Re: initdb --allow-group-access behaviour in windows

2019-02-01 Thread David Steele

On 2/2/19 3:05 AM, Michael Paquier wrote:

+files created by initdb.  This option is irrelevant
+on Windows as it does not support
+POSIX-style group permissions.


How about:

+files created by initdb.  This option is ignored
+on Windows, which does not support
+POSIX-style group permissions.

Regards,
--
-David
da...@pgmasters.net