Re: Statement-level Triggers For Uniqueness Checks

2018-12-25 Thread Dean Rasheed
On Mon, 24 Dec 2018 at 23:57, Corey Huinker  wrote:
>
> So I took a first pass at this, and I got stuck.
>
> [snip]
>
> Any idea where I went wrong?

Take a look at this code in AfterTriggerSaveEvent():

/*
 * If the trigger is a deferred unique constraint check trigger, only
 * queue it if the unique constraint was potentially violated, which
 * we know from index insertion time.
 */
if (trigger->tgfoid == F_UNIQUE_KEY_RECHECK)
{
if (!list_member_oid(recheckIndexes, trigger->tgconstrindid))
continue;   /* Uniqueness definitely not violated */
}

If you trace it back, you'll see that for a statement-level trigger,
recheckIndexes will always be empty.
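For context, here is a rough sketch (simplified and from memory, not the
actual source) of where that per-row list comes from in ExecInsert(); the
statement-level trigger path queues its event without any such list, so the
check above always skips the recheck trigger for it. Names and argument
lists below are approximate:

    /* per row, at index insertion time */
    List   *recheckIndexes = NIL;

    if (resultRelInfo->ri_NumIndices > 0)
        recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
                                               estate, false, NULL, NIL);

    /* AFTER ROW triggers get the per-row recheck list ... */
    ExecARInsertTriggers(estate, resultRelInfo, tuple, recheckIndexes, NULL);

    /* ... whereas AFTER STATEMENT triggers are queued with no such list. */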

Regards,
Dean



Re: [PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-12-25 Thread Michael Paquier
On Fri, Nov 30, 2018 at 12:24:04PM +1300, Thomas Munro wrote:
> The tests pass and show the feature working correctly.  I think this
> is getting close to committable.  I see that Magnus has signed up as
> committer.

It has been one month since this message, and the patch is marked as
ready for committer.  Magnus, are you planning to look at it?
--
Michael




Re: pg_dumpall --exclude-database option

2018-12-25 Thread Fabien COELHO


Hello Andrew,


Rebased and updated patch attached.


Here is a review of v5, sorry for the delay.

Patch applies cleanly, compiles, "make check" is ok.

I do not see Michaël's issue, and do not see how it could occur: for me, the 
whole database-specific section generated by the underlying "pg_dump" call 
is removed, as expected.


All is well for me, so I have marked the patch as ready.


While poking around the dump output, I noticed some unrelated points:

* Command "pg_dump" could tell which database is dumped in the output at 
the start of the section, eg:


  --
  -- PostgreSQL database "foo" dump
  --

Or "pg_dumpall" could issue a comment line in the output telling which 
database is being considered.


* The database dumps should have an introductory comment, like there is 
one for roles, eg:


  --
  -- Databases
  --

* On extensions, the dump creates both the extension and the extension 
comment. However, ISTM that the extension comment is already created by 
the extension, so this is redundant:


 --
 -- Name: pg_dirtyread; Type: EXTENSION; Schema: -; Owner:
 --
 CREATE EXTENSION IF NOT EXISTS pg_dirtyread WITH SCHEMA public;

 --
 -- Name: EXTENSION pg_dirtyread; Type: COMMENT; Schema: -; Owner:
 --
 COMMENT ON EXTENSION pg_dirtyread IS 'Read dead but unvacuumed rows from 
table';

Maybe pg_dump should notice that the comment belongs to the extension and 
need not be dumped separately?


--
Fabien.

Re: Statement-level Triggers For Uniqueness Checks

2018-12-25 Thread Dean Rasheed
On Tue, 25 Dec 2018 at 08:04, Dean Rasheed  wrote:
> Take a look at this code in AfterTriggerSaveEvent():
>

Note that the intention behind that code is that in the (fairly
common) case where an insert or update operation is known to not lead
to any potential PK/UNIQUE index violations, the overhead for the
deferred recheck is minimal. For example, consider a bulk insert where
IDs come from a sequence known to not overlap with existing IDs. In
that case, each new ID is determined at index insertion time to not
possibly conflict with any existing ID, recheckIndexes remains empty
for each row, and no recheck triggers are ever queued.

One of the tricky things about replacing the current rechecks with
statement level triggers will be working out how to deal with such
cases (or cases with just a tiny fraction of keys to be rechecked)
without introducing a large overhead.

Regards,
Dean



Re: pg_dumpall --exclude-database option

2018-12-25 Thread Michael Paquier
On Tue, Dec 25, 2018 at 09:36:05AM +0100, Fabien COELHO wrote:
> I do not see Michaël's issue, and do not see how it could occur: for me, the
> whole database-specific section generated by the underlying "pg_dump" call
> is removed, as expected.
> 
> All is well for me, so I have marked the patch as ready.

Sorry for the noise.  I have been double-checking what I said
previously and I am in the wrong.

>   --
>   -- PostgreSQL database "foo" dump
>   --
> 
> Or "pg_dumpall" could issue a comment line in the output telling which
> database is being considered.

It could additionally be nice to mention in the closing comment which
database dump has been completed.
--
Michael




Re: Online verification of checksums

2018-12-25 Thread Fabien COELHO



Hello Michael,


Yeah, new rebased version attached.


Patch v8 applies cleanly, compiles, global & local make check are ok.

A few comments:

About the added tests: the node is left running at the end of the script, 
which is not very clean. I'd suggest either moving the added checks before 
the node is stopped, or stopping it again at the end of the script, depending 
on the intention.


I'm wondering (possibly again) about the existing early exit when one block 
cannot be read on retry: the command should count this as a kind of bad 
block, proceed with checking the other files, and obviously fail in the end, 
but only after having checked everything else and generated a report. I do 
not think that this condition warrants a full stop. ISTM that under rare race 
conditions (eg, an unlucky concurrent "drop database" or "drop table") this 
could happen when online, although I could not trigger one despite heavy 
testing, so I'm possibly mistaken.
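As an illustration only (the variable names are mine, not taken from the
actual pg_verify_checksums.c code), the suggested behaviour could look
roughly like this inside the block-reading loop:

    /*
     * Sketch: treat a block that cannot be re-read on retry as bad and keep
     * going, instead of exiting immediately.
     */
    if (read(f, buf, BLCKSZ) != BLCKSZ)
    {
        fprintf(stderr, _("%s: could not re-read block %u in file \"%s\"\n"),
                progname, blockno, fn);
        badblocks++;    /* count it as a failure for the final report */
        continue;       /* but keep checking the remaining blocks and files */
    }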


--
Fabien.



Re: pg_dumpall --exclude-database option

2018-12-25 Thread Fabien COELHO


Hello Michaël,


I do not see Michaël's issue [...]


Sorry for the noise.


No big deal!


  -- PostgreSQL database "foo" dump

Or "pg_dumpall" could issue a comment line in the output telling which
database is being considered.


It could additionally be nice to mention in the closing comment which
database dump has been completed.


Indeed, it should be consistent.

Happy holidays!

--
Fabien.

Feature: temporary materialized views

2018-12-25 Thread Mitar
Hi!

Sometimes materialized views are used to cache a complex query on
which a client works, and after the client disconnects the materialized
view should be dropped. Regular VIEWs and TABLEs both support temporary
versions, which are automatically dropped at the end of the session. It
seems easy to add the same thing for materialized views as well. See the
attached PoC patch.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index d01b258b65..996fe8f53d 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -278,17 +278,12 @@ ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
 	Assert(query->commandType == CMD_SELECT);
 
 	/*
-	 * For materialized views, lock down security-restricted operations and
-	 * arrange to make GUC variable changes local to this command.  This is
-	 * not necessary for security, but this keeps the behavior similar to
-	 * REFRESH MATERIALIZED VIEW.  Otherwise, one could create a materialized
-	 * view not possible to refresh.
+	 * For materialized views, arrange to make GUC variable changes local
+	 * to this command.
 	 */
 	if (is_matview)
 	{
 		GetUserIdAndSecContext(&save_userid, &save_sec_context);
-		SetUserIdAndSecContext(save_userid,
-			   save_sec_context | SECURITY_RESTRICTED_OPERATION);
 		save_nestlevel = NewGUCNestLevel();
 	}
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 2c2208ffb7..e59e53b154 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -420,7 +420,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
 %type <boolean> opt_trusted opt_restart_seqs
 %type <ival>	 OptTemp
-%type <ival>	 OptNoLog
+%type <ival>	 OptTempNoLog
 %type <oncommit> OnCommitOption
 
 %type <ival>	for_locking_strength
@@ -4054,7 +4054,7 @@ opt_with_data:
  */
 
 CreateMatViewStmt:
-		CREATE OptNoLog MATERIALIZED VIEW create_mv_target AS SelectStmt opt_with_data
+		CREATE OptTempNoLog MATERIALIZED VIEW create_mv_target AS SelectStmt opt_with_data
 {
 	CreateTableAsStmt *ctas = makeNode(CreateTableAsStmt);
 	ctas->query = $7;
@@ -4067,7 +4067,7 @@ CreateMatViewStmt:
 	$5->skipData = !($8);
 	$$ = (Node *) ctas;
 }
-		| CREATE OptNoLog MATERIALIZED VIEW IF_P NOT EXISTS create_mv_target AS SelectStmt opt_with_data
+		| CREATE OptTempNoLog MATERIALIZED VIEW IF_P NOT EXISTS create_mv_target AS SelectStmt opt_with_data
 {
 	CreateTableAsStmt *ctas = makeNode(CreateTableAsStmt);
 	ctas->query = $10;
@@ -4096,8 +4096,11 @@ create_mv_target:
 }
 		;
 
-OptNoLog:	UNLOGGED	{ $$ = RELPERSISTENCE_UNLOGGED; }
-			| /*EMPTY*/	{ $$ = RELPERSISTENCE_PERMANENT; }
+OptTempNoLog:
+		TEMPORARY	{ $$ = RELPERSISTENCE_TEMP; }
+		| TEMP		{ $$ = RELPERSISTENCE_TEMP; }
+		| UNLOGGED	{ $$ = RELPERSISTENCE_UNLOGGED; }
+		| /*EMPTY*/	{ $$ = RELPERSISTENCE_PERMANENT; }
 		;
 
 


Is there any way that one of the Postgres Background/Utility process may go down?

2018-12-25 Thread rajan
Is there any way that one of the Postgres background processes may go down,
meaning the process gets stopped?

For example, can the wal sender process alone stop working? If it does,
which part of the logs must I check to proceed further?



--
Thanks,
Rajan.
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Progress reporting for pg_verify_checksums

2018-12-25 Thread Fabien COELHO



Hello Michael,


V5 attached.


Patch applies cleanly, compiles, global & local make check are ok.

Given that the specific output is not checked, I do not think that the -P 
check deserves a test of its own; the -P option could simply be added to one 
of the existing tests.


I'm still unhappy that "kB" is used for 1024 bytes in the output, contrary 
to all existing standards (1 kB = 1000 bytes -- SI; 1 KB = 1 KiB = 1024 
bytes -- IEC & JEDEC). The fact that this is also used wrongly elsewhere 
in pg is not relevant; those are bugs to be fixed, not replicated.


Given the speed of verifying checksums and its storage-oriented status, I 
also still think that a (possibly fractional) MB (1,000,000 bytes), or 
even GB, is the right unit to use for reporting this progress. On my 
laptop (SSD), verifying runs at least at 1.26 GB/s (on one small test), 
there is no point in displaying kilobytes progress.


I still think that using a more precise time than time(), eg with the 
existing macros from "instr_time.h", would not cost anything more and would 
give more precise output. Switching to double would also allow removing the 
check that guards against a division by zero.
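A minimal sketch of what using those macros could look like (this is not
part of the patch, and the function and variable names are illustrative):

    #include "postgres_fe.h"
    #include "portability/instr_time.h"

    static void
    report_speed(instr_time scan_started, int64 current_size)
    {
        instr_time  now;
        double      elapsed;        /* fractional seconds */

        INSTR_TIME_SET_CURRENT(now);
        INSTR_TIME_SUBTRACT(now, scan_started);
        elapsed = INSTR_TIME_GET_DOUBLE(now);

        /*
         * With a fractional elapsed time, the whole-second zero that time()
         * yields right after startup no longer needs special-casing.
         */
        fprintf(stderr, "%.2f MB/s", (current_size / 1000000.0) / elapsed);
    }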


If the check is performed while online (other patch in queue), then the 
size may change, so the report may not reach 100%, or may go beyond it. No 
big deal.


I'd consider inverting the sizeonly boolean, so that true does the check 
and false does only the size collection. It seems more logical to me if it 
performs more work with true than with false.


--
Fabien.



Re: Progress reporting for pg_verify_checksums

2018-12-25 Thread Fabien COELHO



Given the speed of verifying checksums and its storage-oriented status, I 
also still think that a (possibly fractional) MB (1,000,000 bytes), or even 
GB, is the right unit to use for reporting this progress. On my laptop (SSD), 
verifying runs at least at 1.26 GB/s (on one small test), there is no point 
in displaying kilobytes progress.


Obviously the file is cached by the system at such a speed, but still most 
disks should provide dozens of MB per second of read bandwidth. If GB is 
used, it should use a fractional display (eg 1.25 GB) though.


--
Fabien.



Re: pg_dump multi VALUES INSERT

2018-12-25 Thread Fabien COELHO



Hello Surafel,


Thank you for informing, Here is an updated patch against current master


Patch applies cleanly, compiles, "make check" is okay, but given that the 
feature is not tested...


Feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided: 
this feature is seldom used (hmmm... 2 existing instances) because it 
interferes with argument processing when such a switch is used as the last 
one. It is only okay for commands which do not expect positional arguments. 
For backward compatibility, this suggests adding another switch, eg 
--insert-multi=100 or whatever, which would possibly default to 100. The 
alternative is to break compatibility by making the argument mandatory, 
but I guess that would not be acceptable to committers.


Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest to 
avoid it and use some stricter option and error out on malformed integers.
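For illustration, a stricter parse along these lines (the helper name is
mine, not from the patch) would reject "1zzz" and out-of-range values:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdlib.h>

    /* Illustrative helper: accept only a well-formed positive integer. */
    static bool
    parse_rows_per_insert(const char *arg, long *result)
    {
        char   *endptr;

        errno = 0;
        *result = strtol(arg, &endptr, 10);
        if (errno != 0 || endptr == arg || *endptr != '\0' || *result <= 0)
            return false;       /* malformed, out of range, or non-positive */
        return true;
    }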


Neither the --help output nor the documentation describes the new argument 
to --inserts.


There is an indentation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding 
function, I'd suggest simply extending that function so that it takes the 
"multi insert" setting into account and defaults to the previous behavior 
when the setting is not given.


--
Fabien.



RE: Timeout parameters

2018-12-25 Thread Fabien COELHO



I'm not sure I understand the use case you have that needs these new 
extensions.



If you face the following situation, this parameter will be needed.
1. The connection between the server and the client has been established
normally.
2. A server process has received an SQL statement.
3. The server OS can return an ACK packet, but it takes time to execute
the SQL statement or to return the result because the server process is
very busy.
4. The client wants to close the connection while leaving the job to the
server.
In this case, "statement_timeout" can't satisfy point 4.


Why?

ISTM that "leaving the job" to the server with a client-side connection 
closed is basically an abort, no different from what server-side 
"statement_timeout" already provides?


Also, from a client perspective, if you use statement_timeout, the query 
would time out, then the client would process the error, and the connection 
would be ready for the next query without needing to be re-created, which 
is quite costly anyway? Also, if the server is busy, recreating a 
connection is expensive, so it won't help much, really?


So from your explanation above I must admit that I do not clearly 
understand the use case for a client-side libpq-level SQL statement 
timeout. I still need some convincing.


About the implementation, I'm wondering whether something simpler could be 
done. Check how psql implements "ctrl-c" to abort a running query: it 
seems that it sends a cancel message, so there is no need to actually abort 
the connection?
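For reference, the cancel machinery in question (this is what psql's ctrl-c
handler is built on) looks roughly like this at the libpq level; the wrapper
function itself is only an illustration:

    #include <stdio.h>
    #include "libpq-fe.h"

    /* Ask the server to cancel the running query without dropping the
     * connection. */
    static void
    cancel_running_query(PGconn *conn)
    {
        PGcancel   *cancel = PQgetCancel(conn);
        char        errbuf[256];

        if (cancel == NULL)
            return;

        if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
            fprintf(stderr, "could not send cancel request: %s\n", errbuf);

        PQfreeCancel(cancel);
    }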



I think that there is some kind of a misnomer: this is not a
socket-level timeout, but a client-side query timeout, so it should be
named differently?

Yes, I think so.


Hmmm "client_statement_timeout" maybe?

--
Fabien.



RE: Timeout parameters

2018-12-25 Thread Fabien COELHO


Hello Royhei,

About the patches: you are expected to send consistent patches, i.e. one 
feature with its associated documentation, not two separate features and 
another patch for documenting them.


--
Fabien.

Re: Minor comment fix for pg_config_manual.h

2018-12-25 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Dec 24, 2018 at 01:05:25PM +0900, Ian Barwick wrote:
>> Attached is mainly to fix a comment in $subject which has a typo in
>> the referenced initdb option ("--walsegsize", should be
>> "--wal-segsize"), and while I'm there also adds a couple of "the"
>> for readability.

> All that (the error as well as the extra "the" for clarity in this
> sentence) seems right to me.  Any opinions from others?

The text still seems a bit awkward.  Maybe "... to be used when initdb
is run without the ..."

regards, tom lane



Re: Alternative to \copy in psql modelled after \g

2018-12-25 Thread Fabien COELHO



Hello Daniel,


But the copy-workflow and non-copy-workflow are different, and in
order to know which one to start, \g would need to analyze the query


It turns out I was wrong on this. The workflows are different but when
psql receives the first PGresult, it's still time to handle the I/O
redirection. It just differs from \copy in the case of an error:
\copy won't even send a command to the server if the local output
stream can't be opened, whereas COPY \g would send the command, and
will have to deal with the client-side error afterwards.

Well, except that up to now, COPY ... TO STDOUT \g file or \g |command
has been ignoring "file" or "|command", which is arguably a bug as Tom
puts it in another discussion in [1].

So as a replacement for the \copyto I was proposing earlier, PFA a patch
for COPY TO STDOUT to make use of the argument to \g.


Patch applies cleanly, compiles, make check is ok.

However, it does not contain any tests (bad:-) nor documentation (hmmm... 
maybe none needed, though).


Is this just kind of a bug fix? Previously the documentation said that "\g fn" 
sends the output to a file, but that was not happening with COPY; now it 
behaves as documented?


A question: if opening the output file fails, should not the query be 
cancelled and an error be reported? Maybe it is too late for that. It 
seems that "SELECT pg_sleep(10) \g /bad/file" fails but executes the query 
nevertheless.


ISTM that overriding copystream on open failures is not necessary, because 
its value is already NULL in this case.


I'd suggest that is_pipe should be initialized to false, otherwise it is 
unclear from the code when it is set before use, and some compilers may 
complain.


--
Fabien.



Re: Progress reporting for pg_verify_checksums

2018-12-25 Thread Michael Banck
Hi,

Am Dienstag, den 25.12.2018, 12:12 +0100 schrieb Fabien COELHO:
> > Given the speed of verifying checksums and its storage-oriented status, I 
> > also still think that a (possibly fractional) MB (1,000,000 bytes), or even 
> > GB, is the right unit to use for reporting this progress. On my laptop 
> > (SSD), 
> > verifying runs at least at 1.26 GB/s (on one small test), there is no point 
> > in displaying kilobytes progress.
> 
> Obviously the file is cached by the system at such a speed, but still most 
> disks should provide dozens of MB per second of read bandwidth. If GB is 
> used, it should use a fractional display (eg 1.25 GB) though.

I think MB indeed makes more sense than kB, so I have changed that now
in V7, per attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following provisions:
https://www.credativ.de/datenschutz

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..b470c19c08 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -80,6 +80,19 @@ PostgreSQL documentation
  
 
  
+  -P
+  --progress
+  
+   
+Enable progress reporting. Turning this on will deliver an approximate
+progress report during the checksum verification. Sending the
+SIGUSR1 signal will toggle progress reporting
+on or off during the verification run (not available on Windows).
+   
+  
+ 
+
+ 
-V
--version

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 6444fc9ca4..4cecb64727 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -10,6 +10,8 @@
 #include "postgres_fe.h"
 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -30,9 +32,18 @@ static ControlFileData *ControlFile;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool show_progress = false;
 
 static const char *progname;
 
+/*
+ * Progress status information.
+ */
+int64		total_size = 0;
+int64		current_size = 0;
+pg_time_t	last_progress_update;
+pg_time_t	scan_started;
+
 static void
 usage(void)
 {
@@ -43,6 +54,7 @@ usage(void)
 	printf(_(" [-D, --pgdata=]DATADIR  data directory\n"));
 	printf(_("  -v, --verbose  output verbose messages\n"));
 	printf(_("  -r RELFILENODE check only relation with specified relfilenode\n"));
+	printf(_("  -P, --progress show progress information\n"));
 	printf(_("  -V, --version  output version information, then exit\n"));
 	printf(_("  -?, --help show this help, then exit\n"));
 	printf(_("\nIf no data directory (DATADIR) is specified, "
@@ -67,6 +79,69 @@ static const char *const skip[] = {
 	NULL,
 };
 
+static void
+toggle_progress_report(int signum)
+{
+
+	/* we handle SIGUSR1 only, and toggle the value of show_progress */
+	if (signum == SIGUSR1)
+		show_progress = !show_progress;
+
+}
+
+/*
+ * Report current progress status. Parts borrowed from
+ * PostgreSQLs' src/bin/pg_basebackup.c
+ */
+static void
+report_progress(bool force)
+{
+	pg_time_t	now = time(NULL);
+	int			total_percent = 0;
+
+	char		totalstr[32];
+	char		currentstr[32];
+	char		currspeedstr[32];
+
+	/* Make sure we report at most once a second */
+	if ((now == last_progress_update) && !force)
+		return;
+
+	/* Save current time */
+	last_progress_update = now;
+
+	/*
+	 * Calculate current percent done, based on MB...
+	 */
+	total_percent = total_size ? (int64) ((current_size / 1048576) * 100 / (total_size / 1048576)) : 0;
+
+	/* Don't display larger than 100% */
+	if (total_percent > 100)
+		total_percent = 100;
+
+	/* The same for total size */
+	if (current_size > total_size)
+		total_size = current_size / 1048576;
+
+	snprintf(totalstr, sizeof(totalstr), INT64_FORMAT,
+			 total_size / 1048576);
+	snprintf(currentstr, sizeof(currentstr), INT64_FORMAT,
+			 current_size / 1048576);
+	snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT,
+			 (current_size / 1048576) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started)));
+	fprintf(stderr, "%s/%s MB (%d%%, %s MB/s)",
+			currentstr, totalstr, total_percent, currspeedstr);
+
+	/*
+	 * If we are reporting to a terminal, send a carriage return so that we
+	 * stay on the same line.  If not, send a newline.
+	 */
+	if (isatty(fileno(stderr)))
+		fprintf(stderr, "\r");
+	else
+		fprintf(stderr, "\n");
+}
+
 static bool
 skipfile(const char *fn)
 {
@@ -117,6 +192,7 @@ scan_file(const char *fn, BlockNumber segmentno)
 			contin

Re: Feature: triggers on materialized views

2018-12-25 Thread David Fetter
On Mon, Dec 24, 2018 at 04:13:44PM -0800, Mitar wrote:
> Hi!
> 
> Thanks for reply!
> 
> On Mon, Dec 24, 2018 at 2:20 PM David Fetter  wrote:
> > You've got the right mailing list, a description of what you want, and
> > a PoC patch. You also got the patch in during the time between
> > Commitfests. You're doing great!
> 
> Great!
> 
> One thing I am unclear about is how it is determined if this is a
> viable feature to be eventually included. You gave me some suggestions
> to improve in my patch (adding tests and so on). Does this mean that
> the patch should be fully done before a decision is made?
> 
> Also, the workflow is that I improve things, and resubmit a patch to
> the mailing list, for now?
> 
> > > - Currently only insert and remove operations are done on the
> > > materialized view. This is because the current logic just removes
> > > changed rows and inserts new rows.
> >
> > What other operations might you want to support?
> 
> Update. So if a row is changing, instead of doing a remove and insert,
> what currently is being done, I would prefer an update. Then UPDATE
> trigger operation would happen as well. Maybe the INSERT query could
> be changed to INSERT ... ON CONFLICT UPDATE query (not sure if this
> one does UPDATE trigger operation on conflict), and REMOVE changed to
> remove just rows which were really removed, but not only updated.

There might be a reason it's the way it is. Looking at the commits
that introduced this might shed some light.

> > I'm not sure I understand the problem being described here. Do you see
> > these as useful to separate for some reason?
> 
> So rows which are just updated currently get first DELETE trigger
> called and then INSERT. The issue is that if I am observing this
> behavior from outside, it makes it unclear when I see DELETE if this
> means really that a row has been deleted or it just means that later
> on an INSERT would happen. Now I have to wait for an eventual INSERT
> to determine that. But how long should I wait? It makes consuming
> these notifications tricky.

If it helps you think about it better, all NOTIFICATIONs are sent on
COMMIT, i.e. you don't need to worry as much about what things should
or shouldn't have arrived. The down side, such as it is, is that they
don't convey premature knowledge about a state that may never arrive.

> If I just blindly respond to those notifications, this could introduce
> other problems. For example, if I have a reactive web application it
> could mean a visible flicker to the user. Instead of updating rendered
> row, I would first delete it and then later on re-insert it.

This is at what I hope is a level quite distinct from database
operations. Separation of concerns via the model-view-controller (or
similar) architecture and all that.

> > > Non-concurrent refresh does not trigger any trigger. But it seems
> > > all data to do so is there (previous table, new table), at least for
> > > the statement-level trigger. Row-level triggers could also be
> > > simulated probably (with TRUNCATE and INSERT triggers).
> >
> > Would it make more sense to fill in the missing implementations of NEW
> > and OLD for per-row triggers instead of adding another hack?
> 
> You lost me here. But I agree, we should implement this fully, without
> hacks. I just do not know how exactly.

Sorry I was unclear.  The SQL standard defines both transition tables,
which we have, for per-statement triggers, and transition variables,
which we don't, for per-row triggers. Here's the relevant part of the
syntax:

<trigger definition> ::=
    CREATE TRIGGER <trigger name> <trigger action time> <trigger event>
        ON <table name> [ REFERENCING <transition table or variable list> ]
        <triggered action>

<transition table or variable list> ::=
    <transition table or variable>...

<transition table or variable> ::=
    OLD [ ROW ] [ AS ] <old transition variable name>
  | NEW [ ROW ] [ AS ] <new transition variable name>
  | OLD TABLE [ AS ] <old transition table name>
  | NEW TABLE [ AS ] <new transition table name>


> Are you saying that we should support only row-level triggers, or that
> we should support both statement-level and row-level triggers, but
> just make sure we implement this properly?

The latter, although we might need to defer the row-level triggers
until we support transition variables.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Progress reporting for pg_verify_checksums

2018-12-25 Thread Fabien COELHO


I think MB indeed makes more sense than kB, so I have changed that now 
in V7, per attached.


You use 1024² bytes. What about 1,000,000 bytes per MB, as the unit is about 
stored files?


Also, you did not answer my other points:
 - use "instr_time.h" for better precision
 - invert sizeonly
 - reuse a test

--
Fabien.

Re: Feature: triggers on materialized views

2018-12-25 Thread Mitar
Hi!

On Tue, Dec 25, 2018 at 10:03 AM David Fetter  wrote:
> If it helps you think about it better, all NOTIFICATIONs are sent on
> COMMIT, i.e. you don't need to worry as much about what things should
> or shouldn't have arrived. The down side, such as it is, is that they
> don't convey premature knowledge about a state that may never arrive.

This fact does not really help me. My client code listening to
NOTIFICATIONs does not know when some other client made COMMIT. There
is no NOTIFICATION saying "this is the end of the commit for which I
just sent you notifications".

> This is at what I hope is a level quite distinct from database
> operations. Separation of concerns via the model-view-controller (or
> similar) architecture and all that.

Of course, but garbage in, garbage out. If the notifications are
superfluous then another abstraction layer has to fix them up. I would
prefer that this not have to be the case.

But it seems it is relatively easy to fix this logic and have INSERTs,
DELETEs and UPDATEs. The patch I updated and attached
previously does that.

> Sorry I was unclear.  The SQL standard defines both transition tables,
> which we have, for per-statement triggers, and transition variables,
> which we don't, for per-row triggers.

I thought that PostgreSQL has transition variables for per-row triggers,
only that it is not possible to (re)name them (they depend on the
trigger function language)? But there are OLD and NEW variables
available in per-row triggers, or an equivalent?

> The latter, although we might need to defer the row-level triggers
> until we support transition variables.

Not sure how transition variables are implemented currently for
regular tables, but we could probably do the same?

Anyway, I do not know how to proceed here to implement either
statement-level or row-level triggers. It could just be a matter of
calling some function, but I am not familiar enough with the
codebase to know which one. So any pointers to existing code which
does something similar would be great. So, what should be called once the
materialized views' heaps are swapped, in order to fire the triggers?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Performance issue in foreign-key-aware join estimation

2018-12-25 Thread Tomas Vondra


On 12/25/18 3:48 AM, David Rowley wrote:
> On Tue, 25 Dec 2018 at 13:46, Tomas Vondra  
> wrote:
>> I however observe failures on 4 regression test suites - inherit,
>> equivclass, partition_join and partition_prune (diff attached). That's a
>> bit surprising, because AFAICS the patch merely optimizes the execution
>> and should not change the planning otherwise (all the failures seem to
>> be a query plan changing in some way). I'm not sure if the plans are
>> correct or better than the old ones.
> 
> Seems I didn't run the tests after doing a last-minute move of the
> create_eclass_index() call in make_one_rel(). I'd previously had it
> just below set_base_rel_pathlists(), which meant that the index didn't
> exist in some cases so it would fall back on the original code.  It
> appears the use case call from set_base_rel_pathlists() require
> is_child eclass members to be indexed too, but these were not in v1 or
> v2 since ec_relids does not record child rel members.
> 
> It seems simple enough to fix this just by adding ec_allrelids and
> setting it for all members rather than just non-children members.
> This also allows me to add the two additional cases to allow
> generate_implied_equalities_for_column() and
> has_relevant_eclass_joinclause() to also make use of the eclass index.
> This further reduces the total planning time for the query on my
> machine to 2304 ms.
> 

OK, that makes sense.

>> The other thing is that we probably should not use a single test case to
>> measure the optimization - I doubt it can improve less extreme queries,
>> but maybe we should verify it does not regress them?
> 
> Agreed. That still needs to be verified.  Although I'm yet unsure what
> sort of form we could use this idea in.  I wonder if it's fine to
> create this eclass index on the fly, or if it's something we should
> keep maintained all the time.  The problem I saw with keeping it all
> the time is down to eq_classes being a List and list_nth() is slow,
> which means the bms_next_member() loops I've added would perform
> poorly when compiled with a linked list lookup.
> 

Yeah, good questions. I think the simplest thing we could do is to build
them on first access - that would at least ensure we don't build the
index without accessing it at least once.

Of course, it might still be faster to do the check directly, for small
numbers of eclasses / fkeys and rels. Perhaps there's some sort of
heuristics deciding when to build the indexes, but that seems like an
overkill at this point.

What I propose is constructing a "minimal" simple query invoking this
code (something like two rels with a join on a foreign key) and measure
impact on that. That seems like a fairly realistic use case.

Also, I wonder what the goal here is - how much do we need to reduce the
duration to be happy? Initially I'd say "on par with the state before
the FK patch" but I guess we're already optimizing beyond that point.
What was the timing before adding the FK stuff?

>> With the patch attached, bms_overlap gets quite high in the profiles. I
>> think one reason for that is that all bitmap operations (not just
>> bms_overlap) always start the loop at 0. For the upper boundary is
>> usually determined as Min() of the lengths, but we don't do that for
>> lower boundary because we don't track that. The bitmaps for eclasses are
>> likely sparse, so this is quite painful. Attaches is a simple patch that
>> adds tracking of "minword" and uses it in various bms_ methods to skip
>> initial part of the loop. On my machine this reduces the timings by
>> roughly 5% (from 1610 to 1530 ms).
> 
> Seems like an interesting idea, although for the most part, Bitmapsets
> are small and this adds some overhead to all use cases.  I doubt it is
> worth the trouble for bms_is_member(), since that does not need to
> perform a loop over each bitmapword.

Perhaps. The bitmapsets are generally small in number of members, but
the values may be actually quite high in some cases (I don't have any
exact statistics, though).

You're right that it adds a bit of overhead, but I'd expect that to be
outweighed by the reduction in cache misses. But 5% is fairly close to
noise.
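For readers following along, here is a minimal sketch of the "minword" idea
(the field name and code below are illustrative; the actual proof-of-concept
is the patch attached upthread):

    typedef struct Bitmapset
    {
        int         nwords;     /* number of words in array */
        int         minword;    /* index of lowest possibly non-zero word */
        bitmapword  words[FLEXIBLE_ARRAY_MEMBER];
    } Bitmapset;

    bool
    bms_overlap(const Bitmapset *a, const Bitmapset *b)
    {
        int         shortlen;
        int         wordnum;

        if (a == NULL || b == NULL)
            return false;
        shortlen = Min(a->nwords, b->nwords);
        /* skip the leading words that are known to be zero in either input */
        for (wordnum = Max(a->minword, b->minword); wordnum < shortlen; wordnum++)
        {
            if (a->words[wordnum] & b->words[wordnum])
                return true;
        }
        return false;
    }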

> It looks like most of the bms_overlap() usages are caused by
> functions like have_join_order_restriction(), join_is_legal(), 
> check_outerjoin_delay() and add_paths_to_joinrel(), all of which are 
> performing a loop over the join_info_list. Perhaps that could be 
> indexed a similar way to the eq_classes List. I'd imagine there's 
> about another 20-30% of performance to squeeze out of this by doing 
> that, plus a bit more by making the fkey_list per RelOptInfo.
> 

Looks promising.

> Attached is v3 of the hacked up proof of concept performance demo patch.
> 

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GIN predicate locking slows down valgrind isolationtests tremendously

2018-12-25 Thread Alexander Korotkov
On Tue, Dec 25, 2018 at 4:32 AM Alexander Korotkov
 wrote:
> On Tue, Dec 25, 2018 at 12:19 AM Alexander Korotkov
>  wrote:
> > So, we're checking for conflict on tree root for every entry insert.
> > That's right for posting tree, but completely unneeded for entry tree.
> > I'm intended to change that to lock root of only posting tree if
> > nobody explains me why I'm wrong...
>
> BTW, I've tried to remove conflict checking from entry tree root
> (patch is attached).  Neither current or my version of isolation tests
> appear to be affected.

Given no comments yet, I'll commit this if there are no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Shared Memory: How to use SYSV rather than MMAP ?

2018-12-25 Thread Thomas Munro
On Wed, Dec 19, 2018 at 4:17 AM REIX, Tony  wrote:
> Here is the patch we are using now on AIX for enabling SysV shm for AIX, 
> which improves greatly the performance on AIX.
>
> It is compile time.
>
> It seems to me that you'd like this to become a shared_memory_type GUC. 
> Correct? However, I do not know how to do.
>
>
> Even as-is, this patch would greatly improve the performance of PostgreSQL 
> v11.1 in the field on AIX machines. So, we'd like this change to be available 
> for AIX asap.
>
>
> What are the next steps to get this patch accepted? or What are your 
> suggestions for improving it?

Hi Tony,

Since it's not fixing a bug, we wouldn't back-patch that into existing
releases.  But I agree that we should do something like this for
PostgreSQL 12, and I think we should make it user configurable.

Here is a quick rebase of Andres's shared_memory_type patch for
master, so that you can put shared_memory_type=sysv in postgresql.conf
to get the old pre-9.3 behaviour (this may also be useful for other
operating systems).  Here also is a "blind" patch that makes it
respect huge_pages=try/on on AIX (or at least, I think it does; I
don't have an AIX to try it, it almost certainly needs some
adjustments).  Thoughts?


--
Thomas Munro
http://www.enterprisedb.com
From 1d3d78b7bcf1e2be939e8a5b9cc09e5b9f679e2b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 26 Dec 2018 12:03:51 +1300
Subject: [PATCH 1/2] Add shared_memory_type GUC.

Since 9.3 we have used anonymous shared mmap for our main shmem
segment, except in EXEC_BACKEND builds.  Provide a GUC so that users
can opt into System V shared memory once again.  This is reported to
be useful at least on AIX systems.  Future commits will add huge/large
page support for System V memory.

Author: Andres Freund (revived and documented by Thomas Munro)
Discussion: https://postgr.es/m/HE1PR0202MB28126DB4E0B6621CC6A1A91286D90%40HE1PR0202MB2812.eurprd02.prod.outlook.com
Discussion: https://postgr.es/m/2AE143D2-87D3-4AD1-AC78-CE2258230C05%40FreeBSD.org
---
 doc/src/sgml/config.sgml  | 24 +++
 src/backend/port/sysv_shmem.c | 42 +++
 src/backend/storage/ipc/ipci.c|  2 +
 src/backend/utils/misc/guc.c  | 23 ++
 src/backend/utils/misc/postgresql.conf.sample |  6 +++
 src/include/storage/pg_shmem.h| 20 +
 6 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305add..78d1c654d7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1694,6 +1694,30 @@ include_dir 'conf.d'
   
  
 
+ 
+  shared_memory_type (enum)
+  
+   shared_memory_type configuration parameter
+  
+  
+  
+   
+Specifies the shared memory implementation that the server
+should use for the main shared memory region used for
+PostgreSQL's buffer pool and other core facilities.
+Possible values are mmap (for anonymous shared
+memory allocated using mmap), sysv
+(for System V shared memory allocated via shmget) and
+windows (for Windows shared memory).
+Not all values are supported on all platforms; the first supported
+option is the default for that platform.  The use of the
+sysv option, which is not the default on any platform,
+is generally discouraged because it typically requires non-default
+kernel settings to allow for large allocations.
+   
+  
+ 
+
  
   dynamic_shared_memory_type (enum)
   
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 741c455ccb..c4aaa08c6a 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -62,10 +62,11 @@
  * to a process after exec().  Since EXEC_BACKEND is intended only for
  * developer use, this shouldn't be a big problem.  Because of this, we do
  * not worry about supporting anonymous shmem in the EXEC_BACKEND cases below.
+ *
+ * As of PostgreSQL 12, we regained the ability to use a large System V shared
+ * memory region even in non-EXEC_BACKEND builds, if shared_memory_type is set
+ * to sysv (though this is not the default).
  */
-#ifndef EXEC_BACKEND
-#define USE_ANONYMOUS_SHMEM
-#endif
 
 
 typedef key_t IpcMemoryKey;		/* shared memory key passed to shmget(2) */
@@ -75,10 +76,8 @@ typedef int IpcMemoryId;		/* shared memory ID returned by shmget(2) */
 unsigned long UsedShmemSegID = 0;
 void	   *UsedShmemSegAddr = NULL;
 
-#ifdef USE_ANONYMOUS_SHMEM
 static Size AnonymousShmemSize;
 static void *AnonymousShmem = NULL;
-#endif
 
 static void *InternalIpcMemoryCreate(IpcMemoryKey memKey, Size size);
 static void IpcMemoryDetach(int status, Datum shmaddr);
@@ -370,8 +369,6 @@ PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2)
 	return true;
 }
 
-#ifdef USE_ANONYMOUS_SHMEM
-
 #ifdef MAP_HUGETLB
 
 /*
@@ -534,8 +531,6 @@ A

Re: Minor comment fix for pg_config_manual.h

2018-12-25 Thread Michael Paquier
On Tue, Dec 25, 2018 at 10:22:30AM -0500, Tom Lane wrote:
> The text still seems a bit awkward.  Maybe "... to be used when initdb
> is run without the ..."

Like the attached, perhaps?  At the same time I am thinking about
reformulating the second sentence as well.
--
Michael
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index cc5eedfc41..02395af797 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -14,8 +14,8 @@
  */
 
 /*
- * This is default value for wal_segment_size to be used at initdb when run
- * without --walsegsize option. Must be a valid segment size.
+ * This is the default value for wal_segment_size to be used when initdb is run
+ * without the --wal-segsize option.  It must be a valid segment size.
  */
 #define DEFAULT_XLOG_SEG_SIZE	(16*1024*1024)
 




Re: pg_dumpall --exclude-database option

2018-12-25 Thread Michael Paquier
On Tue, Dec 25, 2018 at 10:32:34AM +0100, Fabien COELHO wrote:
> Happy holidays!

Thanks.  You too, Happy New Year and Merry Christmas.  (Sentence valid
for all folks reading this email, as well as for folks not reading it.)
--
Michael




Re: Progress reporting for pg_verify_checksums

2018-12-25 Thread Michael Paquier
On Tue, Dec 25, 2018 at 07:05:30PM +0100, Fabien COELHO wrote:
> You use 1024² bytes. What about 1,000,000 bytes per MB, as the unit is about
> stored files?
> 
> Also, you did not answer my other points:
>  - use "instr_time.h" for better precision
>  - invert sizeonly
>  - reuse a test

It seems to me that the SIGUSR1 business is not necessary for the
basic layer of this feature, so I would rather rip that out now.  If
necessary we could always discuss it later on.  My take about the
option is that --progress should not be the default, but that reports
should only be provided if the caller wants them.

--quiet may have some value by itself, but that seems like a separate
discussion to me.

+   /*
+* If we are reporting to a terminal, send a carriage return so that we
+* stay on the same line.  If not, send a newline.
+*/
+   if (isatty(fileno(stderr)))
+   fprintf(stderr, "\r");
+   else
+   fprintf(stderr, "\n");
This bit is not really elegant, why not just '\r'?

+   /* The same for total size */
+   if (current_size > total_size)
+   total_size = current_size / 1048576;
Let's use that in a variable and not hardcode the number.
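For instance, with a named constant (the name below is purely illustrative)
the quoted lines could read:

    #define MEGABYTES ((int64) (1024 * 1024))

    /* The same for total size */
    if (current_size > total_size)
        total_size = current_size / MEGABYTES;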

What's introduced here is very similar to what progress_report() does
in pg_rewind/logging.c.  The report depends on the context, so this
cannot be common routine logic, but perhaps we could keep the
output consistent between both tools?
--
Michael




RE: [suggestion]support UNICODE host variables in ECPG

2018-12-25 Thread Nagaura, Ryohei
Hi,

On Fri, Dec 21, 2018 at 5:08 PM, Tom Lane wrote:
> I don't think I buy that argument; it falls down as soon as you consider
> characters above U+.  I worry that by supporting UTF16, we'd basically
> be encouraging users to write code that fails on such characters, which
> doesn't seem like good project policy.
Oh, I mistook.
Thank you for pointing out.

On Mon, Dec 24, 2018 at 5:07 PM, Matsumura Ryo wrote:
> I think that the first benefit of suggestion is providing a way to treat
> UTF16 chars for application. Whether or not to support above
> U+ (e.g. surrogate pair) may be a next discussion.
Thank you for your comments.
Yes, I'd like to judge the necessity of this function before designing.

Best regards,
-
Ryohei Nagaura






RE: Timeout parameters

2018-12-25 Thread Nagaura, Ryohei
Hi,

On Tue, Dec 25, 2018 at 2:59 AM, Fabien COELHO wrote:
> >> 4. The client wants to close the connection while leaving the job to
> >> the server.
> >> In this case, "statement_timeout" can't satisfy at line 4.
>
> Why?
> ISTM that "leaving the job" to the server with a client-side connection
> closed is basically an abort, no different from what server-side
> "statement_timeout" already provides?
"while leaving the job to the server" means that "while the server continue the 
job".
# Sorry for the inappropriate explanation.
I understand that "statement_timeout" won't.

> Also, from a client perspective, if you use statement_timeout, the query would
> time out, then the client would process the error, and the connection would
> be ready for the next query without needing to be re-created, which is quite
> costly anyway? Also, if the server is busy, recreating a connection is
> expensive, so it won't help much, really?
When recreating the connection, the server may not be busy.
In this case, it isn't so costly to reconnect.
Also, if a client does not have to execute the remaining query immediately 
after the timeout, 
the client has the choice of waiting until the server is no longer busy.

> About the implementation, I'm wondering whether something simpler could
> be done. Check how psql implements "ctrl-c" to abort a running query: it
> seems that it sends a cancel message, no need to actually abort the
> connection?
This is my homework.

> Hmmm "client_statement_timeout" maybe?
I agree.

Best regards,
-
Ryohei Nagaura





Re: Progress reporting for pg_verify_checksums

2018-12-25 Thread Alvaro Herrera
On 2018-Dec-26, Michael Paquier wrote:

> +   /*
> +* If we are reporting to a terminal, send a carriage return so that we
> +* stay on the same line.  If not, send a newline.
> +*/
> +   if (isatty(fileno(stderr)))
> +   fprintf(stderr, "\r");
> +   else
> +   fprintf(stderr, "\n");
> This bit is not really elegant, why not just '\r'?

Umm, this is an established coding pattern in pg_basebackup.c.
Stylistically I'd change all those cases to "fprintf(stderr,
isatty(fileno(stderr)) ? "\r" : "\n")" but leave the strings alone, since
AFAIR it took some time to figure out what to do.  (I'd also make the
comment one line instead of four, say "Stay on the same line if
reporting to a terminal".  That makes the whole stanza two lines rather
than eight, which is the appropriate amount of space for it.)
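In code, the two-line form described above would be:

    /* Stay on the same line if reporting to a terminal */
    fprintf(stderr, isatty(fileno(stderr)) ? "\r" : "\n");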

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-25 Thread Alvaro Herrera
On 2018-Dec-24, Mitar wrote:


> I made another version of the patch. This one does UPDATEs for changed
> row instead of DELETE/INSERT.
> 
> All existing regression tests are still passing (make check).

Okay, but you don't add any for your new feature, which is not good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-25 Thread Mitar
Hi!

On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera  wrote:
> > I made another version of the patch. This one does UPDATEs for changed
> > row instead of DELETE/INSERT.
> >
> > All existing regression tests are still passing (make check).
>
> Okay, but you don't add any for your new feature, which is not good.

Yes, I have not yet done that. I want first to also add calling of
triggers for non-concurrent refresh, but I would need a bit of help there
(what to call, or maybe an example of existing code which does something
similar already).


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Feature: triggers on materialized views

2018-12-25 Thread Alvaro Herrera
On 2018-Dec-25, Mitar wrote:

> On Tue, Dec 25, 2018 at 6:47 PM Alvaro Herrera  
> wrote:
> > > I made another version of the patch. This one does UPDATEs for changed
> > > row instead of DELETE/INSERT.
> > >
> > > All existing regression tests are still passing (make check).
> >
> > Okay, but you don't add any for your new feature, which is not good.
> 
> Yes, I have not yet done that. I want first to also add calling
> triggers for non-concurrent refresh, but I would need a bit help there
> (what to call, example of maybe code which does something similar
> already).

Well, REFRESH CONCURRENTLY runs completely differently from non-concurrent
REFRESH.  The former updates the existing data by observing the
differences with the previous data; the latter simply re-runs the query
and overwrites everything.  So if you simply enabled triggers on
non-concurrent refresh, you'd just see a bunch of inserts into a
throwaway data area (a transient relfilenode, we call it), then a swap
of the MV's relfilenode with the throwaway one.  I doubt it'd be useful.
But then I'm not clear *why* you would like to do a non-concurrent
refresh.  Maybe your situation would be best served by forbidding non-
concurrent refresh when the MV contains any triggers.

Alternatively, maybe reimplement non-concurrent refresh so that it works
identically to concurrent refresh (except with a stronger lock).  Not
sure if this implies any performance penalties.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Feature: triggers on materialized views

2018-12-25 Thread Mitar
Hi!

On Tue, Dec 25, 2018 at 7:05 PM Alvaro Herrera  wrote:
> But then I'm not clear *why* you would like to do a non-concurrent
> refresh.

I mostly wanted to support if for two reasons:

- completeness: maybe we cannot imagine the use case yet, but somebody
might in the future
- getting trigger calls for initial inserts: you can then create a
materialized view without data, attach triggers, and then run a
regular refresh; this allows you to have only one code path to process
any (including initial) changes to the view through notifications,
instead of handling the initial data differently

> Maybe your situation would be best served by forbidding non-
> concurrent refresh when the MV contains any triggers.

If this would be acceptable to the community, I could do it. I worry,
though, that one could get into a situation where the
materialized view loses all its data through some WITH NO DATA operation
and a concurrent refresh is then not possible. Currently concurrent refresh
works only with data. We could easily make concurrent refresh also work when
the materialized view has no data (it would just insert the data and
not compute a diff).

> Alternatively, maybe reimplement non-concurrent refresh so that it works
> identically to concurrent refresh (except with a stronger lock).  Not
> sure if this implies any performance penalties.

Ah, yes. I could just do TRUNCATE and INSERT instead of a heap swap.
That would then generate reasonable trigger calls.

Are there any existing benchmarks for such operations that I could use to
see whether there are any performance changes if I change the implementation
here? Any guidelines on how to evaluate this?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



RE: Timeout parameters

2018-12-25 Thread Nagaura, Ryohei
Hi Fabien.

On Wed, Dec 26, 2018 at 3:02 AM, Fabien COELHO wrote:
> About the patches: you are expected to send consistent patches, i.e. one
> feature with its associated documentation, not two separate features and
> another patch for documenting them.
Thank you for teaching me.
I rewrote patches and attached in this mail.

Best regards,
-
Ryohei Nagaura



TCP_backend_v3.patch
Description: TCP_backend_v3.patch


TCP_interface_v3.patch
Description: TCP_interface_v3.patch


Re: speeding up planning with partitions

2018-12-25 Thread Amit Langote
Thank you Imai-san.

On 2018/12/25 16:47, Imai, Yoshikazu wrote:
> Here's the continuation of the review. Almost all of below comments are
> little fixes.
> 
> ---
> 0001: line 76-77
> In commit message:
>   exclusion for target child relation, which is no longer
>   is no longer needed.  Constraint exclusion runs during query_planner
> 
> s/which is no longer is no longer needed/which is no longer needed/
> 
> ---
> 0001: line 464
> + if (IS_DUMMY_REL(find_base_rel(root, resultRelation )))
> 
> s/resultRelation )))/resultRelation)))/
> (There is an extra space.)
> 
> ---
> 0001: line 395-398
> +  * Reset inh_target_child_roots to not be same as parent root's so that
> +  * the subroots for this child's own children (if any) don't end up in
> +  * root parent's list.  We'll eventually merge all entries into one 
> list,
> +  * but that's now now.
> 
> s/that's now now/that's not now/
> 
> ---
> 0001: line 794
> +  * are put into  a list that will be controlled by a single ModifyTable
> 
> s/are put into  a list/are put into a list/
> (There are two spaces between "into" and "a".)

All fixed in my local repository.

> ---
> 0001: line 241-242, 253-254, 291-294 (In set_append_rel_size())
> 
> + if (appinfo->parent_relid == root->parse->resultRelation)
> + subroot = adjust_inherit_target_child(root, childrel, 
> appinfo);
> 
> + add_child_rel_equivalences(subroot, appinfo, rel, 
> childrel,
> +root 
> != subroot);
> 
> + if (subroot != root)
> + {
> + root->inh_target_child_roots =
> + 
> lappend(root->inh_target_child_roots, subroot);
> 
> A boolean value of "appinfo->parent_relid == root->parse->resultRelation" is
> same with "subroot != root"(because of line 241-242), so we can use bool
> variable here like
> child_is_target = (appinfo->parent_relid == root->parse->resultRelation).
> The name of child_is_target is brought from arguments of
> add_child_rel_equivalences() in your patch.
>
> I attached a little diff as "v9-0001-delta.diff".

Sounds like a good idea for clarifying the code, so done.

> ---
> 0001: line 313-431
> 
> adjust_inherit_target_child() is in allpaths.c in your patch, but it has the
> code like below ones which are in master's(not patch applied) planner.c or
> planmain.c, so could it be possible in planner.c(or planmain.c)?
> This point is less important, but I was just wondering whether 
> adjust_inherit_target_child() should be in allpaths.c, planner.c or 
> planmain.c.
> 
> + /* Translate the original query's expressions to this child. */
> + subroot = makeNode(PlannerInfo);
> + memcpy(subroot, root, sizeof(PlannerInfo));
> 
> + root->parse->targetList = root->unexpanded_tlist;
> + subroot->parse = (Query *) adjust_appendrel_attrs(root,
> + 
>   (Node *) root->parse,
> + 
>   1, &appinfo);
> 
> + tlist = preprocess_targetlist(subroot);
> + subroot->processed_tlist = tlist;
> + build_base_rel_tlists(subroot, tlist);

I'm thinking of changing where adjust_inherit_target_child gets called
from.  In the current patch, it's in the middle of set_rel_size, which I'm
starting to think is not a good place for it.  Maybe I'll place the
call near where inheritance is expanded.

> ---
> 0001: line 57-70
> 
> In commit message:
>   This removes some existing code in inheritance_planner that dealt
>   with any subquery RTEs in the query.  The rationale of that code
>   was that the subquery RTEs may change during each iteration of
>   planning (that is, for different children), so different iterations
>   better use different copies of those RTEs.  
>   ...
>   Since with the new code
>   we perform planning just once, I think we don't need this special
>   handling.
> 
> 0001: line 772-782
> -  * controlled by a single ModifyTable node.  All the instances share the
> -  * same rangetable, but each instance must have its own set of subquery
> -  * RTEs within the finished rangetable because (1) they are likely to 
> get
> -  * scribbled on during planning, and (2) it's not inconceivable that
> -  * subqueries could get planned differently in different cases.  We need
> -  * not create duplicate copies of other RTE kinds, in particular not the
> -  * target relations, because they don't have either of those issues.  
> Not
> -  * having to duplicate the target relations is important because doing 
> so
> -  * (1) would result in a rangetable of length O(N^2) for N targets, with
> -  * at least O(N^3) work expended here; and (2) would greatly complicate
> -  * management of the 

Re: Alter table documentation page (again)

2018-12-25 Thread Michael Paquier
On Fri, Dec 07, 2018 at 07:48:39PM +0100, Lætitia Avrot wrote:
> Here's the patch.
> The patch should apply to MASTER. I built and tested it successfully on my
> laptop.
> 
> I'll add it to January's commitfest.

What's proposed here looks good to me, and all the grounds are
covered, so I am switching the patch to Ready for Committer.

Álvaro, perhaps you would prefer committing it yourself?
--
Michael




Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-12-25 Thread Michael Paquier
On Tue, Dec 25, 2018 at 02:27:03PM +0900, Michael Paquier wrote:
> Thanks, I have committed this one after making the logic to ignore
> column numbers a bit smarter, one problem being that "ALTER INDEX foo
> ALTER COLUMN" would try to suggest SET STATISTICS directly, which is
> incorrect.  Instead I have tweaked the completion so as COLUMN is
> added after "ALTER INDEX foo ALTER".  This could be completed later
> with the column numbers, in a way similar to what ALTER TABLE does.

Could you rebase the latest patch please?  The latest version sent
does not apply anymore: 
[1]: 
https://www.postgresql.org/message-id/bcecaf0e-ab92-8271-6887-da213aea9...@lab.ntt.co.jp
--
Michael


signature.asc
Description: PGP signature


Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-12-25 Thread Tatsuro Yamada

On 2018/12/25 14:27, Michael Paquier wrote:

On Tue, Dec 25, 2018 at 10:56:04AM +0900, Tatsuro Yamada wrote:

Hmm... Okay, I agree.  The reason I implemented the exotic part of the
feature was to be user-friendly.  However, I suppose users know the
syntax, since it is mostly used by expert users.


Thanks, I have committed this one after making the logic to ignore
column numbers a bit smarter, one problem being that "ALTER INDEX foo
ALTER COLUMN" would try to suggest SET STATISTICS directly, which is
incorrect.  Instead I have tweaked the completion so as COLUMN is
added after "ALTER INDEX foo ALTER".  This could be completed later
with the column numbers, in a way similar to what ALTER TABLE does.
--
Michael


Thanks for taking your time! :)

Cheers!
Tatsuro Yamada






Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-12-25 Thread Tatsuro Yamada

On 2018/12/26 13:50, Michael Paquier wrote:

On Tue, Dec 25, 2018 at 02:27:03PM +0900, Michael Paquier wrote:

Thanks, I have committed this one after making the logic to ignore
column numbers a bit smarter, one problem being that "ALTER INDEX foo
ALTER COLUMN" would try to suggest SET STATISTICS directly, which is
incorrect.  Instead I have tweaked the completion so as COLUMN is
added after "ALTER INDEX foo ALTER".  This could be completed later
with the column numbers, in a way similar to what ALTER TABLE does.


Could you rebase the latest patch please?  The latest version sent
does not apply anymore:
[1]: 
https://www.postgresql.org/message-id/bcecaf0e-ab92-8271-6887-da213aea9...@lab.ntt.co.jp
--
Michael


Do you mean my "fix_manual_of_alter_index_v2.patch"?
It can be applied on top of f89ae34ab8b4d9e9ce8af6bd889238b0ccff17cb like this:

=
$ patch -p1 < fix_manual_of_alter_index_v2.patch
patching file doc/src/sgml/ref/alter_index.sgml
=

Thanks,
Tatsuro Yamada





Re: plpgsql plugin - stmt_beg/end is not called for top level block of statements

2018-12-25 Thread Michael Paquier
On Wed, Dec 19, 2018 at 07:04:50AM +0100, Pavel Stehule wrote:
> I can imagine some tracking extension, that will do some
> initializations on plpgsql_stmt_block statement hook - but the most
> important will not be called ever.

I was just studying this stuff and reviewing this patch with fresh
eyes, and it seems to me that it is actually incorrect.  This changes
the execution logic so as stmt_beg and stmt_end are called
additionally each time a function, a trigger or an event trigger is
executed.  If one looks closely at the code, one can notice that
func_beg and func_end are already present as hook points to watch what
is happening during execution; they are there precisely to give entry
points around function execution.  So it seems to me that we don't
need the extra watch calls as proposed, because everything needed is
already there, and the current points are correct.
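
For reference, here is a minimal sketch of how an extension typically
latches onto those existing entry points (the struct layout follows
plpgsql.h; the extension code itself, including the names and the
logging, is purely illustrative and not from any patch discussed here):

/* my_plugin.c - illustrative sketch only; build against src/pl/plpgsql/src */
#include "postgres.h"
#include "fmgr.h"
#include "plpgsql.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static void my_func_beg(PLpgSQL_execstate *estate, PLpgSQL_function *func);
static void my_stmt_beg(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt);

/* func_setup, func_beg, func_end, stmt_beg, stmt_end */
static PLpgSQL_plugin plugin_funcs = { NULL, my_func_beg, NULL,
                                       my_stmt_beg, NULL };

void
_PG_init(void)
{
    /* PL/pgSQL exposes a rendezvous variable for exactly this purpose. */
    PLpgSQL_plugin **plugin_ptr =
        (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");

    *plugin_ptr = &plugin_funcs;
}

/* Called once per function/trigger execution: a natural place for
 * per-execution initializations. */
static void
my_func_beg(PLpgSQL_execstate *estate, PLpgSQL_function *func)
{
    elog(DEBUG1, "plpgsql function entered");
}

/* Called for each statement executed inside the body. */
static void
my_stmt_beg(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt)
{
    elog(DEBUG1, "plpgsql statement of type %d", (int) stmt->cmd_type);
}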
--
Michael


signature.asc
Description: PGP signature


Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-12-25 Thread Michael Paquier
On Wed, Dec 26, 2018 at 02:05:26PM +0900, Tatsuro Yamada wrote:
> Do you mean my "fix_manual_of_alter_index_v2.patch"?

Nope.  This patch is only a proposal for the documentation.  The main
patch to extend psql completion so as column numbers are suggested
fails to apply.
--
Michael


signature.asc
Description: PGP signature


Re: Allow auto_explain to log to NOTICE

2018-12-25 Thread Michael Paquier
On Tue, Oct 30, 2018 at 11:44:42AM +0100, Daniel Gustafsson wrote:
> Circling back to this, I updated the patch with providing another
> option as I couldn’t think of another way to do it cleanly.  I’ll
> add the patch to the next CF but as it’s just about to start it
> should be moved to the next once started.

+-- Shouldn't log due to query being too fast
+SET auto_explain.log_level = NOTICE;
+SET auto_explain.log_min_duration = 1000;
+SELECT NULL FROM pg_catalog.pg_class WHERE relname = 'pg_class';
I am ready to bet that 1s is not enough, as some buildfarm machines
are legendarily slow.  Honestly, to avoid having more to worry about,
I think that this bit should be dropped.

"auto_explain.log_duration" should actually be named log_summary,
except that it defaults to true to be backward-compatible, while for
EXPLAIN the default is false, no?  It would be nice to be consistent
with EXPLAIN, at least for the naming of those options.  The default
behavior of those parameters would still be inconsistent, as the
duration is shown by default with auto_explain and not with EXPLAIN,
but it does not seem like a good idea to change that...
--
Michael


signature.asc
Description: PGP signature


Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-12-25 Thread Tatsuro Yamada

On 2018/12/26 14:15, Michael Paquier wrote:

On Wed, Dec 26, 2018 at 02:05:26PM +0900, Tatsuro Yamada wrote:

Do you mean my "fix_manual_of_alter_index_v2.patch"?


Nope.  This patch is only a proposal for the documentation.  The main
patch to extend psql completion so as column numbers are suggested
fails to apply.


I rebased the WIP patch. :)

* The following query is added to get the attribute numbers of an index;
  however, its result contains not only expression columns but also the
  other columns.

* I'm not sure how the "%d" and the first "%s" should be used in the query,
  so I commented them out as /* %d %s */.
  I know this is ugly... Do you know how to use them?

+#define Query_for_list_of_attribute_numbers \
+"SELECT attnum "\
+"  FROM pg_catalog.pg_attribute a, "\
+"   pg_catalog.pg_class c "\
+" WHERE c.oid = a.attrelid "\
+"   AND a.attnum > 0 "\
+"   AND NOT a.attisdropped "\
+"   /* %d %s */" \
+"   AND a.attrelid = (select oid from pg_catalog.pg_class where relname = '%s') 
"\
+"   AND pg_catalog.pg_table_is_visible(c.oid) "\
+"order by a.attnum asc "

Thanks,
Tatsuro Yamada
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 91df96e796..ee008c4540 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -583,6 +583,18 @@ static const SchemaQuery Query_for_list_of_statistics = {
 "OR '\"' || relname || '\"'='%s') "\
 "   AND pg_catalog.pg_table_is_visible(c.oid)"
 
+#define Query_for_list_of_attribute_numbers \
+"SELECT attnum "\
+"  FROM pg_catalog.pg_attribute a, "\
+"   pg_catalog.pg_class c "\
+" WHERE c.oid = a.attrelid "\
+"   AND a.attnum > 0 "\
+"   AND NOT a.attisdropped "\
+"   /* %d %s */" \
+"   AND a.attrelid = (select oid from pg_catalog.pg_class where relname = '%s') "\
+"   AND pg_catalog.pg_table_is_visible(c.oid) "\
+"order by a.attnum asc "
+
 #define Query_for_list_of_attributes_with_schema \
 "SELECT pg_catalog.quote_ident(attname) "\
 "  FROM pg_catalog.pg_attribute a, pg_catalog.pg_class c, pg_catalog.pg_namespace n "\
@@ -1604,6 +1616,13 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER INDEX  ALTER */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER"))
 		COMPLETE_WITH("COLUMN");
+	/* ALTER INDEX  ALTER COLUMN */
+	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN") ||
+			 Matches("ALTER", "INDEX", MatchAny, "ALTER"))
+	{
+		completion_info_charp = prev3_wd;
+		COMPLETE_WITH_QUERY(Query_for_list_of_attribute_numbers);
+	}
 	/* ALTER INDEX  ALTER COLUMN  */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN", MatchAny))
 		COMPLETE_WITH("SET STATISTICS");


Re: Control your disk usage in PG: Introduction to Disk Quota Extension

2018-12-25 Thread Michael Paquier
On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
> +1 for adding some hooks to support this kind of thing, but I think
> the names you've chosen are not very good.  The hook name should
> describe the place from which it is called, not the purpose for which
> one imagines that it will be used, because somebody else might imagine
> another use.  Both BufferExtendCheckPerms_hook_type and
> SmgrStat_hook_type are imagining that they know what the hook does -
> CheckPerms in the first case and Stat in the second case.

I personally don't mind making Postgres more pluggable, but I don't
think that we actually need the extra ones proposed here at the layer
of smgr, as smgr is already a layer designed to call an underlying set
of APIs able to extend, unlink, etc. depending on the storage type.
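
To illustrate that point, here is a heavily simplified sketch of the
existing dispatch (modeled on the f_smgr table in smgr.c; the field
names and signatures are abridged from memory, so treat this as an
approximation rather than the actual code):

#include "postgres.h"
#include "storage/smgr.h"

typedef struct f_smgr_sketch
{
    void    (*smgr_extend) (SMgrRelation reln, ForkNumber forknum,
                            BlockNumber blocknum, char *buffer,
                            bool skipFsync);
    /* ... plus create, unlink, read, write, truncate, etc. ... */
} f_smgr_sketch;

static const f_smgr_sketch smgrsw_sketch[] = {
    {mdextend}      /* magnetic disk, the only implementation today */
};

/* Every relation extension already funnels through this table, so a
 * pluggable storage manager could intercept extend/unlink by providing
 * its own callbacks, which is why extra hooks at this layer look
 * redundant. */
static void
smgrextend_sketch(SMgrRelation reln, ForkNumber forknum,
                  BlockNumber blocknum, char *buffer, bool skipFsync)
{
    smgrsw_sketch[reln->smgr_which].smgr_extend(reln, forknum, blocknum,
                                                buffer, skipFsync);
}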

> For this particular purpose, I don't immediately see why you need a
> hook in both places.  If ReadBuffer is called with P_NEW, aren't we
> guaranteed to end up in smgrextend()?

Yes, that's a bit awkward.
--
Michael


signature.asc
Description: PGP signature


Re: plpgsql plugin - stmt_beg/end is not called for top level block of statements

2018-12-25 Thread Pavel Stehule
st 26. 12. 2018 v 6:09 odesílatel Michael Paquier 
napsal:

> On Wed, Dec 19, 2018 at 07:04:50AM +0100, Pavel Stehule wrote:
> > I can imagine some tracking extension, that will do some
> > initializations on plpgsql_stmt_block statement hook - but the most
> > important will not be called ever.
>
> I was just studying this stuff and reviewing this patch with fresh
> eyes, and it seems to me that it is actually incorrect.  This changes
> the execution logic so as stmt_beg and stmt_end are called
> additionally each time a function, a trigger or an event trigger is
> executed.  If one looks closely at the code, he/she could notice that
> func_beg and func_end are already present as hook points to watch what
> is happening in the execution, as these are here to give entry points
> for execution functions, so it seems to me that we don't need extra
> watch calls as proposed, because there is already everything needed,
> and that the current points are correct.
>

The design of this feature does not have clean borders - I see a problem
with func_beg and func_end, because those handlers have to share some
logic with stmt_beg and stmt_end when the handler for stmt_block is not
empty.

Moreover, some of the behaviour can surprise a developer - for example,
if you use a handler for stmt_beg,

then for code

$$
BEGIN
  RETURN x;
END;
$$

it is called only once - which is expected;

but for code

$$
BEGIN
  BEGIN
RETURN x;
  END;
END
$$

it is called twice, which is not expected if you don't know about
this inconsistency.

That is the reason why I don't think the current behaviour is correct. On
the other hand, the impact is very small - only a few extensions use the
plpgsql plugin API, and the workaround is not hard. So I can live with the
current state if nobody else sees this as an issue (sure - it is a minor
issue).

Regards

Pavel


> --
> Michael
>


Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2018-12-25 Thread Tatsuro Yamada

On 2018/12/26 14:50, Tatsuro Yamada wrote:

On 2018/12/26 14:15, Michael Paquier wrote:

On Wed, Dec 26, 2018 at 02:05:26PM +0900, Tatsuro Yamada wrote:

Do you mean my "fix_manual_of_alter_index_v2.patch"?


Nope.  This patch is only a proposal for the documentation.  The main
patch to extend psql completion so as column numbers are suggested
fails to apply.


I rebased the WIP patch. :)

* The following query is added to get the attribute numbers of an index;
  however, its result contains not only expression columns but also the
  other columns.

* I'm not sure how the "%d" and the first "%s" should be used in the query,
  so I commented them out as /* %d %s */.
  I know this is ugly... Do you know how to use them?

+#define Query_for_list_of_attribute_numbers \
+"SELECT attnum "\
+"  FROM pg_catalog.pg_attribute a, "\
+"   pg_catalog.pg_class c "\
+" WHERE c.oid = a.attrelid "\
+"   AND a.attnum > 0 "\
+"   AND NOT a.attisdropped "\
+"   /* %d %s */" \
+"   AND a.attrelid = (select oid from pg_catalog.pg_class where relname = '%s') 
"\
+"   AND pg_catalog.pg_table_is_visible(c.oid) "\
+"order by a.attnum asc "


I modified the patch to remove the unusable condition.


# create table hoge (a integer, b integer, c integer);
# create index ind_hoge on hoge(a, b, c, (c*1), (c*2), (c*3), (c*4), (c*5), 
(c*6), (c*7), (c*8), (c*9));

# \d ind_hoge
   Index "public.ind_hoge"
 Column |  Type   | Key? | Definition
+-+--+
 a  | integer | yes  | a
 b  | integer | yes  | b
 c  | integer | yes  | c
 expr   | integer | yes  | (c * 1)
 expr1  | integer | yes  | (c * 2)
 expr2  | integer | yes  | (c * 3)
 expr3  | integer | yes  | (c * 4)
 expr4  | integer | yes  | (c * 5)
 expr5  | integer | yes  | (c * 6)
 expr6  | integer | yes  | (c * 7)
 expr7  | integer | yes  | (c * 8)
 expr8  | integer | yes  | (c * 9)
btree, for table "public.hoge"

# alter index ind_hoge alter column 
1   10  11  12  2   3   4   5   6   7   8   9

# alter index ind_hoge alter column 1 
1   10  11  12

# alter index ind_hoge alter column 10 
alter index ind_hoge alter COLUMN 10 SET STATISTICS


Thanks,
Tatsuro Yamada
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 91df96e796..a0e71550e2 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -583,6 +583,18 @@ static const SchemaQuery Query_for_list_of_statistics = {
 "OR '\"' || relname || '\"'='%s') "\
 "   AND pg_catalog.pg_table_is_visible(c.oid)"
 
+#define Query_for_list_of_attribute_numbers \
+"SELECT attnum "\
+"  FROM pg_catalog.pg_attribute a, "\
+"   pg_catalog.pg_class c "\
+" WHERE c.oid = a.attrelid "\
+"   AND a.attnum > 0 "\
+"   AND NOT a.attisdropped "\
+"   /* %d %s */" \
+"   AND a.attrelid = (select oid from pg_catalog.pg_class where relname = '%s') "\
+"   AND pg_catalog.pg_table_is_visible(c.oid) "\
+"order by a.attnum asc "
+
 #define Query_for_list_of_attributes_with_schema \
 "SELECT pg_catalog.quote_ident(attname) "\
 "  FROM pg_catalog.pg_attribute a, pg_catalog.pg_class c, pg_catalog.pg_namespace n "\
@@ -1604,6 +1616,12 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER INDEX  ALTER */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER"))
 		COMPLETE_WITH("COLUMN");
+	/* ALTER INDEX  ALTER COLUMN */
+	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN"))
+	{
+		completion_info_charp = prev3_wd;
+		COMPLETE_WITH_QUERY(Query_for_list_of_attribute_numbers);
+	}
 	/* ALTER INDEX  ALTER COLUMN  */
 	else if (Matches("ALTER", "INDEX", MatchAny, "ALTER", "COLUMN", MatchAny))
 		COMPLETE_WITH("SET STATISTICS");


Re: Don't wake up to check trigger file if none is configured

2018-12-25 Thread Michael Paquier
On Sat, Nov 24, 2018 at 11:29:22AM -0500, Jeff Janes wrote:
> I noticed that the existing codebase does not have a consensus on what to
> pass to WaitLatch for the timeout when the timeout isn't relevant. I picked
> 0, but -1L also has precedent.

WaitLatch forces the timeout to -1 if WL_TIMEOUT is not passed by the
caller, so the value given in that case does not matter much.  If
WL_TIMEOUT is set, the code would blow up on an assertion if the
timeout is negative.  In short I would keep the timeout set to 5000L,
but just add the flag on the fly instead of having an if/else with two
calls of WaitLatch(), as your patch does.
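
Roughly, what I mean is something like the following (untested sketch;
the name of the promote-trigger-file variable and the wait event are
from memory and may not match the tree exactly):

    int         wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH;

    /* Ask for a timeout only when there is a trigger file to poll for;
     * the 5s value itself does not change. */
    if (PromoteTriggerFile != NULL && PromoteTriggerFile[0] != '\0')
        wakeEvents |= WL_TIMEOUT;

    (void) WaitLatch(&XLogCtl->recoveryWakeupLatch, wakeEvents, 5000L,
                     WAIT_EVENT_RECOVERY_WAL_STREAM);
    ResetLatch(&XLogCtl->recoveryWakeupLatch);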

"git diff --check" complains about a whitespace issue.

Anyway, the timeout is useful for another thing: thanks to it we
periodically check whether the WAL receiver is still alive, even if no
promote file is defined.  That's useful for failure handling if the
WAL receiver gets killed, so that the startup process can more quickly
choose a new source (or just request a new WAL receiver) to feed it
with fresh WAL data, no?

It seems to me that the comment on top of that WaitLatch call should
be clearer about this, as the current state leads to confusion.
Another thing coming to mind is that it would be useful to make the
timeout configurable, so that instances can react more quickly to a
sudden death of the WAL receiver (or check faster for a trigger file
if the HA application is too lazy to send a signal to the standby
host).

Attached is a patch to improve the comment for now.

Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..4452d7566a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12095,7 +12095,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 
 	/*
 	 * Wait for more WAL to arrive. Time out after 5 seconds
-	 * to react to a trigger file promptly.
+	 * to react to a trigger file promptly and to check if
+	 * the WAL receiver is still active.
 	 */
 	(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 	 WL_LATCH_SET | WL_TIMEOUT |


signature.asc
Description: PGP signature


Re: Problems with plan estimates in postgres_fdw

2018-12-25 Thread Etsuro Fujita

(2018/12/17 22:09), Etsuro Fujita wrote:

Here is a set of WIP patches for pushing down ORDER BY LIMIT to the remote:



For some regression test cases with ORDER BY and/or LIMIT, I noticed
that these patches still cannot push down those clauses to the remote.
I guess the cost/size estimation added by these patches would need
some tweaking, but I didn't look at that in detail yet. Maybe I'm
missing something, though.


I looked into that.  In the previous patch, I estimated costs for 
performing grouping/aggregation with ORDER BY remotely, using the same 
heuristic as scan/join cases if appropriate (i.e., I assumed that a 
remote sort for grouping/aggregation also costs 20% extra), but that 
seems too large for grouping/aggregation.  So I reduced that to 5% 
extra.  The result showed that some test cases can be pushed down, but 
some still cannot.  I think we could push down the ORDER BY in all cases 
by reducing the extra cost to a much smaller value, but I'm not sure 
what value is reasonable.
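
For context, the heuristic in question boils down to something like the
sketch below (DEFAULT_FDW_SORT_MULTIPLIER is the existing 20% surcharge
used by postgres_fdw; the smaller grouping-specific multiplier and the
helper function are purely illustrative, not the patch's actual code):

#include "postgres.h"
#include "nodes/nodes.h"        /* for Cost */

/* Existing assumption: a remote sort costs 20% extra. */
#define DEFAULT_FDW_SORT_MULTIPLIER     1.2
/* Hypothetical, smaller surcharge for sorted grouping/aggregation. */
#define FDW_GROUPING_SORT_MULTIPLIER    1.05

static void
apply_remote_sort_cost(bool is_grouping_rel,
                       Cost *startup_cost, Cost *run_cost)
{
    double  mult = is_grouping_rel ? FDW_GROUPING_SORT_MULTIPLIER
                                   : DEFAULT_FDW_SORT_MULTIPLIER;

    *startup_cost *= mult;
    *run_cost *= mult;
}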


Attached is an updated version of the patch.  Other changes:

* Modified estimate_path_cost_size() further so that it accounts for 
tlist eval cost as well

* Added more comments
* Simplified code a little bit

I'll add this to the upcoming CF.

Best regards,
Etsuro Fujita
From 064888324f1e7b25ca31a3b64fce08219db7fd49 Mon Sep 17 00:00:00 2001
From: Etsuro Fujita 
Date: Wed, 26 Dec 2018 16:18:20 +0900
Subject: [PATCH 1/2] postgres_fdw: Perform UPPERREL_ORDERED step remotely

---
 contrib/postgres_fdw/deparse.c |  28 +-
 contrib/postgres_fdw/expected/postgres_fdw.out | 182 +---
 contrib/postgres_fdw/postgres_fdw.c| 374 +++--
 contrib/postgres_fdw/postgres_fdw.h|   9 +-
 src/backend/optimizer/plan/planner.c   |   7 +-
 src/include/nodes/relation.h   |  10 +
 6 files changed, 474 insertions(+), 136 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 654323f..cf7bd5e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -168,7 +168,8 @@ static void printRemotePlaceholder(Oid paramtype, int32 paramtypmod,
 static void deparseSelectSql(List *tlist, bool is_subquery, List **retrieved_attrs,
  deparse_expr_cxt *context);
 static void deparseLockingClause(deparse_expr_cxt *context);
-static void appendOrderByClause(List *pathkeys, deparse_expr_cxt *context);
+static void appendOrderByClause(List *pathkeys, bool has_final_sort,
+	deparse_expr_cxt *context);
 static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
 	  RelOptInfo *foreignrel, bool use_alias,
@@ -930,8 +931,8 @@ build_tlist_to_deparse(RelOptInfo *foreignrel)
 void
 deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 		List *tlist, List *remote_conds, List *pathkeys,
-		bool is_subquery, List **retrieved_attrs,
-		List **params_list)
+		bool has_final_sort, bool is_subquery,
+		List **retrieved_attrs, List **params_list)
 {
 	deparse_expr_cxt context;
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) rel->fdw_private;
@@ -986,7 +987,7 @@ deparseSelectStmtForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *rel,
 
 	/* Add ORDER BY clause if we found any useful pathkeys */
 	if (pathkeys)
-		appendOrderByClause(pathkeys, &context);
+		appendOrderByClause(pathkeys, has_final_sort, &context);
 
 	/* Add any necessary FOR UPDATE/SHARE. */
 	deparseLockingClause(&context);
@@ -1591,7 +1592,7 @@ deparseRangeTblRef(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 		/* Deparse the subquery representing the relation. */
 		appendStringInfoChar(buf, '(');
 		deparseSelectStmtForRel(buf, root, foreignrel, NIL,
-fpinfo->remote_conds, NIL, true,
+fpinfo->remote_conds, NIL, false, true,
 &retrieved_attrs, params_list);
 		appendStringInfoChar(buf, ')');
 
@@ -3110,7 +3111,8 @@ appendGroupByClause(List *tlist, deparse_expr_cxt *context)
  * base relation are obtained and deparsed.
  */
 static void
-appendOrderByClause(List *pathkeys, deparse_expr_cxt *context)
+appendOrderByClause(List *pathkeys, bool has_final_sort,
+	deparse_expr_cxt *context)
 {
 	ListCell   *lcell;
 	int			nestlevel;
@@ -3127,7 +3129,19 @@ appendOrderByClause(List *pathkeys, deparse_expr_cxt *context)
 		PathKey*pathkey = lfirst(lcell);
 		Expr	   *em_expr;
 
-		em_expr = find_em_expr_for_rel(pathkey->pk_eclass, baserel);
+		if (has_final_sort)
+		{
+			/*
+			 * By construction, context->foreignrel is the input relation to
+			 * the final sort.
+			 */
+			em_expr = find_em_expr_for_input_target(context->root,
+	pathkey->pk_eclass,
+	context->foreignrel->reltarget);
+		}
+		else
+			em_expr = find_em_expr_for_rel(pathkey->pk_eclass, baserel);
+
 		Assert(em_expr != NULL);
 
 		appendStringInfoString(buf, delim);
diff --git a/co

RE: Timeout parameters

2018-12-25 Thread Fabien COELHO



Hello Ryohei,


4. The client wants to close the connection while leaving the job to
the server.
In this case, "statement_timeout" can't satisfy at line 4.


Why?
ISTM that "leaving the job" to the server with a client-side connection
closed is basically an abort, no different from what server-side
"statement_timeout" already provides?

"while leaving the job to the server" means that "while the server continue the 
job".
# Sorry for the inappropriate explanation.
I understand that "statement_timeout" won't.


I still do not understand the use-case specifics: for me, aborting the
connection, or the softer option of cancelling the statement, will result
in the server stopping the statement, so the server does NOT "continue
the job"; I therefore still do not see how it really differs from the
server-side statement_timeout setting.
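
To make that concrete, a client that gives up waiting can already stop
the server-side work with standard libpq calls before disconnecting - a
rough sketch (error handling trimmed):

#include <stdio.h>
#include "libpq-fe.h"

/* Cancel the running statement, then close the connection.  Even with a
 * plain close, the backend stops the statement once it notices the broken
 * connection (typically when it next tries to send data back). */
static void
give_up_and_disconnect(PGconn *conn)
{
    char        errbuf[256];
    PGcancel   *cancel = PQgetCancel(conn);

    if (cancel != NULL)
    {
        if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
            fprintf(stderr, "cancel request failed: %s\n", errbuf);
        PQfreeCancel(cancel);
    }
    PQfinish(conn);
}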


--
Fabien.