Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Amit Kapila
On Sun, Dec 8, 2019 at 10:44 AM Amit Kapila  wrote:
>
> On Sun, Dec 8, 2019 at 1:26 AM Tom Lane  wrote:
> >
> > So, just idly looking at the code in src/backend/port/win32/signal.c
> > and src/port/kill.c, I have to wonder why we have this baroque-looking
> > design of using *two* signal management threads.  And, if I'm
> > reading it right, we create an entire new pipe object and an entire
> > new instance of the second thread for each incoming signal.  Plus, the
> > signal senders use CallNamedPipe (hence, underneath, TransactNamedPipe)
> > which means they in effect wait for the recipient's signal-handling
> > thread to ack receipt of the signal.  Maybe there's a good reason for
> > all this but it sure seems like a lot of wasted cycles from here.
> >
> > I have to wonder why we don't have a single named pipe that lasts as
> > long as the recipient process does, and a signal sender just writes
> > one byte to it, and considers the signal delivered if it is able to
> > do that.  The "message" semantics seem like overkill for that.
> >
> > I dug around in the contemporaneous archives and could only find
> > https://www.postgresql.org/message-id/303E00EBDD07B943924382E153890E5434AA47%40cuthbert.rcsinc.local
> > which describes the existing approach but fails to explain why we
> > should do it like that.
> >
> > This might or might not have much to do with the immediate problem,
> > but I can't help wondering if there's some race-condition-ish behavior
> > in there that's contributing to what we're seeing.
> >
>
> On the receiving side, the work we do after the 'notify' is finished
> (or before CallNamedPipe gets control back) is as follows:
>
> pg_signal_dispatch_thread()
> {
> ..
> FlushFileBuffers(pipe);
> DisconnectNamedPipe(pipe);
> CloseHandle(pipe);
>
> pg_queue_signal(sigNum);
> }
>
> It seems most of these are system calls, which makes me think that
> they might be slow enough on some Windows versions to lead to such a
> race condition.
>

IIUC, once the dispatch thread has queued the signal
(pg_queue_signal), the next CHECK_FOR_INTERRUPTS by the main thread
will execute the signal.  So, if we move pg_queue_signal() before we
do WriteFile in pg_signal_dispatch_thread(), this race condition will
be closed.  Now, we might not want to do this, as it will add some
more time (even if only a little) before the notify on the other side
can finish, or maybe there is some technical problem with this idea
that I am not able to see immediately.
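
In code form, the idea is just to hoist the queueing step ahead of the
handshake.  A sketch (untested), following the shape of
pg_signal_dispatch_thread() in src/backend/port/win32/signal.c:

static DWORD WINAPI
pg_signal_dispatch_thread(LPVOID param)
{
    HANDLE      pipe = (HANDLE) param;
    BYTE        sigNum;
    DWORD       bytes;

    if (!ReadFile(pipe, &sigNum, 1, &bytes, NULL) || bytes != 1)
    {
        CloseHandle(pipe);
        return 0;
    }

    pg_queue_signal(sigNum);    /* moved up: the flag and event are set
                                 * before the sender can resume */

    WriteFile(pipe, &sigNum, 1, &bytes, NULL); /* releases sender's CallNamedPipe() */
    FlushFileBuffers(pipe);     /* wait for the client to read the ack */
    DisconnectNamedPipe(pipe);
    CloseHandle(pipe);
    return 0;
}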


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Fix a comment in CreateLockFile

2019-12-08 Thread Amit Kapila
On Sun, Dec 8, 2019 at 1:10 PM Hadi Moshayedi  wrote:
>
> It seems that the explanation for the contents of the pid file has moved to 
> pidfile.h, but the comments in CreateLockFile() still point to miscadmin.h.
>
> The attached patch updates those pointers.
>

Your patch looks correct to me on a quick look.  I will take care of this tomorrow.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Tom Lane
Amit Kapila  writes:
> IIUC, once the dispatch thread has queued the signal
> (pg_queue_signal), the next CHECK_FOR_INTERRUPTS by the main thread
> will execute the signal.  So, if we move pg_queue_signal() before we
> do WriteFile in pg_signal_dispatch_thread(), this race condition will
> be closed.  Now, we might not want to do this, as it will add some
> more time (even if only a little) before the notify on the other side
> can finish, or maybe there is some technical problem with this idea
> that I am not able to see immediately.

Hmm.  Certainly worth trying to see if it resolves the failure on
Andrew's machines.

It's not real hard to believe that TransactNamedPipe could be
"optimized" so that it preferentially schedules the client thread
once the handshake is done, not the server thread (based on some
heuristic claim that the former is probably an interactive process
and the latter less so).  In that situation, we'd proceed on with
the signal not really delivered, and there is nothing guaranteeing
that it will be delivered anytime soon --- the rest of the test
can make progress regardless of whether that thread ever gets
scheduled again.  So, as long as we've got this handshake mechanism,
it seems like it'd be a good thing for the ack to indicate that
the signal was *actually* delivered (by setting the recipient's
flag bit) and not just that it'll probably get delivered eventually.

I remain a bit unsure that we actually need the handshaking business
at all --- I doubt that Unix signals provide any guarantee of synchronous
delivery on most platforms.  (If I'm reading the POSIX spec correctly,
it only requires synchronous delivery when a thread signals itself.)
But the existence of this unsynchronized thread in the Windows
implementation sure seems like a dubious thing, now that you
point it out.
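
For reference, the sender side is roughly the following, condensed from
pgkill() in src/port/kill.c; the point is that the sender blocks in
CallNamedPipe() until the recipient's dispatch thread writes the ack byte
back:

#include <windows.h>
#include <stdio.h>

/* Condensed sketch of the signal sender, cf. pgkill() in src/port/kill.c. */
static int
send_signal(unsigned long pid, int sig)
{
    char        pipename[128];
    BYTE        sigData = (BYTE) sig;
    BYTE        sigRet = 0;
    DWORD       bytes;

    snprintf(pipename, sizeof(pipename), "\\\\.\\pipe\\pgsignal_%lu", pid);

    /* Returns only after the recipient has read our byte and written one back. */
    if (!CallNamedPipe(pipename, &sigData, 1, &sigRet, 1, &bytes, 1000))
        return -1;              /* real code maps GetLastError() to errno */
    if (bytes != 1 || sigRet != sigData)
        return -1;
    return 0;
}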

regards, tom lane




Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Tom Lane
I wrote:
> Amit Kapila  writes:
>> IIUC, once the dispatch thread has queued the signal
>> (pg_queue_signal), the next CHECK_FOR_INTERRUPTS by the main thread
>> will execute the signal.  So, if we move pg_queue_signal() before we
>> do WriteFile in pg_signal_dispatch_thread(), this race condition will
>> be closed.  Now, we might not want to do this, as it will add some
>> more time (even if only a little) before the notify on the other side
>> can finish, or maybe there is some technical problem with this idea
>> that I am not able to see immediately.

> Hmm.  Certainly worth trying to see if it resolves the failure on
> Andrew's machines.

For Andrew's convenience, here's a draft patch for that.  I took the
liberty of improving the rather thin comments in this area, too.

regards, tom lane

diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index 9b5e544..f1e35fd 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -208,12 +208,15 @@ pgwin32_create_signal_listener(pid_t pid)
  */
 
 
+/*
+ * Queue a signal for the main thread, by setting the flag bit and event.
+ */
 void
 pg_queue_signal(int signum)
 {
 	Assert(pgwin32_signal_event != NULL);
 	if (signum >= PG_SIGNAL_COUNT || signum <= 0)
-		return;
+		return;	/* ignore any bad signal number */
 
 	EnterCriticalSection(&pg_signal_crit_sec);
 	pg_signal_queue |= sigmask(signum);
@@ -222,7 +225,11 @@ pg_queue_signal(int signum)
 	SetEvent(pgwin32_signal_event);
 }
 
-/* Signal dispatching thread */
+/*
+ * Signal dispatching thread.  This runs after we have received a named
+ * pipe connection from a client (signal sender).   Process the request,
+ * close the pipe, and exit.
+ */
 static DWORD WINAPI
 pg_signal_dispatch_thread(LPVOID param)
 {
@@ -242,13 +249,37 @@ pg_signal_dispatch_thread(LPVOID param)
 		CloseHandle(pipe);
 		return 0;
 	}
+
+	/*
+	 * Queue the signal before responding to the client.  In this way, it's
+	 * guaranteed that once kill() has returned in the signal sender, the next
+	 * CHECK_FOR_INTERRUPTS() in the signal recipient will see the signal.
+	 * (This is a stronger guarantee than POSIX makes; maybe we don't need it?
+	 * But without it, we've seen timing bugs on Windows that do not manifest
+	 * on any known Unix.)
+	 */
+	pg_queue_signal(sigNum);
+
+	/*
+	 * Write something back to the client, allowing its CallNamedPipe() call
+	 * to terminate.
+	 */
 	WriteFile(pipe, &sigNum, 1, &bytes, NULL);	/* Don't care if it works or
 												 * not.. */
+
+	/*
+	 * We must wait for the client to read the data before we can close the
+	 * pipe, else the data will be lost.  (If the WriteFile call failed,
+	 * there'll be nothing in the buffer, so this shouldn't block.)
+	 */
 	FlushFileBuffers(pipe);
+
+	/* This is a formality, since we're about to close the pipe anyway. */
 	DisconnectNamedPipe(pipe);
+
+	/* And we're done. */
 	CloseHandle(pipe);
 
-	pg_queue_signal(sigNum);
 	return 0;
 }
 
@@ -266,6 +297,7 @@ pg_signal_thread(LPVOID param)
 		BOOL		fConnected;
 		HANDLE		hThread;
 
+		/* Create a new pipe instance if we don't have one. */
 		if (pipe == INVALID_HANDLE_VALUE)
 		{
 			pipe = CreateNamedPipe(pipename, PIPE_ACCESS_DUPLEX,
@@ -280,6 +312,11 @@ pg_signal_thread(LPVOID param)
 			}
 		}
 
+		/*
+		 * Wait for a client to connect.  If something connects before we
+		 * reach here, we'll get back a "failure" with ERROR_PIPE_CONNECTED,
+		 * which is actually a success (way to go, Microsoft).
+		 */
 		fConnected = ConnectNamedPipe(pipe, NULL) ? TRUE : (GetLastError() == ERROR_PIPE_CONNECTED);
 		if (fConnected)
 		{


Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-08 Thread Tom Lane
I wrote:
> So, just idly looking at the code in src/backend/port/win32/signal.c
> and src/port/kill.c, I have to wonder why we have this baroque-looking
> design of using *two* signal management threads.  And, if I'm
> reading it right, we create an entire new pipe object and an entire
> new instance of the second thread for each incoming signal.  Plus, the
> signal senders use CallNamedPipe (hence, underneath, TransactNamedPipe)
> which means they in effect wait for the recipient's signal-handling
> thread to ack receipt of the signal.  Maybe there's a good reason for
> all this but it sure seems like a lot of wasted cycles from here.

Here's a possible patch (untested by me) to get rid of the second thread
and the new-pipe-for-every-request behavior.  I believe that the existing
logic may be based on Microsoft's "Multithreaded Pipe Server" example [1]
or something similar, but that's based on an assumption that servicing
a client request may take a substantial amount of time and it's worth
handling requests concurrently.  Neither point applies in this context.

Doing it like this seems attractive to me because it gets rid of two
different failure modes: inability to create a new thread and inability
to create a new pipe handle.  Now on the other hand, it means that
inability to complete the read/write transaction with a client right
away will delay processing of other signals.  But we know that the
client is engaged in a CallNamedPipe operation, so how realistic is
that concern?

This is to be applied on top of the other patch I just sent.

regards, tom lane

[1] https://docs.microsoft.com/en-us/windows/win32/ipc/multithreaded-pipe-server

diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index f1e35fd..aa762b9 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -38,7 +38,7 @@ static pqsigfunc pg_signal_array[PG_SIGNAL_COUNT];
 static pqsigfunc pg_signal_defaults[PG_SIGNAL_COUNT];
 
 
-/* Signal handling thread function */
+/* Signal handling thread functions */
 static DWORD WINAPI pg_signal_thread(LPVOID param);
 static BOOL WINAPI pg_console_handler(DWORD dwCtrlType);
 
@@ -225,64 +225,6 @@ pg_queue_signal(int signum)
 	SetEvent(pgwin32_signal_event);
 }
 
-/*
- * Signal dispatching thread.  This runs after we have received a named
- * pipe connection from a client (signal sender).   Process the request,
- * close the pipe, and exit.
- */
-static DWORD WINAPI
-pg_signal_dispatch_thread(LPVOID param)
-{
-	HANDLE		pipe = (HANDLE) param;
-	BYTE		sigNum;
-	DWORD		bytes;
-
-	if (!ReadFile(pipe, &sigNum, 1, &bytes, NULL))
-	{
-		/* Client died before sending */
-		CloseHandle(pipe);
-		return 0;
-	}
-	if (bytes != 1)
-	{
-		/* Received  bytes over signal pipe (should be 1) */
-		CloseHandle(pipe);
-		return 0;
-	}
-
-	/*
-	 * Queue the signal before responding to the client.  In this way, it's
-	 * guaranteed that once kill() has returned in the signal sender, the next
-	 * CHECK_FOR_INTERRUPTS() in the signal recipient will see the signal.
-	 * (This is a stronger guarantee than POSIX makes; maybe we don't need it?
-	 * But without it, we've seen timing bugs on Windows that do not manifest
-	 * on any known Unix.)
-	 */
-	pg_queue_signal(sigNum);
-
-	/*
-	 * Write something back to the client, allowing its CallNamedPipe() call
-	 * to terminate.
-	 */
-	WriteFile(pipe, &sigNum, 1, &bytes, NULL);	/* Don't care if it works or
-												 * not.. */
-
-	/*
-	 * We must wait for the client to read the data before we can close the
-	 * pipe, else the data will be lost.  (If the WriteFile call failed,
-	 * there'll be nothing in the buffer, so this shouldn't block.)
-	 */
-	FlushFileBuffers(pipe);
-
-	/* This is a formality, since we're about to close the pipe anyway. */
-	DisconnectNamedPipe(pipe);
-
-	/* And we're done. */
-	CloseHandle(pipe);
-
-	return 0;
-}
-
 /* Signal handling thread */
 static DWORD WINAPI
 pg_signal_thread(LPVOID param)
@@ -290,12 +232,12 @@ pg_signal_thread(LPVOID param)
 	char		pipename[128];
 	HANDLE		pipe = pgwin32_initial_signal_pipe;
 
+	/* Set up pipe name, in case we have to re-create the pipe. */
 	snprintf(pipename, sizeof(pipename), ".\\pipe\\pgsignal_%lu", GetCurrentProcessId());
 
 	for (;;)
 	{
 		BOOL		fConnected;
-		HANDLE		hThread;
 
 		/* Create a new pipe instance if we don't have one. */
 		if (pipe == INVALID_HANDLE_VALUE)
@@ -320,51 +262,52 @@ pg_signal_thread(LPVOID param)
 		fConnected = ConnectNamedPipe(pipe, NULL) ? TRUE : (GetLastError() == ERROR_PIPE_CONNECTED);
 		if (fConnected)
 		{
-			HANDLE		newpipe;
-
 			/*
-			 * We have a connected pipe. Pass this off to a separate thread
-			 * that will do the actual processing of the pipe.
-			 *
-			 * We must also create a new instance of the pipe *before* we
-			 * start running the new thread. If we don't, there is a race
-			 * condition whereby the dispatch thread might run CloseHandle()
-			 * before we have create

Re: [HACKERS] WAL logging problem in 9.4.3?

2019-12-08 Thread Noah Misch
I reviewed your latest code, and it's nearly complete.  mdimmedsync() syncs
only "active segments" (as defined in md.c), but smgrDoPendingSyncs() must
sync active and inactive segments.  This matters when mdtruncate() truncated
the relation after the last checkpoint, causing active segments to become
inactive.  In such cases, syncs of the inactive segments will have been queued
for execution during the next checkpoint.  Since we skipped the
XLOG_SMGR_TRUNCATE record, we must complete those syncs before commit.  Let's
just modify smgrimmedsync() to always sync active and inactive segments;
that's fine to do in other smgrimmedsync() callers, even though they operate
on relations that can't have inactive segments.

On Tue, Dec 03, 2019 at 08:51:46PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 28 Nov 2019 17:23:19 -0500, Noah Misch  wrote in 
> > On Thu, Nov 28, 2019 at 09:35:08PM +0900, Kyotaro Horiguchi wrote:
> > > I measured the performance with the latest patch set.
> > > 
> > > > 1. Determine $DDL_COUNT, a number of DDL transactions that take
> > > >    about one minute when done via syncs.
> > > > 2. Start "pgbench -rP1 --progress-timestamp -T180 -c10 -j10".
> > > > 3. Wait 10s.
> > > > 4. Start one DDL backend that runs $DDL_COUNT transactions.
> > > > 5. Save DDL start timestamp, DDL end timestamp, and pgbench output.

>  wal_buffers| 512

This value (512 buffers of 8 kB, i.e. 4 MiB) is lower than a tuned production
system would have.  In future benchmarks (if any), use wal_buffers=2048 (16 MiB).




disable only nonparallel seq scan.

2019-12-08 Thread Jeff Janes
Is there a way to force a meaningful parallel seq scan, or at least the
planning of one, when the planner wants a non-parallel one?

Usually I can do things like this with the enable_* settings, but if I `set
enable_seqscan to off`, it penalizes the parallel seq scan 8 times harder
than it penalizes the non-parallel one, so the plan does not switch.

If I set `force_parallel_mode TO on` then I do get a parallel plan, but it
is a degenerate one which tells me nothing I want to know.

If I `set parallel_tuple_cost = 0` (or in some cases to a negative number),
I can force it switch, but that destroys the purpose, which is to see what
the "would have been" plan estimates are for the parallel seq scan under
the default setting of the cost parameters.

I can creep parallel_tuple_cost downward until it switches, and then try to
extrapolate back up, but this is tedious and not very reliable.

Cheers,

Jeff


Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Mark Dilger




On 12/7/19 3:42 PM, Ranier Vilela wrote:

This is the first part of the variable shadow fixes.
Basically it consists of renaming the variables in collision with the global 
ones, with the minimum change in the semantics.

make check pass all the tests.


I think it would be better to split this patch into separate files,
one for each global variable that is being shadowed.  The reason
I say so is apparent looking at the first one in the patch,
RedoRecPtr.  This process global variable is defined in xlog.c:

  static XLogRecPtr RedoRecPtr;

and then, somewhat surprisingly, passed around between static
functions defined within that same file, such as:

  RemoveOldXlogFiles(...)

which in the current code only ever gets a copy of the global,
which begs the question why it needs this passed as a parameter
at all.  All the places calling RemoveOldXlogFiles are within
this file, and all of them pass the global, so why bother?
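
As a minimal, self-contained illustration of the warning class under
discussion (with hypothetical names; gcc flags this with -Wshadow):

/* shadow.c: compile with "gcc -Wshadow -c shadow.c" */
typedef unsigned long XLogRecPtr;

static XLogRecPtr RedoRecPtr;       /* file-scope variable */

XLogRecPtr
RemoveOldFiles(XLogRecPtr RedoRecPtr)   /* warning: declaration of
                                         * 'RedoRecPtr' shadows a global
                                         * declaration */
{
    /*
     * Within the function, "RedoRecPtr" now means the parameter; the
     * file-scope variable can no longer be referenced by name.
     */
    return RedoRecPtr;
}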

Another function within xlog.c behaves similarly:

  RemoveXlogFile(...)

Only here, the callers sometimes pass the global RedoRecPtr
(though indirectly, since they themselves received it as an
argument) and sometimes they pass in InvalidXLogRecPtr, which
is just a constant:

  src/include/access/xlogdefs.h:#define InvalidXLogRecPtr   0

So it might make sense to remove the parameter from this
function, too, and replace it with a flag parameter named
something like "is_valid", or perhaps split the function
into two functions, one for valid and one for invalid.

I'm not trying to redesign xlog.c's functions in this email
thread, but only suggesting that these types of arguments
may ensue for each global variable in your patch, and it will
be easier for a committer to know if there is a consensus
about any one of them than about the entire set.  When I
suggested you do this patch set all on this thread, I was
still expecting multiple patches, perhaps named along the
lines of:

  unshadow.RedoRecPtr.patch.1
  unshadow.wal_segment_size.patch.1
  unshadow.synchronous_commit.patch.1
  unshadow.wrconn.patch.1
  unshadow.progname.patch.1
  unshadow.am_syslogger.patch.1
  unshadow.days.patch.1
  unshadow.months.patch.1

etc.  I'm uncomfortable giving you negative feedback of this
sort, since I think you are working hard to improve postgres
and I really appreciate it, so later tonight I'll try to come
back, split your patch for you as described, add an entry to
the commitfest if you haven't already, and mark myself as a
reviewer.

Thanks again for the hard work and the patch!


--
Mark Dilger




Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Tom Lane
Ranier Vilela  writes:
> This is the first part of the variable shadow fixes.
> Basically it consists of renaming the variables in collision with the global 
> ones, with the minimum change in the semantics.

I don't think I'm actually on board with the goal here.

Basically, if we take this seriously, we're throwing away the notion of
nested variable scopes and programming as if we had just a flat namespace.
That wasn't any fun when we had to do it back in assembly-code days, and
I don't see a good reason to revert to that methodology today.

In a few of these cases, like the RedoRecPtr changes, there *might* be
an argument that there's room for confusion about whether the code could
have meant to reference the similarly-named global variable.  But it's
just silly to make that argument in places like your substitution of
/days/ndays/ in date.c.

Based on this sample, I reject the idea that we ought to be trying to
eliminate this class of warnings just because somebody's compiler can be
induced to emit them.  If you want to make a case-by-case argument that
particular situations of this sort are bad programming style, let's have
that argument by all means.  But it needs to be case-by-case, not just
dropping a large patch on us containing a bunch of unrelated changes
and zero specific justification for any of them.

IOW, I don't think you've taken to heart Robert's upthread advice that
this needs to be case-by-case and based on literary not mechanical
considerations.

BTW, if we do go forward with changing the RedoRecPtr uses, I'm not
in love with "XRedoRecPtr" as a replacement parameter name; it conveys
nothing much, and the "X" prefix is already overused in that area of
the code.  Perhaps "pRedoRecPtr" would be a better choice?  Or maybe
make the local variables be all-lower-case "redorecptr", which would
fit well in context in places like

-RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogRecPtr XRedoRecPtr, XLogRecPtr endptr)


regards, tom lane




Re: proposal: minscale, rtrim, btrim functions for numeric

2019-12-08 Thread Tom Lane
"Karl O. Pinc"  writes:
> FWIW, I bumped around the Internet and looked at Oracle docs to see if
> there's any reason why minscale() might not be a good function name.
> I couldn't find any problems.

> I also couldn't think of a better name than trim_scale() and don't
> have any problems with the name.

I'd just comment that it seems weird that the same patch is introducing
two functions with inconsistently chosen names.  Why does one have
an underscore separating the words and the other not?  I haven't got
a large investment in either naming convention specifically, but it'd
be nice if we could at least pretend to be considering consistency.

regards, tom lane




Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Tom Lane
Mark Dilger  writes:
> I think it would be better to split this patch into separate files,
> one for each global variable that is being shadowed.  The reason
> I say so is apparent looking at the first one in the patch,
> RedoRecPtr.  This process global variable is defined in xlog.c:
>static XLogRecPtr RedoRecPtr;
> and then, somewhat surprisingly, passed around between static
> functions defined within that same file, such as:
>RemoveOldXlogFiles(...)
> which in the current code only ever gets a copy of the global,
> which begs the question why it needs this passed as a parameter
> at all.  All the places calling RemoveOldXlogFiles are within
> this file, and all of them pass the global, so why bother?

I was wondering about that too.  A look in the git history seems
to say that it is the fault of the fairly-recent commit d9fadbf13,
which did things like this:

 /*
  * Recycle or remove all log files older or equal to passed segno.
  *
- * endptr is current (or recent) end of xlog, and PriorRedoRecPtr is the
- * redo pointer of the previous checkpoint. These are used to determine
+ * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
  */
 static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 {
DIR*xldir;
struct dirent *xlde;

That is, these arguments *used* to be a different LSN pointer, and that
commit changed them to be mostly equal to RedoRecPtr, and made what
seems like a not very well-advised renaming to go with that.

> So it might make sense to remove the parameter from this
> function, too, and replace it with a flag parameter named
> something like "is_valid", or perhaps split the function
> into two functions, one for valid and one for invalid.

Don't think I buy that.  The fact that these arguments were until recently
different from RedoRecPtr suggests that they might someday be different
again, whereupon we'd have to laboriously revert such a parameter redesign.
I think I'd just go for names that don't have a hard implication that
the parameter values are the same as any particular global variable.

> I'm not trying to redesign xlog.c's functions in this email
> thread, but only suggesting that these types of arguments
> may ensue for each global variable in your patch,

Indeed.  Once again, these are case-by-case issues, not something
that can be improved by a global search-and-replace without much
consideration for the details of each case.

regards, tom lane




Re: The flinfo->fn_extra question, from me this time.

2019-12-08 Thread Dent John
Hi folks,

I’ve updated the patch, addressed the rescan issue, and restructured the tests.

I’ve taken a slightly different approach this time, re-using the (already pipeline-supporting) machinery of the Materialize node, and extending it to allow an SFRM_Materialize SRF to donate the tuplestore it returns. I feel this yields a better code structure, as well as getting more reuse.

It also opens up more informative and transparent EXPLAIN output. For example, the following shows Materialize explicitly, whereas previously a FunctionScan would have silently materialised the result of both generate_series() invocations.

postgres=# explain (analyze, costs off, timing off, summary off)
           select * from generate_series(11,15) r, generate_series(11,14) s;
                            QUERY PLAN
------------------------------------------------------------------
 Nested Loop (actual rows=20 loops=1)
   ->  Function Scan on generate_series s (actual rows=4 loops=1)
         ->  SRF Scan (actual rows=4 loops=1)
               SFRM: ValuePerCall
   ->  Function Scan on generate_series r (actual rows=5 loops=4)
         ->  Materialize (actual rows=5 loops=4)
               ->  SRF Scan (actual rows=5 loops=1)
                     SFRM: ValuePerCall

I also thought again about when to materialise, and particularly Robert’s suggestion [1] (which is also in this thread, but I didn’t originally understand the implication of). If I’m not wrong, between occasional explicit use of a Materialize node by the planner, and more careful observation of EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD in FunctionScan’s initialisation, we do actually have what is needed to pipeline without materialisation in at least some cases. There is no mechanism to preferentially re-execute an SRF rather than materialise it, but because materialisation only seems to be necessary in the face of a join or a scrollable cursor, I’m not considering it much of a problem anymore.

The EXPLAIN output needs a bit of work, costing is still a sore point, and it’s not quite as straight-line performant as my first attempt, as well as there undoubtedly being some unanticipated breakages and rough edges.

But the concept seems to work roughly as I intended (i.e., allowing FunctionScan to pipeline). Unless there are any objections, I will push it into the January commit fest for progressing.

(Revised patch attached.)

denty.

[1] https://www.postgresql.org/message-id/CA%2BTgmobw%2BPhNVciLesd-mQQ4As9D8L2-F7AiKqv465RhDkPf2Q%40mail.gmail.com

pipeline-functionscan-v4.patch
Description: Binary data


hyrax versus isolationtester.c's hard-wired timeouts

2019-12-08 Thread Tom Lane
Buildfarm member hyrax has been intermittently failing the
deadlock-parallel isolation test ever since that went in.
I finally got around to looking at this closely, and what
seems to be happening is simply that isolationtester.c's
hard-wired three-minute timeout for the completion of any
one test step is triggering.  hyrax uses CLOBBER_CACHE_ALWAYS
and it seems to be a little slower than other animals using
CLOBBER_CACHE_ALWAYS, so it's unsurprising that it's showing
the symptom and nobody else is.

There are two things we could do about this:

1. Knock the hard-wired setting up a tad, maybe to 5 minutes.
Easy but doesn't seem terribly future-proof.

2. Make the limit configurable somehow, probably from an
environment variable.  There's precedent for that (PGCTLTIMEOUT),
and it would provide a way for owners of especially slow buildfarm
members to adjust things ... but it would require owners of
especially slow buildfarm animals to adjust things.

Any preferences?  (Actually, it wouldn't be unreasonable to do
both things, I suppose.)
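
For concreteness, option 2 could be as small as this in isolationtester.c
(a sketch; the variable name PGISOLATIONTIMEOUT is hypothetical):

#include <stdlib.h>

#define DEFAULT_STEP_TIMEOUT    180     /* seconds: the current hard-wired limit */

/*
 * Sketch of option 2: let an environment variable override the per-step
 * timeout, in the same spirit as PGCTLTIMEOUT.
 */
static int
step_timeout(void)
{
    const char *env = getenv("PGISOLATIONTIMEOUT");

    if (env != NULL && atoi(env) > 0)
        return atoi(env);
    return DEFAULT_STEP_TIMEOUT;
}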

BTW, I notice that isolationtester.c fails to print any sort of warning
notice when it decides it's waited too long.  This seems like a
spectacularly bad idea in hindsight: it's not that obvious why the test
case failed.  Plus there's no way to tell exactly which connection it
decided to send a PQcancel to.  So independently of the timeout-length
issue, I think we ought to also make it print something like
"isolationtester: waited too long for something to happen, canceling
step thus-and-so".

regards, tom lane




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2019-12-08 Thread Thomas Munro
On Wed, Nov 20, 2019 at 1:14 AM Juan José Santamaría Flecha
 wrote:
> On Tue, Nov 19, 2019 at 12:49 PM Thomas Munro  wrote:
>> On Wed, Nov 20, 2019 at 12:28 AM Amit Khandekar  
>> wrote:
>> > On Windows, it is documented that ReadFile() (which is called by
>> > pg_pread) will return false on EOF but only when the file is open for
>> > asynchronous reads/writes. But here we are just dealing with usual
>> > synchronous reads. So pg_pread() code should indeed return 0 on EOF on
>> > Windows. Not yet able to figure out how FileRead() managed to return
>> > this error on Windows. But from your symptoms, it does look like
>> > pg_pread()=>ReadFile() returned false (despite doing synchronous
>> > reads), and so _dosmaperr() gets called, and then it does not find the
>> > eof error in doserrors[], so the "unrecognized win32 error code"
>> > message is printed. May have to dig up more on this.
>>
>> Hmm.  See also this report:
>>
>> https://www.postgresql.org/message-id/flat/CABuU89MfEvJE%3DWif%2BHk7SCqjSOF4rhgwJWW6aR3hjojpGqFbjQ%40mail.gmail.com
>
> The files from pgwin32_open() are open for synchronous access, while 
> pg_pread() uses the asynchronous functionality to offset the read. Under 
> these circumstances, a read past EOF will return ERROR_HANDLE_EOF (38), as 
> explained in:
>
> https://devblogs.microsoft.com/oldnewthing/20150121-00/?p=44863

FWIW, I sent a pull request to see if the MicrosoftDocs project agrees
that the ReadFile page is misleading on this point:

https://github.com/MicrosoftDocs/sdk-api/pull/7
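
Assuming the documentation stays as it is, the fix on our side would
presumably be to treat ERROR_HANDLE_EOF as a normal EOF in the pread
emulation.  A sketch (illustrative names, not a committed patch):

#include <windows.h>
#include <io.h>

/*
 * Sketch of a pread() emulation that treats ERROR_HANDLE_EOF as plain EOF.
 */
static int
pread_sketch(int fd, void *buf, DWORD size, long long offset)
{
    OVERLAPPED  overlapped = {0};
    HANDLE      handle = (HANDLE) _get_osfhandle(fd);
    DWORD       result;

    /* The OVERLAPPED struct is used only to pass the read offset. */
    overlapped.Offset = (DWORD) (offset & 0xFFFFFFFF);
    overlapped.OffsetHigh = (DWORD) (offset >> 32);

    if (!ReadFile(handle, buf, size, &result, &overlapped))
    {
        if (GetLastError() == ERROR_HANDLE_EOF)
            return 0;           /* reading at or past EOF: report plain EOF */
        return -1;              /* real code maps the Win32 error to errno */
    }
    return (int) result;
}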




Statistics improvements for time series data

2019-12-08 Thread Mark Dilger

Hackers,

I'm thinking of submitting a patch, and would like to review my
design ideas with you all before doing so.  I've thought about
this problem before, but I can't find any email where I might
have already proposed this.  If I did, and this is a duplicate,
please forgive me.  I'm not trying to resubmit the idea if it
has already been rejected.

I have been considering applications that maintain some current
state information in the face of frequently inserted time series
data.  Assume you loop as follows, waiting for each new data file
coming out of a buffered application:

  COPY sometable FROM recent_timeseries_data.csv;
  SELECT some_aggregate_statistics
FROM sometable st, another_table, and_another, ...
WHERE st.sometimefield > pretty_recent AND ...
GROUP BY some_aggregate_columns;

If you don't control how frequently that file gets written, you
could get lots of really short files frequently, or fewer larger
files less often.

Even with autovacuum set pretty aggressively on "sometable", you
are likely to get a bad plan for the SELECT due to the statistics
on "sometable" for the "sometimefield" not taking into account the
most recently inserted rows.  Some quick EXPLAIN checking verifies
that the number of rows predicted for a timeframe later than
the newest data as of the most recent ANALYZE will be 1.  (I'd
be interested in counter examples -- I'm just looking at the
results of a quick-and-dirty test.)

Updating that loop to perform an ANALYZE between the COPY and the
SELECT helps, but at the expense of potentially running ANALYZE
too often when the recent_timeseries_data.csv files are short
and frequent.  Using a stored procedure to conditionally run the
analyze seems unnecessarily complicated.

Relying on autovacuum to rescue you from bad plans seems foolishly
optimistic, since it would need to run right between your COPY of
new data and your SELECT over that data.  It is unclear how
autovacuum could be modified to do this for you.  Modifying the
statistics system to be predictive based on the state of affairs
at the last ANALYZE and the number of changes since then seems
more promising, but pretty complicated.  I might return to this
idea in a future patch, but today I'm proposing something simpler.

Would it make sense to add an optional parameter to VACUUM, ANALYZE,
and VACUUM ANALYZE that instructs it to only perform the operation
if autovacuum would do so under the current conditions?  In other
words, to consider the PgStat_StatTabEntry's n_dead_tuples and
n_live_tuples the same way autovacuum would?  Something like:

  ANALYZE sometable IF AND ONLY IF AUTOVACUUM WOULD;

A similar argument can be made for VACUUM, if you are trying to get
the visibility map updated prior to the SELECT so that an index only
scan will be feasible.  As for VACUUM ANALYZE, that has a similar
use case, with the downside that you don't know which thresholds
to use, the ones for vacuum or for analyze.  I think I'd implement
it to run the VACUUM ANALYZE if either condition meets autovacuum's
requirements.  (See autovacuum_vac_scale and autovacuum_anl_scale.)

I think the words "IF AND ONLY IF AUTOVACUUM WOULD" should be
replaced with a single word and added to the grammar where
vacuum_option_elem lists VERBOSE, FREEZE and FULL.  Perhaps
"OPTIONALLY", or "AUTOVACUUMESQUE", though I'm really hoping
somebody has a better suggestion.

In the given example, above, the user would likely set the vacuum
and analyze scale factors to zero and the thresholds to something
they've empirically determined to work well for their purposes.
That might be a problem in practice, given that it also impacts
autovacuum's choices.  Would people prefer that those thresholds
be passed as parameters to the command directly?

  VACUUM sometable OPTIONALLY (vacuum_threshold = 10, vacuum_scale = 0)

and only default to autovacuum's settings when not specified?
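
For reference, the test such a command would reuse is essentially
autovacuum's threshold formula from relation_needs_vacanalyze() in
autovacuum.c, roughly:

#include <stdbool.h>

/*
 * Rough shape of autovacuum's ANALYZE trigger test, as in
 * relation_needs_vacanalyze() in autovacuum.c.  The proposed syntax would
 * run the command only when this returns true.
 */
static bool
analyze_threshold_met(double reltuples,         /* pg_class.reltuples */
                      double anltuples,         /* changes since last analyze */
                      int anl_base_thresh,      /* autovacuum_analyze_threshold */
                      double anl_scale_factor)  /* autovacuum_analyze_scale_factor */
{
    double      anlthresh = anl_base_thresh + anl_scale_factor * reltuples;

    return anltuples > anlthresh;
}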

--
Mark Dilger




RE: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Ranier Vilela
>I think it would be better to split this patch into separate files,
>one for each global variable that is being shadowed. 
Ok, I agree.

> The reason I say so is apparent looking at the first one in the patch,
>RedoRecPtr.  This process global variable is defined in xlog.c:
>   static XLogRecPtr RedoRecPtr;
>and then, somewhat surprisingly, passed around between static
>functions defined within that same file, such as:
> RemoveOldXlogFiles(...)
>which in the current code only ever gets a copy of the global,
>which begs the question why it needs this passed as a parameter
>at all.  All the places calling RemoveOldXlogFiles are within
>this file, and all of them pass the global, so why bother?
In general I do not agree with using global variables, but I understand why 
they are used; I believe they are a necessary evil.
So I think that maybe the original author had the same thought: by passing the 
variable around as a parameter there remains a way to eliminate the global 
variable entirely in the future, and the choice of name was simply unfortunate.
In my own systems, I make use of only one global variable, and in many 
functions I pass it as a parameter, with another name.

>Another function within xlog.c behaves similarly:
>   RemoveXlogFile(...)
>Only here, the callers sometimes pass the global RedoRecPtr
>(though indirectly, since they themselves received it as an
>argument) and sometimes they pass in InvalidXLogRecPtr, which
>is just a constant:
>   src/include/access/xlogdefs.h:#define InvalidXLogRecPtr  0
>So it might make sense to remove the parameter from this
>function, too, and replace it with a flag parameter named
>something like "is_valid", or perhaps split the function
>into two functions, one for valid and one for invalid.
Again in this case, it would have to be checked whether postgres really will 
make use of the global variable forever, which for me is a bad design.

Another complicated case of a global variable is PGconn *conn. It is defined as 
a global somewhere, but there is widespread use of the name "conn" in many places 
in the code, many in src/bin, so if it is in the interest of postgres to correct 
this, it would be better to rename the global variable to something like 
pg_conn, or gconn.

>I'm not trying to redesign xlog.c's functions in this email
>thread, but only suggesting that these types of arguments
>may ensue for each global variable in your patch, and it will
>be easier for a committer to know if there is a consensus
>about any one of them than about the entire set.  When I
>suggested you do this patch set all on this thread, I was
>still expecting multiple patches, perhaps named along the
>lines of:
>   unshadow.RedoRecPtr.patch.1
>   unshadow.wal_segment_size.patch.1
>   unshadow.synchronous_commit.patch.1
>   unshadow.wrconn.patch.1
>   unshadow.progname.patch.1
>   unshadow.am_syslogger.patch.1
>   unshadow.days.patch.1
>   unshadow.months.patch.1
>etc.  I'm uncomfortable giving you negative feedback of this
>sort, since I think you are working hard to improve postgres
>and I really appreciate it, so later tonight I'll try to come
>back, split your patch for you as described, add an entry to
>the commitfest if you haven't already, and mark myself as a
>reviewer.
I appreciate your help.

>Thanks again for the hard work and the patch!
You are welcome.

regards,
Ranier Vilela





RE: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Ranier Vilela
>I don't think I'm actually on board with the goal here.
Ok, I understand.

>Basically, if we take this seriously, we're throwing away the notion of
>nested variable scopes and programming as if we had just a flat namespace.
>That wasn't any fun when we had to do it back in assembly-code days, and
>I don't see a good reason to revert to that methodology today.
In general I think the use of global variables is bad design, but I understand 
the need.

>In a few of these cases, like the RedoRecPtr changes, there *might* be
>an argument that there's room for confusion about whether the code could
>have meant to reference the similarly-named global variable.  But it's
>just silly to make that argument in places like your substitution of
>/days/ndays/ in date.c.
I would rather fix everything, including the "days" name.

>Based on this sample, I reject the idea that we ought to be trying to
>eliminate this class of warnings just because somebody's compiler can be
>induced to emit them.  If you want to make a case-by-case argument that
>particular situations of this sort are bad programming style, let's have
>that argument by all means.  But it needs to be case-by-case, not just
>dropping a large patch on us containing a bunch of unrelated changes
>and zero specific justification for any of them.
This is why I suggested enabling the warning in the development and review 
process, so that any cases that arise would be corrected very early.

>IOW, I don't think you've taken to heart Robert's upthread advice that
>this needs to be case-by-case and based on literary not mechanical
>considerations.
Ok, right.
But I was working on the second class of shadow variables, which are local 
variables shadowed within the function itself, where the patch would lead to 
the removal of the inner variable declaration while maintaining the same logic 
and functionality. That would also lead to better performance and reduced 
memory usage, even if only very slightly.
In that case, too, would it have to be case by case?
Wow, there are many, many shadow variables ...

>BTW, if we do go forward with changing the RedoRecPtr uses, I'm not
>in love with "XRedoRecPtr" as a replacement parameter name; it conveys
>nothing much, and the "X" prefix is already overused in that area of
>the code.  Perhaps "pRedoRecPtr" would be a better choice?  Or maybe
>make the local variables be all-lower-case "redorecptr", which would
>fit well in context in places like
pRedoRecPtr is perfect for me.

regards,
Ranier Vilela



Re: ssl passphrase callback

2019-12-08 Thread Craig Ringer
On Sat, 7 Dec 2019 at 07:21, Tom Lane  wrote:

> Andrew Dunstan  writes:
> > I've just been looking at that. load_external_function() doesn't
> > actually do anything V1-ish with the value, it just looks up the symbol
> > using dlsym and returns it cast to a PGFunction. Is there any reason I
> > can't just use that and cast it again to the callback function type?
>
> TBH, I think this entire discussion has gone seriously off into the
> weeds.  The original design where we just let a shared_preload_library
> function get into a hook is far superior to any of the overcomplicated
> kluges that are being discussed now.  Something like this, for instance:
>
> >>> ssl_passphrase_command='#superlib.so,my_rot13_passphrase'
>
> makes me positively ill.  It introduces problems that we don't need,
> like how to parse out the sub-parts of the string, and the
> quoting/escaping issues that will come along with that; while from
> the user's perspective it replaces a simple and intellectually-coherent
> variable definition with an unintelligible mess.
>

+1000 from me on that.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: proposal: minscale, rtrim, btrim functions for numeric

2019-12-08 Thread Karl O. Pinc
On Sun, 08 Dec 2019 13:57:00 -0500
Tom Lane  wrote:

> "Karl O. Pinc"  writes:
> > FWIW, I bumped around the Internet and looked at Oracle docs to see
> > if there's any reason why minscale() might not be a good function
> > name. I couldn't find any problems.  
> 
> > I also couldn't think of a better name than trim_scale() and don't
> > have any problems with the name.  
> 
> I'd just comment that it seems weird that the same patch is
> introducing two functions with inconsistently chosen names.  Why does
> one have an underscore separating the words and the other not?  I
> haven't got a large investment in either naming convention
> specifically, but it'd be nice if we could at least pretend to be
> considering consistency.

Consistency would be lovely.  I don't feel qualified
to make the decision but here's what I came up with:

I re-read the back-threads and don't see any discussion
of the naming of minscale().

My thoughts run toward asking
the "what is a word?" question, along with "what is the
policy for separating a word?".  Is "min" different
from the prefix "sub"?

"Trim" seems to clearly count as a word and trim_scale()
seems mostly consistent with existing function names.
(E.g. width_bucket(), convert_to().  But there's no
true consistency.  Plenty of functions don't separate
words with "_".  E.g. setseed().)

As far as "min" goes, glancing through function names [1]
does not help much.  The index indicates that when PG puts "min"
in a configuration parameter it separates it with "_".
(Looking at "min" in the index.)
Looking at the function names containing "min" [2] yields:

 aclitemin
 brin_minmax_add_value
 brin_minmax_consistent
 brin_minmax_opcinfo
 brin_minmax_union
 min
 numeric_uminus
 pg_terminate_backend
 range_minus
 txid_snapshot_xmin

Not especially helpful.   

I'm inclined to want
min_scale() instead of "minscale()" based on
how config parameters are named and for consistency
with trim_scale().  Pavel, if you agree then
let's just change minscale() to min_scale() and
let people object if they don't like it.

Regards.

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein

[1] 
select pg_proc.proname
  from pg_proc
  group by pg_proc.proname
  order by pg_proc.proname;

[2]
select pg_proc.proname
  from pg_proc
  where pg_proc.proname like '%min%'
  group by pg_proc.proname
  order by pg_proc.proname;




Re: proposal: minscale, rtrim, btrim functions for numeric

2019-12-08 Thread Karl O. Pinc
Hi Pavel,

Thanks for your changes.  More inline below:

On Sun, 8 Dec 2019 08:38:38 +0100
Pavel Stehule  wrote:

> On Sun 8. 12. 2019 at 2:23, Karl O. Pinc  wrote:

> > On Mon, 11 Nov 2019 15:47:37 +0100
> > Pavel Stehule  wrote:

> > > I implemented two functions - first minscale, second trim_scale.
> > > The overhead of second is minimal - so I think it can be good to
> > > have it. I started design with the name "trim_scale", but the
> > > name can be any other.  


> > I comment on various hunks in line below:

> 
> > diff --git a/src/backend/utils/adt/numeric.c
> > b/src/backend/utils/adt/numeric.c index a00db3ce7a..35234aee4c
> > 100644 --- a/src/backend/utils/adt/numeric.c
> > +++ b/src/backend/utils/adt/numeric.c
> >
> > 
> > I believe the hunks in this file should start at about line# 3181.
> > This is right after numeric_scale().  Seems like all the scale
> > related functions should be together.
> >
> > There's no hard standard but I don't see why lines (comment lines in
> > your case) should be longer than 78 characters without good reason.
> > Please reformat.
> > 

I don't see any response from you regarding the above two suggestions.


> 
> > + */
> > +static int
> > +get_min_scale(NumericVar *var)
> > +{
> > +   int minscale = 0;
> > +
> > +   if (var->ndigits > 0)
> > +   {
> > +   NumericDigit last_digit;
> > +
> > +   /* maximal size of minscale, can be lower */
> > +   minscale = (var->ndigits - var->weight - 1) * DEC_DIGITS;
> > +
> > +   /*
> > +    * When there are not digits after decimal point, the previous
> > +    * expression
> >
> > 
> > s/not/no/
> > 
> >
> > +    * can be negative. In this case, the minscale must be zero.
> > +    */
> >
> > 
> > s/can be/is/
> > 

By the above, I intended the comment be changed (after line wrapping)
to:
   /*
 * When there are no digits after decimal point,
 * the previous expression is negative. In this
 * case the minscale must be zero.
 */

(Oh yes, on re-reading I think the comma is unnecessary so I removed it too.)



> >
> > +   if (minscale > 0)
> > +   {
> > +   /* reduce minscale if trailing digits in last numeric digits are zero */

And the above comment should either be wrapped (as requested above)
or eliminated.  I like comments but I'm not sure this one contributes
anything.


> > --- a/src/include/catalog/pg_proc.dat
> > +++ b/src/include/catalog/pg_proc.dat
> > @@ -4288,6 +4288,12 @@
> >proname => 'width_bucket', prorettype => 'int4',
> >proargtypes => 'numeric numeric numeric int4',
> >prosrc => 'width_bucket_numeric' },
> > +{ oid => '3434', descr => 'returns minimal scale of numeric value',
> >
> > 
> > How about a descr of?:
> >
> >   minimal scale needed to store the supplied value without data loss
> > 
> >  
> 
> done
> 
> >
> > +  proname => 'minscale', prorettype => 'int4', proargtypes => 'numeric',
> > +  prosrc => 'numeric_minscale' },
> > +{ oid => '3435', descr => 'returns numeric value with minimal
> > scale',
> >
> > 
> > And likewise a descr of?:
> >
> >   numeric with minimal scale needed to represent the given value
> > 
> >
> > +  proname => 'trim_scale', prorettype => 'numeric', proargtypes => 'numeric',
> > +  prosrc => 'numeric_trim_scale' },
> >  
> 
> done

Thanks for these changes.  Looking at pg_proc.dat there seems to
be an effort made to keep the lines to a maximum of 78 or 80
characters.  This means starting "descr => '..." on new lines
when the description is long.  Please reformat, doing this or,
if you like, something even more clever to keep the lines short.

Looking good.  We're making progress.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: Yet another vectorized engine

2019-12-08 Thread Hubert Zhang
Thanks Konstantin,
Your suggestions are very helpful. I have added them into issues of
vectorize_engine repo
https://github.com/zhangh43/vectorize_engine/issues

On Wed, Dec 4, 2019 at 10:08 PM Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 04.12.2019 12:13, Hubert Zhang wrote:
>
> 3. Why you have to implement your own plan_tree_mutator and not using
> expression_tree_mutator?
>
> I also want to replace plan node, e.g. Agg->CustomScan(with VectorAgg
> implementation). expression_tree_mutator cannot be used to mutate plan node
> such as Agg, am I right?
>
>
> O, sorry, I see.
>
>
>
>> 4. As far as I understand you now always try to replace SeqScan with your
>> custom vectorized scan. But it makes sense only if there are quals for this
>> scan or aggregation is performed.
>> In other cases batch+unbatch just adds extra overhead, doesn't it?
>>
> Probably there is extra overhead for the heap format and a query like
> 'select i from t;' without quals, projection, or aggregation.
> But with a column store, VectorScan could read batches directly, at no
> additional batching cost. A column store is the better choice for OLAP queries.
>
>
> Generally, yes.
> But will it be true for the query with a lot of joins?
>
> select * from T1 join T2 on (T1.pk=T2.fk) join T3 on (T2.pk=T3.fk) join T4
> ...
>
> How can batching improve performance in this case?
> Also if query contains LIMIT clause or cursors, then batching can cause
> fetching of useless records (which never will be requested by client).
>
> Can we conclude that it would be better to use a vector engine for OLAP
> queries and a row engine for OLTP queries?
>
> 5. Throwing and catching exception for queries which can not be vectorized
>> seems to be not the safest and most efficient way of handling such cases.
>> May be it is better to return error code in plan_tree_mutator and
>> propagate this error upstairs?
>
>
> Yes, as for efficiency, another way is to enable some plan nodes to be
> vectorized, leave other nodes non-vectorized, and add a batch/unbatch layer
> between them (is this what you meant by "propagate this error upstairs"?).
> As you mentioned, this could introduce additional overhead. Are there any
> other good approaches?
> What do you mean by "not safest"?
> PG catch will receive the ERROR, and fall back to the original
> non-vectorized plan.
>
>
> The problem with catching and ignoring exceptions has been discussed many
> times on hackers.
> Unfortunately the Postgres PG_TRY/PG_CATCH mechanism is not an analog of the
> exception mechanism in higher-level languages like C++ or Java...
> It doesn't perform stack unwinding. If some resources (files, locks,
> memory, ...) were obtained before the error was thrown, they are not
> reclaimed.
> Only rollback of the transaction is guaranteed to release all resources, and
> that is what actually happens in the case of normal error processing.
> But if you catch and ignore an exception, trying to continue execution, it
> can cause many problems.
>
> Maybe in your case it is not a problem, because you know for sure where the
> error can happen: it is thrown by plan_tree_mutator, and it looks like there
> are no resources obtained by this function.  But in any case the overhead of
> setjmp is much higher than that of explicit return-code checks.
> So checking return codes will not actually add any noticeable overhead,
> just some code complication from the extra checks.
> But that can be hidden in macros which are used in any case (like MUTATE).
>
>
> 7. How vectorized scan can be combined with parallel execution (it is
>> already supported in9.6, isn't it?)
>>
>
> We didn't implement it yet, but the idea is the same as the non-parallel
> one: copy the current parallel scan and implement a vectorized Gather,
> keeping VectorTupleTableSlot as their interface.
> Our basic idea is to reuse most of the current PG executor logic, make it
> vectorized, and then tune performance gradually.
>
>
> Parallel scan scatters pages between parallel workers.
> To fill a VectorTupleSlot with data you may need more than one page (unless
> you decide that it can fetch tuples only from a single page).
> So it should somehow take into account the specifics of parallel scans.
> Also there are special nodes for parallel scans, so if we want to provide
> parallel execution for vectorized operations we also need to substitute
> those nodes with custom nodes.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com 
> 
> The Russian Postgres Company
>
>
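
(To make the return-code alternative concrete, the check could be hidden in
a macro along these lines; all names here are hypothetical:)

typedef struct Plan Plan;       /* stands in for PostgreSQL's Plan node */

extern Plan *plan_tree_mutator(Plan *node, void *context);

/*
 * Hypothetical macro hiding the return-code check: on failure the mutator
 * returns NULL instead of longjmp'ing via elog(ERROR), and each caller
 * propagates the NULL upward so the top level can fall back to the
 * non-vectorized plan.  No setjmp, no PG_TRY.
 */
#define PLAN_MUTATE(dst, src, context) \
    do { \
        (dst) = plan_tree_mutator((src), (context)); \
        if ((dst) == NULL) \
            return NULL; \
    } while (0)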

-- 
Thanks

Hubert Zhang


Re: Rework manipulation and structure of attribute mappings

2019-12-08 Thread Michael Paquier
On Fri, Dec 06, 2019 at 06:03:12PM +0900, Amit Langote wrote:
> On Wed, Dec 4, 2019 at 5:03 PM Michael Paquier  wrote:
>> I see.  That saved me some time, thanks.  It is not really intuitive
>> to name routines about tuple conversion from tupconvert.c to
>> attrmap.c, so I'd think about renaming those routines as well, like
>> build_attrmap_by_name/position instead. That's more consistent with
>> the rest I added.
> 
> Sorry I don't understand this.  Do you mean we should rename the
> routines left behind in tupconvert.c?  For example,
> convert_tuples_by_name() doesn't really "convert" tuples, only builds
> a map needed to do so.  Maybe build_tuple_conversion_map_by_name()
> would be a more suitable name.

I had no plans to touch this area nor to rename this layer, because
that was a bit out of the original scope of this patch, which is to
remove the confusion and random bets with map lengths.  I see your
point though, and actually a name like the one you are suggesting
better reflects what the routine does in reality. :p

> Maybe we don't need to repeat here what AttrMap is used for (it's
> already written in attmap.c), only write what it is and why it's
> needed in the first place.  Maybe like this:
> 
> /*
>  * Attribute mapping structure
>  *
>  * This maps attribute numbers between a pair of relations, designated 'input'
>  * and 'output' (most typically inheritance parent and child relations), whose
>  * common columns have different attribute numbers.  Such difference may arise
>  * due to the columns being ordered differently in the two relations or the
>  * two relations having dropped columns at different positions.
>  *
>  * 'maplen' is set to the number of attributes of the 'output' relation,
>  * taking into account any of its dropped attributes, with the corresponding
>  * elements of the 'attnums' array set to zero.
>  */

That sounds better, thanks.

While on it, I have also spent some time checking the FK-related
points that I suspected were fishy at the beginning of the thread, but
I have not been able to break anything.
related to dropped columns in foreign_key.sql (grep for fdrop1), which
is more than enough.  There was actually one extra issue in the patch
as of CloneFkReferencing() when filling in mapped_conkey based on the
number of keys in the constraint.

So, a couple of hours after looking at the code I am finishing with
the updated and indented version attached.  What do you think?
--
Michael
From 73bc37de86640b5e5df682d142d404ebba056b84 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 9 Dec 2019 11:51:33 +0900
Subject: [PATCH v3] Rework attribute mappings

---
 src/backend/access/common/Makefile |   1 +
 src/backend/access/common/attmap.c | 320 +
 src/backend/access/common/tupconvert.c | 287 ++
 src/backend/catalog/index.c|  12 +-
 src/backend/catalog/partition.c|  11 +-
 src/backend/catalog/pg_constraint.c|   1 -
 src/backend/commands/indexcmds.c   |  16 +-
 src/backend/commands/tablecmds.c   |  98 +++
 src/backend/executor/execMain.c|  22 +-
 src/backend/executor/execPartition.c   |  57 ++--
 src/backend/jit/llvm/llvmjit_expr.c|   1 -
 src/backend/parser/parse_utilcmd.c |  18 +-
 src/backend/replication/logical/relation.c |  10 +-
 src/backend/replication/logical/worker.c   |   9 +-
 src/backend/rewrite/rewriteManip.c |  12 +-
 src/include/access/attmap.h|  52 
 src/include/access/tupconvert.h|  13 +-
 src/include/catalog/index.h|   2 +-
 src/include/parser/parse_utilcmd.h |   3 +-
 src/include/replication/logicalrelation.h  |   2 +-
 src/include/rewrite/rewriteManip.h |   3 +-
 21 files changed, 531 insertions(+), 419 deletions(-)
 create mode 100644 src/backend/access/common/attmap.c
 create mode 100644 src/include/access/attmap.h

diff --git a/src/backend/access/common/Makefile b/src/backend/access/common/Makefile
index 6c9c6f3256..fd74e14024 100644
--- a/src/backend/access/common/Makefile
+++ b/src/backend/access/common/Makefile
@@ -13,6 +13,7 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = \
+	attmap.o \
 	bufmask.o \
 	detoast.o \
 	heaptuple.o \
diff --git a/src/backend/access/common/attmap.c b/src/backend/access/common/attmap.c
new file mode 100644
index 00..eebb7e76d7
--- /dev/null
+++ b/src/backend/access/common/attmap.c
@@ -0,0 +1,320 @@
+/*-
+ *
+ * attmap.c
+ *	  Attribute mapping support.
+ *
+ * This file provides utility routines to build and manage attribute mappings
+ * by comparing input and output TupleDescs.  Such mappings are typically used
+ * by DDL operating on inheritance and partition trees to convert fully
+ * transformed expression trees from parent rowtype to child rowtype or
+ * vice versa.

Re: [PATCH] Finally split StdRdOptions into HeapOptions and ToastOptions

2019-12-08 Thread Michael Paquier
On Fri, Dec 06, 2019 at 06:18:48PM +0300, Nikolay Shaplov wrote:
> In the thread 
> https://www.postgresql.org/message-id/2620882.s52SJui4ql@x200m
> I've suggested to split the one big StdRdOptions that is used for options 
> storage into Options structures individual to each relkind and each relam.
> 
> And here goes the last part of the StdRdOptions removal patch, where 
> StdRdOptions is replaced with HeapOptions and ToastOptions.

-typedef struct StdRdOptions
+/*
+ * HeapOptions
+ * Binary representation of relation options for Heap relations.
+ */
+typedef struct HeapOptions

I think that it makes sense to split the relation options dedicated to
heap into their own parsing structure, because those options are
actually related to the heap table AM.  However, I think that this
patch is not ambitious enough in what it does, and that things could
move in a more generic direction.  At the end of the day, I'd like to
think that we should have something like:
- Heap-related reloptions are built as part of its AM handler in
heapam_handler.c, with reloptions.c holding no more references to
heap.  At all.
- The table AM option parsing follows a model close to what is done
for indexes in terms of option parsing, moving the responsibility to
define relation options to each table AM.
- Toast is an interesting case, as table AMs may want to use toast
tables.  Or not.  Robert may be able to comment more on that as he has
worked in this area for bd12499.
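(To make the shape of that split concrete, the new structure would
presumably look something like the sketch below; this is only an
illustration assuming the heap-relevant members of StdRdOptions move
over unchanged, not the actual contents of the patch.)

/* Sketch only: heap-specific reloptions carved out of StdRdOptions */
typedef struct HeapOptions
{
	int32		vl_len_;			/* varlena header (do not touch directly!) */
	int			fillfactor;			/* page fill factor in percent */
	AutoVacOpts autovacuum;			/* autovacuum-related reloptions */
	int			parallel_workers;	/* max number of parallel workers */
	/* ... remaining heap-only members of StdRdOptions ... */
} HeapOptions;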
--
Michael




Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Kyotaro Horiguchi
Hello.

At Mon, 9 Dec 2019 01:30:33 +, Ranier Vilela wrote:
> >I don't think I'm actually on board with the goal here.
> Ok, I understand.
> 
> >Basically, if we take this seriously, we're throwing away the notion of
> >nested variable scopes and programming as if we had just a flat namespace.
> >That wasn't any fun when we had to do it back in assembly-code days, and
> >I don't see a good reason to revert to that methodology today.
> In general I think the use of global variables is bad design. But I 
> understand the use.

File-scoped variables are needed wherever state must persist for the
lifetime of the process.  If we inhibit them, the upper modules need
to create the persistent area instead, for example by calling
something like XLogInitGlobals(), which makes things messier.
Globality doesn't necessarily mean evil, and there are reasons why
-Wall doesn't warn about this case.  I believe we, and especially
committers, are not people who should be kept away from knives just
because knives can injure someone.

> >In a few of these cases, like the RedoRecPtr changes, there *might* be
> >an argument that there's room for confusion about whether the code could
> >have meant to reference the similarly-named global variable.  But it's
> >just silly to make that argument in places like your substitution of
> >/days/ndays/ in date.c.
> I would rather fix everything, including days name.

I might simply be too accustomed to this code, but the functions that
define shadowing locals don't modify those local variables, and only
the functions that don't shadow the globals modify the globals.  I see
no significant confusion here.  By the way, changes like "conf_file" ->
"conffile" seem really useless as a fix.

> >Based on this sample, I reject the idea that we ought to be trying to
> >eliminate this class of warnings just because somebody's compiler can be
> >induced to emit them.  If you want to make a case-by-case argument that
> >particular situations of this sort are bad programming style, let's have
> >that argument by all means.  But it needs to be case-by-case, not just
> >dropping a large patch on us containing a bunch of unrelated changes
> >and zero specific justification for any of them.
> This is why I suggested activating the alert in the development and review 
> process, so that any cases that arose would be corrected very early.

I don't think it contributes to the argument on programming style in
any way.

> >IOW, I don't think you've taken to heart Robert's upthread advice that
> >this needs to be case-by-case and based on literary not mechanical
> >considerations.
> Ok, right.
> But I was working on the second class of shadow variables, which are local 
> variables within the function itself, where the patch would lead to the 
> removal of a variable declaration while maintaining the same logic and 
> functionality, which would lead to better performance and a reduction of 
> memory usage, even if very small.
> In that case, too, would it have to be case by case?
> Wow, there are many, many shadow variables ...
> Wow, there are many and many shadow variables ...

As Robert said, they are harmless as far as we can tell.  Fixes for
actual bugs caused by variable shadowing would be welcome.  I don't
believe "lead to better performance and reduction (of code?)" without
evidence, since modern compilers are not so stupid.  And even if there
were some gain, a performance change of that magnitude doesn't justify
removing variable shadowing wholesale.

> >BTW, if we do go forward with changing the RedoRecPtr uses, I'm not
> >in love with "XRedoRecPtr" as a replacement parameter name; it conveys
> >nothing much, and the "X" prefix is already overused in that area of
> >the code.  Perhaps "pRedoRecPtr" would be a better choice?  Or maybe
> >make the local variables be all-lower-case "redorecptr", which would
> >fit well in context in places like
> pRedoRecPtr, It's perfect for me.

Anyway, I strongly object to the name 'pRedoRecPtr', which suggests
that it is a C pointer to some variable.  (And I believe we use
Hungarian notation only when we don't have a better way...)
LatestRedoRecPtr looks better to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Fix a comment in CreateLockFile

2019-12-08 Thread Amit Kapila
On Sun, Dec 8, 2019 at 4:32 PM Amit Kapila  wrote:
>
> On Sun, Dec 8, 2019 at 1:10 PM Hadi Moshayedi  wrote:
> >
> > It seems that explanation for the contents of the pid file has moved to 
> > pidfile.h, but the comments in CreateLockFile() still point to miscadmin.h.
> >
> > The attached patch updates those pointers.
> >
>
> Your patch looks correct to me on a quick look.  I will take care of this tomorrow.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Mark Dilger



On 12/8/19 10:25 AM, Mark Dilger wrote:

I was
still expecting multiple patches, perhaps named along the
lines of:

   unshadow.RedoRecPtr.patch.1
   unshadow.wal_segment_size.patch.1
   unshadow.synchronous_commit.patch.1
   unshadow.wrconn.patch.1
   unshadow.progname.patch.1
   unshadow.am_syslogger.patch.1
   unshadow.days.patch.1
   unshadow.months.patch.1

etc.  I'm uncomfortable giving you negative feedback of this
sort, since I think you are working hard to improve postgres
and I really appreciate it, so later tonight I'll try to come
back, split your patch for you as described, add an entry to
the commitfest if you haven't already, and mark myself as a
reviewer.


To start off, I've taken just six of the 22 or so variables
that you renamed and created patches for them.  I'm not
endorsing these in any way.  I chose these mostly based on
which ones showed up first in your patch file, with one
exception.

I stopped when I got to 'progname' => 'prog_name' as the
whole exercise was getting too absurd even for me.  That
clearly looks like one where the structure of the code
needs to be reconsidered, rather than just renaming stuff.

I'll create the commitfest entry based on this email once
this has been sent.

Patches attached.

--
Mark Dilger
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e4e9ce7398..ae6f6261f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7158,11 +7158,11 @@ StartupXLOG(void)
 
 					if (info == XLOG_CHECKPOINT_SHUTDOWN)
 					{
-						CheckPoint	checkPoint;
+						CheckPoint	xcheckPoint;
 
-						memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
-						newTLI = checkPoint.ThisTimeLineID;
-						prevTLI = checkPoint.PrevTimeLineID;
+						memcpy(&xcheckPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
+						newTLI = xcheckPoint.ThisTimeLineID;
+						prevTLI = xcheckPoint.PrevTimeLineID;
 					}
 					else if (info == XLOG_END_OF_RECOVERY)
 					{
diff --git a/src/interfaces/ecpg/preproc/descriptor.c b/src/interfaces/ecpg/preproc/descriptor.c
index a29f530327..da3291baca 100644
--- a/src/interfaces/ecpg/preproc/descriptor.c
+++ b/src/interfaces/ecpg/preproc/descriptor.c
@@ -73,7 +73,7 @@ ECPGnumeric_lvalue(char *name)
 static struct descriptor *descriptors;
 
 void
-add_descriptor(char *name, char *connection)
+add_descriptor(char *name, char *conn_str)
 {
 	struct descriptor *new;
 
@@ -85,18 +85,18 @@ add_descriptor(char *name, char *connection)
 	new->next = descriptors;
 	new->name = mm_alloc(strlen(name) + 1);
 	strcpy(new->name, name);
-	if (connection)
-	{
-		new->connection = mm_alloc(strlen(connection) + 1);
-		strcpy(new->connection, connection);
+	if (conn_str)
+	{
+		new->connection = mm_alloc(strlen(conn_str) + 1);
+		strcpy(new->connection, conn_str);
 	}
 	else
-		new->connection = connection;
+		new->connection = conn_str;
 	descriptors = new;
 }
 
 void
-drop_descriptor(char *name, char *connection)
+drop_descriptor(char *name, char *conn_str)
 {
 	struct descriptor *i;
 	struct descriptor **lastptr = &descriptors;
@@ -108,9 +108,9 @@ drop_descriptor(char *name, char *connection)
 	{
 		if (strcmp(name, i->name) == 0)
 		{
-			if ((!connection && !i->connection)
-				|| (connection && i->connection
-					&& strcmp(connection, i->connection) == 0))
+			if ((!conn_str && !i->connection)
+				|| (conn_str && i->connection
+					&& strcmp(conn_str, i->connection) == 0))
 			{
 				*lastptr = i->next;
 				if (i->connection)
@@ -126,7 +126,7 @@ drop_descriptor(char *name, char *connection)
 
 struct descriptor
   *
-lookup_descriptor(char *name, char *connection)
+lookup_descriptor(char *name, char *conn_str)
 {
 	struct descriptor *i;
 
@@ -137,9 +137,9 @@ lookup_descriptor(char *name, char *connection)
 	{
 		if (strcmp(name, i->name) == 0)
 		{
-			if ((!connection && !i->connection)
-				|| (connection && i->connection

Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-08 Thread Mark Dilger




On 12/8/19 8:50 PM, Mark Dilger wrote:



On 12/8/19 10:25 AM, Mark Dilger wrote:

I was
still expecting multiple patches, perhaps named along the
lines of:

   unshadow.RedoRecPtr.patch.1
   unshadow.wal_segment_size.patch.1
   unshadow.synchronous_commit.patch.1
   unshadow.wrconn.patch.1
   unshadow.progname.patch.1
   unshadow.am_syslogger.patch.1
   unshadow.days.patch.1
   unshadow.months.patch.1

etc.  I'm uncomfortable giving you negative feedback of this
sort, since I think you are working hard to improve postgres
and I really appreciate it, so later tonight I'll try to come
back, split your patch for you as described, add an entry to
the commitfest if you haven't already, and mark myself as a
reviewer.


To start off, I've taken just six of the 22 or so variables
that you renamed and created patches for them.  I'm not
endorsing these in any way.  I chose these mostly based on
which ones showed up first in your patch file, with one
exception.

I stopped when I got to 'progname' => 'prog_name' as the
whole exercise was getting too absurd even for me.  That
clearly looks like one where the structure of the code
needs to be reconsidered, rather than just renaming stuff.

I'll create the commitfest entry based on this email once
this has been sent.

Patches attached.


The commitfest item now exists at

  https://commitfest.postgresql.org/26/2371/

--
Mark Dilger




Re: backup manifests

2019-12-08 Thread Jeevan Chalke
On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia wrote:

>
>
> On Fri, Dec 6, 2019 at 1:44 AM Robert Haas  wrote:
>
>> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia wrote:
>> > Here is the whole stack of patches.
>>
>> I committed 0001, as that's just refactoring and I think (hope) it's
>> uncontroversial. I think 0002-0005 need to be squashed together
>> (crediting all authors properly and in the appropriate order) as it's
>> quite hard to understand right now,
>
>
> Please find attached single patch and I tried to add the credit to all
> the authors.
>

I had a look over the patch and here are my few review comments:

1.
+if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
+manifest_checksums = MC_SHA256;
+else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") == 0)
+manifest_checksums = MC_CRC32C;
+else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
+manifest_checksums = MC_NONE;
+else
+ereport(ERROR,

Is NONE a valid input? I think the default is "NONE" anyway, and thus
there is no need to accept it as an input. It will be better if we
simply error out when the input is neither "SHA256" nor "CRC32C".

I believe you have done it this way because pg_basebackup always
passes the MANIFEST_CHECKSUMS '%s' string, which will contain "NONE"
if no user input is given. But I think passing that clause
conditionally would be better, like we do with maxrate_clause for
example.

Well, this is just what I think; feel free to ignore it, as I don't
see any correctness issue here.
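(For reference, the pg_basebackup pattern referred to above only
builds the option string when the user supplied a value; a rough
sketch, where manifest_checksums_clause and manifest_checksum_algo are
hypothetical names:)

/* hypothetical sketch, modeled on how maxrate_clause is assembled */
if (manifest_checksum_algo != NULL)
	manifest_checksums_clause = psprintf("MANIFEST_CHECKSUMS '%s'",
										 manifest_checksum_algo);
/* ... and append the clause to BASE_BACKUP only when it is non-NULL ... */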


2.
+if (manifest_checksums != MC_NONE)
+{
+checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
+switch (manifest_checksums)
+{
+case MC_NONE:
+break;
+}

Since the switch statement is within the "if (manifest_checksums !=
MC_NONE)" condition, I don't think we need a case for MC_NONE here.
Rather, we can use a default case to error out.
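(One possible shape of that suggestion, purely illustrative since the
full switch body is not shown here:)

if (manifest_checksums != MC_NONE)
{
	checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
	switch (manifest_checksums)
	{
		case MC_SHA256:
		case MC_CRC32C:
			/* ... encode and append the checksum ... */
			break;
		default:
			elog(ERROR, "unexpected manifest checksum type: %d",
				 (int) manifest_checksums);
	}
}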


3.
+if (manifest_checksums != MC_NONE)
+{
+initialize_manifest_checksum(&cCtx);
+update_manifest_checksum(&cCtx, content, len);
+}

@@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char
*tarfilename, struct stat *statbuf
 intsegmentno = 0;
 char   *segmentpath;
 boolverify_checksum = false;
+ChecksumCtx cCtx;
+
+initialize_manifest_checksum(&cCtx);


I see that in a few places you are calling
initialize/update_manifest_checksum() conditionally, while at other
places the call is unconditional. It seems that calling them
unconditionally would cause no issue, since the switch statements
inside them do nothing when manifest_checksums is MC_NONE.


4.
The initialize/update/finalize_manifest_checksum() functions may be
needed by the validation patch as well, and thus I think these
functions should not depend on a global variable. Also, it would be
good to keep them in a file that is accessible to frontend-only code.
Well, you can ignore these comments with the argument that this
refactoring can be done by the patch adding validation support; I have
no issue with that. Since both patches are dependent and posted on the
same email chain, I thought of putting down that observation.


5.
+switch (manifest_checksums)
+{
+case MC_SHA256:
+checksumlabel = "SHA256:";
+break;
+case MC_CRC32C:
+checksumlabel = "CRC32C:";
+break;
+case MC_NONE:
+break;
+}

This code in AddFileToManifest() is executed for every file for which
we add an entry. However, checksumlabel is going to remain the same
throughout. Can it be set just once and then used as is?


6.
Can we avoid declaring manifest_checksums as a global variable? I
think for that we would need to pass it to every function, and thus
change the signatures of various functions. Currently we pass
"StringInfo manifest" to all the required functions; would it be
better to pass a struct variable instead? The struct could have
members like "StringInfo manifest", the checksum type
(manifest_checksums), the checksum label, etc.
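(A sketch of the grouped state this suggests; all names and member
types here are illustrative, not taken from the patch:)

typedef struct manifest_info
{
	StringInfo	manifest;		/* manifest text under construction */
	int			checksum_type;	/* MC_SHA256, MC_CRC32C or MC_NONE */
	const char *checksum_label;	/* "SHA256:", "CRC32C:" or NULL */
} manifest_info;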


Thanks
-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


ALTER TABLE support for dropping generation expression

2019-12-08 Thread Peter Eisentraut
A small add-on to the generated columns feature:  Add an ALTER TABLE 
subcommand for dropping the generated property from a column, per SQL 
standard.
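For example, given a column defined as "b int GENERATED ALWAYS AS
(a * 2) STORED", running "ALTER TABLE ... ALTER COLUMN b DROP
EXPRESSION" turns b into a plain integer column while keeping its
currently stored values (see the documentation hunk below).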


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 8aa4710e1fdda3c605c1dd0c839f3cc7ff0b3918 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 9 Dec 2019 08:41:43 +0100
Subject: [PATCH] ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION

Add an ALTER TABLE subcommand for dropping the generated property from
a column, per SQL standard.
---
 doc/src/sgml/ref/alter_table.sgml   | 18 ++
 src/backend/catalog/sql_features.txt|  2 +-
 src/backend/commands/tablecmds.c| 82 -
 src/backend/parser/gram.y   | 20 +-
 src/bin/psql/tab-complete.c |  2 +-
 src/include/nodes/parsenodes.h  |  1 +
 src/include/parser/kwlist.h |  1 +
 src/test/regress/expected/generated.out | 72 ++
 src/test/regress/sql/generated.sql  | 28 +
 9 files changed, 222 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 8403c797e2..4bf449587c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -46,6 +46,7 @@
     ALTER [ COLUMN ] column_name SET DEFAULT expression
     ALTER [ COLUMN ] column_name DROP DEFAULT
     ALTER [ COLUMN ] column_name { SET | DROP } NOT NULL
+    ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
     ALTER [ COLUMN ] column_name ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( sequence_options ) ]
     ALTER [ COLUMN ] column_name { SET GENERATED { ALWAYS | BY DEFAULT } | SET sequence_option | RESTART [ [ WITH ] restart ] } [...]
     ALTER [ COLUMN ] column_name DROP IDENTITY [ IF EXISTS ]
@@ -241,6 +242,23 @@ Description
 
    
 
+   
+    DROP EXPRESSION [ IF EXISTS ]
+    
+     
+      This form turns a stored generated column into a normal base column.
+      Existing data in the columns is retained, but future changes will no
+      longer apply the generation expression.
+     
+
+     
+      If DROP EXPRESSION IF EXISTS is specified and the
+      column is not a stored generated column, no error is thrown.  In this
+      case a notice is issued instead.
+     
+    
+   
+
    
     ADD GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
     SET GENERATED { ALWAYS | BY DEFAULT }
diff --git a/src/backend/catalog/sql_features.txt b/src/backend/catalog/sql_features.txt
index ab3e381cff..9f840ddfd2 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -252,7 +252,7 @@ F381	Extended schema manipulation	03	ALTER TABLE statement: DROP CONSTRAINT clause
 F382	Alter column data type			YES	
 F383	Set column not null clause			YES	
 F384	Drop identity property clause			YES	
-F385	Drop column generation expression clause			NO	
+F385	Drop column generation expression clause			YES	
 F386	Set identity column generation clause			YES	
 F391	Long identifiers			YES	
 F392	Unicode escapes in identifiers			YES	
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5440eb9015..0820c661da 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -386,6 +386,7 @@ static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName,
 static ObjectAddress ATExecSetIdentity(Relation rel, const char *colName,
 									   Node *def, LOCKMODE lockmode);
 static ObjectAddress ATExecDropIdentity(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode);
+static ObjectAddress ATExecDropExpression(Relation rel, const char *colName, bool missing_ok, LOCKMODE lockmode);
 static void ATPrepSetStatistics(Relation rel, const char *colName, int16 colNum,
 								Node *newValue, LOCKMODE lockmode);
 static ObjectAddress ATExecSetStatistics(Relation rel, const char *colName, int16 colNum,
@@ -3673,6 +3674,7 @@ AlterTableGetLockLevel(List *cmds)
 			case AT_AddIdentity:
 			case AT_DropIdentity:
 			case AT_SetIdentity:
+			case AT_DropExpression:
 				cmd_lockmode = AccessExclusiveLock;
 				break;
 
@@ -3947,6 +3949,11 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			/* No command-specific prep needed */
 			pass = AT_PASS_COL_ATTRS;
 			break;
+		case AT_DropExpression: /* ALTER COLUMN DROP EXPR

Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-12-08 Thread Dilip Kumar
On Mon, Dec 2, 2019 at 2:01 PM Dilip Kumar  wrote:
>
> On Sun, Dec 1, 2019 at 7:58 AM Michael Paquier  wrote:
> >
> > On Fri, Nov 22, 2019 at 01:18:11PM +0530, Dilip Kumar wrote:
> > > I have rebased the patch on the latest head and also fix the issue of
> > > "concurrent abort handling of the (sub)transaction." and attached as
> > > (v1-0013-Extend-handling-of-concurrent-aborts-for-streamin) along with
> > > the complete patch set.  I have added the version number so that we
> > > can track the changes.
> >
> > The patch has rotten a bit and does not apply anymore.  Could you
> > please send a rebased version?  I have moved it to next CF, waiting on
> > author.
>
> I have rebased the patch set on the latest head.
>
I have reviewed the patch set and here are a few comments/questions.

1.
+static void
+pg_decode_stream_change(LogicalDecodingContext *ctx,
+ ReorderBufferTXN *txn,
+ Relation relation,
+ ReorderBufferChange *change)
+{
+ OutputPluginPrepareWrite(ctx, true);
+ appendStringInfo(ctx->out, "streaming change for TXN %u", txn->xid);
+ OutputPluginWrite(ctx, true);
+}

Should we show the tuple in the streamed change, like we do in
pg_decode_change?
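(For illustration, the streamed callback could reuse the same approach
as pg_decode_change; a sketch only, assuming test_decoding's static
tuple_to_stringinfo() helper is reachable from here:)

/* illustrative only: print the new tuple for streamed inserts */
if (change->action == REORDER_BUFFER_CHANGE_INSERT &&
	change->data.tp.newtuple != NULL)
{
	appendStringInfo(ctx->out, "streaming change for TXN %u: ", txn->xid);
	tuple_to_stringinfo(ctx->out, RelationGetDescr(relation),
						&change->data.tp.newtuple->tuple, false);
}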

2. pg_logical_slot_get_changes_guts
It recreates the decoding context [ctx =
CreateDecodingContext(InvalidXLogRecPtr, ...)] but doesn't set
streaming to false. Should we pass a parameter to
pg_logical_slot_get_changes_guts() saying whether we want streamed
results or not?

3.
+ XLogRecPtr prev_lsn = InvalidXLogRecPtr;
  ReorderBufferChange *change;
  ReorderBufferChange *specinsert = NULL;

@@ -1565,6 +1965,16 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
  Relation relation = NULL;
  Oid reloid;

+ /*
+ * Enforce correct ordering of changes, merged from multiple
+ * subtransactions. The changes may have the same LSN due to
+ * MULTI_INSERT xlog records.
+ */
+ if (prev_lsn != InvalidXLogRecPtr)
+ Assert(prev_lsn <= change->lsn);
+
+ prev_lsn = change->lsn;
I did not understand how this change is relevant to this patch.

4.
+ /*
+ * TOCHECK: We have to rebuild historic snapshot to be sure it includes all
+ * information about subtransactions, which could arrive after streaming start.
+ */
+ if (!txn->is_schema_sent)
+ snapshot_now = ReorderBufferCopySnap(rb, txn->base_snapshot,
+ txn, command_id);

In which case will txn->is_schema_sent be true? At the end of the
stream, ReorderBufferExecuteInvalidations always sets it to false, so
while sending the next stream it will always be false.  That means we
never really need the snapshot_now variable in ReorderBufferTXN.

5.
@@ -2299,6 +2746,23 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer *rb, TransactionId xid,
  txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);

  txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
+
+ /*
+ * We read catalog changes from WAL, which are not yet sent, so
+ * invalidate current schema in order output plugin can resend
+ * schema again.
+ */
+ txn->is_schema_sent = false;

Same as point 4, during decode time it will never be true.

6.
+ /* send fields */
+ pq_sendint64(out, commit_lsn);
+ pq_sendint64(out, txn->end_lsn);
+ pq_sendint64(out, txn->commit_time);

commit_time and end_lsn are used in standby_feedback.


7.
+ /* FIXME optimize the search by bsearch on sorted data */
+ for (i = nsubxacts; i > 0; i--)
+ {
+ if (subxacts[i - 1].xid == subxid)
+ {
+ subidx = (i - 1);
+ found = true;
+ break;
+ }
+ }
We cannot roll back an intermediate subtransaction without rolling
back the latest subtransaction, so why do we need to search the
array?  It will always be the last subxact, no?
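(If that holds, the backwards scan collapses to a direct check of the
last element; a sketch:)

/* sketch: only the most recent subxact can be the one aborting */
if (nsubxacts > 0 && subxacts[nsubxacts - 1].xid == subxid)
{
	subidx = nsubxacts - 1;
	found = true;
}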

8.
+ /*
+ * send feedback to upstream
+ *
+ * XXX Probably should send a valid LSN. But which one?
+ */
+ send_feedback(InvalidXLogRecPtr, false, false);

Why is feedback sent for every change?


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com