Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Dilip Kumar
On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila  wrote:
>
> On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar  wrote:
> >
> > Apart from that, I have also extended the solution for the page lock.
> > And, I have also broken down the 3rd patch in two parts for relation
> > extension and for the page lock.
> >
>
> Thanks, I have made a number of cosmetic changes and written
> appropriate commit messages for all patches.  See the attached patch
> series and let me know your opinion? BTW, did you get a chance to test
> page locks by using the extension which I have posted above or by some
> other way?  I think it is important to test page-lock related patches
> now.

I have reviewed the updated patches and they look fine to me.  Apart from
this, I have done testing for the page lock using the group locking
extension.

--Setup
create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off);
create index gin_test_idx on gin_test_tbl using gin (i);
create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off);
create index gin_test_idx1 on gin_test_tbl1 using gin (i);

--session1:
select become_lock_group_leader();
select gin_clean_pending_list('gin_test_idx');

--session2:
select become_lock_group_member(session1_pid);
select gin_clean_pending_list('gin_test_idx1');

--session3:
select become_lock_group_leader();
select gin_clean_pending_list('gin_test_idx1');

--session4:
select become_lock_group_member(session3_pid);
select gin_clean_pending_list('gin_test_idx');

ERROR:  deadlock detected
DETAIL:  Process 61953 waits for ExclusiveLock on page 0 of relation
16399 of database 13577; blocked by process 62197.
Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of
database 13577; blocked by process 61953.
HINT:  See server log for query details.


Session1 and Session3 acquire the PageLock on two different indexes'
meta-pages and are blocked in gdb; meanwhile, their members try to
acquire the page lock as shown in the above example, and the deadlock
is detected, which is solved after applying the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: backend type in log_line_prefix?

2020-03-15 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 10:22:52PM +0100, Peter Eisentraut wrote:
> >Can I suggest:
> >
> >-   appendCSVLiteral(&buf, MyBgworkerEntry->bgw_type);
> >+   appendCSVLiteral(&buf, MyBgworkerEntry->bgw_name);
> 
> The difference is intentional.  bgw_type is so that you can filter and group
> by type.  The bgw_name could be totally different for each instance.

I found 5373bc2a0867048bb78f93aede54ac1309b5e227

Your patch adds bgw_type, which is also in pg_stat_activity, so I agree it's
good to allow including it in log_line_prefix and CSV.

I suggest the CSV/log should also have the leader_pid, corresponding to
| b025f32e0b Add leader_pid to pg_stat_activity

With the attached applied on top of your patch, the CSV log looks like:

2020-03-14 22:09:39.395 CDT,"pryzbyj","template1",17030,"[local]",5e6d9c69.4286,2,"idle",2020-03-14 22:09:29 CDT,3/23,0,LOG,0,"statement: explain analyze SELECT COUNT(1), a.a FROM t a JOIN t b ON a.a=b.a GROUP BY 2;","psql","client backend",
2020-03-14 22:09:43.094 CDT,,,17042,,5e6d9c73.4292,1,,2020-03-14 22:09:39 CDT,4/3,0,LOG,0,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17042.0"", size 4694016",,"explain analyze SELECT COUNT(1), a.a FROM t a JOIN t b ON a.a=b.a GROUP BY 2;",,,"psql","parallel worker",17030
2020-03-14 22:09:43.094 CDT,,,17043,,5e6d9c73.4293,1,,2020-03-14 22:09:39 CDT,5/3,0,LOG,0,"temporary file: path ""base/pgsql_tmp/pgsql_tmp17043.0"", size 4694016",,"explain analyze SELECT COUNT(1), a.a FROM t a JOIN t b ON a.a=b.a GROUP BY 2;",,,"psql","parallel worker",17030

As for my question "what's using/trying/failing to use parallel workers", I was
able to look into that by parsing "Workers Planned/Launched" from auto_explain.
It's not a *good* way to do it, but I don't see how to do better and I don't
see any way this patch can improve that.

-- 
Justin
>From 5b111a3a6cd5282dbdf1e504ef648994b1918302 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 14 Mar 2020 19:09:54 -0500
Subject: [PATCH v1 1/2] Fix previous commit..

---
 doc/src/sgml/config.sgml   | 2 +-
 src/backend/utils/error/elog.c | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 53d0c480aa..fc0e2c00c3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6460,7 +6460,7 @@ local0.*/var/log/postgresql
 
  %b
  Backend process type
- yes
+ no
 
 
  %u
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 3a6f7f9456..62eef7b71f 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -72,7 +72,6 @@
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "postmaster/postmaster.h"
 #include "postmaster/syslogger.h"
@@ -2503,7 +2502,7 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 	else if (MyBackendType == B_BG_WORKER)
 		backend_type_str = MyBgworkerEntry->bgw_type;
 	else
-		backend_type_str = pgstat_get_backend_desc(MyBackendType);
+		backend_type_str = GetBackendTypeDesc(MyBackendType);
 
 	if (padding != 0)
 		appendStringInfo(buf, "%*s", padding, backend_type_str);
@@ -2947,7 +2946,7 @@ write_csvlog(ErrorData *edata)
 	else if (MyBackendType == B_BG_WORKER)
 		appendCSVLiteral(&buf, MyBgworkerEntry->bgw_type);
 	else
-		appendCSVLiteral(&buf, pgstat_get_backend_desc(MyBackendType));
+		appendCSVLiteral(&buf, GetBackendTypeDesc(MyBackendType));
 
 	appendStringInfoChar(&buf, '\n');
 
-- 
2.17.0

>From 30172a4771360fd6ed4f4646651efffa3785dfb7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 13 Mar 2020 22:03:06 -0500
Subject: [PATCH v1 2/2] Include the leader PID in logfile

See also: b025f32e0b, which adds the leader PID to pg_stat_activity
---
 doc/src/sgml/config.sgml                      |  8 +++-
 src/backend/utils/error/elog.c                | 47 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fc0e2c00c3..d53b62e2df 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6487,6 +6487,11 @@ local0.*/var/log/postgresql
  Process ID
  no
 
+
+ %k
+ Leader PID for a parallel process, NULL otherwise
+ yes
+
 
  %t
  Time stamp without milliseconds
@@ -6777,7 +6782,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, and backend type.
+application name, backend type, and leader PID.
 Here is a sample table definition for storing CSV-format log output:

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-15 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > Having now played with the patch, I'll suggest that 1000 is too high a
> > threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't
> > be much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum
> > GUCs.
> 
> ISTM that the danger of regressing workloads due to suddenly repeatedly
> scanning huge indexes that previously were never / rarely scanned is
> significant

You're right - at one point, I was going to argue to skip index cleanup, and I
think I wrote that before I finished convincing myself why it wasn't ok to skip.

-- 
Justin




Re: backend type in log_line_prefix?

2020-03-15 Thread Peter Eisentraut

On 2020-03-13 22:24, Peter Eisentraut wrote:
> On 2020-03-10 19:07, Alvaro Herrera wrote:
> > I like these patches; the first two are nice cleanup.
> >
> > My only gripe is that pgstat_get_backend_desc() is not really a pgstat
> > function; I think it should have a different name with a prototype in
> > miscadmin.h (next to the enum's new location, which I would put
> > someplace near the "pmod.h" comment rather than where you put it;
> > perhaps just above the AuxProcType definition), and implementation
> > probably in miscinit.c.
>
> I have committed the refactoring patches with adjustments along these
> lines.  The patch with the log_line_prefix and csvlog enhancements is
> still under discussion.


I have committed that last one also, after some corrections.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backend type in log_line_prefix?

2020-03-15 Thread Peter Eisentraut

On 2020-03-15 10:57, Justin Pryzby wrote:
> I suggest the CSV/log should also have the leader_pid, corresponding to
> | b025f32e0b Add leader_pid to pg_stat_activity


I haven't followed those developments.  It sounds interesting, but I 
suggest you start a new thread or continue in the thread that added 
leader_pid.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Dilip Kumar
On Sun, Mar 15, 2020 at 1:15 PM Dilip Kumar  wrote:
>
> On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar  wrote:
> > >
> > > Apart from that, I have also extended the solution for the page lock.
> > > And, I have also broken down the 3rd patch in two parts for relation
> > > extension and for the page lock.
> > >
> >
> > Thanks, I have made a number of cosmetic changes and written
> > appropriate commit messages for all patches.  See the attached patch
> > series and let me know your opinion? BTW, did you get a chance to test
> > page locks by using the extension which I have posted above or by some
> > other way?  I think it is important to test page-lock related patches
> > now.
>
> I have reviewed the updated patches and they look fine to me.  Apart from
> this, I have done testing for the page lock using the group locking
> extension.
>
> --Setup
> create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off);
> create index gin_test_idx on gin_test_tbl using gin (i);
> create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off);
> create index gin_test_idx1 on gin_test_tbl1 using gin (i);
>
> --session1:
> select become_lock_group_leader();
> select gin_clean_pending_list('gin_test_idx');
>
> --session2:
> select become_lock_group_member(session1_pid);
> select gin_clean_pending_list('gin_test_idx1');
>
> --session3:
> select become_lock_group_leader();
> select gin_clean_pending_list('gin_test_idx1');
>
> --session4:
> select become_lock_group_member(session3_pid);
> select gin_clean_pending_list('gin_test_idx');
>
> ERROR:  deadlock detected
> DETAIL:  Process 61953 waits for ExclusiveLock on page 0 of relation
> 16399 of database 13577; blocked by process 62197.
> Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of
> database 13577; blocked by process 61953.
> HINT:  See server log for query details.
>
>
> Session1 and Session3 acquire the PageLock on two different indexes'
> meta-pages and are blocked in gdb; meanwhile, their members try to
> acquire the page lock as shown in the above example, and the deadlock
> is detected, which is solved after applying the patch.

I have modified 0001 and 0002 slightly.  Basically, instead of the two
functions CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created
a single function.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v8-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-o.patch
Description: Binary data


v8-0003-Allow-relation-extension-lock-to-conflict-among-p.patch
Description: Binary data


v8-0004-Allow-page-lock-to-conflict-among-parallel-group-.patch
Description: Binary data


v8-0002-Add-assert-to-ensure-that-page-locks-don-t-partic.patch
Description: Binary data


expose parallel leader in CSV and log_line_prefix

2020-03-15 Thread Justin Pryzby
See also:
https://commitfest.postgresql.org/27/2390/
https://www.postgresql.org/message-id/flat/caobau_yy5bt0vtpz2_lum6cucgeqmynoj8-rgto+c2+w3de...@mail.gmail.com
b025f32e0b Add leader_pid to pg_stat_activity

-- 
Justin
>From 5268e89fb32fbb639cb39796729dfe0dfcf14705 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Fri, 13 Mar 2020 22:03:06 -0500
Subject: [PATCH v2] Include the leader PID in logfile

See also: b025f32e0b, which adds the leader PID to pg_stat_activity
---
 doc/src/sgml/config.sgml                      | 11 +-
 src/backend/utils/error/elog.c                | 20 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3cac340f32..f6ded2bc45 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6491,6 +6491,14 @@ local0.*/var/log/postgresql
  Process ID
  no
 
+
+ %k
+ Process ID of the parallel group leader if this
+ process is or was involved in parallel query, null
+ otherwise.  For a parallel group leader, this field is
+ set to its own process ID.
+ no
+
 
  %t
  Time stamp without milliseconds
@@ -6789,7 +6797,7 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, and backend type.
+application name, backend type, and leader PID.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -6819,6 +6827,7 @@ CREATE TABLE postgres_log
   location text,
   application_name text,
   backend_type text,
+  leader_pid integer,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 62eef7b71f..0f5881355d 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -77,6 +77,7 @@
 #include "postmaster/syslogger.h"
 #include "storage/ipc.h"
 #include "storage/proc.h"
+#include "storage/procarray.h"
 #include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
@@ -2560,6 +2561,18 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfo(buf, "%d", MyProcPid);
 				break;
+
+			case 'k':
+				if (MyBackendType != B_BG_WORKER)
+					; /* Do nothing */
+				else if (!MyProc->lockGroupLeader)
+					; /* Do nothing */
+				else if (padding != 0)
+					appendStringInfo(buf, "%*d", padding, MyProc->lockGroupLeader->pid);
+				else
+					appendStringInfo(buf, "%d", MyProc->lockGroupLeader->pid);
+				break;
+
 			case 'l':
 				if (padding != 0)
 					appendStringInfo(buf, "%*ld", padding, log_line_number);
@@ -2948,6 +2961,13 @@ write_csvlog(ErrorData *edata)
 	else
 		appendCSVLiteral(&buf, GetBackendTypeDesc(MyBackendType));
 
+	appendStringInfoChar(&buf, ',');
+
+	/* leader PID */
+	if (MyBackendType == B_BG_WORKER &&
+			MyProc->lockGroupLeader)
+		appendStringInfo(&buf, "%d", MyProc->lockGroupLeader->pid);
+
 	appendStringInfoChar(&buf, '\n');
 
 	/* If in the syslogger process, try to write messages direct to file */
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index aa44f0c9bf..6874792a15 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -530,6 +530,7 @@
 	#   %h = remote host
 	#   %b = backend type
 	#   %p = process ID
+	#   %k = leader PID
 	#   %t = timestamp without milliseconds
 	#   %m = timestamp with milliseconds
 	#   %n = timestamp with milliseconds (as a Unix epoch)
-- 
2.17.0



Re: expose parallel leader in CSV and log_line_prefix

2020-03-15 Thread Julien Rouhaud
On Sun, Mar 15, 2020 at 06:18:31AM -0500, Justin Pryzby wrote:
> See also:
> https://commitfest.postgresql.org/27/2390/
> https://www.postgresql.org/message-id/flat/caobau_yy5bt0vtpz2_lum6cucgeqmynoj8-rgto+c2+w3de...@mail.gmail.com
> b025f32e0b Add leader_pid to pg_stat_activity


FTR this is a followup of 
https://www.postgresql.org/message-id/20200315095728.GA26184%40telsasoft.com

+1 for the feature.  Regarding the patch:


+   case 'k':
+   if (MyBackendType != B_BG_WORKER)
+   ; /* Do nothing */


Isn't the test inverted?  Also, a bgworker could run parallel queries through
SPI, I think; should we really ignore bgworkers?

+   else if (!MyProc->lockGroupLeader)
+   ; /* Do nothing */


There should be a test that MyProc isn't NULL.

+   else if (padding != 0)
+       appendStringInfo(buf, "%*d", padding, MyProc->lockGroupLeader->pid);
+   else
+       appendStringInfo(buf, "%d", MyProc->lockGroupLeader->pid);
+   break;

I think that if padding was asked for, we should append spaces rather than
doing nothing.
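
Folding those points together, the case could end up looking something
like this (an untested sketch; it keeps the patch's backend-type test
as-is, and the empty-value padding follows the pattern used by the other
log_line_prefix escapes):

	case 'k':
		if (MyBackendType == B_BG_WORKER &&
			MyProc != NULL &&
			MyProc->lockGroupLeader != NULL)
		{
			if (padding != 0)
				appendStringInfo(buf, "%*d", padding,
								 MyProc->lockGroupLeader->pid);
			else
				appendStringInfo(buf, "%d",
								 MyProc->lockGroupLeader->pid);
		}
		else if (padding != 0)
			appendStringInfoSpaces(buf,
								   padding > 0 ? padding : -padding);
		break;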




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Amit Kapila
On Sun, Mar 15, 2020 at 1:15 PM Dilip Kumar  wrote:
>
> On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar  wrote:
> > >
> > > Apart from that, I have also extended the solution for the page lock.
> > > And, I have also broken down the 3rd patch in two parts for relation
> > > extension and for the page lock.
> > >
> >
> > Thanks, I have made a number of cosmetic changes and written
> > appropriate commit messages for all patches.  See the attached patch
> > series and let me know your opinion? BTW, did you get a chance to test
> > page locks by using the extension which I have posted above or by some
> > other way?  I think it is important to test page-lock related patches
> > now.
>
> I have reviewed the updated patches and they look fine to me.  Apart from
> this, I have done testing for the page lock using the group locking
> extension.
>
> --Setup
> create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off);
> create index gin_test_idx on gin_test_tbl using gin (i);
> create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off);
> create index gin_test_idx1 on gin_test_tbl1 using gin (i);
>
> --session1:
> select become_lock_group_leader();
> select gin_clean_pending_list('gin_test_idx');
>
> --session2:
> select become_lock_group_member(session1_pid);
> select gin_clean_pending_list('gin_test_idx1');
>
> --session3:
> select become_lock_group_leader();
> select gin_clean_pending_list('gin_test_idx1');
>
> --session4:
> select become_lock_group_member(session3_pid);
> select gin_clean_pending_list('gin_test_idx');
>
> ERROR:  deadlock detected
> DETAIL:  Process 61953 waits for ExclusiveLock on page 0 of relation
> 16399 of database 13577; blocked by process 62197.
> Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of
> database 13577; blocked by process 61953.
> HINT:  See server log for query details.
>
>
> Session1 and Session3 acquire the PageLock on two different indexes'
> meta-pages and are blocked in gdb; meanwhile, their members try to
> acquire the page lock as shown in the above example, and the deadlock
> is detected, which is solved after applying the patch.
>

So, in this test, you have first performed the actions from Session-1
and Session-3 (blocked them via GDB after acquiring page lock) and
then performed the actions from Session-2 and Session-4, right?
Though this is not a very realistic case, it proves the point that
page locks don't participate in the deadlock cycle after the patch.  I
think we can do a few more tests that test other aspects of the patch.

1. Group members wait for page locks.  If you test that the leader
acquires the page lock and then member also tries to acquire the same
lock on the same index, it wouldn't block before the patch, but after
the patch, the member should wait for the leader to release the lock.
2. Try to hit Assert in LockAcquireExtended (a) by trying to
re-acquire the page lock via the debugger, (b) try to acquire the
relation extension lock after page lock and it should be allowed
(after acquiring page lock, we take relation extension lock in
following code path:
ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Additional improvements to extended statistics

2020-03-15 Thread Dean Rasheed
On Sun, 15 Mar 2020 at 00:08, Tomas Vondra  wrote:
>
> On Sat, Mar 14, 2020 at 05:56:10PM +0100, Tomas Vondra wrote:
> >
> >Attached is a patch series rebased on top of the current master, after
> >committing the ScalarArrayOpExpr enhancements. I've updated the OR patch
> >to get rid of the code duplication, and barring objections I'll get it
> >committed shortly together with the two parts improving test coverage.
>
> I've pushed the two patches improving test coverage for functional
> dependencies and MCV lists, which seems mostly non-controversial. I'll
> wait a bit more with the two patches actually changing behavior (rebased
> version attached, to keep cputube happy).
>

Patch 0001 looks to be mostly ready. Just a couple of final comments:

+   if (is_or)
+       simple_sel = clauselist_selectivity_simple_or(root, stat_clauses,
+                                                     varRelid, jointype,
+                                                     sjinfo, NULL, 1.0);
+   else
+   else

Surely that should be passing 0.0 as the final argument, otherwise it
will always just return simple_sel = 1.0.


+    *
+    * XXX We can't multiply with current value, because for OR clauses
+    * we start with 0.0, so we simply assign to s1 directly.
+    */
+   s = statext_clauselist_selectivity(root, clauses, varRelid,
+                                      jointype, sjinfo, rel,
+                                      &estimatedclauses, true);

That final part of the comment is no longer relevant (variable s1 no
longer exists). Probably it could now just be deleted, since I think
there are sufficient comments elsewhere to explain what's going on.

Otherwise it looks good, and I think this will lead to some very
worthwhile improvements.

Regards,
Dean




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Amit Kapila
On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar  wrote:
>
> I have modified 0001 and 0002 slightly.  Basically, instead of the two
> functions CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created
> a single function.
>

+CheckAndSetLockHeld(LOCALLOCK *locallock, bool value)

Can we rename the parameter to lock_held, acquired, or something like
that, so that it indicates what it intends to do, and probably add a
comment for that variable atop the function?
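
Perhaps something like this (just a sketch of the suggestion; the exact
comment wording is up to you):

	/*
	 * CheckAndSetLockHeld -- set or reset the flag tracking that we hold
	 * the relation extension / page lock.
	 *
	 * 'acquired' is true when the lock has just been acquired, and false
	 * when it is being released.
	 */
	static inline void
	CheckAndSetLockHeld(LOCALLOCK *locallock, bool acquired)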

There is some work left related to testing some parts of the patch, and
I can do some more review, but it has started to look good to me, so I
am planning to push this in the coming week (say by Wednesday or so)
unless there are some major comments.  There are primarily two parts to
the patch series: (a) assert that we don't acquire a heavyweight lock on
another object after the relation extension lock; (b) allow the relation
extension lock to conflict among the parallel group members.  On similar
lines, there are two patches for page locks.

I think we have discussed the LWLock approach in detail, and it seems it
might be trickier than we initially thought, especially with some of the
latest findings where we noticed that there are multiple cases in which
we can try to re-acquire the relation extension lock, among other things
we have discussed.  Also, not all of us agree with that idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Dilip Kumar
On Sun, Mar 15, 2020 at 5:58 PM Amit Kapila  wrote:
>
> On Sun, Mar 15, 2020 at 1:15 PM Dilip Kumar  wrote:
> >
> > On Sat, Mar 14, 2020 at 7:39 PM Amit Kapila  wrote:
> > >
> > > On Fri, Mar 13, 2020 at 7:02 PM Dilip Kumar  wrote:
> > > >
> > > > Apart from that, I have also extended the solution for the page lock.
> > > > And, I have also broken down the 3rd patch in two parts for relation
> > > > extension and for the page lock.
> > > >
> > >
> > > Thanks, I have made a number of cosmetic changes and written
> > > appropriate commit messages for all patches.  See the attached patch
> > > series and let me know your opinion? BTW, did you get a chance to test
> > > page locks by using the extension which I have posted above or by some
> > > other way?  I think it is important to test page-lock related patches
> > > now.
> >
> > I have reviewed the updated patches and they look fine to me.  Apart from
> > this, I have done testing for the page lock using the group locking
> > extension.
> >
> > --Setup
> > create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off);
> > create index gin_test_idx on gin_test_tbl using gin (i);
> > create table gin_test_tbl1(i int4[]) with (autovacuum_enabled = off);
> > create index gin_test_idx1 on gin_test_tbl1 using gin (i);
> >
> > --session1:
> > select become_lock_group_leader();
> > select gin_clean_pending_list('gin_test_idx');
> >
> > --session2:
> > select become_lock_group_member(session1_pid);
> > select gin_clean_pending_list('gin_test_idx1');
> >
> > --session3:
> > select become_lock_group_leader();
> > select gin_clean_pending_list('gin_test_idx1');
> >
> > --session4:
> > select become_lock_group_member(session3_pid);
> > select gin_clean_pending_list('gin_test_idx');
> >
> > ERROR:  deadlock detected
> > DETAIL:  Process 61953 waits for ExclusiveLock on page 0 of relation
> > 16399 of database 13577; blocked by process 62197.
> > Process 62197 waits for ExclusiveLock on page 0 of relation 16400 of
> > database 13577; blocked by process 61953.
> > HINT:  See server log for query details.
> >
> >
> > Session1 and Session3 acquire the PageLock on two different indexes'
> > meta-pages and are blocked in gdb; meanwhile, their members try to
> > acquire the page lock as shown in the above example, and the deadlock
> > is detected, which is solved after applying the patch.
> >
>
> So, in this test, you have first performed the actions from Session-1
> and Session-3 (blocked them via GDB after acquiring page lock) and
> then performed the actions from Session-2 and Session-4, right?

Yes

> Though this is not a very realistic case, it proves the point that
> page locks don't participate in the deadlock cycle after the patch.  I
> think we can do a few more tests that test other aspects of the patch.
>
> 1. Group members wait for page locks.  If you test that the leader
> acquires the page lock and then member also tries to acquire the same
> lock on the same index, it wouldn't block before the patch, but after
> the patch, the member should wait for the leader to release the lock.

Okay, I will test this part.

> 2. Try to hit Assert in LockAcquireExtended (a) by trying to
> re-acquire the page lock via the debugger,

I am not sure whether that is true, because if we are holding the page
lock and we try to take the same page lock again, the lock will be
granted without reaching that code path.  However, I agree that this is
not intended; instead, it is a side effect of allowing a relation
extension lock to be acquired while holding the same relation extension
lock.  So basically, the situation now is that if the lock is granted
directly because we already hold the same lock, it will not reach the
assert code.  IMHO, we don't need to add extra code to make it behave
differently.  Please let me know your opinion on this.
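
For reference, the re-grant happens in the local-lock fast path of
LockAcquireExtended, roughly like this (simplified from lock.c), which
returns before the new assert is ever reached:

	/*
	 * If we already hold the lock, we can just increase the count locally.
	 */
	if (locallock->nLocks > 0)
	{
		GrantLockLocal(locallock, owner);
		return LOCKACQUIRE_ALREADY_HELD;
	}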

> (b) try to acquire the
> relation extension lock after page lock and it should be allowed
> (after acquiring page lock, we take relation extension lock in
> following code path:
> ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer).

ok

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Dilip Kumar
On Sun, Mar 15, 2020 at 6:20 PM Amit Kapila  wrote:
>
> On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar  wrote:
> >
> > I have modified 0001 and 0002 slightly.  Basically, instead of the two
> > functions CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created
> > a single function.
> >
>
> +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value)
>
> Can we rename the parameter as lock_held, acquired or something like
> that so that it indicates what it intends to do and probably add a
> comment for that variable atop of function?

Done

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v9-0002-Add-assert-to-ensure-that-page-locks-don-t-partic.patch
Description: Binary data


v9-0003-Allow-relation-extension-lock-to-conflict-among-p.patch
Description: Binary data


v9-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-o.patch
Description: Binary data


v9-0004-Allow-page-lock-to-conflict-among-parallel-group-.patch
Description: Binary data


Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-15 Thread Tom Lane
Pavel Stehule  writes:
> Tom Lane  wrote:
>> Yeah, that's what I said.  But does it really add anything beyond the
>> proposed text "A function returning a polymorphic type must have at least
>> one matching polymorphic argument"?  I don't think it'd be terribly
>> helpful to say "A function returning anyelement must have at least one
>> anyelement, anyarray, anynonarray, anyenum, or anyrange argument", and
>> for sure such an error message would be a pain to maintain.

> The error message in your first patch is ok for all types without anyrange.
> The behavior of this type is more strict and more or less different from
> other polymorphic types.

Well, here's a version that does it like that, but personally I find these
messages too verbose and not an improvement on what I had before.

(This is also rebased over the stuff I committed yesterday.)

regards, tom lane

diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index 0b7face..7d887ea 100644
--- a/src/backend/catalog/pg_aggregate.c
+++ b/src/backend/catalog/pg_aggregate.c
@@ -93,8 +93,6 @@ AggregateCreate(const char *aggName,
 	Oid			mfinalfn = InvalidOid;	/* can be omitted */
 	Oid			sortop = InvalidOid;	/* can be omitted */
 	Oid		   *aggArgTypes = parameterTypes->values;
-	bool		hasPolyArg;
-	bool		hasInternalArg;
 	bool		mtransIsStrict = false;
 	Oid			rettype;
 	Oid			finaltype;
@@ -103,6 +101,7 @@ AggregateCreate(const char *aggName,
 	int			nargs_finalfn;
 	Oid			procOid;
 	TupleDesc	tupDesc;
+	char	   *detailmsg;
 	int			i;
 	ObjectAddress myself,
 referenced;
@@ -131,36 +130,33 @@ AggregateCreate(const char *aggName,
 			   FUNC_MAX_ARGS - 1,
 			   FUNC_MAX_ARGS - 1)));
 
-	/* check for polymorphic and INTERNAL arguments */
-	hasPolyArg = false;
-	hasInternalArg = false;
-	for (i = 0; i < numArgs; i++)
-	{
-		if (IsPolymorphicType(aggArgTypes[i]))
-			hasPolyArg = true;
-		else if (aggArgTypes[i] == INTERNALOID)
-			hasInternalArg = true;
-	}
-
 	/*
 	 * If transtype is polymorphic, must have polymorphic argument also; else
 	 * we will have no way to deduce the actual transtype.
 	 */
-	if (IsPolymorphicType(aggTransType) && !hasPolyArg)
+	detailmsg = check_valid_polymorphic_signature(aggTransType,
+  aggArgTypes,
+  numArgs);
+	if (detailmsg)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
  errmsg("cannot determine transition data type"),
- errdetail("An aggregate using a polymorphic transition type must have at least one polymorphic argument.")));
+ errdetail_internal("%s", detailmsg)));
 
 	/*
 	 * Likewise for moving-aggregate transtype, if any
 	 */
-	if (OidIsValid(aggmTransType) &&
-		IsPolymorphicType(aggmTransType) && !hasPolyArg)
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
- errmsg("cannot determine transition data type"),
- errdetail("An aggregate using a polymorphic transition type must have at least one polymorphic argument.")));
+	if (OidIsValid(aggmTransType))
+	{
+		detailmsg = check_valid_polymorphic_signature(aggmTransType,
+	  aggArgTypes,
+	  numArgs);
+		if (detailmsg)
+			ereport(ERROR,
+	(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
+	 errmsg("cannot determine transition data type"),
+	 errdetail_internal("%s", detailmsg)));
+	}
 
 	/*
 	 * An ordered-set aggregate that is VARIADIC must be VARIADIC ANY.  In
@@ -492,12 +488,14 @@ AggregateCreate(const char *aggName,
 	 * that itself violates the rule against polymorphic result with no
 	 * polymorphic input.)
 	 */
-	if (IsPolymorphicType(finaltype) && !hasPolyArg)
+	detailmsg = check_valid_polymorphic_signature(finaltype,
+  aggArgTypes,
+  numArgs);
+	if (detailmsg)
 		ereport(ERROR,
 (errcode(ERRCODE_DATATYPE_MISMATCH),
  errmsg("cannot determine result data type"),
- errdetail("An aggregate returning a polymorphic type "
-		   "must have at least one polymorphic argument.")));
+ errdetail_internal("%s", detailmsg)));
 
 	/*
 	 * Also, the return type can't be INTERNAL unless there's at least one
@@ -505,11 +503,14 @@ AggregateCreate(const char *aggName,
 	 * for regular functions, but at the level of aggregates.  We must test
 	 * this explicitly because we allow INTERNAL as the transtype.
 	 */
-	if (finaltype == INTERNALOID && !hasInternalArg)
+	detailmsg = check_valid_internal_signature(finaltype,
+			   aggArgTypes,
+			   numArgs);
+	if (detailmsg)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
  errmsg("unsafe use of pseudo-type \"internal\""),
- errdetail("A function returning \"internal\" must have at least one \"internal\" argument.")));
+ errdetail_internal("%s", detailmsg)));
 
 	/*
 	 * If a moving-aggregate implementation is supplied, look up its finalfn
diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 423fd79..0cac936 100644
--- a/src/bac

Re: [PATCH] Erase the distinctClause if the result is unique by definition

2020-03-15 Thread Andy Fan
Hi All:

I have re-implemented the patch based on David's suggestion/code, and it
looks like it works well.  The updated patch mainly includes:

1. Maintain the not_null_colno in RelOptInfo, which includes the not-null
columns from the catalog and the not-null columns derived from vars.
2. Add the restrictinfo check at populate_baserel_uniquekeys.  If we are
sure that only 1 row is returned, I add each expr in rel->reltarget->expr
as a unique key, like (select a, b, c from t where pk = 1); the UK will
be ((a), (b), (c)).
3. Postpone the propagate_unique_keys_to_joinrel call to
populate_joinrel_with_paths, since we know the jointype at that time, so
we can handle semi/anti joins specially.
4. Add the rule I suggested above: if both of the 2 relations yield a
unique result, the join result will be unique as well.  The UK can be
((rel1_uk1, rel1_uk2) ..).
5. If the unique key is impossible to be referenced by others, we can
safely ignore it in order to keep the (join)rel->uniquekeys short.
6. I only consider the not-null check/opfamily check for the uniquekey
which comes from a unique index.  I think that should be correct.
7. I defined each uniquekey as a List of Expr, so I didn't introduce a
new node type.
8. Check the uniquekeys' information before create_distinct_paths and
create_group_paths; skip creating the new paths if the sortgroupclauses
are already unique, or else create them and add the new uniquekey to the
distinctrel/grouprel.

There are some things still in progress, like:
1. Partition table.
2. union/union all.
3. Maybe refactor is_innerrel_unique_for/query_is_distinct_for to use
UniqueKey.
4. If we are sure the groupby clause is unique and we have an aggregation
call, maybe we should try Bapat's suggestion: we can use sort rather than
hash.  The strategy sounds awesome, but I haven't checked the details so
far.
5. A clearer commit message.
6. Anything more?

Any feedback is welcome.  Thanks for all your ideas, suggestions, and
demo code!

Best Regards
Andy Fan


v4-0001-Patch-Bypass-distinctClause-groupbyClause-if-the-.patch
Description: Binary data


Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-15 Thread Fabien COELHO


Hello Justin,

Some feedback on v10:

All patches apply cleanly, one on top of the previous. I really wish there
were fewer than 9 patches…


* v10.1 doc change: ok

* v10.2 doc change: ok, not sure why it is not merged with previous

* v10.3 test add: could be merge with both previous

Tests seem a little contrived. I'm wondering whether something more 
straightforward could be proposed. For instance, once the tablespace is 
just created but not used yet, probably we do know that the tmp file 
exists and is empty?


* v10.4 at least, some code!

Compiles, make check ok.

pg_ls_dir_files: I'm fine with the flag approach given the number of 
switches and the internal nature of the function.


I'm not sure of the "FLAG_" prefix which seems too generic, even if it is 
local. I'd suggest "LS_DIR_*", maybe, as a more specific prefix.


ISTM that Pg style requires spaces around operators. I'd suggest some 
parenthesis would help as well, eg: "flags&FLAG_MISSING_OK" -> "(flags & 
FLAG_MISSING_OK)" and other instances.


About:

 if (S_ISDIR(attrib.st_mode)) {
   if (flags&FLAG_SKIP_DIRS)
     continue;
 }

and similars, why not the simpler:

 if (S_ISDIR(attrib.st_mode) && (flags & FLAG_SKIP_DIRS))
   continue;

Especially that it is done like that in previous cases.

Maybe I'd create defines for long and common flag specs, eg:

 #define ..._LS_SIMPLE (FLAG_SKIP_DIRS|FLAG_SKIP_HIDDEN|FLAG_SKIP_SPECIAL|FLAG_METADATA)

No attempt at recursing.

I'm not sure about these asserts:

   /* isdir depends on metadata */
   Assert(!(flags&FLAG_ISDIR) || (flags&FLAG_METADATA));

Hmmm. Why?

   /* Unreasonable to show isdir and skip dirs */
   Assert(!(flags&FLAG_ISDIR) || !(flags&FLAG_SKIP_DIRS));

Hmmm. Why would I prevent that? Even if it makes little sense, it should work.
I do not see having false in the isdir column as an actual issue.


* v10.5 add is_dir column, a few tests & doc.

Ok.

* v10.6 behavior change for existing functions, always show isdir column,
and removal of IS_DIR flag.

I'm unsure why the features are removed, some use case may benefit from 
the more complete function?


Maybe flags defs should not be changed anyway?

I do not like much the "if (...) /* empty */;" code. Maybe it could be 
caught more cleanly later in the conditional structure.


* v10.7 adds "pg_ls_dir_recurse" function

Using SQL recursion to implement the feature is pretty elegant
and limits open directories to one at a time, which is pretty neat.

Doc looks incomplete and the example is very contrived and badly indented.

The function definition does not follow the style around: uppercase 
whereas all others are lowercase, "" instead of '', no "as"…


I do not understand why oid 8511 is given to the new function.

I do not understand why UNION ALL and not UNION.

I would have put the definition after "pg_ls_dir_metadata" definition.

pg_ls_dir_metadata seems defined as (text,bool,bool) but called as 
(text,bool,bool,bool).


Maybe a better alias could be given instead of x?

There are no tests for the new function. I'm not sure it would work.

* v10.8 change flags & add test on pg_ls_logdir().

I'm unsure why it is done at this stage.

* v10.9 change some ls functions and fix patch 10.7 issue

I'm unsure why it is done at this stage. "make check" ok.

--
Fabien.

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-15 Thread James Coleman
On Sat, Mar 14, 2020 at 10:55 PM James Coleman  wrote:
>
> On Fri, Mar 13, 2020 at 1:06 PM James Coleman  wrote:
> >
> > On Thu, Mar 12, 2020 at 5:53 PM Alvaro Herrera  
> > wrote:
> > >
> > > I gave this a very quick look; I don't claim to understand it or
> > > anything, but I thought these trivial cleanups worthwhile.  The only
> > > non-cosmetic thing is changing order of arguments to the SOn_printf()
> > > calls in 0008; I think they are contrary to what the comment says.
> >
> > Yes, I think you're correct (re: 0008).
> >
> > They all look generally good to me, and are included in the attached
> > patch series.
>
> I just realized something about this (unsure if in Alvaro's or in my
> applying that) broke make check pretty decently (3 test files broken,
> also much slower, and the incremental sort test returns a lot of
> obviously broken results).
>
> I'll take a look tomorrow and hopefully get a fix (probably will reply
> to the more recent subthread's though).

This took a bit of manually excluding changes until I got a red/green
set, because nothing in the patch set looked incorrect.
But it turns out this change breaks things:

- if (tuplesort_gettupleslot(read_sortstate, ScanDirectionIsForward(dir),
-                            false, slot, NULL) || node->finished)
+ if (node->finished ||
+     tuplesort_gettupleslot(read_sortstate, ScanDirectionIsForward(dir),
+                            false, slot, NULL))

I believe what's happening here is that we need tuplesort_gettupleslot
to set the slot to a NULL tuple (if there aren't any left) before we
return the slot; but in that case it returns false, so the node->finished
check ensures that the first time that method nulls out the slot and
returns false, we still return the value in the slot.

Since this isn't obvious, I'll add a comment. I think I'm also going
to rename node->finished to be more clear.
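
One possible wording for that comment, against the original ordering
shown in the diff above (just a sketch):

	/*
	 * tuplesort_gettupleslot() must run before the finished check: when
	 * no tuples remain it clears the slot and returns false, and the
	 * "|| node->finished" ensures that on that final call we still
	 * return the now-empty slot instead of short-circuiting past the
	 * clear.
	 */
	if (tuplesort_gettupleslot(read_sortstate, ScanDirectionIsForward(dir),
							   false, slot, NULL) || node->finished)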

In this debugging I also noticed that we don't set node->finished back
to false in rescan, which I assume is a bug, but I don't really
understand a whole lot about rescan. IIRC from some previous
discussions rescan exists for things like cursors, where you can move
back and forth over the result set. Assuming that's the case, do we
need explicit tests for cursors using incremental sort? Is there a
good strategy for how much to do there (since I don't want to
duplicate every non-cursor functional test).

Thanks,
James




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-15 Thread Pavel Stehule
On Sun, Mar 15, 2020 at 5:48 PM Tom Lane  wrote:

> Pavel Stehule  writes:
> > Tom Lane  wrote:
> >> Yeah, that's what I said.  But does it really add anything beyond the
> >> proposed text "A function returning a polymorphic type must have at least
> >> one matching polymorphic argument"?  I don't think it'd be terribly
> >> helpful to say "A function returning anyelement must have at least one
> >> anyelement, anyarray, anynonarray, anyenum, or anyrange argument", and
> >> for sure such an error message would be a pain to maintain.
>
> > The error message in your first patch is ok for all types without anyrange.
> > The behavior of this type is more strict and more or less different from
> > other polymorphic types.
>
> Well, here's a version that does it like that, but personally I find these
> messages too verbose and not an improvement on what I had before.
>

There was a problem just with the anyrange type. This last version looks
perfect.

Regards

Pavel


> (This is also rebased over the stuff I committed yesterday.)
>
> regards, tom lane
>
>


Re: WAL usage calculation patch

2020-03-15 Thread Kirill Bychik
> > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik  
> > > wrote:
> > > > I wanted to keep the patch small and simple, and fit to practical
> > > > needs. This patch is supposed to provide tuning assistance, catching
> > > > an io heavy query in commit-bound situation.
> > > > Total WAL usage per DB can be assessed rather easily using other means.
> > > > Let's get this change into the codebase and then work on connecting
> > > > WAL usage to  (auto)vacuum stats.
> > >
> > > I agree that having a view of the full activity is a way bigger scope,
> > > so it could be done later (and at this point in pg14), but I'm still
> > > hoping that we can get insight of other backend WAL activity, such as
> > > autovacuum, in pg13.
> >
> > How do you think this information should be exposed? Via the 
> > pg_stat_statement?
>
> That's unlikely, since autovacuum won't trigger any hook.  I was
> thinking on some new view for pgstats, similarly to the example I
> showed previously. The implementation is straightforward, although
> pg_stat_database is maybe not the best choice here.

After extensive thinking and some code diving, I did not manage to
come up with a sane idea on how to expose data about autovacuum WAL
usage. Must be the flu.

> > Anyways, I believe this change could be bigger than FPI. I propose to
> > plan a separate patch for it, or even add it to the TODO after the
> > core patch of wal usage is merged.
>
> Just in case, if the problem is a lack of time, I'd be happy to help
> on that if needed.  Otherwise, I'll definitely not try to block any
> progress for the feature as proposed.

Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.

> > Please expect a new patch version next week, with FPI counters added.

Please find attached patch version 003, with FP writes and minor
corrections. I hope I use attachment versioning as expected in this
group :)
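
For reference, the counters incremented in the diff (pgWalUsage.wal_bytes,
wal_records, and wal_fp_records) imply a WalUsage struct along these
lines (a sketch inferred from the patch; the real definition lives in the
patch's instrument.h changes, which aren't shown here):

	typedef struct WalUsage
	{
		long		wal_records;	/* # of WAL records inserted */
		long		wal_fp_records; /* # of WAL full-page image records */
		uint64		wal_bytes;		/* total size of inserted WAL records */
	} WalUsage;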

The test has been reworked, and I believe it should be stable now: the
part which checks that WAL is written and that there is a correlation
between affected rows and WAL records. I still have no idea how to test
full-page writes against regular updates; it seems very unstable.
Please share ideas if you have any.

Thanks!
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4fa446ffa4..c975aa0dc7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -42,6 +42,7 @@
 #include "commands/progress.h"
 #include "commands/tablespace.h"
 #include "common/controldata_utils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -1231,6 +1232,13 @@ XLogInsertRecord(XLogRecData *rdata,
 	ProcLastRecPtr = StartPos;
 	XactLastRecEnd = EndPos;
 
+	/* Provide WAL update data to the instrumentation */
+	if (inserted)
+	{
+		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+		pgWalUsage.wal_records++;
+	}
+
 	return EndPos;
 }
 
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 2fa0a7f667..1f71cc0a76 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -25,6 +25,7 @@
 #include "access/xloginsert.h"
 #include "catalog/pg_control.h"
 #include "common/pg_lzcompress.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "replication/origin.h"
@@ -635,6 +636,11 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 			 */
 			bkpb.fork_flags |= BKPBLOCK_HAS_IMAGE;
 
+			/*
+			 * Report a full page image constructed for the WAL record
+			 */
+			pgWalUsage.wal_fp_records++;
+
 			/*
 			 * Construct XLogRecData entries for the page content.
 			 */
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a753d6efa0..017367878f 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -62,6 +62,7 @@
 #define PARALLEL_KEY_DSA				UINT64CONST(0xE007)
 #define PARALLEL_KEY_QUERY_TEXT		UINT64CONST(0xE008)
 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE009)
+#define PARALLEL_KEY_WAL_USAGE			UINT64CONST(0xE00A)
 
 #define PARALLEL_TUPLE_QUEUE_SIZE		65536
 
@@ -573,6 +574,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 	char	   *pstmt_space;
 	char	   *paramlistinfo_space;
 	BufferUsage *bufusage_space;
+	WalUsage	*walusage_space;
 	SharedExecutorInstrumentation *instrumentation = NULL;
 	SharedJitInstrumentation *jit_instrumentation = NULL;
 	int			pstmt_len;
@@ -646,6 +648,13 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
 		   mul_size(sizeof(BufferUsage), pcxt->nworkers));
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 
+	/*
+	 * Same thing for WalUsage.
+	 */
+	shm_toc_estimate_chunk(&pcxt->estimator,
+		   mul_size(sizeof(WalUsage), pcxt->nworkers));
+	shm_toc_estimate_keys(&pcxt->estimator, 1);
+
 	/* Estimate space for tuple queue

Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2020-03-15 Thread James Coleman
On Fri, Mar 13, 2020 at 4:22 PM Tom Lane  wrote:
>
> Alvaro Herrera  writes:
> > Also, I wonder if it would be better to modify our policies so that we
> > update typedefs.list more frequently.  Some people include additions
> > with their commits, but it's far from SOP.
>
> Perhaps.  My own workflow includes pulling down a fresh typedefs.list
> from the buildfarm (which is trivial to automate) and then adding any
> typedefs invented by the patch I'm working on.  The latter part of it
> makes it hard to see how the in-tree list would be very helpful; and
> if we started expecting patches to include typedef updates, I'm afraid
> we'd get lots of patch collisions in that file.
>
> I don't have any big objection to updating the in-tree list more often,
> but personally I wouldn't use it, unless we can find a better workflow.

How does the buildfarm automate generating the typedefs list? Would it
be relatively easy to incorporate that into a tool that someone could
use locally with pgindent?

Thanks,
James




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2020-03-15 Thread Justin Pryzby
On Sun, Mar 15, 2020 at 06:15:02PM +0100, Fabien COELHO wrote:
> Some feedback on v10:

Thanks for looking.  I'm hoping to hear from Alvaro what he thinks of this
approach (all functions to show isdir, rather than one function which lists
recursively).

> All patches apply cleanly, one on top of the previous. I really wish there
> were fewer than 9 patches…

I kept them separate to allow the earlier patches to be applied, and
intended to make them easier to review, even if it's more work for me.

If you mean that it's a pain to apply 9 patches, I suggest using:
|git am -3 ./mailbox
where ./mailbox is either a copy of the mail you received, or retrieved from
the web interface.

To test that each one works (compiles, passes tests, etc.), I use git rebase -i
HEAD~11 and "edit" (e) the target (set of) patches.

> * v10.1 doc change: ok
> 
> * v10.2 doc change: ok, not sure why it is not merged with previous

As I mentioned, separate since I'm proposing that they're backpatched to
different releases.  Those could be applied now (and Tom already applied a
patch identical to 0001 in a prior patchset).

> * v10.3 test add: could be merge with both previous

> Tests seem a little contrived. I'm wondering whether something more
> straightforward could be proposed. For instance, once the tablespace is just
> created but not used yet, probably we do know that the tmp file exists and
> is empty?

The tmpdir *doesn't* exist until someone creates tmpfiles there.
As it mentions:
+-- This tests the missing_ok parameter, which causes pg_ls_tmpdir to succeed even if the tmpdir doesn't exist yet

> * v10.4 at least, some code!
> I'm not sure of the "FLAG_" prefix which seems too generic, even if it is
> local. I'd suggest "LS_DIR_*", maybe, as a more specific prefix.

Done.

> ISTM that Pg style requires spaces around operators. I'd suggest some
> parenthesis would help as well, eg: "flags&FLAG_MISSING_OK" -> "(flags &
> FLAG_MISSING_OK)" and other instances.

Partially took your suggestion.

> About:
> 
>  if (S_ISDIR(attrib.st_mode)) {
>    if (flags&FLAG_SKIP_DIRS)
>      continue;
>  }
> 
> and similars, why not the simpler:
> 
>  if (S_ISDIR(attrib.st_mode) && (flags & FLAG_SKIP_DIRS))
>    continue;

That's not the same: if SKIP_DIRS isn't set, your way would fail that test for
dirs, and then hit the !ISREG test, and skip them anyway.
|else if (!S_ISREG(attrib.st_mode))
|   continue
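
Assembled, the intended control flow is (a sketch, using the LS_DIR_*
names adopted above, so the exact identifiers are assumptions):

	if (S_ISDIR(attrib.st_mode))
	{
		if (flags & LS_DIR_SKIP_DIRS)
			continue;			/* hide directories only when asked to */
	}
	else if (!S_ISREG(attrib.st_mode))
		continue;				/* non-regular special files are always skipped */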

> Maybe I'd create defines for long and common flag specs, eg:
> 
>  #define ..._LS_SIMPLE (FLAG_SKIP_DIRS|FLAG_SKIP_HIDDEN|FLAG_SKIP_SPECIAL|FLAG_METADATA)

Done

> I'm not sure about these asserts:
> 
>/* isdir depends on metadata */
>Assert(!(flags&FLAG_ISDIR) || (flags&FLAG_METADATA));
> 
> Hmmm. Why?

It's not supported to show isdir without showing metadata (because that case
isn't needed to support the old and the new behaviors).

+   if (flags & FLAG_METADATA)
+   {
+       values[1] = Int64GetDatum((int64) attrib.st_size);
+       values[2] = TimestampTzGetDatum(time_t_to_timestamptz(attrib.st_mtime));
+       if (flags & FLAG_ISDIR)
+           values[3] = BoolGetDatum(S_ISDIR(attrib.st_mode));
+   }

>/* Unreasonable to show isdir and skip dirs */
>Assert(!(flags&FLAG_ISDIR) || !(flags&FLAG_SKIP_DIRS));
> 
> Hmmm. Why would I prevent that? Even if it makes little sense, it should work.
> I do not see having false in the isdir column as an actual issue.

It's unimportant, but it tests for intended use of the flags during development.

> * v10.6 behavior change for existing functions, always show isdir column,
> and removal of IS_DIR flag.
> 
> I'm unsure why the features are removed, some use case may benefit from the
> more complete function?
> Maybe flags defs should not be changed anyway?

Maybe.  I put them back...but it means they're not being exercised by any
*used* case.

> I do not like much the "if (...) /* empty */;" code. Maybe it could be
> caught more cleanly later in the conditional structure.

This went away when I put back the SKIP_DIRS flag.

> * v10.7 adds "pg_ls_dir_recurse" function

> Doc looks incomplete and the example is very contrived and badly indented.

Why do you think it's contrived?  Listing a tmpdir recursively is the initial
motivation of this patch.  Maybe you think I should list just the tmpdir for
one tablespace ?  Note that for temp_tablespaces parameter:

|When there is more than one name in the list, PostgreSQL chooses a random 
member of the list each time a temporary object is to be created; except that 
within a transaction, successively created temporary objects are placed in 
successive tablespaces from the list.

> The function definition does not follow the style around: uppercase whereas
> all others are lowercase, "" instead of '', no "as"…

I used "" because of this:
| x.name||'/'||a.name
I don't know if there's a better way to join paths in SQL,

Re: Memory-Bounded Hash Aggregation

2020-03-15 Thread Jeff Davis
On Thu, 2020-03-12 at 16:01 -0500, Justin Pryzby wrote:
> I don't understand what's meant by "the chosen plan".
> Should it say, "at execution ..." instead of "execution time" ?

I removed that wording; hopefully it's more clear without it?

> Either remove "plan types" for consistency with
> enable_groupingsets_hash_disk, or add it there.  Maybe it should say
> "when the memory usage would OTHERWISE BE expected to exceed.."

I added "plan types".

I don't think "otherwise be..." would quite work there. "Otherwise"
sounds to me like it's referring to another plan type (e.g.
Sort+GroupAgg), and that doesn't fit.

It's probably best to leave that level of detail out of the docs. I
think the main use case for enable_hashagg_disk is for users who
experience some plan changes and want the old behavior which favors
Sort when there are a lot of groups.

> +show_hashagg_info(AggState *aggstate, ExplainState *es)
> +{
> + Agg *agg   = (Agg *)aggstate->ss.ps.plan;
> + long memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;
> 
> I see this partially duplicates my patch [0] to show memory stats for

...

> You added hash_mem_peak and hash_batches_used to struct AggState.
> In my 0001 patch, I added instrumentation to struct TupleHashTable

I replied in that thread and I'm not sure that tracking the memory in
the TupleHashTable is the right approach. The group keys and the
transition state data can't be estimated easily that way. Perhaps we
can do that if the THT owns the memory contexts (and can call
MemoryContextMemAllocated()), rather than using passed-in ones, but
that might require more discussion. (I'm open to that idea, by the
way.)

Also, my patch also considers the buffer space, so would that be a
third memory number?

For now, I think I'll leave the way I report it in a simpler form and
we can change it later as we sort out these details. That leaves mine
specific to HashAgg, but we can always refactor it later.

I did change my code to put the metacontext in a child context of its
own so that I could call MemoryContextMemAllocated() on it to include
it in the memory total, and that will make reporting it separately
easier when we want to do so.
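
In outline, that looks something like this (a sketch, not the exact code
from the patch):

	MemoryContext hash_metacxt;
	Size		total_mem;

	/* give the hash table metadata its own context, as a child context */
	hash_metacxt = AllocSetContextCreate(CurrentMemoryContext,
										 "HashAgg meta context",
										 ALLOCSET_DEFAULT_SIZES);

	/* ... so its allocations can be totalled, children included */
	total_mem = MemoryContextMemAllocated(hash_metacxt, true);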

> + if (from_tape)
> + partition_mem += HASHAGG_READ_BUFFER_SIZE;
> + partition_mem = npartitions * HASHAGG_WRITE_BUFFER_SIZE;
> 
> => That looks wrong ; should say += ?

Good catch! Fixed.

Regards,
Jeff Davis

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3cac340f323..9f7f7736665 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4462,6 +4462,23 @@ ANY num_sync ( 
+  enable_groupingsets_hash_disk (boolean)
+  
+   enable_groupingsets_hash_disk configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's use of hashed aggregation plan
+types for grouping sets when the total size of the hash tables is
+expected to exceed work_mem.  See .  The default is
+off.
+   
+  
+ 
+
  
   enable_hashagg (boolean)
   
@@ -4476,6 +4493,21 @@ ANY num_sync ( 
+  enable_hashagg_disk (boolean)
+  
+   enable_hashagg_disk configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's use of hashed aggregation plan
+types when the memory usage is expected to exceed
+work_mem.  The default is on.
+   
+  
+ 
+
  
   enable_hashjoin (boolean)
   
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d901dc4a50e..58141d8393c 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -104,6 +104,7 @@ static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
 static void show_sort_info(SortState *sortstate, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
+static void show_hashagg_info(AggState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
@@ -1882,6 +1883,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		case T_Agg:
 			show_agg_keys(castNode(AggState, planstate), ancestors, es);
 			show_upper_qual(plan->qual, "Filter", planstate, ancestors, es);
+			show_hashagg_info((AggState *) planstate, es);
 			if (plan->qual)
 show_instrumentation_count("Rows Removed by Filter", 1,
 		   planstate, es);
@@ -2769,6 +2771,41 @@ show_hash_info(HashState *hashstate, ExplainState *es)
 	}
 }
 
+/*
+ * Show information on hash aggregate memory usage and batches.
+ */
+static void
+show_hashagg_info(AggState *aggstate, ExplainState *es)
+{
+	Agg		*agg	   = (Agg *)aggstate->ss.ps.plan;
+	long	 memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024;
+
+	Assert(IsA(aggstate, AggState));
+

More weird stuff in polymorphic type resolution

2020-03-15 Thread Tom Lane
While poking at Pavel's "anycompatible" patch, I found a couple
more pre-existing issues having to do with special cases for
actual input type "anyarray".  Ordinarily that would be impossible
since we should have resolved "anyarray" to some specific array
type earlier; but you can make it happen by applying a function
to one of the "anyarray" columns of pg_statistic or pg_stats.

* I can provoke an assertion failure thus:

regression=# create function fooid(f1 anyarray) returns anyarray
as 'select $1' language sql;
CREATE FUNCTION
regression=# select fooid(stavalues1) from pg_statistic;
server closed the connection unexpectedly

The server log shows

TRAP: BadArgument("!IsPolymorphicType(rettype)", File: "functions.c", Line: 1606)
postgres: postgres regression [local] SELECT(ExceptionalCondition+0x55)[0x8e85c5]
postgres: postgres regression [local] SELECT(check_sql_fn_retval+0x79b)[0x6664db]

The reason this happens is that the parser intentionally allows an
actual argument type of anyarray to match a declared argument type
of anyarray, whereupon the "resolved" function output type is also
anyarray.  We have some regression test cases that depend on that
behavior, so we can't just take it out.  However, check_sql_fn_retval
is assuming too much.  In a non-assert build, what happens is

ERROR:  42P13: return type anyarray is not supported for SQL functions
CONTEXT:  SQL function "fooid" during inlining
LOCATION:  check_sql_fn_retval, functions.c:1888

because the code after the assert properly rejects pseudotypes.
That seems fine, so I think it's sufficient to take out that assertion.
(Note: all the PLs throw errors successfully, so it's just SQL-language
functions with this issue.)

* There's some inconsistency between these cases:

regression=# create function foo1(f1 anyarray, f2 anyelement) returns bool
language sql as 'select $1[1] = f2';
CREATE FUNCTION
regression=# select foo1(stavalues1, 46) from pg_statistic;
ERROR:  function foo1(anyarray, integer) does not exist
LINE 1: select foo1(stavalues1, 46) from pg_statistic;
   ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.

regression=# create function foo2(f1 anyarray, f2 anyrange) returns bool
language sql as 'select $1[1] = lower(f2)';
CREATE FUNCTION
regression=# select foo2(stavalues1, int8range(42,46)) from pg_statistic;
ERROR:  argument declared anyrange is not consistent with argument declared 
anyelement
DETAIL:  int8range versus anyelement

The reason for the inconsistency is that parse_coerce.c has special cases
to forbid the combination of (a) matching an actual argument type of
anyarray to a declared anyarray while (b) also having an anyelement
argument.  That's to prevent the risk that the actual array element type
doesn't agree with the other argument.  But there's no similar restriction
for the combination of anyarray and anyrange arguments, which seems like
a clear oversight in the anyrange logic.  On reflection, in fact, we
should not allow matching an actual-argument anyarray if there are *any*
other pseudotype arguments, including another anyarray, because we can't
guarantee that the two anyarrays will contain matching argument types.

A rule like that would also justify not worrying about matching
anyarray to anycompatiblearray (as Pavel's patch already doesn't,
though not for any documented reason).  There is little point in using
anycompatiblearray unless there's at least one other polymorphic argument
to match it against, so if we're going to reject anyarray input in such
cases, there's no situation where it's useful to allow that.

Having said that, I'm not sure whether to prefer the first error, which
happens because check_generic_type_consistency decides the function
doesn't match at all, or the second, where check_generic_type_consistency
accepts the match and then enforce_generic_type_consistency spits up.
The second error isn't all that much better ... but maybe we could accept
the match and then make enforce_generic_type_consistency issue a more
on-point error?  That would carry some risk of creating ambiguous-function
errors where there were none before, but I doubt it's a big problem.
If that change makes us unable to pick a function, the same failure
would occur for a similar call involving a regular array column.

In any case I'm inclined not to back-patch a fix for the second issue,
since all it's going to do is exchange one error for another.  Not sure
about the assertion removal --- that wouldn't affect production builds,
but people doing development/testing on back branches might appreciate it.

Thoughts?

regards, tom lane




Re: control max length of parameter values logged

2020-03-15 Thread Alvaro Herrera
On 2020-Mar-14, Tom Lane wrote:

> Bruce Momjian  writes:
> > I am sorry --- I am confused.  Why are we truncating or allowing control
> > of truncation of BIND parameter values, but have no such facility for
> > queries.  Do we assume queries are shorter than BIND parameters, or is
> > it just that it is easier to trim BIND parameters than values embedded
> > in non-EXECUTE queries.
> 
> The cases that Alvaro was worried about were enormous values supplied
> via bind parameters.  We haven't heard comparable complaints about
> the statement text.

To be more precise, I have seen cases of enormous statement text, but
those are fixed precisely by moving the bulk to parameters.  So the
ability to trim the parameter is important.  I've never seen a very
large query without the bulk being parameterizable.

> Also, from a security standpoint, the contents
> of the statement text are way more critical than the contents of
> an out-of-line parameter; you can't do SQL injection from the latter.

That's a good point too.

> So I think the audience for trimming would be a lot smaller for
> statement-text trimming.

Nod.  (I think if we really wanted to trim queries, it would have to be
something semantically sensible, not just trim whatever is at the end of
the statement literal.  Say, only trim parts of the where clause that
are of the form "something op constant", and rules like that, plus put
placeholders to show that they were there.  This sounds like a lot of work
to figure out usefully ...)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal: new polymorphic types - commontype and commontypearray

2020-03-15 Thread Tom Lane
Pavel Stehule  writes:
> ne 15. 3. 2020 v 17:48 odesílatel Tom Lane  napsal:
>> Well, here's a version that does it like that, but personally I find these
>> messages too verbose and not an improvement on what I had before.

> There was a problem just with anyrange type. This last version looks
> perfect.

If you think that "matching polymorphic types" is too vague, I'm
not sure there's much daylight between there and spelling it out
in full as this latest patch does.  "anyrange is the only problem"
might be a tenable viewpoint today, but once this patchset goes
in there's going to be much more scope for confusion about which
arguments potentially match a polymorphic result.

regards, tom lane




RE: Planning counters in pg_stat_statements (using pgss_store)

2020-03-15 Thread imai.yoshik...@fujitsu.com
On Sat, Mar 14, 2020 at 5:28 PM, Julien Rouhaud wrote:
> On Sat, Mar 14, 2020 at 03:04:00AM -0700, legrand legrand wrote:
> > imai.yoshik...@fujitsu.com wrote
> > > On Thu, Mar 12, 2020 at 6:37 PM, Julien Rouhaud wrote:
> > >> That's very interesting feedback, thanks!  I'm not a fan of giving a way
> > >> for
> > >> queries to say that they want to be ignored by pg_stat_statements, but
> > >> double
> > >> counting the planning counters seem even worse, so I'm +0.5 to accept
> > >> NULL
> > >> query string in the planner, incidentally making pgss ignore those.
> > >
> > > It is preferable that we can count various queries statistics as much as
> > > possible
> > > but if it causes overhead even when without using pg_stat_statements, we
> > > would
> > > not have to force them to set appropriate query_text.
> > > About setting a fixed string in query_text, I think it doesn't make much
> > > sense
> > > because users can't take any actions by seeing those queries' stats.
> > > Moreover, if
> > > we set a fixed string in query_text to avoid pg_stat_statement's errors,
> > > codes
> > > would be inexplicable for other developers who don't know there's such
> > > requirements.
> > > After all, I agree accepting NULL query string in the planner.
> > >
> > > I don't know it is useful but there are also codes that avoid an error
> > > when
> > > sourceText is NULL.
> > >
> > > executor_errposition(EState *estate, int location)
> > > {
> > > ...
> > > /* Can't do anything if source text is not available */
> > > if (estate == NULL || estate->es_sourceText == NULL)
> 
> 
> I'm wondering if that's really possible.  But pgss uses the QueryDesc, which
> should always have a query text (since pgss relies on that).

I cited that code because I just wanted to point out that there's already
an assumption that the query text in QueryDesc can be NULL, whether or not
that is actually true.


> > My understanding is that the V5 patch, which checks for a non-zero
> > queryid, properly manages the case where sourceText is NULL.
> >
> > A NULL sourceText means that there was no Parsing for the associated
> > query; if there was no Parsing, there is no queryid (queryId=0),
> > and no planning counters update.
> >
> > It doesn't change pg_plan_query behaviour (no regression for Citus, IVM,
> > ...),
> > and was tested with success for IVM.
> >
> > If my understanding is wrong, then setting track_planning = false
> > would still be the workaround for the very rare (non-core) extension(s)
> > that may hit the NULL query text assertion failure.
> >
> > What do you think about this ?
> 
> 
> I don't think that's a correct assumption.  I obviously didn't read all of
> the citus extension, but it looks like what's happening is that they
> generate a custom Query from the original one, with all the modifications
> needed for distributed execution and whatnot, which is then fed to the
> planner.  I think it's entirely possible that the modified Query inherits
> a previously set queryid, while still not really having a query text.  And
> if citus doesn't do that, it doesn't seem like an illegal use case anyway.

Indeed. It can happen that queryid has some value while query_text is NULL.


> I'm instead attaching a v7 which removes the assert in pg_plan_query, and
> modify pgss_planner_hook to also ignore queries without a query text, as this
> seems the best option.

Thank you.
It also seems to me that is the best option.

BTW, I rechecked the patchset.
I think the code is ready for committer, but should we modify the documentation?
{min,max,mean,stddev}_time is now obsolete, so it would be better to rename it to
{min,max,mean,stddev}_exec_time and add {min,max,mean,stddev}_plan_time.


--
Yoshikazu Imai





comments on elements of xlogctldata

2020-03-15 Thread Atsushi Torikoshi
Hi,

It seems the comments on SharedHotStandbyActive and
SharedRecoveryInProgress are the same in XLogCtlData.

How about modifying the comment on SharedHotStandbyActive?

Attached a patch.

Regards,
--
Torikoshi Atsushi


fix_comments_on_SharedHotStandbyActive_v1.patch
Description: Binary data


type of some table storage params on doc

2020-03-15 Thread Atsushi Torikoshi
Hi,

As far as I can tell from reloptions.c, autovacuum_vacuum_cost_delay,
autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor
are all members of relopt_real, so they seem to share the same type, real.

But the manual on storage parameters[1] says two of them are of type
float4 and the other is floating point.

  > autovacuum_vacuum_cost_delay, toast.autovacuum_vacuum_cost_delay
(floating point)
  > autovacuum_vacuum_scale_factor, toast.autovacuum_vacuum_scale_factor
(float4)
  > autovacuum_analyze_scale_factor (float4)

And the manual on the corresponding GUCs[2] says all these parameters are
floating point.

  > autovacuum_vacuum_cost_delay (floating point)
  > autovacuum_vacuum_scale_factor (floating point)
  > autovacuum_analyze_scale_factor (floating point)

Also other members of relopt_real such as seq_page_cost,
random_page_cost and vacuum_cleanup_index_scale_factor are
documented as floating point.


I think using float4 in the storage parameters documentation[1] is
inconsistent, so how about changing these parameters' type from float4
to floating point, if there is no specific reason to use float4?


Attached a patch.
Any thought?

[1]https://www.postgresql.org/docs/devel/sql-createtable.htm
[2]https://www.postgresql.org/docs/devel/runtime-config-autovacuum.html


Regards,
--
Torikoshi Atsushi


fix_type_in_createtable_storage-params_v1.patch
Description: Binary data


Re: comments on elements of xlogctldata

2020-03-15 Thread Fujii Masao




On 2020/03/16 10:57, Atsushi Torikoshi wrote:

Hi,

It seems the comments on SharedHotStandbyActive and SharedRecoveryInProgress 
are the same in XLogCtlData.

How about modifying the comment on SharedHotStandbyActive?

Attached a patch.


Thanks for the report and patch!
The patch looks good to me.
I will commit it.

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




add types to index storage params on doc

2020-03-15 Thread Atsushi Torikoshi
Hi,

The current manual on CREATE TABLE[1] describes storage
parameters with their types.
But the manual on CREATE INDEX[2] describes storage parameters
WITHOUT their types.

I think it'll be better to add types to the storage parameters
on CREATE INDEX for consistency.

Attached a patch.
Any thought?

[1]
https://www.postgresql.org/docs/devel/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
[2]
https://www.postgresql.org/docs/devel/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS


Regards,
--
Torikoshi Atsushi
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 71f2bdb..c0653dd 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -369,7 +369,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 

-fillfactor
+fillfactor (integer)
  
   fillfactor storage parameter
  
@@ -400,7 +400,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 

-deduplicate_items
+deduplicate_items (boolean)
  
   deduplicate_items storage parameter
  
@@ -428,7 +428,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 

-vacuum_cleanup_index_scale_factor
+vacuum_cleanup_index_scale_factor (floating point)
  
   vacuum_cleanup_index_scale_factor
   storage parameter
@@ -448,7 +448,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 

-buffering
+buffering (string)
  
   buffering storage parameter
  
@@ -471,7 +471,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 

-fastupdate
+fastupdate (boolean)
  
   fastupdate storage parameter
  
@@ -499,7 +499,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 

-gin_pending_list_limit
+gin_pending_list_limit (integer)
  
   gin_pending_list_limit
   storage parameter
@@ -520,7 +520,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 

-pages_per_range
+pages_per_range (integer)
  
   pages_per_range storage parameter
  
@@ -535,7 +535,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 

-autosummarize
+autosummarize (boolean)
  
   autosummarize storage parameter
  


Re: ssl passphrase callback

2020-03-15 Thread Andreas Karlsson

On 2/18/20 11:39 PM, Andrew Dunstan wrote:

This should fix the issue, it happened when I switched to using a
pre-generated cert/key.


# Review

The patch still applies and passes the test suite, both with openssl 
enabled and with it disabled.


As for the feature I agree that it is nice to expose this callback to 
extension writers and I agree with the approach taken. The other 
proposals up-thread seem over engineered to me. Maybe if it was a 
general feature used in many places those proposals would be worth it, 
but I am still skeptical even then. This approach is so much simpler.


The only real risk I see is that if people install multiple libraries 
for this they will overwrite the hook for each other, but we have other 
cases like that already, so I think that is fine.


The patch moves secure_initialize() to after 
process_shared_preload_libraries(), which in theory could break some 
extension, but it does not seem like a likely thing for extensions to 
rely on. Or is it?


An idea would be to have the code in ssl_passphrase_func.c warn if 
the ssl_passphrase_command GUC is set, to make it more useful as example 
code for people implementing this hook.
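
Something like this in the module's _PG_init() would presumably do (a
sketch only; it assumes the ssl_passphrase_command global from
libpq/libpq.h is what the module would test):

void
_PG_init(void)
{
    /* warn that any configured passphrase command will be overridden */
    if (ssl_passphrase_command != NULL && ssl_passphrase_command[0] != '\0')
        ereport(WARNING,
                (errmsg("ssl_passphrase_command is set but will be "
                        "overridden by this module's passphrase hook")));

    /* ... install the passphrase hook here, as the module already does ... */
}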


# Nitpicking

The certificate expires in 2030 while all other certificates used in 
the tests expire in 2046. Should we be consistent?


There is text in server.crt and server.key, while other certificates and 
keys used in the tests do not have this. Again, should we be consistent?


Empty first line in 
src/test/modules/ssl_passphrase_callback/t/001_testfunc.pl which should 
probably just be removed or replaced with a shebang.


There is an extra space between the parentheses in the line below. Does 
that follow our code style for Perl?


+unless ( ($ENV{with_openssl} || 'no') eq 'yes')

Missing space after comma in:

+ok(-e "$ddir/postmaster.pid","postgres started");

Missing space after comma in:

+ok(! -e "$ddir/postmaster.pid","postgres not started with bad passphrase");

Andreas




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Amit Kapila
On Sun, Mar 15, 2020 at 9:17 PM Dilip Kumar  wrote:
>
> On Sun, Mar 15, 2020 at 5:58 PM Amit Kapila  wrote:
> >
> >
> > 1. Group members wait for page locks.  If you test that the leader
> > acquires the page lock and then member also tries to acquire the same
> > lock on the same index, it wouldn't block before the patch, but after
> > the patch, the member should wait for the leader to release the lock.
>
> Okay, I will test this part.
>
> > 2. Try to hit Assert in LockAcquireExtended (a) by trying to
> > re-acquire the page lock via the debugger,
>
> I am not sure whether that is true, because if we are holding
> the page lock and we try to take the same page lock, the lock will be
> granted without reaching that code path.  However, I agree that this is
> not intended; instead, it is a side effect of allowing a relation
> extension lock to be taken while holding the same relation extension
> lock.  So basically, the situation now is that if the lock is directly
> granted because we already hold the same lock, it will not reach the
> assert code.  IMHO, we don't need to add extra code to make it behave
> differently.  Please let me know your opinion on this.
>

I also don't think there is any reason to add code to prevent that.
Actually, what I wanted to test was to somehow hit the Assert for the
cases where it would actually fire if someone tomorrow tries to acquire
any other type of lock.  Can we mimic such a situation by hacking the
code (say, by trying to acquire some other type of heavyweight lock) so
as to hit the newly added Assert?

> (b) try to acquire the
> > relation extension lock after page lock and it should be allowed
> > (after acquiring page lock, we take relation extension lock in
> > following code path:
> > ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer).
>
> ok
>

Thanks.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: add types to index storage params on doc

2020-03-15 Thread Fujii Masao




On 2020/03/16 11:09, Atsushi Torikoshi wrote:

Hi,

The current manual on CREATE TABLE[1] describes storage
parameters with their types.
But the manual on CREATE INDEX[2] describes storage parameters
WITHOUT their types.

I think it'll be better to add types to the storage parameters
on CREATE INDEX for consistency.

Attached a patch.
Any thought?


Thanks for the patch! It basically looks good to me.

-buffering
+buffering (string)

Isn't it better to use "enum" rather than "string"?
In the docs about enum GUC parameters, "enum" is used there.

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: backend type in log_line_prefix?

2020-03-15 Thread Fujii Masao




On 2020/03/15 19:32, Peter Eisentraut wrote:

On 2020-03-13 22:24, Peter Eisentraut wrote:

On 2020-03-10 19:07, Alvaro Herrera wrote:

I like these patches; the first two are nice cleanup.

My only gripe is that pgstat_get_backend_desc() is not really a pgstat
function; I think it should have a different name with a prototype in
miscadmin.h (next to the enum's new location, which I would put
someplace near the "pmod.h" comment rather than where you put it;
perhaps just above the AuxProcType definition), and implementation
probably in miscinit.c.


I have committed the refactoring patches with adjustments along these
lines.  The patch with the log_line_prefix and csvlog enhancements is
still under discussion.


I have committed that last one also, after some corrections.


Thanks for adding this nice feature!

I have one comment: you seem to need to update file-fdw.sgml so that
the pglog table in the doc includes the backend_type column.

Regards,


--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters




Re: add types to index storage params on doc

2020-03-15 Thread Peter Geoghegan
On Sun, Mar 15, 2020 at 7:10 PM Atsushi Torikoshi  wrote:
> I think it'll be better to add types to storage parameters
> on CREATE INDEX for the consistency.

Seems reasonable to me.

-- 
Peter Geoghegan




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Masahiko Sawada
On Mon, 16 Mar 2020 at 00:54, Dilip Kumar  wrote:
>
> On Sun, Mar 15, 2020 at 6:20 PM Amit Kapila  wrote:
> >
> > On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar  wrote:
> > >
> > > I have modified 0001 and 0002 slightly,  Basically, instead of two
> > > function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created
> > > a one function.
> > >
> >
> > +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value)
> >
> Can we rename the parameter to lock_held, acquired or something like
> that so that it indicates what it intends to do, and probably add a
> comment for that variable atop the function?
>
> Done
>

I've looked at the patches and ISTM these work as expected.
IsRelationExtensionLockHeld and IsPageLockHeld are used only when
assertions are enabled. So how about making CheckAndSetLockHeld do its
work only if USE_ASSERT_CHECKING is defined, to avoid the overhead?
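
Something along these lines, presumably (a sketch of the suggestion,
assuming the helper and flag names from the patch):

static inline void
CheckAndSetLockHeld(LOCALLOCK *locallock, bool acquired)
{
#ifdef USE_ASSERT_CHECKING
    if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_RELATION_EXTEND)
        IsRelationExtensionLockHeld = acquired;
    else if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
        IsPageLockHeld = acquired;
#endif
}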

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Online checksums verification in the backend

2020-03-15 Thread Michael Paquier
On Wed, Mar 11, 2020 at 08:18:23AM +0100, Julien Rouhaud wrote:
> The cfbot reported a build failure, so here's a rebased v2 which also contains
> the pg_stat_report_failure() call and extra log info.

+ * - if a block is not found in shared_buffers, the LWLock is relased and the
+ *   block is read from disk without taking any lock.  If an error is detected,
+ *   the read block will be discarded and retrieved again while holding the
+ *   LWLock.  This is because an error due to concurrent write is possible but
+ *   very unlikely, so it's better to have an optimistic approach to limit
+ *   locking overhead
This can be risky with false positives, no?  With a large amount of
shared buffer eviction you actually increase the risk of torn page
reads.  Instead of a logic relying on partition mapping locks, which
could be unwise on performance grounds, did you consider different
approaches?  For example a kind of pre-emptive lock on the page in
storage to prevent any shared buffer operation to happen while the
block is read from storage, that would act like a barrier.

+ * Vacuum's GUCs are used to avoid consuming too much resources while running
+ * this tool.
Shouldn't this involve separate GUCs instead of the VACUUM ones?  I
guess that this leads to the fact that this function may be better as
a contrib module, with the addition of some better-suited APIs in core
(see paragraph above).

+Run
+make check
+or
+make installcheck
Why is installcheck mentioned here?

I don't think that it is appropriate to place the SQL-callable part in
the existing checksum.c.  I would suggest instead a new file, say
checksumfuncs.c in src/backend/utils/adt/, holding any SQL functions
for checksums.

-SUBDIRS = perl regress isolation modules authentication recovery
 subscription
+SUBDIRS = perl regress isolation modules authentication check_relation \
+ recovery subscription
It seems to me that this test would be a good fit for
src/test/modules/test_misc/.

+static void
+check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore,
+ ForkNumber forknum)
Per the argument of bloat, I think that I would remove
check_all_relation() as this function could take a very long time to
run, and just make the SQL function strict.

+ * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
+ *   disk either before the end of the next checkpoint or during recovery in
+ *   case of unsafe shutdown
Wouldn't it actually be a good thing to check that the page on storage
is fine in this case?  This depends on the system settings and the
checkpoint frequency, but checkpoint_timeout can be extended up to 1
day.  And plenty of things could happen to the storage in one day,
including a base backup that includes a corrupted page on storage,
that this function would not be able to detect.

+ * - if a block is otherwise found in shared_buffers, an IO lock is taken on
+ *   the block and the block is then read from storage, ignoring the block in
+ *   shared_buffers
Yeah, I think that you are right here to check the page on storage
anyway.

+ *   we detect if a block is in shared_buffers or not.  See get_buffer()
+ *   comments for more details about the locking strategy.
get_buffer() does not exist in your patch, check_get_buffer() does.

+ * - if a block is not found in shared_buffers, the LWLock is relased and the
[...]
+ * To avoid torn page and possible false postives when reading data, and
Typos.

+   if (!DataChecksumsEnabled())
+   elog(ERROR, "Data checksums are not enabled");
Note that elog() is for the class of errors which are never expected,
and here a caller of pg_check_relation() with checksums disabled can
trigger that.  So you need to call ereport() with
ERRCODE_FEATURE_NOT_SUPPORTED.
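
In other words, something like this (the message wording is just a
suggestion; note also that error messages conventionally start lowercase):

    if (!DataChecksumsEnabled())
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("data checksums are not enabled")));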

+ * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to
+ *   disk either before the end of the next checkpoint or during recovery in
+ *   case of unsafe shutdown
Not sure that the indentation is going to react well on that part of
the patch, perhaps it would be better to add some "/*---" at the
beginning and end of the comment block to tell pgindent to ignore this
part?
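
That is, the usual fencing trick so that pgindent leaves the block alone,
e.g.:

/*----------
 * - if a block is dirty in shared_buffers, it's ignored as it'll be
 *   flushed to disk either before the end of the next checkpoint or
 *   during recovery in case of unsafe shutdown
 *----------
 */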

Based on the feedback gathered on this thread, I guess that you should
have a SRF returning the list of broken blocks, as well as NOTICE
messages.  Another thing to consider is the addition of a range
argument to only check a certain portion of the blocks, say one
segment file at a time, etc.  Fine by me to not include in the first
flavor of the patch.

The patch needs documentation.
--
Michael


signature.asc
Description: PGP signature


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-15 Thread Masahiko Sawada
On Fri, 13 Mar 2020 at 05:11, David Rowley  wrote:
>
> On Fri, 13 Mar 2020 at 01:43, Masahiko Sawada
>  wrote:
> >
> > On Thu, 12 Mar 2020 at 16:28, David Rowley  wrote:
> > > Laurenz highlighted a seemingly very valid reason that the current
> > > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we
> > > use the current scale factor of 0.2, then we'll run an insert-only
> > > vacuum every 200 million rows. If those INSERTs are one per
> > > transaction then the new feature does nothing as the wraparound vacuum
> > > will run instead. Since this feature was born due to large insert-only
> > > tables, this concern seems very valid to me.
> >
> > Yeah, I understand and agree that since most people would use default
> > values we can reduce mis-configuration cases by adding separate GUCs
> > that have appropriate default values for that purpose but on the other
> > hand I'm not sure it's worth that we cover the large insert-only table
> > case by adding separate GUCs in spite of being able to cover it even
> > by existing two GUCs.
>
> In light of the case above, do you have an alternative suggestion?
>
> > If we want to disable this feature on the
> > particular table, we can have a storage parameter that means not to
> > consider the number of inserted tuples rather than having multiple
> > GUCs that allows us to fine tuning. And IIUC even in the above case, I
> > think that if we trigger insert-only vacuum by comparing the number of
> > inserted tuples to the threshold computed by existing threshold and
> > scale factor, we can cover it.
>
> So you're suggesting we drive the insert-vacuums from existing
> scale_factor and threshold?  What about the 1 billion row table
> example above?

My suggestion is the initial approach proposed by Justin: comparing
the number of inserted tuples to the threshold computed from
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor in order
to trigger autovacuum. But as discussed, there is a downside; if the
number of inserted tuples is almost the same as, but a little larger
than, the number of dead tuples, we will trigger an insert-only vacuum,
which is wasteful.

There is already a consensus on introducing 2 new parameters, but as
a second idea I'd like to add one (or two) GUC(s) to my suggestion,
say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio
of inserted tuples to the total number of tuples modified and inserted
that is required to trigger insert-only vacuum. For example,
suppose the table has 1,000,000 tuples and we set threshold = 0,
scale_factor = 0.2 and freeze_insert_ratio = 0.9; we will trigger
normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but
we will instead trigger insert-only autovacuum, which is a vacuum with
vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (= 200,000
* 0.9). IOW, if 90% of modified tuples are insertions, we freeze tuples
aggressively. If we want to trigger insert-only vacuum only on
insert-only tables, we can set freeze_insert_ratio = 1.0. The downside
of this idea is that we cannot disable autovacuum triggered by the
number of inserted tuples, although we might be able to introduce one
more GUC that controls whether to include the number of inserted tuples
when triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert =
on|off). The advantage of this idea would be that we can ensure that
insert-only vacuum will run only when the ratio of insertions is large
enough.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Dilip Kumar
On Mon, Mar 16, 2020 at 8:57 AM Masahiko Sawada
 wrote:
>
> On Mon, 16 Mar 2020 at 00:54, Dilip Kumar  wrote:
> >
> > On Sun, Mar 15, 2020 at 6:20 PM Amit Kapila  wrote:
> > >
> > > On Sun, Mar 15, 2020 at 4:34 PM Dilip Kumar  wrote:
> > > >
> > > > I have modified 0001 and 0002 slightly,  Basically, instead of two
> > > > function CheckAndSetLockHeld and CheckAndReSetLockHeld, I have created
> > > > a one function.
> > > >
> > >
> > > +CheckAndSetLockHeld(LOCALLOCK *locallock, bool value)
> > >
> > > Can we rename the parameter to lock_held, acquired or something like
> > > that so that it indicates what it intends to do, and probably add a
> > > comment for that variable atop the function?
> >
> > Done
> >
>
> I've looked at the patches and ISTM these work as expected.

Thanks for verifying.

> IsRelationExtensionLockHeld and IsPageLockHeld are used only when
> assertions are enabled. So how about making CheckAndSetLockHeld do its
> work only if USE_ASSERT_CHECKING is defined, to avoid the overhead?

That makes sense to me, so I have updated the patch.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v10-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-.patch
Description: Binary data


v10-0003-Allow-relation-extension-lock-to-conflict-among-.patch
Description: Binary data


v10-0004-Allow-page-lock-to-conflict-among-parallel-group.patch
Description: Binary data


v10-0002-Add-assert-to-ensure-that-page-locks-don-t-parti.patch
Description: Binary data


Re: allow online change primary_conninfo

2020-03-15 Thread Michael Paquier
On Fri, Mar 13, 2020 at 11:17:38AM -0300, Alvaro Herrera wrote:
> ... oh, that's exactly what 0002 does.  So patch 0001 is only about
> making a temporary change to the create_temp_slot to be consistent with
> existing policy, before changing the policy.

Yes.  In my opinion, patch 0002 should not change the GUC mode of
wal_receiver_create_temp_slot as the discussion here is about 
primary_conninfo, even if both may share some logic regarding WAL
receiver shutdown and its restart triggered by the startup process.

Patch 0001 has actually been presented on this thread first:
https://www.postgresql.org/message-id/753391579708...@iva3-77ae5995f07f.qloud-c.yandex.net
And there is an independent patch registered in this CF:
https://commitfest.postgresql.org/27/2456/

Should we add patch 0001 as an open item for v13 as there is a risk of
forgetting this issue?  I have created a wiki blank page a couple of
weeks back:
https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items
--
Michael


signature.asc
Description: PGP signature


Re: effective_io_concurrency's steampunk spindle maths

2020-03-15 Thread Thomas Munro
On Tue, Mar 10, 2020 at 12:20 PM Thomas Munro  wrote:
> Here's a patch set to remove the spindle stuff, add a maintenance
> variant, and use the maintenance one in
> heap_compute_xid_horizon_for_tuples().

Pushed.




Re: Expose lock group leader pid in pg_stat_activity

2020-03-15 Thread Justin Pryzby
On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> So, AFAICT the LockHashPartitionLockByProc is required when
> iterating/modifying lockGroupMembers or lockGroupLink, but just
> getting the leader pid should be safe.

This still seems unsafe:

git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c
+   /*
+    * If a PGPROC entry was retrieved, display wait events and lock
+    * group leader information if any.  To avoid extra overhead, no
+    * extra lock is being held, so there is no guarantee of
+    * consistency across multiple rows.
+    */
...
+   PGPROC *leader;
...
+   leader = proc->lockGroupLeader;
+   if (leader)
+   {
# does something guarantee that leader doesn't change ?
+   values[29] = Int32GetDatum(leader->pid);
+   nulls[29] = false;
}

Michael seems to have raised the issue:

On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> And actually, the way you are looking at the leader's PID is visibly
> incorrect and inconsistent because the patch takes no shared LWLock on
> the leader using LockHashPartitionLockByProc() followed by
> LWLockAcquire(), no?  That's incorrect because it could be perfectly
> possible to crash with this code between the moment you check if 
> lockGroupLeader is NULL and the moment you look at
> lockGroupLeader->pid if a process is being stopped in-between and
> removes itself from a lock group in ProcKill().  

But I don't see how it was addressed ?

I read this:

src/backend/storage/lmgr/lock.c: * completely valid.  We cannot safely dereference another backend's
src/backend/storage/lmgr/lock.c- * lockGroupLeader field without holding all lock partition locks, and
src/backend/storage/lmgr/lock.c- * it's not worth that.)

I think if you do:
|LWLockAcquire(&proc->backendLock, LW_SHARED);
..then it's at least *safe* to access leader->pid, but it may be inconsistent
unless you also call LockHashPartitionLockByProc.

I wasn't able to produce a crash, so maybe I missed something.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-15 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 10:48:27PM +0100, Laurenz Albe wrote:
> On Fri, 2020-03-13 at 13:44 -0500, Justin Pryzby wrote:
> > Possible it would be better to run VACUUM *without* freeze_min_age=0 ?  (I 
> > get
> > confused and have to spend 20min re-reading the vacuum GUC docs every time I
> > deal with this stuff, so maybe I'm off).
> > 
> > As I understand, the initial motivation of this patch was to avoid 
> > disruptive
> > anti-wraparound vacuums on insert-only table.  But if vacuum were triggered 
> > at
> > all, it would freeze the oldest tuples, which is all that's needed; 
> > especially
> > since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never 
> > need
> > to be vacuumed again.  Recently written tuples wouldn't be frozen, which is 
> > ok,
> > they're handled next time.
> 
> Freezing tuples too early is wasteful if the tuples get updated or deleted
> soon after, but based on the assumption that an autovacuum triggered by insert
> is dealing with an insert-mostly table, it is not that wasteful.

You're right that it's not *that* wasteful.  If it's a table that gets 90%
inserts/10% updates, then only 10% of its tuples will be frozen.  In the worst
case, it's the same tuples every time, and that's somewhat wasteful.  In the
best case, those tuples are clustered on a small number of pages.

-- 
Justin




Re: Online checksums verification in the backend

2020-03-15 Thread Masahiko Sawada
On Wed, 11 Mar 2020 at 16:18, Julien Rouhaud  wrote:
>
> On Tue, Dec 10, 2019 at 11:12:34AM +0100, Julien Rouhaud wrote:
> > On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier  wrote:
> > >
> > > On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > > > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas  
> > > > wrote:
> > > >> Some people might prefer notices, because you can get those while the
> > > >> thing is still running, rather than a result set, which you will only
> > > >> see when the query finishes. Other people might prefer an SRF, because
> > > >> they want to have the data in structured form so that they can
> > > >> postprocess it. Not sure what you mean by "more globally."
> > > >
> > > > I meant having the results available system-wide, not only to the
> > > > caller.  I think that emitting a log/notice level should always be
> > > > done on top on whatever other communication facility we're using.
> > >
> > > The problem with notices and logs is that they tend to be ignored.  Now I
> > > don't see any problem either in adding something into the logs which
> > > can be found later on for parsing on top of a SRF returned by the
> > > caller which includes all the corruption details, say with pgbadger
> > > or your friendly neighborhood grep.  I think that any backend function
> > > should also make sure to call pgstat_report_checksum_failure() to
> > > report a report visible at database-level in the catalogs, so as it is
> > > possible to use that as a cheap high-level warning.  The details of
> > > the failures could always be dug from the logs or the result of the
> > > function itself after finding out that something is wrong in
> > > pg_stat_database.
> >
> > I agree that adding extra information in the logs and calling
> > pgstat_report_checksum_failure is a must do, and I changed that
> > locally.  However, I doubt that the logs are the right place to find
> > the details of corrupted blocks.  There's no guarantee that the file
> > will be accessible to the DBA, nor that the content won't get
> > truncated by the time it's needed.  I really think that corruption is
> > important enough to justify a more specific location.
>
>
> The cfbot reported a build failure, so here's a rebased v2 which also contains
> the pg_stat_report_failure() call and extra log info.

In addition to comments from Michael-san, here are my comments:

1.
+   if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
+   ereport(ERROR,
+   (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+errmsg("only superuser or a member of the
pg_read_server_files role may use this function")));
+
+   if (!DataChecksumsEnabled())
+   elog(ERROR, "Data checksums are not enabled");

I think it's better to reverse the order of the above checks.

2.
+#define CRF_COLS   5   /* Number of output arguments in the SRF */

Should it be SRF_COLS?

3.
+static void
+check_delay_point(void)
+{
+   /* Always check for interrupts */
+   CHECK_FOR_INTERRUPTS();
+
+   /* Nap if appropriate */
+   if (!InterruptPending && VacuumCostBalance >= VacuumCostLimit)
+   {
+   int msec;
+
+   msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
+   if (msec > VacuumCostDelay * 4)
+   msec = VacuumCostDelay * 4;
+
+   pg_usleep(msec * 1000L);
+
+   VacuumCostBalance = 0;
+
+   /* Might have gotten an interrupt while sleeping */
+   CHECK_FOR_INTERRUPTS();
+   }
+}

Even if we use the vacuum delay for this function, I think we need to set
VacuumDelayActive and return if it's false, or it's better to just
return if VacuumCostDelay == 0 (see the sketch after this list).

4.
+static void
+check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore,
+ ForkNumber forknum)

I also agree with Michael-san to remove this function. Instead we can
check all relations by:

select pg_check_relation(oid) from pg_class;

6.
Other typos

s/dirted/dirtied/
s/explictly/explicitly/
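
Regarding item 3 above, the suggested early exit would presumably look
like this (a sketch reusing the quoted function and the usual vacuum
cost globals):

static void
check_delay_point(void)
{
    /* Always check for interrupts */
    CHECK_FOR_INTERRUPTS();

    /* Nothing more to do when the cost delay is disabled */
    if (VacuumCostDelay <= 0)
        return;

    /* Nap if appropriate */
    if (!InterruptPending && VacuumCostBalance >= VacuumCostLimit)
    {
        int     msec;

        msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
        if (msec > VacuumCostDelay * 4)
            msec = VacuumCostDelay * 4;

        pg_usleep(msec * 1000L);

        VacuumCostBalance = 0;

        /* Might have gotten an interrupt while sleeping */
        CHECK_FOR_INTERRUPTS();
    }
}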

Regards,

-- 
Masahiko Sawadahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: backup manifests

2020-03-15 Thread Suraj Kharage
Thank you, Robert.

I'm getting the below warning while compiling
v11-0003-pg_validatebackup-Validate-a-backup-against-the-.patch:

pg_validatebackup.c: In function ‘report_manifest_error’:
pg_validatebackup.c:356:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
  pg_log_generic_v(PG_LOG_FATAL, fmt, ap);


To resolve this, can we use "pg_attribute_printf(2, 3)" in the function
declarations, something like below?
e.g.:

diff --git a/src/bin/pg_validatebackup/parse_manifest.h b/src/bin/pg_validatebackup/parse_manifest.h
index b0b18a5..25d140f 100644
--- a/src/bin/pg_validatebackup/parse_manifest.h
+++ b/src/bin/pg_validatebackup/parse_manifest.h
@@ -25,7 +25,7 @@ typedef void (*json_manifest_perfile_callback)(JsonManifestParseContext *,
                                               size_t size, pg_checksum_type checksum_type,
                                               int checksum_length, uint8 *checksum_payload);
 typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
-                                             char *fmt, ...);
+                                             char *fmt,...) pg_attribute_printf(2, 3);
 
 struct JsonManifestParseContext
 {
diff --git a/src/bin/pg_validatebackup/pg_validatebackup.c b/src/bin/pg_validatebackup/pg_validatebackup.c
index 0e7299b..6ccbe59 100644
--- a/src/bin/pg_validatebackup/pg_validatebackup.c
+++ b/src/bin/pg_validatebackup/pg_validatebackup.c
@@ -95,7 +95,7 @@ static void record_manifest_details_for_file(JsonManifestParseContext *context,
                                              int checksum_length,
                                              uint8 *checksum_payload);
 static void report_manifest_error(JsonManifestParseContext *context,
-                                  char *fmt, ...);
+                                  char *fmt,...) pg_attribute_printf(2, 3);
 
 static void validate_backup_directory(validator_context *context,
                                       char *relpath, char *fullpath);


Typos:

0004 patch
unexpctedly => unexpectedly

0005 patch
bacup => backup

On Sat, Mar 14, 2020 at 2:04 AM Robert Haas  wrote:

> On Thu, Mar 12, 2020 at 10:47 AM tushar 
> wrote:
> > On 3/9/20 10:46 PM, Robert Haas wrote:
> > > Seems like expected behavior to me. We could consider providing a more
> > > descriptive error message, but there's now way for it to work.
> >
> > Right , Error message need to be more user friendly .
>
> OK. Done in the attached version, which also includes a few other changes:
>
> - I expanded the regression tests. They now cover every line of code
> in parse_manifest.c except for a few that I believe to be unreachable
> (though I might be mistaken). Coverage for pg_validatebackup.c is also
> improved, but it's not 100%; there are some cases that I don't know
> how to hit outside of a kernel malfunction, and others that I only
> know how to hit on non-Windows systems. For instance, it's easy to use
> perl to make a file inaccessible on Linux with chmod(0, $filename),
> but I gather that doesn't work on Windows. I'm going to spend a bit
> more time looking at this, but I think it's already reasonably good.
>
> - I fixed a couple of very minor bugs which I discovered by writing those
> tests.
>
> - I added documentation, in part based on a draft Mark Dilger shared
> with me off-list.
>
> I don't think this is committable just yet, but I think it's getting
> fairly close, so if anyone has major objections please speak up soon.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Dilip Kumar
On Mon, Mar 16, 2020 at 8:15 AM Amit Kapila  wrote:
>
> On Sun, Mar 15, 2020 at 9:17 PM Dilip Kumar  wrote:
> >
> > On Sun, Mar 15, 2020 at 5:58 PM Amit Kapila  wrote:
> > >
> > >
> > > 1. Group members wait for page locks.  If you test that the leader
> > > acquires the page lock and then member also tries to acquire the same
> > > lock on the same index, it wouldn't block before the patch, but after
> > > the patch, the member should wait for the leader to release the lock.
> >
> > Okay, I will test this part.
> >
> > > 2. Try to hit Assert in LockAcquireExtended (a) by trying to
> > > re-acquire the page lock via the debugger,
> >
> > I am not sure whether that is true, because if we are holding
> > the page lock and we try to take the same page lock, the lock will be
> > granted without reaching that code path.  However, I agree that this is
> > not intended; instead, it is a side effect of allowing a relation
> > extension lock to be taken while holding the same relation extension
> > lock.  So basically, the situation now is that if the lock is directly
> > granted because we already hold the same lock, it will not reach the
> > assert code.  IMHO, we don't need to add extra code to make it behave
> > differently.  Please let me know your opinion on this.
> >
>
> I also don't think there is any reason to add code to prevent that.
> Actually, what I wanted to test was to somehow hit the Assert for the
> cases where it would actually fire if someone tomorrow tries to acquire
> any other type of lock.  Can we mimic such a situation by hacking the
> code (say, by trying to acquire some other type of heavyweight lock) so
> as to hit the newly added Assert?

I have hacked the code to take another heavyweight lock, and the
assert is hit.
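
For the record, the kind of hack meant here looks roughly like this (a
sketch; the object OID is made up, and the point is only to request a
different heavyweight lock type while the extension lock is held):

    LOCKTAG     tag;

    /* hypothetical: take some unrelated object lock to fire the Assert */
    SET_LOCKTAG_OBJECT(tag, MyDatabaseId, RelationRelationId, 42, 0);
    (void) LockAcquire(&tag, ExclusiveLock, false, false);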

>
> > (b) try to acquire the
> > > relation extension lock after page lock and it should be allowed
> > > (after acquiring page lock, we take relation extension lock in
> > > following code path:
> > > ginInsertCleanup->ginEntryInsert->ginFindLeafPage->ginPlaceToPage->GinNewBuffer).

I have tested this part and it works as expected, i.e., the assert is not hit.

--test case
create table gin_test_tbl(i int4[]) with (autovacuum_enabled = off);
create index gin_test_idx on gin_test_tbl using gin (i);
insert into gin_test_tbl select array[1, 2, g] from generate_series(1, 2) g;
select gin_clean_pending_list('gin_test_idx');

BTW, this test is already covered by the existing gin.sql file so we
don't need to add any new test.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Refactor compile-time assertion checks for C/C++

2020-03-15 Thread Michael Paquier
On Fri, Mar 13, 2020 at 11:00:33AM -0400, Tom Lane wrote:
> Michael Paquier  writes:
>> Hmm.  v3 actually broke the C++ fallback of StaticAssertExpr() and
>> StaticAssertStmt() (v1 did not), a simple fix being something like
>> the attached.
> 
> The buildfarm seems happy, so why do you think it's broken?

Extensions like the attached don't appreciate it, and we have nothing
like that in core.  Perhaps we could, but it is not really appealing
to require CXX or such for all platforms willing to run the tests...

> If we do need to change it, I'd be inclined to just use the do{}
> block everywhere, not bothering with the extra #if test.

Not sure what you mean here because we cannot use the do{} flavor
either for the C fallback, no?  See for example the definitions of
unconstify() in c.h.
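
For context, a minimal sketch of the constraint (simplified, not the
exact c.h definitions): the expression form can be embedded in another
expression via the comma operator, while a do{} block is a statement
and cannot.

/* expression form: usable inside other expressions */
#define STATIC_ASSERT_EXPR(cond) \
    ((void) sizeof(struct { int static_assert_failure : (cond) ? 1 : -1; }))

/* statement form: NOT usable where an expression is required */
#define STATIC_ASSERT_STMT(cond) \
    do { STATIC_ASSERT_EXPR(cond); } while (0)

/* works: assertion combined with the value via the comma operator */
#define CHECKED_INT(x) (STATIC_ASSERT_EXPR(sizeof(x) == sizeof(int)), (x))

/* would not compile: a do{} statement cannot appear in an expression */
/* #define BROKEN_CHECKED_INT(x) (STATIC_ASSERT_STMT(...), (x)) */
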
--
Michael


blackhole_cplusplus.tar.gz
Description: application/gzip


signature.asc
Description: PGP signature


Re: Expose lock group leader pid in pg_stat_activity

2020-03-15 Thread Justin Pryzby
On Sun, Mar 15, 2020 at 11:27:52PM -0500, Justin Pryzby wrote:
> On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> > So, AFAICT the LockHashPartitionLockByProc is required when
> > iterating/modifying lockGroupMembers or lockGroupLink, but just
> > getting the leader pid should be safe.
> 
> This still seems unsafe:
> 
> git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c
> +   /*
> +    * If a PGPROC entry was retrieved, display wait events and lock
> +    * group leader information if any.  To avoid extra overhead, no
> +    * extra lock is being held, so there is no guarantee of
> +    * consistency across multiple rows.
> +    */
> ...
> +   PGPROC *leader;
> ...
> +   leader = proc->lockGroupLeader;
> +   if (leader)
> +   {
> # does something guarantee that leader doesn't change ?
> +   values[29] = Int32GetDatum(leader->pid);
> +   nulls[29] = false;
> }
> 
> Michael seems to have raised the issue:
> 
> On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> > And actually, the way you are looking at the leader's PID is visibly
> > incorrect and inconsistent because the patch takes no shared LWLock on
> > the leader using LockHashPartitionLockByProc() followed by
> > LWLockAcquire(), no?  That's incorrect because it could be perfectly
> > possible to crash with this code between the moment you check if 
> > lockGroupLeader is NULL and the moment you look at
> > lockGroupLeader->pid if a process is being stopped in-between and
> > removes itself from a lock group in ProcKill().  
> 
> But I don't see how it was addressed ?
> 
> I read this:
> 
> src/backend/storage/lmgr/lock.c: * completely valid.  We cannot 
> safely dereference another backend's
> src/backend/storage/lmgr/lock.c- * lockGroupLeader field without 
> holding all lock partition locks, and
> src/backend/storage/lmgr/lock.c- * it's not worth that.)
> 
> I think if you do:
> |LWLockAcquire(&proc->backendLock, LW_SHARED);
> ..then it's at least *safe* to access leader->pid, but it may be inconsistent
> unless you also call LockHashPartitionLockByProc.
> 
> I wasn't able to produce a crash, so maybe I missed something.

I think I see.  Julien's v3 patch did this:
https://www.postgresql.org/message-id/attachment/106429/pgsa_leader_pid-v3.diff
+   if (proc->lockGroupLeader)
+   values[29] = Int32GetDatum(proc->lockGroupLeader->pid);

..which is racy because a proc with a leader might die and be replaced by
another proc without a leader between the NULL check and the dereference.

But the code since v4 does:

+   leader = proc->lockGroupLeader;
+   if (leader)
+   values[29] = Int32GetDatum(leader->pid);

..which is safe because PROCs are allocated in shared memory, so leader is for
sure a non-NULL pointer to a PROC.  leader->pid may be read inconsistently,
which is what the comment says: "no extra lock is being held".

-- 
Justin




Re: Expose lock group leader pid in pg_stat_activity

2020-03-15 Thread Michael Paquier
On Mon, Mar 16, 2020 at 12:43:41AM -0500, Justin Pryzby wrote:
> I think I see.  Julien's v3 patch did this:
> https://www.postgresql.org/message-id/attachment/106429/pgsa_leader_pid-v3.diff
> + if (proc->lockGroupLeader)
> + values[29] = Int32GetDatum(proc->lockGroupLeader->pid);
> 
> ..which is racy because a proc with a leader might die and be replaced by
> another proc without a leader between 1 and 2.
> 
> But the code since v4 does:
> 
> + leader = proc->lockGroupLeader;
> + if (leader)
> +   values[29] = Int32GetDatum(leader->pid);
> 
> ..which is safe because PROCs are allocated in shared memory, so leader is for
> sure a non-NULL pointer to a PROC.  leader->pid may be read inconsistently,
> which is what the comment says: "no extra lock is being held".

Yes, you have the correct answer here.  As shaped, the code relies on
the state of a PGPROC entry in shared memory.
--
Michael


signature.asc
Description: PGP signature


Re: backup manifests

2020-03-15 Thread Suraj Kharage
One more suggestion: a recent commit (1933ae62) has added the PostgreSQL
home page to the --help output.

e.g.:
PostgreSQL home page: <https://www.postgresql.org/>

We might need to consider this change for the pg_validatebackup binary.

On Mon, Mar 16, 2020 at 10:37 AM Suraj Kharage <
suraj.khar...@enterprisedb.com> wrote:

> Thank you, Robert.
>
> Getting below warning while compiling the
> v11-0003-pg_validatebackup-Validate-a-backup-against-the-.patch.
>
>
>
> *pg_validatebackup.c: In function
> ‘report_manifest_error’:pg_validatebackup.c:356:2: warning: function might
> be possible candidate for ‘gnu_printf’ format attribute
> [-Wsuggest-attribute=format]  pg_log_generic_v(PG_LOG_FATAL, fmt, ap);*
>
>
> To resolve this, can we use "pg_attribute_printf(2, 3)" in function
> declaration something like below?
> e.g:
>
> diff --git a/src/bin/pg_validatebackup/parse_manifest.h
> b/src/bin/pg_validatebackup/parse_manifest.h
> index b0b18a5..25d140f 100644
> --- a/src/bin/pg_validatebackup/parse_manifest.h
> +++ b/src/bin/pg_validatebackup/parse_manifest.h
> @@ -25,7 +25,7 @@ typedef void
> (*json_manifest_perfile_callback)(JsonManifestParseContext *,
>  size_t
> size, pg_checksum_type checksum_type,
>  int
> checksum_length, uint8 *checksum_payload);
>  typedef void (*json_manifest_error_callback)(JsonManifestParseContext *,
> -char
> *fmt, ...);
> +char
> *fmt,...) pg_attribute_printf(2, 3);
>
>  struct JsonManifestParseContext
>  {
> diff --git a/src/bin/pg_validatebackup/pg_validatebackup.c
> b/src/bin/pg_validatebackup/pg_validatebackup.c
> index 0e7299b..6ccbe59 100644
> --- a/src/bin/pg_validatebackup/pg_validatebackup.c
> +++ b/src/bin/pg_validatebackup/pg_validatebackup.c
> @@ -95,7 +95,7 @@ static void
> record_manifest_details_for_file(JsonManifestParseContext *context,
>
>int checksum_length,
>
>uint8 *checksum_payload);
>  static void report_manifest_error(JsonManifestParseContext *context,
> - char
> *fmt, ...);
> + char
> *fmt,...) pg_attribute_printf(2, 3);
>
>  static void validate_backup_directory(validator_context *context,
>
> char *relpath, char *fullpath);
>
>
> Typos:
>
> 0004 patch
> unexpctedly => unexpectedly
>
> 0005 patch
> bacup => backup
>
> On Sat, Mar 14, 2020 at 2:04 AM Robert Haas  wrote:
>
>> On Thu, Mar 12, 2020 at 10:47 AM tushar 
>> wrote:
>> > On 3/9/20 10:46 PM, Robert Haas wrote:
>> > > Seems like expected behavior to me. We could consider providing a more
>> > > descriptive error message, but there's now way for it to work.
>> >
>> > Right, the error message needs to be more user-friendly.
>>
>> OK. Done in the attached version, which also includes a few other changes:
>>
>> - I expanded the regression tests. They now cover every line of code
>> in parse_manifest.c except for a few that I believe to be unreachable
>> (though I might be mistaken). Coverage for pg_validatebackup.c is also
>> improved, but it's not 100%; there are some cases that I don't know
>> how to hit outside of a kernel malfunction, and others that I only
>> know how to hit on non-Windows systems. For instance, it's easy to use
>> perl to make a file inaccessible on Linux with chmod(0, $filename),
>> but I gather that doesn't work on Windows. I'm going to spend a bit
>> more time looking at this, but I think it's already reasonably good.
>>
>> - I fixed a couple of very minor bugs which I discovered by writing those
>> tests.
>>
>> - I added documentation, in part based on a draft Mark Dilger shared
>> with me off-list.
>>
>> I don't think this is committable just yet, but I think it's getting
>> fairly close, so if anyone has major objections please speak up soon.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>
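
For anyone wondering why the attribute is (2, 3): pg_attribute_printf(F, A)
marks parameter F as the printf-style format string and parameter A as the
position of the first variadic argument.  On GCC-compatible compilers, c.h
defines it roughly as below (an approximate sketch, not the exact header
text):

/* F = 1-based position of the format string, A = first vararg. */
#define pg_attribute_printf(f, a) \
	__attribute__((format(PG_PRINTF_ATTRIBUTE, f, a)))

For a callback typed as (JsonManifestParseContext *, char *fmt, ...), fmt
is parameter 2 and the varargs begin at parameter 3, hence (2, 3) -- which
is exactly what -Wsuggest-attribute=format is asking for above.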


-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


Re: WIP/PoC for parallel backup

2020-03-15 Thread Rajkumar Raghuwanshi
Thanks for the patches.

I have verified the reported issues with the new patches; they are fixed now.

I got another observation: if a new slot name is given without the -C
option, it leads to an error that looks like a server crash.

[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
test_bkp_slot
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
pg_basebackup: error: could not list backup files: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing data directory "/tmp/bkp"

Thanks & Regards,
Rajkumar Raghuwanshi


On Fri, Mar 13, 2020 at 9:51 PM Asif Rehman  wrote:

>
> On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <
> rajkumar.raghuwan...@enterprisedb.com> wrote:
>
>> Hi Asif
>>
>> I have started testing this feature. I have applied v6 patch on commit
>> a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
>> I got few observations, please take a look.
>>
>> --if backup fails, the backup directory is not removed.
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D
>> /tmp/test_bkp/bkp6
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D
>> /tmp/test_bkp/bkp6
>> pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not
>> empty
>>
>>
>> --giving a large number of jobs leads to a segmentation fault.
>> ./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> .
>> .
>> .
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> pg_basebackup: error: could not connect to server: FATAL:  number of
>> requested standby connections exceeds max_wal_senders (currently 10)
>> pg_basebackup: error: could not connect to server: could not fork new
>> process for connection: Resource temporarily unavailable
>>
>> could not fork new process for connection: Resource temporarily
>> unavailable
>> pg_basebackup: error: failed to create thread: Resource temporarily
>> unavailable
>> Segmentation fault (core dumped)
>>
>> --stack-trace
>> gdb -q -c core.11824 pg_basebackup
>> Loaded symbols for /lib64/libnss_files.so.2
>> Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D
>> /tmp/test_bkp/bkp10'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
>> pthread_join.c:46
>> 46  if (INVALID_NOT_TERMINATED_TD_P (pd))
>> Missing separate debuginfos, use: debuginfo-install
>> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
>> libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
>> openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
>> (gdb) bt
>> #0  pthread_join (threadid=140503120623360, thread_return=0x0) at
>> pthread_join.c:46
>> #1  0x00408e21 in cleanup_workers () at pg_basebackup.c:2840
>> #2  0x00403846 in disconnect_atexit () at pg_basebackup.c:316
>> #3  0x003921235a02 in __run_exit_handlers (status=1) at exit.c:78
>> #4  exit (status=1) at exit.c:100
>> #5  0x00408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0)
>> at pg_basebackup.c:2713
>> #6  0x00407946 in BaseBackup () at pg_basebackup.c:2127
>> #7  0x0040895c in main (argc=6, argv=0x7ffd566f4718) at
>> pg_basebackup.c:2668
>>
>>
>> --with a tablespace in the same directory as data, parallel_backup
>> crashed
>> [edb@localhost bin]$ ./initdb -D /tmp/data
>> [edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
>> [edb@localhost bin]$ mkdir /tmp/ts
>> [edb@localhost bin]$ ./psql postgres
>> psql (13devel)
>> Type "help" for help.
>>
>> postgres=# create tablespace ts location '/tmp/ts';
>> CREATE TABLESPACE
>> postgres=# create table tx (a int) tablespace ts;
>> CREATE TABLE
>> postgres=# \q
>> [edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
>> Segmentation fault (core dumped)
>>
>> --stack-trace
>> [edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
>> Loaded symbols for /lib64/libnss_files.so.2
>> Core was generated 
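
The first stack trace above (cleanup_workers() reached via
disconnect_atexit() after create_parallel_workers() failed) suggests
pthread_join() is being called on worker slots whose threads were never
created.  A defensive sketch of the idea follows, with hypothetical
bookkeeping structures since the patch's actual ones aren't shown here:

#include <pthread.h>
#include <stdbool.h>

typedef struct
{
	pthread_t	thread;
	bool		started;	/* set only after pthread_create() succeeds */
} WorkerSlot;

static WorkerSlot *workers;	/* allocated in create_parallel_workers() */
static int	numworkers;

static void
cleanup_workers(void)
{
	int			i;

	for (i = 0; i < numworkers; i++)
	{
		if (workers[i].started)
		{
			/* join only threads that actually exist */
			pthread_join(workers[i].thread, NULL);
			workers[i].started = false;
		}
	}
}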

Re: error context for vacuum to include block number

2020-03-15 Thread Amit Kapila
On Thu, Mar 5, 2020 at 3:22 AM Justin Pryzby  wrote:
>
> On Wed, Mar 04, 2020 at 04:21:06PM +0900, Masahiko Sawada wrote:
> > Thank you for updating the patch. But we have two more places where we
> > do fsm vacuum.
>
> Oops, thanks.
>
> I realized that vacuum_page is called not only from lazy_vacuum_heap, but also
> directly from lazy_scan_heap, which failed to update errcbarg.  So I changed
> to update errcbarg in vacuum_page.
>
> What about these other calls?  I think the granularity of individual function
> calls requires a debugger, but is it fine if errors here are attributed
> to (say) "scanning heap"?
>
> GetRecordedFreeSpace
> heap_*_freeze_tuple
> heap_page_prune
> HeapTupleSatisfiesVacuum
> LockBufferForCleanup
> MarkBufferDirty
> Page*AllVisible
> PageGetHeapFreeSpace
> RecordPageWithFreeSpace
> visibilitymap_*
> VM_ALL_FROZEN
>

I think we can keep granularity the same as we have for progress
update functionality which means "scanning heap" is fine.  On similar
lines, it is not clear whether it is a good idea to keep a phase like
VACUUM_ERRCB_PHASE_VACUUM_FSM as it has added additional updates in
multiple places in the code.

Few other comments:
1.
+ /* Init vacrelstats for use as error callback by parallel worker: */
+ vacrelstats.relnamespace = get_namespace_name(RelationGetNamespace(onerel));

It looks a bit odd that the comment ends with a colon (:); is
there any reason for that?

2.
+ /* Setup error traceback support for ereport() */
+ update_vacuum_error_cbarg(vacrelstats, VACUUM_ERRCB_PHASE_SCAN_HEAP,
+ InvalidBlockNumber, NULL);
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
..
..
+ /* Init vacrelstats for use as error callback by parallel worker: */
+ vacrelstats.relnamespace = get_namespace_name(RelationGetNamespace(onerel));
+ vacrelstats.relname = pstrdup(RelationGetRelationName(onerel));
+ vacrelstats.indname = NULL;
+ vacrelstats.phase = VACUUM_ERRCB_PHASE_UNKNOWN; /* Not yet processing */
+
+ /* Setup error traceback support for ereport() */
+ errcallback.callback = vacuum_error_callback;
+ errcallback.arg = &vacrelstats;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+

I think the code can be simplified a bit if we have a function
setup_vacuum_error_ctx that takes the necessary parameters, fills the
required vacrelstats fields, and sets up errcallback (see the sketch
below).  Then we can use update_vacuum_error_cbarg at the required places.
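
Something along these lines, stitched together from the two snippets
quoted above (the signature and struct-member names are assumptions based
on that code, not a committed API):

/*
 * Hypothetical helper: initialize the fields vacuum_error_callback reads
 * and push the error-context frame.  Callers would later refine the
 * phase/block via update_vacuum_error_cbarg().
 */
static void
setup_vacuum_error_ctx(LVRelStats *vacrelstats, Relation onerel,
					   ErrorContextCallback *errcallback)
{
	vacrelstats->relnamespace =
		get_namespace_name(RelationGetNamespace(onerel));
	vacrelstats->relname = pstrdup(RelationGetRelationName(onerel));
	vacrelstats->indname = NULL;
	vacrelstats->phase = VACUUM_ERRCB_PHASE_UNKNOWN;	/* not yet processing */

	errcallback->callback = vacuum_error_callback;
	errcallback->arg = vacrelstats;
	errcallback->previous = error_context_stack;
	error_context_stack = errcallback;
}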

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: WIP/PoC for parallel backup

2020-03-15 Thread Asif Rehman
On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Thanks for the patches.
>
> I have verified the reported issues with the new patches; they are fixed now.
>
> I got another observation: if a new slot name is given without the -C
> option, it leads to an error that looks like a server crash.
>
> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
> test_bkp_slot
> pg_basebackup: error: could not send replication command
> "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
> pg_basebackup: error: could not list backup files: server closed the
> connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_basebackup: removing data directory "/tmp/bkp"
>

It seems to be expected behavior. The START_BACKUP command has already been
executed, and pg_basebackup then tries to start a WAL streaming process
with a non-existent slot, which results in an error. So the backup is
aborted while terminating all other processes.


--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: [Proposal] Global temporary tables

2020-03-15 Thread Prabhat Sahu
Hi Wenjing,
Please check the below scenario, where a foreign table on a GTT is not
showing records.

postgres=# create extension postgres_fdw;
CREATE EXTENSION
postgres=# do $d$
begin
execute $$create server fdw foreign data wrapper postgres_fdw
options (host 'localhost',dbname 'postgres',port
'$$||current_setting('port')||$$')$$;
end;
$d$;
DO
postgres=# create user mapping for public server fdw;
CREATE USER MAPPING

postgres=# create table lt1 (c1 integer, c2 varchar(50));
CREATE TABLE
postgres=# insert into lt1 values (1,'c21');
INSERT 0 1
postgres=# create foreign table ft1 (c1 integer, c2 varchar(50)) server fdw
options (table_name 'lt1');
CREATE FOREIGN TABLE
postgres=# select * from ft1;
 c1 | c2
----+-----
  1 | c21
(1 row)

postgres=# create global temporary table gtt1 (c1 integer, c2 varchar(50));
CREATE TABLE
postgres=# insert into gtt1 values (1,'gtt_c21');
INSERT 0 1
postgres=# create foreign table f_gtt1 (c1 integer, c2 varchar(50)) server
fdw options (table_name 'gtt1');
CREATE FOREIGN TABLE

postgres=# select * from gtt1;
 c1 |   c2
----+---------
  1 | gtt_c21
(1 row)

postgres=# select * from f_gtt1;
 c1 | c2
----+----
(0 rows)

-- 

With Regards,
Prabhat Kumar Sahu
EnterpriseDB: http://www.enterprisedb.com


Re: WIP/PoC for parallel backup

2020-03-15 Thread Rajkumar Raghuwanshi
On Mon, Mar 16, 2020 at 11:52 AM Asif Rehman  wrote:

>
>
> On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <
> rajkumar.raghuwan...@enterprisedb.com> wrote:
>
>> Thanks for the patches.
>>
>> I have verified the reported issues with the new patches; they are fixed now.
>>
>> I got another observation: if a new slot name is given without the -C
>> option, it leads to an error that looks like a server crash.
>>
>> [edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot
>> test_bkp_slot
>> pg_basebackup: error: could not send replication command
>> "START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
>> pg_basebackup: error: could not list backup files: server closed the
>> connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> pg_basebackup: removing data directory "/tmp/bkp"
>>
>
> It seems to be expected behavior. The START_BACKUP command has already
> been executed, and pg_basebackup then tries to start a WAL streaming
> process with a non-existent slot, which results in an error. So the
> backup is aborted while terminating all other processes.
>
I think the error message can be improved; the current message makes it
look like the database server crashed.

On unpatched PG the same scenario exists, but it fails with exit code 1:
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp --slot
test_bkp_slot
pg_basebackup: error: could not send replication command
"START_REPLICATION": ERROR:  replication slot "test_bkp_slot" does not exist
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bkp"


>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>


Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2020-03-15 Thread Kuntal Ghosh
On Mon, Mar 16, 2020 at 9:43 AM Dilip Kumar  wrote:
> On Mon, Mar 16, 2020 at 8:57 AM Masahiko Sawada
>  wrote:
> > IsRelationExtensionLockHeld and IsPageLockHeld are used only when
> > assertions are enabled. So how about making CheckAndSetLockHeld work
> > only if USE_ASSERT_CHECKING to avoid the overhead?
>
> That makes sense to me so updated the patch.
+1
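
For reference, a minimal sketch of the USE_ASSERT_CHECKING guard being
agreed on here (the flag names come from the message above; the LOCALLOCK
field access is an assumption about the patch's shape, not its verbatim
code):

static inline void
CheckAndSetLockHeld(LOCALLOCK *locallock, bool acquired)
{
#ifdef USE_ASSERT_CHECKING
	LockTagType tagtype = (LockTagType) locallock->tag.lock.locktag_type;

	/* Track only the lock types the deadlock-avoidance asserts care about. */
	if (tagtype == LOCKTAG_RELATION_EXTEND)
		IsRelationExtensionLockHeld = acquired;
	else if (tagtype == LOCKTAG_PAGE)
		IsPageLockHeld = acquired;
#endif						/* with assertions off, this compiles to nothing */
}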

In v10-0001-Assert-that-we-don-t-acquire-a-heavyweight-lock-.patch,

+ * Indicate that the lock is released for a particular type of locks.
s/lock is/locks are

+ /* Indicate that the lock is acquired for a certain type of locks. */
s/lock is/locks are

In v10-0002-*.patch,

+ * Flag to indicate if the page lock is held by this backend.  We don't
+ * acquire any other heavyweight lock while holding the page lock except for
+ * relation extension.  However, these locks are never taken in reverse order
+ * which implies that page locks will also never participate in the deadlock
+ * cycle.
s/while holding the page lock except for relation extension/while
holding the page lock except for relation extension and page lock

+ * We don't acquire any other heavyweight lock while holding the page lock
+ * except for relation extension lock.
Same as above
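
Those comments presumably sit next to assertions of roughly this shape in
the lock-acquisition path (a sketch assuming the flags from the 0001 patch;
AssertPageLockOrdering is a made-up wrapper name, not the patch's code):

/* Hypothetical wrapper around the ordering rules described above. */
static void
AssertPageLockOrdering(const LOCKTAG *locktag)
{
	/*
	 * No other heavyweight lock may be taken while holding a relation
	 * extension lock; while holding a page lock, only a relation
	 * extension lock may follow.  Since that order is never reversed,
	 * neither lock type can complete a deadlock cycle.
	 */
	Assert(!IsRelationExtensionLockHeld);
	Assert(!IsPageLockHeld ||
		   locktag->locktag_type == LOCKTAG_RELATION_EXTEND);
}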

Other than that, the patches look good to me. I've also done some
testing after applying the Test-group-deadlock patch provided by Amit
earlier in the thread. It works as expected.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com