Re: RFC: Logging plan of the running query

2022-03-09 Thread torikoshia

On 2022-02-08 01:13, Fujii Masao wrote:

AbortSubTransaction() should reset ActiveQueryDesc to
save_ActiveQueryDesc that ExecutorRun() set, instead of NULL?
Otherwise ActiveQueryDesc of top-level statement will be unavailable
after subtransaction is aborted in the nested statements.


I once agreed with the above suggestion and made a v20 patch making
save_ActiveQueryDesc a global variable, but it caused a segfault when
calling pg_log_query_plan() after FreeQueryDesc().


OTOH, doing some kind of reset of ActiveQueryDesc seems necessary,
since it also caused a segfault when running pg_log_query_plan()
during installcheck.


There may be a better way, but resetting ActiveQueryDesc to NULL seems
safe and simple.
Of course, it makes pg_log_query_plan() useless after a subtransaction
is aborted.
However, if people rarely need to know the plan of a running query
whose subtransaction has been aborted, resetting ActiveQueryDesc to
NULL would be acceptable.
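For illustration, the save/restore pattern under discussion can be sketched as a minimal stand-alone model. QueryDesc, executor_start/executor_end, and abort_subtransaction below are simplified stand-ins for the real PostgreSQL structures and functions, not the patch itself:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real QueryDesc. */
typedef struct QueryDesc { const char *source_text; } QueryDesc;

static QueryDesc *ActiveQueryDesc = NULL;

/* ExecutorRun-style entry: save the previous value and expose ours. */
static QueryDesc *executor_start(QueryDesc *qd)
{
    QueryDesc *save_ActiveQueryDesc = ActiveQueryDesc;

    ActiveQueryDesc = qd;
    return save_ActiveQueryDesc;
}

/* ExecutorRun-style exit: restore what was saved. */
static void executor_end(QueryDesc *save_ActiveQueryDesc)
{
    ActiveQueryDesc = save_ActiveQueryDesc;
}

/* AbortSubTransaction-style reset: NULL is the simple, safe choice,
 * at the cost of losing the top-level plan after a subxact abort. */
static void abort_subtransaction(void)
{
    ActiveQueryDesc = NULL;
}
```

The model shows the trade-off: after abort_subtransaction(), the outer query's descriptor is gone until the next executor entry, which is exactly the limitation accepted above.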


Attached is a patch that sets ActiveQueryDesc to NULL when a 
subtransaction is aborted.


What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 5be784278e8e7aeeeadf60a772afccda7b59e6e4 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 9 Mar 2022 18:18:06 +0900
Subject: [PATCH v21] Add function to log the plan of the query currently
 running on the backend.

Currently, we have to wait for the query execution to finish
before checking its plan. This is not so convenient when
investigating long-running queries in production environments
where we cannot use debuggers.
To improve this situation, this patch adds a
pg_log_query_plan() function that requests logging the
plan of the specified backend process.

By default, only superusers are allowed to request logging of
plans, because allowing any user to issue such requests at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.
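The request/response mechanism the commit message describes follows PostgreSQL's usual interrupt-flag pattern: a signal handler only records that work is pending, and the work itself happens at the next safe check point. A minimal stand-alone sketch of that pattern (the names are illustrative, not the patch's):

```c
#include <assert.h>
#include <signal.h>

/* The handler only sets a flag (async-signal-safe); the actual work
 * happens later, at an explicit check point, which is what
 * CHECK_FOR_INTERRUPTS() does in PostgreSQL. */
static volatile sig_atomic_t LogQueryPlanPending = 0;
static int plans_logged = 0;

static void handle_log_plan_signal(int signo)
{
    (void) signo;
    LogQueryPlanPending = 1;
}

static void check_for_interrupts(void)
{
    if (LogQueryPlanPending)
    {
        LogQueryPlanPending = 0;
        plans_logged++;     /* stands in for logging the current plan */
    }
}
```

This is why the plan appears only when the target backend reaches its next check point, not at the instant the signal arrives.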

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar,
Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro
Horiguchi, Robert Treat
---
 doc/src/sgml/func.sgml   |  49 +++
 src/backend/access/transam/xact.c|  13 ++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 140 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/tcop/pquery.h|   2 +
 src/test/regress/expected/misc_functions.out |  54 +--
 src/test/regress/sql/misc_functions.sql  |  41 --
 17 files changed, 314 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8a802fb225..075056 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25461,6 +25461,25 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
backend with the specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+
+  
+
   

 
@@ -25574,6 +25593,36 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  query plan running on backend with PID 201116 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that when statements are executed inside a function, only the
+plan of the most deeply nested query is logged.

Re: Is it useful to record whether plans are generic or custom?

2020-12-03 Thread torikoshia

On 2020-12-04 14:29, Fujii Masao wrote:

On 2020/11/30 15:24, Tatsuro Yamada wrote:

Hi Torikoshi-san,



In this patch, exposing new columns is mandatory, but I think
it's better to make it optional by adding a GUC something
like 'pgss.track_general_custom_plans.

I also feel it makes the number of columns too many.
Just adding the total time may be sufficient.



I think this feature is useful for DBA. So I hope that it gets
committed to PG14. IMHO, many columns are Okay because DBA can
select specific columns by their query.
Therefore, it would be better to go with the current design.


But that design may waste lots of memory. No? For example, when
plan_cache_mode=force_custom_plan, the memory used for the columns
for generic plans is not used.



Yeah.

ISTM now that creating pg_stat_statements_xxx views
for both generic and custom plans is better than my PoC patch.

And I'm also struggling with the following.

| However, I also began to wonder how effective it would be to just
| distinguish between generic and custom plans.  Custom plans can
| include all sorts of plans, and considering cache invalidation,
| generic plans can also include various plans.

| Considering this, I'm starting to feel that it would be better to
| keep not just whether the plan is generic or custom but the plan
| itself, as discussed in the thread below.


Yamada-san,

Do you think it's effective just to distinguish between generic
and custom plans?

Regards,




Re: Get memory contexts of an arbitrary backend process

2020-12-04 Thread torikoshia

On 2020-12-03 10:36, Tom Lane wrote:

Fujii Masao  writes:
I'm starting to study how this feature behaves. At first, when I
executed the following query, the function never returned. ISTM that
since the autovacuum launcher cannot respond to the request of a
memory contexts dump, the function keeps waiting infinitely. Is this
a bug? Probably we should exclude non-backend processes from the
target processes to dump? Sorry if this was already discussed.



 SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;


Thanks for trying it!

It was not discussed explicitly, and I was going to do it later
as commented.


+   /* TODO: Check also whether backend or not. */




FWIW, I think this patch is fundamentally unsafe.  It's got a
lot of the same problems that I complained about w.r.t. the
nearby proposal to allow cross-backend stack trace dumping.
It does avoid the trap of thinking that it can do work in
a signal handler, but instead it supposes that it can do
work involving very high-level objects such as shared hash tables
in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
never going to be safe: the only real expectation the system
has is that CHECK_FOR_INTERRUPTS is called at places where our
state is sane enough that a transaction abort can clean up.
Trying to do things like taking LWLocks is going to lead to
deadlocks or worse.  We need not even get into the hard questions,
such as what happens when one process or the other exits
unexpectedly.


Thanks for reviewing!

I may be misunderstanding something, but the dumper works not at
CHECK_FOR_INTERRUPTS but during the client read, i.e.,
ProcessClientReadInterrupt().

Is it also unsafe?


BTW, since there was a comment that the shared hash table
used too much memory, I'm now rewriting this patch not to use
the shared hash table but a simpler static shared memory struct.


I also find the idea that this should be the same SQL function
as pg_get_backend_memory_contexts to be a seriously bad decision.
That means that it's not possible to GRANT the right to examine
only your own process's memory --- with this proposal, that means
granting the right to inspect every other process as well.

Beyond that, the fact that there's no way to restrict the capability
to just, say, other processes owned by the same user means that
it's not really safe to GRANT to non-superusers anyway.  Even with
such a restriction added, things are problematic, since for example
it would be possible to inquire into the workings of a
security-definer function executing in another process that
nominally is owned by your user.


I'm going to change the function name and restrict execution to
superusers. Is that enough?


Regards,




Re: Get memory contexts of an arbitrary backend process

2020-12-09 Thread torikoshia

On 2020-12-04 19:16, torikoshia wrote:

On 2020-12-03 10:36, Tom Lane wrote:

Fujii Masao  writes:
I'm starting to study how this feature behaves. At first, when I
executed the following query, the function never returned. ISTM that
since the autovacuum launcher cannot respond to the request of a
memory contexts dump, the function keeps waiting infinitely. Is this
a bug? Probably we should exclude non-backend processes from the
target processes to dump? Sorry if this was already discussed.


 SELECT pg_get_backend_memory_contexts(pid) FROM pg_stat_activity;


Thanks for trying it!

It was not discussed explicitly, and I was going to do it later
as commented.


+   /* TODO: Check also whether backend or not. */




FWIW, I think this patch is fundamentally unsafe.  It's got a
lot of the same problems that I complained about w.r.t. the
nearby proposal to allow cross-backend stack trace dumping.
It does avoid the trap of thinking that it can do work in
a signal handler, but instead it supposes that it can do
work involving very high-level objects such as shared hash tables
in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
never going to be safe: the only real expectation the system
has is that CHECK_FOR_INTERRUPTS is called at places where our
state is sane enough that a transaction abort can clean up.
Trying to do things like taking LWLocks is going to lead to
deadlocks or worse.  We need not even get into the hard questions,
such as what happens when one process or the other exits
unexpectedly.


Thanks for reviewing!

I may be misunderstanding something, but the dumper works not at
CHECK_FOR_INTERRUPTS but during the client read, i.e.,
ProcessClientReadInterrupt().

Is it also unsafe?


BTW, since there was a comment that the shared hash table
used too much memory, I'm now rewriting this patch not to use
the shared hash table but a simpler static shared memory struct.


Attached a rewritten patch.

Accordingly, I also slightly modified the basic design as below.

---
# Communication flow between the dumper and the requester
- (1) When requesting a memory context dump, the requestor changes
the state in the shared memory struct from 'ACCEPTABLE' to
'REQUESTING'.
- (2) The requestor sends the signal to the dumper process and waits
on the latch.
- (3) When the dumper notices the signal, it changes the state to
'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to
'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared
memory state to 'ACCEPTABLE'.

# Query cancellation
- When the requestor cancels dumping, e.g. by signaling with ctrl-C,
the requestor changes the state of the shared memory to 'CANCELING'.
- The dumper checks the state when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it initializes the
dump file and resets the shared memory state to 'ACCEPTABLE'.

# Cleanup dump file and the shared memory
- In the normal case, the requestor removes the dump file and resets
the shared memory entry as described in (5).
- When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words,
while the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3)
and before (4), in other words, while the state is 'DUMPING', the
dumper does the cleanup. Specifically, if the requestor cancels the
query, it just changes the state to 'CANCELING', and the dumper
notices it and cleans things up later.
OTOH, when the dumper fails to dump, it cleans up the dump file and
resets the shared memory state.
- When something happens on the requestor after (4), i.e., while the
state is 'DONE', the requestor does the cleanup.
- In the case of receiving SIGKILL or a power failure, all dump files
are removed during crash recovery.
---
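The five states and the cleanup rules above can be encoded compactly. The sketch below uses the state names from the mail, but the code is illustrative only, not the patch; it captures which side owns cleanup in each state:

```c
#include <assert.h>

/* States from the protocol described above. */
typedef enum
{
    ACCEPTABLE,     /* idle; a new request may be made             */
    REQUESTING,     /* state (1): request made, dumper not started */
    DUMPING,        /* state (3): dumper is writing the dump file  */
    DONE,           /* state (4): dump complete                    */
    CANCELING       /* requestor asked to cancel                   */
} McxtDumpStatus;

typedef enum { BY_REQUESTOR, BY_DUMPER } CleanupOwner;

/* Which side cleans up the dump file if something goes wrong. */
static CleanupOwner cleanup_owner(McxtDumpStatus s)
{
    switch (s)
    {
        case REQUESTING:    /* dumper died before noticing the signal */
        case DONE:          /* requestor owns the file from here on   */
        case ACCEPTABLE:    /* nothing to clean; requestor resets     */
            return BY_REQUESTOR;
        case DUMPING:       /* dumper handles its own failures        */
        case CANCELING:     /* dumper sees the flag and cleans up     */
            return BY_DUMPER;
    }
    return BY_REQUESTOR;    /* not reached */
}
```

Making the ownership rule a pure function of the state is what lets either side crash at any point with exactly one party responsible for cleanup.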





I also find the idea that this should be the same SQL function
as pg_get_backend_memory_contexts to be a seriously bad decision.
That means that it's not possible to GRANT the right to examine
only your own process's memory --- with this proposal, that means
granting the right to inspect every other process as well.

Beyond that, the fact that there's no way to restrict the capability
to just, say, other processes owned by the same user means that
it's not really safe to GRANT to non-superusers anyway.  Even with
such a restriction added, things are problematic, since for example
it would be possible to inquire into the workings of a
security-definer function executing in another process that
nominally is owned by your user.


I'm going to change the function name and restrict execution to
superusers. Is that enough?


In the attached patch, I changed the fu

adding wait_start column to pg_locks

2020-12-14 Thread torikoshia

Hi,

When examining the duration of locks, we often join pg_locks with
pg_stat_activity and use columns such as query_start or
state_change.

However, since these columns are the moment when queries started or
their state changed, we cannot get the exact lock duration this way.

So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting.

One problem with this idea would be the performance impact of
calling gettimeofday repeatedly.
To avoid it, I reused the result of the gettimeofday call made for
the deadlock_timeout timer start, as suggested in the previous
discussion[1].
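A rough sketch of that reuse (illustrative names; the actual patch reads the value via get_timeout_start_time(DEADLOCK_TIMEOUT)): the clock is read once when the deadlock timer is armed, and wait_start simply reuses that value instead of making a second gettimeofday() call:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

static struct timeval deadlock_timer_start;

/* Arm the deadlock_timeout timer; this is the single clock read. */
static void enable_deadlock_timeout(void)
{
    gettimeofday(&deadlock_timer_start, NULL);
    /* ... actually arming the timer is omitted in this sketch ... */
}

/* wait_start reuses the already-captured value: no extra syscall. */
static struct timeval get_timeout_start_time(void)
{
    return deadlock_timer_start;
}
```

The recorded time can lag the true start of the wait by a tiny amount, but it avoids adding a clock read to the lock-wait path.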

Attached a patch.

BTW, in this patch, wait_start is set to zero for fast path locks
because it seems such locks will never be waited for.
If my understanding is wrong, I would appreciate it if someone could
point it out.


Any thoughts?


[1] 
https://www.postgresql.org/message-id/28804.1407907184%40sss.pgh.pa.us


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 1a6a7377877cc52e4b87a05bbb8ffae92cdb91ab Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 15 Dec 2020 10:55:32 +0900
Subject: [PATCH v1] Add wait_start field into pg_locks.

To examine the duration of locks, we had to join pg_locks with
pg_stat_activity and use columns such as query_start or state_change.
However, since they are the moment when queries started or their
state changed, we could not get the exact lock duration this way.

This patch adds a new field preserving the time at which locks
started waiting.
---
 doc/src/sgml/catalogs.sgml  |  9 +
 src/backend/storage/lmgr/lock.c | 10 ++
 src/backend/storage/lmgr/proc.c |  2 ++
 src/backend/utils/adt/lockfuncs.c   |  9 -
 src/include/catalog/pg_proc.dat |  6 +++---
 src/include/storage/lock.h  |  3 +++
 src/test/regress/expected/rules.out |  5 +++--
 7 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 62711ee83f..19af0e9af4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10567,6 +10567,15 @@ SCRAM-SHA-256$:&l
lock table
   
  
+
+ 
+  
+   wait_start timestamptz
+  
+  
+   The time at which the lock started waiting
+  
+ 
 

   
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index d86566f455..7b30508f95 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1196,6 +1196,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 		lock->waitMask = 0;
 		SHMQueueInit(&(lock->procLocks));
 		ProcQueueInit(&(lock->waitProcs));
+		lock->wait_start = 0;
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3628,6 +3629,12 @@ GetLockStatusData(void)
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
 
+			/*
+			 * Successfully taking fast path lock means there were no
+			 * conflicting locks.
+			 */
+			instance->wait_start = 0;
+
 			el++;
 		}
 
@@ -3655,6 +3662,7 @@ GetLockStatusData(void)
 			instance->pid = proc->pid;
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
+			instance->wait_start = 0;
 
 			el++;
 		}
@@ -3707,6 +3715,7 @@ GetLockStatusData(void)
 		instance->pid = proc->pid;
 		instance->leaderPid = proclock->groupLeader->pid;
 		instance->fastpath = false;
+		instance->wait_start = lock->wait_start;
 
 		el++;
 	}
@@ -4184,6 +4193,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 		lock->waitMask = 0;
 		SHMQueueInit(&(lock->procLocks));
 		ProcQueueInit(&(lock->waitProcs));
+		lock->wait_start = 0;
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7dc3911590..f3702cc681 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		}
 		else
 			enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+		lock->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
 	}
 
 	/*
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index f592292d06..5ee0953305 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
 } PG_Lock_Status;
 
 /* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS		15
+#define NUM_LOCK_STATUS_COLUMNS		16
 
 /*
  * VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
 		   BOOLOID, -1, 0);
 		TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
 		   BOOLOID, -1, 0);
+		TupleDescInitEntry(tupdesc, (AttrNumber) 16, "wait_start",
+		   TIMESTAMPTZOID, -1, 0);
 
 		funcctx->t

Re: Get memory contexts of an arbitrary backend process

2021-01-03 Thread torikoshia
On Fri, Dec 25, 2020 at 6:08 PM Kasahara Tatsuhito wrote:


Thanks for reviewing and kind suggestion!


Attached a rewritten patch.

Thanks for updating patch.

But when I applied the patch to the current HEAD and did make, I
got an error due to duplicate OIDs.
You need to rebase the patch.


Assigned another OID.


Accordingly, I also slightly modified the basic design as below.

---
# Communication flow between the dumper and the requester
- (1) When requesting a memory context dump, the requestor changes
the state in the shared memory struct from 'ACCEPTABLE' to
'REQUESTING'.
- (2) The requestor sends the signal to the dumper process and waits
on the latch.
- (3) When the dumper notices the signal, it changes the state to
'DUMPING'.
- (4) When the dumper completes dumping, it changes the state to
'DONE' and sets the latch.
- (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared
memory state to 'ACCEPTABLE'.

# Query cancellation
- When the requestor cancels dumping, e.g. by signaling with ctrl-C,
the requestor changes the state of the shared memory to 'CANCELING'.
- The dumper checks the state when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it initializes the
dump file and resets the shared memory state to 'ACCEPTABLE'.

# Cleanup dump file and the shared memory
- In the normal case, the requestor removes the dump file and resets
the shared memory entry as described in (5).
- When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words,
while the state is 'REQUESTING', the requestor does the cleanup.
- When something happens on the dumper or the requestor after (3)
and before (4), in other words, while the state is 'DUMPING', the
dumper does the cleanup. Specifically, if the requestor cancels the
query, it just changes the state to 'CANCELING', and the dumper
notices it and cleans things up later.
OTOH, when the dumper fails to dump, it cleans up the dump file and
resets the shared memory state.
- When something happens on the requestor after (4), i.e., while the
state is 'DONE', the requestor does the cleanup.
- In the case of receiving SIGKILL or a power failure, all dump files
are removed during crash recovery.
---

If the dumper is terminated before it dumps, the requestor will
appear to enter an infinite loop because the status of mcxtdumpShmem
will not change.
The following are the steps to reproduce.

 - session1
   BEGIN; LOCK TABLE t;
   - session2
     SELECT * FROM t; -- wait
     - session3
        select pg_get_target_backend_memory_contexts(); -- wait

 - session1
   select pg_terminate_backend(); -- kill session2
     - session3 waits forever.

Therefore, you may need to set mcxtdumpShmem->dump_status to
MCXTDUMPSTATUS_CANCELING or another status before the dumper
terminates.


In this case, it may be difficult for the dumper to change
dump_status because it's waiting on the latch and
dump_memory_contexts() has not been called yet.

Instead, it's possible for the requestor to periodically check the
existence of the dumper process while waiting.
I added this logic to the attached patch.
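The periodic existence check can be sketched with a plain kill(pid, 0) probe. This is a hypothetical helper for illustration; the patch presumably uses PostgreSQL's own facilities such as BackendPidGetProc():

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Probe whether a process still exists without sending a signal.
 * kill(pid, 0) performs only the existence/permission check; EPERM
 * still means the process exists but belongs to another user. */
static bool process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```

Calling such a probe each time the requestor's wait times out turns "waits forever on a dead dumper" into a bounded wait.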



Also, although I have not been able to reproduce it, I believe that
with the current design, if the requestor disappears right after the
dumper dumps the memory information, the dump file will remain.
Since the current design appears to allow only one requestor per
instance, when the requestor requests a dump, it might be a good
idea to delete any remaining dump files.


Although I'm not sure when a dump file could remain, deleting any
remaining dump files seems good for safety.
I also added this idea to the attached patch.



The following are comments on the code.

+   proc = BackendPidGetProc(dst_pid);
+
+   if (proc == NULL)
+   {
+       ereport(WARNING,
+               (errmsg("PID %d is not a PostgreSQL server process", 
dst_pid)));

+
+       return (Datum) 1;
+   }
For now, the background writer, checkpointer and WAL writer belong
to the auxiliary processes.
Therefore, if we specify the PIDs of these processes for
pg_get_target_backend_memory_contexts(),
"PID  is not a PostgreSQL server process" would be output.
This confuses the user.
How about using AuxiliaryPidGetProc() to determine these processes?


Thanks and I modified the patch to output the below message when it's an
auxiliary process.

| PID %d is not a PostgreSQL backend process but an auxiliary process.



+               ereport(INFO,
+                   (errmsg("The request has failed and now PID %d is
requsting dumping.",
+                       mcxtdumpShmem->src_pid)));
+
+               LWLockRelease(McxtDumpLock);
You can release LWLock before ereport.


Modified to release the lock before ereport.


+   Assert(mcxtdumpShmem->dump_status = MCXTDUMPSTATUS_REQUESTING);
typo?
It might be "mcxtdumpShmem->dump_status == MCXTDUMPSTATUS_REQUESTING".


Oops, it's a serious typo

Re: adding wait_start column to pg_locks

2021-01-03 Thread torikoshia

On 2021-01-02 06:49, Justin Pryzby wrote:

On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:

So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting.

Attached a patch.


This is failing make check-world, would you send an updated patch ?

I added you as an author so it shows up here.
http://cfbot.cputube.org/atsushi-torikoshi.html


Thanks!

Attached an updated patch.

Regards,

From 608bba31da1bc5d15db991662fa858cd4632d849 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 4 Jan 2021 09:53:17 +0900
Subject: [PATCH v2] To examine the duration of locks, we had to join pg_locks
 and pg_stat_activity and use columns such as query_start or state_change.
 However, since they are the moment when queries started or their state
 changed, we could not get the exact lock duration this way.

This patch adds a new field preserving the time at which locks
started waiting.
---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml   |  9 +
 src/backend/storage/lmgr/lock.c  | 10 ++
 src/backend/storage/lmgr/proc.c  |  2 ++
 src/backend/utils/adt/lockfuncs.c|  9 -
 src/include/catalog/pg_proc.dat  |  6 +++---
 src/include/storage/lock.h   |  3 +++
 src/test/regress/expected/rules.out  |  5 +++--
 8 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
 AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath 
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start 
+--+--+--+--+---++---+-+---+--++-+--+-+--+
 (0 rows)
 
 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..626e5672bd 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10578,6 +10578,15 @@ SCRAM-SHA-256$<iteration count>:&l
lock table
   
  
+
+ 
+  
+   wait_start timestamptz
+  
+  
+   The time at which the lock started waiting
+  
+ 
 

   
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..27969d3772 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1195,6 +1195,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 		lock->waitMask = 0;
 		SHMQueueInit(&(lock->procLocks));
 		ProcQueueInit(&(lock->waitProcs));
+		lock->wait_start = 0;
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3627,6 +3628,12 @@ GetLockStatusData(void)
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
 
+			/*
+			 * Successfully taking fast path lock means there were no
+			 * conflicting locks.
+			 */
+			instance->wait_start = 0;
+
 			el++;
 		}
 
@@ -3654,6 +3661,7 @@ GetLockStatusData(void)
 			instance->pid = proc->pid;
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
+			instance->wait_start = 0;
 
 			el++;
 		}
@@ -3706,6 +3714,7 @@ GetLockStatusData(void)
 		instance->pid = proc->pid;
 		instance->leaderPid = proclock->groupLeader->pid;
 		instance->fastpath = false;
+		instance->wait_start = lock->wait_start;
 
 		el++;
 	}
@@ -4183,6 +4192,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
 		lock->waitMask = 0;
 		SHMQueueInit(&(lock->procLocks));
 		ProcQueueInit(&(lock->waitProcs));
+		lock->wait_start = 0;
 		lock->nRequested = 0;
 		lock->nGranted = 0;
 		MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 57717f666d..56aa8b7f6b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		}
 		else
 			enable_timeout_after(DEADLOCK_TIMEOUT, Deadlo

Re: adding wait_start column to pg_locks

2021-02-17 Thread torikoshia

On 2021-02-16 16:59, Fujii Masao wrote:

On 2021/02/15 15:17, Fujii Masao wrote:



On 2021/02/10 10:43, Fujii Masao wrote:



On 2021/02/09 23:31, torikoshia wrote:

On 2021-02-09 22:54, Fujii Masao wrote:

On 2021/02/09 19:11, Fujii Masao wrote:



On 2021/02/09 18:13, Fujii Masao wrote:



On 2021/02/09 17:48, torikoshia wrote:

On 2021-02-05 18:49, Fujii Masao wrote:

On 2021/02/05 0:03, torikoshia wrote:

On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So spinlock 
is necessary when updating "waitStart" without holding the 
partition lock? Also GetLockStatusData() needs spinlock when 
reading "waitStart"?


Also it might be worth considering using 64-bit atomic operations
like pg_atomic_read_u64() for that.


Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and
pg_atomic_write_u64().

waitStart is TimestampTz, i.e., int64, but it seems
pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned
integers, so I cast the type.

I may be using these functions incorrectly, so if something is
wrong, I would appreciate any comments.



About the documentation, since your suggestion seems better 
than v6, I used it as is.


Thanks for updating the patch!

+    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+        pg_atomic_write_u64(&MyProc->waitStart,
+            pg_atomic_read_u64((pg_atomic_uint64 *) &now));


pg_atomic_read_u64() is really necessary? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
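As an aside, the pattern being discussed — storing a signed 64-bit timestamp in an unsigned 64-bit atomic by casting — can be illustrated with C11 atomics. pg_atomic_* is PostgreSQL's own portability layer; this stand-alone sketch only mirrors the idea, including the point above that a plain atomic write of a local value is enough:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef int64_t TimestampTz;    /* as in PostgreSQL: signed 64-bit */

static _Atomic uint64_t waitStart;

/* A plain atomic store is enough here; "now" is a local variable,
 * so there is no need to read it atomically first. */
static void set_wait_start(TimestampTz now)
{
    atomic_store(&waitStart, (uint64_t) now);
}

static TimestampTz get_wait_start(void)
{
    return (TimestampTz) atomic_load(&waitStart);
}
```

The round-trip through uint64_t is value-preserving for any timestamp, which is why the cast in the patch is safe.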

+    deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+    pg_atomic_write_u64(&MyProc->waitStart,
+        pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));


Same as above.

+    /*
+     * Record waitStart reusing the deadlock timeout timer.
+     *
+     * It would be ideal if this could be done synchronously with
+     * updating lock information. However, since holding
+     * partitionLock longer gives a performance impact, we do it
+     * here asynchronously.
+     */

IMO it's better to comment why we reuse the deadlock timeout 
timer.


 proc->waitStatus = waitStatus;
+    pg_atomic_init_u64(&MyProc->waitStart, 0);

pg_atomic_write_u64() should be used instead? Because waitStart can
be accessed concurrently there.

I updated the patch and addressed the above review comments. 
Patch attached.

Barring any objection, I will commit this version.


Thanks for modifying the patch!
I agree with your comments.

BTW, I ran pgbench several times before and after applying
this patch.

The environment is a virtual machine (CentOS 8), so this is
just for reference, but there was no significant difference
in latency or tps (both are below 1%).


Thanks for the test! I pushed the patch.


But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of this failures.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10


-    relation | locktype |    mode
--+--+-
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened in the server 
built with

--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that 
"waitStart"
atomic variable in the dummy proc created at the end of prepare 
transaction
was not initialized. I updated the patch so that 
pg_atomic_init_u64() is
called for the "waitStart" in the dummy proc for prepared 
transaction.

Patch attached. I confirmed that the patched server built with
--disable-atomics --disable-spinlocks passed all the regression tests.


Thanks for fixing the bug. I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests have passed.


Thanks for the test!

I found another bug in the patch. InitProcess() initializes "waitStart",
but InitAuxiliaryProcess() did not. This could cause an "invalid
spinlock number" error when reading pg_locks on the standby server.
I fixed that. Attached is the updated version of the patch.


I pushed this version. Thanks!


While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart" should
   be initialized at the postmaster startup rather than the startup of each
   child process. I changed "waitStart" so that it's initialized in
   InitProcGlobal() and also reset to 0 by using pg_atomic_write_u64() in
   InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above c

Re: Printing backtrace of postgres processes

2021-02-28 Thread torikoshia

Hi,

I also think this feature would be useful when supporting
environments that lack a debugger or debug symbols.
I think such environments are not rare.


+ for more information. This
+will help in identifying where exactly the backend process is currently
+executing.

When I read this, I expected a backtrace would be generated at
the moment when it receives the signal, but actually it just
sets a flag that causes the next CHECK_FOR_INTERRUPTS to print
a backtrace.

How about explaining the timing of the backtrace generation?
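The set-a-flag-now, print-at-the-next-CHECK_FOR_INTERRUPTS flow described above can be sketched outside the backend. This is a hedged Python analogue (the names `print_backtrace_pending` and `check_for_interrupts` only mirror the C identifiers, and Python's `traceback` stands in for `set_backtrace()`); it shows why the backtrace reflects wherever the process next checks for interrupts, not the instant the signal arrived:

```python
import traceback

# Analogue of the backend's volatile sig_atomic_t flag.
print_backtrace_pending = False

def handle_backtrace_signal():
    """Signal-handler analogue: only set a flag; do no real work here."""
    global print_backtrace_pending
    print_backtrace_pending = True

def check_for_interrupts():
    """CHECK_FOR_INTERRUPTS() analogue: the flag is serviced here, so the
    captured stack shows the point where the process next checked, not
    where the signal was delivered."""
    global print_backtrace_pending
    if print_backtrace_pending:
        print_backtrace_pending = False
        return "".join(traceback.format_stack())
    return None

handle_backtrace_signal()            # "signal" arrives...
bt = check_for_interrupts()          # ...but is only serviced here
assert bt is not None and "check_for_interrupts" in bt
assert check_for_interrupts() is None  # flag was reset after servicing
```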


+print backtrace of superuser backends. This feature is not supported
+for postmaster, logging and statistics process.

Since the current patch uses BackendPidGetProc(), this feature is
unsupported not only for the postmaster, logger, and statistics
collector but also for the checkpointer, background writer, and
walwriter.

And when I specify the PID of one of these PostgreSQL processes, it
says "PID  is not a PostgreSQL server process".

I think it may confuse users, so it might be worth changing the
message for those PostgreSQL processes. AuxiliaryPidGetProc() may help
to do that.
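The suggested message selection — try the backend lookup first, fall back to an auxiliary-process lookup, and only then report "not a PostgreSQL server process" — can be sketched as follows. This is a hedged Python simulation: the dictionaries and the wording of the auxiliary-process message are illustrative stand-ins for the `BackendPidGetProc()`/`AuxiliaryPidGetProc()` lookups, not actual backend behavior.

```python
# Hypothetical tables standing in for the backend and auxiliary PGPROC arrays.
BACKEND_PIDS = {1001: "client backend"}
AUX_PIDS = {1002: "checkpointer", 1003: "background writer", 1004: "walwriter"}

def backtrace_request_message(pid: int) -> str:
    """Pick a message the way the review suggests: a backend PID is
    accepted, an auxiliary PID gets a specific 'not supported' message,
    and anything else gets the generic warning."""
    if pid in BACKEND_PIDS:
        return "backtrace requested"
    if pid in AUX_PIDS:  # AuxiliaryPidGetProc()-style fallback lookup
        return f"backtrace printing is not supported for {AUX_PIDS[pid]} (PID {pid})"
    return f"PID {pid} is not a PostgreSQL server process"

assert backtrace_request_message(1001) == "backtrace requested"
assert "checkpointer" in backtrace_request_message(1002)
assert backtrace_request_message(9999).startswith("PID 9999")
```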


diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c

index 54a818b..5fae328 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -57,6 +57,7 @@
 #include "storage/shmem.h"
 #include "storage/smgr.h"
 #include "storage/spin.h"
+#include "tcop/tcopprot.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
 #include "utils/resowner.h"
@@ -547,6 +548,13 @@ HandleCheckpointerInterrupts(void)
if (ProcSignalBarrierPending)
ProcessProcSignalBarrier();

+   /* Process printing backtrace */
+   if (PrintBacktracePending)
+   {
+   PrintBacktracePending = false;
+   set_backtrace(NULL, 0);
+   }
+

Although this implements backtrace printing for the checkpointer, when
I specified the checkpointer's PID, the request was refused by
BackendPidGetProc().


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Get memory contexts of an arbitrary backend process

2021-03-04 Thread torikoshia

On 2021-01-14 19:11, torikoshia wrote:

Since pg_get_target_backend_memory_contexts() waits for the target to
dump memory, it could lead to a deadlock as below.

  - session1
  BEGIN; TRUNCATE t;

  - session2
  BEGIN; TRUNCATE t; -- wait

  - session1
  SELECT * FROM pg_get_target_backend_memory_contexts(); --wait


Thanks for notifying me, Fujii-san.


Attached is a v8 patch that prohibits calling the function inside
transactions.


Regrettably, this modification cannot cope with advisory locks, and
I haven't come up with a good way to deal with that.

It seems to me that the architecture of the requestor waiting for the
dumper leads to this problem and complicates things.


Considering the printing-backtrace discussion [1], it seems
reasonable that the requestor just sends a signal and the dumper dumps
to the log file.

Since I found a past discussion that was doing exactly what I thought
reasonable[2], I'm going to continue that discussion if there are no
objections.


Any thoughts?


[1] 
https://www.postgresql.org/message-id/flat/CALDaNm3ZzmFS-=r7oduzj7y7bgqv+n06kqyft6c3xzdoknk...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/20171212044330.3nclev2sfrab36tf%40alap3.anarazel.de#6f28be9839c74779ed6aaa75616124f5



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Printing backtrace of postgres processes

2021-03-05 Thread torikoshia

On 2021-03-04 21:55, Bharath Rupireddy wrote:
On Mon, Mar 1, 2021 at 10:43 AM torikoshia  
wrote:

Since the current patch uses BackendPidGetProc(), this feature is
unsupported not only for the postmaster, logger, and statistics
collector but also for the checkpointer, background writer, and
walwriter.

And when I specify the PID of one of these PostgreSQL processes, it
says "PID  is not a PostgreSQL server process".

I think it may confuse users, so it might be worth changing the
message for those PostgreSQL processes. AuxiliaryPidGetProc() may help
to do that.


Exactly this was the doubt I got when I initially reviewed this patch.
And I felt it should be discussed in a separate thread, you may want
to update your thoughts there [1].

[1] -
https://www.postgresql.org/message-id/CALj2ACW7Rr-R7mBcBQiXWPp%3DJV5chajjTdudLiF5YcpW-BmHhg%40mail.gmail.com


Thanks!
I'm going to join the discussion there.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Should we improve "PID XXXX is not a PostgreSQL server process" warning for pg_terminate_backend(<>)?

2021-03-14 Thread torikoshia

On 2021-03-07 19:16, Bharath Rupireddy wrote:

On Fri, Feb 5, 2021 at 5:15 PM Bharath Rupireddy
 wrote:


pg_terminate_backend and pg_cancel_backend with the postmaster PID produce
a "PID  is not a PostgreSQL server process" warning [1], which
basically implies that the postmaster is not a PostgreSQL process at
all. This is a bit misleading because the postmaster is the parent of
all PostgreSQL processes. Should we improve the warning message if the
given PID is the postmaster's PID?


+1. I felt it was a bit confusing when reviewing a thread[1].



If yes, how about a generic message for both of the functions -
"signalling postmaster process is not allowed" or "cannot signal
postmaster process" or some other better suggestion?

[1] 2471176 ---> is postmaster PID.
postgres=# select pg_terminate_backend(2471176);
WARNING:  PID 2471176 is not a PostgreSQL server process
 pg_terminate_backend
--
 f
(1 row)
postgres=# select pg_cancel_backend(2471176);
WARNING:  PID 2471176 is not a PostgreSQL server process
 pg_cancel_backend
---
 f
(1 row)


I'm attaching a small patch that emits a warning "signalling
postmaster with PID %d is not allowed" for postmaster and "signalling
PostgreSQL server process with PID %d is not allowed" for auxiliary
processes such as checkpointer, background writer, walwriter.

However, for the stats collector and syslogger processes, we still get
the "PID X is not a PostgreSQL server process" warning because they
don't have PGPROC entries(??). So BackendPidGetProc and
AuxiliaryPidGetProc will not help, and even pg_stat_activity does not
have these processes' PIDs.


I also ran into the same problem while creating a patch in [2].

I'm now wondering about changing the message to something like
"PID  is not a PostgreSQL backend process".

"backend process" is now defined as "Process of an instance which acts
on behalf of a client session and handles its requests." in the
Appendix.


[1] 
https://www.postgresql.org/message-id/CALDaNm3ZzmFS-%3Dr7oDUzj7y7BgQv%2BN06Kqyft6C3xZDoKnk_6w%40mail.gmail.com


[2] 
https://www.postgresql.org/message-id/0271f440ac77f2a4180e0e56ebd944d1%40oss.nttdata.com



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Get memory contexts of an arbitrary backend process

2020-10-27 Thread torikoshia

On 2020-10-23 13:46, Kyotaro Horiguchi wrote:

Wait...

Attachments: 
0003-Enabled-pg_get_backend_memory_contexts-to-collect.patch


For a moment I thought that the number was the patch number, but the
predecessors are 0002-Enabled..collect.patch and 0001-(same
name). It's not mandatory, but we usually do as follows, which is the
way of git:

v1-0001-Enabled...collect.patch
v2-0001-Enabled...collect.patch

The vn prefix is added by the -v option of git-format-patch.


Sorry for the confusion. I'll follow that way next time.


At Thu, 22 Oct 2020 21:32:00 +0900, torikoshia
 wrote in

> > I added a shared hash table consisting of minimal members
> > mainly for managing whether the file is dumped or not.
> > Some members like 'loc' seem useful in the future, but I
> > haven't added them since it's not essential at this point.
> Yes, that would be good.
> +        /*
> +         * Since we allow only one session to request a memory context
> +         * dump at the same time, check whether the dump files already
> +         * exist.
> +         */
> +        while (stat(dumpfile, &stat_tmp) == 0 || stat(tmpfile, &stat_tmp) == 0)
> +        {
> +            pg_usleep(100L);
> +        }
> If pg_get_backend_memory_contexts() is executed by two or more
> sessions at the same time, it cannot be run exclusively in this way.
> Currently it seems to cause a crash when doing so.
> This is easy to reproduce and can be done as follows.
> [session-1]
> BEGIN;
> LOCK TABLE t1;
>   [Session-2]
>   BEGIN;
>   LOCK TABLE t1; <- waiting
>     [Session-3]
>     select * FROM pg_get_backend_memory_contexts();
>       [Session-4]
>       select * FROM pg_get_backend_memory_contexts(<pid of session-2>);
> If you issue commit or abort at session-1, you will get SEGV.
> Instead of checking for the existence of the file, it might be better
> to use a hash (mcxtdumpHash) entry with LWLock.

Thanks!
Added an LWLock and changed the approach from checking the file
existence to finding the hash entry.



> +        if (proc == NULL)
> +        {
> +            ereport(WARNING,
> +                    (errmsg("PID %d is not a PostgreSQL server
> process", dst_pid)));
> +            return (Datum) 1;
> +        }
> Shouldn't it clear the hash entry before return?

Yeah. Added code for removing the entry.


+   entry = AddEntryToMcxtdumpHash(dst_pid);
+
+		/* Check whether the target process is a PostgreSQL backend process. */

+   /* TODO: Check also whether backend or not. */
+   proc = BackendPidGetProc(dst_pid);
+
+   if (proc == NULL)
+   {
+   ereport(WARNING,
+   (errmsg("PID %d is not a PostgreSQL server process", dst_pid)));
+
+   LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
+
+   if (hash_search(mcxtdumpHash, &dst_pid, HASH_REMOVE, NULL) == NULL)
+   elog(WARNING, "hash table corrupted");
+
+   LWLockRelease(McxtDumpHashLock);
+
+   return (Datum) 1;
+   }

Why do you enter a useless entry and then remove it immediately?


Do you mean I should check the process existence first
since it enables us to skip entering hash entries?



+		PG_ENSURE_ERROR_CLEANUP(McxtReqKill, (Datum) Int32GetDatum(dst_pid));

+   {
+   SendProcSignal(dst_pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);

"PROCSIG_DUMP_MEMORY" is somewhat misleading. Hwo about
"PROCSIG_DUMP_MEMCXT" or "PROCSIG_DUMP_MEMORY_CONTEXT"?


I'll go with "PROCSIG_DUMP_MEMCXT".



I thought that the hash table would prevent multiple requestors from
making a request at once, but the patch doesn't seem to do that.

+   /* Wait until target process finished dumping file. */
+   while (entry->dump_status == MCXTDUMPSTATUS_NOTYET)

This needs an LWLock. And this could read the entry after it is reused
by another backend if the dumper process is gone. That isn't likely to
happen, but theoretically another backend may set it to
MCXTDUMPSTATUS_NOTYET in between two successive checks on the member.


Thanks for your notification.
I'll use an LWLock.
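The race described above — busy-waiting on `dump_status` with no lock, so the entry can change or be reused between two reads — goes away if the status is only touched under a lock and the waiter blocks on a condition. Below is a hedged Python threading analogue (a `Condition` loosely stands in for the LWLock-plus-latch combination; `DumpEntry` and the state names only mirror the patch's constants):

```python
import threading

# Dump states, mirroring the patch's constants in spirit.
NOTYET, DONE = "NOTYET", "DONE"

class DumpEntry:
    def __init__(self):
        self.status = NOTYET
        self.cond = threading.Condition()  # stands in for LWLock + latch

    def requestor_wait(self, timeout=5.0):
        # status is only ever read while holding the lock
        with self.cond:
            self.cond.wait_for(lambda: self.status == DONE, timeout=timeout)
            return self.status

    def dumper_finish(self):
        # status is only ever written while holding the lock
        with self.cond:
            self.status = DONE
            self.cond.notify_all()

entry = DumpEntry()
t = threading.Thread(target=entry.dumper_finish)
t.start()
assert entry.requestor_wait() == DONE
t.join()
```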



+   /*
+* Make dump file ends with 'D'.
+* This is checked by the caller when reading the file.
+*/
+   fputc('E', fpout);

Which is right?


Sorry, the comment was wrong.



+   fputc('E', fpout);
+
+   CHECK_FOR_INTERRUPTS();

This means that the process accepts another request and rewrites the
file even while the first requester is reading it. And, the file can
Re: Is it useful to record whether plans are generic or custom?

2020-11-11 Thread torikoshia

On 2020-09-29 02:39, legrand legrand wrote:

Hi Atsushi,

+1: Your proposal is a good answer for time-based performance analysis
(even if parsing duration or blks are not differentiated).

As it makes the pgss number of columns wilder, maybe another solution
would be to create a pg_stat_statements_xxx view with the same key
as pgss (dbid, userid, queryid) and all those new counters.


Thanks for your ideas and sorry for my late reply.

It seems creating pg_stat_statements_xxx views both for generic and
custom plans is better than my PoC patch.

However, I also began to wonder how effective it would be to just
distinguish between generic and custom plans.  Custom plans can
include all sorts of plans, and considering cache invalidation, generic
plans can also include various plans.

Considering this, I'm starting to feel that it would be better to
keep not just whether the plan was generic or custom but the plan
itself, as discussed in the thread below.

https://www.postgresql.org/message-id/flat/CAKU4AWq5_jx1Vyai0_Sumgn-Ks0R%2BN80cf%2Bt170%2BzQs8x6%3DHew%40mail.gmail.com#f57e64b8d37697c808e4385009340871


Any thoughts?


Regards,

--
Atsushi Torikoshi




Re: Get memory contexts of an arbitrary backend process

2020-11-16 Thread torikoshia

On 2020-10-28 15:32, torikoshia wrote:

On 2020-10-23 13:46, Kyotaro Horiguchi wrote:



I think we might need to step back to the basic design of this feature
since this patch seems to have unhandled corner cases that are
difficult to find.


I've written out the basic design below and attached the
corresponding patch.

  # Communication flow between the dumper and the requestor
  - (1) When requesting a memory context dump, the requestor adds an entry
to the shared memory. The entry manages the dump state and it is set to
'REQUESTING'.
  - (2) The requestor sends the signal to the dumper and waits on the latch.
  - (3) The dumper looks into the corresponding shared memory entry and
changes its state to 'DUMPING'.
  - (4) When the dumper completes dumping, it changes the state to 'DONE'
and sets the latch.
  - (5) The requestor reads the dump file and shows it to the user.
Finally, the requestor removes the dump file and resets the shared memory
entry.


  # Query cancellation
  - When the requestor cancels dumping, e.g. signaling using ctrl-C, the
requestor changes the status of the shared memory entry to 'CANCELING'.
  - The dumper checks the status when it tries to change the state to
'DONE' at (4), and if the state is 'CANCELING', it removes the dump file
and resets the shared memory entry.


  # Cleanup of the dump file and the shared memory entry
  - In the normal case, the requestor removes the dump file and resets the
shared memory entry as described in (5).
  - When something like query cancellation or process termination
happens on the dumper after (1) and before (3), in other words, while the
state is 'REQUESTING', the requestor does the cleanup.
  - When something happens on the dumper or the requestor after (3) and
before (4), in other words, while the state is 'DUMPING', the dumper does
the cleanup. Specifically, if the requestor cancels the query, it just
changes the state to 'CANCELING' and the dumper notices it and cleans up
things later. OTOH, when the dumper fails to dump, it cleans up the dump
file and deletes the entry on the shared memory.
  - When something happens on the requestor after (4), i.e., the state
is 'DONE', the requestor does the cleanup.
  - In the case of receiving SIGKILL or a power failure, all dump files
are removed in the crash recovery process.
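The cleanup responsibilities in the design above can be condensed into a small state machine. This is a hedged Python sketch of that logic only — the state names mirror the design, but the `cleanup_owner` function and its return values are illustrative, not code from the patch:

```python
from enum import Enum, auto

class DumpState(Enum):
    REQUESTING = auto()
    DUMPING = auto()
    DONE = auto()
    CANCELING = auto()

def cleanup_owner(state: DumpState) -> str:
    """Who removes the dump file and resets the shared entry when
    something goes wrong, per the design above."""
    if state is DumpState.REQUESTING:
        return "requestor"   # dumper never saw the request
    if state in (DumpState.DUMPING, DumpState.CANCELING):
        return "dumper"      # dumper notices CANCELING and cleans up later
    return "requestor"       # DONE: requestor consumed the file

assert cleanup_owner(DumpState.REQUESTING) == "requestor"
assert cleanup_owner(DumpState.CANCELING) == "dumper"
assert cleanup_owner(DumpState.DONE) == "requestor"
```

Crash recovery (SIGKILL, power failure) sits outside this table: all dump files are simply removed at recovery, so no owner needs to act.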



Although there was a suggestion that shared memory hash
table should be changed to more efficient structures,
I haven't done it in this patch.
I think it can be treated separately; I'm going to work
on that later.


On 2020-11-11 00:07, Georgios Kokolatos wrote:

Hi,

I noticed that this patch fails on the cfbot.
For this, I changed the status to: 'Waiting on Author'.

Cheers,
//Georgios

The new status of this patch is: Waiting on Author


Thanks for your notification; I updated the patch.
Changed the status to: 'Waiting on Author'.

Regards,

--
Atsushi TorikoshiFrom c6d06b11d16961acd59bfa022af52cb5fc668b3e Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 16 Nov 2020 11:49:03 +0900
Subject: [PATCH v4] Enabled pg_get_backend_memory_contexts() to collect
 arbitrary backend process's memory contexts.

Previously, pg_get_backend_memory_contexts() could only get the
local memory contexts. This patch enables it to get the memory contexts
of an arbitrary backend process whose PID is specified by the
argument.
---
 src/backend/access/transam/xlog.c|   7 +
 src/backend/catalog/system_views.sql |   4 +-
 src/backend/postmaster/pgstat.c  |   3 +
 src/backend/replication/basebackup.c |   3 +
 src/backend/storage/ipc/ipci.c   |   2 +
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lwlocknames.txt |   1 +
 src/backend/tcop/postgres.c  |   5 +
 src/backend/utils/adt/mcxtfuncs.c| 615 ++-
 src/backend/utils/init/globals.c |   1 +
 src/bin/initdb/initdb.c  |   3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   4 +-
 src/bin/pg_rewind/filemap.c  |   3 +
 src/include/catalog/pg_proc.dat  |  11 +-
 src/include/miscadmin.h  |   1 +
 src/include/pgstat.h |   3 +-
 src/include/storage/procsignal.h |   1 +
 src/include/utils/mcxtfuncs.h|  52 ++
 src/test/regress/expected/rules.out  |   2 +-
 19 files changed, 697 insertions(+), 28 deletions(-)
 create mode 100644 src/include/utils/mcxtfuncs.h

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a1078a7cfc..f628fa8b53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -73,6 +73,7 @@
 #include "storage/sync.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
+#include "utils/mcxtfuncs.h"

Re: Is it useful to record whether plans are generic or custom?

2020-11-17 Thread torikoshia

On 2020-11-12 14:23, Pavel Stehule wrote:


yes, the plan itself is very interesting information - and the
information of whether the plan was generic or not is interesting too.
It is another dimension of the query - maybe there can be a rule - for
any query store the 100 slowest plans with all attributes. The next
issue is the fact that the first 5 executions of generic plans are not
really generic. This fact should be visible too.


Thanks!
However, AFAIU, we can know whether the plan type is generic or custom
from the plan information as described in the manual.

-- https://www.postgresql.org/docs/devel/sql-prepare.html
If a generic plan is in use, it will contain parameter symbols $n, while
a custom plan will have the supplied parameter values substituted into it.


If we can get the plan information, a case like 'the first 5 executions
of generic plans are not really generic' does not arise, does it?
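The rule quoted from the PREPARE docs — a generic plan still contains parameter symbols `$1`, `$2`, ..., while a custom plan has the values substituted in — can be turned into a quick textual check. A hedged Python sketch (the function name is mine, and a bare `$n` regex can false-positive on plans that happen to contain dollar-quoted literals, so treat it as a heuristic):

```python
import re

def looks_generic(plan_text: str) -> bool:
    """Heuristic per the PREPARE docs: a generic plan still contains
    unsubstituted parameter symbols $1, $2, ...; a custom plan has the
    supplied values substituted in."""
    return re.search(r"\$\d+", plan_text) is not None

assert looks_generic("Index Scan using t1_pkey on t1  Index Cond: (i = $1)")
assert not looks_generic("Index Scan using t1_pkey on t1  Index Cond: (i = 42)")
```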


Regards,

--
Atsushi Torikoshi




[doc] adding way to examine the plan type of prepared statements

2020-11-17 Thread torikoshia

Hi,


Currently, EXPLAIN is the only way to know whether the plan is generic 
or custom according to the manual of PREPARE.


  https://www.postgresql.org/docs/devel/sql-prepare.html

After commit d05b172, we can also use the pg_prepared_statements view to
examine the plan types.


How about adding this explanation like the attached patch?


Regards,

--
Atsushi TorikoshiFrom 2c8f66637075fcb2f802a2b9cfd354f2ef18 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 12 Nov 2020 17:00:19 +0900
Subject: [PATCH v1] After commit d05b172, we can use pg_prepared_statements
 view to examine whether the plan is generic or custom. This patch adds this
 explanation in the manual of PREPARE.

---
 doc/src/sgml/ref/prepare.sgml | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
index 57a34ff83c..2268c222a9 100644
--- a/doc/src/sgml/ref/prepare.sgml
+++ b/doc/src/sgml/ref/prepare.sgml
@@ -179,6 +179,13 @@ EXPLAIN EXECUTE name(parameter_values
 
+  
+   To examine how many times each prepared statement chose generic and
+   custom plan cumulatively in the current session, refer
+   pg_prepared_statements
+   system view.
+  
+
   
Although the main point of a prepared statement is to avoid repeated parse
analysis and planning of the statement, PostgreSQL will
-- 
2.18.1



[doc] plan invalidation when statistics are update

2020-11-17 Thread torikoshia

Hi,

AFAIU, when the planner statistics are updated, generic plans are
invalidated and PostgreSQL recreates them. However, the manual doesn't
seem to explain it explicitly.


  https://www.postgresql.org/docs/devel/sql-prepare.html

I guess this case is included in 'whenever database objects used in the
statement have undergone definitional (DDL) changes', but I feel it's
hard to infer.


Since updates of the statistics can happen often, how about describing
this case explicitly like the attached patch?



Regards,

--
Atsushi TorikoshiFrom d71dbb0b100f706f19d92175b72f9e1833a8a442 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 12 Nov 2020 17:18:29 +0900
Subject: [PATCH v1] When the planner statistics are updated, generic plans are
 invalidated and PostgreSQL recreates them. However, the manual didn't explain
 it explicitly. This patch adds this case as an example.

---
 doc/src/sgml/ref/prepare.sgml | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
index 57a34ff83c..4075de5689 100644
--- a/doc/src/sgml/ref/prepare.sgml
+++ b/doc/src/sgml/ref/prepare.sgml
@@ -185,7 +185,10 @@ EXPLAIN EXECUTE name(parameter_values changes
+   statement. For example, when the planner statistics of the statement
+   are updated, PostgreSQL re-analyzes and
+   re-plans the statement.
+   Also, if the value of  changes
from one use to the next, the statement will be re-parsed using the new
search_path.  (This latter behavior is new as of
PostgreSQL 9.3.)  These rules make use of a
-- 
2.18.1



Re: [doc] plan invalidation when statistics are update

2020-11-18 Thread torikoshia

On 2020-11-18 11:35, Fujii Masao wrote:

Thanks for your comment!


On 2020/11/18 11:04, torikoshia wrote:

Hi,

AFAIU, when the planner statistics are updated, generic plans are
invalidated and PostgreSQL recreates them. However, the manual doesn't
seem to explain it explicitly.

   https://www.postgresql.org/docs/devel/sql-prepare.html

I guess this case is included in 'whenever database objects used in
the statement have undergone definitional (DDL) changes', but I feel
it's hard to infer.

Since updates of the statistics can happen often, how about describing
this case explicitly like the attached patch?


+1 to add that note.

-   statement.  Also, if the value of  
changes
+   statement. For example, when the planner statistics of the 
statement

+   are updated, PostgreSQL re-analyzes and
+   re-plans the statement.

I don't think "For example," is necessary.

"planner statistics of the statement" sounds vague? Does the statement
is re-analyzed and re-planned only when the planner statistics of 
database

objects used in the statement are updated? If yes, we should describe
that to make the note a bit more explicitly?


Yes. As far as I confirmed, updating statistics which are not used in
prepared statements doesn't trigger re-analysis and re-planning.

Since plan invalidations for DDL changes and statistical changes are
caused by PlanCacheRelCallback(Oid 'relid'), only the prepared
statements using the 'relid' relation seem to be invalidated.

Attached updated patch.


Regards,

-
Atsushi TorikoshiFrom f8c051e57e1ca15e2b91d3e69fe0531c0b7bf7ca Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 19 Nov 2020 13:23:18 +0900
Subject: [PATCH v2] When the planner statistics are updated, generic plans are
 invalidated and PostgreSQL recreates them. However, the manual didn't explain
 it explicitly. This patch adds an explanation for this case.

---
 doc/src/sgml/ref/prepare.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
index 57a34ff83c..5a6dd481bc 100644
--- a/doc/src/sgml/ref/prepare.sgml
+++ b/doc/src/sgml/ref/prepare.sgml
@@ -185,7 +185,9 @@ EXPLAIN EXECUTE name(parameter_values changes
+   statement. Similarly, whenever the planner statistics of database
+   objects used in the statement have updated, re-analysis and re-planning
+   happen.  Also, if the value of  changes
from one use to the next, the statement will be re-parsed using the new
search_path.  (This latter behavior is new as of
PostgreSQL 9.3.)  These rules make use of a
-- 
2.18.1



Re: [doc] adding way to examine the plan type of prepared statements

2020-11-18 Thread torikoshia

On 2020-11-18 11:04, torikoshia wrote:

Hi,


Currently, EXPLAIN is the only way to know whether the plan is generic
or custom according to the manual of PREPARE.

  https://www.postgresql.org/docs/devel/sql-prepare.html

After commit d05b172, we can also use pg_prepared_statements view to
examine the plan types.

How about adding this explanation like the attached patch?


Sorry, but on second thought, since it seems better to add
the explanation to the current description of pg_prepared_statements,
I modified the patch.


Regards,

--
Atsushi TorikoshiFrom ec969fa55c2ffc71ce0b94e923e013d650de2220 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 19 Nov 2020 14:45:49 +0900
Subject: [PATCH v2] After commit d05b172, we can use the pg_prepared_statements
 view to examine the numbers of generic and custom plans that were chosen. This
 patch adds this explanation in the manual of PREPARE.

---
 doc/src/sgml/ref/prepare.sgml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/prepare.sgml b/doc/src/sgml/ref/prepare.sgml
index 57a34ff83c..3cc5f9de4a 100644
--- a/doc/src/sgml/ref/prepare.sgml
+++ b/doc/src/sgml/ref/prepare.sgml
@@ -204,7 +204,8 @@ EXPLAIN EXECUTE name(parameter_values
You can see all prepared statements available in the session by querying the
pg_prepared_statements
-   system view.
+   system view. This view also shows the numbers of generic and custom plans
+   that were chosen.
   
  
 
-- 
2.18.1



Re: [doc] plan invalidation when statistics are update

2020-11-25 Thread torikoshia

On 2020-11-25 14:13, Fujii Masao wrote:

On 2020/11/24 23:14, Fujii Masao wrote:



On 2020/11/19 14:33, torikoshia wrote:

On 2020-11-18 11:35, Fujii Masao wrote:

Thanks for your comment!


On 2020/11/18 11:04, torikoshia wrote:

Hi,

AFAIU, when the planner statistics are updated, generic plans are
invalidated and PostgreSQL recreates them. However, the manual doesn't
seem to explain it explicitly.

   https://www.postgresql.org/docs/devel/sql-prepare.html

I guess this case is included in 'whenever database objects used in
the statement have undergone definitional (DDL) changes', but I feel
it's hard to infer.

Since updates of the statistics can happen often, how about describing
this case explicitly like the attached patch?


+1 to add that note.

-   statement.  Also, if the value of  changes
+   statement. For example, when the planner statistics of the statement
+   are updated, PostgreSQL re-analyzes and
+   re-plans the statement.

I don't think "For example," is necessary.

"planner statistics of the statement" sounds vague? Is the statement
re-analyzed and re-planned only when the planner statistics of database
objects used in the statement are updated? If yes, we should describe
that to make the note a bit more explicit?


Yes. As far as I confirmed, updating statistics which are not used in
prepared statements doesn't trigger re-analysis and re-planning.

Since plan invalidations for DDL changes and statistical changes are
caused by PlanCacheRelCallback(Oid 'relid'), only the prepared
statements using the 'relid' relation seem to be invalidated.

> Attached updated patch.


Thanks for confirming that and updating the patch!


force re-analysis and re-planning of the statement before using it
whenever database objects used in the statement have undergone
definitional (DDL) changes since the previous use of the prepared
-   statement.  Also, if the value of  changes
+   statement. Similarly, whenever the planner statistics of database
+   objects used in the statement have updated, re-analysis and re-planning
+   happen.

"been" should be added between "have" and "updated" in the above "objects
used in the statement have updated"?


You're right.

I'm inclined to add "since the previous use of the prepared statement"
into the second description too, to make it clear. But if we do that,
it's better to merge the above two descriptions into one, as follows?

whenever database objects used in the statement have undergone
-   definitional (DDL) changes since the previous use of the prepared
+   definitional (DDL) changes or the planner statistics of them have
+   been updated since the previous use of the prepared
statement.  Also, if the value of  changes


Thanks, it seems better.


Regards,




Re: Is it useful to record whether plans are generic or custom?

2021-01-12 Thread torikoshia

 wrote in



ISTM now that creating pg_stat_statements_xxx views
both for generic and custom plans is better than my PoC patch.


On second thought, it also makes pg_stat_statements too complicated
compared to what it makes possible.

I'm also worried that whether to take generic and custom plan execution
time would be controlled by a GUC variable, and the default would be
not to take them.
Not many people will change the default.

Since the same queryid can contain various queries (different plan,
different parameter $n, etc.), I also started to feel that it is not
appropriate to get the execution time of only generic/custom queries
separately.

I suppose it would be normal practice to store past results of
pg_stat_statements for future comparisons.
If this is the case, I think that if we only add the number of
generic plan executions, it will give us a hint to notice the cause
of performance degradation due to changes of the plan between
generic and custom.

For example, if there is a clear difference in the number of times
the generic plan is executed before and after a performance
degradation, as below, it would be natural to check whether there is a
problem with the generic plan.

  [after performance degradation]
  =# SELECT query, calls, generic_calls FROM pg_stat_statements where query like '%t1%';

                     query                     | calls | generic_calls
  ---------------------------------------------+-------+---------------
   PREPARE p1 as select * from t1 where i = $1 |  1100 |            50

  [before performance degradation]
  =# SELECT query, calls, generic_calls FROM pg_stat_statements where query like '%t1%';

                     query                     | calls | generic_calls
  ---------------------------------------------+-------+---------------
   PREPARE p1 as select * from t1 where i = $1 |  1000 |             0
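The before/after comparison above can be automated once snapshots are stored. Below is a hedged Python sketch that reuses the example numbers; the `generic_calls` column is the one the attached patch proposes, and the snapshot format `{query: (calls, generic_calls)}` is an assumption of mine, not a pg_stat_statements API:

```python
def generic_call_delta(before: dict, after: dict) -> dict:
    """Given {query: (calls, generic_calls)} snapshots, report how much of
    the new traffic ran under a generic plan since the previous snapshot."""
    out = {}
    for q, (calls_after, gen_after) in after.items():
        calls_before, gen_before = before.get(q, (0, 0))
        out[q] = {"new_calls": calls_after - calls_before,
                  "new_generic_calls": gen_after - gen_before}
    return out

q = "PREPARE p1 as select * from t1 where i = $1"
before = {q: (1000, 0)}
after = {q: (1100, 50)}
d = generic_call_delta(before, after)
# 50 of the 100 new executions used a generic plan -- worth inspecting it.
assert d[q]["new_calls"] == 100 and d[q]["new_generic_calls"] == 50
```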


Attached a patch that just adds a generic call counter to
pg_stat_statements.

Any thoughts?


Regards,

--
Atsushi Torikoshidiff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 0f63f08f7e..7fdef315ae 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT blk_write_time float8,
 OUT wal_records int8,
 OUT wal_fpi int8,
-OUT wal_bytes numeric
+OUT wal_bytes numeric,
+OUT generic_calls int8
 )
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 72a117fc19..171c39f857 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -77,10 +77,12 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/spin.h"
+#include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/plancache.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
@@ -192,6 +194,7 @@ typedef struct Counters
 	int64		wal_records;	/* # of WAL records generated */
 	int64		wal_fpi;		/* # of WAL full page images generated */
 	uint64		wal_bytes;		/* total amount of WAL generated in bytes */
+	int64		generic_calls;	/* # of times generic plans executed */
 } Counters;
 
 /*
@@ -277,6 +280,9 @@ static int	exec_nested_level = 0;
 /* Current nesting depth of planner calls */
 static int	plan_nested_level = 0;
 
+/* Current plan type */
+static bool	is_plan_type_generic = false;
+
 /* Saved hook values in case of unload */
 static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
@@ -1034,6 +1040,20 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 */
 	if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
+		/*
+		 * Since ActivePortal is not available at ExecutorEnd, we preserve
+		 * the plan type here.
+		 */
+		Assert(ActivePortal);
+
+		if (ActivePortal->cplan)
+		{
+			if (ActivePortal->cplan->is_generic)
+is_plan_type_generic = true;
+			else
+is_plan_type_generic = false;
+		}
+
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
 		 * space is allocated in the per-query context so it will go away at
@@ -1427,6 +1447,8 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.max_time[kind] = total_time;
 			e->counters.mean_time[kind] = total_time;
 		}
+		else if (kind == PGSS_EXEC && is_plan_type_generic)
+			e->counters.generic_calls += 1;
 		else
 		{
 			/*
@@ -1510,8 +1532,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_1	18
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STATEMENTS_COLS_V1_3	23
-#define PG

Re: Get memory contexts of an arbitrary backend process

2021-01-12 Thread torikoshia

Attached v7, which fixes recent conflicts.

It also changes the behavior when another requestor is already
working: for simplicity, while the v6 patch made the new requestor
wait, the v7 patch makes it quit.


Regards,

--
Atsushi TorikoshiFrom f20e48d99f2770bfec275805185aa5ce08661fce Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 12 Jan 2021 20:55:43 +0900
Subject: [PATCH v7] After commit 3e98c0bafb28de, we can display the usage of
 the memory contexts using pg_backend_memory_contexts system view. However,
 its target is limited to the process attached to the current session. This
 patch introduces pg_get_target_backend_memory_contexts() and makes it
 possible to collect memory contexts of the specified process.

---
 src/backend/access/transam/xlog.c|   7 +
 src/backend/catalog/system_views.sql |   3 +-
 src/backend/postmaster/pgstat.c  |   3 +
 src/backend/replication/basebackup.c |   3 +
 src/backend/storage/ipc/ipci.c   |   2 +
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lwlocknames.txt |   1 +
 src/backend/tcop/postgres.c  |   5 +
 src/backend/utils/adt/mcxtfuncs.c| 731 ++-
 src/backend/utils/init/globals.c |   1 +
 src/bin/initdb/initdb.c  |   3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   4 +-
 src/bin/pg_rewind/filemap.c  |   3 +
 src/include/catalog/pg_proc.dat  |  12 +-
 src/include/miscadmin.h  |   1 +
 src/include/pgstat.h |   3 +-
 src/include/storage/procsignal.h |   1 +
 src/include/utils/mcxtfuncs.h|  44 ++
 18 files changed, 810 insertions(+), 21 deletions(-)
 create mode 100644 src/include/utils/mcxtfuncs.h

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ede93ad7fd..4cab47a61d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -73,6 +73,7 @@
 #include "storage/sync.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
+#include "utils/mcxtfuncs.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/relmapper.h"
@@ -6993,6 +6994,12 @@ StartupXLOG(void)
 		 */
 		pgstat_reset_all();
 
+		/*
+		 * Reset dump files in pg_memusage, because target processes do
+		 * not exist any more.
+		 */
+		RemoveMemcxtFile(0);
+
 		/*
 		 * If there was a backup label file, it's done its job and the info
 		 * has now been propagated into pg_control.  We must get rid of the
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..7419c496b2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -558,7 +558,8 @@ CREATE VIEW pg_backend_memory_contexts AS
 SELECT * FROM pg_get_backend_memory_contexts();
 
 REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
-REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_target_backend_memory_contexts FROM PUBLIC;
 
 -- Statistics views
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..8eb2d062b0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4045,6 +4045,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_DUMP_MEMORY_CONTEXT:
+			event_name = "DumpMemoryContext";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..c67e71d79b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -184,6 +184,9 @@ static const char *const excludeDirContents[] =
 	/* Contents zeroed on startup, see StartupSUBTRANS(). */
 	"pg_subtrans",
 
+	/* Skip memory context dump files. */
+	"pg_memusage",
+
 	/* end of list */
 	NULL
 };
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..18a1dd5a74 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -45,6 +45,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/mcxtfuncs.h"
 #include "utils/snapmgr.h"
 
 /* GUCs */
@@ -267,6 +268,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	McxtDumpShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..106e125cc2 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -28,6 +28,7 @@
 #include "storage/shmem.h"
 #include "storage/

Re: Get memory contexts of an arbitrary backend process

2021-01-14 Thread torikoshia

Since pg_get_target_backend_memory_contexts() waits for the target
process to dump memory, it could lead to a deadlock, as below.

  - session1
  BEGIN; TRUNCATE t;

  - session2
  BEGIN; TRUNCATE t; -- wait

  - session1
  SELECT * FROM pg_get_target_backend_memory_contexts(<pid of session2>); -- wait



Thanks for notifying me, Fujii-san.


Attached a v8 patch that prohibits calling the function inside a
transaction.



Regards,

--
Atsushi TorikoshiFrom 840185c1ad40cb7bc40333ab38927667c4d48c1d Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 14 Jan 2021 18:20:43 +0900
Subject: [PATCH v8] After commit 3e98c0bafb28de, we can display the usage of
 the memory contexts using pg_backend_memory_contexts system view. However,
 its target is limited to the process attached to the current session. This
 patch introduces pg_get_target_backend_memory_contexts() and makes it
 possible to collect memory contexts of the specified process.

---
 src/backend/access/transam/xlog.c|   7 +
 src/backend/catalog/system_views.sql |   3 +-
 src/backend/postmaster/pgstat.c  |   3 +
 src/backend/replication/basebackup.c |   3 +
 src/backend/storage/ipc/ipci.c   |   2 +
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lwlocknames.txt |   1 +
 src/backend/tcop/postgres.c  |   5 +
 src/backend/utils/adt/mcxtfuncs.c| 742 ++-
 src/backend/utils/init/globals.c |   1 +
 src/bin/initdb/initdb.c  |   3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   4 +-
 src/bin/pg_rewind/filemap.c  |   3 +
 src/include/catalog/pg_proc.dat  |  12 +-
 src/include/miscadmin.h  |   1 +
 src/include/pgstat.h |   3 +-
 src/include/storage/procsignal.h |   1 +
 src/include/utils/mcxtfuncs.h|  44 ++
 18 files changed, 821 insertions(+), 21 deletions(-)
 create mode 100644 src/include/utils/mcxtfuncs.h

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b18257c198..45381c343a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -74,6 +74,7 @@
 #include "storage/sync.h"
 #include "utils/builtins.h"
 #include "utils/guc.h"
+#include "utils/mcxtfuncs.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 #include "utils/relmapper.h"
@@ -7009,6 +7010,12 @@ StartupXLOG(void)
 		 */
 		pgstat_reset_all();
 
+		/*
+		 * Reset dump files in pg_memusage, because target processes do
+		 * not exist any more.
+		 */
+		RemoveMemcxtFile(0);
+
 		/*
 		 * If there was a backup label file, it's done its job and the info
 		 * has now been propagated into pg_control.  We must get rid of the
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5d89e77dbe..7419c496b2 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -558,7 +558,8 @@ CREATE VIEW pg_backend_memory_contexts AS
 SELECT * FROM pg_get_backend_memory_contexts();
 
 REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
-REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_target_backend_memory_contexts FROM PUBLIC;
 
 -- Statistics views
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3f24a33ef1..8eb2d062b0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4045,6 +4045,9 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_XACT_GROUP_UPDATE:
 			event_name = "XactGroupUpdate";
 			break;
+		case WAIT_EVENT_DUMP_MEMORY_CONTEXT:
+			event_name = "DumpMemoryContext";
+			break;
 			/* no default case, so that compiler will warn */
 	}
 
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 0f54635550..c67e71d79b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -184,6 +184,9 @@ static const char *const excludeDirContents[] =
 	/* Contents zeroed on startup, see StartupSUBTRANS(). */
 	"pg_subtrans",
 
+	/* Skip memory context dump files. */
+	"pg_memusage",
+
 	/* end of list */
 	NULL
 };
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index f9bbe97b50..18a1dd5a74 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -45,6 +45,7 @@
 #include "storage/procsignal.h"
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
+#include "utils/mcxtfuncs.h"
 #include "utils/snapmgr.h"
 
 /* GUCs */
@@ -267,6 +268,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	McxtDumpShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 583efaecff..106e

Re: adding wait_start column to pg_locks

2021-01-14 Thread torikoshia

Thanks for your review and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
Looking at the code, this happens as the wait start time is being
recorded in the lock record itself, so it always contains the value
reported by the latest lock acquisition attempt.


I think you are right and wait_start should not be recorded
in the LOCK.


On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

On Fri, Jan 15, 2021 at 3:45, Robert Haas wrote:


On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
 wrote:

It looks like the logical place to store the value is in the
PROCLOCK structure; ...


That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and
thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.


Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the
process's
PROCLOCK, rather than in the lock itself (which in the original
version
of the patch resulted in all processes with an interest in the lock
appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted).


AFAIU, it seems possible to record wait_start in the PROCLOCK, but it
would be redundant since each process can wait for at most one lock.

To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.
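As a toy illustration of that reasoning (a hypothetical Python class, not the actual PGPROC/PROCLOCK structs): because a backend blocks on at most one lock at a time, a single per-process field is sufficient.

```python
class ToyPGPROC:
    """Hypothetical per-backend record; one wait_start field suffices."""
    def __init__(self, pid):
        self.pid = pid
        self.wait_start = None  # set while waiting, cleared on grant

    def start_waiting(self, now):
        # A blocked backend cannot initiate a second lock wait,
        # so this field can never already be set here.
        assert self.wait_start is None, "already waiting on a lock"
        self.wait_start = now

    def lock_granted(self):
        self.wait_start = None

proc = ToyPGPROC(pid=1234)
proc.start_waiting(now=1000)
print(proc.wait_start)  # 1000
proc.lock_granted()
print(proc.wait_start)  # None
```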


Regards,

--
Atsushi Torikoshi




Re: adding wait_start column to pg_locks

2021-01-17 Thread torikoshia

On 2021-01-15 15:23, torikoshia wrote:

Thanks for your review and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
Looking at the code, this happens as the wait start time is being
recorded in the lock record itself, so it always contains the value
reported by the latest lock acquisition attempt.


I think you are right and wait_start should not be recorded
in the LOCK.


On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

On Fri, Jan 15, 2021 at 3:45, Robert Haas wrote:


On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
 wrote:

It looks like the logical place to store the value is in the
PROCLOCK structure; ...


That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and
thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.


Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the
process's
PROCLOCK, rather than in the lock itself (which in the original
version
of the patch resulted in all processes with an interest in the lock
appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted).


AFAIU, it seems possible to record wait_start in the PROCLOCK, but it
would be redundant since each process can wait for at most one lock.

To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.


Attached a patch.

I noticed that the previous patches left wait_start untouched even
after the lock was acquired.
The attached patch also fixes that.

Any thoughts?


Regards,

--
Atsushi TorikoshiFrom 62ff3e4dba7d45c260a62a33425cb2d1e6b822c9 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 18 Jan 2021 10:01:35 +0900
Subject: [PATCH v4] To examine the duration of locks, we did join on pg_locks
 and pg_stat_activity and used columns such as query_start or state_change.
 However, since they are the moment when queries have started or their state
 has changed, we could not get the exact lock duration in this way.

This patch adds a new field preserving the time at which locks started
waiting.
---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml   | 10 ++
 src/backend/storage/lmgr/lock.c  |  8 
 src/backend/storage/lmgr/proc.c  |  4 
 src/backend/utils/adt/lockfuncs.c|  9 -
 src/include/catalog/pg_proc.dat  |  6 +++---
 src/include/storage/lock.h   |  2 ++
 src/include/storage/proc.h   |  1 +
 src/test/regress/expected/rules.out  |  5 +++--
 9 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
 AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath 
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start 
+--+--+--+--+---++---+-+---+--++-+--+-+--+
 (0 rows)
 
 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..a5ce0835a9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10589,6 +10589,16 @@ SCRAM-SHA-256$<iteration count>:&l
lock table
   
  
+
+ 
+  
+   wait_start timestamptz
+  
+  
+   Lock acquisition wait start time. NULL if
+   lock acquired.
+  
+ 
 

   
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..5b5fb474e0 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3627,6 +3627,12 @@ GetLockStatusData(void)
 			instance->leaderPid = proc->pid;
 			instance->fastpath = true;
 
+			/*
+			 * Successfully taking fast path lock means there were no
+			 * conflicting locks.
+			 */
+			instance->wait_start = 0;
+
 			el++;
 		}
 
@@ -3

TOAST condition for column size

2021-01-18 Thread torikoshia

Hi,

When I created a table consisting of 400 VARCHAR columns and tried
to INSERT a record whose column values were all the same size, there
were cases where I got an error for exceeding the size limit per
row.

  =# -- create a table consisting of 400 VARCHAR columns
  =# CREATE TABLE t1 (c1 VARCHAR(100),
  c2 VARCHAR(100),
  ...
  c400 VARCHAR(100));

  =# -- insert one record whose values are all 20 bytes
  =# INSERT INTO t1 VALUES (repeat('a', 20),
repeat('a', 20),
...
repeat('a', 20));
ERROR:  row is too big: size 8424, maximum size 8160

What is interesting is that it failed only when the size of each
column was 20~23 bytes, as shown below.

  size of each column  |  result
  ---
  18 bytes |  success
  19 bytes |  success
  20 bytes |  failure
  21 bytes |  failure
  22 bytes |  failure
  23 bytes |  failure
  24 bytes |  success
  25 bytes |  success


When the size of each column was 19 bytes or less, the INSERT
succeeded because the row size fit within a page.
When the size of each column was 24 bytes or more, it also
succeeded because the columns were TOASTed and the row size was
reduced to less than one page.
OTOH, when the size was more than 19 bytes and less than 24 bytes,
the columns weren't TOASTed because they didn't meet the condition
of the following if statement.

 --src/backend/access/table/toast_helper.c

   toast_tuple_find_biggest_attribute(ToastTupleContext *ttc,
 bool for_compression, bool check_main)
   ...(snip)...
   int32biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
   ...(snip)...
   if (ttc->ttc_attr[i].tai_size > biggest_size) // <- here
   {
   biggest_attno = i;
   biggest_size = ttc->ttc_attr[i].tai_size;
   }


Since TOAST_POINTER_SIZE is 18 bytes but
MAXALIGN(TOAST_POINTER_SIZE) is 24 bytes, columns are not TOASTed
until their size becomes larger than 24 bytes.

I confirmed these sizes in my environment, but AFAIU they would be
the same in any environment.

So, as a result of the alignment adjustment, sizes of 20~23 bytes
fail.
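The threshold effect can be sketched as follows (assuming 8-byte MAXALIGN, as on typical 64-bit platforms; this mirrors only the size filter, not the rest of the TOAST logic):

```python
MAXIMUM_ALIGNOF = 8        # assumption: 8-byte maxalign (64-bit platforms)
TOAST_POINTER_SIZE = 18    # size of an external TOAST pointer datum

def maxalign(n):
    """Round n up to the next multiple of MAXIMUM_ALIGNOF."""
    return (n + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)

def is_toast_candidate(attr_size, align_threshold=True):
    """Mirror the size filter in toast_tuple_find_biggest_attribute()."""
    threshold = (maxalign(TOAST_POINTER_SIZE) if align_threshold
                 else TOAST_POINTER_SIZE)
    return attr_size > threshold

print(maxalign(TOAST_POINTER_SIZE))  # 24
# With the MAXALIGN'd threshold, 20-23 byte attributes are never picked:
print([s for s in range(18, 26) if is_toast_candidate(s)])
# [25]
print([s for s in range(18, 26) if is_toast_candidate(s, False)])
# [19, 20, 21, 22, 23, 24, 25]
```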

I wonder if it might be better not to apply the alignment here, as
in the attached patch, since that allows inserting records with
20~23 byte values.
Or are there reasons for the alignment here?

I understand that TOAST is not effective for small data and that it's
not recommended to create a table containing hundreds of columns,
but I think cases that could succeed should succeed.

Any thoughts?


Regards,

--
Atsushi Torikoshidiff --git a/src/backend/access/table/toast_helper.c b/src/backend/access/table/toast_helper.c
index fb36151ce5..e916c0f95c 100644
--- a/src/backend/access/table/toast_helper.c
+++ b/src/backend/access/table/toast_helper.c
@@ -183,7 +183,7 @@ toast_tuple_find_biggest_attribute(ToastTupleContext *ttc,
 	TupleDesc	tupleDesc = ttc->ttc_rel->rd_att;
 	int			numAttrs = tupleDesc->natts;
 	int			biggest_attno = -1;
-	int32		biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
+	int32		biggest_size = TOAST_POINTER_SIZE;
 	int32		skip_colflags = TOASTCOL_IGNORE;
 	int			i;
 


Re: TOAST condition for column size

2021-01-20 Thread torikoshia

On 2021-01-19 19:32, Amit Kapila wrote:

On Mon, Jan 18, 2021 at 7:53 PM torikoshia
Because no benefit is to be expected by compressing it. The size will
be mostly the same. Also, even if we somehow try to fit this data via
toast, I think reading speed will be slower because for all such
columns an extra fetch from toast would be required. Another thing is
you or others can still face the same problem with 17-byte column
data. I don't think this is the right way to fix it. I don't have many
good ideas, but I think you can try (a) increasing the block size
during configure, (b) reducing the number of columns, or (c) creating
char columns of a somewhat bigger size, say greater than 24 bytes, to
accommodate your case.

I know none of these are good workarounds but at this moment I can't
think of better alternatives.


Thanks for your explanation and workarounds!



On 2021-01-20 00:40, Tom Lane wrote:

Dilip Kumar  writes:
On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila  
wrote:

Won't it be safe because we don't align individual attrs of type
varchar where the length is less than or equal to 127?



Yeah right,  I just missed that point.


Yeah, the minimum on biggest_size has nothing to do with alignment
decisions.  It's just a filter to decide whether it's worth trying
to toast anything.
Having said that, I'm pretty skeptical of this patch: I think its
most likely real-world effect is going to be to waste cycles (and
create TOAST-table bloat) on the way to failing anyway.  I do not
think that toasting a 20-byte field down to 18 bytes is likely to be
a productive thing to do in typical situations.  The given example
looks like a cherry-picked edge case rather than a useful case to
worry about.


I agree with you; it seems to help only when there are many columns
with 19~23 bytes of data, which is not a normal case.
I'm not sure, but a rare exception might be some geographic data.
That's the situation in which I heard this problem happened.


Regards,

--
Atsushi Torikoshi




Re: adding wait_start column to pg_locks

2021-01-21 Thread torikoshia

On 2021-01-21 12:48, Fujii Masao wrote:

Thanks for updating the patch! I think that this is a really useful
feature!!


Thanks for reviewing!


I have two minor comments.

+  role="column_definition">

+   wait_start timestamptz

The column name "wait_start" should be "waitstart" for the sake of 
consistency

with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of 
"lock_type",

"virtualtransaction" is used instead of "virtual_transaction", etc.

+   Lock acquisition wait start time. NULL if
+   lock acquired.



Agreed.

I also changed the variable name "wait_start" in struct PGPROC and
LockInstanceData to "waitStart" for the same reason.


There seems to be a case where the wait start time is NULL even when
"granted" is false. Would it be better to add a note about that case
to the docs? For example, I found that the wait start time is NULL
while the startup process is waiting for the lock. Is this the only
such case?


Thanks; this is because I set 'waitstart' only under the following
condition.

  ---src/backend/storage/lmgr/proc.c
  > 1250 if (!InHotStandby)

As far as I can tell, the startup process would be the only such
case.

For the startup process, it seems possible to set 'waitstart' in
ResolveRecoveryConflictWithLock(), so I did that in the attached
patch.


Any thoughts?


Regards,

--
Atsushi TorikoshiFrom 6beb1c61e72c797c915427ae4e36d6bab9e0594c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 22 Jan 2021 13:51:00 +0900
Subject: [PATCH v5] To examine the duration of locks, we did join on pg_locks
 and pg_stat_activity and used columns such as query_start or state_change.
 However, since they are the moment when queries have started or their state
 has changed, we could not get the exact lock duration in this way.

---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml   | 10 ++
 src/backend/storage/ipc/standby.c|  8 ++--
 src/backend/storage/lmgr/lock.c  |  8 
 src/backend/storage/lmgr/proc.c  |  4 
 src/backend/utils/adt/lockfuncs.c|  9 -
 src/include/catalog/pg_proc.dat  |  6 +++---
 src/include/storage/lock.h   |  2 ++
 src/include/storage/proc.h   |  1 +
 src/test/regress/expected/rules.out  |  5 +++--
 10 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
 AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath 
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart 
+--+--+--+--+---++---+-+---+--++-+--+-+--+---
 (0 rows)
 
 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..ba003ce393 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10589,6 +10589,16 @@ SCRAM-SHA-256$:&l
lock table
   
  
+
+ 
+  
+   waitstart timestamptz
+  
+  
+   Lock acquisition wait start time. NULL if
+   lock acquired
+  
+ 
 

   
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..819e00e4ab 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,17 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
 void
 ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
 {
-	TimestampTz ltime;
+	TimestampTz ltime, now;
 
 	Assert(InHotStandby);
 
 	ltime = GetStandbyLimitTime();
+	now = GetCurrentTimestamp();
 
-	if (GetCurrentTimestamp() >= ltime && ltime != 0)
+	if (MyProc->waitStart == 0)
+		MyProc->waitStart = now;
+
+	if (now >= ltime && ltime != 0)
 	{
 		/*
 		 * We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..ffad4e94bc 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3627,6 +3627,12 @@ GetLockStatusData(void)
 			i

Re: adding wait_start column to pg_locks

2021-02-02 Thread torikoshia

On 2021-01-25 23:44, Fujii Masao wrote:

Another comment: doesn't the change of MyProc->waitStart need the
lock table's partition lock? If yes, we can do that by moving
LWLockRelease(partitionLock) to just after the change of
MyProc->waitStart, but that would lengthen the time the lwlock is
held. So maybe we need another way to do it.


Thanks for your comments!

It would be ideal for the consistency of the view to record
"waitstart" while holding the lock table's partition lock.
However, as you pointed out, that would have a non-negligible
performance impact.


I may be missing something, but as far as I can see, the effect of
not holding the lock is that "waitstart" can be NULL even though
"granted" is false.


I think people want to know the wait start time mainly when locks
have been held for a long time, and in that case "waitstart" should
already have been recorded.

If this is true, I think the current implementation may be enough,
on the condition that users understand that "waitstart" can be NULL
while "granted" is false.


Attached a patch describing this in the doc and comments.


Any thoughts?

Regards,


--
Atsushi TorikoshiFrom 03c6e1ed6ffa215ee898b5a6a75d77277fb8e672 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 2 Feb 2021 21:32:36 +0900
Subject: [PATCH v6] To examine the duration of locks, we did join on pg_locks
 and pg_stat_activity and used columns such as query_start or state_change.
 However, since they are the moment when queries have started or their state
 has changed, we could not get the lock duration in this way. This patch adds
 a new field "waitstart" preserving lock acquisition wait start time.

Note that updating this field and lock acquisition are not performed
synchronously for performance reasons.  Therefore, depending on the
timing, it can happen that waitstart is NULL even though granted is
false.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a...@oss.nttdata.com
---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml   | 14 ++
 src/backend/storage/ipc/standby.c| 16 ++--
 src/backend/storage/lmgr/lock.c  |  8 
 src/backend/storage/lmgr/proc.c  | 10 ++
 src/backend/utils/adt/lockfuncs.c|  9 -
 src/include/catalog/pg_proc.dat  |  6 +++---
 src/include/storage/lock.h   |  2 ++
 src/include/storage/proc.h   |  1 +
 src/test/regress/expected/rules.out  |  5 +++--
 10 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
 AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath 
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart 
+--+--+--+--+---++---+-+---+--++-+--+-+--+---
 (0 rows)
 
 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..d81d6e1c52 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10592,6 +10592,20 @@ SCRAM-SHA-256$:&l
lock table
   
  
+
+ 
+  
+   waitstart timestamptz
+  
+  
+   Lock acquisition wait start time.
+   Note that updating this field and lock acquisition are not performed
+   synchronously for performance reasons.  Therefore, depending on the
+   timing, it can happen that waitstart is
+   NULL even though
+   granted is false.
+  
+ 
 

   
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..2282229568 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,25 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
 void
 ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
 {
-	TimestampTz ltime;
+	TimestampTz ltime, now;
 
 	Assert(InHotStandby);
 
 	ltime = GetStandbyLimitTime();
+	now = GetCurrentTimestamp();
 
-	if (GetCurrentTimestamp() >= ltime && ltime != 0

Re: Is it useful to record whether plans are generic or custom?

2021-02-03 Thread torikoshia

Chengxi Sun, Yamada-san, Horiguchi-san,

Thanks for all your comments.
Adding only the number of generic plan executions seems acceptable.

On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi 
 wrote:

Note that ActivePortal is the closest nested portal. So it gives the
wrong result for nested portals.


I may be wrong, but I thought it was ok since the closest nested portal 
is the portal to be executed.


ActivePortal is used in ExecutorStart hook in the patch.
And as far as I read PortalStart(), ActivePortal is changed to the 
portal to be executed before ExecutorStart().


If possible, could you tell me the specific case which causes wrong 
results?


Regards,

--
Atsushi Torikoshi




Re: adding wait_start column to pg_locks

2021-02-04 Thread torikoshia

On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So spinlock is 
necessary when updating "waitStart" without holding the partition 
lock? Also GetLockStatusData() needs spinlock when reading 
"waitStart"?


Also it might be worth thinking to use 64-bit atomic operations like
pg_atomic_read_u64(), for that.


Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and 
pg_atomic_write_u64().


waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx 
and pg_atomic_write_xxx only support unsigned integers, so I cast the type.


I may not be using these functions correctly, so if something is wrong, 
I would appreciate any comments.



About the documentation, since your suggestion seems better than v6, I 
used it as is.



Regards,

--
Atsushi TorikoshiFrom 38a3d8996c4b1690cf18cdb1015e270201d34330 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 4 Feb 2021 23:23:36 +0900
Subject: [PATCH v7] To examine the duration of locks, we did join on pg_locks
 and pg_stat_activity and used columns such as query_start or state_change.
 However, since they are the moment when queries have started or their state
 has changed, we could not get the lock duration in this way. This patch adds
 a new field "waitstart" preserving lock acquisition wait start time.

Note that updating this field and lock acquisition are not performed
synchronously for performance reasons.  Therefore, depending on the
timing, it can happen that waitstart is NULL even though granted is
false.

Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a...@oss.nttdata.com

---
 contrib/amcheck/expected/check_btree.out |  4 ++--
 doc/src/sgml/catalogs.sgml   | 13 +
 src/backend/storage/ipc/standby.c| 17 +++--
 src/backend/storage/lmgr/lock.c  |  8 
 src/backend/storage/lmgr/proc.c  | 14 ++
 src/backend/utils/adt/lockfuncs.c|  9 -
 src/include/catalog/pg_proc.dat  |  6 +++---
 src/include/storage/lock.h   |  2 ++
 src/include/storage/proc.h   |  1 +
 src/test/regress/expected/rules.out  |  5 +++--
 10 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
 SELECT * FROM pg_locks
 WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
 AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath 
---+--+--+--+---++---+-+---+--++-+--+-+--
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart 
+--+--+--+--+---++---+-+---+--++-+--+-+--+---
 (0 rows)
 
 COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..7df4c30a65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10592,6 +10592,19 @@ SCRAM-SHA-256$:&l
lock table
   
  
+
+ 
+  
+   waitstart timestamptz
+  
+  
+   Time when the server process started waiting for this lock,
+   or null if the lock is held.
+   Note that this can be null for a very short period of time after
+   the wait started even though granted
+   is false.
+  
+ 
 

   
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..1c8135ba74 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,26 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
 void
 ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
 {
-	TimestampTz ltime;
+	TimestampTz ltime, now;
 
 	Assert(InHotStandby);
 
 	ltime = GetStandbyLimitTime();
+	now = GetCurrentTimestamp();
 
-	if (GetCurrentTimestamp() >= ltime && ltime != 0)
+	/*
+	 * Record waitStart using the current time obtained for comparison
+	 * with ltime.
+	 *
+	 * It would be ideal if this could be done synchronously with updating
+	 * the lock information.  However, since holding partitionLock for a
+	 * longer time would hurt performance, we do it here asynchronously.
+	 */
+	if (pg_atomic_read_u64(&MyProc->waitStart) ==

Re: Is it useful to record whether plans are generic or custom?

2021-02-07 Thread torikoshia

On 2021-02-04 11:19, Kyotaro Horiguchi wrote:

At Thu, 04 Feb 2021 10:16:47 +0900, torikoshia
 wrote in

Chengxi Sun, Yamada-san, Horiguchi-san,

Thanks for all your comments.
Adding only the number of generic plan execution seems acceptable.

On Mon, Jan 25, 2021 at 2:10 PM Kyotaro Horiguchi
 wrote:
> Note that ActivePortal is the closest nested portal. So it gives the
> wrong result for nested portals.

I may be wrong, but I thought it was ok since the closest nested
portal is the portal to be executed.


After executing the inner-most portal, is_plan_type_generic holds the
value for the inner-most portal and is never changed afterwards. The
ExecutorEnd of every upper portal then sees the value left behind by the
inner-most portal, even though the portals at each nest level are
independent.


ActivePortal is used in ExecutorStart hook in the patch.
And as far as I read PortalStart(), ActivePortal is changed to the
portal to be executed before ExecutorStart().

If possible, could you tell me the specific case which causes wrong
results?


Running a plpgsql function that does PREPARE in a query that does
PREPARE?


Thanks for your explanation!

I confirmed that it does in fact happen.

To avoid it, the attached patch preserves is_plan_type_generic before 
changing it and sets it back at the end of pgss_ExecutorEnd().


Any thoughts?


Regards,

--
Atsushi Torikoshidiff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 0f63f08f7e..7fdef315ae 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT blk_write_time float8,
 OUT wal_records int8,
 OUT wal_fpi int8,
-OUT wal_bytes numeric
+OUT wal_bytes numeric,
+OUT generic_calls int8
 )
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..f5801016d6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -77,10 +77,12 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/spin.h"
+#include "tcop/pquery.h"
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/plancache.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
@@ -192,6 +194,7 @@ typedef struct Counters
 	int64		wal_records;	/* # of WAL records generated */
 	int64		wal_fpi;		/* # of WAL full page images generated */
 	uint64		wal_bytes;		/* total amount of WAL generated in bytes */
+	int64		generic_calls;	/* # of times generic plans executed */
 } Counters;
 
 /*
@@ -277,6 +280,10 @@ static int	exec_nested_level = 0;
 /* Current nesting depth of planner calls */
 static int	plan_nested_level = 0;
 
+/* Current and previous plan type */
+static bool	is_plan_type_generic = false;
+static bool	is_prev_plan_type_generic = false;
+
 /* Saved hook values in case of unload */
 static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
@@ -1034,6 +1041,23 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	 */
 	if (pgss_enabled(exec_nested_level) && queryDesc->plannedstmt->queryId != UINT64CONST(0))
 	{
+		/*
+		 * Since ActivePortal is not available at ExecutorEnd, we preserve
+		 * the current and previous plan type here.
+		 * Previous one is necessary since portals can be nested.
+		 */
+		Assert(ActivePortal);
+
+		is_prev_plan_type_generic = is_plan_type_generic;
+
+		if (ActivePortal->cplan)
+		{
+			if (ActivePortal->cplan->is_generic)
+				is_plan_type_generic = true;
+			else
+				is_plan_type_generic = false;
+		}
+
 		/*
 		 * Set up to track total elapsed time in ExecutorRun.  Make sure the
 		 * space is allocated in the per-query context so it will go away at
@@ -1122,6 +1146,9 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
    NULL);
 	}
 
+	/* Storing is done.  Set is_plan_type_generic back to its original value. */
+	is_plan_type_generic = is_prev_plan_type_generic;
+
 	if (prev_ExecutorEnd)
 		prev_ExecutorEnd(queryDesc);
 	else
@@ -1427,6 +1454,8 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.max_time[kind] = total_time;
 			e->counters.mean_time[kind] = total_time;
 		}
+		else if (kind == PGSS_EXEC && is_plan_type_generic)
+			e->counters.generic_calls += 1;
 		else
 		{
 			/*
@@ -1510,8 +1539,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_1	18
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STAT

Re: adding wait_start column to pg_locks

2021-02-09 Thread torikoshia

On 2021-02-05 18:49, Fujii Masao wrote:

On 2021/02/05 0:03, torikoshia wrote:

On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So spinlock is 
necessary when updating "waitStart" without holding the partition 
lock? Also GetLockStatusData() needs spinlock when reading 
"waitStart"?


Also it might be worth thinking to use 64-bit atomic operations like
pg_atomic_read_u64(), for that.


Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and 
pg_atomic_write_u64().


waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx 
and pg_atomic_write_xxx only supports unsigned int, so I cast the 
type.


I may be using these functions not correctly, so if something is 
wrong, I would appreciate any comments.



About the documentation, since your suggestion seems better than v6, I 
used it as is.


Thanks for updating the patch!

+   if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+   pg_atomic_write_u64(&MyProc->waitStart,
+   
pg_atomic_read_u64((pg_atomic_uint64 *) &now));

pg_atomic_read_u64() is really necessary? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+   deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+   pg_atomic_write_u64(&MyProc->waitStart,
+   pg_atomic_read_u64((pg_atomic_uint64 *) 
&deadlockStart));

Same as above.

+   /*
+* Record waitStart reusing the deadlock timeout timer.
+*
+* It would be ideal this can be synchronously done with 
updating
+* lock information. Howerver, since it gives performance 
impacts
+* to hold partitionLock longer time, we do it here 
asynchronously.
+*/

IMO it's better to comment why we reuse the deadlock timeout timer.

proc->waitStatus = waitStatus;
+   pg_atomic_init_u64(&MyProc->waitStart, 0);

pg_atomic_write_u64() should be used instead? Because waitStart can be
accessed concurrently there.

I updated the patch and addressed the above review comments. Patch 
attached.

Barring any objection, I will commit this version.


Thanks for modifying the patch!
I agree with your comments.

BTW, I ran pgbench several times before and after applying
this patch.

The environment is a virtual machine (CentOS 8), so this is
just for reference, but there was no significant difference
in latency or tps (both differed by less than 1%).


Regards,

--
Atsushi Torikoshi




Re: adding wait_start column to pg_locks

2021-02-09 Thread torikoshia

On 2021-02-09 22:54, Fujii Masao wrote:

On 2021/02/09 19:11, Fujii Masao wrote:



On 2021/02/09 18:13, Fujii Masao wrote:



On 2021/02/09 17:48, torikoshia wrote:

On 2021-02-05 18:49, Fujii Masao wrote:

On 2021/02/05 0:03, torikoshia wrote:

On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So spinlock is 
necessary when updating "waitStart" without holding the 
partition lock? Also GetLockStatusData() needs spinlock when 
reading "waitStart"?


Also it might be worth thinking to use 64-bit atomic operations 
like

pg_atomic_read_u64(), for that.


Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and 
pg_atomic_write_u64().


waitStart is TimestampTz i.e., int64, but it seems 
pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned 
int, so I cast the type.


I may be using these functions not correctly, so if something is 
wrong, I would appreciate any comments.



About the documentation, since your suggestion seems better than 
v6, I used it as is.


Thanks for updating the patch!

+    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+    pg_atomic_write_u64(&MyProc->waitStart,
+    pg_atomic_read_u64((pg_atomic_uint64 
*) &now));


pg_atomic_read_u64() is really necessary? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+    deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+    pg_atomic_write_u64(&MyProc->waitStart,
+    pg_atomic_read_u64((pg_atomic_uint64 *) 
&deadlockStart));


Same as above.

+    /*
+ * Record waitStart reusing the deadlock timeout timer.
+ *
+ * It would be ideal this can be synchronously done with 
updating
+ * lock information. Howerver, since it gives performance 
impacts
+ * to hold partitionLock longer time, we do it here 
asynchronously.

+ */

IMO it's better to comment why we reuse the deadlock timeout timer.

 proc->waitStatus = waitStatus;
+    pg_atomic_init_u64(&MyProc->waitStart, 0);

pg_atomic_write_u64() should be used instead? Because waitStart can 
be

accessed concurrently there.

I updated the patch and addressed the above review comments. Patch 
attached.

Barring any objection, I will commit this version.


Thanks for modifying the patch!
I agree with your comments.

BTW, I ran pgbench several times before and after applying
this patch.

The environment is virtual machine(CentOS 8), so this is
just for reference, but there were no significant difference
in latency or tps(both are below 1%).


Thanks for the test! I pushed the patch.


But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of this failures.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10


-relation | locktype |mode
--+--+-
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR:  invalid spinlock number: 0

"rorqual" reported that the above error happened in the server built 
with

--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that 
"waitStart"
atomic variable in the dummy proc created at the end of prepare 
transaction
was not initialized. I updated the patch so that pg_atomic_init_u64() 
is

called for the "waitStart" in the dummy proc for prepared transaction.
Patch attached. I confirmed that the patched server built with
--disable-atomics --disable-spinlocks passed all the regression tests.


Thanks for fixing the bug, I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests have passed.



BTW, while investigating this issue, I found that pg_stat_wal_receiver
also could cause this error even in the current master (without the patch).
I will report that in a separate thread.



https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16


"prion" reported the following error. But I'm not sure how the changes 
of
pg_locks caused this error. I found that Heikki also reported at [1] 
that

"prion" failed with the same error but was not sure how it happened.
This makes me think for now that this issue is not directly related to
the pg_locks changes.


Thanks! I was wondering how these errors were related to the commit.


Regards,

--
Atsushi Torikoshi


-
pg_dump: error: query failed: ERROR:  missing chunk number 0 for toast
value 1 in pg_toast_2619
pg_dump: error: query was: SELECT
a.attnum,
a.attname,
a.atttypmod,
a.attstattarget,
a.attstorage,
t.typstorage

Re: RFC: Logging plan of the running query

2022-01-06 Thread torikoshia

On 2021-11-26 12:39, torikoshia wrote:
Since the patch could not be applied to HEAD anymore, I also
updated it.


Updated the patch to fix a compiler warning about the format on
Windows.



--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 7 Jan 2022 12:31:03 +0900
Subject: [PATCH v15] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which could
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some code, tests and comments of
pg_log_query_plan() are the same as those of
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby

---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/storage/signalfuncs.h|  22 
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 19 files changed, 355 insertions(+), 63 deletions(-)
 create mode 100644 src/include/storage/signalfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e58efce586..9804574c10 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25430,6 +25430,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25543,6 +25563,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  plan of the query running on backend with PID 17793 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.
+   
+
   
 
   
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 3a4fa9091b..173e268be3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
 
 REVOKE EXECUTE ON FUNCTION pg_ls_replslotdi

Re: RFC: Logging plan of the running query

2022-01-07 Thread torikoshia

On 2022-01-07 14:30, torikoshia wrote:

Updated the patch to fix a compiler warning about the format on
Windows.


I got another compiler warning, so I updated the patch again.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 7 Jan 2022 19:38:29 +0900
Subject: [PATCH v16] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which could
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some code, tests and comments of
pg_log_query_plan() are the same as those of
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby

---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/storage/signalfuncs.h|  22 
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 19 files changed, 355 insertions(+), 63 deletions(-)
 create mode 100644 src/include/storage/signalfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e58efce586..9804574c10 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25430,6 +25430,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25543,6 +25563,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  plan of the query running on backend with PID 17793 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.
+   
+
   
 
   
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 3a4fa9091b..173e268be3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_logicalmapdir() FROM PUBLIC;
 
 REVOKE EXECUTE ON FUNCTION pg_ls_replslotdir(text) FROM PUBLIC;

Re: Should we improve "PID XXXX is not a PostgreSQL server process" warning for pg_terminate_backend(<>)?

2021-03-16 Thread torikoshia

On 2021-03-16 20:51, Bharath Rupireddy wrote:
On Mon, Mar 15, 2021 at 11:23 AM torikoshia 
 wrote:


On 2021-03-07 19:16, Bharath Rupireddy wrote:
> On Fri, Feb 5, 2021 at 5:15 PM Bharath Rupireddy
>  wrote:
>>
>> pg_terminate_backend and pg_cancel_backend with postmaster PID produce
>> "PID  is not a PostgreSQL server process" warning [1], which
>> basically implies that the postmaster is not a PostgreSQL process at
>> all. This is a bit misleading because the postmaster is the parent of
>> all PostgreSQL processes. Should we improve the warning message if the
>> given PID is postmasters' PID?

+1. I felt it was a bit confusing when reviewing a thread[1].


Hmmm.


> I'm attaching a small patch that emits a warning "signalling
> postmaster with PID %d is not allowed" for postmaster and "signalling
> PostgreSQL server process with PID %d is not allowed" for auxiliary
> processes such as checkpointer, background writer, walwriter.
>
> However, for stats collector and sys logger processes, we still get
> "PID X is not a PostgreSQL server process" warning because they
> don't have PGPROC entries(??). So BackendPidGetProc and
> AuxiliaryPidGetProc will not help and even pg_stat_activity is not
> having these processes' pid.

I also ran into the same problem while creating a patch in [2].


I have not gone through that thread though. Is there any way we can
detect those child processes (stats collector, sys logger) that are
forked by the postmaster from a backend process? Thoughts?


I couldn't find a good way to do that, and thus I'm now considering
just changing the message.


I'm now wondering about changing the message to something like
"PID  is not a PostgreSQL backend process".

"backend process" is now defined in the Appendix as "Process of an
instance which acts on behalf of a client session and handles its
requests."


Yeah, that looks good to me. IIUC, we can just change the message from
"PID  is not a PostgreSQL server process" to "PID  is not a
PostgreSQL backend process" and we don't need to look for AuxiliaryProcs
or PostmasterPid.



Changing log messages can affect operations, especially when people
monitor the log message strings, but improving "PID  is not a
PostgreSQL server process" does not seem to cause such problems.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Get memory contexts of an arbitrary backend process

2021-03-17 Thread torikoshia

On 2021-03-05 14:22, Fujii Masao wrote:

On 2021/03/04 18:32, torikoshia wrote:

On 2021-01-14 19:11, torikoshia wrote:
Since pg_get_target_backend_memory_contexts() waits for the target to
dump memory, it could lead to a deadlock as below.

  - session1
  BEGIN; TRUNCATE t;

  - session2
  BEGIN; TRUNCATE t; -- wait

  - session1
  SELECT * FROM pg_get_target_backend_memory_contexts(); --wait


Thanks for notifying me, Fujii-san.


Attached v8 patch prohibits calling the function inside transactions.

Regrettably, this modification could not cope with advisory locks, and
I haven't come up with a good way to deal with it.

It seems to me that the architecture of the requestor waiting for the
dumper leads to this problem and complicates things.


Considering the discussion about printing backtraces[1], it seems
reasonable that the requestor just sends a signal and the dumper dumps
to the log file.


+1


Thanks!

I remade the patch and introduced a function
pg_print_backend_memory_contexts(PID) which prints the memory contexts 
of

the specified PID to elog.

  =# SELECT pg_print_backend_memory_contexts(450855);

  ** log output **
  2021-03-17 15:21:01.942 JST [450855] LOG:  Printing memory contexts of 
PID 450855
  2021-03-17 15:21:01.942 JST [450855] LOG:  level: 0 TopMemoryContext: 
68720 total in 5 blocks; 16312 free (15 chunks); 52408 used
  2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 Prepared Queries: 
65536 total in 4 blocks; 35088 free (14 chunks); 30448 used
  2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 pgstat 
TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 
chunks); 6784 used

  ..(snip)..
  2021-03-17 15:21:01.942 JST [450855] LOG:  level: 2 CachedPlanSource: 
4096 total in 3 blocks; 680 free (0 chunks); 3416 used: PREPARE hoge_200 
AS SELECT * FROM pgbench_accounts WHERE aid = 
1...
  2021-03-17 15:21:01.942 JST [450855] LOG:  level: 3 CachedPlanQuery: 
4096 total in 3 blocks; 464 free (0 chunks); 3632 used

  ..(snip)..
  2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 Timezones: 104128 
total in 2 blocks; 2584 free (0 chunks); 101544 used
  2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 ErrorContext: 8192 
total in 1 blocks; 7928 free (5 chunks); 264 used
  2021-03-17 15:21:01.945 JST [450855] LOG:  Grand total: 2802080 bytes 
in 1399 blocks; 480568 free (178 chunks); 2321512 used



As above, the output is almost the same as MemoryContextStatsPrint()
except for how the level is expressed.
MemoryContextStatsPrint() uses indents, but
pg_print_backend_memory_contexts() writes it as "level: %d".

Since there was discussion that enlarging a StringInfo may cause
errors on OOM[1], this patch calls elog for each context.

As with MemoryContextStatsPrint(), each context shows 100
children at most.
I once thought it should be configurable, but something like
pg_print_backend_memory_contexts(PID, num_children) needs to send
the 'num_children' from requestor to dumper and it seems to require
another infrastructure.
Creating a new GUC for this seems overkill.
If MemoryContextStatsPrint(), i.e. showing 100 children at most is
enough, this hard limit may be acceptable.

Only superusers can call pg_print_backend_memory_contexts().

I'm going to add documentation and regression tests.


Any thoughts?


[1] 
https://www.postgresql.org/message-id/CAMsr%2BYGh%2Bsso5N6Q%2BFmYHLWC%3DBPCzA%2B5GbhYZSGruj2d0c7Vvg%40mail.gmail.com



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..e116f4a1be 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -30,6 +30,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/memutils.h"
 
 /*
  * The SIGUSR1 signal is multiplexed to support signaling multiple event
@@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void)
 	/* latch will be set by procsignal_sigusr1_handler */
 }
 
+/*
+ * HandleProcSignalPrintMemoryContext
+ *
+ * Handle receipt of an interrupt indicating print memory context.
+ * Signal handler portion of interrupt handling.
+ */
+static void
+HandleProcSignalPrintMemoryContext(void)
+{
+	InterruptPending = true;
+	PrintMemoryContextPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
+
 /*
  * Perform global barrier related interrupt checking.
  *
@@ -580,6 +595,25 @@ ProcessProcSignalBarrier(void)
 	ConditionVariableBroadcast(&MyProcSignalSlot->pss_barrierCV);
 }
 
+/*
+ * ProcessPrintMemoryContextInterrupt
+ *		The portion of print memory context interrupt handling that runs
+ *		outside of the signal handler.
+ */
+void
+ProcessPrintMemoryContextInterrupt(void)
+{
+	PrintMemoryContextPending = false;
+

Re: Get memory contexts of an arbitrary backend process

2021-03-21 Thread torikoshia

On 2021-03-18 15:09, Fujii Masao wrote:

Thanks for your comments!


On 2021/03/17 22:24, torikoshia wrote:

I remade the patch and introduced a function
pg_print_backend_memory_contexts(PID) which prints the memory contexts
of the specified PID to elog.


Thanks for the patch!



   =# SELECT pg_print_backend_memory_contexts(450855);

   ** log output **
   2021-03-17 15:21:01.942 JST [450855] LOG:  Printing memory contexts of PID 450855
   2021-03-17 15:21:01.942 JST [450855] LOG:  level: 0 TopMemoryContext: 68720 total in 5 blocks; 16312 free (15 chunks); 52408 used
   2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 Prepared Queries: 65536 total in 4 blocks; 35088 free (14 chunks); 30448 used
   2021-03-17 15:21:01.942 JST [450855] LOG:  level: 1 pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used

   ..(snip)..
   2021-03-17 15:21:01.942 JST [450855] LOG:  level: 2 CachedPlanSource: 4096 total in 3 blocks; 680 free (0 chunks); 3416 used: PREPARE hoge_200 AS SELECT * FROM pgbench_accounts WHERE aid = 1...
   2021-03-17 15:21:01.942 JST [450855] LOG:  level: 3 CachedPlanQuery: 4096 total in 3 blocks; 464 free (0 chunks); 3632 used

   ..(snip)..
   2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 Timezones: 104128 total in 2 blocks; 2584 free (0 chunks); 101544 used
   2021-03-17 15:21:01.945 JST [450855] LOG:  level: 1 ErrorContext: 8192 total in 1 blocks; 7928 free (5 chunks); 264 used
   2021-03-17 15:21:01.945 JST [450855] LOG:  Grand total: 2802080 bytes in 1399 blocks; 480568 free (178 chunks); 2321512 used



As above, the output is almost the same as MemoryContextStatsPrint()
except for how the nesting level is expressed:
MemoryContextStatsPrint() uses indentation, but
pg_print_backend_memory_contexts() writes it as "level: %d".


This format looks better to me.



Since there was discussion that enlarging a StringInfo may cause
errors on OOM[1], this patch calls elog() for each context.

As with MemoryContextStatsPrint(), each context shows 100
children at most.
I once thought it should be configurable, but something like
pg_print_backend_memory_contexts(PID, num_children) would need to send
'num_children' from the requestor to the dumper, which seems to require
additional infrastructure.
Creating a new GUC for this seems overkill.
If showing 100 children at most, as MemoryContextStatsPrint() does, is
enough, this hard limit may be acceptable.


Can't this number be passed via shared memory?


The attached patch uses static shared memory to pass the number.

As documented, in the current implementation, when multiple
pg_print_backend_memory_contexts() calls are made in succession or
simultaneously, max_children may end up being the value from another
pg_print_backend_memory_contexts() call.

I had tried to avoid this by adding some state information and using
before_shmem_exit() to clean up that state information on process
termination, as in the patch I presented earlier, but since kill()
returns success before the dumper runs its signal handler, there were
times when we could not clean up the state.

Since this happens only when multiple pg_print_backend_memory_contexts()
calls specify different numbers of children, and the effect is merely an
unintended number of children being printed, it might be acceptable.

Or it might be better to wait for some seconds if num_children in shared
memory is not the initialized value (meaning some other process is
requesting to print memory contexts).


Only superusers can call pg_print_backend_memory_contexts().


+   /* Only allow superusers to signal superuser-owned backends. */
+   if (superuser_arg(proc->roleId) && !superuser())

The patch seems to allow even non-superuser to request to print the 
memory
contexts if the target backend is owned by non-superuser. Is this 
intentional?

I think that only superuser should be allowed to execute
pg_print_backend_memory_contexts() whoever owns the target backend.
Because that function can cause lots of log messages.


Thanks, it's not intentional, modified it.


I'm going to add documentation and regression tests.


Added them.

Regards,

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..e834b923e4 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24781,6 +24781,33 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_print_backend_memory_contexts
+
+pg_print_backend_memory_contexts (
+  pid integer,
+  max_children integer )
+boolean
+   
+   
+Prints the memory contexts whose backend process has the specified
+process ID.
+max_children limits the max number of children
+to print per one parent context.
+Note that when multiple
+pg_print_

Re: Is it useful to record whether plans are generic or custom?

2021-03-23 Thread torikoshia

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!

I just tried this feature. When I set plan_cache_mode to 
force_generic_plan

and executed the following queries, I found that
pg_stat_statements.generic_calls
and pg_prepared_statements.generic_plans were not the same.
Is this behavior expected? I was thinking that they are basically the 
same.


It's not expected behavior, fixed.



DEALLOCATE ALL;
SELECT pg_stat_statements_reset();
PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;
EXECUTE hoge(1);
EXECUTE hoge(1);
EXECUTE hoge(1);

SELECT generic_plans, statement FROM pg_prepared_statements WHERE
statement LIKE '%hoge%';
 generic_plans |   statement
---+
 3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE 
aid = $1;


SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query
LIKE '%hoge%';
 calls | generic_calls | query
---+---+---
 3 | 2 | PREPARE hoge AS SELECT * FROM
pgbench_accounts WHERE aid = $1




When I executed the prepared statements via EXPLAIN ANALYZE, I found
pg_stat_statements.generic_calls was not incremented. Is this behavior 
expected?

Or we should count generic_calls even when executing the queries via
ProcessUtility()?


I think prepared statements executed via EXPLAIN ANALYZE should also be
counted, for consistency with pg_prepared_statements.

Since ActivePortal did not keep the plan type in the ProcessUtility_hook,
I moved the global variables 'is_plan_type_generic' and
'is_prev_plan_type_generic' from pg_stat_statements to plancache.c.



DEALLOCATE ALL;
SELECT pg_stat_statements_reset();
PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE aid = $1;
EXPLAIN ANALYZE EXECUTE hoge(1);
EXPLAIN ANALYZE EXECUTE hoge(1);
EXPLAIN ANALYZE EXECUTE hoge(1);

SELECT generic_plans, statement FROM pg_prepared_statements WHERE
statement LIKE '%hoge%';
 generic_plans |   statement
---+
 3 | PREPARE hoge AS SELECT * FROM pgbench_accounts WHERE 
aid = $1;


SELECT calls, generic_calls, query FROM pg_stat_statements WHERE query
LIKE '%hoge%';
 calls | generic_calls | query
---+---+---
 3 | 0 | PREPARE hoge AS SELECT * FROM
pgbench_accounts WHERE aid = $1
 3 | 0 | EXPLAIN ANALYZE EXECUTE hoge(1)




Regards,

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..887c4b2be8 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -251,6 +251,72 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t   | t | t
 (7 rows)
 
+--
+-- Track the number of generic plan
+--
+CREATE TABLE pgss_test (i int, j int, k int);
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SET plan_cache_mode TO force_generic_plan;
+SET pg_stat_statements.track_utility = TRUE;
+PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1;
+EXECUTE pgss_p1(1);
+ i 
+---
+(0 rows)
+
+-- EXPLAIN ANALYZE should be recorded
+PREPARE pgss_p2 AS SELECT j FROM pgss_test WHERE j = $1;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1);
+  QUERY PLAN   
+---
+ Seq Scan on pgss_test (actual rows=0 loops=1)
+   Filter: (j = $1)
+(2 rows)
+
+-- Nested Portal
+PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1;
+BEGIN;
+DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements;
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p2
+(1 row)
+
+EXECUTE pgss_p3(1);
+ k 
+---
+(0 rows)
+
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p1
+(1 row)
+
+COMMIT;
+SELECT calls, generic_calls, query FROM pg_stat_statements;
+ calls | generic_calls |  query   
+---+---+--
+ 1 | 0 | DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements
+ 0 | 0 | SELECT calls, generic_calls, query FROM pg_stat_statements
+ 1 | 1 | PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1
+ 2 | 0 | FETCH IN pgss_c1
+ 1 | 0 | BEGIN
+ 1 | 0 | SELECT pg_stat_statements_reset()
+ 1 | 1 | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE p

Re: Get memory contexts of an arbitrary backend process

2021-03-24 Thread torikoshia

On 2021-03-23 17:24, Kyotaro Horiguchi wrote:

Thanks for reviewing and suggestions!


At Mon, 22 Mar 2021 15:09:58 +0900, torikoshia
 wrote in

>> If MemoryContextStatsPrint(), i.e. showing 100 children at most is
>> enough, this hard limit may be acceptable.
> Can't this number be passed via shared memory?

The attached patch uses static shared memory to pass the number.


"pg_print_backend_memory_contexts"

That name looks as if it returns the result as text when used on the
command line. We could have pg_get_backend_memory_context(bool
dump_to_log (or where to dump), int limit).  Or couldn't we name it
differently even in the case we add a separate function?


Redefined pg_get_backend_memory_contexts() as
pg_get_backend_memory_contexts(pid, int max_children).

When pid equals 0, pg_get_backend_memory_contexts() prints local memory
contexts as original pg_get_backend_memory_contexts() does.
In this case, 'max_children' is ignored.

When 'pid' does not equal 0 and it is the PID of the client backend,
memory contexts are logged through elog().



+/*
+ * MaxChildrenPerContext
+ *   Max number of children to print per one parent context.
+ */
+int  *MaxChildrenPerContext = NULL;

Perhaps it'd be better to have a struct even if it consists only of
one member.  (Aligned) C-int values are atomic so we can omit the
McxtPrintLock. (I don't think it's a problem even if it is modifed
while reading^^:)


Fixed them.


+ if(max_children <= 0)
+ {
+ ereport(WARNING,
+ (errmsg("%d is invalid value", max_children),
+  errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));

It's annoying to choose a number large enough when I want to dump
children unlimitedly.  Couldn't we use 0 to specify "unlimited"?


Modified as you suggested.

+ (errmsg("%d is invalid value", max_children),
+  errhint("second parameter is the number of context and it must be set to a value greater than or equal to 1")));

For the main message, (I think) we usually spell the "%d is invalid
value" as "maximum number of children must be positive" or such.  For
the hint, we don't need a copy of the primary section of the
documentation here.


Modified it to "The maximum number of children must be greater than 0".



I think we should ERROR out for invalid parameters, at least for
max_children.  I'm not sure about pid since we might call it based on
pg_stat_activity..


Changed to ERROR out when the 'max_children' is less than 0.

Regarding pid, I left it untouched considering the consistency with
other signal-sending functions such as pg_cancel_backend().

+ if(!SendProcSignal(pid, PROCSIG_PRINT_MEMORY_CONTEXT, 
InvalidBackendId))


We know the backendid of the process here.


Added it.



+ if (is_dst_stderr)
+ {
+ for (i = 0; i <= level; i++)
+ fprintf(stderr, "  ");

The fprintf path is used nowhere in the patch at all. It can be used
while attaching a debugger but I'm not sure we need that code.  The
footprint of this patch shrinks considerably by removing it.


According to the past discussion[1], people wanted MemoryContextStats
as it was, so I think it's better that MemoryContextStats can be used
as before.



+ strcat(truncated_ident, delimiter);

strcpy is sufficient here.  And we don't need the delimiter to be a
variable.  (we can copy a string literal into truncate_ident, then
count the length of truncate_ident, instead of the delimiter
variable.)


True.

+ $current_logfiles = slurp_file($node->data_dir . 
'/current_logfiles');

...
+my $lfname = $current_logfiles;
+$lfname =~ s/^stderr //;
+chomp $lfname;

$node->logfile is the current log file name.

+ 'target PID is not PostgreSQL server process');

Maybe "check if PID check is working" or such?  And, we can do
something like the following to exercise in a more practical way.

 select pg_print_backend...(pid,) from pg_stat_activity where
backend_type = 'checkpointer';


It seems better.


As documented, in the current implementation, when multiple
pg_print_backend_memory_contexts() calls are made in succession or
simultaneously, max_children may end up being the value from another
pg_print_backend_memory_contexts().
I had tried to avoid this by adding some state information and using
before_shmem_exit() in case of process termination for cleaning up the
state information as in the patch I presented earlier, but since
kill()
returns success before the dumper called signal handler, it seemed
there were time

Re: Get memory contexts of an arbitrary backend process

2021-03-25 Thread torikoshia

On 2021-03-25 22:02, Fujii Masao wrote:

On 2021/03/25 0:17, torikoshia wrote:

On 2021-03-23 17:24, Kyotaro Horiguchi wrote:

Thanks for reviewing and suggestions!


The patched version failed to compile as follows. Could you fix this
issue?


Sorry, it included a header file that's not contained in
the current version patch.

Attached new one.

mcxtfuncs.c:22:10: fatal error: utils/mcxtfuncs.h: No such file or 
directory

 #include "utils/mcxtfuncs.h"
  ^~~
compilation terminated.
make[4]: *** [: mcxtfuncs.o] Error 1
make[4]: *** Waiting for unfinished jobs
make[3]: *** [../../../src/backend/common.mk:39: adt-recursive] Error 2
make[3]: *** Waiting for unfinished jobs
make[2]: *** [common.mk:39: utils-recursive] Error 2
make[1]: *** [Makefile:42: all-backend-recurse] Error 2
make: *** [GNUmakefile:11: all-src-recurse] Error 2

https://cirrus-ci.com/task/4621477321375744

Regards,
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1d3429fbd9..a4017a0760 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24821,6 +24821,37 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_get_backend_memory_contexts
+
+pg_get_backend_memory_contexts (
+  pid integer,
+  max_children integer )
+setof record
+   
+   
+Get memory contexts whose backend process has the specified process ID.
+max_children limits the max number of children
+to print per one parent context. 0 means unlimited.
+When pid equals 0,
+pg_get_backend_memory_contexts displays all
+the memory contexts of the local process regardless of
+max_children. 
+When pid does not equal 0,
+memory contexts will be printed based on the log configuration set.
+See  for more information.
+Only superusers can call this function even when the specified process
+is a non-superuser backend.
+Note that when multiple
+pg_get_backend_memory_contexts calls are made in
+succession or simultaneously, max_children can
+be the value from another
+pg_get_backend_memory_contexts call.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 0dca65dc7b..48a1a0e958 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
 CREATE VIEW pg_backend_memory_contexts AS
-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(0, 0);
 
 REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
-REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC;
 
 -- Statistics views
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..ed5393324a 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -46,6 +46,7 @@
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
 #include "utils/snapmgr.h"
+#include "utils/memutils.h"
 
 /* GUCs */
 int			shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -269,6 +270,7 @@ CreateSharedMemoryAndSemaphores(void)
 	BTreeShmemInit();
 	SyncScanShmemInit();
 	AsyncShmemInit();
+	McxtLogShmemInit();
 
 #ifdef EXEC_BACKEND
 
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..c61d5079e2 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -30,6 +30,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/memutils.h"
 
 /*
  * The SIGUSR1 signal is multiplexed to support signaling multiple event
@@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void)
 	/* latch will be set by procsignal_sigusr1_handler */
 }
 
+/*
+ * HandleProcSignalLogMemoryContext
+ *
+ * Handle receipt of an interrupt indicating logging memory context.
+ * Signal handler portion of interrupt handling.
+ */
+static void
+HandleProcSignalLogMemoryContext(void)
+{
+	InterruptPending = true;
+	LogMemoryContextPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
+
 /*
  * Perform global barrier related interrupt checking.
  *
@@ -580,6 +595,27 @@ ProcessProcSignalBarrier(void)
 	ConditionVariableBroadcast(&MyProcSignalSlot->pss_barrierCV);
 }
 
+/*
+ * ProcessLogMemoryContextInterrupt
+ *		The portion of logging memory context interrupt handling that runs
+ *		outside of the signal handler.
+ */
+void
+ProcessLogMemoryContextInterrupt(void)
+{
+	int		max_childre

Re: Is it useful to record whether plans are generic or custom?

2021-03-25 Thread torikoshia

On 2021-03-25 22:14, Fujii Masao wrote:

On 2021/03/23 16:32, torikoshia wrote:

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!


Thanks for updating the patch!

PostgreSQL Patch Tester reported that the patched version failed to
compile on Windows. Could you fix this issue?
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238



It seems PGDLLIMPORT was necessary...
Attached a new one.

Regards.

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..887c4b2be8 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -251,6 +251,72 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t   | t | t
 (7 rows)
 
+--
+-- Track the number of generic plan
+--
+CREATE TABLE pgss_test (i int, j int, k int);
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+SET plan_cache_mode TO force_generic_plan;
+SET pg_stat_statements.track_utility = TRUE;
+PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1;
+EXECUTE pgss_p1(1);
+ i 
+---
+(0 rows)
+
+-- EXPLAIN ANALYZE should be recorded
+PREPARE pgss_p2 AS SELECT j FROM pgss_test WHERE j = $1;
+EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1);
+  QUERY PLAN   
+---
+ Seq Scan on pgss_test (actual rows=0 loops=1)
+   Filter: (j = $1)
+(2 rows)
+
+-- Nested Portal
+PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1;
+BEGIN;
+DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements;
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p2
+(1 row)
+
+EXECUTE pgss_p3(1);
+ k 
+---
+(0 rows)
+
+FETCH IN pgss_c1;
+  name   
+-
+ pgss_p1
+(1 row)
+
+COMMIT;
+SELECT calls, generic_calls, query FROM pg_stat_statements;
+ calls | generic_calls |  query   
+---+---+--
+ 1 | 0 | DECLARE pgss_c1 CURSOR FOR SELECT name FROM pg_prepared_statements
+ 0 | 0 | SELECT calls, generic_calls, query FROM pg_stat_statements
+ 1 | 1 | PREPARE pgss_p1 AS SELECT i FROM pgss_test WHERE i = $1
+ 2 | 0 | FETCH IN pgss_c1
+ 1 | 0 | BEGIN
+ 1 | 0 | SELECT pg_stat_statements_reset()
+ 1 | 1 | EXPLAIN (ANALYZE, COSTS OFF, SUMMARY OFF, TIMING OFF) EXECUTE pgss_p2(1)
+ 1 | 0 | COMMIT
+ 1 | 1 | PREPARE pgss_p3 AS SELECT k FROM pgss_test WHERE k = $1
+(9 rows)
+
+SET pg_stat_statements.track_utility = FALSE;
+DEALLOCATE ALL;
+DROP TABLE pgss_test;
 --
 -- pg_stat_statements.track = none
 --
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 0f63f08f7e..7fdef315ae 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -44,7 +44,8 @@ CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT blk_write_time float8,
 OUT wal_records int8,
 OUT wal_fpi int8,
-OUT wal_bytes numeric
+OUT wal_bytes numeric,
+OUT generic_calls int8
 )
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'pg_stat_statements_1_8'
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..b14919c989 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -81,6 +81,7 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/plancache.h"
 #include "utils/timestamp.h"
 
 PG_MODULE_MAGIC;
@@ -192,6 +193,7 @@ typedef struct Counters
 	int64		wal_records;	/* # of WAL records generated */
 	int64		wal_fpi;		/* # of WAL full page images generated */
 	uint64		wal_bytes;		/* total amount of WAL generated in bytes */
+	int64		generic_calls;	/* # of times generic plans executed */
 } Counters;
 
 /*
@@ -1446,6 +1448,10 @@ pgss_store(const char *query, uint64 queryId,
 			if (e->counters.max_time[kind] < total_time)
 e->counters.max_time[kind] = total_time;
 		}
+
+		if (kind == PGSS_EXEC && is_plan_type_generic)
+			e->counters.generic_calls += 1;
+
 		e->counters.rows += rows;
 		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
 		e->counters.shared_blks_read += bufusage->shared_blks_read;
@@ -1510,8 +1516,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #de

Re: Get memory contexts of an arbitrary backend process

2021-03-28 Thread torikoshia

On 2021-03-26 14:08, Kyotaro Horiguchi wrote:

At Fri, 26 Mar 2021 14:02:49 +0900, Fujii Masao
 wrote in



On 2021/03/26 13:28, Kyotaro Horiguchi wrote:
>> "some contexts are omitted"
>> "n child contexts: total_bytes = ..."
> Sorry I missed that it is already implemented.  So my opinion is I agree
> with limiting with a fixed-number, and preferablly sorted in
> descending order of... totalspace/nblocks?

This may be an improvement, but makes us modify
MemoryContextStatsInternal()
very much. I'm afraid that it's too late to do that at this stage...
What about leaving the output order as it is at the first version?


So I said "preferably":p  (with a misspelling...)
I'm fine with that.

regards.


Thanks for the comments!

Attached a new patch.

It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
of the specified backend process.

The number of child contexts to be logged per parent is limited to 100
as with MemoryContextStats().

As written in commit 7b5ef8f2d07, which limits the verbosity of
memory context statistics dumps, it supposes that practical cases
where the dump gets long will typically be huge numbers of
siblings under the same parent context; while the additional
debugging value from seeing details about individual siblings
beyond 100 will not be large.

Thoughts?


Regards.

From e5ab553c1e5b7fa53c51e0e4fa4472bdaeced4e1 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 29 Mar 2021 09:30:23 +0900
Subject: [PATCH] After commit 3e98c0bafb28de, we can display the usage of
 memory contexts using pg_backend_memory_contexts system view. However, its
 target process is limited to the backend which is showing the view. This
 patch introduces pg_log_backend_memory_contexts(pid) which logs memory
 contexts of the specified backend process.

Currently the number of child contexts to be logged per parent
is limited to 100.
As with MemoryContextStats(), it supposes that practical cases
where the dump gets long will typically be huge numbers of
siblings under the same parent context; while the additional
debugging value from seeing details about individual siblings
beyond 100 will not be large.
---
 doc/src/sgml/func.sgml|  20 +++
 src/backend/storage/ipc/procsignal.c  |  37 
 src/backend/tcop/postgres.c   |   3 +
 src/backend/utils/adt/mcxtfuncs.c |   2 +-
 src/backend/utils/init/globals.c  |   1 +
 src/backend/utils/mmgr/aset.c |   8 +-
 src/backend/utils/mmgr/generation.c   |   8 +-
 src/backend/utils/mmgr/mcxt.c | 164 ++
 src/backend/utils/mmgr/slab.c |   9 +-
 src/include/catalog/pg_proc.dat   |   6 +
 src/include/miscadmin.h   |   1 +
 src/include/nodes/memnodes.h  |   6 +-
 src/include/storage/procsignal.h  |   3 +
 src/include/utils/memutils.h  |   3 +-
 .../t/002_log_memory_context_validation.pl|  31 
 15 files changed, 257 insertions(+), 45 deletions(-)
 create mode 100644 src/test/modules/test_misc/t/002_log_memory_context_validation.pl

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 19285ae136..7a80607366 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24871,6 +24871,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backend_memory_contexts
+
+pg_log_backend_memory_contexts ( pid integer )
+boolean
+   
+   
+Log the memory contexts whose backend process has the specified
+process ID.
+Memory contexts will be printed based on the log configuration set.
+See  for more information.
+The number of child contexts per parent is limited to 100.
+For contexts with more than 100 children, a summary will be shown.
+Only superusers can log the memory contexts even when the specified
+process is a non-superuser backend.
+   
+  
+
   

 
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..550aa2ffea 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -30,6 +30,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/memutils.h"
 
 /*
  * The SIGUSR1 signal is multiplexed to support signaling multiple event
@@ -440,6 +441,20 @@ HandleProcSignalBarrierInterrupt(void)
 	/* latch will be set by procsignal_sigusr1_handler */
 }
 
+/*
+ * HandleProcSignalLogMemoryContext
+ *
+ * Handle receipt of an interrupt indicating log memory context.
+ * Signal handler portion of interrupt handling.
+ */
+static void
+HandleProcSignalLogMemoryContext(void)
+{
+	InterruptPending = true;
+	LogMemoryContextPending = true;
+	/* latch will be set by procsignal_sigusr1_handler */
+}
+
 /*
  * Perform global ba

Re: Get memory contexts of an arbitrary backend process

2021-03-30 Thread torikoshia

On 2021-03-30 02:28, Fujii Masao wrote:

Thanks for reviewing and kind suggestions!


It adds pg_log_backend_memory_contexts(pid) which logs memory contexts
of the specified backend process.

The number of child contexts to be logged per parent is limited to 100
as with MemoryContextStats().

As written in commit 7b5ef8f2d07, which limits the verbosity of
memory context statistics dumps, it supposes that practical cases
where the dump gets long will typically be huge numbers of
siblings under the same parent context; while the additional
debugging value from seeing details about individual siblings
beyond 100 will not be large.

Thoughts?


I'm OK with 100. We should comment why we chose 100 for that.


Added following comments.

+   /*
+* When a backend process is consuming huge memory, logging all its
+* memory contexts might overrun available disk space. To prevent
+* this, we limit the number of child contexts per parent to 100.
+*
+* As with MemoryContextStats(), we suppose that practical cases
+* where the dump gets long will typically be huge numbers of
+* siblings under the same parent context; while the additional
+* debugging value from seeing details about individual siblings
+* beyond 100 will not be large.
+*/
+   MemoryContextStatsDetail(TopMemoryContext, 100, false);



Here are some review comments.

Isn't it better to move HandleProcSignalLogMemoryContext() and
ProcessLogMemoryContextInterrupt() to mcxt.c from procsignal.c
(like the functions for notify interrupt are defined in async.c)
because they are the functions for memory contexts?


Agreed.
Also renamed HandleProcSignalLogMemoryContext to
HandleLogMemoryContextInterrupt.


+ * HandleProcSignalLogMemoryContext
+ *
+ * Handle receipt of an interrupt indicating log memory context.
+ * Signal handler portion of interrupt handling.

IMO it's better to comment why we need to separate the function into 
two,
i.e., HandleProcSignalLogMemoryContext() and 
ProcessLogMemoryContextInterrupt(),
like the comment for other similar function explains. What about the 
followings?


Thanks! Changed them to the suggested one.


---
HandleLogMemoryContextInterrupt

Handle receipt of an interrupt indicating logging of memory contexts.

All the actual work is deferred to ProcessLogMemoryContextInterrupt(),
because we cannot safely emit a log message inside the signal handler.
---
ProcessLogMemoryContextInterrupt

Perform logging of memory contexts of this backend process.

Any backend that participates in ProcSignal signaling must arrange to
call this function if we see LogMemoryContextPending set. It is called
from CHECK_FOR_INTERRUPTS(), which is enough because the target process
for logging of memory contexts is a backend.
---


+   if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
+   HandleProcSignalLogMemoryContext();
+
if (CheckProcSignal(PROCSIG_BARRIER))
HandleProcSignalBarrierInterrupt();

The code for the memory context logging interrupt came after the barrier
interrupt in other places, e.g., procsignal.h. Why is the order of the
code different here?


Fixed.


+/*
+ * pg_log_backend_memory_contexts
+ * Print memory context of the specified backend process.

Isn't it better to move pg_log_backend_memory_contexts() to mcxtfuncs.c
from mcxt.c because this is the SQL function for memory contexts?


Agreed.

IMO we should comment why we allow only superusers to call this
function.

What about the following?


Thanks!
Modified the patch according to the suggestions.


-
Signal a backend process to log its memory contexts.

Only superusers are allowed to signal to log the memory contexts
because allowing any user to issue this request at an unbounded rate
would cause lots of log messages, which could lead to denial of service.

-

+   PGPROC  *proc = BackendPidGetProc(pid);
+
+   /* Check whether the target process is a PostgreSQL backend process. */
+   if (proc == NULL)

What about adding more comments as follows?

-
+   /*
+	 * BackendPidGetProc returns NULL if the pid isn't valid; but by the
+	 * time we reach kill(), a process for which we get a valid proc here
+	 * might have terminated on its own.  There's no way to acquire a lock
+	 * on an arbitrary process to prevent that. But since this mechanism
+	 * is usually used to debug a backend running and consuming lots of
+	 * memory, that it might end on its own first and its memory contexts
+	 * are not logged is not a problem.
+	 */
+   if (proc == NULL)
+   {
+   /*
+    * This is just a warning so a loop-through-resultset will not abort
+    * if one backend logged its memory contexts during the run.
+    */
+

Re: Get memory contexts of an arbitrary backend process

2021-03-30 Thread torikoshia

On 2021-03-31 04:36, Fujii Masao wrote:

On 2021/03/30 22:06, torikoshia wrote:

Modified the patch according to the suggestions.


Thanks for updating the patch!

I applied the cosmetic changes to the patch and added the example of
the function call into the document. Attached is the updated version
of the patch. Could you check this version?



Thanks a lot!


+The memory contexts will be logged in the log file. For example:

When 'log_destination = stderr' and 'logging_collector = off', it does
not log to a file but to stderr.

A description like the one below would be a bit more accurate, but I'm
wondering if it just repeats the same words.

+ The memory contexts will be logged based on the log configuration set. For example:


What do you think?


+
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts
+
+ t
+(1 row)
+
+The memory contexts will be logged in the log file. For example:
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free 
(5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 
1 blocks; 1408 free (0 chunks); 6784 used


The line "The memory contexts will be logged in the log file. For
example:" is neither the SQL command nor its output, so it might be
better to differentiate it.


What about the following, like the attached patch?

+
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts
+
+ t
+(1 row)
+
+The memory contexts will be logged in the log file. For example:
+
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free 
(5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 
1 blocks; 1408 free (0 chunks); 6784 used

...(snip)...
+LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 
chunks); 264 used
+LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 
chunks); 1029560 used

+


Regards.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fbf6062d0a..ce01d51b21 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24917,6 +24917,23 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backend_memory_contexts
+
+pg_log_backend_memory_contexts ( pid integer )
+boolean
+   
+   
+Logs the memory contexts whose backend process has the specified
+process ID.
+Memory contexts will be logged based on the log configuration set.
+See  for more information.
+Only superusers can log the memory contexts.
+   
+  
+
   

 
@@ -24987,6 +25004,36 @@ SELECT collation for ('foo' COLLATE "de_DE");
 pg_stat_activity view.

 
+   
+pg_log_backend_memory_contexts can be used
+to log the memory contexts of the backend process. For example,
+
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts 
+
+ t
+(1 row)
+
+The memory contexts will be logged in the log file. For example:
+
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
+LOG:  level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
+LOG:  level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used
+LOG:  level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used
+LOG:  level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used
+LOG:  level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used
+LOG:  level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
+...
+LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
+LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used
+
+For more than 100 child contexts under the same parent one,
+100 child contexts and a summary of the remaining ones will be logged.
+   
+
   
 
   
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..eac6895141 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -30,6 +30,7 @@
 #include "stora

Re: Get memory contexts of an arbitrary backend process

2021-04-04 Thread torikoshia

On 2021-04-01 19:13, Fujii Masao wrote:

On 2021/03/31 15:16, Kyotaro Horiguchi wrote:

+ The memory contexts will be logged based on the log configuration set. For example:

What do you think?


How about "The memory contexts will be logged in the server log" ?
I think "server log" doesn't suggest any concrete target.


Or just using "logged" is enough?

Also I'd like to document that one message for each memory context is 
logged.

So what about the following?

One message for each memory context will be logged. For example,



Agreed.

BTW, there was a conflict since c30f54ad732 (Detect POLLHUP/POLLRDHUP
while running queries); attached is v9.


Regards,

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3cf243a16a..a20be435ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24913,6 +24913,23 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backend_memory_contexts
+
+pg_log_backend_memory_contexts ( pid integer )
+boolean
+   
+   
+Logs the memory contexts whose backend process has the specified
+process ID.
+Memory contexts will be logged based on the log configuration set.
+See  for more information.
+Only superusers can log the memory contexts.
+   
+  
+
   

 
@@ -24983,6 +25000,36 @@ SELECT collation for ('foo' COLLATE "de_DE");
 pg_stat_activity view.

 
+   
+pg_log_backend_memory_contexts can be used
+to log the memory contexts of the backend process. For example,
+
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts 
+
+ t
+(1 row)
+
+One message for each memory context will be logged. For example:
+
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
+LOG:  level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
+LOG:  level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used
+LOG:  level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used
+LOG:  level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used
+LOG:  level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used
+LOG:  level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
+...
+LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
+LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used
+
+For more than 100 child contexts under the same parent one,
+100 child contexts and a summary of the remaining ones will be logged.
+   
+
   
 
   
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c6a8d4611e..eac6895141 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -30,6 +30,7 @@
 #include "storage/shmem.h"
 #include "storage/sinval.h"
 #include "tcop/tcopprot.h"
+#include "utils/memutils.h"
 
 /*
  * The SIGUSR1 signal is multiplexed to support signaling multiple event
@@ -657,6 +658,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_BARRIER))
 		HandleProcSignalBarrierInterrupt();
 
+	if (CheckProcSignal(PROCSIG_LOG_MEMORY_CONTEXT))
+		HandleLogMemoryContextInterrupt();
+
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_DATABASE))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_DATABASE);
 
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ad351e2fd1..330ec5b028 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3327,6 +3327,9 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (LogMemoryContextPending)
+		ProcessLogMemoryContextInterrupt();
 }
 
 
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
index c02fa47550..fe9b7979e2 100644
--- a/src/backend/utils/adt/mcxtfuncs.c
+++ b/src/backend/utils/adt/mcxtfuncs.c
@@ -18,6 +18,8 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "mb/pg_wchar.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
 #include "utils/builtins.h"
 
 /* --
@@ -61,7 +63,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
 
 	/* Examine the context itself */
 	memset(&stat, 0, sizeof(stat));
-	(*context->methods->stats) (context, NULL, (void *) &level, &stat);
+	(*context->methods->stats) (context, NULL, (void *) &level, &stat, true);
 
 	memset(values, 0, sizeof(values));
 	memset(nulls, 0, sizeof(nulls));
@@ -155,3 +157,59 @@ pg_

Re: Is it useful to record whether plans are generic or custom?

2021-04-05 Thread torikoshia

On 2021-03-26 17:46, Fujii Masao wrote:

On 2021/03/26 0:33, torikoshia wrote:

On 2021-03-25 22:14, Fujii Masao wrote:

On 2021/03/23 16:32, torikoshia wrote:

On 2021-03-05 17:47, Fujii Masao wrote:

Thanks for your comments!


Thanks for updating the patch!

PostgreSQL Patch Tester reported that the patched version failed to
compile on Windows. Could you fix this issue?
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.131238



It seems PGDLLIMPORT was necessary...
Attached a new one.


Thanks for updating the patch!

In my test, generic_calls for a utility command was not incremented
before PL/pgSQL function was executed. Maybe this is expected behavior.
But it was incremented after the function was executed. Is this a bug?
Please see the following example.


Thanks for reviewing!

It's a bug and regrettably it seems difficult to fix it during this
commitfest.

Marked the patch as "Withdrawn".


Regards,




Re: Get memory contexts of an arbitrary backend process

2021-04-05 Thread torikoshia

On 2021-04-05 12:59, Fujii Masao wrote:

On 2021/04/05 12:20, Zhihong Yu wrote:


Thanks for reviewing!


+ * On receipt of this signal, a backend sets the flag in the signal
+ * handler, and then which causes the next CHECK_FOR_INTERRUPTS()



I think the 'and then' is not needed:


Although I wonder if either would be fine, I removed the words.

+        * This is just a warning so a loop-through-resultset will not abort
+        * if one backend logged its memory contexts during the run.

The pid given by arg 0 is not a PostgreSQL server process. Which other
backend could it be?


This is the comment that I added wrongly. So the comment should be
"This is just a warning so a loop-through-resultset will not abort
if one backend terminated on its own during the run.",
like pg_signal_backend(). Thought?


+1.

Attached v10 patch.


Regards,

From 8931099cbf3d6e6ef24150496cb795413785f808 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 5 Apr 2021 20:40:12 +0900
Subject: [PATCH v10] After commit 3e98c0bafb28de, we can display the usage of
 memory contexts using pg_backend_memory_contexts system view. However, its
 target process is limited to the backend which is showing the view. This
 patch introduces pg_log_backend_memory_contexts(pid) which logs memory
 contexts of the specified backend process.

Currently the number of child contexts to be logged per parent is limited
to 100.
As with MemoryContextStats(), it supposes that practical cases where the
dump gets long will typically be huge numbers of siblings under the same
parent context; while the additional debugging value from seeing details
about individual siblings beyond 100 will not be large.
---
 doc/src/sgml/func.sgml   |  47 +
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/tcop/postgres.c  |   3 +
 src/backend/utils/adt/mcxtfuncs.c|  60 ++-
 src/backend/utils/init/globals.c |   1 +
 src/backend/utils/mmgr/aset.c|   8 +-
 src/backend/utils/mmgr/generation.c  |   8 +-
 src/backend/utils/mmgr/mcxt.c| 171 +++
 src/backend/utils/mmgr/slab.c|   9 +-
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/miscadmin.h  |   1 +
 src/include/nodes/memnodes.h |   6 +-
 src/include/storage/procsignal.h |   1 +
 src/include/utils/memutils.h |   5 +-
 src/test/regress/expected/misc_functions.out |  13 ++
 src/test/regress/sql/misc_functions.sql  |   9 +
 16 files changed, 305 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3cf243a16a..a20be435ca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -24913,6 +24913,23 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_backend_memory_contexts
+
+pg_log_backend_memory_contexts ( pid integer )
+boolean
+   
+   
+Logs the memory contexts whose backend process has the specified
+process ID.
+Memory contexts will be logged based on the log configuration set.
+See  for more information.
+Only superusers can log the memory contexts.
+   
+  
+
   

 
@@ -24983,6 +25000,36 @@ SELECT collation for ('foo' COLLATE "de_DE");
 pg_stat_activity view.

 
+   
+pg_log_backend_memory_contexts can be used
+to log the memory contexts of the backend process. For example,
+
+postgres=# SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+ pg_log_backend_memory_contexts 
+
+ t
+(1 row)
+
+One message for each memory context will be logged. For example:
+
+LOG:  logging memory contexts of PID 10377
+STATEMENT:  SELECT pg_log_backend_memory_contexts(pg_backend_pid());
+LOG:  level: 0; TopMemoryContext: 80800 total in 6 blocks; 14432 free (5 chunks); 66368 used
+LOG:  level: 1; pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1408 free (0 chunks); 6784 used
+LOG:  level: 1; TopTransactionContext: 8192 total in 1 blocks; 7720 free (1 chunks); 472 used
+LOG:  level: 1; RowDescriptionContext: 8192 total in 1 blocks; 6880 free (0 chunks); 1312 used
+LOG:  level: 1; MessageContext: 16384 total in 2 blocks; 5152 free (0 chunks); 11232 used
+LOG:  level: 1; Operator class cache: 8192 total in 1 blocks; 512 free (0 chunks); 7680 used
+LOG:  level: 1; smgr relation table: 16384 total in 2 blocks; 4544 free (3 chunks); 11840 used
+LOG:  level: 1; TransactionAbortContext: 32768 total in 1 blocks; 32504 free (0 chunks); 264 used
+...
+LOG:  level: 1; ErrorContext: 8192 total in 1 blocks; 7928 free (3 chunks); 264 used
+LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 used
+
+For more than 100 child contexts under the same parent one,
+100 child contexts and a summary of the r

Re: Get memory contexts of an arbitrary backend process

2021-04-05 Thread torikoshia

On 2021-04-06 00:08, Fujii Masao wrote:

On 2021/04/05 21:03, torikoshia wrote:

On 2021-04-05 12:59, Fujii Masao wrote:

On 2021/04/05 12:20, Zhihong Yu wrote:


Thanks for reviewing!


+ * On receipt of this signal, a backend sets the flag in the signal
+ * handler, and then which causes the next CHECK_FOR_INTERRUPTS()



I think the 'and then' is not needed:


Although I wonder if either would be fine, I removed the words.

+        * This is just a warning so a loop-through-resultset will not abort
+        * if one backend logged its memory contexts during the run.

The pid given by arg 0 is not a PostgreSQL server process. Which other
backend could it be?


This is the comment that I added wrongly. So the comment should be
"This is just a warning so a loop-through-resultset will not abort
if one backend terminated on its own during the run.",
like pg_signal_backend(). Thought?


+1.

Attached v10 patch.


Thanks for updating the patch!

I updated the patch as follows. Could you check the attached patch?


Thanks a lot!

I don't have any objections to your improvements.

Regards,




Re: Is it useful to record whether plans are generic or custom?

2020-07-22 Thread torikoshia

On 2020-07-20 13:57, torikoshia wrote:


As I proposed earlier in this thread, I'm now trying to add information
about generic/custom plans to pg_stat_statements.
I'll share the idea and the poc patch soon.


Attached a poc patch.

Main purpose is to decide (1) the user interface and (2) the
way to get the plan type from pg_stat_statements.

(1) the user interface
I added a new boolean column 'generic_plan' to both
pg_stat_statements view and the member of the hash key of
pg_stat_statements.

This is because as Legrand pointed out the feature seems
useful under the condition of differentiating all the
counters for a queryid using a generic plan and the one
using a custom one.

I thought it might be preferable to make a GUC to enable
or disable this feature, but changing the hash key makes
it harder.

(2) way to get the plan type from pg_stat_statements
To know whether the plan is generic or not, I added a
member to CachedPlan and get it in the ExecutorStart_hook
from ActivePortal.
I wished to do it in the ExecutorEnd_hook, but the
ActivePortal is not available on executorEnd, so I keep
it on a global variable newly defined in pg_stat_statements.


Any thoughts?

This is a poc patch and I'm going to do below things later:

- update pg_stat_statements version
- change default value for the newly added parameter in
  pg_stat_statements_reset() from -1 to 0(since default for
  other parameters are all 0)
- add regression tests and update docs



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 793eafad8e988b6754c9d89e0ea14b64b07eef81 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Wed, 22 Jul 2020 16:00:04 +0900
Subject: [PATCH] [poc] Previously the number of custom and generic plans are
 recorded only in pg_prepared_statements, meaning we could only track them
 regarding current session. This patch records them in pg_stat_statements and
 it enables to track them regarding all sessions of the PostgreSQL instance.

---
 .../pg_stat_statements--1.6--1.7.sql  |  3 +-
 .../pg_stat_statements--1.7--1.8.sql  |  1 +
 .../pg_stat_statements/pg_stat_statements.c   | 44 +++
 src/backend/utils/cache/plancache.c   |  2 +
 src/include/utils/plancache.h |  1 +
 5 files changed, 41 insertions(+), 10 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql
index 6fc3fed4c9..5ab0a26b77 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql
@@ -12,7 +12,8 @@ DROP FUNCTION pg_stat_statements_reset();
 /* Now redefine */
 CREATE FUNCTION pg_stat_statements_reset(IN userid Oid DEFAULT 0,
 	IN dbid Oid DEFAULT 0,
-	IN queryid bigint DEFAULT 0
+	IN queryid bigint DEFAULT 0,
+	IN generic_plan int DEFAULT -1
 )
 RETURNS void
 AS 'MODULE_PATHNAME', 'pg_stat_statements_reset_1_7'
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 0f63f08f7e..0d7c4e7343 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -15,6 +15,7 @@ DROP FUNCTION pg_stat_statements(boolean);
 CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT userid oid,
 OUT dbid oid,
+OUT generic_plan bool,
 OUT queryid bigint,
 OUT query text,
 OUT plans int8,
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 14cad19afb..5d74dc04cd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -78,9 +78,11 @@
 #include "storage/ipc.h"
 #include "storage/spin.h"
 #include "tcop/utility.h"
+#include "tcop/pquery.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/plancache.h"
 
 PG_MODULE_MAGIC;
 
@@ -156,6 +158,7 @@ typedef struct pgssHashKey
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
 	uint64		queryid;		/* query identifier */
+	bool			is_generic_plan;
 } pgssHashKey;
 
 /*
@@ -266,6 +269,9 @@ static int	exec_nested_level = 0;
 /* Current nesting depth of planner calls */
 static int	plan_nested_level = 0;
 
+/* Current plan type */
+static bool	is_generic_plan = false;
+
 /* Saved hook values in case of unload */
 static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
 static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
@@ -367,7 +373,7 @@ static char *qtext_fetch(Size query_offset, int query_len,
 		 char *buffer, Size buffer_size);
 static bool need_gc_qtexts(void);
 static void gc_qtexts(void);
-static void entry_reset(Oid userid, Oid dbid, uint64 queryid);
+stati

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2020-07-28 Thread torikoshia

On 2020-07-14 20:24, Julien Rouhaud wrote:

On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:

Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?


Thanks for the notice, v10 attached!


And here are minor typos.

 79 +* utility statements.  Note that we don't compute a 
queryId

for prepared
 80 +* statemets related utility, as those will inherit from 
the

underlying
 81 +* statements's one (except DEALLOCATE which is entirely
untracked).

statemets -> statements
statements's -> statements' or statement's?


Thanks!  I went with "statement's".


Thanks for updating!
I tested the patch setting log_statement = 'all', but %Q in
log_line_prefix was always 0 even when pg_stat_statements.queryid and
pg_stat_activity.queryid are not 0.

Is this an intentional behavior?


```
  $ initdb --no-locale -D data


  $ edit postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
logging_collector = on
log_line_prefix = '%m [%p] queryid:%Q '
log_statement = 'all'

  $ pg_ctl start -D data

  $ psql
  =# CREATE EXTENSION pg_stat_statements;

  =# CREATE TABLE t1 (i int);
  =# INSERT INTO t1 VALUES (0),(1);
  =# SELECT queryid, query FROM pg_stat_activity;

  -- query ids are all 0 on the log
  $ view log
  2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG:  statement: CREATE 
TABLE t1 (i int);
  2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG:  statement: INSERT 
INTO t1 VALUES (0),(1);
  2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG:  statement: SELECT * 
FROM t1;


  -- on pg_stat_activity and pgss, query ids are not 0
  $ psql
  =# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE 
'%t1%';

 queryid|query
  
--+--

1109063694563750779 | SELECT * FROM t1;
   -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity 
WHERE query LIKE '%t1%';

  (2 rows)

  =# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE 
'%t1%';

 queryid|  query
  --+-
   -5028988130796701553 | CREATE TABLE t1 (i int)
1109063694563750779 | SELECT * FROM t1
2726469050076420724 | INSERT INTO t1 VALUES ($1),($2)

```


And here is a minor typo.
 optionnally -> optionally


753 +   /* query identifier, optionnally computed using 
post_parse_analyze_hook */



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-07-31 Thread torikoshia

On 2020-07-30 15:13, Kasahara Tatsuhito wrote:

Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia  wrote:

- whether information for identifying the parent-child relation is
necessary or it's overkill

I think it's important to understand the parent-child relationship of
the context.
Personally, I often want to know the following two things ..

- In which life cycle is the target context? (Remaining as long as the
process is living? per query?)
- Does the target context belong to the correct (parent) context?

- if this information is necessary, whether the memory address is
suitable or other means like assigning unique numbers are required

IMO, If each context can be uniquely identified (or easily guessed) by
"name" and "ident",
then I don't think the address information is necessary.
Instead, I like the way that directly shows the context name of the
parent, as in the 0005 patch.


Thanks for your opinion!

I also feel it'll be sufficient to know not the exact memory context
of the parent but the name of the parent context.

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.




On 2020-07-10 08:30:22 +0900, torikoshia wrote:

On 2020-07-08 22:12, Fujii Masao wrote:



Another comment about parent column is: dynahash can be parent?
If yes, its indent instead of name should be displayed in parent
column?



I'm not sure yet, but considering the changes in the future, it seems
better to do so.


Attached a patch which displays ident as parent when dynahash is a
parent.

I could not find the case when dynahash can be a parent so I tested it
using attached test purposed patch.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 31 Jul 2020 16:20:29 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.

This patch implements a new SQL-callable function
pg_get_backend_memory_contexts which exposes memory usage of the
local backend.
It also adds a new view pg_backend_memory_contexts for exposing
local backend memory contexts.

---
 doc/src/sgml/catalogs.sgml   | 122 +++
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 140 +++
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 284 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 26fda20d19..5bfc983a90 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9266,6 +9266,11 @@ SCRAM-SHA-256$<iteration count>:&l
   materialized views
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_policies
   policies
@@ -10544,6 +10549,123 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
 
  
 
+ 
+  pg_backend_memory_contexts
+
+  
+   pg_backend_memory_contexts
+  
+
+  
+   The view pg_backend_memory_contexts displays all
+   the local backend memory contexts.
+  
+  
+   pg_backend_memory_contexts contains one row
+   for each memory context.
+  
+
+  
+   pg_backend_memory_contexts Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   name text
+  
+  
+   Name of the memory context
+  
+ 
+
+ 
+  
+   ident text
+  
+  
+   Identification information of the memory context. This field is truncated at 1024 bytes
+  
+ 
+
+ 
+  
+   parent text
+  
+  
+   Name of the parent of this memory context
+  
+ 
+
+ 
+  
+   level int4
+  
+  
+   Distance from TopMemoryContext in context tree
+  
+ 
+
+ 
+  
+   total_bytes int8
+  
+  
+   Total bytes allocated for this memory context
+  
+ 
+
+ 
+  
+   total_nblocks int8
+  
+  
+   Total number of blocks allocated for this memory context
+  
+ 
+
+ 
+  
+   free_bytes int8
+  
+  
+   Free space in bytes
+  
+ 
+
+ 
+  
+   free_chunks int8
+  
+  
+   Total number of free chunks
+  
+ 
+
+ 
+  
+   used_bytes int8
+  
+  
+   Used space in bytes
+  
+ 
+
+   
+  
+
+ 
+
  
   pg_matviews
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8625cbeab6..ba5a23ac25 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -554,6 +554,9 @@ CREATE VIEW pg_shmem_allocations AS
 REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
+CREAT

Re: Is it useful to record whether plans are generic or custom?

2020-07-31 Thread torikoshia

On 2020-07-30 14:31, Fujii Masao wrote:

On 2020/07/22 16:49, torikoshia wrote:

On 2020-07-20 13:57, torikoshia wrote:

As I proposed earlier in this thread, I'm now trying to add information
about generic/custom plans to pg_stat_statements.
I'll share the idea and the poc patch soon.


Attached a poc patch.


Thanks for the POC patch!

With the patch, when I ran "CREATE EXTENSION pg_stat_statements",
I got the following error.

ERROR:  function pg_stat_statements_reset(oid, oid, bigint) does not exist


Oops, sorry about that.
I just fixed it there for now.



Main purpose is to decide (1) the user interface and (2) the
way to get the plan type from pg_stat_statements.

(1) the user interface
I added a new boolean column 'generic_plan' to both
pg_stat_statements view and the member of the hash key of
pg_stat_statements.

This is because as Legrand pointed out the feature seems
useful under the condition of differentiating all the
counters for a queryid using a generic plan and the one
using a custom one.


I don't like this because it may double the number of entries in pgss,
which means that the number of entries can more easily reach
pg_stat_statements.max and some entries will be discarded.



I thought it might be preferable to make a GUC to enable
or disable this feature, but changing the hash key makes
it harder.


What happens if the server was running with this option enabled and 
then
restarted with the option disabled? Firstly two entries for the same 
query
were stored in pgss because the option was enabled. But when it's 
disabled
and the server is restarted, those two entries should be merged into 
one

at the startup of server? If so, that's problematic because it may take
a long time.

Therefore I think that it's better and simpler to just expose the number 
of

times generic/custom plan was chosen for each query.
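For reference, the session-local counters this proposal builds on can be
observed via pg_prepared_statements. A minimal sketch, assuming the
generic_plans and custom_plans columns are available in your server version
(the statement name "q" is arbitrary):

```sql
-- Prepared statements start out with custom plans; after several
-- executions the planner may switch to a generic plan.
PREPARE q (oid) AS SELECT relname FROM pg_class WHERE oid = $1;
EXECUTE q('pg_class'::regclass);
EXECUTE q('pg_class'::regclass);

SELECT name, generic_plans, custom_plans
  FROM pg_prepared_statements
 WHERE name = 'q';
```

The point of the pg_stat_statements change is that these numbers become
visible across all sessions, not just the one that prepared the statement.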

Regards,


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 793eafad8e988b6754c9d89e0ea14b64b07eef81 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi
Date: Fri, 31 Jul 2020 17:52:14 +0900
Subject: [PATCH] Previously the number of custom and generic plans were
 recorded only in pg_prepared_statements, meaning we could only track them
 for the current session. This patch records them in pg_stat_statements as
 well, enabling us to track them across all sessions of the PostgreSQL instance.

---
 .../pg_stat_statements--1.6--1.7.sql  |  5 ++-
 .../pg_stat_statements--1.7--1.8.sql  |  1 +
 .../pg_stat_statements/pg_stat_statements.c   | 44 +++
 src/backend/utils/cache/plancache.c   |  2 +
 src/include/utils/plancache.h |  1 +
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql
index 6fc3fed4c9..fd7aa05c92 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.6--1.7.sql
@@ -12,11 +12,12 @@ DROP FUNCTION pg_stat_statements_reset();
 /* Now redefine */
 CREATE FUNCTION pg_stat_statements_reset(IN userid Oid DEFAULT 0,
 	IN dbid Oid DEFAULT 0,
-	IN queryid bigint DEFAULT 0
+	IN queryid bigint DEFAULT 0,
+	IN generic_plan int DEFAULT -1
 )
 RETURNS void
 AS 'MODULE_PATHNAME', 'pg_stat_statements_reset_1_7'
 LANGUAGE C STRICT PARALLEL SAFE;
 
 -- Don't want this to be available to non-superusers.
-REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint) FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_stat_statements_reset(Oid, Oid, bigint, int) FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
index 0f63f08f7e..0d7c4e7343 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -15,6 +15,7 @@ DROP FUNCTION pg_stat_statements(boolean);
 CREATE FUNCTION pg_stat_statements(IN showtext boolean,
 OUT userid oid,
 OUT dbid oid,
+OUT generic_plan bool,
 OUT queryid bigint,
 OUT query text,
 OUT plans int8,
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 6b91c62c31..14c580a95e 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -78,9 +78,11 @@
 #include "storage/ipc.h"
 #include "storage/spin.h"
 #include "tcop/utility.h"
+#include "tcop/pquery.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/plancache.h"
 
 PG_MODULE_MAGIC;
 
@@ -156,6 +158,7 @@ typedef struct pgssHashKey
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
 	uint64		queryid;		/* query identifier */

Re: Creating a function for exposing memory usage of backend process

2020-08-10 Thread torikoshia

On 2020-08-08 10:44, Michael Paquier wrote:

On Fri, Jul 31, 2020 at 03:23:52PM -0400, Robert Haas wrote:
On Fri, Jul 31, 2020 at 4:25 AM torikoshia 
 wrote:

And as Fujii-san told me in person, exposing memory address seems
not preferable considering there are security techniques like
address space layout randomization.


Yeah, exactly. ASLR wouldn't do anything to improve security if there
were no other security bugs, but there are, and some of those bugs are
harder to exploit if you don't know the precise memory addresses of
certain data structures. Similarly, exposing the addresses of our
internal data structures is harmless if we have no other security
bugs, but if we do, it might make those bugs easier to exploit. I
don't think this information is useful enough to justify taking that
risk.


FWIW, this is the class of issues where it is possible to print some
areas of memory, or even manipulate the stack so as it was possible to
pass down a custom pointer, so exposing the pointer locations is a
real risk, and this has happened in the past.  Anyway, it seems to me
that if this part is done, we could just make it superuser-only with
restrictive REVOKE privileges, but I am not sure that we have enough
user cases to justify this addition.



Thanks for your comments!

I'm convinced that exposing pointer locations introduces security risks
and it seems better not to do so.

And I now feel that identifying the exact memory context by exposing its
memory address or other means seems overkill.
Showing just the context name of the parent would be sufficient, and the
0007 patch takes this approach.


On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version (199cec9779504c08aaa8159c6308283156547409)
and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Creating a function for exposing memory usage of backend process

2020-08-18 Thread torikoshia

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version 
(199cec9779504c08aaa8159c6308283156547409)

and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts

+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be 
located
just after pg_available_extension_versions entry. Because the rows in 
the table

"System Views" should be located in alphabetical order.


+ 
+  pg_backend_memory_contexts

Same as above.


Modified both.




+   The view pg_backend_memory_contexts 
displays all

+   the local backend memory contexts.

This description seems a bit confusing because maybe we can interpret 
this
as "... displays the memory contexts of all the local backends" 
wrongly. Thought?

What about the following description, instead?


     The view pg_backend_memory_contexts 
displays all
     the memory contexts of the server process attached to the current 
session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is actually 
NULL,

"context->name" would cause segmentation fault, so ISTM that the check
will never be performed.

If "context" can be NULL, the check should be performed before 
accessing
to "context". OTOH, if "context" must not be NULL per the 
specification of

PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext

or it is already checked as not NULL as follows (child != NULL).

I added the assertion check.

| for (child = context->firstchild; child != NULL; child = 
child->nextchild)

| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, 
parentname, level + 1);

| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash table 
name.
+  * To make it possible to identify its parent, we also 
display

+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") == 
0)

+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
context,
but uses only the name of "parent" memory context. So isn't it better 
to use
"const char *parent" instead of "MemoryContext parent", as the argument 
of

the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the name
but also the ident of the "parent", I could not help adding similar
code before calling the function.
The total amount of code and complexity do not seem to change much.

Any thoughts? Am I misunderstanding something?


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 055af903a3dbf146d97dd3fb01a6a7d3d3bd2ae0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Tue, 18 Aug 2020 18:17:42 +0900
Subject: [PATCH] Add a function exposing memory usage of local backend.
 pg_get_backend_memory_contexts which exposes memory usage of the local
 backend. It also adds a new view pg_backend_memory_contexts for exposing
 local backend memory contexts.

---
 doc/src/sgml/catalogs.sgml   | 122 ++
 src/backend/catalog/system_views.sql |   3 +
 src/backend/utils/mmgr/mcxt.c| 147 +++
 src/include/catalog/pg_proc.dat  |   9 ++
 src/test/regress/expected/rules.out  |  10 ++
 5 files changed, 291 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index fc329c5cff..1232b24e74 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9226,6 +9226,11 @@ SCRAM-SHA-256$:&l
   available versions of extensions
  
 
+ 
+  pg_backend_memory_contexts
+  backend memory contexts
+ 
+
  
   pg_config
   compile-time configuration parameters
@@ -9577

Re: Creating a function for exposing memory usage of backend process

2020-08-18 Thread torikoshia

On 2020-08-18 22:54, Fujii Masao wrote:

On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version 
(199cec9779504c08aaa8159c6308283156547409)

and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts

+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should be 
located
just after pg_available_extension_versions entry. Because the rows 
in the table

"System Views" should be located in alphabetical order.


+ 
+  
pg_backend_memory_contexts


Same as above.


Modified both.




+   The view pg_backend_memory_contexts 
displays all

+   the local backend memory contexts.

This description seems a bit confusing because maybe we can 
interpret this
as "... displays the memory contexts of all the local backends" 
wrongly. Thought?

What about the following description, instead?


 The view pg_backend_memory_contexts 
displays all
 the memory contexts of the server process attached to the 
current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is 
actually NULL,
"context->name" would cause segmentation fault, so ISTM that the 
check

will never be performed.

If "context" can be NULL, the check should be performed before 
accessing
to "context". OTOH, if "context" must not be NULL per the 
specification of

PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext

or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), 
instead?


Thanks, that's better.



| for (child = context->firstchild; child != NULL; child = 
child->nextchild)

| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, 
parentname, level + 1);

| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash 
table name.
+  * To make it possible to identify its parent, we also 
display

+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") 
== 0)

+ values[2] = CStringGetTextDatum(parent->ident);
+ else
+ values[2] = CStringGetTextDatum(parent->name);

PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
context,
but uses only the name of "parent" memory context. So isn't it better 
to use
"const char *parent" instead of "MemoryContext parent", as the 
argument of

the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the 
name
but also the ident of the "parent", I could not help but adding 
similar

codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is "dynahash").

 	for (child = context->firstchild; child != NULL; child = 
child->nextchild)

{
-   const char *parentname;
-
-   /*
-* We labeled dynahash contexts with just the hash table name.
-* To make it possible to identify its parent, we also use
-* the hash table as its context name.
-*/
-   if (context->ident && strcmp(context->name, "dynahas

Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread torikoshia

On 2020-08-19 15:48, Fujii Masao wrote:

On 2020/08/19 9:43, torikoshia wrote:

On 2020-08-18 22:54, Fujii Masao wrote:

On 2020/08/18 18:41, torikoshia wrote:

On 2020-08-17 21:19, Fujii Masao wrote:

On 2020/08/17 21:14, Fujii Masao wrote:

On 2020-08-07 16:38, Kasahara Tatsuhito wrote:
The following review has been posted through the commitfest 
application:

make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:    tested, passed

I tested the latest
patch(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG-version 
(199cec9779504c08aaa8159c6308283156547409)

and test was passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Thanks for your testing!


Thanks for updating the patch! Here are the review comments.


Thanks for reviewing!



+ 
+  linkend="view-pg-backend-memory-contexts">pg_backend_memory_contexts

+  backend memory contexts
+ 

The above is located just after pg_matviews entry. But it should 
be located
just after pg_available_extension_versions entry. Because the rows 
in the table

"System Views" should be located in alphabetical order.


+ 
+ 
pg_backend_memory_contexts


Same as above.


Modified both.




+   The view pg_backend_memory_contexts 
displays all

+   the local backend memory contexts.

This description seems a bit confusing because maybe we can 
interpret this
as "... displays the memory contexts of all the local backends" 
wrongly. Thought?

What about the following description, instead?


 The view pg_backend_memory_contexts 
displays all
 the memory contexts of the server process attached to the 
current session.


Thanks! it seems better.


+    const char *name = context->name;
+    const char *ident = context->ident;
+
+    if (context == NULL)
+    return;

The above check "context == NULL" is useless? If "context" is 
actually NULL,
"context->name" would cause segmentation fault, so ISTM that the 
check

will never be performed.

If "context" can be NULL, the check should be performed before 
accessing
to "context". OTOH, if "context" must not be NULL per the 
specification of

PutMemoryContextStatsTupleStore(), assertion test checking
"context != NULL" should be used here, instead?


Yeah, "context" cannot be NULL because "context" must be 
TopMemoryContext

or it is already checked as not NULL as follows(child != NULL).

I added the assertion check.


Isn't it better to add AssertArg(MemoryContextIsValid(context)), 
instead?


Thanks, that's better.



| for (child = context->firstchild; child != NULL; child = 
child->nextchild)

| {
|  ...
| PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
|   child, 
parentname, level + 1);

| }


Here is another comment.

+ if (parent == NULL)
+ nulls[2] = true;
+ else
+ /*
+  * We labeled dynahash contexts with just the hash 
table name.
+  * To make it possible to identify its parent, we 
also display

+  * parent's ident here.
+  */
+ if (parent->ident && strcmp(parent->name, "dynahash") 
== 0)
+ values[2] = 
CStringGetTextDatum(parent->ident);

+ else
+ values[2] = 
CStringGetTextDatum(parent->name);


PutMemoryContextsStatsTupleStore() doesn't need "parent" memory 
context,
but uses only the name of "parent" memory context. So isn't it 
better to use
"const char *parent" instead of "MemoryContext parent", as the 
argument of

the function? If we do that, we can simplify the above code.


Thanks, the attached patch adopted the advice.

However, since PutMemoryContextsStatsTupleStore() used not only the 
name
but also the ident of the "parent", I could not help but adding 
similar

codes before calling the function.
The total amount of codes and complexity seem not to change so much.

Any thoughts? Am I misunderstanding something?


I was thinking that we can simplify the code as follows.
That is, we can just pass "name" as the argument of
PutMemoryContextsStatsTupleStore()
since "name" indicates context->name or ident (if name is 
"dynahash").


 for (child = context->firstchild; child != NULL; child = 
child->nextchild)

 {
-    const char *parentname;
-
-    /*
- * We labeled dynahash contexts with just the hash table 
name.

- * To make it possible to identify its parent, we also use
- * the hash table as its context name.
- */
-    if (context->ident && strcmp(context->name, "d

Re: Creating a function for exposing memory usage of backend process

2020-08-21 Thread torikoshia

Thanks for all your comments!

Thankfully it seems that this feature is not regarded as meaningless,
so I'm going to make some improvements.


On Wed, Aug 19, 2020 at 10:56 PM Michael Paquier  
wrote:

On Wed, Aug 19, 2020 at 06:12:02PM +0900, Fujii Masao wrote:

On 2020/08/19 17:40, torikoshia wrote:
Yes, I didn't add regression tests because of the instability of the 
output.
I thought it would be OK since other views like pg_stat_slru and 
pg_shmem_allocations

didn't have tests for their outputs.


You're right.


If you can make a test with something minimal and with a stable
output, adding a test is helpful IMO, or how can you make easily sure
that this does not get broken, particularly in the event of future
refactorings, or even with platform-dependent behaviors?


OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added a simpler one, in keeping with
the simplicity of the other tests there.


On Thu, Aug 20, 2020 at 12:02 AM Tom Lane  wrote:

Michael Paquier  writes:
> By the way, I was looking at the code that has been committed, and I
> think that it is awkward to have a SQL function in mcxt.c, which is a
> rather low-level interface.  I think that this new code should be
> moved to its own file, one suggestion for a location I have being
> src/backend/utils/adt/mcxtfuncs.c.

I agree with that,


Thanks for pointing out.
Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


On Thu, Aug 20, 2020 at 11:09 AM Fujii Masao 
 wrote:

On 2020/08/20 0:01, Tom Lane wrote:

Given the lack of clear use-case, and the possibility (admittedly
not strong) that this is still somehow a security hazard, I think
we should revert it.  If it stays, I'd like to see restrictions
on who can read the view.



For example, allowing only the role with pg_monitor to see this view?


Attached a patch adding that restriction.
(0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch)

Of course, this restriction makes pg_backend_memory_contexts hard to use
when the user of the target session is not granted pg_monitor because 
the

scope of this view is session local.

In this case, I imagine additional operations would be needed, something
like temporarily granting pg_monitor to that user.

Thoughts?
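As a concrete sketch of the "revoke from public, then grant post-initdb"
approach, using the statements from the attached patch (the pg_monitor
grant is one possible administrator choice, not part of the patch itself):

```sql
-- Done by the patch at initdb time:
REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;

-- An administrator can then open it up selectively, e.g.:
GRANT SELECT ON pg_backend_memory_contexts TO pg_monitor;
```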


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 23fe541d5bd3cead787bb7c638f0086b9c2e13eb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 21 Aug 2020 21:22:10 +0900
Subject: [PATCH] Added a regression test for pg_backend_memory_contexts.

---
 src/test/regress/expected/sysviews.out | 7 +++
 src/test/regress/sql/sysviews.sql  | 3 +++
 2 files changed, 10 insertions(+)

diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..06e09fd10b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -19,6 +19,13 @@ select count(*) >= 0 as ok from pg_available_extensions;
  t
 (1 row)
 
+-- There will surely be at least one context.
+select count(*) > 0 as ok from pg_backend_memory_contexts;
+ ok 
+
+ t
+(1 row)
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
  ok 
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..2c3b88c855 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -12,6 +12,9 @@ select count(*) >= 0 as ok from pg_available_extension_versions;
 
 select count(*) >= 0 as ok from pg_available_extensions;
 
+-- There will surely be at least one context.
+select count(*) > 0 as ok from pg_backend_memory_contexts;
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
 
-- 
2.18.1

From 4eee73933874fbab91643e7461717ba9038d8d76 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 21 Aug 2020 19:01:38 +0900
Subject: [PATCH] Relocated the code for pg_backend_memory_contexts from mcxt.c
 to src/backend/utils/adt/mcxtfuncs.c, as mcxt.c is a rather low-level interface.

---
 src/backend/utils/adt/Makefile|   1 +
 src/backend/utils/adt/mcxtfuncs.c | 157 ++
 src/backend/utils/mmgr/mcxt.c | 137 --
 3 files changed, 158 insertions(+), 137 deletions(-)
 create mode 100644 src/backend/utils/adt/mcxtfuncs.c

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 5d2aca8cfe..54d5c37947 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -57,6 +57,7 @@ OBJS = \
 	lockfuncs.o \
 	mac.o \
 	mac8.o \
+	mcxtfuncs.o \
 	misc.o \
 	name.o \
 	network.o \
diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
new file mode 100644
index 00..50e1b07ff0
--- /de

Re: Creating a function for exposing memory usage of backend process

2020-08-23 Thread torikoshia

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one considering
the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


The same code is moved around line-by-line.

Of course, this restriction makes pg_backend_memory_contexts hard to 
use
when the user of the target session is not granted pg_monitor because 
the

scope of this view is session local.

In this case, I imagine additional operations something like 
temporarily

granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of internal
structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  This
is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.

Attached a patch.



It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.
--
Michael



On 2020/08/20 2:59, Kasahara Tatsuhito wrote:
I totally agree that it's not *enough*, but in contrast to you I 
think
it's a good step. Subsequently we should add a way to get any 
backends

memory usage.
It's not too hard to imagine how to serialize it in a way that can be
easily deserialized by another backend. I am imagining something like
sending a procsignal that triggers (probably at CFR() time) a backend 
to
write its own memory usage into pg_memusage/ or something 
roughly

like that.


Sounds good. Maybe we can also provide the SQL-callable function
or view to read pg_memusage/, to make the analysis easier.

+1



I'm thinking about starting a new thread to discuss exposing other
backends' memory contexts.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From dc4fade9111dc3f91e992c4d5af393dd5ed03270 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 24 Jul 2020 11:14:32 +0900
Subject: [PATCH] Previously pg_backend_memory_contexts didn't have any
 restriction and anyone could access it. However, this view contains some
 internal information about memory contexts. This policy could cause security
 issues. This patch revokes all on pg_backend_memory_contexts from public so
 that only superusers can access it.

---
 doc/src/sgml/catalogs.sgml   | 4 
 src/backend/catalog/system_views.sql | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 1232b24e74..9fe260ecff 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9697,6 +9697,10 @@ SCRAM-SHA-256$<iteration count>:&l

   
 
+  
+   By default, the pg_backend_memory_contexts view can be
+   read only by superusers.
+  
  
 
  
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ba5a23ac25..a2d61302f9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -557,6 +557,9 @@ REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 CREATE VIEW pg_backend_memory_contexts AS
 SELECT * FROM pg_get_backend_memory_contexts();
 
+REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+
 -- Statistics views
 
 CREATE VIEW pg_stat_all_tables AS
-- 
2.18.1



Re: Creating a function for exposing memory usage of backend process

2020-08-24 Thread torikoshia

On 2020-08-24 13:13, Fujii Masao wrote:

On 2020/08/24 13:01, torikoshia wrote:

On 2020-08-22 21:18, Michael Paquier wrote:

Thanks for reviewing!


On Fri, Aug 21, 2020 at 11:27:06PM +0900, torikoshia wrote:

OK. Added a regression test on sysviews.sql.
(0001-Added-a-regression-test-for-pg_backend_memory_contex.patch)

Fujii-san gave us an example, but I added more simple one 
considering

the simplicity of other tests on that.


What you have sent in 0001 looks fine to me.  A small test is much
better than nothing.


+1

But as I proposed upthread, what about a bit complicated test as 
follows,

e.g., to confirm that the internal logic for level works expectedly?

 SELECT name, ident, parent, level, total_bytes >= free_bytes FROM
pg_backend_memory_contexts WHERE level = 0;


OK!
Attached an updated patch.







Added a patch for relocating the codes to mcxtfuncs.c.
(patches/0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch)


Thanks for the patch! Looks good to me.
Barring any objection, I will commit this patch at first.




The same code is moved around line-by-line.

Of course, this restriction makes pg_backend_memory_contexts hard to 
use
when the user of the target session is not granted pg_monitor 
because the

scope of this view is session local.

In this case, I imagine additional operations something like 
temporarily

granting pg_monitor to that user.


Hmm.  I am not completely sure either that pg_monitor is the best fit
here, because this view provides information about a bunch of 
internal

structures.  Something that could easily be done though is to revoke
the access from public, and then users could just set up GRANT
permissions post-initdb, with pg_monitor as one possible choice.  
This

is the safest path by default, and this stuff is of a caliber similar
to pg_shmem_allocations in terms of internal contents.


I think this is a better way than what I did in
0001-Rellocated-the-codes-for-pg_backend_memory_contexts-.patch.


You mean 
0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch?


Oops, I meant 
0001-Restrict-the-access-to-pg_backend_memory_contexts-to.patch.






Attached a patch.


Thanks for updating the patch! This also looks good to me.



It seems to me that you are missing one "REVOKE ALL on
pg_backend_memory_contexts FROM PUBLIC" in patch 0003.

By the way, if that was just for me, I would remove used_bytes, which
is just a computation from the total and free numbers.  I'll defer
that point to Fujii-san.


Yeah, I was just thinking that also displaying used_bytes was useful,
but this might be inconsistent with the way other views do it.

Regards,


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From 335b9eb0c60a7f12debd4c45d435888109b2bfcf Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 24 Aug 2020 21:28:20 +0900
Subject: [PATCH] Added a regression test for pg_backend_memory_contexts.

---
 src/test/regress/expected/sysviews.out | 9 +
 src/test/regress/sql/sysviews.sql  | 5 +
 2 files changed, 14 insertions(+)

diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..1cffc3349d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -19,6 +19,15 @@ select count(*) >= 0 as ok from pg_available_extensions;
  t
 (1 row)
 
+-- The entire output of pg_backend_memory_contexts is not stable,
+-- we test only the existance and basic condition of TopMemoryContext.
+select name, ident, parent, level, total_bytes >= free_bytes
+  from pg_backend_memory_contexts where level = 0;
+   name   | ident | parent | level | ?column? 
+--+---++---+--
+ TopMemoryContext |   || 0 | t
+(1 row)
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
  ok 
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..ac4a0e1cbb 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -12,6 +12,11 @@ select count(*) >= 0 as ok from pg_available_extension_versions;
 
 select count(*) >= 0 as ok from pg_available_extensions;
 
+-- The entire output of pg_backend_memory_contexts is not stable,
+-- we test only the existance and basic condition of TopMemoryContext.
+select name, ident, parent, level, total_bytes >= free_bytes
+  from pg_backend_memory_contexts where level = 0;
+
 -- At introduction, pg_config had 23 entries; it may grow
 select count(*) > 20 as ok from pg_config;
 
-- 
2.18.1



Get memory contexts of an arbitrary backend process

2020-08-31 Thread torikoshia

Hi,

After commit 3e98c0bafb28de, we can display the usage of the
memory contexts using pg_backend_memory_contexts system
view.

However, its target is limited to the process attached to
the current session.

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

Attached PoC patch makes pg_get_backend_memory_contexts()
display memory contexts of the process with the specified PID.


  =# -- PID of the target process is 17051
  =# SELECT * FROM pg_get_backend_memory_contexts(17051);
           name          | ident |      parent      | level | total_bytes | total_nblocks | free_bytes | free_chunks | used_bytes
  -----------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
   TopMemoryContext      |       |                  |     0 |       68720 |             5 |      16816 |          16 |      51904
   RowDescriptionContext |       | TopMemoryContext |     1 |        8192 |             1 |       6880 |           0 |       1312
   MessageContext        |       | TopMemoryContext |     1 |       65536 |             4 |      19912 |           1 |      45624

   ...

It doesn't display the contexts of all the backends, only the
contexts of the specified process.
I think that would be enough, because I suppose this function
is used after investigation with the ps command or other OS-level
utilities.


The rough idea of implementation is like below:

  1. send a signal to the specified process
  2. the signaled process dumps its memory contexts to a file
  3. read the dumped file and display it to the user
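
The three-step flow above can be modeled outside PostgreSQL. This is a
hedged, illustrative sketch in Python, not the actual C implementation:
the dump directory, the JSON payload, and the use of SIGUSR1 are all
assumptions made for the example (POSIX only).

```python
import json
import os
import signal
import tempfile
import time

DUMP_DIR = tempfile.mkdtemp()  # stand-in for a pg_memusage-like directory


def dump_contexts(signum, frame):
    """Step 2: on receipt of the signal, dump 'memory contexts' to a file."""
    contexts = [{"name": "TopMemoryContext", "level": 0, "total_bytes": 68720}]
    with open(os.path.join(DUMP_DIR, str(os.getpid())), "w") as f:
        json.dump(contexts, f)


signal.signal(signal.SIGUSR1, dump_contexts)

# Step 1: send a signal to the target process (here: ourselves).
target_pid = os.getpid()
os.kill(target_pid, signal.SIGUSR1)

# Step 3: poll for the dump file, then read it back for display.
path = os.path.join(DUMP_DIR, str(target_pid))
for _ in range(100):
    if os.path.exists(path):
        break
    time.sleep(0.01)
with open(path) as f:
    contexts = json.load(f)
```

The polling loop mirrors the CHECK_FOR_INTERRUPTS()/pg_usleep() wait in
the PoC patch; later messages in this thread discuss better signalling.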


Any thoughts?

[1] 
https://www.postgresql.org/message-id/72a656e0f71d0860161e0b3f67e4d771%40oss.nttdata.com



Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 7decfda337bbc422fece4c736a719b6fcfdc5cf3 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 31 Aug 2020 18:20:34 +0900
Subject: [PATCH] Enabled pg_get_backend_memory_contexts() to collect arbitrary
 backend process's memory contexts.

Previously, pg_get_backend_memory_contexts() could only get the
memory contexts of the process that invoked it. This patch makes it
possible to get the memory contexts of an arbitrary process whose
PID is specified by the argument.
---
 doc/src/sgml/func.sgml   |  13 +
 src/backend/catalog/system_views.sql |   4 +-
 src/backend/replication/basebackup.c |   3 +
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/tcop/postgres.c  |   5 +
 src/backend/utils/adt/mcxtfuncs.c| 315 ++-
 src/backend/utils/init/globals.c |   1 +
 src/bin/initdb/initdb.c  |   3 +-
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |   2 +-
 src/bin/pg_rewind/filemap.c  |   3 +
 src/include/catalog/pg_proc.dat  |   7 +-
 src/include/miscadmin.h  |   1 +
 src/include/storage/procsignal.h |   1 +
 src/include/utils/mcxtfuncs.h|  21 ++
 src/test/regress/expected/rules.out  |   2 +-
 15 files changed, 364 insertions(+), 21 deletions(-)
 create mode 100644 src/include/utils/mcxtfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b9f591296a..cc9a458334 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21062,6 +21062,19 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);

   
 
+  
+   
+
+ pg_get_backend_memory_contexts
+
+pg_get_backend_memory_contexts ( integer )
+setof records
+   
+   
+Returns all the memory contexts of the specified process ID.
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a2d61302f9..88fb837ecd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;
 
 CREATE VIEW pg_backend_memory_contexts AS
-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(-1);
 
 REVOKE ALL ON pg_backend_memory_contexts FROM PUBLIC;
-REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts() FROM PUBLIC;
+REVOKE EXECUTE ON FUNCTION pg_get_backend_memory_contexts FROM PUBLIC;
 
 -- Statistics views
 
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 6064384e32..f69d851b6b 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -184,6 +184,9 @@ static const char *const excludeDirContents[] =
 	/* Contents zeroed on startup, see StartupSUBTRANS(). */
 	"pg_subtrans",
 
+	/* Skip memory context dumped files. */
+	"pg_memusage",
+
 	/* end of list */
 	NULL
 };
diff --git a/src/backend/storage/ipc/pro

Re: Get memory contexts of an arbitrary backend process

2020-09-02 Thread torikoshia

On 2020-09-01 03:29, Pavel Stehule wrote:

Hi

po 31. 8. 2020 v 17:03 odesílatel Kasahara Tatsuhito
 napsal:


Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia
 wrote:

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

+1


Attached PoC patch makes pg_get_backend_memory_contexts()
display memory contexts of the specified PID of the process.

Thanks, it's a very good patch for discussion.


It doesn't display contexts of all the backends but only
the contexts of specified process.

or we can  "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.


The rough idea of implementation is like below:

1. send a signal to the specified process
2. signaled process dumps its memory contexts to a file
3. read the dumped file and display it to the user

I agree with the overview of the idea.
Here are some comments and questions.


Thanks for the comments!



- Currently, "the signal transmission for dumping memory information"
and "the read & output of dump information" are on the same
interface, but I think it would be better to separate them.
How about providing the following three types of functions for users?
- send a signal to the specified pid
- check the status of the signal sent and received
- read the dumped information


Is this for future extensibility to make it possible to get
other information like the current execution plan which was
suggested by Pavel?

If so, I agree with considering extensibility, but I'm not
sure whether it's necessary to provide these types of
functions to 'users'.


- How about managing the status of signal send/receive and dump
operations in a shared hash or similar?
Sending and receiving signals, dumping memory information, and
referencing dump information all work asynchronously.
Therefore, it would be good to have management information to check
the status of each process.
A simple idea is that:
- on sending a dump signal to a PID, first record the following
information in the shared hash:
    pid (specified pid)
    loc (dump location, currently might be ASAP)
    recv (did the pid process receive a signal? initially false)
    dumped (did the pid process dump its memory information? initially false)
- the specified process receives the signal, updates the status in the
shared hash, then dumps at the specified location.
- when the specified process finishes dumping memory information, it
updates the status in the shared hash.
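
The shared-hash bookkeeping sketched above could look like this. It is a
toy in-process model only — in PostgreSQL the hash would live in shared
memory with proper locking — and the field names simply follow the list
above:

```python
# Stand-in for a shared hash keyed by pid; each entry tracks the dump protocol.
status = {}


def request_dump(pid):
    """Sender side: record the request before signalling the target."""
    status[pid] = {"loc": "ASAP", "recv": False, "dumped": False}


def on_signal_received(pid):
    """Target side: mark that the signal arrived, before dumping."""
    status[pid]["recv"] = True


def on_dump_finished(pid):
    """Target side: mark the dump complete so the requestor can read it."""
    status[pid]["dumped"] = True


# Walk through one request end to end.
request_dump(1234)
on_signal_received(1234)
on_dump_finished(1234)
```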


Adding management information in shared memory seems necessary
when we want to have more control over dumping, like the 'dump
location' or any other information such as the 'current execution
plan'.
I'm going to consider this.



- Does it allow one process to output multiple dump files?
It appears to be a specification to overwrite at present, but I
thought it would be good to be able to generate
multiple dump files in different phases (e.g., planning phase and
execution phase) in the future.
- How is the dump file cleaned up?


 For a very long time there has been similar discussion about taking
the session query and session execution plans from other sessions.

I am not sure how necessary the information in the memory dump is, but
I am sure that taking the current execution plan and the complete text
of the current query is pretty necessary information.

But it would be great if this infrastructure could be used for any
debugging purpose.


Thanks!
It would be good if some part of this effort could serve as
infrastructure for other debugging.
It may be hard, but I will keep your comment in mind.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



Regards

Pavel


Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com








Re: Get memory contexts of an arbitrary backend process

2020-09-02 Thread torikoshia

Thanks for reviewing!

I'm going to modify the patch according to your comments.

On 2020-09-01 10:54, Andres Freund wrote:

Hi,

On 2020-08-31 20:22:18 +0900, torikoshia wrote:

After commit 3e98c0bafb28de, we can display the usage of the
memory contexts using pg_backend_memory_contexts system
view.

However, its target is limited to the  process attached to
the current session.

As discussed in the thread[1], it'll be useful to make it
possible to get the memory contexts of an arbitrary backend
process.

Attached PoC patch makes pg_get_backend_memory_contexts()
display meory contexts of the specified PID of the process.


Awesome!



It doesn't display contexts of all the backends but only
the contexts of specified process.
I think it would be enough because I suppose this function
is used after investigations using ps command or other OS
level utilities.


It can be used as a building block if all are needed. Getting the
infrastructure right is the big thing here, I think. Adding more
detailed views on top of that data later is easier.



diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql

index a2d61302f9..88fb837ecd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -555,10 +555,10 @@ REVOKE ALL ON pg_shmem_allocations FROM PUBLIC;
 REVOKE EXECUTE ON FUNCTION pg_get_shmem_allocations() FROM PUBLIC;

 CREATE VIEW pg_backend_memory_contexts AS
-SELECT * FROM pg_get_backend_memory_contexts();
+SELECT * FROM pg_get_backend_memory_contexts(-1);


-1 is odd. Why not use NULL or even 0?


+   else
+   {
+   int rc;
+   int parent_len = strlen(parent);
+   int name_len = strlen(name);
+
+   /*
+* write out the current memory context information.
+* Since some elements of values are reusable, we write it out.


Not sure what the second comment line here is supposed to mean?



+*/
+   fputc('D', fpout);
+   rc = fwrite(values, sizeof(values), 1, fpout);
+   rc = fwrite(nulls, sizeof(nulls), 1, fpout);
+
+		/* write out information which is not resuable from serialized 
values */


s/resuable/reusable/



+   rc = fwrite(&name_len, sizeof(int), 1, fpout);
+   rc = fwrite(name, name_len, 1, fpout);
+   rc = fwrite(&idlen, sizeof(int), 1, fpout);
+   rc = fwrite(clipped_ident, idlen, 1, fpout);
+   rc = fwrite(&level, sizeof(int), 1, fpout);
+   rc = fwrite(&parent_len, sizeof(int), 1, fpout);
+   rc = fwrite(parent, parent_len, 1, fpout);
+   (void) rc;  /* we'll check for 
error with ferror */
+
+   }


This format is not descriptive. How about serializing to json or
something? Or at least having field names?

Alternatively, build the same tuple we build for the SRF, and serialize
that. Then there's basically no conversion needed.


@@ -117,6 +157,8 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate 
*tupstore,

 Datum
 pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
 {
+   int pid =  PG_GETARG_INT32(0);
+
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
TupleDesc   tupdesc;
Tuplestorestate *tupstore;
@@ -147,11 +189,258 @@ 
pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)


MemoryContextSwitchTo(oldcontext);

-   PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
-   
TopMemoryContext, NULL, 0);
+   if (pid == -1)
+   {
+   /*
+* Since pid -1 indicates target is the local process, simply
+* traverse memory contexts.
+*/
+   PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
+   TopMemoryContext, 
"", 0, NULL);
+   }
+   else
+   {
+   /*
+* Send signal for dumping memory contexts to the target 
process,
+* and read the dumped file.
+*/
+   FILE   *fpin;
+   chardumpfile[MAXPGPATH];
+
+   SendProcSignal(pid, PROCSIG_DUMP_MEMORY, InvalidBackendId);
+
+   snprintf(dumpfile, sizeof(dumpfile), "pg_memusage/%d", pid);
+
+   while (true)
+   {
+   CHECK_FOR_INTERRUPTS();
+
+   pg_usleep(1L);
+


Need better signalling back/forth here.


Do you mean I should also send another signal from the dumped
process to the caller of the pg_get_backend_memory_contexts()
when it finishes dumping?
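
One lightweight alternative to a second signal is for the dumping side to
write to a temporary file and atomically rename it into place, so the
requestor's polling loop only ever sees a complete dump. This is a
hedged sketch of that idea, not the patch's actual behavior; the dump
directory and payload are illustrative:

```python
import os
import tempfile


def write_dump(final_path, payload):
    """Write the dump to a temp file, then atomically publish it."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk first
    # rename within one filesystem is atomic on POSIX: readers see either
    # no file at all, or the complete dump -- never a partial write.
    os.rename(tmp_path, final_path)


dump_dir = tempfile.mkdtemp()
final = os.path.join(dump_dir, "12345")  # hypothetical target PID
write_dump(final, "dump complete")
```

With this scheme the requestor can keep a simple existence check in its
wait loop and never needs a completion signal from the target.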

Regards,



--
Atsushi Torikoshi
NTT DATA CORPORATION






+/*
+ * dump_memory_contexts
+ * Dumping local

Re: Get memory contexts of an arbitrary backend process

2020-09-10 Thread torikoshia

On 2020-09-04 21:46, Tomas Vondra wrote:

On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:

On Fri, Sep 4, 2020 at 2:40 AM Tom Lane  wrote:

Kasahara Tatsuhito  writes:
> Yes, but it's not only for future expansion, but also for the
> usability and the stability of this feature.
> For example, if you want to read one dumped file multiple times and analyze it,
> you will want the ability to just read the dump.

If we design it to make that possible, how are we going to prevent disk
space leaks from never-cleaned-up dump files?

In my thought, with features such as a view that allows us to see a
list of dumped files, it would be better to have a function that simply
deletes the dump files associated with a specific PID, or deletes all
dump files.
Some files may be dumped with unexpected delays, so I think the
cleaning feature will be necessary.
(Also, as with pgsql_tmp files, it might be better to delete dump files
when PostgreSQL starts.)

Or should we try to delete the dump file as soon as we can read it?



IMO making the cleanup a responsibility of the users (e.g. by exposing
the list of dumped files through a view and expecting users to delete
them in some way) is rather fragile.

I don't quite see what's the point of designing it this way. It was
suggested this improves stability and usability of this feature, but
surely making it unnecessarily complex contradicts both points?

IMHO if the user needs to process the dump repeatedly, what's preventing
him/her from storing it in a file, or something like that? At that point
it's clear it's up to them to remove the file. So I suggest to keep the
feature as simple as possible - hand the dump over and delete.
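
The "hand the dump over and delete" approach suggested here can be
sketched as a read-then-unlink helper. This is illustrative only — the
real code would do this server-side in C when serving the SRF result:

```python
import os
import tempfile


def read_and_delete(path):
    """Return the dump's contents and remove the file in the same step,
    so no cleanup responsibility is left to the user."""
    try:
        with open(path) as f:
            return f.read()
    finally:
        os.unlink(path)  # remove the dump even if reading raised


# Create a stand-in dump file, then consume it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("TopMemoryContext")
data = read_and_delete(path)
```

The finally-block guarantees the file is gone whether or not the read
succeeds, which is exactly the leak-avoidance property discussed above.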


+1.
If there are no other objections, I'm going to accept this
suggestion.

Regards




Re: RFC: Logging plan of the running query

2021-10-14 Thread torikoshia

On 2021-10-13 23:28, Ekaterina Sokolova wrote:

Hi, hackers!

• The last version of patch is correct applied. It changes 8 files
from /src/backend, and 9 other files.

• I have 1 error and 1 warning during compilation on Mac.

explain.c:4985:25: error: implicit declaration of function
'GetLockMethodLocalHash' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
hash_seq_init(&status, GetLockMethodLocalHash());
explain.c:4985:25: warning: incompatible integer to pointer conversion
passing 'int' to parameter of type 'HTAB *' (aka 'struct HTAB *')
[-Wint-conversion]
hash_seq_init(&status, GetLockMethodLocalHash());

This error doesn't appear at my second machine with Ubuntu.

I found the reason. You deleted #ifdef USE_ASSERT_CHECKING from the
implementation of GetLockMethodLocalHash(), but this ifdef still
exists around the function declaration. There can be a situation
where the implementation exists without the declaration, so files
using the function produce errors. I created a new version of the
patch that fixes this problem.


Thanks for fixing that!


I agree that seeing the details of a query is a useful feature, but
I have several doubts:

1) There are lots of changes to the core code, but not all users need
this functionality, so adding it as an extension seems more
reasonable.


It would be good if we could implement this feature in an extension, but 
as the pg_query_state extension needs patches applied to PostgreSQL, I 
think this kind of feature needs PostgreSQL core modification.
IMHO, extensions which need core modification are not easy to use in 
production environments.



2) There are many tools available to monitor the status of a query.
How much do we need another one? For example:
• pg_stat_progress_* is a set of views with the current status of
ANALYZE, CREATE INDEX, VACUUM, CLUSTER, COPY, and Base Backup. You can
find it in the PostgreSQL documentation [1].
• pg_query_state is a contrib module with 2 patches for core (I hope
someday the community will support adding these patches to PostgreSQL). It
contains a function that prints a table with the pid, full query text, plan,
and current progress of every node, like a momentary EXPLAIN ANALYZE for
SELECT, UPDATE, INSERT, and DELETE, so it supports every flag and format
of EXPLAIN. You can find the current version of pg_query_state on GitHub
[2]. I also found an old discussion about its first version in the
Community [3].


Thanks for introducing the extension!

I only took a quick look at pg_query_state, I have some questions.

pg_query_state seems to use shm_mq to expose the plan information, but 
there was a discussion that this kind of architecture would be tricky to 
do properly [1].

Does pg_query_state handle the difficulties listed in that discussion?

It seems the caller of pg_query_state() has to wait until the target 
process pushes the plan information into shared memory; can that lead to 
deadlock situations?
I came up with this question because when trying to make a view for the 
memory contexts of other backends, we encountered deadlock situations. 
In the end, we gave up on the view design and adopted sending a signal 
and logging.


Some of the comments in [3] seem useful for my patch; I'm going to 
consider them. Thanks!



3) Have you measured the overload of your feature? It would be really
interesting to know the changes in speed and performance.


I haven't measured it yet, but I believe that the overhead for backends 
on which pg_log_current_plan() is not called would be slight, since the 
patch just adds the logic for saving the QueryDesc in ExecutorRun().
The overhead for backends on which pg_log_current_plan() is called might 
not be slight, but since the target process is assumed to be dealing 
with a long-running query whose plan the user wants to know, the 
overhead would be worth the cost.



Thank you for working on this issue. I would be glad to continue to
follow the development of this issue.


Thanks for your help!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: RFC: Logging plan of the running query

2021-10-15 Thread torikoshia

On 2021-10-15 15:17, torikoshia wrote:

I only took a quick look at pg_query_state, I have some questions.

pg_query_state seems using shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle difficulties listed on the discussion?


Sorry, I forgot to add the URL.
[1] 
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com



It seems the caller of the pg_query_state() has to wait until the
target process pushes the plan information into shared memory, can it
lead to deadlock situations?
I came up with this question because when trying to make a view for
memory contexts of other backends, we encountered deadlock situations.
After all, we gave up view design and adopted sending signal and
logging.


Discussion at the following URL.
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: RFC: Logging plan of the running query

2021-11-04 Thread torikoshia

On 2021-11-02 20:32, Ekaterina Sokolova wrote:
Thanks for your response!


Hi!

I'm here to answer your questions about contrib/pg_query_state.

I only took a quick look at pg_query_state, I have some questions.



pg_query_state seems using shm_mq to expose the plan information, but
there was a discussion that this kind of architecture would be tricky
to do properly [1].
Does pg_query_state handle difficulties listed on the discussion?
[1] 
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com


I doubt that it was the right link.


Sorry for make you confused, here is the link.

  
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com



But on the topic I will say that the extension really does use shared
memory; interaction is implemented by sending/receiving messages. This
architecture provides the required reliability and convenience.

As described in the link, using shared memory for this kind of work 
would need DSM, and it would also be necessary to exchange information 
between the requestor and the responder.

For example, when I looked briefly at the pg_query_state code, it looks 
like the size of the queue is fixed at QUEUE_SIZE, and I wonder how 
plans that exceed QUEUE_SIZE are handled.
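
For illustration, one common way a fixed-size queue can carry an
oversized payload is to split it into QUEUE_SIZE-byte chunks with an
end-of-message marker and reassemble on the receiving side. This is an
assumption about one possible design, not what pg_query_state actually
does:

```python
from collections import deque

QUEUE_SIZE = 16  # deliberately tiny for illustration


def send_chunked(queue, payload):
    """Split the payload into fixed-size chunks, then mark end of message."""
    for i in range(0, len(payload), QUEUE_SIZE):
        queue.append(payload[i:i + QUEUE_SIZE])
    queue.append(None)  # end-of-message marker


def recv_chunked(queue):
    """Pop chunks until the marker and reassemble the original payload."""
    parts = []
    while (chunk := queue.popleft()) is not None:
        parts.append(chunk)
    return "".join(parts)


q = deque()
plan_text = "Seq Scan on public.pgbench_accounts" * 3
send_chunked(q, plan_text)
plan = recv_chunked(q)
```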



It seems the caller of the pg_query_state() has to wait until the
target process pushes the plan information into shared memory, can it
lead to deadlock situations?
I came up with this question because when trying to make a view for
memory contexts of other backends, we encountered deadlock 
situations.

After all, we gave up view design and adopted sending signal and
logging.


Discussion at the following URL.
https://www.postgresql.org/message-id/9a50371e15e741e295accabc72a41df1%40oss.nttdata.com


Before extracting information about another process, we check its
state. Information will only be retrieved from a process willing to
provide it. Otherwise, we will receive an error message about the
impossibility of getting query execution statistics, plus the process
status. There is also a check against extracting your own status. This
is even verified in tests.

Thanks for your attention.
Just in case, I am ready to discuss this topic in more detail.


I imagined the following procedure.
Does it cause dead lock in pg_query_state?

- session1
BEGIN; TRUNCATE t;

- session2
BEGIN; TRUNCATE t; -- wait

- session1
SELECT * FROM pg_query_state(); -- wait and dead locked?


About overhead:
I haven't measured it yet, but I believe that the overhead for backends
on which pg_log_current_plan() is not called would be slight, since the
patch just adds the logic for saving the QueryDesc in ExecutorRun().
The overhead for backends on which pg_log_current_plan() is called might
not be slight, but since the target process is assumed to be dealing
with a long-running query whose plan the user wants to know, the
overhead would be worth the cost.

I think it would be useful for us to have a couple of examples with
different numbers of rows, compared to running without this functionality.


Do you have any expectation that the number of rows would affect the 
performance of this functionality?
This patch adds some code to ExecutorRun(), but I thought the number of 
rows would not impact the performance.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: RFC: Logging plan of the running query

2021-11-15 Thread torikoshia

On 2021-11-13 22:29, Bharath Rupireddy wrote:
Thanks for your review!


On Wed, Oct 13, 2021 at 7:58 PM Ekaterina Sokolova
 wrote:

Thank you for working on this issue. I would be glad to continue to
follow the development of this issue.


Thanks for the patch. I'm not sure if v11 is the latest patch, if yes,
I have the following comments:

1) Firstly, v11 patch isn't getting applied on the master -
http://cfbot.cputube.org/patch_35_3142.log.

Updated the patch.


2) I think we are moving away from if (!superuser()) checks, see the
commit [1]. The goal is to let the GRANT-REVOKE system deal with who
is supposed to run these system functions. Since
pg_log_current_query_plan also writes the info to server logs, I think
it should do the same thing as commit [1] did for
pg_log_backend_memory_contexts.

With v11, you are re-introducing the superuser() check in the
pg_log_backend_memory_contexts which is wrong.


Yeah, I removed the superuser() check and made it possible for 
non-superusers to execute it when they are granted permission to do so.


3) I think SendProcSignalForLogInfo can be more generic, meaning, it
can also send signal to auxiliary processes if asked to do this will
simplify the things for pg_log_backend_memory_contexts and other
patches like pg_print_backtrace. I would imagine it to be "bool
SendProcSignalForLogInfo(pid_t pid, ProcSignalReason reason, bool
signal_aux_proc);".


I agree with your idea.
Since sending signals to auxiliary processes to dump memory contexts and 
pg_print_backtrace are still under discussion, IMHO it would be better to 
refactor SendProcSignalForLogInfo after these patches are committed.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 5499167a7ecc6f040d5fec817cf36a7ba0b5cbff Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 15 Nov 2021 21:20:43 +0900
Subject: [PATCH v12] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_current_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some code, tests and comments of
pg_log_current_query_plan() are the same as those of
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova
---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/procsignal.h |   1 +
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 17 files changed, 333 insertions(+), 61 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 24447c0017..e12e1feeca 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25345,6 +25345,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_current_query_plan
+
+pg_log_current_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25458,6 +25478,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunk

Re: RFC: Logging plan of the running query

2021-11-15 Thread torikoshia

On 2021-11-13 03:37, Justin Pryzby wrote:


I reviewed this version of the patch - I have some language fixes.


Thanks for your review!
Attached patch that reflects your comments.


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 15 Nov 2021 22:31:00 +0900
Subject: [PATCH v13] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some code, tests and comments of
pg_log_query_plan() are the same as those of
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby

---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/procsignal.h |   1 +
 src/include/storage/signalfuncs.h|  22 
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 18 files changed, 355 insertions(+), 61 deletions(-)
 create mode 100644 src/include/storage/signalfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 24447c0017..7ffaa9a55d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25345,6 +25345,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25458,6 +25478,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  plan of the query running on backend with PID 17793 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 width=97)
+  Output: aid, bid, abalance, filler
+Settings: work_mem = '1MB'
+
+Note that nested statements (statements executed inside a function) are not
+considered for logging. Only the plan of the most deeply nested query is logged.
+   
+
   
 
   
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 54c93b16c4..d7f0010e47 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -701,6 +701,8 @@ REVOKE EXECUTE ON FUNCTION pg_ls_dir(text,boolean,boolean) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer) FROM PUBLIC;
 
+REVOKE EXECUTE ON FUNCTION pg_log_query_plan(integer) FROM PUBLIC;

Re: RFC: Logging plan of the running query

2021-11-16 Thread torikoshia

On 2021-11-15 23:15, Bharath Rupireddy wrote:


I have another comment: isn't it a good idea that an overloaded
version of the new function pg_log_query_plan can take the available
explain command options as a text argument? I'm not sure if it is
possible to get stats like buffers, costs etc. of a running query;
if yes, something like pg_log_query_plan(pid, 'buffers',
'costs')? It may look like overkill at first sight, but these
can be useful for knowing a more detailed plan of the query.


I also think the overloaded version would be useful.
However, as discussed in [1], it seems to introduce other difficulties.
I think it would be enough for the first version of pg_log_query_plan
not to take any parameters.


[1] 
https://www.postgresql.org/message-id/ce86e4f72f09d5497e8ad3a162861d33%40oss.nttdata.com


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: RFC: Logging plan of the running query

2021-11-25 Thread torikoshia

On 2021-11-17 22:44, Ekaterina Sokolova wrote:

Hi!

You forgot my last fix to build correctly on Mac. I have added it.


Thanks for the notification!
Since the patch no longer applied to HEAD, I also updated it.




About our discussion of pg_query_state:

torikoshia писал 2021-11-04 15:49:

I doubt that it was the right link.

Sorry for confusing you; here is the link.
https://www.postgresql.org/message-id/CA%2BTgmobkpFV0UB67kzXuD36--OFHwz1bs%3DL_6PZbD4nxKqUQMw%40mail.gmail.com


Thank you. I'll see it soon.


I imagined the following procedure.
Does it cause a deadlock in pg_query_state?

- session1
BEGIN; TRUNCATE t;

- session2
BEGIN; TRUNCATE t; -- wait

- session1
SELECT * FROM pg_query_state(); -- wait and deadlocked?


As far as I know, pg_query_state uses non-blocking reads and writes. I
have written a few tests trying to deadlock it (on version 14), but all
finished correctly.

Have a nice day. Please feel free to contact me if you need any
further information.


Thanks for your information and help!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From b8367e22d7a9898e4b85627ba8c203be273fc22f Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 26 Nov 2021 10:31:00 +0900
Subject: [PATCH v14] Add function to log the untruncated query string and its
 plan for the query currently running on the backend with the specified
 process ID.

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages, which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Since some code, tests and comments of
pg_log_query_plan() are the same as those of
pg_log_backend_memory_contexts(), this patch also refactors
them to make them common.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar, Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby

---
 doc/src/sgml/func.sgml   |  45 +++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 117 ++-
 src/backend/executor/execMain.c  |  10 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/ipc/signalfuncs.c|  55 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   7 ++
 src/backend/utils/adt/mcxtfuncs.c|  36 +-
 src/backend/utils/init/globals.c |   1 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   3 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/storage/signalfuncs.h|  22 
 src/include/tcop/pquery.h|   1 +
 src/test/regress/expected/misc_functions.out |  54 +++--
 src/test/regress/sql/misc_functions.sql  |  42 +--
 19 files changed, 355 insertions(+), 63 deletions(-)
 create mode 100644 src/include/storage/signalfuncs.h

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0a725a6711..b84ead4341 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25358,6 +25358,26 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID along with the untruncated
+query string.
+They will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+   
+  
+
   

 
@@ -25471,6 +25491,31 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-11 Thread torikoshia
On Wed, Jan 10, 2024 at 4:42 PM Masahiko Sawada  
wrote:



Yeah, I'm still thinking it's better to implement this feature
incrementally. Given we're close to feature freeze, I think it's
unlikely to get the whole feature into PG17 since there are still many
design discussions we need in addition to what Torikoshi-san pointed
out. The feature like "ignore errors" or "logging errors" would have
higher possibilities. Even if we get only these parts of the whole
"error table" feature into PG17, it will make it much easier to

implement "error tables" feature.

+1.
I'm also going to make a patch for "logging errors", since this
functionality is isolated from the v7 patch.



Seems promising. I'll look at the patch.

Thanks a lot!
Sorry for attaching v2 if you already reviewed v1.

On 2024-01-11 12:13, jian he wrote:
On Tue, Jan 9, 2024 at 10:36 PM torikoshia  
wrote:


On Tue, Dec 19, 2023 at 10:14 AM Masahiko Sawada 


wrote:
> If we want only such a feature we need to implement it together (the
> patch could be split, though). But if some parts of the feature are
> useful for users as well, I'd recommend implementing it incrementally.
> That way, the patches can get small and it would be easy for reviewers
> and committers to review/commit them.

Jian, how do you think this comment?

Looking back at the discussion so far, it seems that not everyone thinks
saving table information is the best idea[1] and some people think just
skipping error data is useful.[2]

Since there are issues to be considered from the design such as
physical/logical replication treatment, putting error information to
table is likely to take time for consensus building and development.

Wouldn't it be better to follow the following advice and develop the
functionality incrementally?

On Fri, Dec 15, 2023 at 4:49 AM Masahiko Sawada
 wrote:
> So I'm thinking we may be able to implement this
> feature incrementally. The first step would be something like an
> option to ignore all errors or an option to specify the maximum number
> of errors to tolerate before raising an ERROR. The second step would
> be to support logging destinations such as server logs and tables.


Attached a patch for this "first step" with reference to the v7 patch,
which logged errors and is simpler than the latest one.
- This patch adds a new option SAVE_ERROR_TO, but currently only
supports 'none', which means it just skips error data. It is expected to
support 'log' and 'table'.
- This patch skips just soft errors and doesn't handle other errors such
as missing column data.
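For reference, usage under this first-step patch would look something
like the sketch below (the table name is hypothetical; only 'none' is
supported at this point):

```sql
-- sketch of the proposed first-step option: skip malformed rows
COPY t FROM STDIN WITH (SAVE_ERROR_TO none);
```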


Hi.
I made the following change based on your patch
(v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch)

* when specified SAVE_ERROR_TO, move the initialization of
ErrorSaveContext to the function BeginCopyFrom.
I think that's the right place to initialize struct CopyFromState 
field.

* I think with your patch, when N rows have malformed data, it will
initialize N ErrorSaveContexts.
In the struct CopyFromStateData, I changed it to ErrorSaveContext
*escontext.

So if an error occurred, you can just set the escontext accordingly.
* doc: mention "If this option is omitted, COPY
stops operation at the first error."
* Since we only support 'none' for now, 'none' means we don't want
ErrorSaveContext metadata,
 so we should set cstate->escontext->details_wanted to false.


BTW I have question and comment about v15 patch:

> +   {
> +   /*
> +   *
> +   * InputFunctionCall is more faster than InputFunctionCallSafe.
> +   *
> +   */

Have you measured this?
When I tested it in an older patch, there was no big difference[3].

Thanks for pointing it out; I probably was overthinking.

  > -   SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
  > +   SAVEPOINT SAVE_ERROR SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT

There was a comment that we shouldn't add new keyword for this[4].


Thanks for pointing it out.


Thanks for reviewing!

Updated the patch merging your suggestions except below points:


+   cstate->num_errors = 0;


Since cstate is already initialized in the lines below, this may be
redundant.


| /* Allocate workspace and zero all fields */
| cstate = (CopyFromStateData *) palloc0(sizeof(CopyFromStateData));



 +   Assert(!cstate->escontext->details_wanted);


I'm not sure this is necessary, considering we're going to add other 
options like 'table' and 'log', which need details_wanted soon.



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From a3f14a0e7e9a7b5fb961ad6b6b7b163cf6534a26 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 12 Jan 2024 11:32

doc: add LITERAL tag to RETURNING

2024-01-11 Thread torikoshia

Hi,

RETURNING is usually tagged with appropriate tags, such as <literal>,
but not in the 'query' section of COPY.


https://www.postgresql.org/docs/devel/sql-copy.html

Would it be better to put <literal> here as well?

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 3c9efe404310bf01d79b2f0f006541ebc0b170a0 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 12 Jan 2024 14:33:47 +0900
Subject: [PATCH v1] Added literal tag for RETURNING.

---
 doc/src/sgml/ref/copy.sgml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69c33..e2ffbbdf84 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -128,10 +128,10 @@ COPY { table_name [ ( 
  
   For INSERT, UPDATE and
-  DELETE queries a RETURNING clause must be provided,
-  and the target relation must not have a conditional rule, nor
-  an ALSO rule, nor an INSTEAD rule
-  that expands to multiple statements.
+  DELETE queries a RETURNING clause
+  must be provided, and the target relation must not have a conditional
+  rule, nor an ALSO rule, nor an
+  INSTEAD rule that expands to multiple statements.
  
 


base-commit: 08c3ad27eb5348d0cbffa843a3edb11534f9904a
-- 
2.39.2



Re: doc: add LITERAL tag to RETURNING

2024-01-14 Thread torikoshia

On 2024-01-12 20:56, Alvaro Herrera wrote:

On 2024-Jan-12, Ashutosh Bapat wrote:

On Fri, Jan 12, 2024 at 11:27 AM torikoshia 
 wrote:

>
> RETURNING is usually tagged with appropriate tags, such as ,
> but not in the 'query' section of COPY.



The patch looks good.


Good catch, pushed.  It has user-visible effect, so I backpatched it.


Thanks for your review and push.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-15 Thread torikoshia

On 2024-01-16 00:17, Alexander Korotkov wrote:
On Mon, Jan 15, 2024 at 8:44 AM Masahiko Sawada  
wrote:


On Mon, Jan 15, 2024 at 8:21 AM Alexander Korotkov 
 wrote:

>
> On Sun, Jan 14, 2024 at 10:35 PM Masahiko Sawada  
wrote:
> > Thank you for updating the patch. Here are two comments:
> >
> > ---
> > +   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
> > +   cstate->num_errors > 0)
> > +   ereport(WARNING,
> > +   errmsg("%zd rows were skipped due to data type 
incompatibility",
> > +  cstate->num_errors));
> > +
> > /* Done, clean up */
> > error_context_stack = errcallback.previous;
> >
> > If a malformed input is not the last data, the context message seems odd:
> >
> > postgres(1:1769258)=# create table test (a int);
> > CREATE TABLE
> > postgres(1:1769258)=# copy test from stdin (save_error_to none);
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF signal.
> > >> a
> > >> 1
> > >>
> > 2024-01-15 05:05:53.980 JST [1769258] WARNING:  1 rows were skipped
> > due to data type incompatibility
> > 2024-01-15 05:05:53.980 JST [1769258] CONTEXT:  COPY test, line 3: ""
> > COPY 1
> >
> > I think it's better to report the WARNING after resetting the
> > error_context_stack. Or is a WARNING really appropriate here? The
> > v15-0001-Make-COPY-FROM-more-error-tolerant.patch[1] uses NOTICE but
> > the v1-0001-Add-new-COPY-option-SAVE_ERROR_TO.patch[2] changes it to
> > WARNING without explanation.
>
> Thank you for noticing this.  I think NOTICE is more appropriate here.
> There is nothing to "worry" about: the user asked to ignore the errors
> and we did.  And yes, it doesn't make sense to use the last line as
> the context.  Fixed.
>
> > ---
> > +-- test missing data: should fail
> > +COPY check_ign_err FROM STDIN WITH (save_error_to none);
> > +1  {1}
> > +\.
> >
> > We might want to cover the extra data cases too.
>
> Agreed, the relevant test is added.

Thank you for updating the patch. I have one minor point:

+   if (cstate->opts.save_error_to != COPY_SAVE_ERROR_TO_UNSPECIFIED &&
+   cstate->num_errors > 0)
+   ereport(NOTICE,
+   errmsg("%zd rows were skipped due to data type incompatibility",
+  cstate->num_errors));
+

We can use errmsg_plural() instead.


Makes sense.  Fixed.


I have a question about the option values; do you think we need to
have another value of SAVE_ERROR_TO option to explicitly specify the
current default behavior, i.e. not accept any error? With the v4
patch, the user needs to omit SAVE_ERROR_TO option to accept errors
during COPY FROM. If we change the default behavior in the future,
many users will be affected and probably end up changing their
applications to keep the current default behavior.


Valid point.  I've implemented the handling of CopySaveErrorToChoice
in a similar way to CopyHeaderChoice.

Please, check the revised patch attached.


Thanks for updating the patch!

Here is a minor comment:


+/*
+ * Extract a defGetCopySaveErrorToChoice value from a DefElem.
+ */


Should be Extract a "CopySaveErrorToChoice"?


BTW I'm thinking we should add a column to pg_stat_progress_copy that 
counts soft errors. I'll suggest this in another thread.



--
Regards,
Alexander Korotkov


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Add tuples_skipped to pg_stat_progress_copy

2024-01-16 Thread torikoshia

Hi,

132de9968840c introduced the SAVE_ERROR_TO option to COPY, enabling it
to skip malformed data, but there is no way to watch the number of
skipped rows during COPY.


The attached patch adds tuples_skipped to pg_stat_progress_copy, which
counts the number of tuples skipped because the source data is malformed.

If SAVE_ERROR_TO is not specified, this column remains zero.

The advantage would be that users can quickly notice and stop a COPY
when, for example, a larger amount of data than expected is being
skipped.


As described in the commit log, more choices for SAVE_ERROR_TO such as
'log' are expected to be added, and using such options may also let us
know the number of skipped tuples during COPY, but exposing it in
pg_stat_progress_copy would make it easier to monitor.



What do you think?
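For instance, a separate session could watch the skipped count of an
in-progress COPY with a query along these lines (tuples_skipped comes
from the attached patch; the other columns already exist in the view):

```sql
-- hypothetical monitoring query against the patched view
SELECT relid::regclass AS relname, tuples_processed, tuples_skipped
  FROM pg_stat_progress_copy;
```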

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 98e546ff2de380175708ce003f67c993299a3fb3 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 17 Jan 2024 13:41:44 +0900
Subject: [PATCH v1] Add tuples_skipped to pg_stat_progress_copy

132de9968840c enabled COPY to skip malformed data, but there is no way to watch the number of skipped rows during COPY.

This patch adds tuples_skipped to pg_stat_progress_copy, which counts the number of tuples skipped because the source data is malformed.
If SAVE_ERROR_TO is not specified, this column remains zero.

Needs catalog bump.
---
 doc/src/sgml/monitoring.sgml | 10 ++
 src/backend/catalog/system_views.sql |  3 ++-
 src/backend/commands/copyfrom.c  |  5 +
 src/include/commands/progress.h  |  1 +
 src/test/regress/expected/rules.out  |  3 ++-
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b804eb8b5e..96ed774670 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5779,6 +5779,16 @@ FROM pg_stat_get_backend_idset() AS backendid;
WHERE clause of the COPY command.
   
  
+
+ 
+  
+   tuples_skipped bigint
+  
+  
+   Number of tuples skipped because they contain malformed data
+   (if SAVE_ERROR_TO is specified, otherwise zero).
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e43e36f5ac..6288270e2b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1318,7 +1318,8 @@ CREATE VIEW pg_stat_progress_copy AS
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
 S.param3 AS tuples_processed,
-S.param4 AS tuples_excluded
+S.param4 AS tuples_excluded,
+S.param7 AS tuples_skipped
 FROM pg_stat_get_progress_info('COPY') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 4058b08134..fe33b0facf 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -650,6 +650,7 @@ CopyFrom(CopyFromState cstate)
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
+	int64		skipped = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -1012,6 +1013,10 @@ CopyFrom(CopyFromState cstate)
  */
 cstate->escontext->error_occurred = false;
 
+			/* Report that this tuple was skipped by the SAVE_ERROR_TO clause */
+			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+			 ++skipped);
+
 			continue;
 		}
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index a458c8c50a..73afa77a9c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -142,6 +142,7 @@
 #define PROGRESS_COPY_TUPLES_EXCLUDED 3
 #define PROGRESS_COPY_COMMAND 4
 #define PROGRESS_COPY_TYPE 5
+#define PROGRESS_COPY_TUPLES_SKIPPED 6
 
 /* Commands of COPY (as advertised via PROGRESS_COPY_COMMAND) */
 #define PROGRESS_COPY_COMMAND_FROM 1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 55f2e95352..5e846b01e6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1988,7 +1988,8 @@ pg_stat_progress_copy| SELECT s.pid,
 s.param1 AS bytes_processed,
 s.param2 AS bytes_total,
 s.param3 AS tuples_processed,
-s.param4 AS tuples_excluded
+s.param4 AS tuples_excluded,
+s.param7 AS tuples_skipped
FROM (pg_stat_get_progress_info('COPY'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
  LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_create_index| SELECT s.pid,

base-commit: 65c5864d7fac46516f17ee89085e349a87ee5bd7
-- 
2.39.2



Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-16 Thread torikoshia

Hi,

Thanks for applying!

+   errmsg_plural("%zd row were skipped due to data type incompatibility",


Sorry, I just noticed it, but 'were' should be 'was' here?


BTW I'm thinking we should add a column to pg_stat_progress_copy that
counts soft errors. I'll suggest this in another thread.

Please do!


I've started it here:

https://www.postgresql.org/message-id/d12fd8c99adcae2744212cb23feff...@oss.nttdata.com


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-17 Thread torikoshia

On 2024-01-18 10:10, jian he wrote:
On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada  
wrote:


On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
>
> Alexander Korotkov  writes:
> > On Wed, Jan 17, 2024 at 9:49 AM Kyotaro Horiguchi
> >  wrote:
> >> On the other hand, SAVE_ERROR_TO takes 'error' or 'none', which
> >> indicate "immediately error out" and 'just ignore the failure'
> >> respectively, but these options hardly seem to denote a 'location',
> >> and appear more like an 'action'. I somewhat suspect that this
> >> parameter name intially conceived with the assupmtion that it would
> >> take file names or similar parameters. I'm not sure if others will
> >> agree, but I think the parameter name might not be the best
> >> choice. For instance, considering the addition of the third value
> >> 'log', something like on_error_action (error, ignore, log) would be
> >> more intuitively understandable. What do you think?
>
> > Probably, but I'm not sure about that.  The name SAVE_ERROR_TO assumes
> > the next word will be location, not action.  With some stretch we can
> > assume 'error' to be location.  I think it would be even more stretchy
> > to think that SAVE_ERROR_TO is followed by action.
>
> The other problem with this terminology is that with 'none', what it
> is doing is the exact opposite of "saving" the errors.  I agree we
> need a better name.

Agreed.

>
> Kyotaro-san's suggestion isn't bad, though I might shorten it to
> error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
> You will need a separate parameter anyway to specify the destination
> of "log", unless "none" became an illegal table name when I wasn't
> looking.  I don't buy that one parameter that has some special values
> while other values could be names will be a good design.  Moreover,
> what if we want to support (say) log-to-file along with log-to-table?
> Trying to distinguish a file name from a table name without any other
> context seems impossible.

I've been thinking we can add more values to this option to log errors
not only to the server logs but also to the error table (not sure
details but I imagined an error table is created for each table on
error), without an additional option for the destination name. The
values would be like error_action {error|ignore|save-logs|save-table}.



another idea:
on_error {error|ignore|other_future_option}
if not specified then by default ERROR.
You can also specify ERROR or IGNORE for now.

I agree, the parameter "error_action" is better than "location".


I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seem a bit odd to me.

I feel "stop" is better for both cases as Tom suggested.

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-18 Thread torikoshia

On 2024-01-18 16:59, Alexander Korotkov wrote:
On Thu, Jan 18, 2024 at 4:16 AM torikoshia  
wrote:

On 2024-01-18 10:10, jian he wrote:
> On Thu, Jan 18, 2024 at 8:57 AM Masahiko Sawada 
> wrote:
>> On Thu, Jan 18, 2024 at 6:38 AM Tom Lane  wrote:
>> > Kyotaro-san's suggestion isn't bad, though I might shorten it to
>> > error_action {error|ignore|log} (or perhaps "stop" instead of "error")?
>> > You will need a separate parameter anyway to specify the destination
>> > of "log", unless "none" became an illegal table name when I wasn't
>> > looking.  I don't buy that one parameter that has some special values
>> > while other values could be names will be a good design.  Moreover,
>> > what if we want to support (say) log-to-file along with log-to-table?
>> > Trying to distinguish a file name from a table name without any other
>> > context seems impossible.
>>
>> I've been thinking we can add more values to this option to log errors
>> not only to the server logs but also to the error table (not sure
>> details but I imagined an error table is created for each table on
>> error), without an additional option for the destination name. The
>> values would be like error_action {error|ignore|save-logs|save-table}.
>>
>
> another idea:
> on_error {error|ignore|other_future_option}
> if not specified then by default ERROR.
> You can also specify ERROR or IGNORE for now.
>
> I agree, the parameter "error_action" is better than "location".

I'm not sure whether error_action or on_error is better, but either way
"error_action error" and "on_error error" seem a bit odd to me.
I feel "stop" is better for both cases as Tom suggested.


OK.  What about this?
on_error {stop|ignore|other_future_option}
where other_future_option might be compound like "file 'copy.log'" or
"table 'copy_log'".


Thanks, also +1 from me.
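For reference, a sketch of what the agreed naming would look like in use
(table name hypothetical; per the discussion, omitting the option means
the default 'stop' behavior):

```sql
-- agreed option naming: on_error {stop|ignore|...}
COPY t FROM STDIN WITH (ON_ERROR ignore);
```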

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Parent/child context relation in pg_get_backend_memory_contexts()

2024-01-19 Thread torikoshia

On 2024-01-16 18:41, Melih Mutlu wrote:

Hi,

Thanks for reviewing.

torikoshia , 10 Oca 2024 Çar, 09:37
tarihinde şunu yazdı:


+ 
+  
+   context_id int4
+  
+  
+   Current context id. Note that the context id is a

temporary id

and may
+   change in each invocation
+  
+ 
+
+ 
+  
+   path int4[]
+  
+  
+   Path to reach the current context from TopMemoryContext.
Context ids in
+   this list represents all parents of the current context.

This

can be
+   used to build the parent and child relation
+  
+ 
+
+ 
+  
+   total_bytes_including_children
int8
+  
+  
+   Total bytes allocated for this memory context including

its

children
+  
+ 


These columns are currently added to the bottom of the table, but it
may
be better to put semantically similar items close together and
change
the insertion position with reference to other system views. For
example,

- In pg_group and pg_user, 'id' is placed on the line following
'name',
so 'context_id' be placed on the line following 'name'
- 'path' is similar with 'parent' and 'level' in that these are
information about the location of the context, 'path' be placed to
next
to them.

If we do this, orders of columns in the system view should be the
same,
I think.


I've done what you suggested. Also moved
"total_bytes_including_children" right after "total_bytes".


14dd0f27d have introduced new macro foreach_int.
It seems to be able to make the code a bit simpler and the commit
log
says this macro is primarily intended for use in new code. For
example:


Makes sense. Done.


Thanks for updating the patch!

+   Current context id. Note that the context id is a temporary id and may
+   change in each invocation
+  
+ 


It clearly states that the context id is temporary, but I am a little
concerned about users who write queries that refer to this view multiple
times without using a CTE.


If you agree, how about adding a description like the one you mentioned
before?


We still need to use a CTE since ids are not persisted and might change
in each run of pg_backend_memory_contexts. Materializing the result can
prevent any inconsistencies due to id changes. Also, it can be good for
performance reasons as well.


We already have additional description below the table which explains 
each column of the system view. For example pg_locks:

https://www.postgresql.org/docs/devel/view-pg-locks.html


Also giving an example query something like this might be useful.

  -- show all the parent context names of ExecutorState
  with contexts as (
select * from pg_backend_memory_contexts
  )
  select name from contexts where array[context_id] <@ (select path from 
contexts where name = 'ExecutorState');



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-19 Thread torikoshia

On 2024-01-18 23:59, jian he wrote:

Hi.
patch refactored based on "on_error {stop|ignore}"
doc changes:

--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,7 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL { ( column_name [, ...] ) | * }
 FORCE_NULL { ( column_name [, ...] ) | * }
-SAVE_ERROR_TO 'class="parameter">location'
+ON_ERROR 'class="parameter">error_action'
 ENCODING 'class="parameter">encoding_name'

 
  
@@ -375,20 +375,20 @@ COPY { table_name [ ( 


-SAVE_ERROR_TO
+ON_ERROR
 
  
-  Specifies to save error information to class="parameter">
-  location when there is malformed data in the 
input.

-  Currently, only error (default) and
none
+  Specifies which 
+  error_action to perform when there is malformed
data in the input.
+  Currently, only stop (default) and
ignore
   values are supported.
-  If the error value is specified,
+  If the stop value is specified,
   COPY stops operation at the first error.
-  If the none value is specified,
+  If the ignore value is specified,
   COPY skips malformed data and continues 
copying data.

   The option is allowed only in COPY FROM.
-  The none value is allowed only when
-  not using binary format.
+  Only stop value is allowed only when
+  using binary format.
  


Thanks for making the patch!

Here are some comments:


-  The none value is allowed only when
-  not using binary format.
+  Only stop value is allowed only when
+  using binary format.


The second 'only' may be unnecessary.

-   /* If SAVE_ERROR_TO is specified, skip rows with soft errors */
+   /* If ON_ERROR is specified with IGNORE, skip rows with soft errors */


This is correct now, but considering future work which adds other
options like "file 'copy.log'" and "table 'copy_log'", it may be better
not to limit the case to 'IGNORE'.
How about something like this?

  If ON_ERROR is specified and the value is not STOP, skip rows with 
soft errors



-COPY x from stdin (format BINARY, save_error_to none);
-COPY x to stdin (save_error_to none);
+COPY x from stdin (format BINARY, ON_ERROR ignore);
+COPY x from stdin (ON_ERROR unsupported);
 COPY x to stdin (format TEXT, force_quote(a));
 COPY x from stdin (format CSV, force_quote(a));


In the existing test for copy2.sql, the COPY options are written in
lower case (e.g. 'format') and option values (e.g. 'BINARY') are written
in upper case.

It would be more consistent to align them.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2024-01-19 Thread torikoshia

On 2024-01-19 22:27, Alexander Korotkov wrote:

Hi!

On Fri, Jan 19, 2024 at 2:37 PM torikoshia  
wrote:

Thanks for making the patch!


The patch is pushed!  The proposed changes are incorporated excluding 
this.



> -   /* If SAVE_ERROR_TO is specified, skip rows
> with soft errors */
> +   /* If ON_ERROR is specified with IGNORE, skip
> rows with soft errors */

This is correct now, but considering future works which add other
options like "file 'copy.log'" and
"table 'copy_log'", it may be better not to limit the case to 
'IGNORE'.

How about something like this?

   If ON_ERROR is specified and the value is not STOP, skip rows with
soft errors


I think when we have more options, then we wouldn't just skip rows
with soft errors but rather save them.  So, I left this comment as is
for now.


Agreed.
Thanks for the notification!



--
Regards,
Alexander Korotkov


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Add tuples_skipped to pg_stat_progress_copy

2024-01-22 Thread torikoshia

On 2024-01-17 14:47, Masahiko Sawada wrote:
On Wed, Jan 17, 2024 at 2:22 PM torikoshia  
wrote:


Hi,

132de9968840c introduced the SAVE_ERROR_TO option to COPY and enabled it
to skip malformed data, but there is no way to watch the number of
skipped rows during COPY.

Attached patch adds tuples_skipped to pg_stat_progress_copy, which
counts the number of tuples skipped because the source data is malformed.
If SAVE_ERROR_TO is not specified, this column remains zero.

The advantage would be that users can quickly notice and stop COPYing
when there is a larger amount of skipped data than expected, for
example.

As described in the commit log, more choices for SAVE_ERROR_TO such as
'log' are expected to be added, and using such options may enable us to
know the number of skipped tuples during COPY, but exposing it in
pg_stat_progress_copy would be easier to monitor.


What do you think?


+1

The patch is pretty simple. Here is a comment:

+   (if SAVE_ERROR_TO is specified, otherwise 
zero).

+  
+ 

To be precise, this counter only advances when a value other than
'ERROR' is specified to SAVE_ERROR_TO option.


Thanks for your comment and review!

Updated the patch according to your comment and the option name change
by b725b7eec.



BTW, based on this patch, I think we can add another option which
specifies the maximum tolerable number of malformed rows.
I remember this was discussed in [1], and feel it would be useful when
loading 'dirty' data but there is a limit to how dirty it can be.

Attached 0002 is a WIP patch for this (I haven't added docs yet).

This may be better discussed in another thread, but any comments (e.g.
necessity of this option, option name) are welcome.



[1] 
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 571ada768bdb68a31f295cbcb28f4348f253989d Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 22 Jan 2024 23:57:24 +0900
Subject: [PATCH v2 1/2] Add tuples_skipped to pg_stat_progress_copy

132de9968840c enabled COPY to skip malformed data, but there is no way to watch
the number of skipped rows during COPY.

This patch adds tuples_skipped to pg_stat_progress_copy, which counts the
number of skipped tuple because source data is malformed.
This column only advances when a value other than stop is specified to ON_ERROR.

Needs catalog bump.
---
 doc/src/sgml/monitoring.sgml | 11 +++
 src/backend/catalog/system_views.sql |  3 ++-
 src/backend/commands/copyfrom.c  |  5 +
 src/include/commands/progress.h  |  1 +
 src/test/regress/expected/rules.out  |  3 ++-
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6e74138a69..cfc13b3580 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5780,6 +5780,17 @@ FROM pg_stat_get_backend_idset() AS backendid;
WHERE clause of the COPY command.
   
  
+
+ 
+  
+   tuples_skipped bigint
+  
+  
+   Number of tuples skipped because they contain malformed data.
+   This counter only advances when a value other than
+   stop is specified to ON_ERROR.
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e43e36f5ac..6288270e2b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1318,7 +1318,8 @@ CREATE VIEW pg_stat_progress_copy AS
 S.param1 AS bytes_processed,
 S.param2 AS bytes_total,
 S.param3 AS tuples_processed,
-S.param4 AS tuples_excluded
+S.param4 AS tuples_excluded,
+S.param7 AS tuples_skipped
 FROM pg_stat_get_progress_info('COPY') AS S
 LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 173a736ad5..8ab3777664 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -650,6 +650,7 @@ CopyFrom(CopyFromState cstate)
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
+	int64		skipped = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -1012,6 +1013,10 @@ CopyFrom(CopyFromState cstate)
  */
 cstate->escontext->error_occurred = false;
 
+			/* Report that this tuple was skipped by the ON_ERROR clause */
+			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
+			 ++skipped);
+
 			continue;
 		}
 
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index a458c8c50a..73afa77a9c 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -142,6 +142,7 @@
 #define PROGRESS_COPY_TUPLES_EXCLUDED 3
 #define PROGRESS_COPY_COMMAND 4
 #define PROGRESS_

Re: Add tuples_skipped to pg_stat_progress_copy

2024-01-24 Thread torikoshia

On 2024-01-24 17:05, Masahiko Sawada wrote:
On Tue, Jan 23, 2024 at 1:02 AM torikoshia  
wrote:


On 2024-01-17 14:47, Masahiko Sawada wrote:
> On Wed, Jan 17, 2024 at 2:22 PM torikoshia 
> wrote:
>>
>> Hi,
>>
>> 132de9968840c introduced SAVE_ERROR_TO option to COPY and enabled to
>> skip malformed data, but there is no way to watch the number of
>> skipped
>> rows during COPY.
>>
>> Attached patch adds tuples_skipped to pg_stat_progress_copy, which
>> counts the number of skipped tuples because source data is malformed.
>> If SAVE_ERROR_TO is not specified, this column remains zero.
>>
>> The advantage would be that users can quickly notice and stop COPYing
>> when there is a larger amount of skipped data than expected, for
>> example.
>>
>> As described in commit log, it is expected to add more choices for
>> SAVE_ERROR_TO like 'log' and using such options may enable us to know
>> the number of skipped tuples during COPY, but exposed in
>> pg_stat_progress_copy would be easier to monitor.
>>
>>
>> What do you think?
>
> +1
>
> The patch is pretty simple. Here is a comment:
>
> +   (if SAVE_ERROR_TO is specified, otherwise
> zero).
> +  
> + 
>
> To be precise, this counter only advances when a value other than
> 'ERROR' is specified to SAVE_ERROR_TO option.

Thanks for your comment and review!

Updated the patch according to your comment and option name change by
b725b7eec.


Thanks! The patch looks good to me. I'm going to push it tomorrow,
barring any objections.


Thanks!



BTW, based on this patch, I think we can add another option which
specifies the maximum tolerable number of malformed rows.
I remember this was discussed in [1], and feel it would be useful when
loading 'dirty' data but there is a limit to how dirty it can be.
Attached 0002 is WIP patch for this(I haven't added doc yet).


Yeah, it could be a good option.


This may be better discussed in another thread, but any comments(e.g.
necessity of this option, option name) are welcome.


I'd recommend forking a new thread for this option. As far as I
remember, there also was an opinion that "reject limit" stuff is not
very useful.


OK, I'll make another thread for this.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Add new error_action COPY ON_ERROR "log"

2024-01-25 Thread torikoshia

Hi,

As described in 9e2d870119, COPY ON_ERROR is expected to have more
"error_action" values.

(Note that the option name was changed by b725b7eec)

I'd like to have a new option "log", which skips soft errors and logs
information that should have resulted in errors to the PostgreSQL log.


I think this option has some advantages like below:

1) We can know which line of the input data was not loaded and the
reason.


  Example:

  =# copy t1 from stdin with (on_error log);
  Enter data to be copied followed by a newline.
  End with a backslash and a period on a line by itself, or an EOF 
signal.

  >> 1
  >> 2
  >> 3
  >> z
  >> \.
  LOG:  invalid input syntax for type integer: "z"
  NOTICE:  1 row was skipped due to data type incompatibility
  COPY 3

  =# \! tail data/log/postgresql*.log
  LOG:  22P02: invalid input syntax for type integer: "z"
  CONTEXT:  COPY t1, line 4, column i: "z"
  LOCATION:  pg_strtoint32_safe, numutils.c:620
  STATEMENT:  copy t1 from stdin with (on_error log);


2) Easier maintenance than storing error information in tables or
proprietary log files.
For example, in case a large number of soft errors occur, some
mechanisms are needed to prevent an unbounded increase in the size of
the destination data, but we can leave that to PostgreSQL's log rotation.



Attached a patch.
This basically comes from a previous discussion[1] which covered both
"ignore" and "log" of soft errors.


As shown in the example above, the log output to the client does not
contain CONTEXT, so I'm a little concerned that the client cannot see
which line of the input data had a problem without looking at the
server log.
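For illustration, the behavior described above can be sketched in Python (a hypothetical model; the function name copy_with_on_error_log is made up and this is not the copyfrom.c implementation): malformed rows are skipped as with "ignore", but each failure is also recorded with CONTEXT-style line information that only reaches the server log.

```python
def copy_with_on_error_log(lines, parse):
    """Sketch of the proposed ON_ERROR log: skip soft errors, log them."""
    loaded, server_log = [], []
    for lineno, raw in enumerate(lines, start=1):
        try:
            loaded.append(parse(raw))
        except ValueError as e:
            # Soft error: skip the row and record it instead of aborting.
            server_log.append(f"LOG: {e} CONTEXT: line {lineno}: \"{raw}\"")
    notice = f"{len(server_log)} row(s) were skipped due to data type incompatibility"
    return loaded, server_log, notice
```

The client would only see the NOTICE-style summary, while the per-line entries go to the server log, which is the concern raised above.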



What do you think?

[1] 
https://www.postgresql.org/message-id/c0fb57b82b150953f26a5c7e340412e8%40oss.nttdata.com


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 04e643facfea4b4e8dd174d22fbe5e008747a91a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 26 Jan 2024 01:17:59 +0900
Subject: [PATCH v1] Add new error_action "log" to ON_ERROR option

Currently ON_ERROR option only has "ignore" to skip malformed data and
there are no ways to know where and why COPY skipped them.

"log" skips malformed data as well as "ignore", but it logs information that
should have resulted in errors to PostgreSQL log.


---
 doc/src/sgml/ref/copy.sgml  |  8 ++--
 src/backend/commands/copy.c |  4 +++-
 src/backend/commands/copyfrom.c | 24 
 src/include/commands/copy.h |  1 +
 src/test/regress/expected/copy2.out | 14 +-
 src/test/regress/sql/copy2.sql  |  9 +
 6 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..9662c90a8b 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -380,12 +380,16 @@ COPY { table_name [ ( 
   Specifies which 
   error_action to perform when there is malformed data in the input.
-  Currently, only stop (default) and ignore
-  values are supported.
+  Currently, only stop (default), ignore
+  and log values are supported.
   If the stop value is specified,
   COPY stops operation at the first error.
   If the ignore value is specified,
   COPY skips malformed data and continues copying data.
+  If the log value is specified,
+  COPY behaves the same as ignore, except that
+  it logs information that should have resulted in errors to PostgreSQL log at
+  INFO level.
   The option is allowed only in COPY FROM.
   Only stop value is allowed when
   using binary format.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..812ca63350 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -415,13 +415,15 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 		return COPY_ON_ERROR_STOP;
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore" or "log" values.
 	 */
 	sval = defGetString(def);
 	if (pg_strcasecmp(sval, "stop") == 0)
 		return COPY_ON_ERROR_STOP;
 	if (pg_strcasecmp(sval, "ignore") == 0)
 		return COPY_ON_ERROR_IGNORE;
+	if (pg_strcasecmp(sval, "log") == 0)
+		return COPY_ON_ERROR_LOG;
 
 	ereport(ERROR,
 			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..7886bd5353 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1013,6 +1013,23 @@ CopyFrom(CopyFromState cstate)
  */
 cstate->escontext->error_occurred = false;
 
+			else if (cstate->opts.on_error == COPY_ON_ERROR_LOG)
+			{
+/* Adjust elevel so we don't jump out */
+cstate->escontext->error_data->elevel = LOG;
+
+/*
+ * Despite the name, this won't raise an error since elevel is
+ * LOG now.
+ */
+ThrowErrorData(cstate->escontext->error_data);
+
+/* Initialize escontext in preparation for next soft error */

Add new COPY option REJECT_LIMIT

2024-01-26 Thread torikoshia

Hi,

9e2d870 enabled the COPY command to skip soft errors, and I think we can
add another option which specifies the maximum tolerable number of soft
errors.


I remember this was discussed in [1], and feel it would be useful when 
loading 'dirty' data but there is a limit to how dirty it can be.


Attached a patch for this.

What do you think?
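The intended REJECT_LIMIT semantics can be sketched in Python (a hypothetical model of the attached C patch, not the patch itself; in this sketch a reject_limit of 0 stands for "no limit"):

```python
def copy_from(rows, parse, on_error="stop", reject_limit=0):
    """Sketch of the COPY row loop with ON_ERROR ignore and REJECT_LIMIT."""
    inserted = skipped = 0
    for row in rows:
        try:
            value = parse(row)
        except ValueError:
            if on_error == "stop":
                raise               # default: fail on the first error
            skipped += 1
            # Fail the whole COPY once the skip count exceeds the limit.
            if reject_limit > 0 and skipped > reject_limit:
                raise RuntimeError("skipped rows exceeded REJECT_LIMIT")
            continue
        inserted += 1
    return inserted, skipped
```

Note that the limit is on the number of skipped rows, not a ratio, matching the error check in the patch.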


[1] 
https://www.postgresql.org/message-id/752672.1699474336%40sss.pgh.pa.us



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 7f111e98e21654c4ca338c93d7cbb4ec9acaabcb Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 26 Jan 2024 18:32:40 +0900
Subject: [PATCH v1] Add new COPY option REJECT_LIMIT

REJECT_LIMIT specifies the maximum tolerable number of malformed rows.
If input data has more malformed errors than this value, entire COPY fails.
This option must be used with ON_ERROR to be set to other than stop.
---
 doc/src/sgml/ref/copy.sgml  | 13 +
 src/backend/commands/copy.c | 16 
 src/backend/commands/copyfrom.c |  6 ++
 src/include/commands/copy.h |  1 +
 src/test/regress/expected/copy2.out | 10 ++
 src/test/regress/sql/copy2.sql  | 21 +
 6 files changed, 67 insertions(+)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..8982e8464a 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -393,6 +393,19 @@ COPY { table_name [ ( 

 
+   
+REJECT_LIMIT
+
+ 
+  Specifies the maximum tolerable number of malformed rows.
+  If input data has caused more malformed errors than this value, entire
+  COPY fails.
+  This option must be used with ON_ERROR to be set to
+  other than stop.
+ 
+
+   
+

 ENCODING
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..ca5263d588 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -615,6 +615,22 @@ ProcessCopyOptions(ParseState *pstate,
 			on_error_specified = true;
 			opts_out->on_error = defGetCopyOnErrorChoice(defel, pstate, is_from);
 		}
+		else if (strcmp(defel->defname, "reject_limit") == 0)
+		{
+			int64	reject_limit = defGetInt64(defel);
+
+			if (!opts_out->on_error)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("REJECT_LIMIT requires ON_ERROR to be set to other than stop")));
+			if (opts_out->reject_limit > 0)
+errorConflictingDefElem(defel, pstate);
+			if (reject_limit <= 0)
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("REJECT_LIMIT must be greater than zero")));
+			opts_out->reject_limit = reject_limit;
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..15066887ea 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1017,6 +1017,12 @@ CopyFrom(CopyFromState cstate)
 			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
 		 ++skipped);
 
+			if (cstate->opts.reject_limit > 0 && skipped > cstate->opts.reject_limit)
+ereport(ERROR,
+		(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+		 errmsg("exceeded the number specified by REJECT LIMIT \"%d\"",
+cstate->opts.reject_limit)));
+
 			continue;
 		}
 
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index b3da3cb0be..8f8dab9524 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -73,6 +73,7 @@ typedef struct CopyFormatOptions
 	bool	   *force_null_flags;	/* per-column CSV FN flags */
 	bool		convert_selectively;	/* do selective binary conversion? */
 	CopyOnErrorChoice on_error; /* what to do when error happened */
+	int			reject_limit;	/* tolerable number of malformed rows */
 	List	   *convert_select; /* list of column names (can be NIL) */
 } CopyFormatOptions;
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 25c401ce34..28de7a2685 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -108,6 +108,10 @@ COPY x to stdin (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdin (format BINARY, on_error unsupported);
 ^
+COPY x from stdin with (reject_limit 3);
+ERROR:  REJECT_LIMIT requires ON_ERROR to be set to other than stop
+COPY x from stdin with (on_error ignore, reject_limit 0);
+ERROR:  REJECT_LIMIT must be greater than zero
 -- too many columns in column list: should fail
 COPY x (a, b, c, d, e, d, c) from stdin;
 ERROR:  column "d" specified more than once
@@ -751,6 +755,12 @@ CONTEXT:  COPY check_ign_err, line 1: "1	{1}"
 COPY check_ign_err FROM STDIN WITH (on_error ignore);
 ERROR:  extra data after last expected column
 CONTEXT:  COPY check_ign_err, line 1: "1	{1}	3	abc"
+-- 

Re: Small fix on COPY ON_ERROR document

2024-01-28 Thread torikoshia

On 2024-01-27 00:04, David G. Johnston wrote:

On Fri, Jan 26, 2024 at 2:30 AM Yugo NAGATA 
wrote:


On Fri, 26 Jan 2024 00:00:57 -0700
"David G. Johnston"  wrote:


I will need to make this tweak and probably a couple others to my own
suggestions in 12 hours or so.



And here is my v2.

Notably I choose to introduce the verbiage "soft error" and then
define in the ON_ERROR clause the specific soft error that matters
here - "invalid input syntax".

I also note the log message behavior when ignore mode is chosen.  I
haven't confirmed that it is accurate but that is readily tweaked if
approved of.

David J.


Thanks for refining the doc.



+  Specifies which how to behave when encountering a soft error.


To be consistent with other parts in the manual[1][2], should be “soft” 
error?


+  An error_action 
value of

+  stop means fail the command, while
+  ignore means discard the input row and 
continue with the next one.

+  The default is stop


Is "." required at the end of the line?

+ 
+  The only relevant soft error is "invalid input syntax", which 
manifests when attempting

+  to create a column value from the text input.
+ 

I think it is not restricted to "invalid input syntax".
We can handle out of range error:

  =# create table t1(i int);
  CREATE TABLE

  =# copy t1  from stdin with(ON_ERROR ignore);
  Enter data to be copied followed by a newline.
  End with a backslash and a period on a line by itself, or an EOF
  signal.
  >> 1
  >> \.
  NOTICE:  1 row was skipped due to data type incompatibility
  COPY 0


Also, I'm a little concerned that users might wonder what a soft error is.

Certainly there are already references to "soft" errors in the manual,
but they seem to be aimed at developers, such as those creating a new
TYPE for PostgreSQL.


It might be better to describe what a soft error is, like below:


-- src/backend/utils/fmgr/README
An error reported "softly" must be safe, in the sense that there is
no question about our ability to continue normal processing of the
transaction.


[1] https://www.postgresql.org/docs/devel/sql-createtype.html
[2] https://www.postgresql.org/docs/devel/functions-info.html

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Add new error_action COPY ON_ERROR "log"

2024-01-28 Thread torikoshia
On Fri, Jan 26, 2024 at 10:44 PM jian he  
wrote:



I doubt the following part:
  If the log value is specified,
  COPY behaves the same as
ignore, exept that
  it logs information that should have resulted in errors to
PostgreSQL log at
  INFO level.

I think it does something like:
When an error happens, cstate->escontext->error_data->elevel will be 
ERROR

you manually change the cstate->escontext->error_data->elevel to LOG,
then you call ThrowErrorData.

but it's not related to `INFO level`?
my log_min_messages is default, warning.


Thanks!

Modified them to NOTICE in accordance with the following summary 
message:

NOTICE:  x row was skipped due to data type incompatibility



On 2024-01-27 00:43, David G. Johnston wrote:

On Thu, Jan 25, 2024 at 9:42 AM torikoshia
 wrote:


Hi,

As described in 9e2d870119, COPY ON_EEOR is expected to have more
"error_action".
(Note that option name was changed by b725b7eec)

I'd like to have a new option "log", which skips soft errors and
logs
information that should have resulted in errors to PostgreSQL log.


Seems like an easy win but largely unhelpful in the typical case.  I
suppose ETL routines using this feature may be running on their
machine under root or "postgres" but in a system where they are not
this very useful information is inaccessible to them.  I suppose the
DBA could set up an extractor to send these specific log lines
elsewhere but that seems like enough hassle to disfavor this approach
and favor one that can place the soft error data and feedback into
user-specified tables in the same database.  Setting up temporary
tables or unlogged tables probably is going to be a more acceptable
methodology than trying to get to the log files.

David J.


I agree that not a few people would prefer to store error information in 
tables and there have already been suggestions[1].


OTOH not everyone thinks saving table information is the best idea[2].

I think it would be desirable for ON_ERROR to be in a form that allows 
the user to choose where to store error information from among some 
options, such as table, log and file.


"ON_ERROR log" would be useful at least in the case of 'running on their 
machine under root or "postgres"' as you pointed out.



[1] 
https://www.postgresql.org/message-id/CACJufxEkkqnozdnvNMGxVAA94KZaCPkYw_Cx4JKG9ueNaZma_A%40mail.gmail.com


[2] 
https://www.postgresql.org/message-id/20231109002600.fuihn34bjqqgm...@awork3.anarazel.de


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 5f44cc7525641302842a3d67c14ebb09615bf67b Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 29 Jan 2024 12:02:32 +0900
Subject: [PATCH v2] Add new error_action "log" to ON_ERROR option

Currently ON_ERROR option only has "ignore" to skip malformed data and
there are no ways to know where and why COPY skipped them.

"log" skips malformed data as well as "ignore", but it logs information that
should have resulted in errors to PostgreSQL log.
---
 doc/src/sgml/ref/copy.sgml  |  9 +++--
 src/backend/commands/copy.c |  4 +++-
 src/backend/commands/copyfrom.c | 24 
 src/include/commands/copy.h |  1 +
 src/test/regress/expected/copy2.out | 18 +-
 src/test/regress/sql/copy2.sql  |  9 +
 6 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 21a5c4a052..3d949f04a4 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -380,12 +380,17 @@ COPY { table_name [ ( 
   Specifies which 
   error_action to perform when there is malformed data in the input.
-  Currently, only stop (default) and ignore
-  values are supported.
+  Currently, only stop (default), ignore
+  and log values are supported.
   If the stop value is specified,
   COPY stops operation at the first error.
   If the ignore value is specified,
   COPY skips malformed data and continues copying data.
+  If the log value is specified,
+  COPY behaves the same as ignore,
+  except that it logs information that should have resulted in errors to
+  PostgreSQL log at NOTICE
+  level.
   The option is allowed only in COPY FROM.
   Only stop value is allowed when
   using binary format.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cc0786c6f4..812ca63350 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -415,13 +415,15 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 		return COPY_ON_ERROR_STOP;
 
 	/*
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore" or "log" values.
 	 */
 	sval = defGetString(def);
 	if (pg_strcasecmp(sval, "stop") =

Re: RFC: Logging plan of the running query

2024-01-29 Thread torikoshia

Hi,

Updated the patch to fix typos and move
ProcessLogQueryPlanInterruptActive from errfinish() to
AbortTransaction().



BTW, since the thread is getting long, I list some points of the
discussion so far:


# Safety concern
## Catalog access inside CFI
- it seems safe if the CFI call is inside an existing valid 
transaction/query state[1]


- We did some tests, for example calling ProcessLogQueryPlanInterrupt()
in every single CHECK_FOR_INTERRUPTS()[2]. This test passed on my env
but got stuck on James's env, so I modified it to exit
ProcessLogQueryPlanInterrupt() when the target process is inside lock
acquisition code[3]

## Risk of calling EXPLAIN code in CFI
- EXPLAIN is not simple code, and there is a risk in calling it from
CFI. For example, if there is a bug, we may find ourselves in a
situation where we can't cancel the query


- it's a trade-off that's worth making for the introspection benefits 
this patch would provide?[4]


# Design
- Although some suggested it should be in auto_explain, the current
patch introduces this feature into core[5]


- When the target query is nested, only the innermost query's plan is
explained. In the future, all the nested queries' plans are expected to
be explained optionally, like auto_explain.log_nested_statements[6]


- When the target process is a parallel worker, the plan is not shown[6]

- When the target query is nested and its subtransaction is aborted,
pg_log_query_plan cannot log the parental query plan after the abort,
even if the parental query is still running[7]


- The output corresponds to EXPLAIN with VERBOSE, COSTS, SETTINGS and
FORMAT TEXT. It doesn't do ANALYZE or show the progress of the query
execution. Future work proposed by Rafael Thofehrn Castro may realize
this[8]


- To prevent assertion errors, this patch ensures no page lock is held
by checking all the LocalLock entries before running the EXPLAIN code,
but there is a discussion that ginInsertCleanup() should be modified[9]



It may not be so difficult to lift some of the restrictions listed in
"Design", but I'd like to limit the scope of the first patch to keep it
simple.
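The request-then-defer pattern that pg_log_query_plan() relies on can be sketched in Python (a schematic model, not PostgreSQL code; the class and attribute names are illustrative): the signal handler only sets a flag, and the potentially expensive work of running the EXPLAIN code is deferred to the next safe point, analogous to CHECK_FOR_INTERRUPTS().

```python
class Backend:
    """Toy model of a backend handling a plan-logging request."""
    def __init__(self):
        self.log_plan_pending = False   # set from "signal" context only
        self.active_query_desc = None   # analogous to ActiveQueryDesc
        self.log = []

    def handle_signal(self):
        # Signal handlers must do almost nothing: just set a flag.
        self.log_plan_pending = True

    def check_for_interrupts(self):
        # Called at many safe points; here it is valid to do real work.
        if self.log_plan_pending:
            self.log_plan_pending = False
            if self.active_query_desc is None:
                self.log.append("backend is not running a query")
            else:
                self.log.append(f"plan of the query: {self.active_query_desc}")
```

Deferring the work this way is also why the catalog-access and lock-state concerns above matter: the flag may be checked at many different execution states.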



[1] 
https://www.postgresql.org/message-id/CAAaqYe9euUZD8bkjXTVcD9e4n5c7kzHzcvuCJXt-xds9X4c7Fw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/0e0e7ca08dff077a625c27a5e0c2ef0a%40oss.nttdata.com
[4] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[5] 
https://www.postgresql.org/message-id/CAAaqYe_1EuoTudAz8mr8-qtN5SoNtYRm4JM2J9CqeverpE3B2A%40mail.gmail.com
[6] 
https://www.postgresql.org/message-id/CAExHW5sh4ahrJgmMAGfptWVmESt1JLKCNm148XVxTunRr%2B-6gA%40mail.gmail.com
[7] 
https://www.postgresql.org/message-id/3d121ed5f81cef588bac836b43f5d1f9%40oss.nttdata.com
[8] 
https://www.postgresql.org/message-id/c161b5e7e1888eb9c9eb182a7d9dcf89%40oss.nttdata.com
[9] 
https://www.postgresql.org/message-id/20220201.172757.1480996662235658750.horikyota.ntt%40gmail.com


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 65786ad6c2a9b656c3fd36a45118a39a66da0236 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 29 Jan 2024 21:40:04 +0900
Subject: [PATCH v35] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages and which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar,
Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro
Horiguchi, Robert Treat, Alena Rybakina, Ashutosh Bapat

Co-authored-by: James Coleman 
---
 contrib/auto_explain/auto_explain.c  |  23 +-
 doc/src/sgml/func.sgml   |  50 +
 src/backend/access/transam/xact.c|  17 ++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 208 ++-
 src/backend/executor/execMain.c  |  14 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/init/globals.c |   2 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   9 +
 src/include/mis

Re: Small fix on COPY ON_ERROR document

2024-02-01 Thread torikoshia

On 2024-02-01 15:16, Yugo NAGATA wrote:

On Mon, 29 Jan 2024 15:47:25 +0900
Yugo NAGATA  wrote:


On Sun, 28 Jan 2024 19:14:58 -0700
"David G. Johnston"  wrote:

> > Also, I think "invalid input syntax" is a bit ambiguous. For example,
> > COPY FROM raises an error when the number of input column does not match
> > to the table schema, but this error is not ignored by ON_ERROR while
> > this seems to fall into the category of "invalid input syntax".
>
>
>
> It is literally the error text that appears if one were not to ignore it.
> It isn’t a category of errors.  But I’m open to ideas here.  But being
> explicit with what on actually sees in the system seemed preferable to
> inventing new classification terms not otherwise used.

Thank you for explanation! I understood the words was from the error 
messages
that users actually see. However, as Torikoshi-san said in [1], errors 
other
than valid input syntax (e.g. range error) can be also ignored, 
therefore it

would be better to describe to be ignored errors more specifically.

[1] 
https://www.postgresql.org/message-id/7f1457497fa3bf9dfe486f162d1c8ec6%40oss.nttdata.com


>
> >
> > So, keeping consistency with the existing description, we can say:
> >
> > "Specifies which how to behave when encountering an error due to
> >  column values unacceptable to the input function of each attribute's
> >  data type."
>
>
> Yeah, I was considering something along those lines as an option as well.
> But I’d rather add that wording to the glossary.

Although I am still not convinced that we have to introduce the term
"soft error" into the documentation, I don't mind if there are no
other opposing opinions.


Attached is an updated patch v3, which uses the above wording instead
of "soft error".


>
> > Currently, ON_ERROR doesn't support other soft errors, so it can explain
> > it more simply without introducing the new concept, "soft error" to users.
> >
> >
> Good point.  Seems we should define what user-facing errors are ignored
> anywhere in the system and if we aren’t consistently leveraging these in
> all areas/commands make the necessary qualifications in those specific
> places.
>

> > I think "left in a deleted state" is also unclear for users because this
> > explains the internal state but not how it looks from the user's view.
> > How about leaving the explanation "These rows will not be visible or
> > accessible" in the existing statement?
> >
>
> Just visible then, I don’t like an “or” there and as tuples at least they
> are accessible to the system, in vacuum especially.  But I expected the
> user to understand “as if you deleted it” as their operational concept more
> readily than visible.  I think this will be read by people who haven’t read
> MVCC to fully understand what visible means but know enough to run vacuum
> to clean up updated and deleted data as a rule.

Ok, I agree we can omit "or accessible". How do you like the following?

Still redundant?

 "If the command fails, these rows are left in a deleted state;
  these rows will not be visible, but they still occupy disk space. "


Also, the above statement is used in the patch.


Thanks for updating the patch!

I like your description which doesn't use the word soft error.


Here are minor comments:

+  ignore means discard the input row and 
continue with the next one.

+  The default is stop


Is "." required at the end of the line?

 An NOTICE level context message containing the 
ignored row count is


Should 'An' be 'A'?

Also, I wasn't sure about the necessity of 'context'.
It might be possible to just say "A NOTICE message containing the
ignored row count.."

considering below existing descriptions:

  doc/src/sgml/pltcl.sgml: a NOTICE message each 
time a supported command is

  doc/src/sgml/pltcl.sgml- executed:

  doc/src/sgml/plpgsql.sgml: This example trigger simply raises a 
NOTICE message
  doc/src/sgml/plpgsql.sgml- each time a supported command is 
executed.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Add new COPY option REJECT_LIMIT

2024-02-01 Thread torikoshia

On 2024-01-27 00:20, David G. Johnston wrote:

Thanks for your comments!


On Fri, Jan 26, 2024 at 2:49 AM torikoshia
 wrote:


Hi,

9e2d870 enabled the COPY command to skip soft errors, and I think we
can add another option which specifies the maximum tolerable number of
soft errors.

I remember this was discussed in [1], and feel it would be useful when
loading 'dirty' data where there is a limit to how dirty it can be.

Attached a patch for this.

What do you think?


I'm opposed to adding this particular feature.

When implementing this kind of business rule I'd need the option to
specify a percentage, not just an absolute value.


Yeah, it seems useful for some cases.
Actually, Greenplum allows specifying not only the maximum number of bad
rows but also their percentage[1].


I may be wrong, but considering that some data loaders support something
like reject_limit (Redshift supports MAXERROR[2], pg_bulkload supports
PARSE_ERRORS[3]), specifying the "number" of bad rows might also be
useful.


I think we can implement reject_limit specified by percentage by simply
calculating the ratio of skipped to processed rows at the end of
CopyFrom() like this:


  if (cstate->opts.reject_limit > 0 &&
      (double) skipped / (processed + skipped) > cstate->opts.reject_limit_percent)
      ereport(ERROR,
              (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
               errmsg("exceeded the ratio specified by ..




I would focus on trying to put the data required to make this kind of
determination into a place where applications implementing such
business rules and monitoring can readily get at it.  The "ERRORS TO"
and maybe a corresponding "STATS TO" option where a table can be
specified for the system to place the problematic data and stats about
the copy itself.


It'd be nice to have such informative tables, but I believe the benefit
of reject_limit is that it fails the entire load when the threshold is
exceeded.
I imagine that if we only had error and stats tables for COPY, users
would have to delete the loaded rows themselves after confirming there
were too many errors in those tables.



[1]https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-load-topics-g-handling-load-errors.html
[2]https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html
[3]https://ossc-db.github.io/pg_bulkload/pg_bulkload.html

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row

2024-02-04 Thread torikoshia

Hi,

On 2024-02-03 15:22, jian he wrote:

The idea of on_error is to tolerate errors, I think.
if a column has a not null constraint, let it cannot be used with
(on_error 'null')



+   /*
+    * We can specify on_error 'null', but it can only apply to columns
+    * that don't have a NOT NULL constraint.
+    */
+   if (att->attnotnull && cstate->opts.on_error == COPY_ON_ERROR_NULL)
+       ereport(ERROR,
+               (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                errmsg("COPY on_error 'null' cannot be used with a NOT NULL constraint column")));


This means we cannot use ON_ERROR 'null' even when there is just one
column with a NOT NULL constraint, e.g. a primary key, right?
IMHO this is a strong restriction and will decrease the opportunities to
use this feature.


It might be better to allow error_action 'null' for tables which have
NOT NULL constraint columns, and when facing soft errors on those
columns, skip the row or stop the COPY.



Based on this, I've made a patch.
based on COPY Synopsis: ON_ERROR 'error_action'
on_error 'null', the  keyword NULL should be single quoted.


As you mentioned, the single quoting seems a little odd..

I'm not sure what the best name and syntax for this feature is, but
since the current error_action values are verbs ('stop' and 'ignore'),
I feel 'null' might not be appropriate.



demo:
COPY check_ign_err FROM STDIN WITH (on_error 'null');
1 {1} a
2 {2} 1
3 {3} 2
4 {4} b
a {5} c
\.

\pset null NULL

SELECT * FROM check_ign_err;
  n   |  m  |  k
--+-+--
1 | {1} | NULL
2 | {2} |1
3 | {3} |2
4 | {4} | NULL
 NULL | {5} | NULL


Since we report the number of ignored rows when ON_ERROR is 'ignore',
users may also want to know the number of rows which were changed to
NULL when using ON_ERROR 'null'.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: RFC: Logging plan of the running query

2024-02-06 Thread torikoshia

Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:

Thanks for the summary. It is helpful. I think patch is also getting 
better.


I have a few questions and suggestions


Thanks for your comments.


1. Prologue of GetLockMethodLocalHash() mentions
 * NOTE: When there are many entries in LockMethodLocalHash, calling 
this
 * function and looking into all of them can lead to performance 
problems.

 */
How bad this performance could be. Let's assume that a query is taking
time and pg_log_query_plan() is invoked to examine the plan of this
query. Is it possible that the looping over all the locks itself takes
a lot of time delaying the query execution further?


I think it depends on the number of local locks, but for your
information I've measured the CPU time of this page-lock check on my
laptop by adding the code below together with
v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
calls ProcessLogQueryPlanInterrupt() at every CFI:


  diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
  index 5f7d77d567..65b7cb4925 100644
  --- a/src/backend/commands/explain.c
  +++ b/src/backend/commands/explain.c
  @@ -44,6 +44,8 @@

  +#include "time.h"
  ...
  @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
        * we check all the LocalLock entries and when finding even one, give up
        * logging the plan.
        */
  +    start = clock();
      hash_seq_init(&status, GetLockMethodLocalHash());
      while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
      {
          if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
          {
              ereport(LOG_SERVER_ONLY,
                  errmsg("ignored request for logging query plan due to page lock conflicts"),
                  errdetail("You can try again in a moment."));
              hash_seq_term(&status);
              ProcessLogQueryPlanInterruptActive = false;
              return;
          }
      }
  +    end = clock();
  +    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
  +
  +    ereport(LOG,
  +        errmsg("all locallock entry search took: %f", cpu_time_used));
  +

There were about 3 million log lines recording the CPU time, and the
durations were quite short:

  =# -- Extracted cpu_time_used from the log and loaded it into cpu_time.d.
  =# select max(d), min(d), avg(d) from cpu_time ;
     max    | min |          avg
  ----------+-----+-----------------------
   0.000116 |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these
results seem to suggest that it will not take that long.




2. What happens if auto_explain is enabled in the backend and
pg_log_query_plan() is called on the same backend? Will they conflict?
I think we should add a test for the same.


Hmm, I think they don't conflict, since they just refer to the QueryDesc
without modifying it and don't use the same objects for locking.
(I imagine 'conflict' here means something like a 'hard conflict' in
replication[2].)


Actually, using both auto_explain and pg_log_query_plan() outputs each
log separately:


  (pid:62835)=# select pg_sleep(10);
  (pid:7)=# select pg_log_query_plan(62835);

  (pid:7)=# \! cat data/log/postgres.log
  ...
  2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  0: query plan 
running on backend with PID 62835 is:

Query Text: select pg_sleep(10);
Result  (cost=0.00..0.01 rows=1 width=4)
  Output: pg_sleep('10'::double precision)
Query Identifier: 3506829283127886044
  2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:  
ProcessLogQueryPlanInterrupt, explain.c:5336
  2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  0: duration: 
1.868 ms  plan:

Query Text: select pg_sleep(10);
Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=1.802..1.804 rows=1 loops=1)



Using injection point support we should be able to add tests for
testing pg_log_query_plan behaviour when there are page locks held or
when auto_explain (with instrumentation) and pg_log_query_plan() work
on the same query plan. Use injection point to make the backend
running query wait at a suitable point to delay its execution and fire
pg_log_query_plan() from other backend. May be the same test could
examine the server log file to see if the plan is indeed output to the
server log file.

Given that the feature will be used when the things have already gone
wrong, it should not make things more serious. So more testing and
especially automated would help.


Thanks for the advice, it seems like a good idea.
I'm going to try to add tests using injection points.


[1] 
https://www.postgresql.org/message-id/CAAaqYe8LXVXQhYy3yT0QOHUymdM%3Duha0dJ0%3DBEPzVAx2nG1gsw%40mail.gmail.com
[2] 
https://www.postgresql.org/docs/devel/hot-standby.html#HOT-STANDBY-CONFLICT


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: RFC: Logging plan of the running query

2024-02-07 Thread torikoshia

On 2024-02-07 13:58, Ashutosh Bapat wrote:
On Wed, Feb 7, 2024 at 9:38 AM torikoshia  
wrote:


Hi Ashutosh,

On 2024-02-06 19:51, Ashutosh Bapat wrote:

> Thanks for the summary. It is helpful. I think patch is also getting
> better.
>
> I have a few questions and suggestions

Thanks for your comments.

> 1. Prologue of GetLockMethodLocalHash() mentions
>  * NOTE: When there are many entries in LockMethodLocalHash, calling
> this
>  * function and looking into all of them can lead to performance
> problems.
>  */
> How bad this performance could be. Let's assume that a query is taking
> time and pg_log_query_plan() is invoked to examine the plan of this
> query. Is it possible that the looping over all the locks itself takes
> a lot of time delaying the query execution further?

I think it depends on the number of local locks, but I've measured cpu
time for this page lock check by adding below codes and
v27-0002-Testing-attempt-logging-plan-on-ever-CFI-call.patch[1], which
calls ProcessLogQueryPlanInterrupt() in every CFI on my laptop just 
for

your information:

   diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
   index 5f7d77d567..65b7cb4925 100644
   --- a/src/backend/commands/explain.c
   +++ b/src/backend/commands/explain.c
   @@ -44,6 +44,8 @@

   +#include "time.h"
   ...
   @@ -5287,6 +5292,7 @@ ProcessLogQueryPlanInterrupt(void)
        * we check all the LocalLock entries and when finding even one, give up
        * logging the plan.
        */
   +   start = clock();
       hash_seq_init(&status, GetLockMethodLocalHash());
       while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
       {
           if (LOCALLOCK_LOCKTAG(*locallock) == LOCKTAG_PAGE)
           {
               ereport(LOG_SERVER_ONLY,
                   errmsg("ignored request for logging query plan due to page lock conflicts"),
                   errdetail("You can try again in a moment."));
               hash_seq_term(&status);
               ProcessLogQueryPlanInterruptActive = false;
               return;
           }
       }
   +   end = clock();
   +   cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
   +
   +   ereport(LOG,
   +       errmsg("all locallock entry search took: %f", cpu_time_used));
   +

There were about 3 million log lines which recorded the cpu time, and
the duration was quite short:

   =# -- Extracted cpu_time_used from the log and loaded it into cpu_time.d.
   =# select max(d), min(d), avg(d) from cpu_time ;
      max    | min |          avg
   ----------+-----+-----------------------
    0.000116 |   0 | 4.706274625332238e-07

I'm not certain that this is valid for actual use cases, but these
results seem to suggest that it will not take that long.


What load did you run? I don't think any query in make check would
take say thousands of locks.


Sorry, I forgot to mention it, but I ran make check as you imagined.


The prologue refers to a very populated
lock hash table. I think that will happen if thousands of tables are
queried in a single query OR a query runs on a partitioned table with
thousands of partitions. May be we want to try that scenario.


OK, I'll try such cases.


> 2. What happens if auto_explain is enabled in the backend and
> pg_log_query_plan() is called on the same backend? Will they conflict?
> I think we should add a test for the same.

Hmm, I think they don't conflict since they just refer QueryDesc and
don't modify it and don't use same objects for locking.
(I imagine 'conflict' here is something like 'hard conflict' in
replication[2].)


By conflict, I mean the two features behaving weirdly when used
together, e.g. giving wrong results or crashing.



Actually using both auto_explain and pg_log_query_plan() output each
logs separately:

   (pid:62835)=# select pg_sleep(10);
   (pid:7)=# select pg_log_query_plan(62835);

   (pid:7)=# \! cat data/log/postgres.log
   ...
   2024-02-06 21:44:17.837 JST [62835:4:0] LOG:  0: query plan
running on backend with PID 62835 is:
 Query Text: select pg_sleep(10);
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: pg_sleep('10'::double precision)
 Query Identifier: 3506829283127886044
   2024-02-06 21:44:17.837 JST [62835:5:0] LOCATION:
ProcessLogQueryPlanInterrupt, explain.c:5336
   2024-02-06 21:44:26.974 JST [62835:6:0] LOG:  0: duration:
1.868 ms  plan:
 Query Text: select pg_sleep(10);
 Result  (cost=0.00..0.01 rows=1 width=4) (actual
time=1.802..1.804 rows=1 loops=1)

> Using injection point support we should be able to add tests for
> testing pg_log_query_plan behaviour when there are page locks held or
> when auto_explain (with in

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-02-26 Thread torikoshia

On 2023-02-06 15:00, Tom Lane wrote:

Andres Freund  writes:
On February 5, 2023 9:12:17 PM PST, Tom Lane  
wrote:

Damir Belyalov  writes:
InputFunctionCallSafe() is good for detecting errors from 
input-functions
but there are such errors from NextCopyFrom () that can not be 
detected
with InputFunctionCallSafe(), e.g. "wrong number of columns in 
row''.


If you want to deal with those, then there's more work to be done to 
make
those bits non-error-throwing.  But there's a very finite amount of 
code

involved and no obvious reason why it couldn't be done.



I'm not even sure it makes sense to avoid that kind of error. And
invalid column count or such is something quite different than failing
some data type input routine, or falling a constraint.


I think it could be reasonable to put COPY's overall-line-format
requirements on the same level as datatype input format violations.
I agree that trying to trap every kind of error is a bad idea,
for largely the same reason that the soft-input-errors patches
only trap certain kinds of errors: it's too hard to tell whether
an error is an "internal" error that it's scary to continue past.


Is it a bad idea to limit the scope of tolerated errors to the 'soft'
errors detected by InputFunctionCallSafe()?


I think it could still be useful for some use cases.

  diff --git a/src/test/regress/sql/copy2.sql 
b/src/test/regress/sql/copy2.sql


  +-- tests for IGNORE_DATATYPE_ERRORS option
  +CREATE TABLE check_ign_err (n int, m int[], k int);
  +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
  +1  {1} 1
  +a  {2} 2
  +3  {3} 33
  +4  {a, 4}  4
  +
  +5  {5} 5
  +\.
  +SELECT * FROM check_ign_err;

  diff --git a/src/test/regress/expected/copy2.out 
b/src/test/regress/expected/copy2.out

  index 090ef6c7a8..08e8056fc1 100644

  +-- tests for IGNORE_DATATYPE_ERRORS option
  +CREATE TABLE check_ign_err (n int, m int[], k int);
  +COPY check_ign_err FROM STDIN WITH IGNORE_DATATYPE_ERRORS;
  +WARNING:  invalid input syntax for type integer: "a"
  +WARNING:  value "33" is out of range for type integer
  +WARNING:  invalid input syntax for type integer: "a"
  +WARNING:  invalid input syntax for type integer: ""
  +SELECT * FROM check_ign_err;
  + n |  m  | k
  +---+-+---
  + 1 | {1} | 1
  + 5 | {5} | 5
  +(2 rows)

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom 16877d4cdd64db5f85bed9cd559e618d8211e598 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 27 Feb 2023 12:02:16 +0900
Subject: [PATCH v1] Add COPY option IGNORE_DATATYPE_ERRORS

---
 src/backend/commands/copy.c  |  8 
 src/backend/commands/copyfrom.c  | 11 +++
 src/backend/commands/copyfromparse.c | 12 ++--
 src/backend/parser/gram.y|  8 +++-
 src/bin/psql/tab-complete.c  |  3 ++-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  2 ++
 src/include/parser/kwlist.h  |  1 +
 src/test/regress/expected/copy2.out  | 14 ++
 src/test/regress/sql/copy2.sql   | 12 
 10 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e34f583ea7..2f1cfb3f4d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -410,6 +410,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -449,6 +450,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index af52faca6d..24eec6a27d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -959,6 +959,7 @@ CopyFrom(CopyFromState cstate)
 	{
 		TupleTableSlot *myslot;
 		bool		skip_tuple;
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
 
 		CHECK_FOR_INTERRUPTS();
 
@@ -991,10 +992,20 @@ CopyFrom(CopyFromState cstate)
 
 		ExecClearTuple(myslot);
 
+		if (cstate->opts.ignore_datatype_errors)
+		{
+			escontext.details_wanted = true;
+			cstate->escontext = escontext;
+		}
+
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
+		/* Soft error occurred, skip this tuple */
+		if (cstate->escontext.error_occurred)
+			continue;
+
 		ExecStoreVirtualTuple(myslot)

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-06 Thread torikoshia

On 2023-03-06 23:03, Daniel Gustafsson wrote:

On 28 Feb 2023, at 15:28, Damir Belyalov  wrote:


Tested patch on all cases: CIM_SINGLE, CIM_MULTI, CIM_MULTI_CONDITION. 
As expected it works.

Also added a description to copy.sgml and made a review on patch.

Thanks for your tests and improvements!

I added 'ignored_errors' integer parameter that should be output after 
the option is finished.
All errors were added to the system logfile with full detailed 
context. Maybe it's better to log only error message.

Certainly.

FWIW, Greenplum has a similar construct (but which also logs the errors
in the db) where data type errors are skipped as long as the number of
errors doesn't exceed a reject limit.  If the reject limit is reached
then the COPY fails:


LOG ERRORS [ SEGMENT REJECT LIMIT  [ ROWS | PERCENT ]]

IIRC the gist of this was to catch when the user copies the wrong input
data or plain has a broken file.  Rather than finding out after copying
n rows which are likely to be garbage, the process can be restarted.

This version of the patch has a compiler error in the error message:

copyfrom.c: In function ‘CopyFrom’:
copyfrom.c:1008:29: error: format ‘%ld’ expects argument of type ‘long
int’, but argument 2 has type ‘uint64’ {aka ‘long long unsigned int’}
[-Werror=format=]
1008 | ereport(WARNING, errmsg("Errors: %ld", cstate->ignored_errors));
 |  ^ ~~
 |  |
 |  uint64 {aka long
long unsigned int}


On that note though, it seems to me that this error message leaves a 
bit to be

desired with regards to the level of detail.

+1.
I felt just logging "Error: %ld" would make people wonder about the
meaning of the %ld. Logging something like "Error: %ld data type errors
were found" might be clearer.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: Record queryid when auto_explain.log_verbose is on

2023-03-06 Thread torikoshia

On 2023-03-07 08:50, Imseih (AWS), Sami wrote:

I am wondering if this patch should be backpatched?

The reason being is in auto_explain documentation [1],
there is a claim of equivalence of the auto_explain.log_verbose
option and EXPLAIN(verbose)

". it's equivalent to the VERBOSE option of EXPLAIN."

This can be quite confusing for users of the extension.
The documentation should either be updated or a backpatch
all the way down to 14, which the version the query identifier
was moved to core. I am in favor of the latter.

Any thoughts?


We discussed a bit whether to backpatch this, but agreed that it would
be better not to do so for the following reason:

It's a bit annoying that the info has been missing since PG 14, but we
probably can't backpatch this as it might break log parser tools.


What do you think?

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-03-17 Thread torikoshia

On 2023-03-07 18:09, Daniel Gustafsson wrote:

On 7 Mar 2023, at 09:35, Damir Belyalov  wrote:


I felt just logging "Error: %ld" would make people wonder the meaning 
of

the %ld. Logging something like ""Error: %ld data type errors were
found" might be clearer.

Thanks. For more clarity, I changed the message to: "Errors were found:
%".


I'm not convinced that this adds enough clarity to assist the user.  We 
also
shouldn't use "error" in a WARNING log since the user has explicitly 
asked to

skip rows on error, so it's not an error per se.

+1


How about something like:

  ereport(WARNING,
  (errmsg("%ld rows were skipped due to data type
incompatibility", cstate->ignored_errors),
   errhint("Skipped rows can be inspected in the database log
for reprocessing.")));

Since skipped rows cannot be inspected in the log when
log_error_verbosity is set to terse, it might be better without this
errhint.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION



